Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'f2fs-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
"In this cycle, f2fs has some performance improvements for Android
workloads such as using reader-unfair rwsems and adding some sysfs
entries to control GCs and discard commands in more detail. In
addition, it has some tunings to improve the recovery speed after
a sudden power-cut.

Enhancement:
- add reader-unfair rwsems with F2FS_UNFAIR_RWSEM: will replace with
generic API support
- adjust the readahead/recovery flow to make it more efficient
- sysfs entries to control issue speeds of GCs and Discard commands
- enable idmapped mounts

Bug fix:
- correct wrong error handling routines
- fix missing conditions in quota
- fix a potential deadlock between writeback and block plug routines
- fix a deadlock between freezefs and evict_inode

We've added some boundary checks to avoid kernel panics on corrupted
images, and several minor code clean-ups"

* tag 'f2fs-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (27 commits)
f2fs: fix to do sanity check on .cp_pack_total_block_count
f2fs: make gc_urgent and gc_segment_mode sysfs node readable
f2fs: use aggressive GC policy during f2fs_disable_checkpoint()
f2fs: fix compressed file start atomic write may cause data corruption
f2fs: initialize sbi->gc_mode explicitly
f2fs: introduce gc_urgent_mid mode
f2fs: compress: fix to print raw data size in error path of lz4 decompression
f2fs: remove redundant parameter judgment
f2fs: use spin_lock to avoid hang
f2fs: don't get FREEZE lock in f2fs_evict_inode in frozen fs
f2fs: remove unnecessary read for F2FS_FITS_IN_INODE
f2fs: introduce F2FS_UNFAIR_RWSEM to support unfair rwsem
f2fs: avoid an infinite loop in f2fs_sync_dirty_inodes
f2fs: fix to do sanity check on curseg->alloc_type
f2fs: fix to avoid potential deadlock
f2fs: quota: fix loop condition at f2fs_quota_sync()
f2fs: Restore rwsem lockdep support
f2fs: fix missing free nid in f2fs_handle_failed_inode
f2fs: support idmapped mounts
f2fs: add a way to limit roll forward recovery time
...

+699 -391
+47 -7
Documentation/ABI/testing/sysfs-fs-f2fs
···
 		0x04  F2FS_IPU_UTIL
 		0x08  F2FS_IPU_SSR_UTIL
 		0x10  F2FS_IPU_FSYNC
-		0x20  F2FS_IPU_ASYNC,
+		0x20  F2FS_IPU_ASYNC
 		0x40  F2FS_IPU_NOCACHE
+		0x80  F2FS_IPU_HONOR_OPU_WRITE
 		====  =================

 		Refer segment.h for details.
···
 		blocks less than 2MB. The candidates to be discarded are cached until
 		checkpoint is triggered, and issued during the checkpoint.
 		By default, it is disabled with 0.
+
+What:		/sys/fs/f2fs/<disk>/max_discard_request
+Date:		December 2021
+Contact:	"Konstantin Vyshetsky" <vkon@google.com>
+Description:	Controls the number of discards a thread will issue at a time.
+		Higher number will allow the discard thread to finish its work
+		faster, at the cost of higher latency for incomming I/O.
+
+What:		/sys/fs/f2fs/<disk>/min_discard_issue_time
+Date:		December 2021
+Contact:	"Konstantin Vyshetsky" <vkon@google.com>
+Description:	Controls the interval the discard thread will wait between
+		issuing discard requests when there are discards to be issued and
+		no I/O aware interruptions occur.
+
+What:		/sys/fs/f2fs/<disk>/mid_discard_issue_time
+Date:		December 2021
+Contact:	"Konstantin Vyshetsky" <vkon@google.com>
+Description:	Controls the interval the discard thread will wait between
+		issuing discard requests when there are discards to be issued and
+		an I/O aware interruption occurs.
+
+What:		/sys/fs/f2fs/<disk>/max_discard_issue_time
+Date:		December 2021
+Contact:	"Konstantin Vyshetsky" <vkon@google.com>
+Description:	Controls the interval the discard thread will wait when there are
+		no discard operations to be issued.

 What:		/sys/fs/f2fs/<disk>/discard_granularity
 Date:		July 2017
···
 What:		/sys/fs/f2fs/<disk>/gc_urgent
 Date:		August 2017
 Contact:	"Jaegeuk Kim" <jaegeuk@kernel.org>
-Description:	Do background GC aggressively when set. When gc_urgent = 1,
-		background thread starts to do GC by given gc_urgent_sleep_time
-		interval. When gc_urgent = 2, F2FS will lower the bar of
-		checking idle in order to process outstanding discard commands
-		and GC a little bit aggressively. It is set to 0 by default.
+Description:	Do background GC aggressively when set. Set to 0 by default.
+		gc urgent high(1): does GC forcibly in a period of given
+		gc_urgent_sleep_time and ignores I/O idling check. uses greedy
+		GC approach and turns SSR mode on.
+		gc urgent low(2): lowers the bar of checking I/O idling in
+		order to process outstanding discard commands and GC a
+		little bit aggressively. uses cost benefit GC approach.
+		gc urgent mid(3): does GC forcibly in a period of given
+		gc_urgent_sleep_time and executes a mid level of I/O idling check.
+		uses cost benefit GC approach.

 What:		/sys/fs/f2fs/<disk>/gc_urgent_sleep_time
 Date:		August 2017
···
 		0x800   SBI_QUOTA_SKIP_FLUSH   skip flushing quota in current CP
 		0x1000  SBI_QUOTA_NEED_REPAIR  quota file may be corrupted
 		0x2000  SBI_IS_RESIZEFS        resizefs is in process
+		0x4000  SBI_IS_FREEZING        freefs is in process
 		======  =====================  =================================

 What:		/sys/fs/f2fs/<disk>/ckpt_thread_ioprio
···
 Contact:	"Daeho Jeong" <daehojeong@google.com>
 Description:	Show how many segments have been reclaimed by GC during a specific
 		GC mode (0: GC normal, 1: GC idle CB, 2: GC idle greedy,
-		3: GC idle AT, 4: GC urgent high, 5: GC urgent low)
+		3: GC idle AT, 4: GC urgent high, 5: GC urgent low 6: GC urgent mid)
 		You can re-initialize this value to "0".

 What:		/sys/fs/f2fs/<disk>/gc_segment_mode
···
 Description:	You can set the trial count limit for GC urgent high mode with this value.
 		If GC thread gets to the limit, the mode will turn back to GC normal mode.
 		By default, the value is zero, which means there is no limit like before.
+
+What:		/sys/fs/f2fs/<disk>/max_roll_forward_node_blocks
+Date:		January 2022
+Contact:	"Jaegeuk Kim" <jaegeuk@kernel.org>
+Description:	Controls max # of node block writes to be used for roll forward
+		recovery. This can limit the roll forward recovery time.
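
The sysfs nodes documented in this file are plain text files that accept a number. As a rough illustration, the userspace sketch below builds the path of a node and writes a value to it. The gc_urgent values (1 = high, 2 = low, 3 = mid) are taken from the description in this hunk. The helper names `f2fs_sysfs_path` and `write_uint`, and the disk name used in the usage note, are hypothetical and not part of the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* gc_urgent values as documented in sysfs-fs-f2fs after this merge. */
enum gc_urgent_mode {
	GC_URGENT_OFF  = 0,
	GC_URGENT_HIGH = 1,
	GC_URGENT_LOW  = 2,
	GC_URGENT_MID  = 3,
};

/*
 * Build "<root>/<disk>/<node>" into buf (hypothetical helper).
 * Returns 0 on success or -ENAMETOOLONG if the result does not fit.
 */
int f2fs_sysfs_path(char *buf, size_t len, const char *root,
		    const char *disk, const char *node)
{
	int n = snprintf(buf, len, "%s/%s/%s", root, disk, node);

	return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/*
 * Write an unsigned value followed by a newline to path
 * (hypothetical helper). Returns 0 or a negative errno value.
 */
int write_uint(const char *path, unsigned int value)
{
	FILE *f = fopen(path, "w");
	int rc = 0;

	if (!f)
		return -errno;
	if (fprintf(f, "%u\n", value) < 0)
		rc = -EIO;
	if (fclose(f) != 0 && rc == 0)
		rc = -errno;
	return rc;
}
```

A caller would typically build the path for a node such as gc_urgent under /sys/fs/f2fs/<disk>/ and then call `write_uint(path, GC_URGENT_MID)`. Writing these nodes normally requires root privileges.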
+7
fs/f2fs/Kconfig
···
 	  Support getting IO statistics through sysfs and printing out periodic
 	  IO statistics tracepoint events. You have to turn on "iostat_enable"
 	  sysfs node to enable this feature.
+
+config F2FS_UNFAIR_RWSEM
+	bool "F2FS unfair rw_semaphore"
+	depends on F2FS_FS && BLK_CGROUP
+	help
+	  Use unfair rw_semaphore, if system configured IO priority by block
+	  cgroup.
+12 -9
fs/f2fs/acl.c
···
 	return __f2fs_get_acl(inode, type, NULL);
 }

-static int f2fs_acl_update_mode(struct inode *inode, umode_t *mode_p,
-			  struct posix_acl **acl)
+static int f2fs_acl_update_mode(struct user_namespace *mnt_userns,
+				struct inode *inode, umode_t *mode_p,
+				struct posix_acl **acl)
 {
 	umode_t mode = inode->i_mode;
 	int error;
···
 		return error;
 	if (error == 0)
 		*acl = NULL;
-	if (!in_group_p(i_gid_into_mnt(&init_user_ns, inode)) &&
-	    !capable_wrt_inode_uidgid(&init_user_ns, inode, CAP_FSETID))
+	if (!in_group_p(i_gid_into_mnt(mnt_userns, inode)) &&
+	    !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
 		mode &= ~S_ISGID;
 	*mode_p = mode;
 	return 0;
 }

-static int __f2fs_set_acl(struct inode *inode, int type,
+static int __f2fs_set_acl(struct user_namespace *mnt_userns,
+			struct inode *inode, int type,
 			struct posix_acl *acl, struct page *ipage)
 {
 	int name_index;
···
 	case ACL_TYPE_ACCESS:
 		name_index = F2FS_XATTR_INDEX_POSIX_ACL_ACCESS;
 		if (acl && !ipage) {
-			error = f2fs_acl_update_mode(inode, &mode, &acl);
+			error = f2fs_acl_update_mode(mnt_userns, inode,
+								&mode, &acl);
 			if (error)
 				return error;
 			set_acl_inode(inode, mode);
···
 	if (unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
 		return -EIO;

-	return __f2fs_set_acl(inode, type, acl, NULL);
+	return __f2fs_set_acl(mnt_userns, inode, type, acl, NULL);
 }

 /*
···
 	f2fs_mark_inode_dirty_sync(inode, true);

 	if (default_acl) {
-		error = __f2fs_set_acl(inode, ACL_TYPE_DEFAULT, default_acl,
+		error = __f2fs_set_acl(NULL, inode, ACL_TYPE_DEFAULT, default_acl,
 				       ipage);
 		posix_acl_release(default_acl);
 	} else {
···
 	}
 	if (acl) {
 		if (!error)
-			error = __f2fs_set_acl(inode, ACL_TYPE_ACCESS, acl,
+			error = __f2fs_set_acl(NULL, inode, ACL_TYPE_ACCESS, acl,
 					       ipage);
 		posix_acl_release(acl);
 	} else {
+36 -22
fs/f2fs/checkpoint.c
···
 	}

 	if (unlikely(!PageUptodate(page))) {
+		if (page->index == sbi->metapage_eio_ofs &&
+			sbi->metapage_eio_cnt++ == MAX_RETRY_META_PAGE_EIO) {
+			set_ckpt_flags(sbi, CP_ERROR_FLAG);
+		} else {
+			sbi->metapage_eio_ofs = page->index;
+			sbi->metapage_eio_cnt = 0;
+		}
 		f2fs_put_page(page, 1);
 		return ERR_PTR(-EIO);
 	}
···
 	return blkno - start;
 }

-void f2fs_ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index)
+void f2fs_ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index,
+							unsigned int ra_blocks)
 {
 	struct page *page;
 	bool readahead = false;
+
+	if (ra_blocks == RECOVERY_MIN_RA_BLOCKS)
+		return;

 	page = find_get_page(META_MAPPING(sbi), index);
 	if (!page || !PageUptodate(page))
···
 	f2fs_put_page(page, 0);

 	if (readahead)
-		f2fs_ra_meta_pages(sbi, index, BIO_MAX_VECS, META_POR, true);
+		f2fs_ra_meta_pages(sbi, index, ra_blocks, META_POR, true);
 }

 static int __f2fs_write_meta_page(struct page *page,
···
 		goto skip_write;

 	/* if locked failed, cp will flush dirty pages instead */
-	if (!down_write_trylock(&sbi->cp_global_sem))
+	if (!f2fs_down_write_trylock(&sbi->cp_global_sem))
 		goto skip_write;

 	trace_f2fs_writepages(mapping->host, wbc, META);
 	diff = nr_pages_to_write(sbi, META, wbc);
 	written = f2fs_sync_meta_pages(sbi, META, wbc->nr_to_write, FS_META_IO);
-	up_write(&sbi->cp_global_sem);
+	f2fs_up_write(&sbi->cp_global_sem);
 	wbc->nr_to_write = max((long)0, wbc->nr_to_write - written - diff);
 	return 0;

···
 	struct page *cp_page_1 = NULL, *cp_page_2 = NULL;
 	struct f2fs_checkpoint *cp_block = NULL;
 	unsigned long long cur_version = 0, pre_version = 0;
+	unsigned int cp_blocks;
 	int err;

 	err = get_checkpoint_version(sbi, cp_addr, &cp_block,
···
 	if (err)
 		return NULL;

-	if (le32_to_cpu(cp_block->cp_pack_total_block_count) >
-					sbi->blocks_per_seg) {
+	cp_blocks = le32_to_cpu(cp_block->cp_pack_total_block_count);
+
+	if (cp_blocks > sbi->blocks_per_seg || cp_blocks <= F2FS_CP_PACKS) {
 		f2fs_warn(sbi, "invalid cp_pack_total_block_count:%u",
 			  le32_to_cpu(cp_block->cp_pack_total_block_count));
 		goto invalid_cp;
 	}
 	pre_version = *version;

-	cp_addr += le32_to_cpu(cp_block->cp_pack_total_block_count) - 1;
+	cp_addr += cp_blocks - 1;
 	err = get_checkpoint_version(sbi, cp_addr, &cp_block,
 					&cp_page_2, version);
 	if (err)
···
 	if (!is_journalled_quota(sbi))
 		return false;

-	if (!down_write_trylock(&sbi->quota_sem))
+	if (!f2fs_down_write_trylock(&sbi->quota_sem))
 		return true;
 	if (is_sbi_flag_set(sbi, SBI_QUOTA_SKIP_FLUSH)) {
 		ret = false;
···
 	} else if (get_pages(sbi, F2FS_DIRTY_QDATA)) {
 		ret = true;
 	}
-	up_write(&sbi->quota_sem);
+	f2fs_up_write(&sbi->quota_sem);
 	return ret;
 }

···
 	 * POR: we should ensure that there are no dirty node pages
 	 * until finishing nat/sit flush. inode->i_blocks can be updated.
 	 */
-	down_write(&sbi->node_change);
+	f2fs_down_write(&sbi->node_change);

 	if (get_pages(sbi, F2FS_DIRTY_IMETA)) {
-		up_write(&sbi->node_change);
+		f2fs_up_write(&sbi->node_change);
 		f2fs_unlock_all(sbi);
 		err = f2fs_sync_inode_meta(sbi);
 		if (err)
···
 	}

 retry_flush_nodes:
-	down_write(&sbi->node_write);
+	f2fs_down_write(&sbi->node_write);

 	if (get_pages(sbi, F2FS_DIRTY_NODES)) {
-		up_write(&sbi->node_write);
+		f2fs_up_write(&sbi->node_write);
 		atomic_inc(&sbi->wb_sync_req[NODE]);
 		err = f2fs_sync_node_pages(sbi, &wbc, false, FS_CP_NODE_IO);
 		atomic_dec(&sbi->wb_sync_req[NODE]);
 		if (err) {
-			up_write(&sbi->node_change);
+			f2fs_up_write(&sbi->node_change);
 			f2fs_unlock_all(sbi);
 			return err;
 		}
···
 	 * dirty node blocks and some checkpoint values by block allocation.
 	 */
 	__prepare_cp_block(sbi);
-	up_write(&sbi->node_change);
+	f2fs_up_write(&sbi->node_change);
 	return err;
 }

 static void unblock_operations(struct f2fs_sb_info *sbi)
 {
-	up_write(&sbi->node_write);
+	f2fs_up_write(&sbi->node_write);
 	f2fs_unlock_all(sbi);
 }

···
 	/* update user_block_counts */
 	sbi->last_valid_block_count = sbi->total_valid_block_count;
 	percpu_counter_set(&sbi->alloc_valid_block_count, 0);
+	percpu_counter_set(&sbi->rf_node_block_count, 0);

 	/* Here, we have one bio having CP pack except cp pack 2 page */
 	f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO);
···
 		f2fs_warn(sbi, "Start checkpoint disabled!");
 	}
 	if (cpc->reason != CP_RESIZE)
-		down_write(&sbi->cp_global_sem);
+		f2fs_down_write(&sbi->cp_global_sem);

 	if (!is_sbi_flag_set(sbi, SBI_IS_DIRTY) &&
 		((cpc->reason & CP_FASTBOOT) || (cpc->reason & CP_SYNC) ||
···
 	trace_f2fs_write_checkpoint(sbi->sb, cpc->reason, "finish checkpoint");
 out:
 	if (cpc->reason != CP_RESIZE)
-		up_write(&sbi->cp_global_sem);
+		f2fs_up_write(&sbi->cp_global_sem);
 	return err;
 }

···
 	struct cp_control cpc = { .reason = CP_SYNC, };
 	int err;

-	down_write(&sbi->gc_lock);
+	f2fs_down_write(&sbi->gc_lock);
 	err = f2fs_write_checkpoint(sbi, &cpc);
-	up_write(&sbi->gc_lock);
+	f2fs_up_write(&sbi->gc_lock);

 	return err;
 }
···
 	if (!test_opt(sbi, MERGE_CHECKPOINT) || cpc.reason != CP_SYNC) {
 		int ret;

-		down_write(&sbi->gc_lock);
+		f2fs_down_write(&sbi->gc_lock);
 		ret = f2fs_write_checkpoint(sbi, &cpc);
-		up_write(&sbi->gc_lock);
+		f2fs_up_write(&sbi->gc_lock);

 		return ret;
 	}
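
The new check on cp_pack_total_block_count rejects values that are too small as well as too large. The second checkpoint header is read at `cp_addr + cp_blocks - 1`, so a corrupted count of 0 would make that offset wrap around. The standalone sketch below mirrors the condition from the hunk. `CP_PACKS` stands in for the kernel's `F2FS_CP_PACKS` and its value of 2 is an assumption here, as is the segment size used in the usage note. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror F2FS_CP_PACKS; the excerpt does not show its value. */
#define CP_PACKS 2u

/*
 * Mirror of the sanity check in the hunk: the count must not exceed one
 * segment and must be strictly greater than the number of CP packs.
 */
bool cp_block_count_is_valid(uint32_t cp_blocks, uint32_t blocks_per_seg)
{
	return !(cp_blocks > blocks_per_seg || cp_blocks <= CP_PACKS);
}
```

With 512 blocks per segment (an assumed value), counts from 3 through 512 pass, while 0, 1, 2 and 513 are rejected.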
+5 -6
fs/f2fs/compress.c
···
 	}

 	if (ret != PAGE_SIZE << dic->log_cluster_size) {
-		printk_ratelimited("%sF2FS-fs (%s): lz4 invalid rlen:%zu, "
+		printk_ratelimited("%sF2FS-fs (%s): lz4 invalid ret:%d, "
 					"expected:%lu\n", KERN_ERR,
-					F2FS_I_SB(dic->inode)->sb->s_id,
-					dic->rlen,
+					F2FS_I_SB(dic->inode)->sb->s_id, ret,
 					PAGE_SIZE << dic->log_cluster_size);
 		return -EIO;
 	}
···
 		 * checkpoint. This can only happen to quota writes which can cause
 		 * the below discard race condition.
 		 */
-		down_read(&sbi->node_write);
+		f2fs_down_read(&sbi->node_write);
 	} else if (!f2fs_trylock_op(sbi)) {
 		goto out_free;
 	}
···

 	f2fs_put_dnode(&dn);
 	if (IS_NOQUOTA(inode))
-		up_read(&sbi->node_write);
+		f2fs_up_read(&sbi->node_write);
 	else
 		f2fs_unlock_op(sbi);

···
 	f2fs_put_dnode(&dn);
 out_unlock_op:
 	if (IS_NOQUOTA(inode))
-		up_read(&sbi->node_write);
+		f2fs_up_read(&sbi->node_write);
 	else
 		f2fs_unlock_op(sbi);
 out_free:
+44 -32
fs/f2fs/data.c
···
 	enum page_type btype = PAGE_TYPE_OF_BIO(type);
 	struct f2fs_bio_info *io = sbi->write_io[btype] + temp;

-	down_write(&io->io_rwsem);
+	f2fs_down_write(&io->io_rwsem);

 	/* change META to META_FLUSH in the checkpoint procedure */
 	if (type >= META_FLUSH) {
···
 			io->bio->bi_opf |= REQ_PREFLUSH | REQ_FUA;
 	}
 	__submit_merged_bio(io);
-	up_write(&io->io_rwsem);
+	f2fs_up_write(&io->io_rwsem);
 }

 static void __submit_merged_write_cond(struct f2fs_sb_info *sbi,
···
 			enum page_type btype = PAGE_TYPE_OF_BIO(type);
 			struct f2fs_bio_info *io = sbi->write_io[btype] + temp;

-			down_read(&io->io_rwsem);
+			f2fs_down_read(&io->io_rwsem);
 			ret = __has_merged_page(io->bio, inode, page, ino);
-			up_read(&io->io_rwsem);
+			f2fs_up_read(&io->io_rwsem);
 		}
 		if (ret)
 			__f2fs_submit_merged_write(sbi, type, temp);
···
 	if (bio_add_page(bio, page, PAGE_SIZE, 0) != PAGE_SIZE)
 		f2fs_bug_on(sbi, 1);

-	down_write(&io->bio_list_lock);
+	f2fs_down_write(&io->bio_list_lock);
 	list_add_tail(&be->list, &io->bio_list);
-	up_write(&io->bio_list_lock);
+	f2fs_up_write(&io->bio_list_lock);
 }

 static void del_bio_entry(struct bio_entry *be)
···
 		struct list_head *head = &io->bio_list;
 		struct bio_entry *be;

-		down_write(&io->bio_list_lock);
+		f2fs_down_write(&io->bio_list_lock);
 		list_for_each_entry(be, head, list) {
 			if (be->bio != *bio)
 				continue;
···
 			__submit_bio(sbi, *bio, DATA);
 			break;
 		}
-		up_write(&io->bio_list_lock);
+		f2fs_up_write(&io->bio_list_lock);
 	}

 	if (ret) {
···
 		if (list_empty(head))
 			continue;

-		down_read(&io->bio_list_lock);
+		f2fs_down_read(&io->bio_list_lock);
 		list_for_each_entry(be, head, list) {
 			if (target)
 				found = (target == be->bio);
···
 			if (found)
 				break;
 		}
-		up_read(&io->bio_list_lock);
+		f2fs_up_read(&io->bio_list_lock);

 		if (!found)
 			continue;

 		found = false;

-		down_write(&io->bio_list_lock);
+		f2fs_down_write(&io->bio_list_lock);
 		list_for_each_entry(be, head, list) {
 			if (target)
 				found = (target == be->bio);
···
 				break;
 			}
 		}
-		up_write(&io->bio_list_lock);
+		f2fs_up_write(&io->bio_list_lock);
 	}

 	if (found)
···

 	f2fs_bug_on(sbi, is_read_io(fio->op));

-	down_write(&io->io_rwsem);
+	f2fs_down_write(&io->io_rwsem);
 next:
 	if (fio->in_list) {
 		spin_lock(&io->io_lock);
···
 	if (is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN) ||
 				!f2fs_is_checkpoint_ready(sbi))
 		__submit_merged_bio(io);
-	up_write(&io->io_rwsem);
+	f2fs_up_write(&io->io_rwsem);
 }

 static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
···
 {
 	if (flag == F2FS_GET_BLOCK_PRE_AIO) {
 		if (lock)
-			down_read(&sbi->node_change);
+			f2fs_down_read(&sbi->node_change);
 		else
-			up_read(&sbi->node_change);
+			f2fs_up_read(&sbi->node_change);
 	} else {
 		if (lock)
 			f2fs_lock_op(sbi);
···
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	unsigned int policy = SM_I(sbi)->ipu_policy;

+	if (policy & (0x1 << F2FS_IPU_HONOR_OPU_WRITE) &&
+			is_inode_flag_set(inode, FI_OPU_WRITE))
+		return false;
 	if (policy & (0x1 << F2FS_IPU_FORCE))
 		return true;
 	if (policy & (0x1 << F2FS_IPU_SSR) && f2fs_need_SSR(sbi))
···

 	/* swap file is migrating in aligned write mode */
 	if (is_inode_flag_set(inode, FI_ALIGNED_WRITE))
+		return true;
+
+	if (is_inode_flag_set(inode, FI_OPU_WRITE))
 		return true;

 	if (fio) {
···
 		 * the below discard race condition.
 		 */
 		if (IS_NOQUOTA(inode))
-			down_read(&sbi->node_write);
+			f2fs_down_read(&sbi->node_write);

 		fio.need_lock = LOCK_DONE;
 		err = f2fs_do_write_data_page(&fio);

 		if (IS_NOQUOTA(inode))
-			up_read(&sbi->node_write);
+			f2fs_up_read(&sbi->node_write);

 		goto done;
 	}
···
 			f2fs_available_free_memory(sbi, DIRTY_DENTS))
 		goto skip_write;

-	/* skip writing during file defragment */
-	if (is_inode_flag_set(inode, FI_DO_DEFRAG))
+	/* skip writing in file defragment preparing stage */
+	if (is_inode_flag_set(inode, FI_SKIP_WRITES))
 		goto skip_write;

 	trace_f2fs_writepages(mapping->host, wbc, DATA);
···
 	/* to avoid spliting IOs due to mixed WB_SYNC_ALL and WB_SYNC_NONE */
 	if (wbc->sync_mode == WB_SYNC_ALL)
 		atomic_inc(&sbi->wb_sync_req[DATA]);
-	else if (atomic_read(&sbi->wb_sync_req[DATA]))
+	else if (atomic_read(&sbi->wb_sync_req[DATA])) {
+		/* to avoid potential deadlock */
+		if (current->plug)
+			blk_finish_plug(current->plug);
 		goto skip_write;
+	}

 	if (__should_serialize_io(inode, wbc)) {
 		mutex_lock(&sbi->writepages);
···

 	/* In the fs-verity case, f2fs_end_enable_verity() does the truncate */
 	if (to > i_size && !f2fs_verity_in_progress(inode)) {
-		down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+		f2fs_down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
 		filemap_invalidate_lock(inode->i_mapping);

 		truncate_pagecache(inode, i_size);
 		f2fs_truncate_blocks(inode, i_size, true);

 		filemap_invalidate_unlock(inode->i_mapping);
-		up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+		f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
 	}
 }

···

 		*fsdata = NULL;

-		if (len == PAGE_SIZE)
+		if (len == PAGE_SIZE && !(f2fs_is_atomic_file(inode)))
 			goto repeat;

 		ret = f2fs_prepare_compress_overwrite(inode, pagep,
···
 	unsigned int end_sec = secidx + blkcnt / blk_per_sec;
 	int ret = 0;

-	down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+	f2fs_down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
 	filemap_invalidate_lock(inode->i_mapping);

 	set_inode_flag(inode, FI_ALIGNED_WRITE);
+	set_inode_flag(inode, FI_OPU_WRITE);

 	for (; secidx < end_sec; secidx++) {
-		down_write(&sbi->pin_sem);
+		f2fs_down_write(&sbi->pin_sem);

 		f2fs_lock_op(sbi);
 		f2fs_allocate_new_section(sbi, CURSEG_COLD_DATA_PINNED, false);
 		f2fs_unlock_op(sbi);

-		set_inode_flag(inode, FI_DO_DEFRAG);
+		set_inode_flag(inode, FI_SKIP_WRITES);

 		for (blkofs = 0; blkofs < blk_per_sec; blkofs++) {
 			struct page *page;
···

 			page = f2fs_get_lock_data_page(inode, blkidx, true);
 			if (IS_ERR(page)) {
-				up_write(&sbi->pin_sem);
+				f2fs_up_write(&sbi->pin_sem);
 				ret = PTR_ERR(page);
 				goto done;
 			}
···
 			f2fs_put_page(page, 1);
 		}

-		clear_inode_flag(inode, FI_DO_DEFRAG);
+		clear_inode_flag(inode, FI_SKIP_WRITES);

 		ret = filemap_fdatawrite(inode->i_mapping);

-		up_write(&sbi->pin_sem);
+		f2fs_up_write(&sbi->pin_sem);

 		if (ret)
 			break;
 	}

 done:
-	clear_inode_flag(inode, FI_DO_DEFRAG);
+	clear_inode_flag(inode, FI_SKIP_WRITES);
+	clear_inode_flag(inode, FI_OPU_WRITE);
 	clear_inode_flag(inode, FI_ALIGNED_WRITE);

 	filemap_invalidate_unlock(inode->i_mapping);
-	up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+	f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);

 	return ret;
 }
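
In the in-place-update policy check, the new F2FS_IPU_HONOR_OPU_WRITE test is placed before the F2FS_IPU_FORCE test, so an inode flagged for out-of-place writes wins even when in-place updates are otherwise forced. The reduced sketch below shows only that ordering. Bit 7 follows from the documented 0x80 mask. Bit 0 for the force flag is an assumption, since the excerpt does not show its value. The function name is hypothetical and the remaining policy bits are deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions; IPU_FORCE = 0 is assumed, 7 matches the 0x80 mask. */
enum {
	IPU_FORCE           = 0,
	IPU_HONOR_OPU_WRITE = 7,
};

/*
 * Reduced model of the policy check: returns true when an in-place
 * update is chosen. inode_opu_write models the FI_OPU_WRITE inode flag.
 */
bool inplace_update_chosen(unsigned int policy, bool inode_opu_write)
{
	/* The per-inode out-of-place request is honored first. */
	if ((policy & (1u << IPU_HONOR_OPU_WRITE)) && inode_opu_write)
		return false;
	if (policy & (1u << IPU_FORCE))
		return true;
	/* Other policy bits (SSR, UTIL, FSYNC, ...) are not modeled. */
	return false;
}
```

With policy 0x81, a flagged inode is written out of place while an unflagged one is still updated in place. With policy 0x01, the inode flag has no effect in this particular check.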
+17 -8
fs/f2fs/debug.c
···
 #include "gc.h"

 static LIST_HEAD(f2fs_stat_list);
-static DEFINE_MUTEX(f2fs_stat_mutex);
+static DEFINE_RAW_SPINLOCK(f2fs_stat_lock);
 #ifdef CONFIG_DEBUG_FS
 static struct dentry *f2fs_debugfs_root;
 #endif
···
 	[SBI_QUOTA_SKIP_FLUSH] = " quota_skip_flush",
 	[SBI_QUOTA_NEED_REPAIR] = " quota_need_repair",
 	[SBI_IS_RESIZEFS] = " resizefs",
+	[SBI_IS_FREEZING] = " freezefs",
 };

 static int stat_show(struct seq_file *s, void *v)
 {
 	struct f2fs_stat_info *si;
 	int i = 0, j = 0;
+	unsigned long flags;

-	mutex_lock(&f2fs_stat_mutex);
+	raw_spin_lock_irqsave(&f2fs_stat_lock, flags);
 	list_for_each_entry(si, &f2fs_stat_list, stat_list) {
 		update_general_status(si->sbi);

···
 				si->node_segs, si->bg_node_segs);
 		seq_printf(s, " - Reclaimed segs : Normal (%d), Idle CB (%d), "
 				"Idle Greedy (%d), Idle AT (%d), "
-				"Urgent High (%d), Urgent Low (%d)\n",
+				"Urgent High (%d), Urgent Mid (%d), "
+				"Urgent Low (%d)\n",
 				si->sbi->gc_reclaimed_segs[GC_NORMAL],
 				si->sbi->gc_reclaimed_segs[GC_IDLE_CB],
 				si->sbi->gc_reclaimed_segs[GC_IDLE_GREEDY],
 				si->sbi->gc_reclaimed_segs[GC_IDLE_AT],
 				si->sbi->gc_reclaimed_segs[GC_URGENT_HIGH],
+				si->sbi->gc_reclaimed_segs[GC_URGENT_MID],
 				si->sbi->gc_reclaimed_segs[GC_URGENT_LOW]);
 		seq_printf(s, "Try to move %d blocks (BG: %d)\n", si->tot_blks,
 				si->bg_data_blks + si->bg_node_blks);
···
 				si->ndirty_meta, si->meta_pages);
 		seq_printf(s, " - imeta: %4d\n",
 				si->ndirty_imeta);
+		seq_printf(s, " - fsync mark: %4lld\n",
+				percpu_counter_sum_positive(
+					&si->sbi->rf_node_block_count));
 		seq_printf(s, " - NATs: %9d/%9d\n - SITs: %9d/%9d\n",
 				si->dirty_nats, si->nats, si->dirty_sits, si->sits);
 		seq_printf(s, " - free_nids: %9d/%9d\n - alloc_nids: %9d\n",
···
 		seq_printf(s, " - paged : %llu KB\n",
 				si->page_mem >> 10);
 	}
-	mutex_unlock(&f2fs_stat_mutex);
+	raw_spin_unlock_irqrestore(&f2fs_stat_lock, flags);
 	return 0;
 }

···
 {
 	struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
 	struct f2fs_stat_info *si;
+	unsigned long flags;
 	int i;

 	si = f2fs_kzalloc(sbi, sizeof(struct f2fs_stat_info), GFP_KERNEL);
···
 	atomic_set(&sbi->max_aw_cnt, 0);
 	atomic_set(&sbi->max_vw_cnt, 0);

-	mutex_lock(&f2fs_stat_mutex);
+	raw_spin_lock_irqsave(&f2fs_stat_lock, flags);
 	list_add_tail(&si->stat_list, &f2fs_stat_list);
-	mutex_unlock(&f2fs_stat_mutex);
+	raw_spin_unlock_irqrestore(&f2fs_stat_lock, flags);

 	return 0;
 }
···
 void f2fs_destroy_stats(struct f2fs_sb_info *sbi)
 {
 	struct f2fs_stat_info *si = F2FS_STAT(sbi);
+	unsigned long flags;

-	mutex_lock(&f2fs_stat_mutex);
+	raw_spin_lock_irqsave(&f2fs_stat_lock, flags);
 	list_del(&si->stat_list);
-	mutex_unlock(&f2fs_stat_mutex);
+	raw_spin_unlock_irqrestore(&f2fs_stat_lock, flags);

 	kfree(si);
 }
+6 -6
fs/f2fs/dir.c
···
 	f2fs_wait_on_page_writeback(dentry_page, DATA, true, true);

 	if (inode) {
-		down_write(&F2FS_I(inode)->i_sem);
+		f2fs_down_write(&F2FS_I(inode)->i_sem);
 		page = f2fs_init_inode_metadata(inode, dir, fname, NULL);
 		if (IS_ERR(page)) {
 			err = PTR_ERR(page);
···
 	f2fs_update_parent_metadata(dir, inode, current_depth);
 fail:
 	if (inode)
-		up_write(&F2FS_I(inode)->i_sem);
+		f2fs_up_write(&F2FS_I(inode)->i_sem);

 	f2fs_put_page(dentry_page, 1);

···
 	struct page *page;
 	int err = 0;

-	down_write(&F2FS_I(inode)->i_sem);
+	f2fs_down_write(&F2FS_I(inode)->i_sem);
 	page = f2fs_init_inode_metadata(inode, dir, NULL, NULL);
 	if (IS_ERR(page)) {
 		err = PTR_ERR(page);
···
 	clear_inode_flag(inode, FI_NEW_INODE);
 	f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
 fail:
-	up_write(&F2FS_I(inode)->i_sem);
+	f2fs_up_write(&F2FS_I(inode)->i_sem);
 	return err;
 }

···
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(dir);

-	down_write(&F2FS_I(inode)->i_sem);
+	f2fs_down_write(&F2FS_I(inode)->i_sem);

 	if (S_ISDIR(inode->i_mode))
 		f2fs_i_links_write(dir, false);
···
 		f2fs_i_links_write(inode, false);
 		f2fs_i_size_write(inode, 0);
 	}
-	up_write(&F2FS_I(inode)->i_sem);
+	f2fs_up_write(&F2FS_I(inode)->i_sem);

 	if (inode->i_nlink == 0)
 		f2fs_add_orphan_inode(inode);
+131 -23
fs/f2fs/f2fs.h
···

 #define COMPRESS_EXT_NUM		16

+/*
+ * An implementation of an rwsem that is explicitly unfair to readers. This
+ * prevents priority inversion when a low-priority reader acquires the read lock
+ * while sleeping on the write lock but the write lock is needed by
+ * higher-priority clients.
+ */
+
+struct f2fs_rwsem {
+	struct rw_semaphore internal_rwsem;
+#ifdef CONFIG_F2FS_UNFAIR_RWSEM
+	wait_queue_head_t read_waiters;
+#endif
+};
+
 struct f2fs_mount_info {
 	unsigned int opt;
 	int write_io_size_bits;		/* Write IO size bits */
···
 	struct mutex cmd_lock;
 	unsigned int nr_discards;		/* # of discards in the list */
 	unsigned int max_discards;		/* max. discards to be issued */
+	unsigned int max_discard_request;	/* max. discard request per round */
+	unsigned int min_discard_issue_time;	/* min. interval between discard issue */
+	unsigned int mid_discard_issue_time;	/* mid. interval between discard issue */
+	unsigned int max_discard_issue_time;	/* max. interval between discard issue */
 	unsigned int discard_granularity;	/* discard granularity */
 	unsigned int undiscard_blks;		/* # of undiscard blocks */
 	unsigned int next_pos;			/* next discard position */
···
 /* maximum retry quota flush count */
 #define DEFAULT_RETRY_QUOTA_FLUSH_COUNT		8

+/* maximum retry of EIO'ed meta page */
+#define MAX_RETRY_META_PAGE_EIO			100
+
 #define F2FS_LINK_MAX	0xffffffff	/* maximum link count per file */

 #define MAX_DIR_RA_PAGES	4	/* maximum ra pages of dir */
···

 /* number of extent info in extent cache we try to shrink */
 #define EXTENT_CACHE_SHRINK_NUMBER	128
+
+#define RECOVERY_MAX_RA_BLOCKS		BIO_MAX_VECS
+#define RECOVERY_MIN_RA_BLOCKS		1

 struct rb_entry {
 	struct rb_node rb_node;		/* rb node located in rb-tree */
···
 	FI_DROP_CACHE,		/* drop dirty page cache */
 	FI_DATA_EXIST,		/* indicate data exists */
 	FI_INLINE_DOTS,		/* indicate inline dot dentries */
-	FI_DO_DEFRAG,		/* indicate defragment is running */
+	FI_SKIP_WRITES,		/* should skip data page writeback */
+	FI_OPU_WRITE,		/* used for opu per file */
 	FI_DIRTY_FILE,		/* indicate regular/symlink has dirty pages */
 	FI_PREALLOCATED_ALL,	/* all blocks for write were preallocated */
 	FI_HOT_DATA,		/* indicate file is hot */
···

 	/* Use below internally in f2fs*/
 	unsigned long flags[BITS_TO_LONGS(FI_MAX)];	/* use to pass per-file flags */
-	struct rw_semaphore i_sem;	/* protect fi info */
+	struct f2fs_rwsem i_sem;	/* protect fi info */
 	atomic_t dirty_pages;		/* # of dirty pages */
 	f2fs_hash_t chash;		/* hash value of given file name */
 	unsigned int clevel;		/* maximum level of given file name */
···
 	struct extent_tree *extent_tree;	/* cached extent_tree entry */

 	/* avoid racing between foreground op and gc */
-	struct rw_semaphore i_gc_rwsem[2];
-	struct rw_semaphore i_xattr_sem; /* avoid racing between reading and changing EAs */
+	struct f2fs_rwsem i_gc_rwsem[2];
+	struct f2fs_rwsem i_xattr_sem; /* avoid racing between reading and changing EAs */

 	int i_extra_isize;		/* size of extra space located in i_addr */
 	kprojid_t i_projid;		/* id for project quota */
···
 	nid_t max_nid;			/* maximum possible node ids */
 	nid_t available_nids;		/* # of available node ids */
 	nid_t next_scan_nid;		/* the next nid to be scanned */
+	nid_t max_rf_node_blocks;	/* max # of nodes for recovery */
 	unsigned int ram_thresh;	/* control the memory footprint */
 	unsigned int ra_nid_pages;	/* # of nid pages to be readaheaded */
 	unsigned int dirty_nats_ratio;	/* control dirty nats ratio threshold */
···
 	/* NAT cache management */
 	struct radix_tree_root nat_root;/* root of the nat entry cache */
 	struct radix_tree_root nat_set_root;/* root of the nat set cache */
-	struct rw_semaphore nat_tree_lock;	/* protect nat entry tree */
+	struct f2fs_rwsem nat_tree_lock;	/* protect nat entry tree */
 	struct list_head nat_entries;	/* cached nat entry list (clean) */
 	spinlock_t nat_list_lock;	/* protect clean nat entry list */
 	unsigned int nat_cnt[MAX_NAT_STATE]; /* the # of cached nat entries */
···
 	struct dirty_seglist_info *dirty_info;	/* dirty segment information */
 	struct curseg_info *curseg_array;	/* active segment information */

-	struct rw_semaphore curseg_lock;	/* for preventing curseg change */
+	struct f2fs_rwsem curseg_lock;	/* for preventing curseg change */

 	block_t seg0_blkaddr;		/* block address of 0'th segment */
 	block_t main_blkaddr;		/* start block address of main area */
···
 	struct bio *bio;		/* bios to merge */
 	sector_t last_block_in_bio;	/* last block number */
 	struct f2fs_io_info fio;	/* store buffered io info. */
-	struct rw_semaphore io_rwsem;	/* blocking op for bio */
+	struct f2fs_rwsem io_rwsem;	/* blocking op for bio */
 	spinlock_t io_lock;		/* serialize DATA/NODE IOs */
 	struct list_head io_list;	/* track fios */
 	struct list_head bio_list;	/* bio entry list head */
-	struct rw_semaphore bio_list_lock;	/* lock to protect bio entry list */
+	struct f2fs_rwsem bio_list_lock;	/* lock to protect bio entry list */
 };

 #define FDEV(i)				(sbi->devs[i])
···
 	SBI_QUOTA_SKIP_FLUSH,			/* skip flushing quota in current CP */
 	SBI_QUOTA_NEED_REPAIR,			/* quota file may be corrupted */
 	SBI_IS_RESIZEFS,			/* resizefs is in process */
+	SBI_IS_FREEZING,			/* freezefs is in process */
 };

 enum {
···
 	GC_IDLE_AT,
 	GC_URGENT_HIGH,
 	GC_URGENT_LOW,
+	GC_URGENT_MID,
 	MAX_GC_MODE,
 };

···
 	struct super_block *sb;			/* pointer to VFS super block */
 	struct proc_dir_entry *s_proc;		/* proc entry */
 	struct f2fs_super_block *raw_super;	/* raw super block pointer */
-	struct rw_semaphore sb_lock;		/* lock for raw super block */
+	struct f2fs_rwsem sb_lock;		/* lock for raw super block */
 	int valid_super_block;			/* valid super block no */
 	unsigned long s_flag;				/* flags for sbi */
 	struct mutex writepages;		/* mutex for writepages() */
···
 	/* for bio operations */
 	struct f2fs_bio_info *write_io[NR_PAGE_TYPE];	/* for write bios */
 	/* keep migration IO order for LFS mode */
-	struct rw_semaphore io_order_lock;
+	struct f2fs_rwsem io_order_lock;
 	mempool_t *write_io_dummy;		/* Dummy pages */
+	pgoff_t metapage_eio_ofs;		/* EIO page offset */
+	int metapage_eio_cnt;			/* EIO count */

 	/* for checkpoint */
 	struct
f2fs_checkpoint *ckpt; /* raw checkpoint pointer */ 1627 1601 int cur_cp_pack; /* remain current cp pack */ 1628 1602 spinlock_t cp_lock; /* for flag in ckpt */ 1629 1603 struct inode *meta_inode; /* cache meta blocks */ 1630 - struct rw_semaphore cp_global_sem; /* checkpoint procedure lock */ 1631 - struct rw_semaphore cp_rwsem; /* blocking FS operations */ 1632 - struct rw_semaphore node_write; /* locking node writes */ 1633 - struct rw_semaphore node_change; /* locking node change */ 1604 + struct f2fs_rwsem cp_global_sem; /* checkpoint procedure lock */ 1605 + struct f2fs_rwsem cp_rwsem; /* blocking FS operations */ 1606 + struct f2fs_rwsem node_write; /* locking node writes */ 1607 + struct f2fs_rwsem node_change; /* locking node change */ 1634 1608 wait_queue_head_t cp_wait; 1635 1609 unsigned long last_time[MAX_TIME]; /* to store time in jiffies */ 1636 1610 long interval_time[MAX_TIME]; /* to store thresholds */ ··· 1692 1662 block_t unusable_block_count; /* # of blocks saved by last cp */ 1693 1663 1694 1664 unsigned int nquota_files; /* # of quota sysfile */ 1695 - struct rw_semaphore quota_sem; /* blocking cp for flags */ 1665 + struct f2fs_rwsem quota_sem; /* blocking cp for flags */ 1696 1666 1697 1667 /* # of pages, see count_type */ 1698 1668 atomic_t nr_pages[NR_COUNT_TYPE]; 1699 1669 /* # of allocated blocks */ 1700 1670 struct percpu_counter alloc_valid_block_count; 1671 + /* # of node block writes as roll forward recovery */ 1672 + struct percpu_counter rf_node_block_count; 1701 1673 1702 1674 /* writeback control */ 1703 1675 atomic_t wb_sync_req[META]; /* count # of WB_SYNC threads */ ··· 1710 1678 struct f2fs_mount_info mount_opt; /* mount options */ 1711 1679 1712 1680 /* for cleaning operations */ 1713 - struct rw_semaphore gc_lock; /* 1681 + struct f2fs_rwsem gc_lock; /* 1714 1682 * semaphore for GC, avoid 1715 1683 * race between GC and GC or CP 1716 1684 */ ··· 1730 1698 1731 1699 /* threshold for gc trials on pinned files */ 1732 1700 
u64 gc_pin_file_threshold; 1733 - struct rw_semaphore pin_sem; 1701 + struct f2fs_rwsem pin_sem; 1734 1702 1735 1703 /* maximum # of trials to find a victim segment for SSR and GC */ 1736 1704 unsigned int max_victim_search; ··· 2124 2092 spin_unlock_irqrestore(&sbi->cp_lock, flags); 2125 2093 } 2126 2094 2095 + #define init_f2fs_rwsem(sem) \ 2096 + do { \ 2097 + static struct lock_class_key __key; \ 2098 + \ 2099 + __init_f2fs_rwsem((sem), #sem, &__key); \ 2100 + } while (0) 2101 + 2102 + static inline void __init_f2fs_rwsem(struct f2fs_rwsem *sem, 2103 + const char *sem_name, struct lock_class_key *key) 2104 + { 2105 + __init_rwsem(&sem->internal_rwsem, sem_name, key); 2106 + #ifdef CONFIG_F2FS_UNFAIR_RWSEM 2107 + init_waitqueue_head(&sem->read_waiters); 2108 + #endif 2109 + } 2110 + 2111 + static inline int f2fs_rwsem_is_locked(struct f2fs_rwsem *sem) 2112 + { 2113 + return rwsem_is_locked(&sem->internal_rwsem); 2114 + } 2115 + 2116 + static inline int f2fs_rwsem_is_contended(struct f2fs_rwsem *sem) 2117 + { 2118 + return rwsem_is_contended(&sem->internal_rwsem); 2119 + } 2120 + 2121 + static inline void f2fs_down_read(struct f2fs_rwsem *sem) 2122 + { 2123 + #ifdef CONFIG_F2FS_UNFAIR_RWSEM 2124 + wait_event(sem->read_waiters, down_read_trylock(&sem->internal_rwsem)); 2125 + #else 2126 + down_read(&sem->internal_rwsem); 2127 + #endif 2128 + } 2129 + 2130 + static inline int f2fs_down_read_trylock(struct f2fs_rwsem *sem) 2131 + { 2132 + return down_read_trylock(&sem->internal_rwsem); 2133 + } 2134 + 2135 + #ifdef CONFIG_DEBUG_LOCK_ALLOC 2136 + static inline void f2fs_down_read_nested(struct f2fs_rwsem *sem, int subclass) 2137 + { 2138 + down_read_nested(&sem->internal_rwsem, subclass); 2139 + } 2140 + #else 2141 + #define f2fs_down_read_nested(sem, subclass) f2fs_down_read(sem) 2142 + #endif 2143 + 2144 + static inline void f2fs_up_read(struct f2fs_rwsem *sem) 2145 + { 2146 + up_read(&sem->internal_rwsem); 2147 + } 2148 + 2149 + static inline void 
f2fs_down_write(struct f2fs_rwsem *sem) 2150 + { 2151 + down_write(&sem->internal_rwsem); 2152 + } 2153 + 2154 + static inline int f2fs_down_write_trylock(struct f2fs_rwsem *sem) 2155 + { 2156 + return down_write_trylock(&sem->internal_rwsem); 2157 + } 2158 + 2159 + static inline void f2fs_up_write(struct f2fs_rwsem *sem) 2160 + { 2161 + up_write(&sem->internal_rwsem); 2162 + #ifdef CONFIG_F2FS_UNFAIR_RWSEM 2163 + wake_up_all(&sem->read_waiters); 2164 + #endif 2165 + } 2166 + 2127 2167 static inline void f2fs_lock_op(struct f2fs_sb_info *sbi) 2128 2168 { 2129 - down_read(&sbi->cp_rwsem); 2169 + f2fs_down_read(&sbi->cp_rwsem); 2130 2170 } 2131 2171 2132 2172 static inline int f2fs_trylock_op(struct f2fs_sb_info *sbi) ··· 2207 2103 f2fs_show_injection_info(sbi, FAULT_LOCK_OP); 2208 2104 return 0; 2209 2105 } 2210 - return down_read_trylock(&sbi->cp_rwsem); 2106 + return f2fs_down_read_trylock(&sbi->cp_rwsem); 2211 2107 } 2212 2108 2213 2109 static inline void f2fs_unlock_op(struct f2fs_sb_info *sbi) 2214 2110 { 2215 - up_read(&sbi->cp_rwsem); 2111 + f2fs_up_read(&sbi->cp_rwsem); 2216 2112 } 2217 2113 2218 2114 static inline void f2fs_lock_all(struct f2fs_sb_info *sbi) 2219 2115 { 2220 - down_write(&sbi->cp_rwsem); 2116 + f2fs_down_write(&sbi->cp_rwsem); 2221 2117 } 2222 2118 2223 2119 static inline void f2fs_unlock_all(struct f2fs_sb_info *sbi) 2224 2120 { 2225 - up_write(&sbi->cp_rwsem); 2121 + f2fs_up_write(&sbi->cp_rwsem); 2226 2122 } 2227 2123 2228 2124 static inline int __get_cp_reason(struct f2fs_sb_info *sbi) ··· 2784 2680 2785 2681 if (is_inflight_io(sbi, type)) 2786 2682 return false; 2683 + 2684 + if (sbi->gc_mode == GC_URGENT_MID) 2685 + return true; 2787 2686 2788 2687 if (sbi->gc_mode == GC_URGENT_LOW && 2789 2688 (type == DISCARD_TIME || type == GC_TIME)) ··· 3686 3579 block_t blkaddr, int type); 3687 3580 int f2fs_ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, 3688 3581 int type, bool sync); 3689 - void 
f2fs_ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index); 3582 + void f2fs_ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index, 3583 + unsigned int ra_blocks); 3690 3584 long f2fs_sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type, 3691 3585 long nr_to_write, enum iostat_type io_type); 3692 3586 void f2fs_add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type);
+89 -86
fs/f2fs/file.c
··· 237 237 struct f2fs_inode_info *fi = F2FS_I(inode); 238 238 nid_t pino; 239 239 240 - down_write(&fi->i_sem); 240 + f2fs_down_write(&fi->i_sem); 241 241 if (file_wrong_pino(inode) && inode->i_nlink == 1 && 242 242 get_parent_ino(inode, &pino)) { 243 243 f2fs_i_pino_write(inode, pino); 244 244 file_got_pino(inode); 245 245 } 246 - up_write(&fi->i_sem); 246 + f2fs_up_write(&fi->i_sem); 247 247 } 248 248 249 249 static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end, ··· 318 318 * Both of fdatasync() and fsync() are able to be recovered from 319 319 * sudden-power-off. 320 320 */ 321 - down_read(&F2FS_I(inode)->i_sem); 321 + f2fs_down_read(&F2FS_I(inode)->i_sem); 322 322 cp_reason = need_do_checkpoint(inode); 323 - up_read(&F2FS_I(inode)->i_sem); 323 + f2fs_up_read(&F2FS_I(inode)->i_sem); 324 324 325 325 if (cp_reason) { 326 326 /* all the dirty node pages should be flushed for POR */ ··· 812 812 { 813 813 struct inode *inode = d_inode(path->dentry); 814 814 struct f2fs_inode_info *fi = F2FS_I(inode); 815 - struct f2fs_inode *ri; 815 + struct f2fs_inode *ri = NULL; 816 816 unsigned int flags; 817 817 818 818 if (f2fs_has_extra_attr(inode) && ··· 844 844 STATX_ATTR_NODUMP | 845 845 STATX_ATTR_VERITY); 846 846 847 - generic_fillattr(&init_user_ns, inode, stat); 847 + generic_fillattr(mnt_userns, inode, stat); 848 848 849 849 /* we need to show initial sectors used for inline_data/dentries */ 850 850 if ((S_ISREG(inode->i_mode) && f2fs_has_inline_data(inode)) || ··· 904 904 !f2fs_is_compress_backend_ready(inode)) 905 905 return -EOPNOTSUPP; 906 906 907 - err = setattr_prepare(&init_user_ns, dentry, attr); 907 + err = setattr_prepare(mnt_userns, dentry, attr); 908 908 if (err) 909 909 return err; 910 910 ··· 958 958 return err; 959 959 } 960 960 961 - down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 961 + f2fs_down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 962 962 filemap_invalidate_lock(inode->i_mapping); 963 963 964 964 truncate_setsize(inode, 
attr->ia_size); ··· 970 970 * larger than i_size. 971 971 */ 972 972 filemap_invalidate_unlock(inode->i_mapping); 973 - up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 973 + f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 974 974 if (err) 975 975 return err; 976 976 ··· 980 980 spin_unlock(&F2FS_I(inode)->i_size_lock); 981 981 } 982 982 983 - __setattr_copy(&init_user_ns, inode, attr); 983 + __setattr_copy(mnt_userns, inode, attr); 984 984 985 985 if (attr->ia_valid & ATTR_MODE) { 986 - err = posix_acl_chmod(&init_user_ns, inode, f2fs_get_inode_mode(inode)); 986 + err = posix_acl_chmod(mnt_userns, inode, f2fs_get_inode_mode(inode)); 987 987 988 988 if (is_inode_flag_set(inode, FI_ACL_MODE)) { 989 989 if (!err) ··· 1112 1112 blk_start = (loff_t)pg_start << PAGE_SHIFT; 1113 1113 blk_end = (loff_t)pg_end << PAGE_SHIFT; 1114 1114 1115 - down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1115 + f2fs_down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1116 1116 filemap_invalidate_lock(inode->i_mapping); 1117 1117 1118 1118 truncate_pagecache_range(inode, blk_start, blk_end - 1); ··· 1122 1122 f2fs_unlock_op(sbi); 1123 1123 1124 1124 filemap_invalidate_unlock(inode->i_mapping); 1125 - up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1125 + f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1126 1126 } 1127 1127 } 1128 1128 ··· 1355 1355 f2fs_balance_fs(sbi, true); 1356 1356 1357 1357 /* avoid gc operation during block exchange */ 1358 - down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1358 + f2fs_down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1359 1359 filemap_invalidate_lock(inode->i_mapping); 1360 1360 1361 1361 f2fs_lock_op(sbi); ··· 1365 1365 f2fs_unlock_op(sbi); 1366 1366 1367 1367 filemap_invalidate_unlock(inode->i_mapping); 1368 - up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1368 + f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1369 1369 return ret; 1370 1370 } 1371 1371 ··· 1500 1500 unsigned int end_offset; 1501 1501 pgoff_t end; 1502 1502 1503 - 
down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1503 + f2fs_down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1504 1504 filemap_invalidate_lock(mapping); 1505 1505 1506 1506 truncate_pagecache_range(inode, ··· 1514 1514 if (ret) { 1515 1515 f2fs_unlock_op(sbi); 1516 1516 filemap_invalidate_unlock(mapping); 1517 - up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1517 + f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1518 1518 goto out; 1519 1519 } 1520 1520 ··· 1526 1526 1527 1527 f2fs_unlock_op(sbi); 1528 1528 filemap_invalidate_unlock(mapping); 1529 - up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1529 + f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1530 1530 1531 1531 f2fs_balance_fs(sbi, dn.node_changed); 1532 1532 ··· 1600 1600 idx = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); 1601 1601 1602 1602 /* avoid gc operation during block exchange */ 1603 - down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1603 + f2fs_down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1604 1604 filemap_invalidate_lock(mapping); 1605 1605 truncate_pagecache(inode, offset); 1606 1606 ··· 1618 1618 f2fs_unlock_op(sbi); 1619 1619 } 1620 1620 filemap_invalidate_unlock(mapping); 1621 - up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1621 + f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1622 1622 1623 1623 /* write out all moved pages, if possible */ 1624 1624 filemap_invalidate_lock(mapping); ··· 1674 1674 next_alloc: 1675 1675 if (has_not_enough_free_secs(sbi, 0, 1676 1676 GET_SEC_FROM_SEG(sbi, overprovision_segments(sbi)))) { 1677 - down_write(&sbi->gc_lock); 1677 + f2fs_down_write(&sbi->gc_lock); 1678 1678 err = f2fs_gc(sbi, true, false, false, NULL_SEGNO); 1679 1679 if (err && err != -ENODATA && err != -EAGAIN) 1680 1680 goto out_err; 1681 1681 } 1682 1682 1683 - down_write(&sbi->pin_sem); 1683 + f2fs_down_write(&sbi->pin_sem); 1684 1684 1685 1685 f2fs_lock_op(sbi); 1686 1686 f2fs_allocate_new_section(sbi, CURSEG_COLD_DATA_PINNED, false); ··· 1690 1690 err = f2fs_map_blocks(inode, &map, 1, 
F2FS_GET_BLOCK_PRE_DIO); 1691 1691 file_dont_truncate(inode); 1692 1692 1693 - up_write(&sbi->pin_sem); 1693 + f2fs_up_write(&sbi->pin_sem); 1694 1694 1695 1695 expanded += map.m_len; 1696 1696 sec_len -= map.m_len; ··· 1989 1989 static int f2fs_ioc_start_atomic_write(struct file *filp) 1990 1990 { 1991 1991 struct inode *inode = file_inode(filp); 1992 + struct user_namespace *mnt_userns = file_mnt_user_ns(filp); 1992 1993 struct f2fs_inode_info *fi = F2FS_I(inode); 1993 1994 struct f2fs_sb_info *sbi = F2FS_I_SB(inode); 1994 1995 int ret; 1995 1996 1996 - if (!inode_owner_or_capable(&init_user_ns, inode)) 1997 + if (!inode_owner_or_capable(mnt_userns, inode)) 1997 1998 return -EACCES; 1998 1999 1999 2000 if (!S_ISREG(inode->i_mode)) ··· 2009 2008 2010 2009 inode_lock(inode); 2011 2010 2012 - f2fs_disable_compressed_file(inode); 2011 + if (!f2fs_disable_compressed_file(inode)) { 2012 + ret = -EINVAL; 2013 + goto out; 2014 + } 2013 2015 2014 2016 if (f2fs_is_atomic_file(inode)) { 2015 2017 if (is_inode_flag_set(inode, FI_ATOMIC_REVOKE_REQUEST)) ··· 2024 2020 if (ret) 2025 2021 goto out; 2026 2022 2027 - down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 2023 + f2fs_down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 2028 2024 2029 2025 /* 2030 2026 * Should wait end_io to count F2FS_WB_CP_DATA correctly by ··· 2035 2031 inode->i_ino, get_dirty_pages(inode)); 2036 2032 ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX); 2037 2033 if (ret) { 2038 - up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 2034 + f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 2039 2035 goto out; 2040 2036 } 2041 2037 ··· 2048 2044 /* add inode in inmem_list first and set atomic_file */ 2049 2045 set_inode_flag(inode, FI_ATOMIC_FILE); 2050 2046 clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST); 2051 - up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 2047 + f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 2052 2048 2053 2049 f2fs_update_time(F2FS_I_SB(inode), REQ_TIME); 2054 2050 
F2FS_I(inode)->inmem_task = current; ··· 2062 2058 static int f2fs_ioc_commit_atomic_write(struct file *filp) 2063 2059 { 2064 2060 struct inode *inode = file_inode(filp); 2061 + struct user_namespace *mnt_userns = file_mnt_user_ns(filp); 2065 2062 int ret; 2066 2063 2067 - if (!inode_owner_or_capable(&init_user_ns, inode)) 2064 + if (!inode_owner_or_capable(mnt_userns, inode)) 2068 2065 return -EACCES; 2069 2066 2070 2067 ret = mnt_want_write_file(filp); ··· 2105 2100 static int f2fs_ioc_start_volatile_write(struct file *filp) 2106 2101 { 2107 2102 struct inode *inode = file_inode(filp); 2103 + struct user_namespace *mnt_userns = file_mnt_user_ns(filp); 2108 2104 int ret; 2109 2105 2110 - if (!inode_owner_or_capable(&init_user_ns, inode)) 2106 + if (!inode_owner_or_capable(mnt_userns, inode)) 2111 2107 return -EACCES; 2112 2108 2113 2109 if (!S_ISREG(inode->i_mode)) ··· 2141 2135 static int f2fs_ioc_release_volatile_write(struct file *filp) 2142 2136 { 2143 2137 struct inode *inode = file_inode(filp); 2138 + struct user_namespace *mnt_userns = file_mnt_user_ns(filp); 2144 2139 int ret; 2145 2140 2146 - if (!inode_owner_or_capable(&init_user_ns, inode)) 2141 + if (!inode_owner_or_capable(mnt_userns, inode)) 2147 2142 return -EACCES; 2148 2143 2149 2144 ret = mnt_want_write_file(filp); ··· 2171 2164 static int f2fs_ioc_abort_volatile_write(struct file *filp) 2172 2165 { 2173 2166 struct inode *inode = file_inode(filp); 2167 + struct user_namespace *mnt_userns = file_mnt_user_ns(filp); 2174 2168 int ret; 2175 2169 2176 - if (!inode_owner_or_capable(&init_user_ns, inode)) 2170 + if (!inode_owner_or_capable(mnt_userns, inode)) 2177 2171 return -EACCES; 2178 2172 2179 2173 ret = mnt_want_write_file(filp); ··· 2359 2351 if (err) 2360 2352 return err; 2361 2353 2362 - down_write(&sbi->sb_lock); 2354 + f2fs_down_write(&sbi->sb_lock); 2363 2355 2364 2356 if (uuid_is_nonzero(sbi->raw_super->encrypt_pw_salt)) 2365 2357 goto got_it; ··· 2378 2370 16)) 2379 2371 err = -EFAULT; 
2380 2372 out_err: 2381 - up_write(&sbi->sb_lock); 2373 + f2fs_up_write(&sbi->sb_lock); 2382 2374 mnt_drop_write_file(filp); 2383 2375 return err; 2384 2376 } ··· 2455 2447 return ret; 2456 2448 2457 2449 if (!sync) { 2458 - if (!down_write_trylock(&sbi->gc_lock)) { 2450 + if (!f2fs_down_write_trylock(&sbi->gc_lock)) { 2459 2451 ret = -EBUSY; 2460 2452 goto out; 2461 2453 } 2462 2454 } else { 2463 - down_write(&sbi->gc_lock); 2455 + f2fs_down_write(&sbi->gc_lock); 2464 2456 } 2465 2457 2466 2458 ret = f2fs_gc(sbi, sync, true, false, NULL_SEGNO); ··· 2491 2483 2492 2484 do_more: 2493 2485 if (!range->sync) { 2494 - if (!down_write_trylock(&sbi->gc_lock)) { 2486 + if (!f2fs_down_write_trylock(&sbi->gc_lock)) { 2495 2487 ret = -EBUSY; 2496 2488 goto out; 2497 2489 } 2498 2490 } else { 2499 - down_write(&sbi->gc_lock); 2491 + f2fs_down_write(&sbi->gc_lock); 2500 2492 } 2501 2493 2502 2494 ret = f2fs_gc(sbi, range->sync, true, false, ··· 2567 2559 bool fragmented = false; 2568 2560 int err; 2569 2561 2570 - /* if in-place-update policy is enabled, don't waste time here */ 2571 - if (f2fs_should_update_inplace(inode, NULL)) 2572 - return -EINVAL; 2573 - 2574 2562 pg_start = range->start >> PAGE_SHIFT; 2575 2563 pg_end = (range->start + range->len) >> PAGE_SHIFT; 2576 2564 2577 2565 f2fs_balance_fs(sbi, true); 2578 2566 2579 2567 inode_lock(inode); 2568 + 2569 + /* if in-place-update policy is enabled, don't waste time here */ 2570 + set_inode_flag(inode, FI_OPU_WRITE); 2571 + if (f2fs_should_update_inplace(inode, NULL)) { 2572 + err = -EINVAL; 2573 + goto out; 2574 + } 2580 2575 2581 2576 /* writeback all dirty pages in the range */ 2582 2577 err = filemap_write_and_wait_range(inode->i_mapping, range->start, ··· 2662 2651 goto check; 2663 2652 } 2664 2653 2665 - set_inode_flag(inode, FI_DO_DEFRAG); 2654 + set_inode_flag(inode, FI_SKIP_WRITES); 2666 2655 2667 2656 idx = map.m_lblk; 2668 2657 while (idx < map.m_lblk + map.m_len && cnt < blk_per_seg) { ··· 2687 2676 if 
(map.m_lblk < pg_end && cnt < blk_per_seg) 2688 2677 goto do_map; 2689 2678 2690 - clear_inode_flag(inode, FI_DO_DEFRAG); 2679 + clear_inode_flag(inode, FI_SKIP_WRITES); 2691 2680 2692 2681 err = filemap_fdatawrite(inode->i_mapping); 2693 2682 if (err) 2694 2683 goto out; 2695 2684 } 2696 2685 clear_out: 2697 - clear_inode_flag(inode, FI_DO_DEFRAG); 2686 + clear_inode_flag(inode, FI_SKIP_WRITES); 2698 2687 out: 2688 + clear_inode_flag(inode, FI_OPU_WRITE); 2699 2689 inode_unlock(inode); 2700 2690 if (!err) 2701 2691 range->len = (u64)total << PAGE_SHIFT; ··· 2832 2820 2833 2821 f2fs_balance_fs(sbi, true); 2834 2822 2835 - down_write(&F2FS_I(src)->i_gc_rwsem[WRITE]); 2823 + f2fs_down_write(&F2FS_I(src)->i_gc_rwsem[WRITE]); 2836 2824 if (src != dst) { 2837 2825 ret = -EBUSY; 2838 - if (!down_write_trylock(&F2FS_I(dst)->i_gc_rwsem[WRITE])) 2826 + if (!f2fs_down_write_trylock(&F2FS_I(dst)->i_gc_rwsem[WRITE])) 2839 2827 goto out_src; 2840 2828 } 2841 2829 ··· 2853 2841 f2fs_unlock_op(sbi); 2854 2842 2855 2843 if (src != dst) 2856 - up_write(&F2FS_I(dst)->i_gc_rwsem[WRITE]); 2844 + f2fs_up_write(&F2FS_I(dst)->i_gc_rwsem[WRITE]); 2857 2845 out_src: 2858 - up_write(&F2FS_I(src)->i_gc_rwsem[WRITE]); 2846 + f2fs_up_write(&F2FS_I(src)->i_gc_rwsem[WRITE]); 2859 2847 out_unlock: 2860 2848 if (src != dst) 2861 2849 inode_unlock(dst); ··· 2950 2938 end_segno = min(start_segno + range.segments, dev_end_segno); 2951 2939 2952 2940 while (start_segno < end_segno) { 2953 - if (!down_write_trylock(&sbi->gc_lock)) { 2941 + if (!f2fs_down_write_trylock(&sbi->gc_lock)) { 2954 2942 ret = -EBUSY; 2955 2943 goto out; 2956 2944 } ··· 3002 2990 { 3003 2991 struct f2fs_inode_info *fi = F2FS_I(inode); 3004 2992 struct f2fs_sb_info *sbi = F2FS_I_SB(inode); 3005 - struct page *ipage; 2993 + struct f2fs_inode *ri = NULL; 3006 2994 kprojid_t kprojid; 3007 2995 int err; 3008 2996 ··· 3026 3014 if (IS_NOQUOTA(inode)) 3027 3015 return err; 3028 3016 3029 - ipage = f2fs_get_node_page(sbi, 
inode->i_ino); 3030 - if (IS_ERR(ipage)) 3031 - return PTR_ERR(ipage); 3032 - 3033 - if (!F2FS_FITS_IN_INODE(F2FS_INODE(ipage), fi->i_extra_isize, 3034 - i_projid)) { 3035 - err = -EOVERFLOW; 3036 - f2fs_put_page(ipage, 1); 3037 - return err; 3038 - } 3039 - f2fs_put_page(ipage, 1); 3017 + if (!F2FS_FITS_IN_INODE(ri, fi->i_extra_isize, i_projid)) 3018 + return -EOVERFLOW; 3040 3019 3041 3020 err = f2fs_dquot_initialize(inode); 3042 3021 if (err) ··· 3218 3215 while (map.m_lblk < end) { 3219 3216 map.m_len = end - map.m_lblk; 3220 3217 3221 - down_write(&fi->i_gc_rwsem[WRITE]); 3218 + f2fs_down_write(&fi->i_gc_rwsem[WRITE]); 3222 3219 err = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_PRECACHE); 3223 - up_write(&fi->i_gc_rwsem[WRITE]); 3220 + f2fs_up_write(&fi->i_gc_rwsem[WRITE]); 3224 3221 if (err) 3225 3222 return err; 3226 3223 ··· 3297 3294 if (!vbuf) 3298 3295 return -ENOMEM; 3299 3296 3300 - down_read(&sbi->sb_lock); 3297 + f2fs_down_read(&sbi->sb_lock); 3301 3298 count = utf16s_to_utf8s(sbi->raw_super->volume_name, 3302 3299 ARRAY_SIZE(sbi->raw_super->volume_name), 3303 3300 UTF16_LITTLE_ENDIAN, vbuf, MAX_VOLUME_NAME); 3304 - up_read(&sbi->sb_lock); 3301 + f2fs_up_read(&sbi->sb_lock); 3305 3302 3306 3303 if (copy_to_user((char __user *)arg, vbuf, 3307 3304 min(FSLABEL_MAX, count))) ··· 3329 3326 if (err) 3330 3327 goto out; 3331 3328 3332 - down_write(&sbi->sb_lock); 3329 + f2fs_down_write(&sbi->sb_lock); 3333 3330 3334 3331 memset(sbi->raw_super->volume_name, 0, 3335 3332 sizeof(sbi->raw_super->volume_name)); ··· 3339 3336 3340 3337 err = f2fs_commit_super(sbi, false); 3341 3338 3342 - up_write(&sbi->sb_lock); 3339 + f2fs_up_write(&sbi->sb_lock); 3343 3340 3344 3341 mnt_drop_write_file(filp); 3345 3342 out: ··· 3465 3462 if (!atomic_read(&F2FS_I(inode)->i_compr_blocks)) 3466 3463 goto out; 3467 3464 3468 - down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 3465 + f2fs_down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 3469 3466 
filemap_invalidate_lock(inode->i_mapping); 3470 3467 3471 3468 last_idx = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); ··· 3502 3499 } 3503 3500 3504 3501 filemap_invalidate_unlock(inode->i_mapping); 3505 - up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 3502 + f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 3506 3503 out: 3507 3504 inode_unlock(inode); 3508 3505 ··· 3618 3615 goto unlock_inode; 3619 3616 } 3620 3617 3621 - down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 3618 + f2fs_down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 3622 3619 filemap_invalidate_lock(inode->i_mapping); 3623 3620 3624 3621 last_idx = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); ··· 3655 3652 } 3656 3653 3657 3654 filemap_invalidate_unlock(inode->i_mapping); 3658 - up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 3655 + f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 3659 3656 3660 3657 if (ret >= 0) { 3661 3658 clear_inode_flag(inode, FI_COMPRESS_RELEASED); ··· 3773 3770 if (ret) 3774 3771 goto err; 3775 3772 3776 - down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 3773 + f2fs_down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 3777 3774 filemap_invalidate_lock(mapping); 3778 3775 3779 3776 ret = filemap_write_and_wait_range(mapping, range.start, ··· 3862 3859 prev_block, len, range.flags); 3863 3860 out: 3864 3861 filemap_invalidate_unlock(mapping); 3865 - up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 3862 + f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 3866 3863 err: 3867 3864 inode_unlock(inode); 3868 3865 file_end_write(filp); ··· 4294 4291 trace_f2fs_direct_IO_enter(inode, iocb, count, READ); 4295 4292 4296 4293 if (iocb->ki_flags & IOCB_NOWAIT) { 4297 - if (!down_read_trylock(&fi->i_gc_rwsem[READ])) { 4294 + if (!f2fs_down_read_trylock(&fi->i_gc_rwsem[READ])) { 4298 4295 ret = -EAGAIN; 4299 4296 goto out; 4300 4297 } 4301 4298 } else { 4302 - down_read(&fi->i_gc_rwsem[READ]); 4299 + f2fs_down_read(&fi->i_gc_rwsem[READ]); 4303 4300 } 4304 4301 4305 4302 /* ··· 4318 4315 ret = 
iomap_dio_complete(dio); 4319 4316 } 4320 4317 4321 - up_read(&fi->i_gc_rwsem[READ]); 4318 + f2fs_up_read(&fi->i_gc_rwsem[READ]); 4322 4319 4323 4320 file_accessed(file); 4324 4321 out: ··· 4500 4497 goto out; 4501 4498 } 4502 4499 4503 - if (!down_read_trylock(&fi->i_gc_rwsem[WRITE])) { 4500 + if (!f2fs_down_read_trylock(&fi->i_gc_rwsem[WRITE])) { 4504 4501 ret = -EAGAIN; 4505 4502 goto out; 4506 4503 } 4507 - if (do_opu && !down_read_trylock(&fi->i_gc_rwsem[READ])) { 4508 - up_read(&fi->i_gc_rwsem[WRITE]); 4504 + if (do_opu && !f2fs_down_read_trylock(&fi->i_gc_rwsem[READ])) { 4505 + f2fs_up_read(&fi->i_gc_rwsem[WRITE]); 4509 4506 ret = -EAGAIN; 4510 4507 goto out; 4511 4508 } ··· 4514 4511 if (ret) 4515 4512 goto out; 4516 4513 4517 - down_read(&fi->i_gc_rwsem[WRITE]); 4514 + f2fs_down_read(&fi->i_gc_rwsem[WRITE]); 4518 4515 if (do_opu) 4519 - down_read(&fi->i_gc_rwsem[READ]); 4516 + f2fs_down_read(&fi->i_gc_rwsem[READ]); 4520 4517 } 4521 4518 if (whint_mode == WHINT_MODE_OFF) 4522 4519 iocb->ki_hint = WRITE_LIFE_NOT_SET; ··· 4545 4542 if (whint_mode == WHINT_MODE_OFF) 4546 4543 iocb->ki_hint = hint; 4547 4544 if (do_opu) 4548 - up_read(&fi->i_gc_rwsem[READ]); 4549 - up_read(&fi->i_gc_rwsem[WRITE]); 4545 + f2fs_up_read(&fi->i_gc_rwsem[READ]); 4546 + f2fs_up_read(&fi->i_gc_rwsem[WRITE]); 4550 4547 4551 4548 if (ret < 0) 4552 4549 goto out; ··· 4647 4644 4648 4645 /* Don't leave any preallocated blocks around past i_size. */ 4649 4646 if (preallocated && i_size_read(inode) < target_size) { 4650 - down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 4647 + f2fs_down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 4651 4648 filemap_invalidate_lock(inode->i_mapping); 4652 4649 if (!f2fs_truncate(inode)) 4653 4650 file_dont_truncate(inode); 4654 4651 filemap_invalidate_unlock(inode->i_mapping); 4655 - up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 4652 + f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 4656 4653 } else { 4657 4654 file_dont_truncate(inode); 4658 4655 }
+29 -24
fs/f2fs/gc.c
··· 103 103 sbi->gc_urgent_high_remaining--; 104 104 } 105 105 spin_unlock(&sbi->gc_urgent_high_lock); 106 + } 106 107 108 + if (sbi->gc_mode == GC_URGENT_HIGH || 109 + sbi->gc_mode == GC_URGENT_MID) { 107 110 wait_ms = gc_th->urgent_sleep_time; 108 - down_write(&sbi->gc_lock); 111 + f2fs_down_write(&sbi->gc_lock); 109 112 goto do_gc; 110 113 } 111 114 112 115 if (foreground) { 113 - down_write(&sbi->gc_lock); 116 + f2fs_down_write(&sbi->gc_lock); 114 117 goto do_gc; 115 - } else if (!down_write_trylock(&sbi->gc_lock)) { 118 + } else if (!f2fs_down_write_trylock(&sbi->gc_lock)) { 116 119 stat_other_skip_bggc_count(sbi); 117 120 goto next; 118 121 } 119 122 120 123 if (!is_idle(sbi, GC_TIME)) { 121 124 increase_sleep_time(gc_th, &wait_ms); 122 - up_write(&sbi->gc_lock); 125 + f2fs_up_write(&sbi->gc_lock); 123 126 stat_io_skip_bggc_count(sbi); 124 127 goto next; 125 128 } ··· 1041 1038 set_sbi_flag(sbi, SBI_NEED_FSCK); 1042 1039 } 1043 1040 1044 - if (f2fs_check_nid_range(sbi, dni->ino)) 1041 + if (f2fs_check_nid_range(sbi, dni->ino)) { 1042 + f2fs_put_page(node_page, 1); 1045 1043 return false; 1044 + } 1046 1045 1047 1046 *nofs = ofs_of_node(node_page); 1048 1047 source_blkaddr = data_blkaddr(NULL, node_page, ofs_in_node); ··· 1235 1230 fio.new_blkaddr = fio.old_blkaddr = dn.data_blkaddr; 1236 1231 1237 1232 if (lfs_mode) 1238 - down_write(&fio.sbi->io_order_lock); 1233 + f2fs_down_write(&fio.sbi->io_order_lock); 1239 1234 1240 1235 mpage = f2fs_grab_cache_page(META_MAPPING(fio.sbi), 1241 1236 fio.old_blkaddr, false); ··· 1321 1316 true, true, true); 1322 1317 up_out: 1323 1318 if (lfs_mode) 1324 - up_write(&fio.sbi->io_order_lock); 1319 + f2fs_up_write(&fio.sbi->io_order_lock); 1325 1320 put_out: 1326 1321 f2fs_put_dnode(&dn); 1327 1322 out: ··· 1480 1475 special_file(inode->i_mode)) 1481 1476 continue; 1482 1477 1483 - if (!down_write_trylock( 1478 + if (!f2fs_down_write_trylock( 1484 1479 &F2FS_I(inode)->i_gc_rwsem[WRITE])) { 1485 1480 iput(inode); 1486 1481 
sbi->skipped_gc_rwsem++; ··· 1493 1488 if (f2fs_post_read_required(inode)) { 1494 1489 int err = ra_data_block(inode, start_bidx); 1495 1490 1496 - up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1491 + f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1497 1492 if (err) { 1498 1493 iput(inode); 1499 1494 continue; ··· 1504 1499 1505 1500 data_page = f2fs_get_read_data_page(inode, 1506 1501 start_bidx, REQ_RAHEAD, true); 1507 - up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1502 + f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); 1508 1503 if (IS_ERR(data_page)) { 1509 1504 iput(inode); 1510 1505 continue; ··· 1523 1518 int err; 1524 1519 1525 1520 if (S_ISREG(inode->i_mode)) { 1526 - if (!down_write_trylock(&fi->i_gc_rwsem[READ])) { 1521 + if (!f2fs_down_write_trylock(&fi->i_gc_rwsem[READ])) { 1527 1522 sbi->skipped_gc_rwsem++; 1528 1523 continue; 1529 1524 } 1530 - if (!down_write_trylock( 1525 + if (!f2fs_down_write_trylock( 1531 1526 &fi->i_gc_rwsem[WRITE])) { 1532 1527 sbi->skipped_gc_rwsem++; 1533 - up_write(&fi->i_gc_rwsem[READ]); 1528 + f2fs_up_write(&fi->i_gc_rwsem[READ]); 1534 1529 continue; 1535 1530 } 1536 1531 locked = true; ··· 1553 1548 submitted++; 1554 1549 1555 1550 if (locked) { 1556 - up_write(&fi->i_gc_rwsem[WRITE]); 1557 - up_write(&fi->i_gc_rwsem[READ]); 1551 + f2fs_up_write(&fi->i_gc_rwsem[WRITE]); 1552 + f2fs_up_write(&fi->i_gc_rwsem[READ]); 1558 1553 } 1559 1554 1560 1555 stat_inc_data_blk_count(sbi, 1, gc_type); ··· 1812 1807 reserved_segments(sbi), 1813 1808 prefree_segments(sbi)); 1814 1809 1815 - up_write(&sbi->gc_lock); 1810 + f2fs_up_write(&sbi->gc_lock); 1816 1811 1817 1812 put_gc_inode(&gc_list); 1818 1813 ··· 1941 1936 long long block_count; 1942 1937 int segs = secs * sbi->segs_per_sec; 1943 1938 1944 - down_write(&sbi->sb_lock); 1939 + f2fs_down_write(&sbi->sb_lock); 1945 1940 1946 1941 section_count = le32_to_cpu(raw_sb->section_count); 1947 1942 segment_count = le32_to_cpu(raw_sb->segment_count); ··· 1962 1957 cpu_to_le32(dev_segs 
+ segs); 1963 1958 } 1964 1959 1965 - up_write(&sbi->sb_lock); 1960 + f2fs_up_write(&sbi->sb_lock); 1966 1961 } 1967 1962 1968 1963 static void update_fs_metadata(struct f2fs_sb_info *sbi, int secs) ··· 2036 2031 secs = div_u64(shrunk_blocks, BLKS_PER_SEC(sbi)); 2037 2032 2038 2033 /* stop other GC */ 2039 - if (!down_write_trylock(&sbi->gc_lock)) 2034 + if (!f2fs_down_write_trylock(&sbi->gc_lock)) 2040 2035 return -EAGAIN; 2041 2036 2042 2037 /* stop CP to protect MAIN_SEC in free_segment_range */ ··· 2056 2051 2057 2052 out_unlock: 2058 2053 f2fs_unlock_op(sbi); 2059 - up_write(&sbi->gc_lock); 2054 + f2fs_up_write(&sbi->gc_lock); 2060 2055 if (err) 2061 2056 return err; 2062 2057 2063 2058 set_sbi_flag(sbi, SBI_IS_RESIZEFS); 2064 2059 2065 2060 freeze_super(sbi->sb); 2066 - down_write(&sbi->gc_lock); 2067 - down_write(&sbi->cp_global_sem); 2061 + f2fs_down_write(&sbi->gc_lock); 2062 + f2fs_down_write(&sbi->cp_global_sem); 2068 2063 2069 2064 spin_lock(&sbi->stat_lock); 2070 2065 if (shrunk_blocks + valid_user_blocks(sbi) + ··· 2109 2104 spin_unlock(&sbi->stat_lock); 2110 2105 } 2111 2106 out_err: 2112 - up_write(&sbi->cp_global_sem); 2113 - up_write(&sbi->gc_lock); 2107 + f2fs_up_write(&sbi->cp_global_sem); 2108 + f2fs_up_write(&sbi->gc_lock); 2114 2109 thaw_super(sbi->sb); 2115 2110 clear_sbi_flag(sbi, SBI_IS_RESIZEFS); 2116 2111 return err;
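In the gc.c hunks above, the data-block GC path takes `i_gc_rwsem[READ]` and then `i_gc_rwsem[WRITE]` with trylock calls. If the second attempt fails, it releases the first lock and skips the inode instead of blocking. The following is a minimal userspace sketch of that back-off pattern, not the f2fs code. `struct try_lock`, `struct gc_file`, `lock_both_or_skip()` and `unlock_both()` are hypothetical names, and a C11 `atomic_flag` stands in for the kernel rwsem.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the write side of an rwsem: held or not held. */
struct try_lock {
	atomic_flag held;
};

static void try_lock_init(struct try_lock *l)
{
	atomic_flag_clear(&l->held);
}

/* Returns true if the lock was free and is now held by the caller. */
static bool try_lock_acquire(struct try_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

static void try_lock_release(struct try_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

enum { GC_READ, GC_WRITE, GC_NR };

struct gc_file {
	struct try_lock gc_lock[GC_NR];
	unsigned int skipped;	/* back-off counter, similar in spirit to skipped_gc_rwsem */
};

static void gc_file_init(struct gc_file *f)
{
	try_lock_init(&f->gc_lock[GC_READ]);
	try_lock_init(&f->gc_lock[GC_WRITE]);
	f->skipped = 0;
}

/* Take both locks without blocking, or take neither and count a skip. */
static bool lock_both_or_skip(struct gc_file *f)
{
	if (!try_lock_acquire(&f->gc_lock[GC_READ])) {
		f->skipped++;
		return false;
	}
	if (!try_lock_acquire(&f->gc_lock[GC_WRITE])) {
		f->skipped++;
		/* Undo the first acquisition so no lock is left half-held. */
		try_lock_release(&f->gc_lock[GC_READ]);
		return false;
	}
	return true;
}

/* Release in the reverse order of acquisition, as the diff does. */
static void unlock_both(struct gc_file *f)
{
	try_lock_release(&f->gc_lock[GC_WRITE]);
	try_lock_release(&f->gc_lock[GC_READ]);
}
```

A caller would wrap the per-inode work in `if (lock_both_or_skip(&f)) { ...; unlock_both(&f); }` and move on to the next inode when the call returns false.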
+2 -2
fs/f2fs/inline.c
···
 	}

 	if (inode) {
-		down_write(&F2FS_I(inode)->i_sem);
+		f2fs_down_write(&F2FS_I(inode)->i_sem);
 		page = f2fs_init_inode_metadata(inode, dir, fname, ipage);
 		if (IS_ERR(page)) {
 			err = PTR_ERR(page);
···
 	f2fs_update_parent_metadata(dir, inode, 0);
 fail:
 	if (inode)
-		up_write(&F2FS_I(inode)->i_sem);
+		f2fs_up_write(&F2FS_I(inode)->i_sem);
 out:
 	f2fs_put_page(ipage, 1);
 	return err;
+5 -2
fs/f2fs/inode.c
···
 	f2fs_remove_ino_entry(sbi, inode->i_ino, UPDATE_INO);
 	f2fs_remove_ino_entry(sbi, inode->i_ino, FLUSH_INO);

-	sb_start_intwrite(inode->i_sb);
+	if (!is_sbi_flag_set(sbi, SBI_IS_FREEZING))
+		sb_start_intwrite(inode->i_sb);
 	set_inode_flag(inode, FI_NO_ALLOC);
 	i_size_write(inode, 0);
 retry:
···
 		if (dquot_initialize_needed(inode))
 			set_sbi_flag(sbi, SBI_QUOTA_NEED_REPAIR);
 	}
-	sb_end_intwrite(inode->i_sb);
+	if (!is_sbi_flag_set(sbi, SBI_IS_FREEZING))
+		sb_end_intwrite(inode->i_sb);
 no_delete:
 	dquot_drop(inode);

···
 	err = f2fs_get_node_info(sbi, inode->i_ino, &ni, false);
 	if (err) {
 		set_sbi_flag(sbi, SBI_NEED_FSCK);
+		set_inode_flag(inode, FI_FREE_NID);
 		f2fs_warn(sbi, "May loss orphan inode, run fsck to fix.");
 		goto out;
 	}
+41 -37
fs/f2fs/namei.c
··· 22 22 #include "acl.h" 23 23 #include <trace/events/f2fs.h> 24 24 25 - static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode) 25 + static struct inode *f2fs_new_inode(struct user_namespace *mnt_userns, 26 + struct inode *dir, umode_t mode) 26 27 { 27 28 struct f2fs_sb_info *sbi = F2FS_I_SB(dir); 28 29 nid_t ino; ··· 47 46 48 47 nid_free = true; 49 48 50 - inode_init_owner(&init_user_ns, inode, dir, mode); 49 + inode_init_owner(mnt_userns, inode, dir, mode); 51 50 52 51 inode->i_ino = ino; 53 52 inode->i_blocks = 0; ··· 68 67 (F2FS_I(dir)->i_flags & F2FS_PROJINHERIT_FL)) 69 68 F2FS_I(inode)->i_projid = F2FS_I(dir)->i_projid; 70 69 else 71 - F2FS_I(inode)->i_projid = make_kprojid(&init_user_ns, 70 + F2FS_I(inode)->i_projid = make_kprojid(mnt_userns, 72 71 F2FS_DEF_PROJID); 73 72 74 73 err = fscrypt_prepare_new_inode(dir, inode, &encrypt); ··· 197 196 __u8 (*extlist)[F2FS_EXTENSION_LEN] = sbi->raw_super->extension_list; 198 197 int i, cold_count, hot_count; 199 198 200 - down_read(&sbi->sb_lock); 199 + f2fs_down_read(&sbi->sb_lock); 201 200 202 201 cold_count = le32_to_cpu(sbi->raw_super->extension_count); 203 202 hot_count = sbi->raw_super->hot_ext_count; ··· 207 206 break; 208 207 } 209 208 210 - up_read(&sbi->sb_lock); 209 + f2fs_up_read(&sbi->sb_lock); 211 210 212 211 if (i == cold_count + hot_count) 213 212 return; ··· 300 299 (!ext_cnt && !noext_cnt)) 301 300 return; 302 301 303 - down_read(&sbi->sb_lock); 302 + f2fs_down_read(&sbi->sb_lock); 304 303 305 304 cold_count = le32_to_cpu(sbi->raw_super->extension_count); 306 305 hot_count = sbi->raw_super->hot_ext_count; 307 306 308 307 for (i = cold_count; i < cold_count + hot_count; i++) { 309 308 if (is_extension_exist(name, extlist[i], false)) { 310 - up_read(&sbi->sb_lock); 309 + f2fs_up_read(&sbi->sb_lock); 311 310 return; 312 311 } 313 312 } 314 313 315 - up_read(&sbi->sb_lock); 314 + f2fs_up_read(&sbi->sb_lock); 316 315 317 316 for (i = 0; i < noext_cnt; i++) { 318 317 if 
(is_extension_exist(name, noext[i], false)) { ··· 350 349 if (err) 351 350 return err; 352 351 353 - inode = f2fs_new_inode(dir, mode); 352 + inode = f2fs_new_inode(mnt_userns, dir, mode); 354 353 if (IS_ERR(inode)) 355 354 return PTR_ERR(inode); 356 355 ··· 680 679 if (err) 681 680 return err; 682 681 683 - inode = f2fs_new_inode(dir, S_IFLNK | S_IRWXUGO); 682 + inode = f2fs_new_inode(mnt_userns, dir, S_IFLNK | S_IRWXUGO); 684 683 if (IS_ERR(inode)) 685 684 return PTR_ERR(inode); 686 685 ··· 751 750 if (err) 752 751 return err; 753 752 754 - inode = f2fs_new_inode(dir, S_IFDIR | mode); 753 + inode = f2fs_new_inode(mnt_userns, dir, S_IFDIR | mode); 755 754 if (IS_ERR(inode)) 756 755 return PTR_ERR(inode); 757 756 ··· 808 807 if (err) 809 808 return err; 810 809 811 - inode = f2fs_new_inode(dir, mode); 810 + inode = f2fs_new_inode(mnt_userns, dir, mode); 812 811 if (IS_ERR(inode)) 813 812 return PTR_ERR(inode); 814 813 ··· 835 834 return err; 836 835 } 837 836 838 - static int __f2fs_tmpfile(struct inode *dir, struct dentry *dentry, 839 - umode_t mode, struct inode **whiteout) 837 + static int __f2fs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir, 838 + struct dentry *dentry, umode_t mode, 839 + struct inode **whiteout) 840 840 { 841 841 struct f2fs_sb_info *sbi = F2FS_I_SB(dir); 842 842 struct inode *inode; ··· 847 845 if (err) 848 846 return err; 849 847 850 - inode = f2fs_new_inode(dir, mode); 848 + inode = f2fs_new_inode(mnt_userns, dir, mode); 851 849 if (IS_ERR(inode)) 852 850 return PTR_ERR(inode); 853 851 ··· 911 909 if (!f2fs_is_checkpoint_ready(sbi)) 912 910 return -ENOSPC; 913 911 914 - return __f2fs_tmpfile(dir, dentry, mode, NULL); 912 + return __f2fs_tmpfile(mnt_userns, dir, dentry, mode, NULL); 915 913 } 916 914 917 - static int f2fs_create_whiteout(struct inode *dir, struct inode **whiteout) 915 + static int f2fs_create_whiteout(struct user_namespace *mnt_userns, 916 + struct inode *dir, struct inode **whiteout) 918 917 { 919 918 if 
(unlikely(f2fs_cp_error(F2FS_I_SB(dir)))) 920 919 return -EIO; 921 920 922 - return __f2fs_tmpfile(dir, NULL, S_IFCHR | WHITEOUT_MODE, whiteout); 921 + return __f2fs_tmpfile(mnt_userns, dir, NULL, 922 + S_IFCHR | WHITEOUT_MODE, whiteout); 923 923 } 924 924 925 - static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry, 926 - struct inode *new_dir, struct dentry *new_dentry, 927 - unsigned int flags) 925 + static int f2fs_rename(struct user_namespace *mnt_userns, struct inode *old_dir, 926 + struct dentry *old_dentry, struct inode *new_dir, 927 + struct dentry *new_dentry, unsigned int flags) 928 928 { 929 929 struct f2fs_sb_info *sbi = F2FS_I_SB(old_dir); 930 930 struct inode *old_inode = d_inode(old_dentry); ··· 964 960 } 965 961 966 962 if (flags & RENAME_WHITEOUT) { 967 - err = f2fs_create_whiteout(old_dir, &whiteout); 963 + err = f2fs_create_whiteout(mnt_userns, old_dir, &whiteout); 968 964 if (err) 969 965 return err; 970 966 } ··· 1027 1023 new_page = NULL; 1028 1024 1029 1025 new_inode->i_ctime = current_time(new_inode); 1030 - down_write(&F2FS_I(new_inode)->i_sem); 1026 + f2fs_down_write(&F2FS_I(new_inode)->i_sem); 1031 1027 if (old_dir_entry) 1032 1028 f2fs_i_links_write(new_inode, false); 1033 1029 f2fs_i_links_write(new_inode, false); 1034 - up_write(&F2FS_I(new_inode)->i_sem); 1030 + f2fs_up_write(&F2FS_I(new_inode)->i_sem); 1035 1031 1036 1032 if (!new_inode->i_nlink) 1037 1033 f2fs_add_orphan_inode(new_inode); ··· 1052 1048 f2fs_i_links_write(new_dir, true); 1053 1049 } 1054 1050 1055 - down_write(&F2FS_I(old_inode)->i_sem); 1051 + f2fs_down_write(&F2FS_I(old_inode)->i_sem); 1056 1052 if (!old_dir_entry || whiteout) 1057 1053 file_lost_pino(old_inode); 1058 1054 else 1059 1055 /* adjust dir's i_pino to pass fsck check */ 1060 1056 f2fs_i_pino_write(old_inode, new_dir->i_ino); 1061 - up_write(&F2FS_I(old_inode)->i_sem); 1057 + f2fs_up_write(&F2FS_I(old_inode)->i_sem); 1062 1058 1063 1059 old_inode->i_ctime = current_time(old_inode); 1064 
1060 f2fs_mark_inode_dirty_sync(old_inode, false); ··· 1111 1107 out_old: 1112 1108 f2fs_put_page(old_page, 0); 1113 1109 out: 1114 - if (whiteout) 1115 - iput(whiteout); 1110 + iput(whiteout); 1116 1111 return err; 1117 1112 } 1118 1113 ··· 1217 1214 /* update directory entry info of old dir inode */ 1218 1215 f2fs_set_link(old_dir, old_entry, old_page, new_inode); 1219 1216 1220 - down_write(&F2FS_I(old_inode)->i_sem); 1217 + f2fs_down_write(&F2FS_I(old_inode)->i_sem); 1221 1218 if (!old_dir_entry) 1222 1219 file_lost_pino(old_inode); 1223 1220 else 1224 1221 /* adjust dir's i_pino to pass fsck check */ 1225 1222 f2fs_i_pino_write(old_inode, new_dir->i_ino); 1226 - up_write(&F2FS_I(old_inode)->i_sem); 1223 + f2fs_up_write(&F2FS_I(old_inode)->i_sem); 1227 1224 1228 1225 old_dir->i_ctime = current_time(old_dir); 1229 1226 if (old_nlink) { 1230 - down_write(&F2FS_I(old_dir)->i_sem); 1227 + f2fs_down_write(&F2FS_I(old_dir)->i_sem); 1231 1228 f2fs_i_links_write(old_dir, old_nlink > 0); 1232 - up_write(&F2FS_I(old_dir)->i_sem); 1229 + f2fs_up_write(&F2FS_I(old_dir)->i_sem); 1233 1230 } 1234 1231 f2fs_mark_inode_dirty_sync(old_dir, false); 1235 1232 1236 1233 /* update directory entry info of new dir inode */ 1237 1234 f2fs_set_link(new_dir, new_entry, new_page, old_inode); 1238 1235 1239 - down_write(&F2FS_I(new_inode)->i_sem); 1236 + f2fs_down_write(&F2FS_I(new_inode)->i_sem); 1240 1237 if (!new_dir_entry) 1241 1238 file_lost_pino(new_inode); 1242 1239 else 1243 1240 /* adjust dir's i_pino to pass fsck check */ 1244 1241 f2fs_i_pino_write(new_inode, old_dir->i_ino); 1245 - up_write(&F2FS_I(new_inode)->i_sem); 1242 + f2fs_up_write(&F2FS_I(new_inode)->i_sem); 1246 1243 1247 1244 new_dir->i_ctime = current_time(new_dir); 1248 1245 if (new_nlink) { 1249 - down_write(&F2FS_I(new_dir)->i_sem); 1246 + f2fs_down_write(&F2FS_I(new_dir)->i_sem); 1250 1247 f2fs_i_links_write(new_dir, new_nlink > 0); 1251 - up_write(&F2FS_I(new_dir)->i_sem); 1248 + 
f2fs_up_write(&F2FS_I(new_dir)->i_sem); 1252 1249 } 1253 1250 f2fs_mark_inode_dirty_sync(new_dir, false); 1254 1251 ··· 1303 1300 * VFS has already handled the new dentry existence case, 1304 1301 * here, we just deal with "RENAME_NOREPLACE" as regular rename. 1305 1302 */ 1306 - return f2fs_rename(old_dir, old_dentry, new_dir, new_dentry, flags); 1303 + return f2fs_rename(mnt_userns, old_dir, old_dentry, 1304 + new_dir, new_dentry, flags); 1307 1305 } 1308 1306 1309 1307 static const char *f2fs_encrypted_get_link(struct dentry *dentry,
+49 -43
fs/f2fs/node.c
··· 382 382 struct nat_entry *e; 383 383 bool need = false; 384 384 385 - down_read(&nm_i->nat_tree_lock); 385 + f2fs_down_read(&nm_i->nat_tree_lock); 386 386 e = __lookup_nat_cache(nm_i, nid); 387 387 if (e) { 388 388 if (!get_nat_flag(e, IS_CHECKPOINTED) && 389 389 !get_nat_flag(e, HAS_FSYNCED_INODE)) 390 390 need = true; 391 391 } 392 - up_read(&nm_i->nat_tree_lock); 392 + f2fs_up_read(&nm_i->nat_tree_lock); 393 393 return need; 394 394 } 395 395 ··· 399 399 struct nat_entry *e; 400 400 bool is_cp = true; 401 401 402 - down_read(&nm_i->nat_tree_lock); 402 + f2fs_down_read(&nm_i->nat_tree_lock); 403 403 e = __lookup_nat_cache(nm_i, nid); 404 404 if (e && !get_nat_flag(e, IS_CHECKPOINTED)) 405 405 is_cp = false; 406 - up_read(&nm_i->nat_tree_lock); 406 + f2fs_up_read(&nm_i->nat_tree_lock); 407 407 return is_cp; 408 408 } 409 409 ··· 413 413 struct nat_entry *e; 414 414 bool need_update = true; 415 415 416 - down_read(&nm_i->nat_tree_lock); 416 + f2fs_down_read(&nm_i->nat_tree_lock); 417 417 e = __lookup_nat_cache(nm_i, ino); 418 418 if (e && get_nat_flag(e, HAS_LAST_FSYNC) && 419 419 (get_nat_flag(e, IS_CHECKPOINTED) || 420 420 get_nat_flag(e, HAS_FSYNCED_INODE))) 421 421 need_update = false; 422 - up_read(&nm_i->nat_tree_lock); 422 + f2fs_up_read(&nm_i->nat_tree_lock); 423 423 return need_update; 424 424 } 425 425 ··· 431 431 struct nat_entry *new, *e; 432 432 433 433 /* Let's mitigate lock contention of nat_tree_lock during checkpoint */ 434 - if (rwsem_is_locked(&sbi->cp_global_sem)) 434 + if (f2fs_rwsem_is_locked(&sbi->cp_global_sem)) 435 435 return; 436 436 437 437 new = __alloc_nat_entry(sbi, nid, false); 438 438 if (!new) 439 439 return; 440 440 441 - down_write(&nm_i->nat_tree_lock); 441 + f2fs_down_write(&nm_i->nat_tree_lock); 442 442 e = __lookup_nat_cache(nm_i, nid); 443 443 if (!e) 444 444 e = __init_nat_entry(nm_i, new, ne, false); ··· 447 447 nat_get_blkaddr(e) != 448 448 le32_to_cpu(ne->block_addr) || 449 449 nat_get_version(e) != ne->version); 450 
- up_write(&nm_i->nat_tree_lock); 450 + f2fs_up_write(&nm_i->nat_tree_lock); 451 451 if (e != new) 452 452 __free_nat_entry(new); 453 453 } ··· 459 459 struct nat_entry *e; 460 460 struct nat_entry *new = __alloc_nat_entry(sbi, ni->nid, true); 461 461 462 - down_write(&nm_i->nat_tree_lock); 462 + f2fs_down_write(&nm_i->nat_tree_lock); 463 463 e = __lookup_nat_cache(nm_i, ni->nid); 464 464 if (!e) { 465 465 e = __init_nat_entry(nm_i, new, NULL, true); ··· 508 508 set_nat_flag(e, HAS_FSYNCED_INODE, true); 509 509 set_nat_flag(e, HAS_LAST_FSYNC, fsync_done); 510 510 } 511 - up_write(&nm_i->nat_tree_lock); 511 + f2fs_up_write(&nm_i->nat_tree_lock); 512 512 } 513 513 514 514 int f2fs_try_to_free_nats(struct f2fs_sb_info *sbi, int nr_shrink) ··· 516 516 struct f2fs_nm_info *nm_i = NM_I(sbi); 517 517 int nr = nr_shrink; 518 518 519 - if (!down_write_trylock(&nm_i->nat_tree_lock)) 519 + if (!f2fs_down_write_trylock(&nm_i->nat_tree_lock)) 520 520 return 0; 521 521 522 522 spin_lock(&nm_i->nat_list_lock); ··· 538 538 } 539 539 spin_unlock(&nm_i->nat_list_lock); 540 540 541 - up_write(&nm_i->nat_tree_lock); 541 + f2fs_up_write(&nm_i->nat_tree_lock); 542 542 return nr - nr_shrink; 543 543 } 544 544 ··· 560 560 ni->nid = nid; 561 561 retry: 562 562 /* Check nat cache */ 563 - down_read(&nm_i->nat_tree_lock); 563 + f2fs_down_read(&nm_i->nat_tree_lock); 564 564 e = __lookup_nat_cache(nm_i, nid); 565 565 if (e) { 566 566 ni->ino = nat_get_ino(e); 567 567 ni->blk_addr = nat_get_blkaddr(e); 568 568 ni->version = nat_get_version(e); 569 - up_read(&nm_i->nat_tree_lock); 569 + f2fs_up_read(&nm_i->nat_tree_lock); 570 570 return 0; 571 571 } 572 572 ··· 576 576 * nat_tree_lock. Therefore, we should retry, if we failed to grab here 577 577 * while not bothering checkpoint. 
578 578 */ 579 - if (!rwsem_is_locked(&sbi->cp_global_sem) || checkpoint_context) { 579 + if (!f2fs_rwsem_is_locked(&sbi->cp_global_sem) || checkpoint_context) { 580 580 down_read(&curseg->journal_rwsem); 581 - } else if (rwsem_is_contended(&nm_i->nat_tree_lock) || 581 + } else if (f2fs_rwsem_is_contended(&nm_i->nat_tree_lock) || 582 582 !down_read_trylock(&curseg->journal_rwsem)) { 583 - up_read(&nm_i->nat_tree_lock); 583 + f2fs_up_read(&nm_i->nat_tree_lock); 584 584 goto retry; 585 585 } 586 586 ··· 589 589 ne = nat_in_journal(journal, i); 590 590 node_info_from_raw_nat(ni, &ne); 591 591 } 592 - up_read(&curseg->journal_rwsem); 592 + up_read(&curseg->journal_rwsem); 593 593 if (i >= 0) { 594 - up_read(&nm_i->nat_tree_lock); 594 + f2fs_up_read(&nm_i->nat_tree_lock); 595 595 goto cache; 596 596 } 597 597 598 598 /* Fill node_info from nat page */ 599 599 index = current_nat_addr(sbi, nid); 600 - up_read(&nm_i->nat_tree_lock); 600 + f2fs_up_read(&nm_i->nat_tree_lock); 601 601 602 602 page = f2fs_get_meta_page(sbi, index); 603 603 if (IS_ERR(page)) ··· 1609 1609 goto redirty_out; 1610 1610 1611 1611 if (wbc->for_reclaim) { 1612 - if (!down_read_trylock(&sbi->node_write)) 1612 + if (!f2fs_down_read_trylock(&sbi->node_write)) 1613 1613 goto redirty_out; 1614 1614 } else { 1615 - down_read(&sbi->node_write); 1615 + f2fs_down_read(&sbi->node_write); 1616 1616 } 1617 1617 1618 1618 /* This page is already truncated */ 1619 1619 if (unlikely(ni.blk_addr == NULL_ADDR)) { 1620 1620 ClearPageUptodate(page); 1621 1621 dec_page_count(sbi, F2FS_DIRTY_NODES); 1622 - up_read(&sbi->node_write); 1622 + f2fs_up_read(&sbi->node_write); 1623 1623 unlock_page(page); 1624 1624 return 0; 1625 1625 } ··· 1627 1627 if (__is_valid_data_blkaddr(ni.blk_addr) && 1628 1628 !f2fs_is_valid_blkaddr(sbi, ni.blk_addr, 1629 1629 DATA_GENERIC_ENHANCE)) { 1630 - up_read(&sbi->node_write); 1630 + f2fs_up_read(&sbi->node_write); 1631 1631 goto redirty_out; 1632 1632 } 1633 1633 ··· 1648 1648 
f2fs_do_write_node_page(nid, &fio); 1649 1649 set_node_addr(sbi, &ni, fio.new_blkaddr, is_fsync_dnode(page)); 1650 1650 dec_page_count(sbi, F2FS_DIRTY_NODES); 1651 - up_read(&sbi->node_write); 1651 + f2fs_up_read(&sbi->node_write); 1652 1652 1653 1653 if (wbc->for_reclaim) { 1654 1654 f2fs_submit_merged_write_cond(sbi, NULL, page, 0, NODE); ··· 1782 1782 1783 1783 if (!atomic || page == last_page) { 1784 1784 set_fsync_mark(page, 1); 1785 + percpu_counter_inc(&sbi->rf_node_block_count); 1785 1786 if (IS_INODE(page)) { 1786 1787 if (is_inode_flag_set(inode, 1787 1788 FI_DIRTY_INODE)) ··· 2112 2111 2113 2112 if (wbc->sync_mode == WB_SYNC_ALL) 2114 2113 atomic_inc(&sbi->wb_sync_req[NODE]); 2115 - else if (atomic_read(&sbi->wb_sync_req[NODE])) 2114 + else if (atomic_read(&sbi->wb_sync_req[NODE])) { 2115 + /* to avoid potential deadlock */ 2116 + if (current->plug) 2117 + blk_finish_plug(current->plug); 2116 2118 goto skip_write; 2119 + } 2117 2120 2118 2121 trace_f2fs_writepages(mapping->host, wbc, NODE); 2119 2122 ··· 2230 2225 unsigned int i; 2231 2226 bool ret = true; 2232 2227 2233 - down_read(&nm_i->nat_tree_lock); 2228 + f2fs_down_read(&nm_i->nat_tree_lock); 2234 2229 for (i = 0; i < nm_i->nat_blocks; i++) { 2235 2230 if (!test_bit_le(i, nm_i->nat_block_bitmap)) { 2236 2231 ret = false; 2237 2232 break; 2238 2233 } 2239 2234 } 2240 - up_read(&nm_i->nat_tree_lock); 2235 + f2fs_up_read(&nm_i->nat_tree_lock); 2241 2236 2242 2237 return ret; 2243 2238 } ··· 2420 2415 unsigned int i, idx; 2421 2416 nid_t nid; 2422 2417 2423 - down_read(&nm_i->nat_tree_lock); 2418 + f2fs_down_read(&nm_i->nat_tree_lock); 2424 2419 2425 2420 for (i = 0; i < nm_i->nat_blocks; i++) { 2426 2421 if (!test_bit_le(i, nm_i->nat_block_bitmap)) ··· 2443 2438 out: 2444 2439 scan_curseg_cache(sbi); 2445 2440 2446 - up_read(&nm_i->nat_tree_lock); 2441 + f2fs_up_read(&nm_i->nat_tree_lock); 2447 2442 } 2448 2443 2449 2444 static int __f2fs_build_free_nids(struct f2fs_sb_info *sbi, ··· 2478 2473 
f2fs_ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), FREE_NID_PAGES, 2479 2474 META_NAT, true); 2480 2475 2481 - down_read(&nm_i->nat_tree_lock); 2476 + f2fs_down_read(&nm_i->nat_tree_lock); 2482 2477 2483 2478 while (1) { 2484 2479 if (!test_bit_le(NAT_BLOCK_OFFSET(nid), ··· 2493 2488 } 2494 2489 2495 2490 if (ret) { 2496 - up_read(&nm_i->nat_tree_lock); 2491 + f2fs_up_read(&nm_i->nat_tree_lock); 2497 2492 f2fs_err(sbi, "NAT is corrupt, run fsck to fix it"); 2498 2493 return ret; 2499 2494 } ··· 2513 2508 /* find free nids from current sum_pages */ 2514 2509 scan_curseg_cache(sbi); 2515 2510 2516 - up_read(&nm_i->nat_tree_lock); 2511 + f2fs_up_read(&nm_i->nat_tree_lock); 2517 2512 2518 2513 f2fs_ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nm_i->next_scan_nid), 2519 2514 nm_i->ra_nid_pages, META_NAT, false); ··· 2958 2953 struct f2fs_nm_info *nm_i = NM_I(sbi); 2959 2954 unsigned int nat_ofs; 2960 2955 2961 - down_read(&nm_i->nat_tree_lock); 2956 + f2fs_down_read(&nm_i->nat_tree_lock); 2962 2957 2963 2958 for (nat_ofs = 0; nat_ofs < nm_i->nat_blocks; nat_ofs++) { 2964 2959 unsigned int valid = 0, nid_ofs = 0; ··· 2978 2973 __update_nat_bits(nm_i, nat_ofs, valid); 2979 2974 } 2980 2975 2981 - up_read(&nm_i->nat_tree_lock); 2976 + f2fs_up_read(&nm_i->nat_tree_lock); 2982 2977 } 2983 2978 2984 2979 static int __flush_nat_entry_set(struct f2fs_sb_info *sbi, ··· 3076 3071 * nat_cnt[DIRTY_NAT]. 
3077 3072 */ 3078 3073 if (cpc->reason & CP_UMOUNT) { 3079 - down_write(&nm_i->nat_tree_lock); 3074 + f2fs_down_write(&nm_i->nat_tree_lock); 3080 3075 remove_nats_in_journal(sbi); 3081 - up_write(&nm_i->nat_tree_lock); 3076 + f2fs_up_write(&nm_i->nat_tree_lock); 3082 3077 } 3083 3078 3084 3079 if (!nm_i->nat_cnt[DIRTY_NAT]) 3085 3080 return 0; 3086 3081 3087 - down_write(&nm_i->nat_tree_lock); 3082 + f2fs_down_write(&nm_i->nat_tree_lock); 3088 3083 3089 3084 /* 3090 3085 * if there are no enough space in journal to store dirty nat ··· 3113 3108 break; 3114 3109 } 3115 3110 3116 - up_write(&nm_i->nat_tree_lock); 3111 + f2fs_up_write(&nm_i->nat_tree_lock); 3117 3112 /* Allow dirty nats by node block allocation in write_begin */ 3118 3113 3119 3114 return err; ··· 3223 3218 nm_i->ram_thresh = DEF_RAM_THRESHOLD; 3224 3219 nm_i->ra_nid_pages = DEF_RA_NID_PAGES; 3225 3220 nm_i->dirty_nats_ratio = DEF_DIRTY_NAT_RATIO_THRESHOLD; 3221 + nm_i->max_rf_node_blocks = DEF_RF_NODE_BLOCKS; 3226 3222 3227 3223 INIT_RADIX_TREE(&nm_i->free_nid_root, GFP_ATOMIC); 3228 3224 INIT_LIST_HEAD(&nm_i->free_nid_list); ··· 3234 3228 3235 3229 mutex_init(&nm_i->build_lock); 3236 3230 spin_lock_init(&nm_i->nid_list_lock); 3237 - init_rwsem(&nm_i->nat_tree_lock); 3231 + init_f2fs_rwsem(&nm_i->nat_tree_lock); 3238 3232 3239 3233 nm_i->next_scan_nid = le32_to_cpu(sbi->ckpt->next_free_nid); 3240 3234 nm_i->bitmap_size = __bitmap_size(sbi, NAT_BITMAP); ··· 3340 3334 spin_unlock(&nm_i->nid_list_lock); 3341 3335 3342 3336 /* destroy nat cache */ 3343 - down_write(&nm_i->nat_tree_lock); 3337 + f2fs_down_write(&nm_i->nat_tree_lock); 3344 3338 while ((found = __gang_lookup_nat_cache(nm_i, 3345 3339 nid, NATVEC_SIZE, natvec))) { 3346 3340 unsigned idx; ··· 3370 3364 kmem_cache_free(nat_entry_set_slab, setvec[idx]); 3371 3365 } 3372 3366 } 3373 - up_write(&nm_i->nat_tree_lock); 3367 + f2fs_up_write(&nm_i->nat_tree_lock); 3374 3368 3375 3369 kvfree(nm_i->nat_block_bitmap); 3376 3370 if 
(nm_i->free_nid_bitmap) {
+3
fs/f2fs/node.h
···
 /* control total # of nats */
 #define DEF_NAT_CACHE_THRESHOLD	100000

+/* control total # of node writes used for roll-fowrad recovery */
+#define DEF_RF_NODE_BLOCKS	0
+
 /* vector size for gang look-up from nat cache that consists of radix tree */
 #define NATVEC_SIZE	64
 #define SETVEC_SIZE	32
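`DEF_RF_NODE_BLOCKS` is 0, and the space check added to fs/f2fs/recovery.c in this same merge only consults `max_rf_node_blocks` when it is non-zero, so the default appears to leave the number of fsync-marked node writes uncapped. The sketch below restates that check under simplified assumptions: plain integers replace `struct f2fs_sb_info` and the per-CPU counter, and the function name and parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified form of the roll-forward space check:
 * returns true if another fsync-marked node block may be written.
 */
static bool space_for_roll_forward_sketch(uint64_t last_valid_blocks,
					  uint64_t nalloc,
					  uint64_t user_blocks,
					  uint64_t rf_node_blocks,
					  uint64_t max_rf_node_blocks)
{
	/* Not enough free user blocks to replay the allocations. */
	if (last_valid_blocks + nalloc > user_blocks)
		return false;
	/* A limit of 0 disables the cap; otherwise stop once it is reached. */
	if (max_rf_node_blocks && rf_node_blocks >= max_rf_node_blocks)
		return false;
	return true;
}
```

With a limit of 0 only the free-space test can reject the write. With a non-zero limit the function also returns false once the counter reaches that limit.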
+30 -5
fs/f2fs/recovery.c
···

 	if (sbi->last_valid_block_count + nalloc > sbi->user_block_count)
 		return false;
+	if (NM_I(sbi)->max_rf_node_blocks &&
+		percpu_counter_sum_positive(&sbi->rf_node_block_count) >=
+						NM_I(sbi)->max_rf_node_blocks)
+		return false;
 	return true;
 }

···
 	return 0;
 }

+static unsigned int adjust_por_ra_blocks(struct f2fs_sb_info *sbi,
+			unsigned int ra_blocks, unsigned int blkaddr,
+			unsigned int next_blkaddr)
+{
+	if (blkaddr + 1 == next_blkaddr)
+		ra_blocks = min_t(unsigned int, RECOVERY_MAX_RA_BLOCKS,
+							ra_blocks * 2);
+	else if (next_blkaddr % sbi->blocks_per_seg)
+		ra_blocks = max_t(unsigned int, RECOVERY_MIN_RA_BLOCKS,
+							ra_blocks / 2);
+	return ra_blocks;
+}
+
 static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head,
 				bool check_only)
 {
···
 	struct page *page = NULL;
 	block_t blkaddr;
 	unsigned int loop_cnt = 0;
+	unsigned int ra_blocks = RECOVERY_MAX_RA_BLOCKS;
 	unsigned int free_blocks = MAIN_SEGS(sbi) * sbi->blocks_per_seg -
 						valid_user_blocks(sbi);
 	int err = 0;
···
 			break;
 		}

+		ra_blocks = adjust_por_ra_blocks(sbi, ra_blocks, blkaddr,
+						next_blkaddr_of_node(page));
+
 		/* check next segment */
 		blkaddr = next_blkaddr_of_node(page);
 		f2fs_put_page(page, 1);

-		f2fs_ra_meta_pages_cond(sbi, blkaddr);
+		f2fs_ra_meta_pages_cond(sbi, blkaddr, ra_blocks);
 	}
 	return err;
 }
···
 	struct page *page = NULL;
 	int err = 0;
 	block_t blkaddr;
+	unsigned int ra_blocks = RECOVERY_MAX_RA_BLOCKS;

 	/* get node pages in the current segment */
 	curseg = CURSEG_I(sbi, CURSEG_WARM_NODE);
···

 		if (!f2fs_is_valid_blkaddr(sbi, blkaddr, META_POR))
 			break;
-
-		f2fs_ra_meta_pages_cond(sbi, blkaddr);

 		page = f2fs_get_tmp_page(sbi, blkaddr);
 		if (IS_ERR(page)) {
···
 		if (entry->blkaddr == blkaddr)
 			list_move_tail(&entry->list, tmp_inode_list);
 next:
+		ra_blocks = adjust_por_ra_blocks(sbi, ra_blocks, blkaddr,
+						next_blkaddr_of_node(page));
+
 		/* check next segment */
 		blkaddr = next_blkaddr_of_node(page);
 		f2fs_put_page(page, 1);
+
+		f2fs_ra_meta_pages_cond(sbi, blkaddr, ra_blocks);
 	}
 	if (!err)
 		f2fs_allocate_new_segments(sbi);
···
 	INIT_LIST_HEAD(&dir_list);

 	/* prevent checkpoint */
-	down_write(&sbi->cp_global_sem);
+	f2fs_down_write(&sbi->cp_global_sem);

 	/* step #1: find fsynced inode numbers */
 	err = find_fsync_dnodes(sbi, &inode_list, check_only);
···
 	if (!err)
 		clear_sbi_flag(sbi, SBI_POR_DOING);

-	up_write(&sbi->cp_global_sem);
+	f2fs_up_write(&sbi->cp_global_sem);

 	/* let's drop all the directory inodes for clean checkpoint */
 	destroy_fsync_dnodes(&dir_list, err);
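`adjust_por_ra_blocks()` in the recovery.c diff above resizes the recovery readahead window after each node block: it doubles the window when the next node block directly follows the current one, halves it when the chain jumps to an address that is not a multiple of `blocks_per_seg`, and otherwise leaves it alone. Below is a standalone sketch of the same rule for experimenting outside the kernel. The two constants are assumed values for illustration only, since the diff does not show how `RECOVERY_MAX_RA_BLOCKS` and `RECOVERY_MIN_RA_BLOCKS` are defined. `blocks_per_seg` is passed as a plain parameter, and the `SKETCH_` and `_sketch` names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_MAX_RA_BLOCKS 256u	/* assumed upper bound, not taken from the diff */
#define SKETCH_MIN_RA_BLOCKS 1u		/* assumed lower bound, not taken from the diff */

static unsigned int adjust_ra_blocks_sketch(unsigned int blocks_per_seg,
					    unsigned int ra_blocks,
					    unsigned int blkaddr,
					    unsigned int next_blkaddr)
{
	if (blkaddr + 1 == next_blkaddr) {
		/* Sequential chain: double the window, capped at the maximum. */
		ra_blocks *= 2;
		if (ra_blocks > SKETCH_MAX_RA_BLOCKS)
			ra_blocks = SKETCH_MAX_RA_BLOCKS;
	} else if (next_blkaddr % blocks_per_seg) {
		/* Jump to a non-segment-aligned address: halve, floored at the minimum. */
		ra_blocks /= 2;
		if (ra_blocks < SKETCH_MIN_RA_BLOCKS)
			ra_blocks = SKETCH_MIN_RA_BLOCKS;
	}
	/* Jump to a segment-aligned address: keep the window unchanged. */
	return ra_blocks;
}
```

Feeding the function a long sequential run of addresses drives the window up to the cap, while a fragmented node chain shrinks it toward the minimum, so less readahead is spent on blocks that are unlikely to be used.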
+42 -31
fs/f2fs/segment.c
··· 471 471 472 472 f2fs_balance_fs(sbi, true); 473 473 474 - down_write(&fi->i_gc_rwsem[WRITE]); 474 + f2fs_down_write(&fi->i_gc_rwsem[WRITE]); 475 475 476 476 f2fs_lock_op(sbi); 477 477 set_inode_flag(inode, FI_ATOMIC_COMMIT); ··· 483 483 clear_inode_flag(inode, FI_ATOMIC_COMMIT); 484 484 485 485 f2fs_unlock_op(sbi); 486 - up_write(&fi->i_gc_rwsem[WRITE]); 486 + f2fs_up_write(&fi->i_gc_rwsem[WRITE]); 487 487 488 488 return err; 489 489 } ··· 521 521 io_schedule(); 522 522 finish_wait(&sbi->gc_thread->fggc_wq, &wait); 523 523 } else { 524 - down_write(&sbi->gc_lock); 524 + f2fs_down_write(&sbi->gc_lock); 525 525 f2fs_gc(sbi, false, false, false, NULL_SEGNO); 526 526 } 527 527 } ··· 529 529 530 530 static inline bool excess_dirty_threshold(struct f2fs_sb_info *sbi) 531 531 { 532 - int factor = rwsem_is_locked(&sbi->cp_rwsem) ? 3 : 2; 532 + int factor = f2fs_rwsem_is_locked(&sbi->cp_rwsem) ? 3 : 2; 533 533 unsigned int dents = get_pages(sbi, F2FS_DIRTY_DENTS); 534 534 unsigned int qdata = get_pages(sbi, F2FS_DIRTY_QDATA); 535 535 unsigned int nodes = get_pages(sbi, F2FS_DIRTY_NODES); ··· 570 570 571 571 /* there is background inflight IO or foreground operation recently */ 572 572 if (is_inflight_io(sbi, REQ_TIME) || 573 - (!f2fs_time_over(sbi, REQ_TIME) && rwsem_is_locked(&sbi->cp_rwsem))) 573 + (!f2fs_time_over(sbi, REQ_TIME) && f2fs_rwsem_is_locked(&sbi->cp_rwsem))) 574 574 return; 575 575 576 576 /* exceed periodical checkpoint timeout threshold */ ··· 1156 1156 dpolicy->ordered = false; 1157 1157 dpolicy->granularity = granularity; 1158 1158 1159 - dpolicy->max_requests = DEF_MAX_DISCARD_REQUEST; 1159 + dpolicy->max_requests = dcc->max_discard_request; 1160 1160 dpolicy->io_aware_gran = MAX_PLIST_NUM; 1161 1161 dpolicy->timeout = false; 1162 1162 1163 1163 if (discard_type == DPOLICY_BG) { 1164 - dpolicy->min_interval = DEF_MIN_DISCARD_ISSUE_TIME; 1165 - dpolicy->mid_interval = DEF_MID_DISCARD_ISSUE_TIME; 1166 - dpolicy->max_interval = 
DEF_MAX_DISCARD_ISSUE_TIME;
     1164 + 		dpolicy->min_interval = dcc->min_discard_issue_time;
     1165 + 		dpolicy->mid_interval = dcc->mid_discard_issue_time;
     1166 + 		dpolicy->max_interval = dcc->max_discard_issue_time;
1167 1167   		dpolicy->io_aware = true;
1168 1168   		dpolicy->sync = false;
1169 1169   		dpolicy->ordered = true;
···
1171 1171   			dpolicy->granularity = 1;
1172 1172   			if (atomic_read(&dcc->discard_cmd_cnt))
1173 1173   				dpolicy->max_interval =
1174      - 					DEF_MIN_DISCARD_ISSUE_TIME;
     1174 + 					dcc->min_discard_issue_time;
1175 1175   		}
1176 1176   	} else if (discard_type == DPOLICY_FORCE) {
1177      - 		dpolicy->min_interval = DEF_MIN_DISCARD_ISSUE_TIME;
1178      - 		dpolicy->mid_interval = DEF_MID_DISCARD_ISSUE_TIME;
1179      - 		dpolicy->max_interval = DEF_MAX_DISCARD_ISSUE_TIME;
     1177 + 		dpolicy->min_interval = dcc->min_discard_issue_time;
     1178 + 		dpolicy->mid_interval = dcc->mid_discard_issue_time;
     1179 + 		dpolicy->max_interval = dcc->max_discard_issue_time;
1180 1180   		dpolicy->io_aware = false;
1181 1181   	} else if (discard_type == DPOLICY_FSTRIM) {
1182 1182   		dpolicy->io_aware = false;
···
1781 1781   	struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
1782 1782   	wait_queue_head_t *q = &dcc->discard_wait_queue;
1783 1783   	struct discard_policy dpolicy;
1784      - 	unsigned int wait_ms = DEF_MIN_DISCARD_ISSUE_TIME;
     1784 + 	unsigned int wait_ms = dcc->min_discard_issue_time;
1785 1785   	int issued;
1786 1786
1787 1787   	set_freezable();
···
2180 2180   	atomic_set(&dcc->discard_cmd_cnt, 0);
2181 2181   	dcc->nr_discards = 0;
2182 2182   	dcc->max_discards = MAIN_SEGS(sbi) << sbi->log_blocks_per_seg;
     2183 + 	dcc->max_discard_request = DEF_MAX_DISCARD_REQUEST;
     2184 + 	dcc->min_discard_issue_time = DEF_MIN_DISCARD_ISSUE_TIME;
     2185 + 	dcc->mid_discard_issue_time = DEF_MID_DISCARD_ISSUE_TIME;
     2186 + 	dcc->max_discard_issue_time = DEF_MAX_DISCARD_ISSUE_TIME;
2183 2187   	dcc->undiscard_blks = 0;
2184 2188   	dcc->next_pos = 0;
2185 2189   	dcc->root = RB_ROOT_CACHED;
···
2825 2821   	if (!sbi->am.atgc_enabled)
2826 2822   		return;
2827 2823
2828      - 	down_read(&SM_I(sbi)->curseg_lock);
     2824 + 	f2fs_down_read(&SM_I(sbi)->curseg_lock);
2829 2825
2830 2826   	mutex_lock(&curseg->curseg_mutex);
2831 2827   	down_write(&SIT_I(sbi)->sentry_lock);
···
2835 2831   	up_write(&SIT_I(sbi)->sentry_lock);
2836 2832   	mutex_unlock(&curseg->curseg_mutex);
2837 2833
2838      - 	up_read(&SM_I(sbi)->curseg_lock);
     2834 + 	f2fs_up_read(&SM_I(sbi)->curseg_lock);
2839 2835
2840 2836   }
2841 2837   void f2fs_init_inmem_curseg(struct f2fs_sb_info *sbi)
···
2986 2982   	struct curseg_info *curseg = CURSEG_I(sbi, type);
2987 2983   	unsigned int segno;
2988 2984
2989      - 	down_read(&SM_I(sbi)->curseg_lock);
     2985 + 	f2fs_down_read(&SM_I(sbi)->curseg_lock);
2990 2986   	mutex_lock(&curseg->curseg_mutex);
2991 2987   	down_write(&SIT_I(sbi)->sentry_lock);
2992 2988
···
3010 3006   		    type, segno, curseg->segno);
3011 3007
3012 3008   	mutex_unlock(&curseg->curseg_mutex);
3013      - 	up_read(&SM_I(sbi)->curseg_lock);
     3009 + 	f2fs_up_read(&SM_I(sbi)->curseg_lock);
3014 3010   }
3015 3011
3016 3012   static void __allocate_new_segment(struct f2fs_sb_info *sbi, int type,
···
3042 3038
3043 3039   void f2fs_allocate_new_section(struct f2fs_sb_info *sbi, int type, bool force)
3044 3040   {
3045      - 	down_read(&SM_I(sbi)->curseg_lock);
     3041 + 	f2fs_down_read(&SM_I(sbi)->curseg_lock);
3046 3042   	down_write(&SIT_I(sbi)->sentry_lock);
3047 3043   	__allocate_new_section(sbi, type, force);
3048 3044   	up_write(&SIT_I(sbi)->sentry_lock);
3049      - 	up_read(&SM_I(sbi)->curseg_lock);
     3045 + 	f2fs_up_read(&SM_I(sbi)->curseg_lock);
3050 3046   }
3051 3047
3052 3048   void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi)
3053 3049   {
3054 3050   	int i;
3055 3051
3056      - 	down_read(&SM_I(sbi)->curseg_lock);
     3052 + 	f2fs_down_read(&SM_I(sbi)->curseg_lock);
3057 3053   	down_write(&SIT_I(sbi)->sentry_lock);
3058 3054   	for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++)
3059 3055   		__allocate_new_segment(sbi, i, false, false);
3060 3056   	up_write(&SIT_I(sbi)->sentry_lock);
3061      - 	up_read(&SM_I(sbi)->curseg_lock);
     3057 + 	f2fs_up_read(&SM_I(sbi)->curseg_lock);
3062 3058   }
3063 3059
3064 3060   static const struct segment_allocation default_salloc_ops = {
···
3196 3192   	if (sbi->discard_blks == 0)
3197 3193   		goto out;
3198 3194
3199      - 	down_write(&sbi->gc_lock);
     3195 + 	f2fs_down_write(&sbi->gc_lock);
3200 3196   	err = f2fs_write_checkpoint(sbi, &cpc);
3201      - 	up_write(&sbi->gc_lock);
     3197 + 	f2fs_up_write(&sbi->gc_lock);
3202 3198   	if (err)
3203 3199   		goto out;
3204 3200
···
3435 3431   	bool from_gc = (type == CURSEG_ALL_DATA_ATGC);
3436 3432   	struct seg_entry *se = NULL;
3437 3433
3438      - 	down_read(&SM_I(sbi)->curseg_lock);
     3434 + 	f2fs_down_read(&SM_I(sbi)->curseg_lock);
3439 3435
3440 3436   	mutex_lock(&curseg->curseg_mutex);
3441 3437   	down_write(&sit_i->sentry_lock);
···
3518 3514
3519 3515   	mutex_unlock(&curseg->curseg_mutex);
3520 3516
3521      - 	up_read(&SM_I(sbi)->curseg_lock);
     3517 + 	f2fs_up_read(&SM_I(sbi)->curseg_lock);
3522 3518   }
3523 3519
3524 3520   void f2fs_update_device_state(struct f2fs_sb_info *sbi, nid_t ino,
···
3554 3550   	bool keep_order = (f2fs_lfs_mode(fio->sbi) && type == CURSEG_COLD_DATA);
3555 3551
3556 3552   	if (keep_order)
3557      - 		down_read(&fio->sbi->io_order_lock);
     3553 + 		f2fs_down_read(&fio->sbi->io_order_lock);
3558 3554   reallocate:
3559 3555   	f2fs_allocate_data_block(fio->sbi, fio->page, fio->old_blkaddr,
3560 3556   			&fio->new_blkaddr, sum, type, fio);
···
3574 3570   	f2fs_update_device_state(fio->sbi, fio->ino, fio->new_blkaddr, 1);
3575 3571
3576 3572   	if (keep_order)
3577      - 		up_read(&fio->sbi->io_order_lock);
     3573 + 		f2fs_up_read(&fio->sbi->io_order_lock);
3578 3574   }
3579 3575
3580 3576   void f2fs_do_write_meta_page(struct f2fs_sb_info *sbi, struct page *page,
···
3709 3705   	se = get_seg_entry(sbi, segno);
3710 3706   	type = se->type;
3711 3707
3712      - 	down_write(&SM_I(sbi)->curseg_lock);
     3708 + 	f2fs_down_write(&SM_I(sbi)->curseg_lock);
3713 3709
3714 3710   	if (!recover_curseg) {
3715 3711   		/* for recovery flow */
···
3778 3774
3779 3775   	up_write(&sit_i->sentry_lock);
3780 3776   	mutex_unlock(&curseg->curseg_mutex);
3781      - 	up_write(&SM_I(sbi)->curseg_lock);
     3777 + 	f2fs_up_write(&SM_I(sbi)->curseg_lock);
3782 3778   }
3783 3779
3784 3780   void f2fs_replace_block(struct f2fs_sb_info *sbi, struct dnode_of_data *dn,
···
4793 4789
4794 4790   		sanity_check_seg_type(sbi, curseg->seg_type);
4795 4791
     4792 + 		if (curseg->alloc_type != LFS && curseg->alloc_type != SSR) {
     4793 + 			f2fs_err(sbi,
     4794 + 				 "Current segment has invalid alloc_type:%d",
     4795 + 				 curseg->alloc_type);
     4796 + 			return -EFSCORRUPTED;
     4797 + 		}
     4798 +
4796 4799   		if (f2fs_test_bit(blkofs, se->cur_valid_map))
4797 4800   			goto out;
4798 4801
···
5269 5258
5270 5259   	INIT_LIST_HEAD(&sm_info->sit_entry_set);
5271 5260
5272      - 	init_rwsem(&sm_info->curseg_lock);
     5261 + 	init_f2fs_rwsem(&sm_info->curseg_lock);
5273 5262
5274 5263   	if (!f2fs_readonly(sbi->sb)) {
5275 5264   		err = f2fs_create_flush_cmd_control(sbi);
+4 -1
fs/f2fs/segment.h
···
 651  651    *                     pages over min_fsync_blocks. (=default option)
 652  652    * F2FS_IPU_ASYNC - do IPU given by asynchronous write requests.
 653  653    * F2FS_IPU_NOCACHE - disable IPU bio cache.
 654       -  * F2FS_IPUT_DISABLE - disable IPU. (=default option in LFS mode)
      654 +  * F2FS_IPU_HONOR_OPU_WRITE - use OPU write prior to IPU write if inode has
      655 +  *                            FI_OPU_WRITE flag.
      656 +  * F2FS_IPU_DISABLE - disable IPU. (=default option in LFS mode)
 655  657    */
 656  658   #define DEF_MIN_IPU_UTIL	70
 657  659   #define DEF_MIN_FSYNC_BLOCKS	8
···
 669  667   	F2FS_IPU_FSYNC,
 670  668   	F2FS_IPU_ASYNC,
 671  669   	F2FS_IPU_NOCACHE,
      670 + 	F2FS_IPU_HONOR_OPU_WRITE,
 672  671   };
 673  672
 674  673   static inline unsigned int curseg_segno(struct f2fs_sb_info *sbi,
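The IPU policy described in the comment block above is a bitmask indexed by the enum values. The following is a minimal user-space sketch of how such a mask can be built and tested. The `ipu_*` names are hypothetical, and the first four enumerators are an assumption, since the hunk shows only the tail of the enum.

```c
#include <stdbool.h>

/*
 * Hypothetical mirror of the IPU policy enum. Only the last four
 * entries appear in the hunk; the first four are assumed here.
 */
enum ipu_policy_bit {
	IPU_FORCE,
	IPU_SSR,
	IPU_UTIL,
	IPU_SSR_UTIL,
	IPU_FSYNC,
	IPU_ASYNC,
	IPU_NOCACHE,
	IPU_HONOR_OPU_WRITE,
};

/* Convert an enum position into its single-bit mask. */
static unsigned int ipu_mask(enum ipu_policy_bit bit)
{
	return 1U << bit;
}

/* Report whether a policy word has the given bit set. */
static bool ipu_policy_has(unsigned int policy, enum ipu_policy_bit bit)
{
	return (policy & ipu_mask(bit)) != 0;
}

/* Combine two bits, in the style of the small-volume tuning in super.c. */
static unsigned int ipu_policy_force_honoring_opu(void)
{
	return ipu_mask(IPU_FORCE) | ipu_mask(IPU_HONOR_OPU_WRITE);
}
```

Because each policy is one bit, several policies can be enabled together, and adding an enumerator at the end leaves the existing bit positions unchanged.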
+56 -35
fs/f2fs/super.c
···
1355 1355   	/* Initialize f2fs-specific inode info */
1356 1356   	atomic_set(&fi->dirty_pages, 0);
1357 1357   	atomic_set(&fi->i_compr_blocks, 0);
1358      - 	init_rwsem(&fi->i_sem);
     1358 + 	init_f2fs_rwsem(&fi->i_sem);
1359 1359   	spin_lock_init(&fi->i_size_lock);
1360 1360   	INIT_LIST_HEAD(&fi->dirty_list);
1361 1361   	INIT_LIST_HEAD(&fi->gdirty_list);
1362 1362   	INIT_LIST_HEAD(&fi->inmem_ilist);
1363 1363   	INIT_LIST_HEAD(&fi->inmem_pages);
1364 1364   	mutex_init(&fi->inmem_lock);
1365      - 	init_rwsem(&fi->i_gc_rwsem[READ]);
1366      - 	init_rwsem(&fi->i_gc_rwsem[WRITE]);
1367      - 	init_rwsem(&fi->i_xattr_sem);
     1365 + 	init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
     1366 + 	init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
     1367 + 	init_f2fs_rwsem(&fi->i_xattr_sem);
1368 1368
1369 1369   	/* Will be used by directory only */
1370 1370   	fi->i_dir_level = F2FS_SB(sb)->dir_level;
···
1501 1501
1502 1502   static void destroy_percpu_info(struct f2fs_sb_info *sbi)
1503 1503   {
1504      - 	percpu_counter_destroy(&sbi->alloc_valid_block_count);
1505 1504   	percpu_counter_destroy(&sbi->total_valid_inode_count);
     1505 + 	percpu_counter_destroy(&sbi->rf_node_block_count);
     1506 + 	percpu_counter_destroy(&sbi->alloc_valid_block_count);
1506 1507   }
1507 1508
1508 1509   static void destroy_device_list(struct f2fs_sb_info *sbi)
···
1663 1662   	/* ensure no checkpoint required */
1664 1663   	if (!llist_empty(&F2FS_SB(sb)->cprc_info.issue_list))
1665 1664   		return -EINVAL;
     1665 +
     1666 + 	/* to avoid deadlock on f2fs_evict_inode->SB_FREEZE_FS */
     1667 + 	set_sbi_flag(F2FS_SB(sb), SBI_IS_FREEZING);
1666 1668   	return 0;
1667 1669   }
1668 1670
1669 1671   static int f2fs_unfreeze(struct super_block *sb)
1670 1672   {
     1673 + 	clear_sbi_flag(F2FS_SB(sb), SBI_IS_FREEZING);
1671 1674   	return 0;
1672 1675   }
1673 1676
···
2080 2075   {
2081 2076   	unsigned int s_flags = sbi->sb->s_flags;
2082 2077   	struct cp_control cpc;
     2078 + 	unsigned int gc_mode;
2083 2079   	int err = 0;
2084 2080   	int ret;
2085 2081   	block_t unusable;
···
2093 2087
2094 2088   	f2fs_update_time(sbi, DISABLE_TIME);
2095 2089
     2090 + 	gc_mode = sbi->gc_mode;
     2091 + 	sbi->gc_mode = GC_URGENT_HIGH;
     2092 +
2096 2093   	while (!f2fs_time_over(sbi, DISABLE_TIME)) {
2097      - 		down_write(&sbi->gc_lock);
     2094 + 		f2fs_down_write(&sbi->gc_lock);
2098 2095   		err = f2fs_gc(sbi, true, false, false, NULL_SEGNO);
2099 2096   		if (err == -ENODATA) {
2100 2097   			err = 0;
···
2119 2110   		goto restore_flag;
2120 2111   	}
2121 2112
2122      - 	down_write(&sbi->gc_lock);
     2113 + 	f2fs_down_write(&sbi->gc_lock);
2123 2114   	cpc.reason = CP_PAUSE;
2124 2115   	set_sbi_flag(sbi, SBI_CP_DISABLED);
2125 2116   	err = f2fs_write_checkpoint(sbi, &cpc);
···
2131 2122   	spin_unlock(&sbi->stat_lock);
2132 2123
2133 2124   out_unlock:
2134      - 	up_write(&sbi->gc_lock);
     2125 + 	f2fs_up_write(&sbi->gc_lock);
2135 2126   restore_flag:
     2127 + 	sbi->gc_mode = gc_mode;
2136 2128   	sbi->sb->s_flags = s_flags;	/* Restore SB_RDONLY status */
2137 2129   	return err;
2138 2130   }
···
2152 2142   	if (unlikely(retry < 0))
2153 2143   		f2fs_warn(sbi, "checkpoint=enable has some unwritten data.");
2154 2144
2155      - 	down_write(&sbi->gc_lock);
     2145 + 	f2fs_down_write(&sbi->gc_lock);
2156 2146   	f2fs_dirty_to_prefree(sbi);
2157 2147
2158 2148   	clear_sbi_flag(sbi, SBI_CP_DISABLED);
2159 2149   	set_sbi_flag(sbi, SBI_IS_DIRTY);
2160      - 	up_write(&sbi->gc_lock);
     2150 + 	f2fs_up_write(&sbi->gc_lock);
2161 2151
2162 2152   	f2fs_sync_fs(sbi->sb, 1);
2163 2153   }
···
2698 2688   	struct f2fs_sb_info *sbi = F2FS_SB(sb);
2699 2689   	struct quota_info *dqopt = sb_dqopt(sb);
2700 2690   	int cnt;
2701      - 	int ret;
     2691 + 	int ret = 0;
2702 2692
2703 2693   	/*
2704 2694   	 * Now when everything is written we can discard the pagecache so
···
2709 2699   		if (type != -1 && cnt != type)
2710 2700   			continue;
2711 2701
2712      - 		if (!sb_has_quota_active(sb, type))
2713      - 			return 0;
     2702 + 		if (!sb_has_quota_active(sb, cnt))
     2703 + 			continue;
2714 2704
2715 2705   		inode_lock(dqopt->files[cnt]);
2716 2706
2717 2707   		/*
2718 2708   		 * do_quotactl
2719 2709   		 *  f2fs_quota_sync
2720      - 		 *  down_read(quota_sem)
     2710 + 		 *  f2fs_down_read(quota_sem)
2721 2711   		 *  dquot_writeback_dquots()
2722 2712   		 *  f2fs_dquot_commit
2723 2713   		 *			      block_operation
2724      - 		 *			      down_read(quota_sem)
     2714 + 		 *			      f2fs_down_read(quota_sem)
2725 2715   		 */
2726 2716   		f2fs_lock_op(sbi);
2727      - 		down_read(&sbi->quota_sem);
     2717 + 		f2fs_down_read(&sbi->quota_sem);
2728 2718
2729 2719   		ret = f2fs_quota_sync_file(sbi, cnt);
2730 2720
2731      - 		up_read(&sbi->quota_sem);
     2721 + 		f2fs_up_read(&sbi->quota_sem);
2732 2722   		f2fs_unlock_op(sbi);
2733 2723
2734 2724   		inode_unlock(dqopt->files[cnt]);
···
2853 2843   	struct f2fs_sb_info *sbi = F2FS_SB(dquot->dq_sb);
2854 2844   	int ret;
2855 2845
2856      - 	down_read_nested(&sbi->quota_sem, SINGLE_DEPTH_NESTING);
     2846 + 	f2fs_down_read_nested(&sbi->quota_sem, SINGLE_DEPTH_NESTING);
2857 2847   	ret = dquot_commit(dquot);
2858 2848   	if (ret < 0)
2859 2849   		set_sbi_flag(sbi, SBI_QUOTA_NEED_REPAIR);
2860      - 	up_read(&sbi->quota_sem);
     2850 + 	f2fs_up_read(&sbi->quota_sem);
2861 2851   	return ret;
2862 2852   }
2863 2853
···
2866 2856   	struct f2fs_sb_info *sbi = F2FS_SB(dquot->dq_sb);
2867 2857   	int ret;
2868 2858
2869      - 	down_read(&sbi->quota_sem);
     2859 + 	f2fs_down_read(&sbi->quota_sem);
2870 2860   	ret = dquot_acquire(dquot);
2871 2861   	if (ret < 0)
2872 2862   		set_sbi_flag(sbi, SBI_QUOTA_NEED_REPAIR);
2873      - 	up_read(&sbi->quota_sem);
     2863 + 	f2fs_up_read(&sbi->quota_sem);
2874 2864   	return ret;
2875 2865   }
2876 2866
···
3584 3574   	F2FS_NODE_INO(sbi) = le32_to_cpu(raw_super->node_ino);
3585 3575   	F2FS_META_INO(sbi) = le32_to_cpu(raw_super->meta_ino);
3586 3576   	sbi->cur_victim_sec = NULL_SECNO;
     3577 + 	sbi->gc_mode = GC_NORMAL;
3587 3578   	sbi->next_victim_seg[BG_GC] = NULL_SEGNO;
3588 3579   	sbi->next_victim_seg[FG_GC] = NULL_SEGNO;
3589 3580   	sbi->max_victim_search = DEF_MAX_VICTIM_SEARCH;
···
3612 3601
3613 3602   	INIT_LIST_HEAD(&sbi->s_list);
3614 3603   	mutex_init(&sbi->umount_mutex);
3615      - 	init_rwsem(&sbi->io_order_lock);
     3604 + 	init_f2fs_rwsem(&sbi->io_order_lock);
3616 3605   	spin_lock_init(&sbi->cp_lock);
3617 3606
3618 3607   	sbi->dirty_device = 0;
3619 3608   	spin_lock_init(&sbi->dev_lock);
3620 3609
3621      - 	init_rwsem(&sbi->sb_lock);
3622      - 	init_rwsem(&sbi->pin_sem);
     3610 + 	init_f2fs_rwsem(&sbi->sb_lock);
     3611 + 	init_f2fs_rwsem(&sbi->pin_sem);
3623 3612   }
3624 3613
3625 3614   static int init_percpu_info(struct f2fs_sb_info *sbi)
···
3630 3619   	if (err)
3631 3620   		return err;
3632 3621
     3622 + 	err = percpu_counter_init(&sbi->rf_node_block_count, 0, GFP_KERNEL);
     3623 + 	if (err)
     3624 + 		goto err_valid_block;
     3625 +
3633 3626   	err = percpu_counter_init(&sbi->total_valid_inode_count, 0,
3634 3627   								GFP_KERNEL);
3635 3628   	if (err)
3636      - 		percpu_counter_destroy(&sbi->alloc_valid_block_count);
     3629 + 		goto err_node_block;
     3630 + 	return 0;
3637 3631
     3632 + err_node_block:
     3633 + 	percpu_counter_destroy(&sbi->rf_node_block_count);
     3634 + err_valid_block:
     3635 + 	percpu_counter_destroy(&sbi->alloc_valid_block_count);
3638 3636   	return err;
3639 3637   }
3640 3638
···
3977 3957   		F2FS_OPTION(sbi).alloc_mode = ALLOC_MODE_REUSE;
3978 3958   		if (f2fs_block_unit_discard(sbi))
3979 3959   			sm_i->dcc_info->discard_granularity = 1;
3980      - 		sm_i->ipu_policy = 1 << F2FS_IPU_FORCE;
     3960 + 		sm_i->ipu_policy = 1 << F2FS_IPU_FORCE |
     3961 + 					1 << F2FS_IPU_HONOR_OPU_WRITE;
3981 3962   	}
3982 3963
3983 3964   	sbi->readdir_ra = 1;
···
4088 4067
4089 4068   	/* init f2fs-specific super block info */
4090 4069   	sbi->valid_super_block = valid_super_block;
4091      - 	init_rwsem(&sbi->gc_lock);
     4070 + 	init_f2fs_rwsem(&sbi->gc_lock);
4092 4071   	mutex_init(&sbi->writepages);
4093      - 	init_rwsem(&sbi->cp_global_sem);
4094      - 	init_rwsem(&sbi->node_write);
4095      - 	init_rwsem(&sbi->node_change);
     4072 + 	init_f2fs_rwsem(&sbi->cp_global_sem);
     4073 + 	init_f2fs_rwsem(&sbi->node_write);
     4074 + 	init_f2fs_rwsem(&sbi->node_change);
4096 4075
4097 4076   	/* disallow all the data/node/meta page writes */
4098 4077   	set_sbi_flag(sbi, SBI_POR_DOING);
···
4113 4092   		}
4114 4093
4115 4094   		for (j = HOT; j < n; j++) {
4116      - 			init_rwsem(&sbi->write_io[i][j].io_rwsem);
     4095 + 			init_f2fs_rwsem(&sbi->write_io[i][j].io_rwsem);
4117 4096   			sbi->write_io[i][j].sbi = sbi;
4118 4097   			sbi->write_io[i][j].bio = NULL;
4119 4098   			spin_lock_init(&sbi->write_io[i][j].io_lock);
4120 4099   			INIT_LIST_HEAD(&sbi->write_io[i][j].io_list);
4121 4100   			INIT_LIST_HEAD(&sbi->write_io[i][j].bio_list);
4122      - 			init_rwsem(&sbi->write_io[i][j].bio_list_lock);
     4101 + 			init_f2fs_rwsem(&sbi->write_io[i][j].bio_list_lock);
4123 4102   		}
4124 4103   	}
4125 4104
4126      - 	init_rwsem(&sbi->cp_rwsem);
4127      - 	init_rwsem(&sbi->quota_sem);
     4105 + 	init_f2fs_rwsem(&sbi->cp_rwsem);
     4106 + 	init_f2fs_rwsem(&sbi->quota_sem);
4128 4107   	init_waitqueue_head(&sbi->cp_wait);
4129 4108   	init_sb_info(sbi);
4130 4109
···
4549 4528   	.name		= "f2fs",
4550 4529   	.mount		= f2fs_mount,
4551 4530   	.kill_sb	= kill_f2fs_super,
4552      - 	.fs_flags	= FS_REQUIRES_DEV,
     4531 + 	.fs_flags	= FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
4553 4532   };
4554 4533   MODULE_ALIAS_FS("f2fs");
4555 4534
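The `init_percpu_info()` hunk above switches to label-based unwinding, in which each failure jumps to a label that releases only what was already set up. The following user-space sketch illustrates that pattern under stated assumptions: `counter_init()`, `counter_destroy()` and the `fail_step` parameter are hypothetical stand-ins, not kernel APIs.

```c
#include <errno.h>
#include <stdlib.h>

struct counters {
	long *alloc_valid_block_count;
	long *rf_node_block_count;
	long *total_valid_inode_count;
};

/* Hypothetical stand-in for percpu_counter_init(); can be forced to fail. */
static int counter_init(long **c, int force_fail)
{
	if (force_fail)
		return -ENOMEM;
	*c = calloc(1, sizeof(**c));
	return *c ? 0 : -ENOMEM;
}

/* Hypothetical stand-in for percpu_counter_destroy(). */
static void counter_destroy(long **c)
{
	free(*c);
	*c = NULL;
}

/* fail_step selects which init (1..3) fails; 0 means none. */
static int init_counters(struct counters *s, int fail_step)
{
	int err;

	err = counter_init(&s->alloc_valid_block_count, fail_step == 1);
	if (err)
		return err;

	err = counter_init(&s->rf_node_block_count, fail_step == 2);
	if (err)
		goto err_valid_block;

	err = counter_init(&s->total_valid_inode_count, fail_step == 3);
	if (err)
		goto err_node_block;
	return 0;

	/* Unwind in reverse order of initialization. */
err_node_block:
	counter_destroy(&s->rf_node_block_count);
err_valid_block:
	counter_destroy(&s->alloc_valid_block_count);
	return err;
}

/* Full teardown, newest counter first. */
static void destroy_counters(struct counters *s)
{
	counter_destroy(&s->total_valid_inode_count);
	counter_destroy(&s->rf_node_block_count);
	counter_destroy(&s->alloc_valid_block_count);
}
```

Stacking the labels in reverse order means a later failure falls through the earlier cleanup steps, so adding a fourth counter needs only one more label.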
+36 -4
fs/f2fs/sysfs.c
···
  41   41   	ATGC_INFO,	/* struct atgc_management */
  42   42   };
  43   43
       44 + static const char *gc_mode_names[MAX_GC_MODE] = {
       45 + 	"GC_NORMAL",
       46 + 	"GC_IDLE_CB",
       47 + 	"GC_IDLE_GREEDY",
       48 + 	"GC_IDLE_AT",
       49 + 	"GC_URGENT_HIGH",
       50 + 	"GC_URGENT_LOW",
       51 + 	"GC_URGENT_MID"
       52 + };
       53 +
  44   54   struct f2fs_attr {
  45   55   	struct attribute attr;
  46   56   	ssize_t (*show)(struct f2fs_attr *, struct f2fs_sb_info *, char *);
···
 326  316   		return sysfs_emit(buf, "%u\n", sbi->compr_new_inode);
 327  317   #endif
 328  318
      319 + 	if (!strcmp(a->attr.name, "gc_urgent"))
      320 + 		return sysfs_emit(buf, "%s\n",
      321 + 				gc_mode_names[sbi->gc_mode]);
      322 +
 329  323   	if (!strcmp(a->attr.name, "gc_segment_mode"))
 330       - 		return sysfs_emit(buf, "%u\n", sbi->gc_segment_mode);
      324 + 		return sysfs_emit(buf, "%s\n",
      325 + 				gc_mode_names[sbi->gc_segment_mode]);
 331  326
 332  327   	if (!strcmp(a->attr.name, "gc_reclaimed_segments")) {
 333  328   		return sysfs_emit(buf, "%u\n",
···
 378  363   		if (!strlen(name) || strlen(name) >= F2FS_EXTENSION_LEN)
 379  364   			return -EINVAL;
 380  365
 381       - 		down_write(&sbi->sb_lock);
      366 + 		f2fs_down_write(&sbi->sb_lock);
 382  367
 383  368   		ret = f2fs_update_extension_list(sbi, name, hot, set);
 384  369   		if (ret)
···
 388  373   		if (ret)
 389  374   			f2fs_update_extension_list(sbi, name, hot, !set);
 390  375   out:
 391       - 		up_write(&sbi->sb_lock);
      376 + 		f2fs_up_write(&sbi->sb_lock);
 392  377   		return ret ? ret : count;
 393  378   	}
 394  379
···
 483  468   			}
 484  469   		} else if (t == 2) {
 485  470   			sbi->gc_mode = GC_URGENT_LOW;
      471 + 		} else if (t == 3) {
      472 + 			sbi->gc_mode = GC_URGENT_MID;
      473 + 			if (sbi->gc_thread) {
      474 + 				sbi->gc_thread->gc_wake = 1;
      475 + 				wake_up_interruptible_all(
      476 + 					&sbi->gc_thread->gc_wait_queue_head);
      477 + 			}
 486  478   		} else {
 487  479   			return -EINVAL;
 488  480   		}
···
 503  481   		} else if (t == GC_IDLE_AT) {
 504  482   			if (!sbi->am.atgc_enabled)
 505  483   				return -EINVAL;
 506       - 			sbi->gc_mode = GC_AT;
      484 + 			sbi->gc_mode = GC_IDLE_AT;
 507  485   		} else {
 508  486   			sbi->gc_mode = GC_NORMAL;
 509  487   		}
···
 738  716   F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_urgent, gc_mode);
 739  717   F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, reclaim_segments, rec_prefree_segments);
 740  718   F2FS_RW_ATTR(DCC_INFO, discard_cmd_control, max_small_discards, max_discards);
      719 + F2FS_RW_ATTR(DCC_INFO, discard_cmd_control, max_discard_request, max_discard_request);
      720 + F2FS_RW_ATTR(DCC_INFO, discard_cmd_control, min_discard_issue_time, min_discard_issue_time);
      721 + F2FS_RW_ATTR(DCC_INFO, discard_cmd_control, mid_discard_issue_time, mid_discard_issue_time);
      722 + F2FS_RW_ATTR(DCC_INFO, discard_cmd_control, max_discard_issue_time, max_discard_issue_time);
 741  723   F2FS_RW_ATTR(DCC_INFO, discard_cmd_control, discard_granularity, discard_granularity);
 742  724   F2FS_RW_ATTR(RESERVED_BLOCKS, f2fs_sb_info, reserved_blocks, reserved_blocks);
 743  725   F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, batched_trim_sections, trim_sections);
···
 754  728   F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, ram_thresh, ram_thresh);
 755  729   F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, ra_nid_pages, ra_nid_pages);
 756  730   F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, dirty_nats_ratio, dirty_nats_ratio);
      731 + F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, max_roll_forward_node_blocks, max_rf_node_blocks);
 757  732   F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, max_victim_search, max_victim_search);
 758  733   F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, migration_granularity, migration_granularity);
 759  734   F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, dir_level, dir_level);
···
 859  832   	ATTR_LIST(reclaim_segments),
 860  833   	ATTR_LIST(main_blkaddr),
 861  834   	ATTR_LIST(max_small_discards),
      835 + 	ATTR_LIST(max_discard_request),
      836 + 	ATTR_LIST(min_discard_issue_time),
      837 + 	ATTR_LIST(mid_discard_issue_time),
      838 + 	ATTR_LIST(max_discard_issue_time),
 862  839   	ATTR_LIST(discard_granularity),
 863  840   	ATTR_LIST(pending_discard),
 864  841   	ATTR_LIST(batched_trim_sections),
···
 878  847   	ATTR_LIST(ram_thresh),
 879  848   	ATTR_LIST(ra_nid_pages),
 880  849   	ATTR_LIST(dirty_nats_ratio),
      850 + 	ATTR_LIST(max_roll_forward_node_blocks),
 881  851   	ATTR_LIST(cp_interval),
 882  852   	ATTR_LIST(idle_interval),
 883  853   	ATTR_LIST(discard_idle_interval),
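The sysfs hunks above add a name table for GC modes and accept the value 3 for `gc_urgent`. The following stand-alone sketch shows one way to pair such a table with a bounds-checked lookup. The enum order mirrors the table in the diff. The `gc_mode_name()` and `gc_urgent_to_mode()` helpers, the `"UNKNOWN"` fallback, and the mappings for 0 and 1 are assumptions for illustration, since only the 2 and 3 branches are visible in the hunk.

```c
#include <stddef.h>

/* Order mirrors gc_mode_names[] in the diff; MAX_GC_MODE sizes the table. */
enum gc_mode {
	GC_NORMAL,
	GC_IDLE_CB,
	GC_IDLE_GREEDY,
	GC_IDLE_AT,
	GC_URGENT_HIGH,
	GC_URGENT_LOW,
	GC_URGENT_MID,
	MAX_GC_MODE,
};

static const char *const gc_mode_names[MAX_GC_MODE] = {
	"GC_NORMAL",
	"GC_IDLE_CB",
	"GC_IDLE_GREEDY",
	"GC_IDLE_AT",
	"GC_URGENT_HIGH",
	"GC_URGENT_LOW",
	"GC_URGENT_MID",
};

/* Hypothetical helper: never index past the table, whatever field feeds it. */
static const char *gc_mode_name(unsigned int mode)
{
	if (mode >= MAX_GC_MODE || gc_mode_names[mode] == NULL)
		return "UNKNOWN";
	return gc_mode_names[mode];
}

/*
 * Hypothetical helper: map a value written to gc_urgent onto a mode.
 * The 0 and 1 cases are assumed; 2 and 3 follow the hunk. Returns -1
 * where the kernel code returns -EINVAL.
 */
static int gc_urgent_to_mode(unsigned long t)
{
	switch (t) {
	case 0:
		return GC_NORMAL;
	case 1:
		return GC_URGENT_HIGH;
	case 2:
		return GC_URGENT_LOW;
	case 3:
		return GC_URGENT_MID;
	default:
		return -1;
	}
}
```

A guarded lookup of this kind is a defensive choice when one table is indexed by more than one field, as the `gc_urgent` and `gc_segment_mode` show paths do in the diff.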
+2 -2
fs/f2fs/verity.c
···
 208  208   	 * from re-instantiating cached pages we are truncating (since unlike
 209  209   	 * normal file accesses, garbage collection isn't limited by i_size).
 210  210   	 */
 211       - 	down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
      211 + 	f2fs_down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
 212  212   	truncate_inode_pages(inode->i_mapping, inode->i_size);
 213  213   	err2 = f2fs_truncate(inode);
 214  214   	if (err2) {
···
 216  216   			 err2);
 217  217   		set_sbi_flag(sbi, SBI_NEED_FSCK);
 218  218   	}
 219       - 	up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
      219 + 	f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
 220  220   	clear_inode_flag(inode, FI_VERITY_IN_PROGRESS);
 221  221   	return err ?: err2;
 222  222   }
+6 -6
fs/f2fs/xattr.c
···
 525  525   	if (len > F2FS_NAME_LEN)
 526  526   		return -ERANGE;
 527  527
 528       - 	down_read(&F2FS_I(inode)->i_xattr_sem);
      528 + 	f2fs_down_read(&F2FS_I(inode)->i_xattr_sem);
 529  529   	error = lookup_all_xattrs(inode, ipage, index, len, name,
 530  530   				&entry, &base_addr, &base_size, &is_inline);
 531       - 	up_read(&F2FS_I(inode)->i_xattr_sem);
      531 + 	f2fs_up_read(&F2FS_I(inode)->i_xattr_sem);
 532  532   	if (error)
 533  533   		return error;
 534  534
···
 562  562   	int error;
 563  563   	size_t rest = buffer_size;
 564  564
 565       - 	down_read(&F2FS_I(inode)->i_xattr_sem);
      565 + 	f2fs_down_read(&F2FS_I(inode)->i_xattr_sem);
 566  566   	error = read_all_xattrs(inode, NULL, &base_addr);
 567       - 	up_read(&F2FS_I(inode)->i_xattr_sem);
      567 + 	f2fs_up_read(&F2FS_I(inode)->i_xattr_sem);
 568  568   	if (error)
 569  569   		return error;
 570  570
···
 786  786   	f2fs_balance_fs(sbi, true);
 787  787
 788  788   	f2fs_lock_op(sbi);
 789       - 	down_write(&F2FS_I(inode)->i_xattr_sem);
      789 + 	f2fs_down_write(&F2FS_I(inode)->i_xattr_sem);
 790  790   	err = __f2fs_setxattr(inode, index, name, value, size, ipage, flags);
 791       - 	up_write(&F2FS_I(inode)->i_xattr_sem);
      791 + 	f2fs_up_write(&F2FS_I(inode)->i_xattr_sem);
 792  792   	f2fs_unlock_op(sbi);
 793  793
 794  794   	f2fs_update_time(sbi, REQ_TIME);