Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:
"More MM work: a memcg scalability improvement"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/lru: revise the comments of lru_lock
mm/lru: introduce relock_page_lruvec()
mm/lru: replace pgdat lru_lock with lruvec lock
mm/swap.c: serialize memcg changes in pagevec_lru_move_fn
mm/compaction: do page isolation first in compaction
mm/lru: introduce TestClearPageLRU()
mm/mlock: remove __munlock_isolate_lru_page()
mm/mlock: remove lru_lock on TestClearPageMlocked
mm/vmscan: remove lruvec reget in move_pages_to_lru
mm/lru: move lock into lru_note_cost
mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn
mm/memcg: add debug checking in lock_page_memcg
mm: page_idle_get_page() does not need lru_lock
mm/rmap: stop store reordering issue on page->mapping
mm/vmscan: remove unnecessary lruvec adding
mm/thp: narrow lru locking
mm/thp: simplify lru_add_page_tail()
mm/thp: use head for head page in lru_add_page_tail()
mm/thp: move lru_add_page_tail() to huge_memory.c

+536 -372
+3 -12
Documentation/admin-guide/cgroup-v1/memcg_test.rst
···
 
 8. LRU
 ======
-Each memcg has its own private LRU. Now, its handling is under global
-VM's control (means that it's handled under global pgdat->lru_lock).
-Almost all routines around memcg's LRU is called by global LRU's
-list management functions under pgdat->lru_lock.
-
-A special function is mem_cgroup_isolate_pages(). This scans
-memcg's private LRU and call __isolate_lru_page() to extract a page
-from LRU.
-
-(By __isolate_lru_page(), the page is removed from both of global and
-private LRU.)
-
+Each memcg has its own vector of LRUs (inactive anon, active anon,
+inactive file, active file, unevictable) of pages from each node,
+each LRU handled under a single lru_lock for that memcg and node.
 
 9. Typical Tests.
 =================
+9 -12
Documentation/admin-guide/cgroup-v1/memory.rst
···
 2.6 Locking
 -----------
 
-lock_page_cgroup()/unlock_page_cgroup() should not be called under
-the i_pages lock.
+Lock order is as follows:
 
-Other lock order is following:
+  Page lock (PG_locked bit of page->flags)
+    mm->page_table_lock or split pte_lock
+      lock_page_memcg (memcg->move_lock)
+        mapping->i_pages lock
+          lruvec->lru_lock.
 
-  PG_locked.
-    mm->page_table_lock
-      pgdat->lru_lock
-        lock_page_cgroup.
-
-In many cases, just lock_page_cgroup() is called.
-
-per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by
-pgdat->lru_lock, it has no lock of its own.
+Per-node-per-memcgroup LRU (cgroup's private LRU) is guarded by
+lruvec->lru_lock; PG_lru bit of page->flags is cleared before
+isolating a page from its LRU under lruvec->lru_lock.
 
 2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM)
 -----------------------------------------------
+1 -1
Documentation/trace/events-kmem.rst
···
 Broadly speaking, pages are taken off the LRU lock in bulk and
 freed in batch with a page list. Significant amounts of activity here could
 indicate that the system is under memory pressure and can also indicate
-contention on the zone->lru_lock.
+contention on the lruvec->lru_lock.
 
 4. Per-CPU Allocator Activity
 =============================
+8 -14
Documentation/vm/unevictable-lru.rst
···
 memory x86_64 systems.
 
 To illustrate this with an example, a non-NUMA x86_64 platform with 128GB of
-main memory will have over 32 million 4k pages in a single zone. When a large
+main memory will have over 32 million 4k pages in a single node. When a large
 fraction of these pages are not evictable for any reason [see below], vmscan
 will spend a lot of time scanning the LRU lists looking for the small fraction
 of pages that are evictable. This can result in a situation where all CPUs are
···
 The Unevictable Page List
 -------------------------
 
-The Unevictable LRU infrastructure consists of an additional, per-zone, LRU list
+The Unevictable LRU infrastructure consists of an additional, per-node, LRU list
 called the "unevictable" list and an associated page flag, PG_unevictable, to
 indicate that the page is being managed on the unevictable list.
 
···
 swap-backed pages. This differentiation is only important while the pages are,
 in fact, evictable.
 
-The unevictable list benefits from the "arrayification" of the per-zone LRU
+The unevictable list benefits from the "arrayification" of the per-node LRU
 lists and statistics originally proposed and posted by Christoph Lameter.
-
-The unevictable list does not use the LRU pagevec mechanism. Rather,
-unevictable pages are placed directly on the page's zone's unevictable list
-under the zone lru_lock. This allows us to prevent the stranding of pages on
-the unevictable list when one task has the page isolated from the LRU and other
-tasks are changing the "evictability" state of the page.
 
 
 Memory Control Group Interaction
···
 memory controller; see Documentation/admin-guide/cgroup-v1/memory.rst] by extending the
 lru_list enum.
 
-The memory controller data structure automatically gets a per-zone unevictable
-list as a result of the "arrayification" of the per-zone LRU lists (one per
+The memory controller data structure automatically gets a per-node unevictable
+list as a result of the "arrayification" of the per-node LRU lists (one per
 lru_list enum element). The memory controller tracks the movement of pages to
 and from the unevictable list.
 
···
 active/inactive LRU lists for vmscan to deal with. vmscan checks for such
 pages in all of the shrink_{active|inactive|page}_list() functions and will
 "cull" such pages that it encounters: that is, it diverts those pages to the
-unevictable list for the zone being scanned.
+unevictable list for the node being scanned.
 
 There may be situations where a page is mapped into a VM_LOCKED VMA, but the
 page is not marked as PG_mlocked. Such pages will make it all the way to
···
 page from the LRU, as it is likely on the appropriate active or inactive list
 at that time. If the isolate_lru_page() succeeds, mlock_vma_page() will put
 back the page - by calling putback_lru_page() - which will notice that the page
-is now mlocked and divert the page to the zone's unevictable list. If
+is now mlocked and divert the page to the node's unevictable list. If
 mlock_vma_page() is unable to isolate the page from the LRU, vmscan will handle
 it later if and when it attempts to reclaim the page.
 
···
 unevictable list in mlock_vma_page().
 
 shrink_inactive_list() also diverts any unevictable pages that it finds on the
-inactive lists to the appropriate zone's unevictable list.
+inactive lists to the appropriate node's unevictable list.
 
 shrink_inactive_list() should only see SHM_LOCK'd pages that became SHM_LOCK'd
 after shrink_active_list() had moved them to the inactive list, or pages mapped
+110
include/linux/memcontrol.h
···
 
 struct lruvec *mem_cgroup_page_lruvec(struct page *, struct pglist_data *);
 
+static inline bool lruvec_holds_page_lru_lock(struct page *page,
+					      struct lruvec *lruvec)
+{
+	pg_data_t *pgdat = page_pgdat(page);
+	const struct mem_cgroup *memcg;
+	struct mem_cgroup_per_node *mz;
+
+	if (mem_cgroup_disabled())
+		return lruvec == &pgdat->__lruvec;
+
+	mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
+	memcg = page_memcg(page) ? : root_mem_cgroup;
+
+	return lruvec->pgdat == pgdat && mz->memcg == memcg;
+}
+
 struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
 
 struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm);
 
 struct mem_cgroup *get_mem_cgroup_from_page(struct page *page);
+
+struct lruvec *lock_page_lruvec(struct page *page);
+struct lruvec *lock_page_lruvec_irq(struct page *page);
+struct lruvec *lock_page_lruvec_irqsave(struct page *page,
+						unsigned long *flags);
+
+#ifdef CONFIG_DEBUG_VM
+void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page);
+#else
+static inline void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page)
+{
+}
+#endif
 
 static inline
 struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){
···
 	return &pgdat->__lruvec;
 }
 
+static inline bool lruvec_holds_page_lru_lock(struct page *page,
+					      struct lruvec *lruvec)
+{
+	pg_data_t *pgdat = page_pgdat(page);
+
+	return lruvec == &pgdat->__lruvec;
+}
+
 static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
 {
 	return NULL;
···
 
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
+}
+
+static inline struct lruvec *lock_page_lruvec(struct page *page)
+{
+	struct pglist_data *pgdat = page_pgdat(page);
+
+	spin_lock(&pgdat->__lruvec.lru_lock);
+	return &pgdat->__lruvec;
+}
+
+static inline struct lruvec *lock_page_lruvec_irq(struct page *page)
+{
+	struct pglist_data *pgdat = page_pgdat(page);
+
+	spin_lock_irq(&pgdat->__lruvec.lru_lock);
+	return &pgdat->__lruvec;
+}
+
+static inline struct lruvec *lock_page_lruvec_irqsave(struct page *page,
+		unsigned long *flagsp)
+{
+	struct pglist_data *pgdat = page_pgdat(page);
+
+	spin_lock_irqsave(&pgdat->__lruvec.lru_lock, *flagsp);
+	return &pgdat->__lruvec;
 }
 
 static inline struct mem_cgroup *
···
 void count_memcg_event_mm(struct mm_struct *mm, enum vm_event_item idx)
 {
 }
+
+static inline void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page)
+{
+}
 #endif /* CONFIG_MEMCG */
 
 /* idx can be of type enum memcg_stat_item or node_stat_item */
···
 	if (!memcg)
 		return NULL;
 	return mem_cgroup_lruvec(memcg, lruvec_pgdat(lruvec));
+}
+
+static inline void unlock_page_lruvec(struct lruvec *lruvec)
+{
+	spin_unlock(&lruvec->lru_lock);
+}
+
+static inline void unlock_page_lruvec_irq(struct lruvec *lruvec)
+{
+	spin_unlock_irq(&lruvec->lru_lock);
+}
+
+static inline void unlock_page_lruvec_irqrestore(struct lruvec *lruvec,
+		unsigned long flags)
+{
+	spin_unlock_irqrestore(&lruvec->lru_lock, flags);
+}
+
+/* Don't lock again iff page's lruvec locked */
+static inline struct lruvec *relock_page_lruvec_irq(struct page *page,
+		struct lruvec *locked_lruvec)
+{
+	if (locked_lruvec) {
+		if (lruvec_holds_page_lru_lock(page, locked_lruvec))
+			return locked_lruvec;
+
+		unlock_page_lruvec_irq(locked_lruvec);
+	}
+
+	return lock_page_lruvec_irq(page);
+}
+
+/* Don't lock again iff page's lruvec locked */
+static inline struct lruvec *relock_page_lruvec_irqsave(struct page *page,
+		struct lruvec *locked_lruvec, unsigned long *flags)
+{
+	if (locked_lruvec) {
+		if (lruvec_holds_page_lru_lock(page, locked_lruvec))
+			return locked_lruvec;
+
+		unlock_page_lruvec_irqrestore(locked_lruvec, *flags);
+	}
+
+	return lock_page_lruvec_irqsave(page, flags);
 }
 
 #ifdef CONFIG_CGROUP_WRITEBACK
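The relock helpers added to include/linux/memcontrol.h keep the currently held lruvec lock when the next page belongs to the same lruvec, and switch locks only when the owner changes. The following is a minimal user-space sketch of that batching idea, not kernel code. The names `struct bucket`, `struct item`, `relock_bucket()` and `process_batch()` are hypothetical stand-ins for the lruvec, the page, and the relock loop, and a C11 `atomic_flag` spinlock replaces `spin_lock_irq()`, so interrupt handling is not modeled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lruvec: a list owner with its own lock. */
struct bucket {
	atomic_flag lock;	/* toy spinlock, no IRQ handling */
	int lock_count;		/* how many times the lock was taken */
	int nr_items;		/* items handled while holding the lock */
};

/* Hypothetical stand-in for struct page: each item knows its owner. */
struct item {
	struct bucket *owner;
};

static void bucket_init(struct bucket *b)
{
	atomic_flag_clear(&b->lock);
	b->lock_count = 0;
	b->nr_items = 0;
}

static void bucket_lock(struct bucket *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;	/* spin until the holder releases the lock */
	b->lock_count++;
}

static void bucket_unlock(struct bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/*
 * Keep the lock we already hold if it covers this item; otherwise drop
 * it and take the item's own lock. Same shape as relock_page_lruvec_irq().
 */
static struct bucket *relock_bucket(struct item *it, struct bucket *locked)
{
	if (locked) {
		if (locked == it->owner)
			return locked;
		bucket_unlock(locked);
	}
	bucket_lock(it->owner);
	return it->owner;
}

/* Walk a batch, switching locks only when the owner changes. */
static void process_batch(struct item *items, size_t n)
{
	struct bucket *locked = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		locked = relock_bucket(&items[i], locked);
		locked->nr_items++;
	}
	if (locked)
		bucket_unlock(locked);
}
```

For a batch whose owners are a, a, b, b, a, the sketch takes lock a twice and lock b once instead of locking five times, which is the saving the relock helpers aim for when consecutive pages share an lruvec.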
+1 -1
include/linux/mm_types.h
···
 		struct {	/* Page cache and anonymous pages */
 			/**
 			 * @lru: Pageout list, eg. active_list protected by
-			 * pgdat->lru_lock. Sometimes used as a generic list
+			 * lruvec->lru_lock. Sometimes used as a generic list
 			 * by the page owner.
 			 */
 			struct list_head lru;
+3 -3
include/linux/mmzone.h
···
 struct pglist_data;
 
 /*
- * zone->lock and the zone lru_lock are two of the hottest locks in the kernel.
- * So add a wild amount of padding here to ensure that they fall into separate
+ * Add a wild amount of padding here to ensure datas fall into separate
  * cachelines. There are very few zone structures in the machine, so space
  * consumption is not a concern here.
  */
···
 
 struct lruvec {
 	struct list_head		lists[NR_LRU_LISTS];
+	/* per lruvec lru_lock for memcg */
+	spinlock_t			lru_lock;
 	/*
 	 * These track the cost of reclaiming one LRU - file or anon -
 	 * over the other. As the observed cost of reclaiming one LRU
···
 
 	/* Write-intensive fields used by page reclaim */
 	ZONE_PADDING(_pad1_)
-	spinlock_t		lru_lock;
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 	/*
+1
include/linux/page-flags.h
···
 PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD)
 	__CLEARPAGEFLAG(Dirty, dirty, PF_HEAD)
 PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD)
+	TESTCLEARFLAG(LRU, lru, PF_HEAD)
 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
 	TESTCLEARFLAG(Active, active, PF_HEAD)
 PAGEFLAG(Workingset, workingset, PF_HEAD)
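The TESTCLEARFLAG(LRU, ...) line generates TestClearPageLRU(), an atomic test-and-clear of the LRU bit, so that exactly one of several racing callers sees the bit as previously set. Below is a small user-space sketch of that semantics using C11 atomics. `struct fake_page`, `FLAG_LRU`, `test_clear_lru()` and `set_lru()` are hypothetical names for illustration only; the bit value is not the kernel's PG_lru and the kernel's own bitops are not used here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define FLAG_LRU	(1UL << 0)	/* hypothetical bit, not the kernel's PG_lru */

/* Hypothetical stand-in for the flags word of struct page. */
struct fake_page {
	atomic_ulong flags;
};

/* Atomically clear FLAG_LRU and report whether it was set beforehand. */
static bool test_clear_lru(struct fake_page *p)
{
	unsigned long old = atomic_fetch_and(&p->flags, ~FLAG_LRU);

	return (old & FLAG_LRU) != 0;
}

/* Put the flag back, as the diff does with SetPageLRU() after a move. */
static void set_lru(struct fake_page *p)
{
	atomic_fetch_or(&p->flags, FLAG_LRU);
}
```

Because the read and the clear happen in one atomic operation, a second caller racing on the same page gets `false` and can back off without taking any lock.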
+1 -3
include/linux/swap.h
···
 		  unsigned int nr_pages);
 extern void lru_note_cost_page(struct page *);
 extern void lru_cache_add(struct page *);
-extern void lru_add_page_tail(struct page *page, struct page *page_tail,
-			 struct lruvec *lruvec, struct list_head *head);
 extern void mark_page_accessed(struct page *);
 extern void lru_add_drain(void);
 extern void lru_add_drain_cpu(int cpu);
···
 extern unsigned long zone_reclaimable_pages(struct zone *zone);
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 					gfp_t gfp_mask, nodemask_t *mask);
-extern int __isolate_lru_page(struct page *page, isolate_mode_t mode);
+extern int __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode);
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 						  unsigned long nr_pages,
 						  gfp_t gfp_mask,
+67 -27
mm/compaction.c
···
 	unsigned long nr_scanned = 0, nr_isolated = 0;
 	struct lruvec *lruvec;
 	unsigned long flags = 0;
-	bool locked = false;
+	struct lruvec *locked = NULL;
 	struct page *page = NULL, *valid_page = NULL;
 	unsigned long start_pfn = low_pfn;
 	bool skip_on_failure = false;
···
 		 * contention, to give chance to IRQs. Abort completely if
 		 * a fatal signal is pending.
 		 */
-		if (!(low_pfn % SWAP_CLUSTER_MAX)
-		    && compact_unlock_should_abort(&pgdat->lru_lock,
-					    flags, &locked, cc)) {
-			low_pfn = 0;
-			goto fatal_pending;
+		if (!(low_pfn % SWAP_CLUSTER_MAX)) {
+			if (locked) {
+				unlock_page_lruvec_irqrestore(locked, flags);
+				locked = NULL;
+			}
+
+			if (fatal_signal_pending(current)) {
+				cc->contended = true;
+
+				low_pfn = 0;
+				goto fatal_pending;
+			}
+
+			cond_resched();
 		}
 
 		if (!pfn_valid_within(low_pfn))
···
 		if (!valid_page && IS_ALIGNED(low_pfn, pageblock_nr_pages)) {
 			if (!cc->ignore_skip_hint && get_pageblock_skip(page)) {
 				low_pfn = end_pfn;
+				page = NULL;
 				goto isolate_abort;
 			}
 			valid_page = page;
···
 			if (unlikely(__PageMovable(page)) &&
 					!PageIsolated(page)) {
 				if (locked) {
-					spin_unlock_irqrestore(&pgdat->lru_lock,
-									flags);
-					locked = false;
+					unlock_page_lruvec_irqrestore(locked, flags);
+					locked = NULL;
 				}
 
 				if (!isolate_movable_page(page, isolate_mode))
···
 		if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page))
 			goto isolate_fail;
 
+		/*
+		 * Be careful not to clear PageLRU until after we're
+		 * sure the page is not being freed elsewhere -- the
+		 * page release code relies on it.
+		 */
+		if (unlikely(!get_page_unless_zero(page)))
+			goto isolate_fail;
+
+		if (__isolate_lru_page_prepare(page, isolate_mode) != 0)
+			goto isolate_fail_put;
+
+		/* Try isolate the page */
+		if (!TestClearPageLRU(page))
+			goto isolate_fail_put;
+
+		rcu_read_lock();
+		lruvec = mem_cgroup_page_lruvec(page, pgdat);
+
 		/* If we already hold the lock, we can skip some rechecking */
-		if (!locked) {
-			locked = compact_lock_irqsave(&pgdat->lru_lock,
-								&flags, cc);
+		if (lruvec != locked) {
+			if (locked)
+				unlock_page_lruvec_irqrestore(locked, flags);
+
+			compact_lock_irqsave(&lruvec->lru_lock, &flags, cc);
+			locked = lruvec;
+			rcu_read_unlock();
+
+			lruvec_memcg_debug(lruvec, page);
 
 			/* Try get exclusive access under lock */
 			if (!skip_updated) {
···
 					goto isolate_abort;
 			}
 
-			/* Recheck PageLRU and PageCompound under lock */
-			if (!PageLRU(page))
-				goto isolate_fail;
-
 			/*
 			 * Page become compound since the non-locked check,
 			 * and it's on LRU. It can only be a THP so the order
···
 			 */
 			if (unlikely(PageCompound(page) && !cc->alloc_contig)) {
 				low_pfn += compound_nr(page) - 1;
-				goto isolate_fail;
+				SetPageLRU(page);
+				goto isolate_fail_put;
 			}
-		}
-
-		lruvec = mem_cgroup_page_lruvec(page, pgdat);
-
-		/* Try isolate the page */
-		if (__isolate_lru_page(page, isolate_mode) != 0)
-			goto isolate_fail;
+		} else
+			rcu_read_unlock();
 
 		/* The whole page is taken off the LRU; skip the tail pages. */
 		if (PageCompound(page))
···
 		}
 
 		continue;
+
+isolate_fail_put:
+		/* Avoid potential deadlock in freeing page under lru_lock */
+		if (locked) {
+			unlock_page_lruvec_irqrestore(locked, flags);
+			locked = NULL;
+		}
+		put_page(page);
+
 isolate_fail:
 		if (!skip_on_failure)
 			continue;
···
 		 */
 		if (nr_isolated) {
 			if (locked) {
-				spin_unlock_irqrestore(&pgdat->lru_lock, flags);
-				locked = false;
+				unlock_page_lruvec_irqrestore(locked, flags);
+				locked = NULL;
 			}
 			putback_movable_pages(&cc->migratepages);
 			cc->nr_migratepages = 0;
···
 	if (unlikely(low_pfn > end_pfn))
 		low_pfn = end_pfn;
 
+	page = NULL;
+
 isolate_abort:
 	if (locked)
-		spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+		unlock_page_lruvec_irqrestore(locked, flags);
+	if (page) {
+		SetPageLRU(page);
+		put_page(page);
+	}
 
 	/*
 	 * Updated the cached scanner pfn once the pageblock has been scanned
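The mm/compaction.c change reorders isolation: the page is pinned with get_page_unless_zero() first, the LRU flag is then claimed with TestClearPageLRU(), and only after that is the lruvec lock taken. A minimal user-space sketch of the first two steps follows, assuming C11 atomics in place of the kernel primitives. `struct demo_page`, `DEMO_LRU`, `demo_get_unless_zero()`, `demo_put()` and `demo_try_isolate()` are hypothetical names, and the list lock itself is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_LRU	1u	/* hypothetical flag bit */

/* Hypothetical stand-in for the refcount and flags of struct page. */
struct demo_page {
	atomic_int refcount;
	atomic_uint flags;
};

/* Take a reference only if the count is not already zero. */
static bool demo_get_unless_zero(struct demo_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void demo_put(struct demo_page *p)
{
	atomic_fetch_sub(&p->refcount, 1);
}

/* Returns true when the caller has pinned the page and owns its LRU flag. */
static bool demo_try_isolate(struct demo_page *p)
{
	if (!demo_get_unless_zero(p))
		return false;		/* page is on its way to being freed */

	if (!(atomic_fetch_and(&p->flags, ~DEMO_LRU) & DEMO_LRU)) {
		demo_put(p);		/* someone else claimed the flag first */
		return false;
	}

	return true;			/* only now would the list lock be taken */
}
```

The order matters in the sketch for the same reason the diff's comment gives: the flag is never cleared on a page whose reference count has already dropped to zero, so the release path can still rely on it.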
+2 -2
mm/filemap.c
···
  *    ->swap_lock		(try_to_unmap_one)
  *    ->private_lock		(try_to_unmap_one)
  *    ->i_pages lock		(try_to_unmap_one)
- *    ->pgdat->lru_lock		(follow_page->mark_page_accessed)
- *    ->pgdat->lru_lock		(check_pte_range->isolate_lru_page)
+ *    ->lruvec->lru_lock	(follow_page->mark_page_accessed)
+ *    ->lruvec->lru_lock	(check_pte_range->isolate_lru_page)
  *    ->private_lock		(page_remove_rmap->set_page_dirty)
  *    ->i_pages lock		(page_remove_rmap->set_page_dirty)
  *    bdi.wb->list_lock		(page_remove_rmap->set_page_dirty)
+32 -13
mm/huge_memory.c
···
 	}
 }
 
+static void lru_add_page_tail(struct page *head, struct page *tail,
+		struct lruvec *lruvec, struct list_head *list)
+{
+	VM_BUG_ON_PAGE(!PageHead(head), head);
+	VM_BUG_ON_PAGE(PageCompound(tail), head);
+	VM_BUG_ON_PAGE(PageLRU(tail), head);
+	lockdep_assert_held(&lruvec->lru_lock);
+
+	if (list) {
+		/* page reclaim is reclaiming a huge page */
+		VM_WARN_ON(PageLRU(head));
+		get_page(tail);
+		list_add_tail(&tail->lru, list);
+	} else {
+		/* head is still on lru (and we have it frozen) */
+		VM_WARN_ON(!PageLRU(head));
+		SetPageLRU(tail);
+		list_add_tail(&tail->lru, &head->lru);
+	}
+}
+
 static void __split_huge_page_tail(struct page *head, int tail,
 		struct lruvec *lruvec, struct list_head *list)
 {
···
 }
 
 static void __split_huge_page(struct page *page, struct list_head *list,
-		pgoff_t end, unsigned long flags)
+		pgoff_t end)
 {
 	struct page *head = compound_head(page);
-	pg_data_t *pgdat = page_pgdat(head);
 	struct lruvec *lruvec;
 	struct address_space *swap_cache = NULL;
 	unsigned long offset = 0;
 	unsigned int nr = thp_nr_pages(head);
 	int i;
-
-	lruvec = mem_cgroup_page_lruvec(head, pgdat);
 
 	/* complete memcg works before add pages to LRU */
 	mem_cgroup_split_huge_fixup(head);
···
 		swap_cache = swap_address_space(entry);
 		xa_lock(&swap_cache->i_pages);
 	}
+
+	/* lock lru list/PageCompound, ref freezed by page_ref_freeze */
+	lruvec = lock_page_lruvec(head);
 
 	for (i = nr - 1; i >= 1; i--) {
 		__split_huge_page_tail(head, i, lruvec, list);
···
 	}
 
 	ClearPageCompound(head);
+	unlock_page_lruvec(lruvec);
+	/* Caller disabled irqs, so they are still disabled here */
 
 	split_page_owner(head, nr);
 
···
 		page_ref_add(head, 2);
 		xa_unlock(&head->mapping->i_pages);
 	}
-
-	spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+	local_irq_enable();
 
 	remap_page(head, nr);
 
···
 int split_huge_page_to_list(struct page *page, struct list_head *list)
 {
 	struct page *head = compound_head(page);
-	struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
 	struct deferred_split *ds_queue = get_deferred_split_queue(head);
 	struct anon_vma *anon_vma = NULL;
 	struct address_space *mapping = NULL;
 	int count, mapcount, extra_pins, ret;
-	unsigned long flags;
 	pgoff_t end;
 
 	VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
···
 	unmap_page(head);
 	VM_BUG_ON_PAGE(compound_mapcount(head), head);
 
-	/* prevent PageLRU to go away from under us, and freeze lru stats */
-	spin_lock_irqsave(&pgdata->lru_lock, flags);
-
+	/* block interrupt reentry in xa_lock and spinlock */
+	local_irq_disable();
 	if (mapping) {
 		XA_STATE(xas, &mapping->i_pages, page_index(head));
 
···
 				__dec_lruvec_page_state(head, NR_FILE_THPS);
 		}
 
-		__split_huge_page(page, list, end, flags);
+		__split_huge_page(page, list, end);
 		ret = 0;
 	} else {
 		if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
···
 		spin_unlock(&ds_queue->split_queue_lock);
 fail:		if (mapping)
 			xa_unlock(&mapping->i_pages);
-		spin_unlock_irqrestore(&pgdata->lru_lock, flags);
+		local_irq_enable();
 		remap_page(head, thp_nr_pages(head));
 		ret = -EBUSY;
 	}
+81 -3
mm/memcontrol.c
···
  * Lockless page tracking & accounting
  * Unified hierarchy configuration model
  * Copyright (C) 2015 Red Hat, Inc., Johannes Weiner
+ *
+ * Per memcg lru locking
+ * Copyright (C) 2020 Alibaba, Inc, Alex Shi
  */
 
 #include <linux/page_counter.h>
···
 	return ret;
 }
 
+#ifdef CONFIG_DEBUG_VM
+void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page)
+{
+	struct mem_cgroup *memcg;
+
+	if (mem_cgroup_disabled())
+		return;
+
+	memcg = page_memcg(page);
+
+	if (!memcg)
+		VM_BUG_ON_PAGE(lruvec_memcg(lruvec) != root_mem_cgroup, page);
+	else
+		VM_BUG_ON_PAGE(lruvec_memcg(lruvec) != memcg, page);
+}
+#endif
+
 /**
  * mem_cgroup_page_lruvec - return lruvec for isolating/putting an LRU page
  * @page: the page
···
 	 */
 	if (unlikely(lruvec->pgdat != pgdat))
 		lruvec->pgdat = pgdat;
+	return lruvec;
+}
+
+/**
+ * lock_page_lruvec - lock and return lruvec for a given page.
+ * @page: the page
+ *
+ * This series functions should be used in either conditions:
+ * PageLRU is cleared or unset
+ * or page->_refcount is zero
+ * or page is locked.
+ */
+struct lruvec *lock_page_lruvec(struct page *page)
+{
+	struct lruvec *lruvec;
+	struct pglist_data *pgdat = page_pgdat(page);
+
+	rcu_read_lock();
+	lruvec = mem_cgroup_page_lruvec(page, pgdat);
+	spin_lock(&lruvec->lru_lock);
+	rcu_read_unlock();
+
+	lruvec_memcg_debug(lruvec, page);
+
+	return lruvec;
+}
+
+struct lruvec *lock_page_lruvec_irq(struct page *page)
+{
+	struct lruvec *lruvec;
+	struct pglist_data *pgdat = page_pgdat(page);
+
+	rcu_read_lock();
+	lruvec = mem_cgroup_page_lruvec(page, pgdat);
+	spin_lock_irq(&lruvec->lru_lock);
+	rcu_read_unlock();
+
+	lruvec_memcg_debug(lruvec, page);
+
+	return lruvec;
+}
+
+struct lruvec *lock_page_lruvec_irqsave(struct page *page, unsigned long *flags)
+{
+	struct lruvec *lruvec;
+	struct pglist_data *pgdat = page_pgdat(page);
+
+	rcu_read_lock();
+	lruvec = mem_cgroup_page_lruvec(page, pgdat);
+	spin_lock_irqsave(&lruvec->lru_lock, *flags);
+	rcu_read_unlock();
+
+	lruvec_memcg_debug(lruvec, page);
+
 	return lruvec;
 }
 
···
 	memcg = page_memcg(head);
 	if (unlikely(!memcg))
 		return NULL;
+
+#ifdef CONFIG_PROVE_LOCKING
+	local_irq_save(flags);
+	might_lock(&memcg->move_lock);
+	local_irq_restore(flags);
+#endif
 
 	if (atomic_read(&memcg->moving_account) <= 0)
 		return memcg;
···
 #endif /* CONFIG_MEMCG_KMEM */
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-
 /*
- * Because tail pages are not marked as "used", set it. We're under
- * pgdat->lru_lock and migration entries setup in all page mappings.
+ * Because page_memcg(head) is not set on compound tails, set it now.
  */
 void mem_cgroup_split_huge_fixup(struct page *head)
 {
+17 -46
mm/mlock.c
···
 }
 
 /*
- * Isolate a page from LRU with optional get_page() pin.
- * Assumes lru_lock already held and page already pinned.
- */
-static bool __munlock_isolate_lru_page(struct page *page, bool getpage)
-{
-	if (PageLRU(page)) {
-		struct lruvec *lruvec;
-
-		lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
-		if (getpage)
-			get_page(page);
-		ClearPageLRU(page);
-		del_page_from_lru_list(page, lruvec, page_lru(page));
-		return true;
-	}
-
-	return false;
-}
-
-/*
  * Finish munlock after successful page isolation
  *
  * Page must be locked. This is a wrapper for try_to_munlock()
···
 unsigned int munlock_vma_page(struct page *page)
 {
 	int nr_pages;
-	pg_data_t *pgdat = page_pgdat(page);
 
 	/* For try_to_munlock() and to serialize with page migration */
 	BUG_ON(!PageLocked(page));
-
 	VM_BUG_ON_PAGE(PageTail(page), page);
-
-	/*
-	 * Serialize with any parallel __split_huge_page_refcount() which
-	 * might otherwise copy PageMlocked to part of the tail pages before
-	 * we clear it in the head page. It also stabilizes thp_nr_pages().
-	 */
-	spin_lock_irq(&pgdat->lru_lock);
 
 	if (!TestClearPageMlocked(page)) {
 		/* Potentially, PTE-mapped THP: do not skip the rest PTEs */
-		nr_pages = 1;
-		goto unlock_out;
+		return 0;
 	}
 
 	nr_pages = thp_nr_pages(page);
-	__mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
+	mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
 
-	if (__munlock_isolate_lru_page(page, true)) {
-		spin_unlock_irq(&pgdat->lru_lock);
+	if (!isolate_lru_page(page))
 		__munlock_isolated_page(page);
-		goto out;
-	}
-	__munlock_isolation_failed(page);
+	else
+		__munlock_isolation_failed(page);
 
-unlock_out:
-	spin_unlock_irq(&pgdat->lru_lock);
-
-out:
 	return nr_pages - 1;
 }
 
···
 	int nr = pagevec_count(pvec);
 	int delta_munlocked = -nr;
 	struct pagevec pvec_putback;
+	struct lruvec *lruvec = NULL;
 	int pgrescued = 0;
 
 	pagevec_init(&pvec_putback);
 
 	/* Phase 1: page isolation */
-	spin_lock_irq(&zone->zone_pgdat->lru_lock);
 	for (i = 0; i < nr; i++) {
 		struct page *page = pvec->pages[i];
 
···
 			 * We already have pin from follow_page_mask()
 			 * so we can spare the get_page() here.
 			 */
-			if (__munlock_isolate_lru_page(page, false))
+			if (TestClearPageLRU(page)) {
+				lruvec = relock_page_lruvec_irq(page, lruvec);
+				del_page_from_lru_list(page, lruvec,
+							page_lru(page));
 				continue;
-			else
+			} else
 				__munlock_isolation_failed(page);
 		} else {
 			delta_munlocked++;
···
 		pagevec_add(&pvec_putback, pvec->pages[i]);
 		pvec->pages[i] = NULL;
 	}
-	__mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
-	spin_unlock_irq(&zone->zone_pgdat->lru_lock);
+	if (lruvec) {
+		__mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
+		unlock_page_lruvec_irq(lruvec);
+	} else if (delta_munlocked) {
+		mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
+	}
 
 	/* Now we can release pins of pages that we are not munlocking */
 	pagevec_release(&pvec_putback);
+1
mm/mmzone.c
···
 	enum lru_list lru;
 
 	memset(lruvec, 0, sizeof(struct lruvec));
+	spin_lock_init(&lruvec->lru_lock);
 
 	for_each_lru(lru)
 		INIT_LIST_HEAD(&lruvec->lists[lru]);
-1
mm/page_alloc.c
···
 	init_waitqueue_head(&pgdat->pfmemalloc_wait);
 
 	pgdat_page_ext_init(pgdat);
-	spin_lock_init(&pgdat->lru_lock);
 	lruvec_init(&pgdat->__lruvec);
 }
 
-4
mm/page_idle.c
···
 static struct page *page_idle_get_page(unsigned long pfn)
 {
 	struct page *page = pfn_to_online_page(pfn);
-	pg_data_t *pgdat;
 
 	if (!page || !PageLRU(page) ||
 	    !get_page_unless_zero(page))
 		return NULL;
 
-	pgdat = page_pgdat(page);
-	spin_lock_irq(&pgdat->lru_lock);
 	if (unlikely(!PageLRU(page))) {
 		put_page(page);
 		page = NULL;
 	}
-	spin_unlock_irq(&pgdat->lru_lock);
 	return page;
 }
 
+9 -3
mm/rmap.c
@@ -28,12 +28,12 @@
  *           hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
  *           anon_vma->rwsem
  *             mm->page_table_lock or pte_lock
- *               pgdat->lru_lock (in mark_page_accessed, isolate_lru_page)
  *               swap_lock (in swap_duplicate, swap_info_get)
  *                 mmlist_lock (in mmput, drain_mmlist and others)
  *                 mapping->private_lock (in __set_page_dirty_buffers)
- *                   mem_cgroup_{begin,end}_page_stat (memcg->move_lock)
+ *                   lock_page_memcg move_lock (in __set_page_dirty_buffers)
  *                     i_pages lock (widely used)
+ *                       lruvec->lru_lock (in lock_page_lruvec_irq)
  *                 inode->i_lock (in set_page_dirty's __mark_inode_dirty)
  *                 bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty)
  *                   sb_lock (within inode_lock in fs/fs-writeback.c)
@@ -1054,8 +1054,14 @@
 	if (!exclusive)
 		anon_vma = anon_vma->root;
 
+	/*
+	 * page_idle does a lockless/optimistic rmap scan on page->mapping.
+	 * Make sure the compiler doesn't split the stores of anon_vma and
+	 * the PAGE_MAPPING_ANON type identifier, otherwise the rmap code
+	 * could mistake the mapping for a struct address_space and crash.
+	 */
 	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
-	page->mapping = (struct address_space *) anon_vma;
+	WRITE_ONCE(page->mapping, (struct address_space *) anon_vma);
 	page->index = linear_page_index(vma, address);
 }
 
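Reviewer note on the mm/rmap.c hunk: the new comment says the tagged pointer must reach page->mapping in a single store, because page_idle reads it without a lock. The following is a minimal userspace sketch of that idea, not the kernel implementation. The `SKETCH_*` macros and `sketch_*` names are hypothetical stand-ins; the volatile cast only approximates what the kernel's WRITE_ONCE()/READ_ONCE() provide, and `__typeof__` is a GCC/Clang extension. The sketch uses a bitwise OR where the kernel adds PAGE_MAPPING_ANON; the two agree when the low bit is clear.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's WRITE_ONCE()/READ_ONCE(). */
#define SKETCH_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define SKETCH_READ_ONCE(x)       (*(volatile __typeof__(x) *)&(x))

/* Low tag bit marking an anonymous mapping (mirrors PAGE_MAPPING_ANON). */
#define SKETCH_MAPPING_ANON ((uintptr_t)0x1)

struct sketch_page {
	void *mapping;
};

/* Build the tagged value first, then publish it with one store. */
static void sketch_set_anon_mapping(struct sketch_page *page, void *anon_vma)
{
	uintptr_t tagged;

	/* The pointer must be aligned so that the tag bit is free. */
	assert(((uintptr_t)anon_vma & SKETCH_MAPPING_ANON) == 0);

	tagged = (uintptr_t)anon_vma | SKETCH_MAPPING_ANON;
	SKETCH_WRITE_ONCE(page->mapping, (void *)tagged);
}

/* A lockless reader sees either the old value or the fully tagged one. */
static int sketch_mapping_is_anon(struct sketch_page *page)
{
	uintptr_t m = (uintptr_t)SKETCH_READ_ONCE(page->mapping);

	return (m & SKETCH_MAPPING_ANON) != 0;
}

/* Strip the tag to recover the original pointer. */
static void *sketch_anon_vma(struct sketch_page *page)
{
	uintptr_t m = (uintptr_t)SKETCH_READ_ONCE(page->mapping);

	return (void *)(m & ~SKETCH_MAPPING_ANON);
}
```

The point of the single store is that a reader can never observe the pointer without its type tag, which is the misinterpretation the comment in the hunk warns about.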
+83 -125
mm/swap.c
··· 79 79 static void __page_cache_release(struct page *page) 80 80 { 81 81 if (PageLRU(page)) { 82 - pg_data_t *pgdat = page_pgdat(page); 83 82 struct lruvec *lruvec; 84 83 unsigned long flags; 85 84 86 - spin_lock_irqsave(&pgdat->lru_lock, flags); 87 - lruvec = mem_cgroup_page_lruvec(page, pgdat); 85 + lruvec = lock_page_lruvec_irqsave(page, &flags); 88 86 VM_BUG_ON_PAGE(!PageLRU(page), page); 89 87 __ClearPageLRU(page); 90 88 del_page_from_lru_list(page, lruvec, page_off_lru(page)); 91 - spin_unlock_irqrestore(&pgdat->lru_lock, flags); 89 + unlock_page_lruvec_irqrestore(lruvec, flags); 92 90 } 93 91 __ClearPageWaiters(page); 94 92 } ··· 202 204 EXPORT_SYMBOL_GPL(get_kernel_page); 203 205 204 206 static void pagevec_lru_move_fn(struct pagevec *pvec, 205 - void (*move_fn)(struct page *page, struct lruvec *lruvec, void *arg), 206 - void *arg) 207 + void (*move_fn)(struct page *page, struct lruvec *lruvec)) 207 208 { 208 209 int i; 209 - struct pglist_data *pgdat = NULL; 210 - struct lruvec *lruvec; 210 + struct lruvec *lruvec = NULL; 211 211 unsigned long flags = 0; 212 212 213 213 for (i = 0; i < pagevec_count(pvec); i++) { 214 214 struct page *page = pvec->pages[i]; 215 - struct pglist_data *pagepgdat = page_pgdat(page); 216 215 217 - if (pagepgdat != pgdat) { 218 - if (pgdat) 219 - spin_unlock_irqrestore(&pgdat->lru_lock, flags); 220 - pgdat = pagepgdat; 221 - spin_lock_irqsave(&pgdat->lru_lock, flags); 222 - } 216 + /* block memcg migration during page moving between lru */ 217 + if (!TestClearPageLRU(page)) 218 + continue; 223 219 224 - lruvec = mem_cgroup_page_lruvec(page, pgdat); 225 - (*move_fn)(page, lruvec, arg); 220 + lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags); 221 + (*move_fn)(page, lruvec); 222 + 223 + SetPageLRU(page); 226 224 } 227 - if (pgdat) 228 - spin_unlock_irqrestore(&pgdat->lru_lock, flags); 225 + if (lruvec) 226 + unlock_page_lruvec_irqrestore(lruvec, flags); 229 227 release_pages(pvec->pages, pvec->nr); 230 228 
pagevec_reinit(pvec); 231 229 } 232 230 233 - static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec, 234 - void *arg) 231 + static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec) 235 232 { 236 - int *pgmoved = arg; 237 - 238 - if (PageLRU(page) && !PageUnevictable(page)) { 233 + if (!PageUnevictable(page)) { 239 234 del_page_from_lru_list(page, lruvec, page_lru(page)); 240 235 ClearPageActive(page); 241 236 add_page_to_lru_list_tail(page, lruvec, page_lru(page)); 242 - (*pgmoved) += thp_nr_pages(page); 237 + __count_vm_events(PGROTATED, thp_nr_pages(page)); 243 238 } 244 - } 245 - 246 - /* 247 - * pagevec_move_tail() must be called with IRQ disabled. 248 - * Otherwise this may cause nasty races. 249 - */ 250 - static void pagevec_move_tail(struct pagevec *pvec) 251 - { 252 - int pgmoved = 0; 253 - 254 - pagevec_lru_move_fn(pvec, pagevec_move_tail_fn, &pgmoved); 255 - __count_vm_events(PGROTATED, pgmoved); 256 239 } 257 240 258 241 /* 259 242 * Writeback is about to end against a page which has been marked for immediate 260 243 * reclaim. If it still appears to be reclaimable, move it to the tail of the 261 244 * inactive list. 245 + * 246 + * rotate_reclaimable_page() must disable IRQs, to prevent nasty races. 
262 247 */ 263 248 void rotate_reclaimable_page(struct page *page) 264 249 { ··· 254 273 local_lock_irqsave(&lru_rotate.lock, flags); 255 274 pvec = this_cpu_ptr(&lru_rotate.pvec); 256 275 if (!pagevec_add(pvec, page) || PageCompound(page)) 257 - pagevec_move_tail(pvec); 276 + pagevec_lru_move_fn(pvec, pagevec_move_tail_fn); 258 277 local_unlock_irqrestore(&lru_rotate.lock, flags); 259 278 } 260 279 } ··· 264 283 do { 265 284 unsigned long lrusize; 266 285 286 + /* 287 + * Hold lruvec->lru_lock is safe here, since 288 + * 1) The pinned lruvec in reclaim, or 289 + * 2) From a pre-LRU page during refault (which also holds the 290 + * rcu lock, so would be safe even if the page was on the LRU 291 + * and could move simultaneously to a new lruvec). 292 + */ 293 + spin_lock_irq(&lruvec->lru_lock); 267 294 /* Record cost event */ 268 295 if (file) 269 296 lruvec->file_cost += nr_pages; ··· 295 306 lruvec->file_cost /= 2; 296 307 lruvec->anon_cost /= 2; 297 308 } 309 + spin_unlock_irq(&lruvec->lru_lock); 298 310 } while ((lruvec = parent_lruvec(lruvec))); 299 311 } 300 312 ··· 305 315 page_is_file_lru(page), thp_nr_pages(page)); 306 316 } 307 317 308 - static void __activate_page(struct page *page, struct lruvec *lruvec, 309 - void *arg) 318 + static void __activate_page(struct page *page, struct lruvec *lruvec) 310 319 { 311 - if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) { 320 + if (!PageActive(page) && !PageUnevictable(page)) { 312 321 int lru = page_lru_base_type(page); 313 322 int nr_pages = thp_nr_pages(page); 314 323 ··· 329 340 struct pagevec *pvec = &per_cpu(lru_pvecs.activate_page, cpu); 330 341 331 342 if (pagevec_count(pvec)) 332 - pagevec_lru_move_fn(pvec, __activate_page, NULL); 343 + pagevec_lru_move_fn(pvec, __activate_page); 333 344 } 334 345 335 346 static bool need_activate_page_drain(int cpu) ··· 347 358 pvec = this_cpu_ptr(&lru_pvecs.activate_page); 348 359 get_page(page); 349 360 if (!pagevec_add(pvec, page) || PageCompound(page)) 
350 - pagevec_lru_move_fn(pvec, __activate_page, NULL); 361 + pagevec_lru_move_fn(pvec, __activate_page); 351 362 local_unlock(&lru_pvecs.lock); 352 363 } 353 364 } ··· 359 370 360 371 static void activate_page(struct page *page) 361 372 { 362 - pg_data_t *pgdat = page_pgdat(page); 373 + struct lruvec *lruvec; 363 374 364 375 page = compound_head(page); 365 - spin_lock_irq(&pgdat->lru_lock); 366 - __activate_page(page, mem_cgroup_page_lruvec(page, pgdat), NULL); 367 - spin_unlock_irq(&pgdat->lru_lock); 376 + if (TestClearPageLRU(page)) { 377 + lruvec = lock_page_lruvec_irq(page); 378 + __activate_page(page, lruvec); 379 + unlock_page_lruvec_irq(lruvec); 380 + SetPageLRU(page); 381 + } 368 382 } 369 383 #endif 370 384 ··· 517 525 * be write it out by flusher threads as this is much more effective 518 526 * than the single-page writeout from reclaim. 519 527 */ 520 - static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec, 521 - void *arg) 528 + static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec) 522 529 { 523 530 int lru; 524 531 bool active; 525 532 int nr_pages = thp_nr_pages(page); 526 - 527 - if (!PageLRU(page)) 528 - return; 529 533 530 534 if (PageUnevictable(page)) 531 535 return; ··· 561 573 } 562 574 } 563 575 564 - static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec, 565 - void *arg) 576 + static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec) 566 577 { 567 - if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) { 578 + if (PageActive(page) && !PageUnevictable(page)) { 568 579 int lru = page_lru_base_type(page); 569 580 int nr_pages = thp_nr_pages(page); 570 581 ··· 578 591 } 579 592 } 580 593 581 - static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec, 582 - void *arg) 594 + static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec) 583 595 { 584 - if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) && 596 + if (PageAnon(page) && 
PageSwapBacked(page) && 585 597 !PageSwapCache(page) && !PageUnevictable(page)) { 586 598 bool active = PageActive(page); 587 599 int nr_pages = thp_nr_pages(page); ··· 622 636 623 637 /* No harm done if a racing interrupt already did this */ 624 638 local_lock_irqsave(&lru_rotate.lock, flags); 625 - pagevec_move_tail(pvec); 639 + pagevec_lru_move_fn(pvec, pagevec_move_tail_fn); 626 640 local_unlock_irqrestore(&lru_rotate.lock, flags); 627 641 } 628 642 629 643 pvec = &per_cpu(lru_pvecs.lru_deactivate_file, cpu); 630 644 if (pagevec_count(pvec)) 631 - pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL); 645 + pagevec_lru_move_fn(pvec, lru_deactivate_file_fn); 632 646 633 647 pvec = &per_cpu(lru_pvecs.lru_deactivate, cpu); 634 648 if (pagevec_count(pvec)) 635 - pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL); 649 + pagevec_lru_move_fn(pvec, lru_deactivate_fn); 636 650 637 651 pvec = &per_cpu(lru_pvecs.lru_lazyfree, cpu); 638 652 if (pagevec_count(pvec)) 639 - pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL); 653 + pagevec_lru_move_fn(pvec, lru_lazyfree_fn); 640 654 641 655 activate_page_drain(cpu); 642 656 } ··· 665 679 pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate_file); 666 680 667 681 if (!pagevec_add(pvec, page) || PageCompound(page)) 668 - pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL); 682 + pagevec_lru_move_fn(pvec, lru_deactivate_file_fn); 669 683 local_unlock(&lru_pvecs.lock); 670 684 } 671 685 } ··· 687 701 pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate); 688 702 get_page(page); 689 703 if (!pagevec_add(pvec, page) || PageCompound(page)) 690 - pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL); 704 + pagevec_lru_move_fn(pvec, lru_deactivate_fn); 691 705 local_unlock(&lru_pvecs.lock); 692 706 } 693 707 } ··· 709 723 pvec = this_cpu_ptr(&lru_pvecs.lru_lazyfree); 710 724 get_page(page); 711 725 if (!pagevec_add(pvec, page) || PageCompound(page)) 712 - pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL); 726 + pagevec_lru_move_fn(pvec, 
lru_lazyfree_fn); 713 727 local_unlock(&lru_pvecs.lock); 714 728 } 715 729 } ··· 857 871 { 858 872 int i; 859 873 LIST_HEAD(pages_to_free); 860 - struct pglist_data *locked_pgdat = NULL; 861 - struct lruvec *lruvec; 874 + struct lruvec *lruvec = NULL; 862 875 unsigned long flags; 863 876 unsigned int lock_batch; 864 877 ··· 867 882 /* 868 883 * Make sure the IRQ-safe lock-holding time does not get 869 884 * excessive with a continuous string of pages from the 870 - * same pgdat. The lock is held only if pgdat != NULL. 885 + * same lruvec. The lock is held only if lruvec != NULL. 871 886 */ 872 - if (locked_pgdat && ++lock_batch == SWAP_CLUSTER_MAX) { 873 - spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags); 874 - locked_pgdat = NULL; 887 + if (lruvec && ++lock_batch == SWAP_CLUSTER_MAX) { 888 + unlock_page_lruvec_irqrestore(lruvec, flags); 889 + lruvec = NULL; 875 890 } 876 891 877 892 page = compound_head(page); ··· 879 894 continue; 880 895 881 896 if (is_zone_device_page(page)) { 882 - if (locked_pgdat) { 883 - spin_unlock_irqrestore(&locked_pgdat->lru_lock, 884 - flags); 885 - locked_pgdat = NULL; 897 + if (lruvec) { 898 + unlock_page_lruvec_irqrestore(lruvec, flags); 899 + lruvec = NULL; 886 900 } 887 901 /* 888 902 * ZONE_DEVICE pages that return 'false' from ··· 902 918 continue; 903 919 904 920 if (PageCompound(page)) { 905 - if (locked_pgdat) { 906 - spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags); 907 - locked_pgdat = NULL; 921 + if (lruvec) { 922 + unlock_page_lruvec_irqrestore(lruvec, flags); 923 + lruvec = NULL; 908 924 } 909 925 __put_compound_page(page); 910 926 continue; 911 927 } 912 928 913 929 if (PageLRU(page)) { 914 - struct pglist_data *pgdat = page_pgdat(page); 930 + struct lruvec *prev_lruvec = lruvec; 915 931 916 - if (pgdat != locked_pgdat) { 917 - if (locked_pgdat) 918 - spin_unlock_irqrestore(&locked_pgdat->lru_lock, 919 - flags); 932 + lruvec = relock_page_lruvec_irqsave(page, lruvec, 933 + &flags); 934 + if (prev_lruvec != 
lruvec) 920 935 lock_batch = 0; 921 - locked_pgdat = pgdat; 922 - spin_lock_irqsave(&locked_pgdat->lru_lock, flags); 923 - } 924 936 925 - lruvec = mem_cgroup_page_lruvec(page, locked_pgdat); 926 937 VM_BUG_ON_PAGE(!PageLRU(page), page); 927 938 __ClearPageLRU(page); 928 939 del_page_from_lru_list(page, lruvec, page_off_lru(page)); ··· 927 948 928 949 list_add(&page->lru, &pages_to_free); 929 950 } 930 - if (locked_pgdat) 931 - spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags); 951 + if (lruvec) 952 + unlock_page_lruvec_irqrestore(lruvec, flags); 932 953 933 954 mem_cgroup_uncharge_list(&pages_to_free); 934 955 free_unref_page_list(&pages_to_free); ··· 956 977 } 957 978 EXPORT_SYMBOL(__pagevec_release); 958 979 959 - #ifdef CONFIG_TRANSPARENT_HUGEPAGE 960 - /* used by __split_huge_page_refcount() */ 961 - void lru_add_page_tail(struct page *page, struct page *page_tail, 962 - struct lruvec *lruvec, struct list_head *list) 963 - { 964 - VM_BUG_ON_PAGE(!PageHead(page), page); 965 - VM_BUG_ON_PAGE(PageCompound(page_tail), page); 966 - VM_BUG_ON_PAGE(PageLRU(page_tail), page); 967 - lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock); 968 - 969 - if (!list) 970 - SetPageLRU(page_tail); 971 - 972 - if (likely(PageLRU(page))) 973 - list_add_tail(&page_tail->lru, &page->lru); 974 - else if (list) { 975 - /* page reclaim is reclaiming a huge page */ 976 - get_page(page_tail); 977 - list_add_tail(&page_tail->lru, list); 978 - } else { 979 - /* 980 - * Head page has not yet been counted, as an hpage, 981 - * so we must account for each subpage individually. 982 - * 983 - * Put page_tail on the list at the correct position 984 - * so they all end up in order. 
985 - */ 986 - add_page_to_lru_list_tail(page_tail, lruvec, 987 - page_lru(page_tail)); 988 - } 989 - } 990 - #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ 991 - 992 - static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec, 993 - void *arg) 980 + static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec) 994 981 { 995 982 enum lru_list lru; 996 983 int was_unevictable = TestClearPageUnevictable(page); ··· 1015 1070 */ 1016 1071 void __pagevec_lru_add(struct pagevec *pvec) 1017 1072 { 1018 - pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn, NULL); 1073 + int i; 1074 + struct lruvec *lruvec = NULL; 1075 + unsigned long flags = 0; 1076 + 1077 + for (i = 0; i < pagevec_count(pvec); i++) { 1078 + struct page *page = pvec->pages[i]; 1079 + 1080 + lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags); 1081 + __pagevec_lru_add_fn(page, lruvec); 1082 + } 1083 + if (lruvec) 1084 + unlock_page_lruvec_irqrestore(lruvec, flags); 1085 + release_pages(pvec->pages, pvec->nr); 1086 + pagevec_reinit(pvec); 1019 1087 } 1020 1088 1021 1089 /**
+107 -100
mm/vmscan.c
··· 1539 1539 * 1540 1540 * returns 0 on success, -ve errno on failure. 1541 1541 */ 1542 - int __isolate_lru_page(struct page *page, isolate_mode_t mode) 1542 + int __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode) 1543 1543 { 1544 - int ret = -EINVAL; 1544 + int ret = -EBUSY; 1545 1545 1546 1546 /* Only take pages on the LRU. */ 1547 1547 if (!PageLRU(page)) ··· 1550 1550 /* Compaction should not handle unevictable pages but CMA can do so */ 1551 1551 if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE)) 1552 1552 return ret; 1553 - 1554 - ret = -EBUSY; 1555 1553 1556 1554 /* 1557 1555 * To minimise LRU disruption, the caller can indicate that it only ··· 1591 1593 if ((mode & ISOLATE_UNMAPPED) && page_mapped(page)) 1592 1594 return ret; 1593 1595 1594 - if (likely(get_page_unless_zero(page))) { 1595 - /* 1596 - * Be careful not to clear PageLRU until after we're 1597 - * sure the page is not being freed elsewhere -- the 1598 - * page release code relies on it. 1599 - */ 1600 - ClearPageLRU(page); 1601 - ret = 0; 1602 - } 1603 - 1604 - return ret; 1596 + return 0; 1605 1597 } 1606 - 1607 1598 1608 1599 /* 1609 1600 * Update LRU sizes after isolating pages. The LRU size updates must ··· 1613 1626 } 1614 1627 1615 1628 /** 1616 - * pgdat->lru_lock is heavily contended. Some of the functions that 1629 + * Isolating page from the lruvec to fill in @dst list by nr_to_scan times. 1630 + * 1631 + * lruvec->lru_lock is heavily contended. Some of the functions that 1617 1632 * shrink the lists perform better by taking out a batch of pages 1618 1633 * and working on them outside the LRU lock. 1619 1634 * 1620 1635 * For pagecache intensive workloads, this function is the hottest 1621 1636 * spot in the kernel (apart from copy_*_user functions). 1622 1637 * 1623 - * Appropriate locks must be held before calling this function. 1638 + * Lru_lock must be held before calling this function. 
1624 1639 * 1625 1640 * @nr_to_scan: The number of eligible pages to look through on the list. 1626 1641 * @lruvec: The LRU vector to pull pages from. ··· 1655 1666 page = lru_to_page(src); 1656 1667 prefetchw_prev_lru_page(page, src, flags); 1657 1668 1658 - VM_BUG_ON_PAGE(!PageLRU(page), page); 1659 - 1660 1669 nr_pages = compound_nr(page); 1661 1670 total_scan += nr_pages; 1662 1671 ··· 1675 1688 * only when the page is being freed somewhere else. 1676 1689 */ 1677 1690 scan += nr_pages; 1678 - switch (__isolate_lru_page(page, mode)) { 1691 + switch (__isolate_lru_page_prepare(page, mode)) { 1679 1692 case 0: 1693 + /* 1694 + * Be careful not to clear PageLRU until after we're 1695 + * sure the page is not being freed elsewhere -- the 1696 + * page release code relies on it. 1697 + */ 1698 + if (unlikely(!get_page_unless_zero(page))) 1699 + goto busy; 1700 + 1701 + if (!TestClearPageLRU(page)) { 1702 + /* 1703 + * This page may in other isolation path, 1704 + * but we still hold lru_lock. 
1705 + */ 1706 + put_page(page); 1707 + goto busy; 1708 + } 1709 + 1680 1710 nr_taken += nr_pages; 1681 1711 nr_zone_taken[page_zonenum(page)] += nr_pages; 1682 1712 list_move(&page->lru, dst); 1683 1713 break; 1684 1714 1685 - case -EBUSY: 1715 + default: 1716 + busy: 1686 1717 /* else it is being freed elsewhere */ 1687 1718 list_move(&page->lru, src); 1688 - continue; 1689 - 1690 - default: 1691 - BUG(); 1692 1719 } 1693 1720 } 1694 1721 ··· 1765 1764 VM_BUG_ON_PAGE(!page_count(page), page); 1766 1765 WARN_RATELIMIT(PageTail(page), "trying to isolate tail page"); 1767 1766 1768 - if (PageLRU(page)) { 1769 - pg_data_t *pgdat = page_pgdat(page); 1767 + if (TestClearPageLRU(page)) { 1770 1768 struct lruvec *lruvec; 1771 1769 1772 - spin_lock_irq(&pgdat->lru_lock); 1773 - lruvec = mem_cgroup_page_lruvec(page, pgdat); 1774 - if (PageLRU(page)) { 1775 - int lru = page_lru(page); 1776 - get_page(page); 1777 - ClearPageLRU(page); 1778 - del_page_from_lru_list(page, lruvec, lru); 1779 - ret = 0; 1780 - } 1781 - spin_unlock_irq(&pgdat->lru_lock); 1770 + get_page(page); 1771 + lruvec = lock_page_lruvec_irq(page); 1772 + del_page_from_lru_list(page, lruvec, page_lru(page)); 1773 + unlock_page_lruvec_irq(lruvec); 1774 + ret = 0; 1782 1775 } 1776 + 1783 1777 return ret; 1784 1778 } 1785 1779 ··· 1816 1820 } 1817 1821 1818 1822 /* 1819 - * This moves pages from @list to corresponding LRU list. 1820 - * 1821 - * We move them the other way if the page is referenced by one or more 1822 - * processes, from rmap. 1823 - * 1824 - * If the pages are mostly unmapped, the processing is fast and it is 1825 - * appropriate to hold zone_lru_lock across the whole operation. But if 1826 - * the pages are mapped, the processing is slow (page_referenced()) so we 1827 - * should drop zone_lru_lock around each page. It's impossible to balance 1828 - * this, so instead we remove the pages from the LRU while processing them. 
1829 - * It is safe to rely on PG_active against the non-LRU pages in here because 1830 - * nobody will play with that bit on a non-LRU page. 1831 - * 1832 - * The downside is that we have to touch page->_refcount against each page. 1833 - * But we had to alter page->flags anyway. 1823 + * move_pages_to_lru() moves pages from private @list to appropriate LRU list. 1824 + * On return, @list is reused as a list of pages to be freed by the caller. 1834 1825 * 1835 1826 * Returns the number of pages moved to the given lruvec. 1836 1827 */ 1837 - 1838 1828 static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, 1839 1829 struct list_head *list) 1840 1830 { 1841 - struct pglist_data *pgdat = lruvec_pgdat(lruvec); 1842 1831 int nr_pages, nr_moved = 0; 1843 1832 LIST_HEAD(pages_to_free); 1844 1833 struct page *page; ··· 1832 1851 while (!list_empty(list)) { 1833 1852 page = lru_to_page(list); 1834 1853 VM_BUG_ON_PAGE(PageLRU(page), page); 1854 + list_del(&page->lru); 1835 1855 if (unlikely(!page_evictable(page))) { 1836 - list_del(&page->lru); 1837 - spin_unlock_irq(&pgdat->lru_lock); 1856 + spin_unlock_irq(&lruvec->lru_lock); 1838 1857 putback_lru_page(page); 1839 - spin_lock_irq(&pgdat->lru_lock); 1858 + spin_lock_irq(&lruvec->lru_lock); 1840 1859 continue; 1841 1860 } 1842 - lruvec = mem_cgroup_page_lruvec(page, pgdat); 1843 1861 1862 + /* 1863 + * The SetPageLRU needs to be kept here for list integrity. 
1864 + * Otherwise: 1865 + * #0 move_pages_to_lru #1 release_pages 1866 + * if !put_page_testzero 1867 + * if (put_page_testzero()) 1868 + * !PageLRU //skip lru_lock 1869 + * SetPageLRU() 1870 + * list_add(&page->lru,) 1871 + * list_add(&page->lru,) 1872 + */ 1844 1873 SetPageLRU(page); 1845 - lru = page_lru(page); 1846 1874 1847 - nr_pages = thp_nr_pages(page); 1848 - update_lru_size(lruvec, lru, page_zonenum(page), nr_pages); 1849 - list_move(&page->lru, &lruvec->lists[lru]); 1850 - 1851 - if (put_page_testzero(page)) { 1875 + if (unlikely(put_page_testzero(page))) { 1852 1876 __ClearPageLRU(page); 1853 1877 __ClearPageActive(page); 1854 - del_page_from_lru_list(page, lruvec, lru); 1855 1878 1856 1879 if (unlikely(PageCompound(page))) { 1857 - spin_unlock_irq(&pgdat->lru_lock); 1880 + spin_unlock_irq(&lruvec->lru_lock); 1858 1881 destroy_compound_page(page); 1859 - spin_lock_irq(&pgdat->lru_lock); 1882 + spin_lock_irq(&lruvec->lru_lock); 1860 1883 } else 1861 1884 list_add(&page->lru, &pages_to_free); 1862 - } else { 1863 - nr_moved += nr_pages; 1864 - if (PageActive(page)) 1865 - workingset_age_nonresident(lruvec, nr_pages); 1885 + 1886 + continue; 1866 1887 } 1888 + 1889 + /* 1890 + * All pages were isolated from the same lruvec (and isolation 1891 + * inhibits memcg migration). 
1892 + */ 1893 + VM_BUG_ON_PAGE(!lruvec_holds_page_lru_lock(page, lruvec), page); 1894 + lru = page_lru(page); 1895 + nr_pages = thp_nr_pages(page); 1896 + 1897 + update_lru_size(lruvec, lru, page_zonenum(page), nr_pages); 1898 + list_add(&page->lru, &lruvec->lists[lru]); 1899 + nr_moved += nr_pages; 1900 + if (PageActive(page)) 1901 + workingset_age_nonresident(lruvec, nr_pages); 1867 1902 } 1868 1903 1869 1904 /* ··· 1936 1939 1937 1940 lru_add_drain(); 1938 1941 1939 - spin_lock_irq(&pgdat->lru_lock); 1942 + spin_lock_irq(&lruvec->lru_lock); 1940 1943 1941 1944 nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list, 1942 1945 &nr_scanned, sc, lru); ··· 1948 1951 __count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned); 1949 1952 __count_vm_events(PGSCAN_ANON + file, nr_scanned); 1950 1953 1951 - spin_unlock_irq(&pgdat->lru_lock); 1954 + spin_unlock_irq(&lruvec->lru_lock); 1952 1955 1953 1956 if (nr_taken == 0) 1954 1957 return 0; 1955 1958 1956 1959 nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, &stat, false); 1957 1960 1958 - spin_lock_irq(&pgdat->lru_lock); 1959 - 1961 + spin_lock_irq(&lruvec->lru_lock); 1960 1962 move_pages_to_lru(lruvec, &page_list); 1961 1963 1962 1964 __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); 1963 - lru_note_cost(lruvec, file, stat.nr_pageout); 1964 1965 item = current_is_kswapd() ? 
PGSTEAL_KSWAPD : PGSTEAL_DIRECT; 1965 1966 if (!cgroup_reclaim(sc)) 1966 1967 __count_vm_events(item, nr_reclaimed); 1967 1968 __count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed); 1968 1969 __count_vm_events(PGSTEAL_ANON + file, nr_reclaimed); 1970 + spin_unlock_irq(&lruvec->lru_lock); 1969 1971 1970 - spin_unlock_irq(&pgdat->lru_lock); 1971 - 1972 + lru_note_cost(lruvec, file, stat.nr_pageout); 1972 1973 mem_cgroup_uncharge_list(&page_list); 1973 1974 free_unref_page_list(&page_list); 1974 1975 ··· 1998 2003 return nr_reclaimed; 1999 2004 } 2000 2005 2006 + /* 2007 + * shrink_active_list() moves pages from the active LRU to the inactive LRU. 2008 + * 2009 + * We move them the other way if the page is referenced by one or more 2010 + * processes. 2011 + * 2012 + * If the pages are mostly unmapped, the processing is fast and it is 2013 + * appropriate to hold lru_lock across the whole operation. But if 2014 + * the pages are mapped, the processing is slow (page_referenced()), so 2015 + * we should drop lru_lock around each page. It's impossible to balance 2016 + * this, so instead we remove the pages from the LRU while processing them. 2017 + * It is safe to rely on PG_active against the non-LRU pages in here because 2018 + * nobody will play with that bit on a non-LRU page. 2019 + * 2020 + * The downside is that we have to touch page->_refcount against each page. 2021 + * But we had to alter page->flags anyway. 
2022 + */ 2001 2023 static void shrink_active_list(unsigned long nr_to_scan, 2002 2024 struct lruvec *lruvec, 2003 2025 struct scan_control *sc, ··· 2034 2022 2035 2023 lru_add_drain(); 2036 2024 2037 - spin_lock_irq(&pgdat->lru_lock); 2025 + spin_lock_irq(&lruvec->lru_lock); 2038 2026 2039 2027 nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold, 2040 2028 &nr_scanned, sc, lru); ··· 2045 2033 __count_vm_events(PGREFILL, nr_scanned); 2046 2034 __count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned); 2047 2035 2048 - spin_unlock_irq(&pgdat->lru_lock); 2036 + spin_unlock_irq(&lruvec->lru_lock); 2049 2037 2050 2038 while (!list_empty(&l_hold)) { 2051 2039 cond_resched(); ··· 2091 2079 /* 2092 2080 * Move pages back to the lru list. 2093 2081 */ 2094 - spin_lock_irq(&pgdat->lru_lock); 2082 + spin_lock_irq(&lruvec->lru_lock); 2095 2083 2096 2084 nr_activate = move_pages_to_lru(lruvec, &l_active); 2097 2085 nr_deactivate = move_pages_to_lru(lruvec, &l_inactive); ··· 2102 2090 __count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate); 2103 2091 2104 2092 __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); 2105 - spin_unlock_irq(&pgdat->lru_lock); 2093 + spin_unlock_irq(&lruvec->lru_lock); 2106 2094 2107 2095 mem_cgroup_uncharge_list(&l_active); 2108 2096 free_unref_page_list(&l_active); ··· 2690 2678 /* 2691 2679 * Determine the scan balance between anon and file LRUs. 
2692 2680 */ 2693 - spin_lock_irq(&pgdat->lru_lock); 2681 + spin_lock_irq(&target_lruvec->lru_lock); 2694 2682 sc->anon_cost = target_lruvec->anon_cost; 2695 2683 sc->file_cost = target_lruvec->file_cost; 2696 - spin_unlock_irq(&pgdat->lru_lock); 2684 + spin_unlock_irq(&target_lruvec->lru_lock); 2697 2685 2698 2686 /* 2699 2687 * Target desirable inactive:active list ratios for the anon ··· 4269 4257 */ 4270 4258 void check_move_unevictable_pages(struct pagevec *pvec) 4271 4259 { 4272 - struct lruvec *lruvec; 4273 - struct pglist_data *pgdat = NULL; 4260 + struct lruvec *lruvec = NULL; 4274 4261 int pgscanned = 0; 4275 4262 int pgrescued = 0; 4276 4263 int i; 4277 4264 4278 4265 for (i = 0; i < pvec->nr; i++) { 4279 4266 struct page *page = pvec->pages[i]; 4280 - struct pglist_data *pagepgdat = page_pgdat(page); 4281 4267 int nr_pages; 4282 4268 4283 4269 if (PageTransTail(page)) ··· 4284 4274 nr_pages = thp_nr_pages(page); 4285 4275 pgscanned += nr_pages; 4286 4276 4287 - if (pagepgdat != pgdat) { 4288 - if (pgdat) 4289 - spin_unlock_irq(&pgdat->lru_lock); 4290 - pgdat = pagepgdat; 4291 - spin_lock_irq(&pgdat->lru_lock); 4292 - } 4293 - lruvec = mem_cgroup_page_lruvec(page, pgdat); 4294 - 4295 - if (!PageLRU(page) || !PageUnevictable(page)) 4277 + /* block memcg migration during page moving between lru */ 4278 + if (!TestClearPageLRU(page)) 4296 4279 continue; 4297 4280 4298 - if (page_evictable(page)) { 4281 + lruvec = relock_page_lruvec_irq(page, lruvec); 4282 + if (page_evictable(page) && PageUnevictable(page)) { 4299 4283 enum lru_list lru = page_lru_base_type(page); 4300 4284 4301 4285 VM_BUG_ON_PAGE(PageActive(page), page); ··· 4298 4294 add_page_to_lru_list(page, lruvec, lru); 4299 4295 pgrescued += nr_pages; 4300 4296 } 4297 + SetPageLRU(page); 4301 4298 } 4302 4299 4303 - if (pgdat) { 4300 + if (lruvec) { 4304 4301 __count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued); 4305 4302 __count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned); 4306 - 
spin_unlock_irq(&pgdat->lru_lock); 4303 + unlock_page_lruvec_irq(lruvec); 4304 + } else if (pgscanned) { 4305 + count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned); 4307 4306 } 4308 4307 } 4309 4308 EXPORT_SYMBOL_GPL(check_move_unevictable_pages);
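Reviewer note on the mm/vmscan.c hunks: isolate_lru_page() and check_move_unevictable_pages() now call TestClearPageLRU() before taking lru_lock, so the flag itself decides which path owns the page. Below is a minimal userspace sketch of that ownership handoff using C11 atomics. It is an illustration only: `sketch_*` names and `SKETCH_PG_LRU` are hypothetical, and the kernel's page flag operations are not implemented this way.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_PG_LRU (1UL << 0)	/* hypothetical "on LRU" flag bit */

struct sketch_page {
	atomic_ulong flags;
};

/*
 * Atomically clear the LRU bit and report whether it was set.
 * Only one of several racing callers can see "true".
 */
static bool sketch_test_clear_lru(struct sketch_page *page)
{
	unsigned long old = atomic_fetch_and(&page->flags, ~SKETCH_PG_LRU);

	return (old & SKETCH_PG_LRU) != 0;
}

/* Give the page back: only valid while the caller owns it (bit clear). */
static void sketch_putback(struct sketch_page *page)
{
	assert(!(atomic_load(&page->flags) & SKETCH_PG_LRU));
	atomic_fetch_or(&page->flags, SKETCH_PG_LRU);
}
```

A caller that gets `true` from `sketch_test_clear_lru()` is the only one allowed to touch the list linkage until it calls `sketch_putback()`; a caller that gets `false` skips the page, matching the `continue` branches in the hunks above.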
-2
mm/workingset.c
@@ -381,9 +381,7 @@
 	if (workingset) {
 		SetPageWorkingset(page);
 		/* XXX: Move to lru_cache_add() when it supports new vs putback */
-		spin_lock_irq(&page_pgdat(page)->lru_lock);
 		lru_note_cost_page(page);
-		spin_unlock_irq(&page_pgdat(page)->lru_lock);
 		inc_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + file);
 	}
 out: