Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

memcontrol: add helpers for hugetlb memcg accounting

Patch series "hugetlb memcg accounting", v4.

Currently, hugetlb memory usage is not accounted for in the memory
controller, which could lead to memory overprotection for cgroups with
hugetlb-backed memory. This has been observed in our production system.

For instance, here is one of our use cases: suppose there are two 32G
containers. The machine is booted with hugetlb_cma=6G, and each container
may or may not use up to 3 gigantic pages, depending on the workload within
it. The rest is anon, cache, slab, etc. We can set the hugetlb cgroup
limit of each cgroup to 3G to enforce hugetlb fairness. But it is very
difficult to configure memory.max to keep overall consumption, including
anon, cache, slab, etc., fair.

What we have had to resort to is to constantly poll hugetlb usage and
readjust memory.max. A similar procedure is applied to other memory limits
(memory.low, for example). However, this is rather cumbersome and buggy.
Furthermore, when there is a delay in correcting the memory limits (e.g.
when hugetlb usage changes between consecutive runs of the userspace
agent), the system could be in an over- or underprotected state.
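
The readjustment described above comes down to simple arithmetic. The
following is a minimal userspace sketch, not the agent used in production:
the helper name adjusted_memory_max() is hypothetical, and it assumes the
agent subtracts the current hugetlb usage from a fixed per-container
budget, with 1G gigantic pages.

```c
#include <assert.h>
#include <stdint.h>

#define GIB (1024ULL * 1024ULL * 1024ULL)

/*
 * Hypothetical helper (not part of the patch): given a container's total
 * budget and its current hugetlb usage, both in bytes, return the value an
 * agent might write to memory.max. Without hugetlb accounting in memcg,
 * memory.max only covers the non-hugetlb part, so the hugetlb usage has to
 * be subtracted by hand.
 */
static uint64_t adjusted_memory_max(uint64_t budget_bytes, uint64_t hugetlb_bytes)
{
	assert(budget_bytes > 0);

	if (hugetlb_bytes >= budget_bytes)
		return 0;	/* the budget is already consumed by hugetlb */
	return budget_bytes - hugetlb_bytes;
}
```

With a 32G budget and two 1G gigantic pages in use, the result is 30G. If
the workload takes a third page between two polls, memory.max stays 1G too
high until the next run, which is the overprotection window described above.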

This patch series rectifies this issue by charging the memcg when the
hugetlb folio is allocated, and uncharging when the folio is freed. In
addition, a new selftest is added to demonstrate and verify this new
behavior.


This patch (of 4):

This patch exposes charge committing and cancelling as parts of the memory
controller interface. These functionalities are useful when the
try_charge() and commit_charge() stages have to be separated by other
actions in between (which can fail). One such example is the new hugetlb
accounting behavior in the following patch.
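
The split between charging and committing can be illustrated with a toy
userspace model. This is a sketch under stated assumptions, not kernel
code: every toy_* name is hypothetical, the counter is a plain integer
rather than a page_counter, and the fallible intermediate step is reduced
to a boolean flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel objects; all names are hypothetical. */
struct toy_memcg {
	unsigned long usage;	/* pages charged (reserved or committed) */
	unsigned long limit;	/* maximum pages allowed, usage <= limit */
};

struct toy_folio {
	unsigned long nr_pages;
	struct toy_memcg *memcg;	/* owner once the charge is committed */
};

/* Stage 1: reserve pages against the limit; fails without side effects. */
static int toy_try_charge(struct toy_memcg *memcg, unsigned long nr_pages)
{
	if (nr_pages > memcg->limit - memcg->usage)
		return -1;
	memcg->usage += nr_pages;
	return 0;
}

/* Undo a reservation that was never committed to a folio. */
static void toy_cancel_charge(struct toy_memcg *memcg, unsigned long nr_pages)
{
	assert(memcg->usage >= nr_pages);
	memcg->usage -= nr_pages;
}

/* Stage 2: bind the already reserved charge to the folio. */
static void toy_commit_charge(struct toy_folio *folio, struct toy_memcg *memcg)
{
	folio->memcg = memcg;
}

/*
 * Allocation path with a fallible step between the two stages. The flag
 * step_ok models that step (for example, obtaining the page itself).
 * Returns 0 on success and -1 on failure.
 */
static int toy_alloc(struct toy_folio *folio, struct toy_memcg *memcg,
		     bool step_ok)
{
	if (toy_try_charge(memcg, folio->nr_pages))
		return -1;
	if (!step_ok) {
		toy_cancel_charge(memcg, folio->nr_pages);
		return -1;
	}
	toy_commit_charge(folio, memcg);
	return 0;
}
```

When the intermediate step fails, the reservation is rolled back and the
folio never becomes visible as charged. That is the reason a cancel
operation is needed next to the commit operation.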

The patch also adds a helper function to obtain a reference to the
current task's memcg.
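
Obtaining such a reference relies on a "try to get" operation that can
fail when the object is already on its way out. The following is a rough
userspace sketch of that idea using C11 atomics. It is an assumption-laden
model, not how css_tryget() is implemented in the kernel, and the toy_*
names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy reference-counted object; a count of zero means "being destroyed". */
struct toy_css {
	atomic_ulong refcnt;
};

/* Take a reference only if the object is still live (count > 0). */
static bool toy_tryget(struct toy_css *css)
{
	unsigned long old = atomic_load(&css->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&css->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference previously taken with toy_tryget(). */
static void toy_put(struct toy_css *css)
{
	unsigned long prev = atomic_fetch_sub(&css->refcnt, 1);

	assert(prev > 0);
}
```

In the helper added by this patch, a failed attempt does not return an
error: the lookup of the current task's memcg is repeated and the attempt
is retried.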

Link: https://lkml.kernel.org/r/20231006184629.155543-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20231006184629.155543-2-nphamcs@gmail.com
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Frank van der Linden <fvdl@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

authored by Nhat Pham and committed by Andrew Morton
4b569387 59838b25

+68 -12
+21
include/linux/memcontrol.h
···
 		page_counter_read(&memcg->memory);
 }

+void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup *memcg);
+
 int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp);

 /**
···
 		return;
 	__mem_cgroup_uncharge_list(page_list);
 }
+
+void mem_cgroup_cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages);

 void mem_cgroup_migrate(struct folio *old, struct folio *new);

···
 struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);

 struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm);
+
+struct mem_cgroup *get_mem_cgroup_from_current(void);

 struct lruvec *folio_lruvec_lock(struct folio *folio);
 struct lruvec *folio_lruvec_lock_irq(struct folio *folio);
···
 	return false;
 }

+static inline void mem_cgroup_commit_charge(struct folio *folio,
+		struct mem_cgroup *memcg)
+{
+}
+
 static inline int mem_cgroup_charge(struct folio *folio,
 		struct mm_struct *mm, gfp_t gfp)
 {
···
 }

 static inline void mem_cgroup_uncharge_list(struct list_head *page_list)
+{
+}
+
+static inline void mem_cgroup_cancel_charge(struct mem_cgroup *memcg,
+		unsigned int nr_pages)
 {
 }

···
 }

 static inline struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
+{
+	return NULL;
+}
+
+static inline struct mem_cgroup *get_mem_cgroup_from_current(void)
 {
 	return NULL;
 }
+47 -12
mm/memcontrol.c
···
 }

 /**
+ * get_mem_cgroup_from_current - Obtain a reference on current task's memcg.
+ */
+struct mem_cgroup *get_mem_cgroup_from_current(void)
+{
+	struct mem_cgroup *memcg;
+
+	if (mem_cgroup_disabled())
+		return NULL;
+
+again:
+	rcu_read_lock();
+	memcg = mem_cgroup_from_task(current);
+	if (!css_tryget(&memcg->css)) {
+		rcu_read_unlock();
+		goto again;
+	}
+	rcu_read_unlock();
+	return memcg;
+}
+
+/**
  * mem_cgroup_iter - iterate over memory cgroup hierarchy
  * @root: hierarchy root
  * @prev: previously returned memcg, NULL on first invocation
···
 	return try_charge_memcg(memcg, gfp_mask, nr_pages);
 }

-static inline void cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
+/**
+ * mem_cgroup_cancel_charge() - cancel an uncommitted try_charge() call.
+ * @memcg: memcg previously charged.
+ * @nr_pages: number of pages previously charged.
+ */
+void mem_cgroup_cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	if (mem_cgroup_is_root(memcg))
 		return;
···
 	 * - mem_cgroup_trylock_pages()
 	 */
 	folio->memcg_data = (unsigned long)memcg;
+}
+
+/**
+ * mem_cgroup_commit_charge - commit a previously successful try_charge().
+ * @folio: folio to commit the charge to.
+ * @memcg: memcg previously charged.
+ */
+void mem_cgroup_commit_charge(struct folio *folio, struct mem_cgroup *memcg)
+{
+	css_get(&memcg->css);
+	commit_charge(folio, memcg);
+
+	local_irq_disable();
+	mem_cgroup_charge_statistics(memcg, folio_nr_pages(folio));
+	memcg_check_events(memcg, folio_nid(folio));
+	local_irq_enable();
 }

 #ifdef CONFIG_MEMCG_KMEM
···

 	/* we must uncharge all the leftover precharges from mc.to */
 	if (mc.precharge) {
-		cancel_charge(mc.to, mc.precharge);
+		mem_cgroup_cancel_charge(mc.to, mc.precharge);
 		mc.precharge = 0;
 	}
 	/*
···
 	 * we must uncharge here.
 	 */
 	if (mc.moved_charge) {
-		cancel_charge(mc.from, mc.moved_charge);
+		mem_cgroup_cancel_charge(mc.from, mc.moved_charge);
 		mc.moved_charge = 0;
 	}
 	/* we must fixup refcnts and charges */
···
 static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg,
 			gfp_t gfp)
 {
-	long nr_pages = folio_nr_pages(folio);
 	int ret;

-	ret = try_charge(memcg, gfp, nr_pages);
+	ret = try_charge(memcg, gfp, folio_nr_pages(folio));
 	if (ret)
 		goto out;

-	css_get(&memcg->css);
-	commit_charge(folio, memcg);
-
-	local_irq_disable();
-	mem_cgroup_charge_statistics(memcg, nr_pages);
-	memcg_check_events(memcg, folio_nid(folio));
-	local_irq_enable();
+	mem_cgroup_commit_charge(folio, memcg);
 out:
 	return ret;
 }