Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

memcg: introduce private id API for in-kernel users

Patch series "memcg: separate private and public ID namespaces".

The memory cgroup subsystem maintains a private ID infrastructure that
is decoupled from the cgroup IDs. This private ID system exists because
some kernel objects (like swap entries and shadow entries in the
workingset code) can outlive the cgroup they were associated with.
The motivation is best described in commit 73f576c04b941 ("mm:
memcontrol: fix cgroup creation failure after many small jobs").

Unfortunately, some in-kernel users (DAMON, LRU gen debugfs interface,
shrinker debugfs) started exposing these private IDs to userspace.
This is problematic because:

1. The private IDs are internal implementation details that could change
2. Userspace already has access to cgroup IDs through the cgroup
   filesystem
3. Using different ID namespaces in different interfaces is confusing

This series cleans up the memcg ID infrastructure by:

1. Explicitly marking the private ID APIs with "private" in their names
   to make it clear they are for internal use only (swap/workingset)

2. Making the public cgroup ID APIs (mem_cgroup_id/mem_cgroup_get_from_id)
   unconditionally available

3. Converting DAMON, LRU gen, and shrinker debugfs interfaces to use
   the public cgroup IDs instead of the private IDs

4. Removing the now-unused wrapper functions and renaming the public
   APIs for clarity

After this series:
- mem_cgroup_private_id() / mem_cgroup_from_private_id() are used for
  internal kernel objects that outlive their cgroup (swap, workingset)
- mem_cgroup_id() / mem_cgroup_get_from_id() work with the public cgroup
  ID (from cgroup_id()) for use in userspace-facing interfaces


This patch (of 8):

The memory cgroup subsystem maintains a private ID infrastructure,
decoupled from the cgroup IDs, for swapout records and shadow entries. The
main motivation for this private ID infrastructure is best described in
commit 73f576c04b941 ("mm: memcontrol: fix cgroup creation failure after
many small jobs").

Unfortunately, some users have started exposing these private IDs to
userspace where they should have used the cgroup IDs, which are already
exposed to userspace. Let's rename the memcg ID APIs to explicitly mark
them private.

No functional change is intended.
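
The rename keeps the old accessor name alive as a thin forwarding wrapper, which is why no functional change is expected. A minimal userspace sketch of that pattern follows. `struct toy_memcg`, `toy_private_id()` and `toy_id()` are hypothetical stand-ins invented for illustration; they are not kernel APIs and only mirror the shape of the renamed helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup; only the private ID is modeled. */
struct toy_memcg {
	unsigned short private_id;
};

/* New, explicitly private accessor (mirrors the shape of the renamed helper). */
static inline unsigned short toy_private_id(const struct toy_memcg *memcg)
{
	/* Assumption of this sketch: a missing memcg maps to ID 0. */
	return memcg ? memcg->private_id : 0;
}

/* Old name kept as a forwarding wrapper so existing callers keep building. */
static inline unsigned short toy_id(const struct toy_memcg *memcg)
{
	return toy_private_id(memcg);
}
```

Because the wrapper only forwards, callers can be converted one at a time and both names return the same value in the meantime.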

Link: https://lkml.kernel.org/r/20251225232116.294540-1-shakeel.butt@linux.dev
Link: https://lkml.kernel.org/r/20251225232116.294540-2-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: SeongJae Park <sj@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Shakeel Butt, committed by Andrew Morton
e77786b4 2c4c3e29

+61 -38
+21 -3
include/linux/memcontrol.h
···

 #define MEM_CGROUP_ID_SHIFT	16

-struct mem_cgroup_id {
+struct mem_cgroup_private_id {
 	int id;
 	refcount_t ref;
 };
···
 	struct cgroup_subsys_state css;

 	/* Private memcg ID. Used to ID objects that outlive the cgroup */
-	struct mem_cgroup_id id;
+	struct mem_cgroup_private_id id;

 	/* Accounted resources */
 	struct page_counter memory;		/* Both v1 & v2 */
···
 void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 			   int (*)(struct task_struct *, void *), void *arg);

-static inline unsigned short mem_cgroup_id(struct mem_cgroup *memcg)
+static inline unsigned short mem_cgroup_private_id(struct mem_cgroup *memcg)
 {
 	if (mem_cgroup_disabled())
 		return 0;

 	return memcg->id.id;
+}
+struct mem_cgroup *mem_cgroup_from_private_id(unsigned short id);
+
+static inline unsigned short mem_cgroup_id(struct mem_cgroup *memcg)
+{
+	return mem_cgroup_private_id(memcg);
 }
 struct mem_cgroup *mem_cgroup_from_id(unsigned short id);

···
 }

 static inline struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
+{
+	WARN_ON_ONCE(id);
+	/* XXX: This should always return root_mem_cgroup */
+	return NULL;
+}
+
+static inline unsigned short mem_cgroup_private_id(struct mem_cgroup *memcg)
+{
+	return 0;
+}
+
+static inline struct mem_cgroup *mem_cgroup_from_private_id(unsigned short id)
 {
 	WARN_ON_ONCE(id);
 	/* XXX: This should always return root_mem_cgroup */
+1 -1
mm/list_lru.c
···

 	xa_for_each(&lru->xa, index, mlru) {
 		rcu_read_lock();
-		memcg = mem_cgroup_from_id(index);
+		memcg = mem_cgroup_from_private_id(index);
 		if (!mem_cgroup_tryget(memcg)) {
 			rcu_read_unlock();
 			continue;
+3 -3
mm/memcontrol-v1.c
···
 	 * have an ID allocated to it anymore, charge the closest online
 	 * ancestor for the swap instead and transfer the memory+swap charge.
 	 */
-	swap_memcg = mem_cgroup_id_get_online(memcg);
+	swap_memcg = mem_cgroup_private_id_get_online(memcg);
 	nr_entries = folio_nr_pages(folio);
 	/* Get references for the tail pages, too */
 	if (nr_entries > 1)
-		mem_cgroup_id_get_many(swap_memcg, nr_entries - 1);
+		mem_cgroup_private_id_get_many(swap_memcg, nr_entries - 1);
 	mod_memcg_state(swap_memcg, MEMCG_SWAP, nr_entries);

-	swap_cgroup_record(folio, mem_cgroup_id(swap_memcg), entry);
+	swap_cgroup_record(folio, mem_cgroup_private_id(swap_memcg), entry);

 	folio_unqueue_deferred_split(folio);
 	folio->memcg_data = 0;
+2 -2
mm/memcontrol-v1.h
···
 unsigned long memcg_page_state_output(struct mem_cgroup *memcg, int item);
 int memory_stat_show(struct seq_file *m, void *v);

-void mem_cgroup_id_get_many(struct mem_cgroup *memcg, unsigned int n);
-struct mem_cgroup *mem_cgroup_id_get_online(struct mem_cgroup *memcg);
+void mem_cgroup_private_id_get_many(struct mem_cgroup *memcg, unsigned int n);
+struct mem_cgroup *mem_cgroup_private_id_get_online(struct mem_cgroup *memcg);

 /* Cgroup v1-specific declarations */
 #ifdef CONFIG_MEMCG_V1
+30 -25
mm/memcontrol.c
···
  */

 #define MEM_CGROUP_ID_MAX	((1UL << MEM_CGROUP_ID_SHIFT) - 1)
-static DEFINE_XARRAY_ALLOC1(mem_cgroup_ids);
+static DEFINE_XARRAY_ALLOC1(mem_cgroup_private_ids);

-static void mem_cgroup_id_remove(struct mem_cgroup *memcg)
+static void mem_cgroup_private_id_remove(struct mem_cgroup *memcg)
 {
 	if (memcg->id.id > 0) {
-		xa_erase(&mem_cgroup_ids, memcg->id.id);
+		xa_erase(&mem_cgroup_private_ids, memcg->id.id);
 		memcg->id.id = 0;
 	}
 }

-void __maybe_unused mem_cgroup_id_get_many(struct mem_cgroup *memcg,
+void __maybe_unused mem_cgroup_private_id_get_many(struct mem_cgroup *memcg,
 					   unsigned int n)
 {
 	refcount_add(n, &memcg->id.ref);
 }

-static void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n)
+static void mem_cgroup_private_id_put_many(struct mem_cgroup *memcg, unsigned int n)
 {
 	if (refcount_sub_and_test(n, &memcg->id.ref)) {
-		mem_cgroup_id_remove(memcg);
+		mem_cgroup_private_id_remove(memcg);

 		/* Memcg ID pins CSS */
 		css_put(&memcg->css);
 	}
 }

-static inline void mem_cgroup_id_put(struct mem_cgroup *memcg)
+static inline void mem_cgroup_private_id_put(struct mem_cgroup *memcg)
 {
-	mem_cgroup_id_put_many(memcg, 1);
+	mem_cgroup_private_id_put_many(memcg, 1);
 }

-struct mem_cgroup *mem_cgroup_id_get_online(struct mem_cgroup *memcg)
+struct mem_cgroup *mem_cgroup_private_id_get_online(struct mem_cgroup *memcg)
 {
 	while (!refcount_inc_not_zero(&memcg->id.ref)) {
 		/*
···
 }

 /**
- * mem_cgroup_from_id - look up a memcg from a memcg id
+ * mem_cgroup_from_private_id - look up a memcg from a memcg id
  * @id: the memcg id to look up
  *
  * Caller must hold rcu_read_lock().
  */
-struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
+struct mem_cgroup *mem_cgroup_from_private_id(unsigned short id)
 {
 	WARN_ON_ONCE(!rcu_read_lock_held());
-	return xa_load(&mem_cgroup_ids, id);
+	return xa_load(&mem_cgroup_private_ids, id);
+}
+
+struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
+{
+	return mem_cgroup_from_private_id(id);
 }

 #ifdef CONFIG_SHRINKER_DEBUG
···
 	if (!memcg)
 		return ERR_PTR(-ENOMEM);

-	error = xa_alloc(&mem_cgroup_ids, &memcg->id.id, NULL,
+	error = xa_alloc(&mem_cgroup_private_ids, &memcg->id.id, NULL,
 			 XA_LIMIT(1, MEM_CGROUP_ID_MAX), GFP_KERNEL);
 	if (error)
 		goto fail;
···
 	lru_gen_init_memcg(memcg);
 	return memcg;
 fail:
-	mem_cgroup_id_remove(memcg);
+	mem_cgroup_private_id_remove(memcg);
 	__mem_cgroup_free(memcg);
 	return ERR_PTR(error);
 }
···
 	css_get(css);

 	/*
-	 * Ensure mem_cgroup_from_id() works once we're fully online.
+	 * Ensure mem_cgroup_from_private_id() works once we're fully online.
 	 *
 	 * We could do this earlier and require callers to filter with
 	 * css_tryget_online(). But right now there are no users that
···
 	 * publish it here at the end of onlining. This matches the
 	 * regular ID destruction during offlining.
 	 */
-	xa_store(&mem_cgroup_ids, memcg->id.id, memcg, GFP_KERNEL);
+	xa_store(&mem_cgroup_private_ids, memcg->id.id, memcg, GFP_KERNEL);

 	return 0;
 offline_kmem:
 	memcg_offline_kmem(memcg);
 remove_id:
-	mem_cgroup_id_remove(memcg);
+	mem_cgroup_private_id_remove(memcg);
 	return -ENOMEM;
 }

···

 	drain_all_stock(memcg);

-	mem_cgroup_id_put(memcg);
+	mem_cgroup_private_id_put(memcg);
 }

 static void mem_cgroup_css_released(struct cgroup_subsys_state *css)
···

 	id = lookup_swap_cgroup_id(entry);
 	rcu_read_lock();
-	memcg = mem_cgroup_from_id(id);
+	memcg = mem_cgroup_from_private_id(id);
 	if (!memcg || !css_tryget_online(&memcg->css))
 		memcg = get_mem_cgroup_from_mm(mm);
 	rcu_read_unlock();
···
 		return 0;
 	}

-	memcg = mem_cgroup_id_get_online(memcg);
+	memcg = mem_cgroup_private_id_get_online(memcg);

 	if (!mem_cgroup_is_root(memcg) &&
 	    !page_counter_try_charge(&memcg->swap, nr_pages, &counter)) {
 		memcg_memory_event(memcg, MEMCG_SWAP_MAX);
 		memcg_memory_event(memcg, MEMCG_SWAP_FAIL);
-		mem_cgroup_id_put(memcg);
+		mem_cgroup_private_id_put(memcg);
 		return -ENOMEM;
 	}

 	/* Get references for the tail pages, too */
 	if (nr_pages > 1)
-		mem_cgroup_id_get_many(memcg, nr_pages - 1);
+		mem_cgroup_private_id_get_many(memcg, nr_pages - 1);
 	mod_memcg_state(memcg, MEMCG_SWAP, nr_pages);

-	swap_cgroup_record(folio, mem_cgroup_id(memcg), entry);
+	swap_cgroup_record(folio, mem_cgroup_private_id(memcg), entry);

 	return 0;
 }
···

 	id = swap_cgroup_clear(entry, nr_pages);
 	rcu_read_lock();
-	memcg = mem_cgroup_from_id(id);
+	memcg = mem_cgroup_from_private_id(id);
 	if (memcg) {
 		if (!mem_cgroup_is_root(memcg)) {
 			if (do_memsw_account())
···
 				page_counter_uncharge(&memcg->swap, nr_pages);
 		}
 		mod_memcg_state(memcg, MEMCG_SWAP, -nr_pages);
-		mem_cgroup_private_id_put_many(memcg, nr_pages);
+		mem_cgroup_private_id_put_many(memcg, nr_pages);
 	}
 	rcu_read_unlock();
 }
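
The get_many/put_many helpers in this file keep a private ID resolvable for as long as swap records still name it, and release it (together with the css pin) only on the last put. The following is a simplified userspace model of that lifetime, not the kernel implementation: every `toy_*` name is hypothetical, a plain array stands in for the XArray, a plain counter stands in for `refcount_t`, and allocation and publication are merged into one step, which the kernel does not do.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ID_MAX 8	/* assumption: tiny table; the kernel limit is 16 bits wide */

/* Hypothetical stand-in for struct mem_cgroup. */
struct toy_memcg {
	int id;			/* 0 means "no ID" */
	unsigned int id_ref;	/* references held on the ID */
	int css_pinned;		/* models the css reference that the ID pins */
};

static struct toy_memcg *toy_ids[TOY_ID_MAX + 1];

/* Take the lowest free ID >= 1 and publish the memcg under it. */
static int toy_id_alloc(struct toy_memcg *memcg)
{
	for (int id = 1; id <= TOY_ID_MAX; id++) {
		if (!toy_ids[id]) {
			toy_ids[id] = memcg;
			memcg->id = id;
			memcg->id_ref = 1;	/* initial reference, dropped at offline */
			memcg->css_pinned = 1;
			return 0;
		}
	}
	return -1;	/* table full */
}

/* One extra reference per additional record that stores the ID. */
static void toy_id_get_many(struct toy_memcg *memcg, unsigned int n)
{
	memcg->id_ref += n;
}

/* Drop n references; the last put retires the ID and the css pin. */
static void toy_id_put_many(struct toy_memcg *memcg, unsigned int n)
{
	assert(memcg->id_ref >= n);
	memcg->id_ref -= n;
	if (memcg->id_ref == 0) {
		toy_ids[memcg->id] = NULL;
		memcg->id = 0;
		memcg->css_pinned = 0;
	}
}

/* Look up a memcg by ID; returns NULL once the ID has been retired. */
static struct toy_memcg *toy_from_id(int id)
{
	if (id < 1 || id > TOY_ID_MAX)
		return NULL;
	return toy_ids[id];
}
```

In this model, a memcg that takes four references for swapped-out pages and then drops its initial reference at offline time remains reachable through its ID until all four records are cleared.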
+4 -4
mm/workingset.c
···
 	hist = lru_hist_from_seq(min_seq);
 	atomic_long_add(delta, &lrugen->evicted[hist][type][tier]);

-	return pack_shadow(mem_cgroup_id(memcg), pgdat, token, workingset);
+	return pack_shadow(mem_cgroup_private_id(memcg), pgdat, token, workingset);
 }

 /*
···

 	unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);

-	memcg = mem_cgroup_from_id(memcg_id);
+	memcg = mem_cgroup_from_private_id(memcg_id);
 	*lruvec = mem_cgroup_lruvec(memcg, pgdat);

 	max_seq = READ_ONCE((*lruvec)->lrugen.max_seq);
···

 	lruvec = mem_cgroup_lruvec(target_memcg, pgdat);
 	/* XXX: target_memcg can be NULL, go through lruvec */
-	memcgid = mem_cgroup_id(lruvec_memcg(lruvec));
+	memcgid = mem_cgroup_private_id(lruvec_memcg(lruvec));
 	eviction = atomic_long_read(&lruvec->nonresident_age);
 	eviction >>= bucket_order;
 	workingset_age_nonresident(lruvec, folio_nr_pages(folio));
···
 	 * would be better if the root_mem_cgroup existed in all
 	 * configurations instead.
 	 */
-	eviction_memcg = mem_cgroup_from_id(memcgid);
+	eviction_memcg = mem_cgroup_from_private_id(memcgid);
 	if (!mem_cgroup_tryget(eviction_memcg))
 		eviction_memcg = NULL;
 	rcu_read_unlock();
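
The workingset call sites pass the private ID to pack_shadow(), which stores it inside a shadow entry next to the eviction token. The sketch below illustrates why a compact 16-bit ID suits that use. It is an illustrative layout only: the `toy_*` functions are hypothetical, the node field that the real helper also packs is left out, and the bit positions are an assumption of this sketch rather than the kernel's layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ID_SHIFT 16	/* same width as MEM_CGROUP_ID_SHIFT in the diff */
#define TOY_ID_MASK  ((UINT64_C(1) << TOY_ID_SHIFT) - 1)

/*
 * Illustrative layout, high to low bits:
 *   [ token (47 bits) | private memcg id (16 bits) | workingset (1 bit) ]
 * The caller must keep token within 47 bits or its top bits are lost.
 */
static uint64_t toy_pack_shadow(unsigned short memcg_id, uint64_t token,
				bool workingset)
{
	uint64_t v = token;

	v = (v << TOY_ID_SHIFT) | (memcg_id & TOY_ID_MASK);
	v = (v << 1) | (workingset ? 1 : 0);
	return v;
}

/* Reverse of toy_pack_shadow(): peel the fields off from the low bits up. */
static void toy_unpack_shadow(uint64_t v, unsigned short *memcg_id,
			      uint64_t *token, bool *workingset)
{
	*workingset = (v & 1) != 0;
	v >>= 1;
	*memcg_id = (unsigned short)(v & TOY_ID_MASK);
	v >>= TOY_ID_SHIFT;
	*token = v;
}
```

Every bit spent on the ID is a bit taken from the token, so a small private ID leaves more room for eviction information than a full-width public cgroup ID would.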