Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

memcg: skip cgroup_file_notify if spinning is not allowed

Generally, memcg charging is allowed from all contexts, including NMI,
where even spinning on a spinlock can cause locking issues. However, one
call chain was missed when support for memcg charging from any context
was added: try_charge_memcg() -> memcg_memory_event() ->
cgroup_file_notify().

The function call tree under cgroup_file_notify() can acquire many
different spinlocks in spinning mode, among them cgroup_file_kn_lock,
kernfs_notify_lock and the pool_workqueue's lock. So, let's just skip
cgroup_file_notify() from memcg charging if the context does not allow
spinning.
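
The skip described above can be illustrated with a small userspace model.
This is only a sketch, not kernel code: struct model_memcg,
model_file_notify() and model_memory_event() are hypothetical stand-ins, and
the hierarchy walk is simplified (the real loop has additional exit
conditions). It assumes C11 atomics in place of the kernel's atomic_long_inc().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum model_event { MODEL_EVENT_MAX, MODEL_EVENT_SWAP_FAIL, MODEL_NR_EVENTS };

struct model_memcg {
	struct model_memcg *parent;          /* NULL at the top of the hierarchy */
	atomic_long events[MODEL_NR_EVENTS]; /* lock-free, usable in any context */
	long notifications;                  /* counts calls to the notify stub */
};

/* Stand-in for cgroup_file_notify(), which may take spinlocks in the kernel. */
static void model_file_notify(struct model_memcg *memcg)
{
	memcg->notifications++;
}

static void model_memory_event(struct model_memcg *memcg,
			       enum model_event event, bool allow_spinning)
{
	assert(event < MODEL_NR_EVENTS);

	for (; memcg != NULL; memcg = memcg->parent) {
		/* The counter update is atomic, so it is always performed. */
		atomic_fetch_add(&memcg->events[event], 1);

		/* Notify only when the caller is allowed to spin on locks. */
		if (allow_spinning)
			model_file_notify(memcg);
	}
}
```

In this model, a call with allow_spinning == false still bumps the event
counters on the whole ancestor chain but never reaches the notify stub, so
the event is counted but no notification is sent for that call.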

An alternative approach was also explored: instead of skipping
cgroup_file_notify(), defer the memcg event processing to irq_work [1].
However, it adds complexity, and it was decided to keep things simple
until more memcg events need to support !allow_spinning contexts.
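
For comparison, the general shape of a deferred notification can be modeled
in userspace as follows. This is a hedged sketch of the idea only: it does
not use the kernel's irq_work API, the proposal in [1] may be structured
differently, and struct deferred_notifier and both functions are
hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct deferred_notifier {
	atomic_bool pending; /* set from the context that must not spin */
	long delivered;      /* notifications actually sent */
};

/* Called where spinning is not allowed: only a lock-free flag update. */
static void deferred_notify_request(struct deferred_notifier *n)
{
	assert(n != NULL);
	atomic_store(&n->pending, true);
}

/*
 * Called later from a context that may take locks.
 * Returns true if a pending notification was delivered.
 */
static bool deferred_notify_flush(struct deferred_notifier *n)
{
	assert(n != NULL);

	/* atomic_exchange() clears the flag and reports whether it was set. */
	if (!atomic_exchange(&n->pending, false))
		return false;

	n->delivered++; /* stand-in for the real notification call */
	return true;
}
```

Several requests made before a flush collapse into a single delivery in this
model. The extra state and the second execution context give a sense of the
kind of complexity the commit chose to avoid for now.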

Link: https://lore.kernel.org/all/5qi2llyzf7gklncflo6gxoozljbm4h3tpnuv4u4ej4ztysvi6f@x44v7nz2wdzd/ [1]
Link: https://lkml.kernel.org/r/20250922220203.261714-1-shakeel.butt@linux.dev
Fixes: 3ac4638a734a ("memcg: make memcg_rstat_updated nmi safe")
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Closes: https://lore.kernel.org/all/20250905061919.439648-1-yepeilin@google.com/
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peilin Ye <yepeilin@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Shakeel Butt and committed by Andrew Morton
fcc0669c 7a405dbb

+23 -10
+19 -7
include/linux/memcontrol.h
···
 	count_memcg_events_mm(mm, idx, 1);
 }

-static inline void memcg_memory_event(struct mem_cgroup *memcg,
-				      enum memcg_memory_event event)
+static inline void __memcg_memory_event(struct mem_cgroup *memcg,
+					enum memcg_memory_event event,
+					bool allow_spinning)
 {
 	bool swap_event = event == MEMCG_SWAP_HIGH || event == MEMCG_SWAP_MAX ||
 			  event == MEMCG_SWAP_FAIL;

+	/* For now only MEMCG_MAX can happen with !allow_spinning context. */
+	VM_WARN_ON_ONCE(!allow_spinning && event != MEMCG_MAX);
+
 	atomic_long_inc(&memcg->memory_events_local[event]);
-	if (!swap_event)
+	if (!swap_event && allow_spinning)
 		cgroup_file_notify(&memcg->events_local_file);

 	do {
 		atomic_long_inc(&memcg->memory_events[event]);
-		if (swap_event)
-			cgroup_file_notify(&memcg->swap_events_file);
-		else
-			cgroup_file_notify(&memcg->events_file);
+		if (allow_spinning) {
+			if (swap_event)
+				cgroup_file_notify(&memcg->swap_events_file);
+			else
+				cgroup_file_notify(&memcg->events_file);
+		}

 		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 			break;
···
 			break;
 	} while ((memcg = parent_mem_cgroup(memcg)) &&
 		 !mem_cgroup_is_root(memcg));
+}
+
+static inline void memcg_memory_event(struct mem_cgroup *memcg,
+				      enum memcg_memory_event event)
+{
+	__memcg_memory_event(memcg, event, true);
 }

 static inline void memcg_memory_event_mm(struct mm_struct *mm,
+4 -3
mm/memcontrol.c
···
 	bool drained = false;
 	bool raised_max_event = false;
 	unsigned long pflags;
+	bool allow_spinning = gfpflags_allow_spinning(gfp_mask);

 retry:
 	if (consume_stock(memcg, nr_pages))
 		return 0;

-	if (!gfpflags_allow_spinning(gfp_mask))
+	if (!allow_spinning)
 		/* Avoid the refill and flush of the older stock */
 		batch = nr_pages;

···
 	if (!gfpflags_allow_blocking(gfp_mask))
 		goto nomem;

-	memcg_memory_event(mem_over_limit, MEMCG_MAX);
+	__memcg_memory_event(mem_over_limit, MEMCG_MAX, allow_spinning);
 	raised_max_event = true;

 	psi_memstall_enter(&pflags);
···
 	 * a MEMCG_MAX event.
 	 */
 	if (!raised_max_event)
-		memcg_memory_event(mem_over_limit, MEMCG_MAX);
+		__memcg_memory_event(mem_over_limit, MEMCG_MAX, allow_spinning);

 	/*
 	 * The allocation either can't fail or will lead to more memory