Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/filemap: add AS_KERNEL_FILE

Patch series "introduce kernel file mapped folios", v4.

Btrfs currently tracks its metadata pages in the page cache, using a fake
inode (fs_info->btree_inode) with offsets corresponding to where the
metadata is stored in the filesystem's full logical address space.

A consequence of this is that when btrfs uses filemap_add_folio(), this
usage is charged to the cgroup of whichever task happens to be running at
the time. These folios don't belong to any particular user cgroup, so I
don't think it makes much sense for them to be charged in that way. Some
negative consequences as a result:

- A task can be holding some important btrfs locks, then need to look up
  some metadata and go into reclaim, extending the duration it holds
  those locks for, and unfairly pushing its own reclaim pain onto other
  cgroups.

- If that cgroup goes into reclaim, it might reclaim these folios that a
  different, non-reclaiming cgroup might need soon. This is naturally
  offset by LRU reclaim, but still.
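
The accounting problem described above can be illustrated with a small userspace model. This is only a sketch: `toy_cgroup`, `toy_task`, and `toy_add_metadata_page()` are hypothetical names invented for illustration and do not exist in the kernel. It shows that the charge lands on whichever task happens to trigger the read, even though the page is shared filesystem metadata.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a memory cgroup: a name and a page counter. */
struct toy_cgroup {
	const char *name;
	unsigned long charged_pages;
};

/* Toy stand-in for a task: it only knows which cgroup it runs in. */
struct toy_task {
	struct toy_cgroup *cgroup;
};

/*
 * Model of the behaviour before this series: a metadata page added to
 * the page cache is charged to the cgroup of the task that triggered
 * the read, regardless of who else will use the page later.
 */
static void toy_add_metadata_page(struct toy_task *current_task)
{
	assert(current_task != NULL && current_task->cgroup != NULL);
	current_task->cgroup->charged_pages++;
}
```

If a task in cgroup A reads two metadata pages and a task in cgroup B reads one, A ends up charged for two pages and B for one, although all three pages are equally shared by every user of the filesystem.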

We have two options for how to manage such file pages:
1. charge them to the root cgroup.
2. don't charge them to any cgroup at all.

Option 2 breaks the invariant that every mapped page has a cgroup. This is
workable, but unnecessarily risky. Therefore, go with option 1.

A very similar proposal to use the root cgroup was previously made by Qu,
where he eventually proposed the idea of setting it per address_space.
This makes good sense for the btrfs use case, as the behavior should apply
to all use of the address_space, not select allocations. I.e., if someone
adds another filemap_add_folio() call using btrfs's btree_inode, we would
almost certainly want to account that to the root cgroup as well.


This patch (of 3):

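Add the flag AS_KERNEL_FILE to the address_space to indicate that this
mapping's memory is exempt from the usual memcg accounting: folios added
to such a mapping are charged to the root cgroup instead of the cgroup of
the current task.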

[boris@bur.io: fix CONFIG_MEMCG build for AS_KERNEL_FILE]
Link: https://lkml.kernel.org/r/6de59ddeec81b5c294d337c001ba0061631d4ec6.1755816635.git.boris@bur.io
Link: https://lore.kernel.org/linux-mm/b5fef5372ae454a7b6da4f2f75c427aeab6a07d6.1727498749.git.wqu@suse.com/
Link: https://lkml.kernel.org/r/f09c4e2c90351d4cb30a1969f7a863b9238bd291.1755812945.git.boris@bur.io
Signed-off-by: Boris Burkov <boris@bur.io>
Suggested-by: Qu Wenruo <wqu@suse.com>
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Boris Burkov and committed by Andrew Morton
cf1dec76 c090868f

3 files changed, 10 insertions(+)

include/linux/memcontrol.h (+2)
@@ -1060,5 +1060,7 @@
 #define MEM_CGROUP_ID_SHIFT	0

+#define root_mem_cgroup		(NULL)
+
 static inline struct mem_cgroup *folio_memcg(struct folio *folio)
 {
 	return NULL;

include/linux/pagemap.h (+2)
@@ -211,6 +211,8 @@
 				   folio contents */
 	AS_INACCESSIBLE = 8,	/* Do not attempt direct R/W access to the mapping */
 	AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
+	AS_KERNEL_FILE = 10,	/* mapping for a fake kernel file that shouldn't
+				   account usage to user cgroups */
 	/* Bits 16-25 are used for FOLIO_ORDER */
 	AS_FOLIO_ORDER_BITS = 5,
 	AS_FOLIO_ORDER_MIN = 16,
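
As a rough illustration of how a per-mapping flag bit such as AS_KERNEL_FILE is set and tested, here is a userspace sketch. The bit numbers are copied from the pagemap.h hunk; the `toy_` helpers are hypothetical, non-atomic stand-ins for the kernel's `set_bit()` and `test_bit()`, and the enum omits every flag not shown in the hunk.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Bit numbers taken from the pagemap.h hunk; other flags are omitted. */
enum toy_mapping_flags {
	TOY_AS_INACCESSIBLE = 8,
	TOY_AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
	TOY_AS_KERNEL_FILE = 10,
	TOY_AS_FOLIO_ORDER_MIN = 16,
};

/* Set bit number nr in a flags word (plain, non-atomic stand-in). */
static void toy_set_bit(unsigned int nr, unsigned long *flags)
{
	assert(nr < sizeof(*flags) * CHAR_BIT);
	*flags |= 1UL << nr;
}

/* Return true if bit number nr is set in the flags word. */
static bool toy_test_bit(unsigned int nr, const unsigned long *flags)
{
	assert(nr < sizeof(*flags) * CHAR_BIT);
	return (*flags >> nr) & 1UL;
}
```

Bit 10 lies below the range that the hunk's comment reserves for FOLIO_ORDER (bits 16-25), so setting it leaves the folio order field untouched.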

mm/filemap.c (+6)
@@ -960,7 +960,13 @@
 {
 	void *shadow = NULL;
 	int ret;
+	struct mem_cgroup *tmp;
+	bool kernel_file = test_bit(AS_KERNEL_FILE, &mapping->flags);

+	if (kernel_file)
+		tmp = set_active_memcg(root_mem_cgroup);
 	ret = mem_cgroup_charge(folio, NULL, gfp);
+	if (kernel_file)
+		set_active_memcg(tmp);
 	if (ret)
 		return ret;
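
The save/override/restore pattern in the mm/filemap.c hunk can be sketched in userspace as follows. This is a simplified model, not kernel code: every `toy_` name is hypothetical, and the real `set_active_memcg()` and `mem_cgroup_charge()` do considerably more (per-task and per-CPU state, reference counting, limits). The model only assumes that an active override, when set, takes precedence over the current task's cgroup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_memcg {
	unsigned long charged;
};

static struct toy_memcg toy_root_memcg;     /* stands in for root_mem_cgroup */
static struct toy_memcg *toy_current_memcg; /* cgroup of the running task */
static struct toy_memcg *toy_active_memcg;  /* temporary override, or NULL */

/* Install an override and return the previous one so it can be restored. */
static struct toy_memcg *toy_set_active_memcg(struct toy_memcg *memcg)
{
	struct toy_memcg *old = toy_active_memcg;

	toy_active_memcg = memcg;
	return old;
}

/* Charge one folio to the override if set, else to the current cgroup. */
static int toy_charge(void)
{
	struct toy_memcg *target =
		toy_active_memcg ? toy_active_memcg : toy_current_memcg;

	if (target == NULL)
		return -1;
	target->charged++;
	return 0;
}

/* Mirrors the shape of the patched hunk: override only for kernel files. */
static int toy_filemap_add(bool kernel_file)
{
	struct toy_memcg *saved = NULL;
	int ret;

	if (kernel_file)
		saved = toy_set_active_memcg(&toy_root_memcg);
	ret = toy_charge();
	if (kernel_file)
		toy_set_active_memcg(saved);

	/* The override must never leak past this function. */
	assert(toy_active_memcg == saved);
	return ret;
}
```

With `toy_current_memcg` pointing at a user cgroup, `toy_filemap_add(false)` charges that cgroup, while `toy_filemap_add(true)` charges `toy_root_memcg` and leaves the override cleared afterwards. Restoring the saved value rather than unconditionally clearing it keeps the pattern safe if a caller had already installed its own override.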