slab: sheaf prefilling for guaranteed allocations

Add functions for efficient guaranteed allocations e.g. in a critical
section that cannot sleep, when the exact number of allocations is not
known beforehand, but an upper limit can be calculated.

kmem_cache_prefill_sheaf() returns a sheaf containing at least the given
number of objects.

kmem_cache_alloc_from_sheaf() will allocate an object from the sheaf
and is guaranteed not to fail until depleted.

kmem_cache_return_sheaf() gives the sheaf back to the slab allocator
after the critical section. It also attempts to refill the sheaf to the
cache's sheaf capacity for more efficient sheaf handling, but the refill
is not strictly required to succeed.
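
The prefill / allocate / return lifecycle described above can be
modelled in userspace. The following is only a sketch under stated
assumptions: struct mock_sheaf and every mock_* function are
hypothetical stand-ins built on malloc(), not the kernel API, and the
mock return path always frees the sheaf, whereas the real
kmem_cache_return_sheaf() may keep it as the percpu spare or put it
into the barn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct slab_sheaf. */
struct mock_sheaf {
	unsigned int size;	/* objects currently held */
	void *objects[];	/* preallocated objects, used as a stack */
};

/* Release every object still in the sheaf, then the sheaf itself. */
static unsigned int mock_return_sheaf(struct mock_sheaf *sheaf)
{
	unsigned int unused = sheaf->size;

	while (sheaf->size)
		free(sheaf->objects[--sheaf->size]);
	free(sheaf);
	return unused;
}

/* All fallible work happens here, before the critical section. */
static struct mock_sheaf *mock_prefill_sheaf(unsigned int count)
{
	struct mock_sheaf *sheaf;

	sheaf = calloc(1, sizeof(*sheaf) + count * sizeof(void *));
	if (!sheaf)
		return NULL;

	while (sheaf->size < count) {
		void *obj = malloc(64);	/* arbitrary object size */

		if (!obj) {
			mock_return_sheaf(sheaf);
			return NULL;
		}
		sheaf->objects[sheaf->size++] = obj;
	}
	return sheaf;
}

/* A pop from the stack: cannot fail until the sheaf is depleted. */
static void *mock_alloc_from_sheaf(struct mock_sheaf *sheaf)
{
	if (sheaf->size == 0)
		return NULL;
	return sheaf->objects[--sheaf->size];
}

/*
 * Reserve `limit` objects, consume `used` of them, hand back the rest.
 * Returns the number of unused objects, or -1 if the prefill failed.
 */
static int mock_critical_section(unsigned int limit, unsigned int used)
{
	struct mock_sheaf *sheaf = mock_prefill_sheaf(limit);

	if (!sheaf)
		return -1;

	for (unsigned int i = 0; i < used; i++) {
		void *obj = mock_alloc_from_sheaf(sheaf);

		assert(obj || i >= limit);	/* NULL only once depleted */
		free(obj);			/* the mock frees immediately */
	}
	return (int)mock_return_sheaf(sheaf);
}
```

In this model mock_critical_section(8, 5) hands three unused objects
back, which mirrors a caller that computed an upper limit of 8 but
needed only 5 allocations.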

kmem_cache_refill_sheaf() can be used to refill a previously obtained
sheaf to the requested size. If the current size is sufficient, it does
nothing. If the requested size exceeds both the cache's sheaf_capacity
and the sheaf's current capacity, the sheaf is replaced with a new one,
hence the indirect pointer parameter.
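
The three refill outcomes (nothing to do, top up in place, replace the
sheaf) and the reason for the pointer-to-pointer argument can be
sketched in userspace as follows. This is an illustrative model only:
struct mock_sheaf and the mock_* functions are hypothetical, objects
are reduced to counters, and the in-place case always fills to full
capacity as the patch's own comment says the kernel does in practice.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in: only the two counters matter for this model. */
struct mock_sheaf {
	unsigned int size;	/* objects currently held */
	unsigned int capacity;	/* maximum objects the sheaf can hold */
};

/* Model of the refill decision; *sheafp may be swapped for a bigger sheaf. */
static int mock_refill_sheaf(struct mock_sheaf **sheafp, unsigned int size)
{
	struct mock_sheaf *sheaf, *bigger;

	if (!sheafp || !*sheafp)
		return -EINVAL;

	sheaf = *sheafp;
	if (sheaf->size >= size)
		return 0;			/* already sufficient */

	if (sheaf->capacity >= size) {
		sheaf->size = sheaf->capacity;	/* top up in place */
		return 0;
	}

	/* Too small: get the replacement first so failure is harmless. */
	bigger = malloc(sizeof(*bigger));
	if (!bigger)
		return -ENOMEM;			/* old sheaf left intact */
	bigger->capacity = size;
	bigger->size = size;

	free(sheaf);				/* stands in for returning it */
	*sheafp = bigger;
	return 0;
}

/* Returns the capacity of the sheaf the caller holds after a refill. */
static int mock_capacity_after_refill(unsigned int size, unsigned int capacity,
				      unsigned int requested)
{
	struct mock_sheaf *sheaf = malloc(sizeof(*sheaf));
	int result;

	if (!sheaf)
		return -1;
	sheaf->size = size;
	sheaf->capacity = capacity;

	if (mock_refill_sheaf(&sheaf, requested)) {
		free(sheaf);
		return -1;
	}
	result = (int)sheaf->capacity;
	free(sheaf);
	return result;
}
```

Allocating the replacement before releasing the old sheaf is what lets
a failed refill leave the caller's existing sheaf usable.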

kmem_cache_sheaf_size() can be used to query the current size.

The implementation supports requesting sizes that exceed the cache's
sheaf_capacity, but this is not efficient - such "oversize" sheaves are
allocated fresh in kmem_cache_prefill_sheaf() and flushed and freed
immediately by kmem_cache_return_sheaf(). kmem_cache_refill_sheaf()
can be especially inefficient when it replaces a sheaf with a new one
of a larger capacity. It is therefore better to size the cache's
sheaf_capacity so that oversize sheaves remain exceptional.
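
To make the sizing advice concrete, the sketch below counts how many
of a set of prefill requests would take the oversize path for a given
sheaf_capacity. The request sizes in mock_requests are made up for
illustration and the mock_* helpers are hypothetical; only the
"requested size greater than sheaf_capacity" test is taken from the
patch.

```c
#include <stddef.h>

/* Hypothetical upper limits computed by callers before critical sections. */
static const unsigned int mock_requests[] = { 4, 16, 32, 64 };

/* A prefill is oversize when the request exceeds the cache's sheaf_capacity. */
static int mock_is_oversize(unsigned int requested, unsigned int sheaf_capacity)
{
	return requested > sheaf_capacity;
}

/* Count how many of the sample requests would take the oversize path. */
static unsigned int mock_oversize_demo(unsigned int sheaf_capacity)
{
	unsigned int count = 0;
	size_t n = sizeof(mock_requests) / sizeof(mock_requests[0]);

	for (size_t i = 0; i < n; i++)
		count += mock_is_oversize(mock_requests[i], sheaf_capacity);
	return count;
}
```

With these sample requests a capacity of 32 leaves one request
oversize, while a capacity of 64 leaves none.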

CONFIG_SLUB_STATS counters are added for sheaf prefill and return
operations. A prefill or return is considered _fast when it is able to
grab or return a percpu spare sheaf (even if the sheaf needs a refill to
satisfy the request, as those should amortize over time), and _slow
otherwise (when the barn or even sheaf allocation/freeing has to be
involved). sheaf_prefill_oversize is provided to determine how many
prefills were oversize (a counter for oversize returns is not necessary,
as all oversize prefills result in oversize returns).
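
How the counters are classified can be modelled as below. This is a
simplified sketch: struct mock_cache, its has_spare flag and the
mock_* functions are hypothetical, and the barn and sheaf allocation
work of the slow paths is left out so that only the counting rules
remain.

```c
#include <stdbool.h>

enum mock_stat {
	PREFILL_FAST, PREFILL_SLOW, PREFILL_OVERSIZE,
	RETURN_FAST, RETURN_SLOW, NR_MOCK_STATS
};

/* Hypothetical cache model: one percpu spare slot plus the counters. */
struct mock_cache {
	unsigned int sheaf_capacity;
	bool has_spare;
	unsigned long stats[NR_MOCK_STATS];
};

static void mock_count_prefill(struct mock_cache *c, unsigned int size)
{
	if (size > c->sheaf_capacity) {
		c->stats[PREFILL_OVERSIZE]++;	/* neither fast nor slow */
		return;
	}
	if (c->has_spare) {
		c->has_spare = false;		/* grabbed the spare sheaf */
		c->stats[PREFILL_FAST]++;
	} else {
		c->stats[PREFILL_SLOW]++;	/* barn or fresh sheaf needed */
	}
}

static void mock_count_return(struct mock_cache *c, unsigned int capacity)
{
	if (capacity != c->sheaf_capacity)
		return;				/* oversize: freed, not counted */
	if (!c->has_spare) {
		c->has_spare = true;		/* became the spare sheaf */
		c->stats[RETURN_FAST]++;
	} else {
		c->stats[RETURN_SLOW]++;
	}
}

/* Run a fixed sequence of operations and report one counter. */
static unsigned long mock_stats_demo(enum mock_stat which)
{
	struct mock_cache c = { .sheaf_capacity = 32, .has_spare = true };

	mock_count_prefill(&c, 8);	/* spare available */
	mock_count_prefill(&c, 8);	/* spare already taken */
	mock_count_prefill(&c, 100);	/* exceeds sheaf_capacity */
	mock_count_return(&c, 32);	/* spare slot is empty */
	mock_count_return(&c, 32);	/* spare slot is occupied */
	mock_count_return(&c, 100);	/* oversize sheaf */
	mock_count_prefill(&c, 8);	/* spare available again */
	return c.stats[which];
}
```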

When slub_debug is enabled for a cache with sheaves, no percpu sheaves
exist for it, but the prefill functionality is still provided simply by
all prefilled sheaves becoming oversize. If percpu sheaves are not
created for a cache due to not passing the sheaf_capacity argument on
cache creation, the prefills also work through oversize sheaves, but
there's a WARN_ON_ONCE() to indicate the omission.
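
The condition under which the warning fires can be expressed as a
small predicate. The sketch below is a userspace model with a
hypothetical function name and plain boolean parameters standing in
for CONFIG_SLUB_TINY and the cache's debug flags; it mirrors the check
in the oversize branch of kmem_cache_prefill_sheaf().

```c
#include <stdbool.h>

/*
 * Model of the warning: only the oversize path checks it, and it is
 * silenced when sheaves are absent for an expected reason.
 */
static bool mock_prefill_warns(unsigned int sheaf_capacity, unsigned int size,
			       bool slub_tiny, bool slab_debug)
{
	if (size <= sheaf_capacity)
		return false;	/* regular path, never warns */

	return sheaf_capacity == 0 && !slub_tiny && !slab_debug;
}
```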

Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Vlastimil Babka 8 months ago 3c1ea5c5 ec66e0d5

+279

2 changed files

+16

include/linux/slab.h

@@ -829,6 +829,22 @@
 				   int node) __assume_slab_alignment __malloc;
 #define kmem_cache_alloc_node(...)	alloc_hooks(kmem_cache_alloc_node_noprof(__VA_ARGS__))

+struct slab_sheaf *
+kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size);
+
+int kmem_cache_refill_sheaf(struct kmem_cache *s, gfp_t gfp,
+		struct slab_sheaf **sheafp, unsigned int size);
+
+void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
+				       struct slab_sheaf *sheaf);
+
+void *kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *cachep, gfp_t gfp,
+			struct slab_sheaf *sheaf) __assume_slab_alignment __malloc;
+#define kmem_cache_alloc_from_sheaf(...)	\
+			alloc_hooks(kmem_cache_alloc_from_sheaf_noprof(__VA_ARGS__))
+
+unsigned int kmem_cache_sheaf_size(struct slab_sheaf *sheaf);
+
 /*
  * These macros allow declaring a kmem_buckets * parameter alongside size, which
  * can be compiled out with CONFIG_SLAB_BUCKETS=n so that a large number of call

+263

mm/slub.c

@@ -401,6 +401,11 @@
 	BARN_GET_FAIL,		/* Failed to get full sheaf from barn */
 	BARN_PUT,		/* Put full sheaf to barn */
 	BARN_PUT_FAIL,		/* Failed to put full sheaf to barn */
+	SHEAF_PREFILL_FAST,	/* Sheaf prefill grabbed the spare sheaf */
+	SHEAF_PREFILL_SLOW,	/* Sheaf prefill found no spare sheaf */
+	SHEAF_PREFILL_OVERSIZE,	/* Allocation of oversize sheaf for prefill */
+	SHEAF_RETURN_FAST,	/* Sheaf return reattached spare sheaf */
+	SHEAF_RETURN_SLOW,	/* Sheaf return could not reattach spare */
 	NR_SLUB_STAT_ITEMS
 };

@@ -462,6 +467,8 @@
 	union {
 		struct rcu_head rcu_head;
 		struct list_head barn_list;
+		/* only used for prefilled sheafs */
+		unsigned int capacity;
 	};
 	struct kmem_cache *cache;
 	unsigned int size;
@@ -2838,6 +2845,30 @@
 	spin_unlock_irqrestore(&barn->lock, flags);
 }

+static struct slab_sheaf *barn_get_full_or_empty_sheaf(struct node_barn *barn)
+{
+	struct slab_sheaf *sheaf = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&barn->lock, flags);
+
+	if (barn->nr_full) {
+		sheaf = list_first_entry(&barn->sheaves_full, struct slab_sheaf,
+					barn_list);
+		list_del(&sheaf->barn_list);
+		barn->nr_full--;
+	} else if (barn->nr_empty) {
+		sheaf = list_first_entry(&barn->sheaves_empty,
+					 struct slab_sheaf, barn_list);
+		list_del(&sheaf->barn_list);
+		barn->nr_empty--;
+	}
+
+	spin_unlock_irqrestore(&barn->lock, flags);
+
+	return sheaf;
+}
+
 /*
  * If a full sheaf is available, return it and put the supplied empty one to
  * barn. We ignore the limit on empty sheaves as the number of sheaves doesn't
@@ -5036,6 +5067,228 @@
 }
 EXPORT_SYMBOL(kmem_cache_alloc_node_noprof);

+/*
+ * returns a sheaf that has at least the requested size
+ * when prefilling is needed, do so with given gfp flags
+ *
+ * return NULL if sheaf allocation or prefilling failed
+ */
+struct slab_sheaf *
+kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size)
+{
+	struct slub_percpu_sheaves *pcs;
+	struct slab_sheaf *sheaf = NULL;
+
+	if (unlikely(size > s->sheaf_capacity)) {
+
+		/*
+		 * slab_debug disables cpu sheaves intentionally so all
+		 * prefilled sheaves become "oversize" and we give up on
+		 * performance for the debugging. Same with SLUB_TINY.
+		 * Creating a cache without sheaves and then requesting a
+		 * prefilled sheaf is however not expected, so warn.
+		 */
+		WARN_ON_ONCE(s->sheaf_capacity == 0 &&
+			     !IS_ENABLED(CONFIG_SLUB_TINY) &&
+			     !(s->flags & SLAB_DEBUG_FLAGS));
+
+		sheaf = kzalloc(struct_size(sheaf, objects, size), gfp);
+		if (!sheaf)
+			return NULL;
+
+		stat(s, SHEAF_PREFILL_OVERSIZE);
+		sheaf->cache = s;
+		sheaf->capacity = size;
+
+		if (!__kmem_cache_alloc_bulk(s, gfp, size,
+					     &sheaf->objects[0])) {
+			kfree(sheaf);
+			return NULL;
+		}
+
+		sheaf->size = size;
+
+		return sheaf;
+	}
+
+	local_lock(&s->cpu_sheaves->lock);
+	pcs = this_cpu_ptr(s->cpu_sheaves);
+
+	if (pcs->spare) {
+		sheaf = pcs->spare;
+		pcs->spare = NULL;
+		stat(s, SHEAF_PREFILL_FAST);
+	} else {
+		stat(s, SHEAF_PREFILL_SLOW);
+		sheaf = barn_get_full_or_empty_sheaf(get_barn(s));
+		if (sheaf && sheaf->size)
+			stat(s, BARN_GET);
+		else
+			stat(s, BARN_GET_FAIL);
+	}
+
+	local_unlock(&s->cpu_sheaves->lock);
+
+
+	if (!sheaf)
+		sheaf = alloc_empty_sheaf(s, gfp);
+
+	if (sheaf && sheaf->size < size) {
+		if (refill_sheaf(s, sheaf, gfp)) {
+			sheaf_flush_unused(s, sheaf);
+			free_empty_sheaf(s, sheaf);
+			sheaf = NULL;
+		}
+	}
+
+	if (sheaf)
+		sheaf->capacity = s->sheaf_capacity;
+
+	return sheaf;
+}
+
+/*
+ * Use this to return a sheaf obtained by kmem_cache_prefill_sheaf()
+ *
+ * If the sheaf cannot simply become the percpu spare sheaf, but there's space
+ * for a full sheaf in the barn, we try to refill the sheaf back to the cache's
+ * sheaf_capacity to avoid handling partially full sheaves.
+ *
+ * If the refill fails because gfp is e.g. GFP_NOWAIT, or the barn is full, the
+ * sheaf is instead flushed and freed.
+ */
+void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
+			     struct slab_sheaf *sheaf)
+{
+	struct slub_percpu_sheaves *pcs;
+	struct node_barn *barn;
+
+	if (unlikely(sheaf->capacity != s->sheaf_capacity)) {
+		sheaf_flush_unused(s, sheaf);
+		kfree(sheaf);
+		return;
+	}
+
+	local_lock(&s->cpu_sheaves->lock);
+	pcs = this_cpu_ptr(s->cpu_sheaves);
+	barn = get_barn(s);
+
+	if (!pcs->spare) {
+		pcs->spare = sheaf;
+		sheaf = NULL;
+		stat(s, SHEAF_RETURN_FAST);
+	}
+
+	local_unlock(&s->cpu_sheaves->lock);
+
+	if (!sheaf)
+		return;
+
+	stat(s, SHEAF_RETURN_SLOW);
+
+	/*
+	 * If the barn has too many full sheaves or we fail to refill the sheaf,
+	 * simply flush and free it.
+	 */
+	if (data_race(barn->nr_full) >= MAX_FULL_SHEAVES ||
+	    refill_sheaf(s, sheaf, gfp)) {
+		sheaf_flush_unused(s, sheaf);
+		free_empty_sheaf(s, sheaf);
+		return;
+	}
+
+	barn_put_full_sheaf(barn, sheaf);
+	stat(s, BARN_PUT);
+}
+
+/*
+ * refill a sheaf previously returned by kmem_cache_prefill_sheaf to at least
+ * the given size
+ *
+ * the sheaf might be replaced by a new one when requesting more than
+ * s->sheaf_capacity objects if such replacement is necessary, but the refill
+ * fails (returning -ENOMEM), the existing sheaf is left intact
+ *
+ * In practice we always refill to full sheaf's capacity.
+ */
+int kmem_cache_refill_sheaf(struct kmem_cache *s, gfp_t gfp,
+			    struct slab_sheaf **sheafp, unsigned int size)
+{
+	struct slab_sheaf *sheaf;
+
+	/*
+	 * TODO: do we want to support *sheaf == NULL to be equivalent of
+	 * kmem_cache_prefill_sheaf() ?
+	 */
+	if (!sheafp || !(*sheafp))
+		return -EINVAL;
+
+	sheaf = *sheafp;
+	if (sheaf->size >= size)
+		return 0;
+
+	if (likely(sheaf->capacity >= size)) {
+		if (likely(sheaf->capacity == s->sheaf_capacity))
+			return refill_sheaf(s, sheaf, gfp);
+
+		if (!__kmem_cache_alloc_bulk(s, gfp, sheaf->capacity - sheaf->size,
+					     &sheaf->objects[sheaf->size])) {
+			return -ENOMEM;
+		}
+		sheaf->size = sheaf->capacity;
+
+		return 0;
+	}
+
+	/*
+	 * We had a regular sized sheaf and need an oversize one, or we had an
+	 * oversize one already but need a larger one now.
+	 * This should be a very rare path so let's not complicate it.
+	 */
+	sheaf = kmem_cache_prefill_sheaf(s, gfp, size);
+	if (!sheaf)
+		return -ENOMEM;
+
+	kmem_cache_return_sheaf(s, gfp, *sheafp);
+	*sheafp = sheaf;
+	return 0;
+}
+
+/*
+ * Allocate from a sheaf obtained by kmem_cache_prefill_sheaf()
+ *
+ * Guaranteed not to fail as many allocations as was the requested size.
+ * After the sheaf is emptied, it fails - no fallback to the slab cache itself.
+ *
+ * The gfp parameter is meant only to specify __GFP_ZERO or __GFP_ACCOUNT
+ * memcg charging is forced over limit if necessary, to avoid failure.
+ */
+void *
+kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
+				   struct slab_sheaf *sheaf)
+{
+	void *ret = NULL;
+	bool init;
+
+	if (sheaf->size == 0)
+		goto out;
+
+	ret = sheaf->objects[--sheaf->size];
+
+	init = slab_want_init_on_alloc(gfp, s);
+
+	/* add __GFP_NOFAIL to force successful memcg charging */
+	slab_post_alloc_hook(s, NULL, gfp | __GFP_NOFAIL, 1, &ret, init, s->object_size);
+out:
+	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfp, NUMA_NO_NODE);
+
+	return ret;
+}
+
+unsigned int kmem_cache_sheaf_size(struct slab_sheaf *sheaf)
+{
+	return sheaf->size;
+}
 /*
  * To avoid unnecessary overhead, we pass through large allocation requests
  * directly to the page allocator. We use __GFP_COMP, because we will need to
@@ -8578,6 +8831,11 @@
 STAT_ATTR(BARN_GET_FAIL, barn_get_fail);
 STAT_ATTR(BARN_PUT, barn_put);
 STAT_ATTR(BARN_PUT_FAIL, barn_put_fail);
+STAT_ATTR(SHEAF_PREFILL_FAST, sheaf_prefill_fast);
+STAT_ATTR(SHEAF_PREFILL_SLOW, sheaf_prefill_slow);
+STAT_ATTR(SHEAF_PREFILL_OVERSIZE, sheaf_prefill_oversize);
+STAT_ATTR(SHEAF_RETURN_FAST, sheaf_return_fast);
+STAT_ATTR(SHEAF_RETURN_SLOW, sheaf_return_slow);
 #endif /* CONFIG_SLUB_STATS */

 #ifdef CONFIG_KFENCE
@@ -8678,6 +8936,11 @@
 	&barn_get_fail_attr.attr,
 	&barn_put_attr.attr,
 	&barn_put_fail_attr.attr,
+	&sheaf_prefill_fast_attr.attr,
+	&sheaf_prefill_slow_attr.attr,
+	&sheaf_prefill_oversize_attr.attr,
+	&sheaf_return_fast_attr.attr,
+	&sheaf_return_slow_attr.attr,
 #endif
 #ifdef CONFIG_FAILSLAB
 	&failslab_attr.attr,
