Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

io_uring/rsrc: get rid of per-ring io_rsrc_node list

Work in progress, but get rid of the per-ring serialization of resource
nodes, such as registered buffers and files. The main issue is that one
node can otherwise prevent a whole series of later nodes from being freed,
which is especially a problem for file resource nodes and networked
workloads where some descriptors may not see activity for a long time.

As an example, instantiate an io_uring ring fd and create a sparse
registered file table. Even 2 slots will do. Then create a socket and
register it as fixed file 0, F0. The number of open files in the app is
now 5, with 0/1/2 being the usual stdin/out/err, 3 being the ring fd, and
4 being the socket (e.g. "the listener"). Now add an operation on the
socket that uses slot 0. Finally, loop N times, where each iteration
creates a new socket, registers it as a fixed file, unregisters it, and
finally closes it. This is roughly what a basic accept loop would look
like.

At the end of this loop, it's not unreasonable to expect that there
would still be 5 open files. Each socket created and registered in the
loop is also unregistered and closed. But since the listener socket,
registered first, is still active and holds a reference to its resource
node, each subsequent socket unregistration is stuck behind it for
reclaim. Hence 5 + N files are still open at that point, where the N
loop sockets are all awaiting a final put that is held up by the
listener socket.
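To make the stall concrete, here is a small userspace model of the old reclaim order. It is not kernel code: old_ring, old_reap() and the other names are hypothetical, and the model only tracks reference counts and how many files are still open. It follows the removed io_rsrc_node_ref_zero() loop, which frees retired nodes strictly from the head of ctx->rsrc_ref_list and stops at the first node that is still referenced.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the pre-patch reclaim order.
 * Requests pin the ring's current "generation" node; unregistering a
 * file retires that node and starts a new one. Retired nodes are reaped
 * strictly in FIFO order, like the removed io_rsrc_node_ref_zero().
 */
#define MAX_GEN 64

struct old_node {
	int refs;        /* ring reference + in-flight request references */
	bool has_file;   /* a removed file waits on this node for its fput() */
};

struct old_ring {
	struct old_node gen[MAX_GEN];
	int head;        /* oldest retired node not yet reaped */
	int cur;         /* current generation node */
	int open_files;
};

static void old_reap(struct old_ring *r)
{
	/* Stop at the first retired node that is still referenced. */
	while (r->head < r->cur && r->gen[r->head].refs == 0) {
		if (r->gen[r->head].has_file)
			r->open_files--;        /* the deferred fput() */
		r->head++;
	}
}

static void old_remove_file(struct old_ring *r)
{
	int idx = r->cur;

	assert(idx + 1 < MAX_GEN);
	r->gen[idx].has_file = true;
	r->cur = idx + 1;
	r->gen[r->cur].refs = 1;        /* ring's ref on the new node */
	r->gen[r->cur].has_file = false;
	r->gen[idx].refs--;             /* drop ring's ref on the old one */
	old_reap(r);
}

/* The accept loop from the text; returns files still open at the end. */
static int old_accept_loop(int n, bool listener_done)
{
	static struct old_ring r;
	int i;

	r.head = r.cur = 0;
	r.gen[0].refs = 1;
	r.gen[0].has_file = false;
	r.open_files = 5;               /* stdin/out/err, ring fd, listener */
	r.gen[0].refs++;                /* the listener's pending operation */

	for (i = 0; i < n; i++) {
		r.open_files++;             /* socket() + register */
		/* unregister; close() then drops only the app's reference */
		old_remove_file(&r);
	}
	if (listener_done) {
		r.gen[0].refs--;            /* listener operation completes */
		old_reap(&r);
	}
	return r.open_files;
}
```

With the listener's operation still pending, old_accept_loop(10, false) returns 15 (5 + N); once that operation completes, the whole backlog drains and old_accept_loop(10, true) returns 5.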

Rewrite the io_rsrc_node handling to NOT rely on serialization. Struct
io_kiocb now gets explicit resource nodes assigned, with each holding a
reference to the parent node. A parent node is either of type FILE or
BUFFER, the only two node types that exist. A request can have two nodes
assigned, if it's using both registered files and buffers. Since request
issue and task_work completion both happen under the ring private lock,
no atomics are needed to handle these references; it's a simple
non-atomic inc/dec. As before, the registered buffer and file tables
each hold a reference to their registered nodes as well. The final put
of a node removes it and frees the underlying resource, e.g. unmapping
the buffer or putting the file.
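A rough userspace model of the new per-node lifetime follows. The names (rsrc_node, node_alloc(), node_put()) are hypothetical stand-ins, since the real helpers in io_uring/rsrc.h are not part of this diff. The counter is a plain int, matching the description above that every inc/dec happens under the ring lock; the model itself is single-threaded.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical model of the new per-node lifetime. Every node carries
 * its own count, so no node ever waits on another one.
 */
enum { RSRC_FILE, RSRC_BUFFER };

struct rsrc_node {
	int type;
	int refs;            /* plain int: all inc/dec run under one lock */
	int *open_files;     /* what the final put "closes" in this model */
};

static struct rsrc_node *node_alloc(int type, int *open_files)
{
	struct rsrc_node *node = calloc(1, sizeof(*node));

	if (!node)
		return NULL;
	node->type = type;
	node->refs = 1;      /* the registered table's reference */
	node->open_files = open_files;
	return node;
}

static void node_put(struct rsrc_node *node)
{
	if (!node)
		return;
	assert(node->refs > 0);
	if (--node->refs == 0) {
		(*node->open_files)--;   /* stands in for fput() or unmap */
		free(node);
	}
}

/* The accept loop again; returns files open at the end, or -1 on OOM. */
static int new_accept_loop(int n)
{
	int open_files = 5;          /* stdin/out/err, ring fd, listener */
	struct rsrc_node *listener = node_alloc(RSRC_FILE, &open_files);
	int i, result;

	if (!listener)
		return -1;
	listener->refs++;            /* the listener's pending operation */

	for (i = 0; i < n; i++) {
		struct rsrc_node *sock = node_alloc(RSRC_FILE, &open_files);

		if (!sock)
			break;
		open_files++;            /* socket() + register */
		node_put(sock);          /* unregister: nothing else pins it */
	}
	result = (i == n) ? open_files : -1;

	node_put(listener);          /* operation completes */
	node_put(listener);          /* table unregistered */
	return result;
}
```

Running the accept loop from the example in this model, new_accept_loop(10) returns 5: each loop socket's node is freed as soon as its own count drops to zero, regardless of the listener.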

Outside of removing the stall in resource reclaim described above, it
has the following advantages:

1) It's a lot simpler than the previous scheme, and easier to follow.
No need for specific quiesce handling anymore.

2) There are no resource node allocations in the fast path; all of that
happens at resource registration time.

3) The structs related to resource handling can all be simplified
quite a bit, like io_rsrc_node and io_rsrc_data. io_rsrc_put can
go away completely.

4) Handling of resource tags is much simpler, and doesn't require
separate persistent storage: each tag is copied in one-by-one at
registration time and assigned directly to its resource node.
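Point 4 can be sketched the same way. In this hypothetical model, the tag is copied into the node when it is registered and reported through a tiny completion array when the node's last reference goes away, loosely following how io_free_rsrc_node() in the rsrc.c diff below calls io_post_aux_cqe() for a non-zero tag. struct cq, register_rsrc() and put_rsrc() are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical model of per-node tags: the tag lives on the node and is
 * posted as a completion exactly once, on the final put.
 */
#define CQ_SIZE 16

struct cq {
	uint64_t user_data[CQ_SIZE];
	int nr;
};

struct tagged_node {
	int refs;
	uint64_t tag;        /* 0 means "untagged": no completion on free */
	struct cq *cq;
};

static struct tagged_node *register_rsrc(struct cq *cq, uint64_t tag)
{
	struct tagged_node *node = calloc(1, sizeof(*node));

	if (!node)
		return NULL;
	node->refs = 1;      /* the table's reference */
	node->tag = tag;     /* stored on the node, no separate tag table */
	node->cq = cq;
	return node;
}

static void put_rsrc(struct tagged_node *node)
{
	assert(node->refs > 0);
	if (--node->refs)
		return;
	/* Final put: report the tag, then free the node. */
	if (node->tag && node->cq->nr < CQ_SIZE)
		node->cq->user_data[node->cq->nr++] = node->tag;
	free(node);
}
```

A tagged node that is still pinned by a request when it is unregistered only produces its completion once that request drops its reference.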

The only real downside is that a request is now explicitly limited to
pinning 2 resources, one file and one buffer, where before just
assigning a resource node to a request would pin all of them. The upside
is that it's easier to follow now, as an individual resource is
explicitly referenced and assigned to the request.
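A minimal sketch of the two-slot limit: the diff replaces io_kiocb's single rsrc_node pointer with rsrc_nodes[2], indexed by IORING_RSRC_FILE and IORING_RSRC_BUFFER. The model below assumes those indices are 0 and 1, and that once a slot is filled, further nodes of the same type are not pinned again. The helpers only mirror the names io_req_assign_rsrc_node() and io_req_put_rsrc_nodes() from the diff; their bodies here are assumptions, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: one pinned node per type, per request. */
enum { RSRC_FILE = 0, RSRC_BUFFER = 1, RSRC_NR_TYPES = 2 };

struct rsrc_node {
	int type;
	int refs;
};

struct req {
	struct rsrc_node *rsrc_nodes[RSRC_NR_TYPES];
};

static void req_assign_rsrc_node(struct req *req, struct rsrc_node *node)
{
	/* Take a reference only the first time this type is used. */
	if (!req->rsrc_nodes[node->type]) {
		node->refs++;
		req->rsrc_nodes[node->type] = node;
	}
}

static void req_put_rsrc_nodes(struct req *req)
{
	int t;

	for (t = 0; t < RSRC_NR_TYPES; t++) {
		struct rsrc_node *node = req->rsrc_nodes[t];

		if (node) {
			assert(node->refs > 0);
			node->refs--;    /* freeing at zero is elided here */
			req->rsrc_nodes[t] = NULL;
		}
	}
}
```

A request using one registered file and one registered buffer therefore holds exactly two references, and dropping them on completion is a fixed two-slot loop.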

With this in place, the example above will be using exactly 5 files at
the end of the loop, not 5 + N.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

+272 -467
+3 -7
include/linux/io_uring_types.h
···
 };

 struct io_file_table {
-	struct io_fixed_file *files;
+	struct io_rsrc_node **nodes;
 	unsigned long *bitmap;
 	unsigned int alloc_hint;
 };
···
 		 * Fixed resources fast path, should be accessed only under
 		 * uring_lock, and updated through io_uring_register(2)
 		 */
-		struct io_rsrc_node *rsrc_node;
 		atomic_t cancel_seq;

 		/*
···
 		struct io_wq_work_list iopoll_list;

 		struct io_file_table file_table;
-		struct io_mapped_ubuf **user_bufs;
+		struct io_rsrc_node **user_bufs;
 		unsigned nr_user_files;
 		unsigned nr_user_bufs;

···
 	struct io_rsrc_data *buf_data;

 	/* protected by ->uring_lock */
-	struct list_head rsrc_ref_list;
 	struct io_alloc_cache rsrc_node_cache;
-	struct wait_queue_head rsrc_quiesce_wq;
-	unsigned rsrc_quiesce;

 	u32 pers_next;
 	struct xarray personalities;
···
 		__poll_t apoll_events;
 	};

-	struct io_rsrc_node *rsrc_node;
+	struct io_rsrc_node *rsrc_nodes[2];

 	atomic_t refs;
 	bool cancel_seq_set;
+1 -1
io_uring/fdinfo.c
···
 	}
 	seq_printf(m, "UserBufs:\t%u\n", ctx->nr_user_bufs);
 	for (i = 0; has_lock && i < ctx->nr_user_bufs; i++) {
-		struct io_mapped_ubuf *buf = ctx->user_bufs[i];
+		struct io_mapped_ubuf *buf = ctx->user_bufs[i]->buf;

 		seq_printf(m, "%5u: 0x%llx/%u\n", i, buf->ubuf, buf->len);
 	}
+19 -33
io_uring/filetable.c
···

 bool io_alloc_file_tables(struct io_file_table *table, unsigned nr_files)
 {
-	table->files = kvcalloc(nr_files, sizeof(table->files[0]),
-				GFP_KERNEL_ACCOUNT);
-	if (unlikely(!table->files))
+	table->nodes = kvmalloc_array(nr_files, sizeof(struct io_src_node *),
+					GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	if (unlikely(!table->nodes))
 		return false;

 	table->bitmap = bitmap_zalloc(nr_files, GFP_KERNEL_ACCOUNT);
 	if (unlikely(!table->bitmap)) {
-		kvfree(table->files);
+		kvfree(table->nodes);
 		return false;
 	}

···

 void io_free_file_tables(struct io_file_table *table)
 {
-	kvfree(table->files);
+	kvfree(table->nodes);
 	bitmap_free(table->bitmap);
-	table->files = NULL;
+	table->nodes = NULL;
 	table->bitmap = NULL;
 }

···
 				 u32 slot_index)
 	__must_hold(&req->ctx->uring_lock)
 {
-	struct io_fixed_file *file_slot;
-	int ret;
+	struct io_rsrc_node *node;

 	if (io_is_uring_fops(file))
 		return -EBADF;
···
 	if (slot_index >= ctx->nr_user_files)
 		return -EINVAL;

+	node = io_rsrc_node_alloc(ctx, IORING_RSRC_FILE);
+	if (!node)
+		return -ENOMEM;
+
 	slot_index = array_index_nospec(slot_index, ctx->nr_user_files);
-	file_slot = io_fixed_file_slot(&ctx->file_table, slot_index);
-
-	if (file_slot->file_ptr) {
-		ret = io_queue_rsrc_removal(ctx->file_data, slot_index,
-					    io_slot_file(file_slot));
-		if (ret)
-			return ret;
-
-		file_slot->file_ptr = 0;
-	} else {
+	if (ctx->file_table.nodes[slot_index])
+		io_put_rsrc_node(ctx->file_table.nodes[slot_index]);
+	else
 		io_file_bitmap_set(&ctx->file_table, slot_index);
-	}

-	*io_get_tag_slot(ctx->file_data, slot_index) = 0;
-	io_fixed_file_set(file_slot, file);
+	ctx->file_table.nodes[slot_index] = node;
+	io_fixed_file_set(node, file);
 	return 0;
 }

···

 int io_fixed_fd_remove(struct io_ring_ctx *ctx, unsigned int offset)
 {
-	struct io_fixed_file *file_slot;
-	int ret;
-
 	if (unlikely(!ctx->file_data))
 		return -ENXIO;
 	if (offset >= ctx->nr_user_files)
 		return -EINVAL;

 	offset = array_index_nospec(offset, ctx->nr_user_files);
-	file_slot = io_fixed_file_slot(&ctx->file_table, offset);
-	if (!file_slot->file_ptr)
+	if (!ctx->file_table.nodes[offset])
 		return -EBADF;
-
-	ret = io_queue_rsrc_removal(ctx->file_data, offset,
-				    io_slot_file(file_slot));
-	if (ret)
-		return ret;
-
-	file_slot->file_ptr = 0;
+	io_put_rsrc_node(ctx->file_table.nodes[offset]);
+	ctx->file_table.nodes[offset] = NULL;
 	io_file_bitmap_clear(&ctx->file_table, offset);
 	return 0;
 }
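The new install/remove flow in filetable.c boils down to a few pointer and bitmap operations. This userspace sketch uses hypothetical names (struct file_table, fd_install(), fd_remove()) and an int fd in place of struct file; it also omits the array_index_nospec() clamp the kernel applies to the slot index. As in the diff, the node is allocated before the slot is touched, so an allocation failure leaves the table unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical model of the new fixed-file install/remove flow: slots
 * hold node pointers, a bitmap tracks occupancy, and a replaced or
 * removed node is put right away instead of being queued for removal.
 */
#define NR_SLOTS 8

struct fnode {
	int refs;
	int fd;                  /* stands in for the struct file */
};

struct file_table {
	struct fnode *nodes[NR_SLOTS];
	unsigned long bitmap;    /* one bit per slot */
	int freed;               /* final puts seen, for the demo */
};

static void fnode_put(struct file_table *t, struct fnode *n)
{
	assert(n->refs > 0);
	if (--n->refs == 0) {
		t->freed++;
		free(n);
	}
}

static int fd_install(struct file_table *t, int fd, unsigned slot)
{
	struct fnode *n;

	if (slot >= NR_SLOTS)
		return -EINVAL;
	/* Allocate before touching the slot, so -ENOMEM changes nothing. */
	n = calloc(1, sizeof(*n));
	if (!n)
		return -ENOMEM;
	n->refs = 1;
	n->fd = fd;
	if (t->nodes[slot])
		fnode_put(t, t->nodes[slot]);   /* replace: drop old node now */
	else
		t->bitmap |= 1UL << slot;
	t->nodes[slot] = n;
	return 0;
}

static int fd_remove(struct file_table *t, unsigned slot)
{
	if (slot >= NR_SLOTS)
		return -EINVAL;
	if (!t->nodes[slot])
		return -EBADF;
	fnode_put(t, t->nodes[slot]);
	t->nodes[slot] = NULL;
	t->bitmap &= ~(1UL << slot);
	return 0;
}
```

If a request still holds a reference on the old node, the put in fd_install() or fd_remove() merely drops the table's reference, and the node is freed when that request completes.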
+12 -13
io_uring/filetable.h
···
 	table->alloc_hint = bit + 1;
 }

-static inline struct io_fixed_file *
-io_fixed_file_slot(struct io_file_table *table, unsigned i)
-{
-	return &table->files[i];
-}
-
 #define FFS_NOWAIT 0x1UL
 #define FFS_ISREG 0x2UL
 #define FFS_MASK ~(FFS_NOWAIT|FFS_ISREG)

-static inline unsigned int io_slot_flags(struct io_fixed_file *slot)
+static inline unsigned int io_slot_flags(struct io_rsrc_node *node)
 {
-	return (slot->file_ptr & ~FFS_MASK) << REQ_F_SUPPORT_NOWAIT_BIT;
+
+	return (node->file_ptr & ~FFS_MASK) << REQ_F_SUPPORT_NOWAIT_BIT;
 }

-static inline struct file *io_slot_file(struct io_fixed_file *slot)
+static inline struct file *io_slot_file(struct io_rsrc_node *node)
 {
-	return (struct file *)(slot->file_ptr & FFS_MASK);
+	return (struct file *)(node->file_ptr & FFS_MASK);
 }

 static inline struct file *io_file_from_index(struct io_file_table *table,
 					      int index)
 {
-	return io_slot_file(io_fixed_file_slot(table, index));
+	struct io_rsrc_node *node = table->nodes[index];
+
+	if (node)
+		return io_slot_file(node);
+	return NULL;
 }

-static inline void io_fixed_file_set(struct io_fixed_file *file_slot,
+static inline void io_fixed_file_set(struct io_rsrc_node *node,
 				     struct file *file)
 {
-	file_slot->file_ptr = (unsigned long)file |
+	node->file_ptr = (unsigned long)file |
 		(io_file_get_flags(file) >> REQ_F_SUPPORT_NOWAIT_BIT);
 }
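io_fixed_file_set() and io_slot_flags() pack two per-file flags into the low bits of the file pointer stored in the node. The sketch below reproduces that trick in userspace with a fake file type. The value of REQ_F_SUPPORT_NOWAIT_BIT is made up; the code only assumes, as the single shift in the diff implies, that the ISREG request flag sits one bit above the NOWAIT one.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of low-bit pointer tagging: objects aligned to at least 4
 * bytes leave bits 0-1 of their address free for two flags.
 * REQ_F_SUPPORT_NOWAIT_BIT's value is hypothetical.
 */
#define FFS_NOWAIT ((uintptr_t)0x1)
#define FFS_ISREG  ((uintptr_t)0x2)
#define FFS_MASK   (~(FFS_NOWAIT | FFS_ISREG))

#define REQ_F_SUPPORT_NOWAIT_BIT 10
#define REQ_F_SUPPORT_NOWAIT (1UL << REQ_F_SUPPORT_NOWAIT_BIT)
#define REQ_F_ISREG          (1UL << (REQ_F_SUPPORT_NOWAIT_BIT + 1))

struct fake_file { long payload; };   /* alignment of long: 4 or 8 */

static uintptr_t pack_file(struct fake_file *file, unsigned long req_flags)
{
	/* Only the two flags are packed; the mask keeps other bits out. */
	req_flags &= REQ_F_SUPPORT_NOWAIT | REQ_F_ISREG;
	assert(((uintptr_t)file & ~FFS_MASK) == 0);  /* low bits are free */
	return (uintptr_t)file | (req_flags >> REQ_F_SUPPORT_NOWAIT_BIT);
}

static struct fake_file *slot_file(uintptr_t ptr)
{
	return (struct fake_file *)(ptr & FFS_MASK);
}

static unsigned long slot_flags(uintptr_t ptr)
{
	/* Shift the packed bits back up into request-flag position. */
	return (unsigned long)(ptr & ~FFS_MASK) << REQ_F_SUPPORT_NOWAIT_BIT;
}
```

Because the flags live next to the pointer, a lookup can set both request flags and the file with a single load from the node.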
+12 -26
io_uring/io_uring.c
···
 	mutex_init(&ctx->uring_lock);
 	init_waitqueue_head(&ctx->cq_wait);
 	init_waitqueue_head(&ctx->poll_wq);
-	init_waitqueue_head(&ctx->rsrc_quiesce_wq);
 	spin_lock_init(&ctx->completion_lock);
 	spin_lock_init(&ctx->timeout_lock);
 	INIT_WQ_LIST(&ctx->iopoll_list);
···
 	INIT_LIST_HEAD(&ctx->defer_list);
 	INIT_LIST_HEAD(&ctx->timeout_list);
 	INIT_LIST_HEAD(&ctx->ltimeout_list);
-	INIT_LIST_HEAD(&ctx->rsrc_ref_list);
 	init_llist_head(&ctx->work_llist);
 	INIT_LIST_HEAD(&ctx->tctx_list);
 	ctx->submit_state.free_list.next = NULL;
···
 				io_clean_op(req);
 		}
 		io_put_file(req);
-		io_put_rsrc_node(ctx, req->rsrc_node);
+		io_req_put_rsrc_nodes(req);
 		io_put_task(req->task);

 		node = req->comp_list.next;
···
 				      unsigned int issue_flags)
 {
 	struct io_ring_ctx *ctx = req->ctx;
-	struct io_fixed_file *slot;
+	struct io_rsrc_node *node;
 	struct file *file = NULL;

 	io_ring_submit_lock(ctx, issue_flags);
···
 	if (unlikely((unsigned int)fd >= ctx->nr_user_files))
 		goto out;
 	fd = array_index_nospec(fd, ctx->nr_user_files);
-	slot = io_fixed_file_slot(&ctx->file_table, fd);
-	if (!req->rsrc_node)
-		__io_req_set_rsrc_node(req, ctx);
-	req->flags |= io_slot_flags(slot);
-	file = io_slot_file(slot);
+	node = ctx->file_table.nodes[fd];
+	if (node) {
+		io_req_assign_rsrc_node(req, node);
+		req->flags |= io_slot_flags(node);
+		file = io_slot_file(node);
+	}
 out:
 	io_ring_submit_unlock(ctx, issue_flags);
 	return file;
···
 	req->flags = (__force io_req_flags_t) sqe_flags;
 	req->cqe.user_data = READ_ONCE(sqe->user_data);
 	req->file = NULL;
-	req->rsrc_node = NULL;
+	req->rsrc_nodes[IORING_RSRC_FILE] = NULL;
+	req->rsrc_nodes[IORING_RSRC_BUFFER] = NULL;
 	req->task = current;
 	req->cancel_seq_set = false;

···
 static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 {
 	io_sq_thread_finish(ctx);
-	/* __io_rsrc_put_work() may need uring_lock to progress, wait w/o it */
-	if (WARN_ON_ONCE(!list_empty(&ctx->rsrc_ref_list)))
-		return;

 	mutex_lock(&ctx->uring_lock);
-	if (ctx->buf_data)
-		__io_sqe_buffers_unregister(ctx);
-	if (ctx->file_data)
-		__io_sqe_files_unregister(ctx);
+	io_sqe_buffers_unregister(ctx);
+	io_sqe_files_unregister(ctx);
 	io_cqring_overflow_kill(ctx);
 	io_eventfd_unregister(ctx);
 	io_alloc_cache_free(&ctx->apoll_cache, kfree);
···
 	if (ctx->submitter_task)
 		put_task_struct(ctx->submitter_task);

-	/* there are no registered resources left, nobody uses it */
-	if (ctx->rsrc_node)
-		io_rsrc_node_destroy(ctx, ctx->rsrc_node);
-
-	WARN_ON_ONCE(!list_empty(&ctx->rsrc_ref_list));
 	WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list));

 	io_alloc_cache_free(&ctx->rsrc_node_cache, kfree);
···
 	p->sq_off.array = (char *)ctx->sq_array - (char *)ctx->rings;

 	ret = io_sq_offload_create(ctx, p);
-	if (ret)
-		goto err;
-
-	ret = io_rsrc_init(ctx);
 	if (ret)
 		goto err;

+6 -5
io_uring/net.c
···

 	if (sr->flags & IORING_RECVSEND_FIXED_BUF) {
 		struct io_ring_ctx *ctx = req->ctx;
-		struct io_mapped_ubuf *imu;
+		struct io_rsrc_node *node;
 		int idx;

 		ret = -EFAULT;
 		io_ring_submit_lock(ctx, issue_flags);
 		if (sr->buf_index < ctx->nr_user_bufs) {
 			idx = array_index_nospec(sr->buf_index, ctx->nr_user_bufs);
-			imu = READ_ONCE(ctx->user_bufs[idx]);
-			io_req_set_rsrc_node(sr->notif, ctx);
+			node = ctx->user_bufs[idx];
+			io_req_assign_rsrc_node(sr->notif, node);
 			ret = 0;
 		}
 		io_ring_submit_unlock(ctx, issue_flags);
···
 		if (unlikely(ret))
 			return ret;

-		ret = io_import_fixed(ITER_SOURCE, &kmsg->msg.msg_iter, imu,
-					(u64)(uintptr_t)sr->buf, sr->len);
+		ret = io_import_fixed(ITER_SOURCE, &kmsg->msg.msg_iter,
+					node->buf, (u64)(uintptr_t)sr->buf,
+					sr->len);
 		if (unlikely(ret))
 			return ret;
 		kmsg->msg.sg_from_iter = io_sg_from_iter;
+3 -3
io_uring/nop.c
···
 	}
 	if (nop->flags & IORING_NOP_FIXED_BUFFER) {
 		struct io_ring_ctx *ctx = req->ctx;
-		struct io_mapped_ubuf *imu;
+		struct io_rsrc_node *node;
 		int idx;

 		ret = -EFAULT;
 		io_ring_submit_lock(ctx, issue_flags);
 		if (nop->buffer < ctx->nr_user_bufs) {
 			idx = array_index_nospec(nop->buffer, ctx->nr_user_bufs);
-			imu = READ_ONCE(ctx->user_bufs[idx]);
-			io_req_set_rsrc_node(req, ctx);
+			node = READ_ONCE(ctx->user_bufs[idx]);
+			io_req_assign_rsrc_node(req, node);
 			ret = 0;
 		}
 		io_ring_submit_unlock(ctx, issue_flags);
+2 -1
io_uring/notif.c
···
 	notif->file = NULL;
 	notif->task = current;
 	io_get_task_refs(1);
-	notif->rsrc_node = NULL;
+	notif->rsrc_nodes[IORING_RSRC_FILE] = NULL;
+	notif->rsrc_nodes[IORING_RSRC_BUFFER] = NULL;

 	nd = io_notif_to_data(notif);
 	nd->zc_report = false;
+180 -305
io_uring/rsrc.c
··· 26 26 u32 offset; 27 27 }; 28 28 29 - static void io_rsrc_buf_put(struct io_ring_ctx *ctx, struct io_rsrc_put *prsrc); 30 - static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov, 31 - struct io_mapped_ubuf **pimu, 32 - struct page **last_hpage); 29 + static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx, 30 + struct iovec *iov, struct page **last_hpage); 33 31 34 32 /* only define max */ 35 33 #define IORING_MAX_FIXED_FILES (1U << 20) ··· 108 110 return 0; 109 111 } 110 112 111 - static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_mapped_ubuf **slot) 113 + static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_rsrc_node *node) 112 114 { 113 - struct io_mapped_ubuf *imu = *slot; 114 115 unsigned int i; 115 116 116 - *slot = NULL; 117 - if (imu != &dummy_ubuf) { 117 + if (node->buf != &dummy_ubuf) { 118 + struct io_mapped_ubuf *imu = node->buf; 119 + 118 120 if (!refcount_dec_and_test(&imu->refs)) 119 121 return; 120 122 for (i = 0; i < imu->nr_bvecs; i++) ··· 125 127 } 126 128 } 127 129 128 - static void io_rsrc_put_work(struct io_rsrc_node *node) 130 + struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx, int type) 129 131 { 130 - struct io_rsrc_put *prsrc = &node->item; 132 + struct io_rsrc_node *node; 131 133 132 - if (prsrc->tag) 133 - io_post_aux_cqe(node->ctx, prsrc->tag, 0, 0); 134 - 135 - switch (node->type) { 136 - case IORING_RSRC_FILE: 137 - fput(prsrc->file); 138 - break; 139 - case IORING_RSRC_BUFFER: 140 - io_rsrc_buf_put(node->ctx, prsrc); 141 - break; 142 - default: 143 - WARN_ON_ONCE(1); 144 - break; 145 - } 146 - } 147 - 148 - void io_rsrc_node_destroy(struct io_ring_ctx *ctx, struct io_rsrc_node *node) 149 - { 150 - if (!io_alloc_cache_put(&ctx->rsrc_node_cache, node)) 151 - kfree(node); 152 - } 153 - 154 - void io_rsrc_node_ref_zero(struct io_rsrc_node *node) 155 - __must_hold(&node->ctx->uring_lock) 156 - { 157 - struct io_ring_ctx *ctx = node->ctx; 158 - 159 - while 
(!list_empty(&ctx->rsrc_ref_list)) { 160 - node = list_first_entry(&ctx->rsrc_ref_list, 161 - struct io_rsrc_node, node); 162 - /* recycle ref nodes in order */ 163 - if (node->refs) 164 - break; 165 - list_del(&node->node); 166 - 167 - if (likely(!node->empty)) 168 - io_rsrc_put_work(node); 169 - io_rsrc_node_destroy(ctx, node); 170 - } 171 - if (list_empty(&ctx->rsrc_ref_list) && unlikely(ctx->rsrc_quiesce)) 172 - wake_up_all(&ctx->rsrc_quiesce_wq); 173 - } 174 - 175 - struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx) 176 - { 177 - struct io_rsrc_node *ref_node; 178 - 179 - ref_node = io_alloc_cache_get(&ctx->rsrc_node_cache); 180 - if (!ref_node) { 181 - ref_node = kzalloc(sizeof(*ref_node), GFP_KERNEL); 182 - if (!ref_node) 134 + node = io_alloc_cache_get(&ctx->rsrc_node_cache); 135 + if (!node) { 136 + node = kzalloc(sizeof(*node), GFP_KERNEL); 137 + if (!node) 183 138 return NULL; 184 139 } 185 140 186 - ref_node->ctx = ctx; 187 - ref_node->empty = 0; 188 - ref_node->refs = 1; 189 - return ref_node; 190 - } 191 - 192 - __cold static int io_rsrc_ref_quiesce(struct io_rsrc_data *data, 193 - struct io_ring_ctx *ctx) 194 - { 195 - struct io_rsrc_node *backup; 196 - DEFINE_WAIT(we); 197 - int ret; 198 - 199 - /* As We may drop ->uring_lock, other task may have started quiesce */ 200 - if (data->quiesce) 201 - return -ENXIO; 202 - 203 - backup = io_rsrc_node_alloc(ctx); 204 - if (!backup) 205 - return -ENOMEM; 206 - ctx->rsrc_node->empty = true; 207 - ctx->rsrc_node->type = -1; 208 - list_add_tail(&ctx->rsrc_node->node, &ctx->rsrc_ref_list); 209 - io_put_rsrc_node(ctx, ctx->rsrc_node); 210 - ctx->rsrc_node = backup; 211 - 212 - if (list_empty(&ctx->rsrc_ref_list)) 213 - return 0; 214 - 215 - if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) { 216 - atomic_set(&ctx->cq_wait_nr, 1); 217 - smp_mb(); 218 - } 219 - 220 - ctx->rsrc_quiesce++; 221 - data->quiesce = true; 222 - do { 223 - prepare_to_wait(&ctx->rsrc_quiesce_wq, &we, TASK_INTERRUPTIBLE); 224 - 
mutex_unlock(&ctx->uring_lock); 225 - 226 - ret = io_run_task_work_sig(ctx); 227 - if (ret < 0) { 228 - finish_wait(&ctx->rsrc_quiesce_wq, &we); 229 - mutex_lock(&ctx->uring_lock); 230 - if (list_empty(&ctx->rsrc_ref_list)) 231 - ret = 0; 232 - break; 233 - } 234 - 235 - schedule(); 236 - mutex_lock(&ctx->uring_lock); 237 - ret = 0; 238 - } while (!list_empty(&ctx->rsrc_ref_list)); 239 - 240 - finish_wait(&ctx->rsrc_quiesce_wq, &we); 241 - data->quiesce = false; 242 - ctx->rsrc_quiesce--; 243 - 244 - if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) { 245 - atomic_set(&ctx->cq_wait_nr, 0); 246 - smp_mb(); 247 - } 248 - return ret; 249 - } 250 - 251 - static void io_free_page_table(void **table, size_t size) 252 - { 253 - unsigned i, nr_tables = DIV_ROUND_UP(size, PAGE_SIZE); 254 - 255 - for (i = 0; i < nr_tables; i++) 256 - kfree(table[i]); 257 - kfree(table); 141 + node->ctx = ctx; 142 + node->refs = 1; 143 + node->type = type; 144 + return node; 258 145 } 259 146 260 147 static void io_rsrc_data_free(struct io_rsrc_data *data) 261 148 { 262 - size_t size = data->nr * sizeof(data->tags[0][0]); 149 + int i; 263 150 264 - if (data->tags) 265 - io_free_page_table((void **)data->tags, size); 151 + for (i = 0; i < data->nr; i++) { 152 + struct io_rsrc_node *node = data->nodes[i]; 153 + 154 + io_put_rsrc_node(node); 155 + } 156 + kvfree(data->nodes); 266 157 kfree(data); 267 158 } 268 159 269 - static __cold void **io_alloc_page_table(size_t size) 270 - { 271 - unsigned i, nr_tables = DIV_ROUND_UP(size, PAGE_SIZE); 272 - size_t init_size = size; 273 - void **table; 274 - 275 - table = kcalloc(nr_tables, sizeof(*table), GFP_KERNEL_ACCOUNT); 276 - if (!table) 277 - return NULL; 278 - 279 - for (i = 0; i < nr_tables; i++) { 280 - unsigned int this_size = min_t(size_t, size, PAGE_SIZE); 281 - 282 - table[i] = kzalloc(this_size, GFP_KERNEL_ACCOUNT); 283 - if (!table[i]) { 284 - io_free_page_table(table, init_size); 285 - return NULL; 286 - } 287 - size -= this_size; 288 - } 289 - 
return table; 290 - } 291 - 292 - __cold static int io_rsrc_data_alloc(struct io_ring_ctx *ctx, int type, 293 - u64 __user *utags, 294 - unsigned nr, struct io_rsrc_data **pdata) 160 + __cold static int io_rsrc_data_alloc(struct io_ring_ctx *ctx, unsigned nr, 161 + struct io_rsrc_data **pdata) 295 162 { 296 163 struct io_rsrc_data *data; 297 - int ret = 0; 298 - unsigned i; 299 164 300 165 data = kzalloc(sizeof(*data), GFP_KERNEL); 301 166 if (!data) 302 167 return -ENOMEM; 303 - data->tags = (u64 **)io_alloc_page_table(nr * sizeof(data->tags[0][0])); 304 - if (!data->tags) { 168 + 169 + data->nodes = kvmalloc_array(nr, sizeof(struct io_rsrc_node *), 170 + GFP_KERNEL | __GFP_ZERO); 171 + if (!data->nodes) { 305 172 kfree(data); 306 173 return -ENOMEM; 307 174 } 308 175 309 176 data->nr = nr; 310 - data->ctx = ctx; 311 - data->rsrc_type = type; 312 - if (utags) { 313 - ret = -EFAULT; 314 - for (i = 0; i < nr; i++) { 315 - u64 *tag_slot = io_get_tag_slot(data, i); 316 - 317 - if (copy_from_user(tag_slot, &utags[i], 318 - sizeof(*tag_slot))) 319 - goto fail; 320 - } 321 - } 322 177 *pdata = data; 323 178 return 0; 324 - fail: 325 - io_rsrc_data_free(data); 326 - return ret; 327 179 } 328 180 329 181 static int __io_sqe_files_update(struct io_ring_ctx *ctx, ··· 182 334 { 183 335 u64 __user *tags = u64_to_user_ptr(up->tags); 184 336 __s32 __user *fds = u64_to_user_ptr(up->data); 185 - struct io_rsrc_data *data = ctx->file_data; 186 - struct io_fixed_file *file_slot; 187 337 int fd, i, err = 0; 188 338 unsigned int done; 189 339 ··· 206 360 continue; 207 361 208 362 i = array_index_nospec(up->offset + done, ctx->nr_user_files); 209 - file_slot = io_fixed_file_slot(&ctx->file_table, i); 210 - 211 - if (file_slot->file_ptr) { 212 - err = io_queue_rsrc_removal(data, i, 213 - io_slot_file(file_slot)); 214 - if (err) 215 - break; 216 - file_slot->file_ptr = 0; 363 + if (ctx->file_table.nodes[i]) { 364 + io_put_rsrc_node(ctx->file_table.nodes[i]); 365 + 
ctx->file_table.nodes[i] = NULL; 217 366 io_file_bitmap_clear(&ctx->file_table, i); 218 367 } 219 368 if (fd != -1) { 220 369 struct file *file = fget(fd); 370 + struct io_rsrc_node *node; 221 371 222 372 if (!file) { 223 373 err = -EBADF; ··· 227 385 err = -EBADF; 228 386 break; 229 387 } 230 - *io_get_tag_slot(data, i) = tag; 231 - io_fixed_file_set(file_slot, file); 388 + node = io_rsrc_node_alloc(ctx, IORING_RSRC_FILE); 389 + if (!node) { 390 + err = -ENOMEM; 391 + fput(file); 392 + break; 393 + } 394 + ctx->file_table.nodes[i] = node; 395 + node->tag = tag; 396 + io_fixed_file_set(node, file); 232 397 io_file_bitmap_set(&ctx->file_table, i); 233 398 } 234 399 } ··· 260 411 return -EINVAL; 261 412 262 413 for (done = 0; done < nr_args; done++) { 263 - struct io_mapped_ubuf *imu; 414 + struct io_rsrc_node *node; 264 415 u64 tag = 0; 265 416 266 417 uvec = u64_to_user_ptr(user_data); ··· 280 431 err = -EINVAL; 281 432 break; 282 433 } 283 - err = io_sqe_buffer_register(ctx, iov, &imu, &last_hpage); 284 - if (err) 285 - break; 286 - 287 434 i = array_index_nospec(up->offset + done, ctx->nr_user_bufs); 288 - if (ctx->user_bufs[i] != &dummy_ubuf) { 289 - err = io_queue_rsrc_removal(ctx->buf_data, i, 290 - ctx->user_bufs[i]); 291 - if (unlikely(err)) { 292 - io_buffer_unmap(ctx, &imu); 293 - break; 294 - } 295 - ctx->user_bufs[i] = (struct io_mapped_ubuf *)&dummy_ubuf; 435 + node = io_sqe_buffer_register(ctx, iov, &last_hpage); 436 + if (IS_ERR(node)) { 437 + err = PTR_ERR(node); 438 + break; 296 439 } 440 + io_put_rsrc_node(ctx->user_bufs[i]); 297 441 298 - ctx->user_bufs[i] = imu; 299 - *io_get_tag_slot(ctx->buf_data, i) = tag; 442 + ctx->user_bufs[i] = node; 443 + node->tag = tag; 300 444 if (ctx->compat) 301 445 user_data += sizeof(struct compat_iovec); 302 446 else ··· 464 622 return IOU_OK; 465 623 } 466 624 467 - int io_queue_rsrc_removal(struct io_rsrc_data *data, unsigned idx, void *rsrc) 625 + void io_free_rsrc_node(struct io_rsrc_node *node) 468 626 { 469 
- struct io_ring_ctx *ctx = data->ctx; 470 - struct io_rsrc_node *node = ctx->rsrc_node; 471 - u64 *tag_slot = io_get_tag_slot(data, idx); 627 + struct io_ring_ctx *ctx = node->ctx; 472 628 473 - ctx->rsrc_node = io_rsrc_node_alloc(ctx); 474 - if (unlikely(!ctx->rsrc_node)) { 475 - ctx->rsrc_node = node; 476 - return -ENOMEM; 629 + lockdep_assert_held(&ctx->uring_lock); 630 + 631 + if (node->tag) 632 + io_post_aux_cqe(node->ctx, node->tag, 0, 0); 633 + 634 + switch (node->type) { 635 + case IORING_RSRC_FILE: 636 + if (io_slot_file(node)) 637 + fput(io_slot_file(node)); 638 + break; 639 + case IORING_RSRC_BUFFER: 640 + if (node->buf) 641 + io_buffer_unmap(node->ctx, node); 642 + break; 643 + default: 644 + WARN_ON_ONCE(1); 645 + break; 477 646 } 478 647 479 - node->item.rsrc = rsrc; 480 - node->type = data->rsrc_type; 481 - node->item.tag = *tag_slot; 482 - *tag_slot = 0; 483 - list_add_tail(&node->node, &ctx->rsrc_ref_list); 484 - io_put_rsrc_node(ctx, node); 485 - return 0; 648 + if (!io_alloc_cache_put(&ctx->rsrc_node_cache, node)) 649 + kfree(node); 486 650 } 487 651 488 - void __io_sqe_files_unregister(struct io_ring_ctx *ctx) 652 + static void __io_sqe_files_unregister(struct io_ring_ctx *ctx) 489 653 { 490 654 int i; 491 655 492 - for (i = 0; i < ctx->nr_user_files; i++) { 493 - struct file *file = io_file_from_index(&ctx->file_table, i); 656 + lockdep_assert_held(&ctx->uring_lock); 494 657 495 - if (!file) 496 - continue; 497 - io_file_bitmap_clear(&ctx->file_table, i); 498 - fput(file); 658 + for (i = 0; i < ctx->nr_user_files; i++) { 659 + struct io_rsrc_node *node = ctx->file_table.nodes[i]; 660 + 661 + if (node) { 662 + io_put_rsrc_node(node); 663 + io_file_bitmap_clear(&ctx->file_table, i); 664 + ctx->file_table.nodes[i] = NULL; 665 + } 499 666 } 500 667 501 668 io_free_file_tables(&ctx->file_table); ··· 516 665 517 666 int io_sqe_files_unregister(struct io_ring_ctx *ctx) 518 667 { 519 - unsigned nr = ctx->nr_user_files; 520 - int ret; 521 - 522 668 if 
(!ctx->file_data) 523 669 return -ENXIO; 524 670 525 - /* 526 - * Quiesce may unlock ->uring_lock, and while it's not held 527 - * prevent new requests using the table. 528 - */ 529 - ctx->nr_user_files = 0; 530 - ret = io_rsrc_ref_quiesce(ctx->file_data, ctx); 531 - ctx->nr_user_files = nr; 532 - if (!ret) 533 - __io_sqe_files_unregister(ctx); 534 - return ret; 671 + __io_sqe_files_unregister(ctx); 672 + return 0; 535 673 } 536 674 537 675 int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg, ··· 539 699 return -EMFILE; 540 700 if (nr_args > rlimit(RLIMIT_NOFILE)) 541 701 return -EMFILE; 542 - ret = io_rsrc_data_alloc(ctx, IORING_RSRC_FILE, tags, nr_args, 543 - &ctx->file_data); 702 + ret = io_rsrc_data_alloc(ctx, nr_args, &ctx->file_data); 544 703 if (ret) 545 704 return ret; 546 705 ··· 550 711 } 551 712 552 713 for (i = 0; i < nr_args; i++, ctx->nr_user_files++) { 553 - struct io_fixed_file *file_slot; 714 + struct io_rsrc_node *node; 715 + u64 tag = 0; 554 716 555 - if (fds && copy_from_user(&fd, &fds[i], sizeof(fd))) { 556 - ret = -EFAULT; 717 + ret = -EFAULT; 718 + if (tags && copy_from_user(&tag, &tags[i], sizeof(tag))) 557 719 goto fail; 558 - } 720 + if (fds && copy_from_user(&fd, &fds[i], sizeof(fd))) 721 + goto fail; 559 722 /* allow sparse sets */ 560 723 if (!fds || fd == -1) { 561 724 ret = -EINVAL; 562 - if (unlikely(*io_get_tag_slot(ctx->file_data, i))) 725 + if (tag) 563 726 goto fail; 564 727 continue; 565 728 } ··· 578 737 fput(file); 579 738 goto fail; 580 739 } 581 - file_slot = io_fixed_file_slot(&ctx->file_table, i); 582 - io_fixed_file_set(file_slot, file); 740 + ret = -ENOMEM; 741 + node = io_rsrc_node_alloc(ctx, IORING_RSRC_FILE); 742 + if (!node) { 743 + fput(file); 744 + goto fail; 745 + } 746 + if (tag) 747 + node->tag = tag; 748 + ctx->file_table.nodes[i] = node; 749 + io_fixed_file_set(node, file); 583 750 io_file_bitmap_set(&ctx->file_table, i); 584 751 } 585 752 ··· 599 750 return ret; 600 751 } 601 752 602 - static 
void io_rsrc_buf_put(struct io_ring_ctx *ctx, struct io_rsrc_put *prsrc) 603 - { 604 - io_buffer_unmap(ctx, &prsrc->buf); 605 - prsrc->buf = NULL; 606 - } 607 - 608 - void __io_sqe_buffers_unregister(struct io_ring_ctx *ctx) 753 + static void __io_sqe_buffers_unregister(struct io_ring_ctx *ctx) 609 754 { 610 755 unsigned int i; 611 756 612 - for (i = 0; i < ctx->nr_user_bufs; i++) 613 - io_buffer_unmap(ctx, &ctx->user_bufs[i]); 614 - kfree(ctx->user_bufs); 615 - io_rsrc_data_free(ctx->buf_data); 757 + lockdep_assert_held(&ctx->uring_lock); 758 + 759 + for (i = 0; i < ctx->nr_user_bufs; i++) { 760 + io_put_rsrc_node(ctx->user_bufs[i]); 761 + ctx->user_bufs[i] = NULL; 762 + } 763 + kvfree(ctx->user_bufs); 616 764 ctx->user_bufs = NULL; 765 + io_rsrc_data_free(ctx->buf_data); 617 766 ctx->buf_data = NULL; 618 767 ctx->nr_user_bufs = 0; 619 768 } 620 769 621 770 int io_sqe_buffers_unregister(struct io_ring_ctx *ctx) 622 771 { 623 - unsigned nr = ctx->nr_user_bufs; 624 - int ret; 625 - 626 772 if (!ctx->buf_data) 627 773 return -ENXIO; 628 774 629 - /* 630 - * Quiesce may unlock ->uring_lock, and while it's not held 631 - * prevent new requests using the table. 
-	 */
-	ctx->nr_user_bufs = 0;
-	ret = io_rsrc_ref_quiesce(ctx->buf_data, ctx);
-	ctx->nr_user_bufs = nr;
-	if (!ret)
-		__io_sqe_buffers_unregister(ctx);
-	return ret;
+	__io_sqe_buffers_unregister(ctx);
+	return 0;
 }
 
 /*
···
 
 	/* check previously registered pages */
 	for (i = 0; i < ctx->nr_user_bufs; i++) {
-		struct io_mapped_ubuf *imu = ctx->user_bufs[i];
+		struct io_rsrc_node *node = ctx->user_bufs[i];
+		struct io_mapped_ubuf *imu = node->buf;
 
 		for (j = 0; j < imu->nr_bvecs; j++) {
 			if (!PageCompound(imu->bvec[j].bv_page))
···
 	return io_do_coalesce_buffer(pages, nr_pages, data, nr_folios);
 }
 
-static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
-				  struct io_mapped_ubuf **pimu,
-				  struct page **last_hpage)
+static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
+						   struct iovec *iov,
+						   struct page **last_hpage)
 {
 	struct io_mapped_ubuf *imu = NULL;
 	struct page **pages = NULL;
+	struct io_rsrc_node *node;
 	unsigned long off;
 	size_t size;
 	int ret, nr_pages, i;
 	struct io_imu_folio_data data;
 	bool coalesced;
 
-	*pimu = (struct io_mapped_ubuf *)&dummy_ubuf;
-	if (!iov->iov_base)
-		return 0;
+	node = io_rsrc_node_alloc(ctx, IORING_RSRC_BUFFER);
+	if (!node)
+		return ERR_PTR(-ENOMEM);
+	node->buf = NULL;
+
+	if (!iov->iov_base) {
+		node->buf = (struct io_mapped_ubuf *) &dummy_ubuf;
+		return node;
+	}
 
 	ret = -ENOMEM;
 	pages = io_pin_pages((unsigned long) iov->iov_base, iov->iov_len,
···
 	imu->folio_shift = data.folio_shift;
 	refcount_set(&imu->refs, 1);
 	off = (unsigned long) iov->iov_base & ((1UL << imu->folio_shift) - 1);
-	*pimu = imu;
+	node->buf = imu;
 	ret = 0;
 
 	for (i = 0; i < nr_pages; i++) {
···
 		size -= vec_len;
 	}
 done:
-	if (ret)
+	if (ret) {
 		kvfree(imu);
+		if (node)
+			io_put_rsrc_node(node);
+		node = ERR_PTR(ret);
+	}
 	kvfree(pages);
-	return ret;
+	return node;
 }
 
 static int io_buffers_map_alloc(struct io_ring_ctx *ctx, unsigned int nr_args)
···
 		return -EBUSY;
 	if (!nr_args || nr_args > IORING_MAX_REG_BUFFERS)
 		return -EINVAL;
-	ret = io_rsrc_data_alloc(ctx, IORING_RSRC_BUFFER, tags, nr_args, &data);
+	ret = io_rsrc_data_alloc(ctx, nr_args, &data);
 	if (ret)
 		return ret;
 	ret = io_buffers_map_alloc(ctx, nr_args);
···
 		memset(iov, 0, sizeof(*iov));
 
 	for (i = 0; i < nr_args; i++, ctx->nr_user_bufs++) {
+		struct io_rsrc_node *node;
+		u64 tag = 0;
+
 		if (arg) {
 			uvec = (struct iovec __user *) arg;
 			iov = iovec_from_user(uvec, 1, 1, &fast_iov, ctx->compat);
···
 			arg += sizeof(struct iovec);
 		}
 
-		if (!iov->iov_base && *io_get_tag_slot(data, i)) {
-			ret = -EINVAL;
-			break;
+		if (tags) {
+			if (copy_from_user(&tag, &tags[i], sizeof(tag))) {
+				ret = -EFAULT;
+				break;
+			}
+			if (tag && !iov->iov_base) {
+				ret = -EINVAL;
+				break;
+			}
 		}
 
-		ret = io_sqe_buffer_register(ctx, iov, &ctx->user_bufs[i],
-					     &last_hpage);
-		if (ret)
+		node = io_sqe_buffer_register(ctx, iov, &last_hpage);
+		if (IS_ERR(node)) {
+			ret = PTR_ERR(node);
 			break;
+		}
+		node->tag = tag;
+		ctx->user_bufs[i] = node;
 	}
 
 	WARN_ON_ONCE(ctx->buf_data);
···
 
 static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx)
 {
-	struct io_mapped_ubuf **user_bufs;
+	struct io_rsrc_node **user_bufs;
 	struct io_rsrc_data *data;
 	int i, ret, nbufs;
 
···
 	nbufs = src_ctx->nr_user_bufs;
 	if (!nbufs)
 		goto out_unlock;
-	ret = io_rsrc_data_alloc(ctx, IORING_RSRC_BUFFER, NULL, nbufs, &data);
+	ret = io_rsrc_data_alloc(ctx, nbufs, &data);
 	if (ret)
 		goto out_unlock;
 
 	ret = -ENOMEM;
-	user_bufs = kcalloc(nbufs, sizeof(*ctx->user_bufs), GFP_KERNEL);
+	user_bufs = kvmalloc_array(nbufs, sizeof(struct io_rsrc_node *),
+				   GFP_KERNEL | __GFP_ZERO);
 	if (!user_bufs)
 		goto out_free_data;
 
 	for (i = 0; i < nbufs; i++) {
-		struct io_mapped_ubuf *src = src_ctx->user_bufs[i];
+		struct io_mapped_ubuf *imu = src_ctx->user_bufs[i]->buf;
+		struct io_rsrc_node *dst_node;
 
-		if (src != &dummy_ubuf)
-			refcount_inc(&src->refs);
-		user_bufs[i] = src;
+		dst_node = io_rsrc_node_alloc(ctx, IORING_RSRC_BUFFER);
+		if (!dst_node)
+			goto out_put_free;
+
+		if (imu == &dummy_ubuf) {
+			dst_node->buf = (struct io_mapped_ubuf *) &dummy_ubuf;
+		} else {
+			refcount_inc(&imu->refs);
+			dst_node->buf = imu;
+		}
+		user_bufs[i] = dst_node;
 	}
 
 	/* Have a ref on the bufs now, drop src lock and re-grab our own lock */
···
 		return 0;
 	}
 
+	mutex_unlock(&ctx->uring_lock);
+	mutex_lock(&src_ctx->uring_lock);
 	/* someone raced setting up buffers, dump ours */
-	for (i = 0; i < nbufs; i++)
-		io_buffer_unmap(ctx, &user_bufs[i]);
-	io_rsrc_data_free(data);
-	kfree(user_bufs);
-	return -EBUSY;
+	ret = -EBUSY;
+	i = nbufs;
+out_put_free:
+	while (i--) {
+		io_buffer_unmap(src_ctx, user_bufs[i]);
+		kfree(user_bufs[i]);
+	}
+	kvfree(user_bufs);
 out_free_data:
 	io_rsrc_data_free(data);
 out_unlock:
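In io_clone_buffers() the destination ring gets a fresh io_rsrc_node for every slot, while the underlying io_mapped_ubuf is shared between rings and pinned with refcount_inc(&imu->refs). The following is a minimal userspace sketch of that two-level scheme, not kernel code. It assumes io_buffer_unmap() drops one mapping reference and releases the mapping on the last one (its body is not part of this diff), uses plain ints in place of refcount_t, and every model_* name, model_dummy_ubuf and the model_ubufs_freed counter are hypothetical.

```c
/*
 * Userspace model of io_clone_buffers(): per-ring nodes, shared mapping.
 * Plain ints stand in for refcount_t; all names are hypothetical.
 */
#include <assert.h>
#include <stdlib.h>

struct model_ubuf {
	int refs;			/* stands in for imu->refs (refcount_t) */
};

struct model_buf_node {
	int refs;			/* per-ring node refs, like io_rsrc_node */
	struct model_ubuf *buf;		/* shared mapping, like node->buf */
};

static struct model_ubuf model_dummy_ubuf;	/* like dummy_ubuf: sparse slot */
static int model_ubufs_freed;			/* observation hook only */

/* Registers a fresh mapping owned by a single node. */
static struct model_buf_node *model_register_buf(void)
{
	struct model_buf_node *node = calloc(1, sizeof(*node));
	struct model_ubuf *imu = calloc(1, sizeof(*imu));

	if (!node || !imu) {
		free(node);
		free(imu);
		return NULL;
	}
	imu->refs = 1;
	node->refs = 1;
	node->buf = imu;
	return node;
}

/* Assumed io_buffer_unmap() behaviour: drop one mapping ref. */
static void model_buffer_unmap(struct model_buf_node *node)
{
	struct model_ubuf *imu = node->buf;

	if (imu != &model_dummy_ubuf && !--imu->refs) {
		model_ubufs_freed++;
		free(imu);
	}
}

/* Drops a node ref; the last one also drops the node's mapping ref. */
static void model_put_buf_node(struct model_buf_node *node)
{
	if (node && !--node->refs) {
		model_buffer_unmap(node);
		free(node);
	}
}

/* Mirrors the clone loop body: new node, shared (or dummy) mapping. */
static struct model_buf_node *model_clone_node(const struct model_buf_node *src)
{
	struct model_buf_node *dst = calloc(1, sizeof(*dst));

	if (!dst)
		return NULL;
	dst->refs = 1;
	dst->buf = src->buf;
	if (src->buf != &model_dummy_ubuf)
		src->buf->refs++;
	return dst;
}
```

In this model, dropping the source ring's node leaves the clone's mapping intact; the mapping is released only when the last node that references it goes away, regardless of which ring owned that node.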
io_uring/rsrc.h (+17 -54)
···
 	IORING_RSRC_BUFFER		= 1,
 };
 
-struct io_rsrc_put {
-	u64 tag;
-	union {
-		void *rsrc;
-		struct file *file;
-		struct io_mapped_ubuf *buf;
-	};
-};
-
 struct io_rsrc_data {
-	struct io_ring_ctx		*ctx;
-
-	u64				**tags;
 	unsigned int			nr;
-	u16				rsrc_type;
-	bool				quiesce;
+	struct io_rsrc_node		**nodes;
 };
 
 struct io_rsrc_node {
 	struct io_ring_ctx		*ctx;
 	int				refs;
-	bool				empty;
 	u16				type;
-	struct list_head		node;
-	struct io_rsrc_put		item;
-};
 
-struct io_fixed_file {
-	/* file * with additional FFS_* flags */
-	unsigned long file_ptr;
+	u64 tag;
+	union {
+		unsigned long file_ptr;
+		struct io_mapped_ubuf *buf;
+	};
 };
 
 struct io_mapped_ubuf {
···
 	unsigned int	folio_shift;
 };
 
-void io_rsrc_node_ref_zero(struct io_rsrc_node *node);
-void io_rsrc_node_destroy(struct io_ring_ctx *ctx, struct io_rsrc_node *ref_node);
-struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx);
-int io_queue_rsrc_removal(struct io_rsrc_data *data, unsigned idx, void *rsrc);
+struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx, int type);
+void io_free_rsrc_node(struct io_rsrc_node *node);
 
 int io_import_fixed(int ddir, struct iov_iter *iter,
 			   struct io_mapped_ubuf *imu,
 			   u64 buf_addr, size_t len);
 
 int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg);
-void __io_sqe_buffers_unregister(struct io_ring_ctx *ctx);
 int io_sqe_buffers_unregister(struct io_ring_ctx *ctx);
 int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 			    unsigned int nr_args, u64 __user *tags);
-void __io_sqe_files_unregister(struct io_ring_ctx *ctx);
 int io_sqe_files_unregister(struct io_ring_ctx *ctx);
 int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
 			  unsigned nr_args, u64 __user *tags);
···
 int io_register_rsrc(struct io_ring_ctx *ctx, void __user *arg,
 			unsigned int size, unsigned int type);
 
-static inline void io_put_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
+static inline void io_put_rsrc_node(struct io_rsrc_node *node)
 {
-	lockdep_assert_held(&ctx->uring_lock);
-
 	if (node && !--node->refs)
-		io_rsrc_node_ref_zero(node);
+		io_free_rsrc_node(node);
 }
 
-static inline void __io_req_set_rsrc_node(struct io_kiocb *req,
-					  struct io_ring_ctx *ctx)
+static inline void io_req_put_rsrc_nodes(struct io_kiocb *req)
 {
-	lockdep_assert_held(&ctx->uring_lock);
-	req->rsrc_node = ctx->rsrc_node;
-	ctx->rsrc_node->refs++;
+	io_put_rsrc_node(req->rsrc_nodes[IORING_RSRC_FILE]);
+	io_put_rsrc_node(req->rsrc_nodes[IORING_RSRC_BUFFER]);
 }
 
-static inline void io_req_set_rsrc_node(struct io_kiocb *req,
-					struct io_ring_ctx *ctx)
+static inline void io_req_assign_rsrc_node(struct io_kiocb *req,
+					   struct io_rsrc_node *node)
 {
-	if (!req->rsrc_node)
-		__io_req_set_rsrc_node(req, ctx);
-}
-
-static inline u64 *io_get_tag_slot(struct io_rsrc_data *data, unsigned int idx)
-{
-	unsigned int off = idx & IO_RSRC_TAG_TABLE_MASK;
-	unsigned int table_idx = idx >> IO_RSRC_TAG_TABLE_SHIFT;
-
-	return &data->tags[table_idx][off];
-}
-
-static inline int io_rsrc_init(struct io_ring_ctx *ctx)
-{
-	ctx->rsrc_node = io_rsrc_node_alloc(ctx);
-	return ctx->rsrc_node ? 0 : -ENOMEM;
+	node->refs++;
+	req->rsrc_nodes[node->type] = node;
 }
 
 int io_files_update(struct io_kiocb *req, unsigned int issue_flags);
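The header changes replace the single ring-wide node with per-request references: io_req_assign_rsrc_node() bumps node->refs and stores the node in req->rsrc_nodes[node->type], io_req_put_rsrc_nodes() drops both slots, and io_put_rsrc_node() frees a node once its count reaches zero. Below is a minimal userspace sketch of that model, not kernel code. It assumes the table slot owns one initial reference (io_rsrc_node_alloc() itself is not shown in this diff), models no locking, and uses hypothetical names (model_node, model_req, model_accept_loop) plus a freed counter purely for observation. model_accept_loop() replays the accept-loop scenario from the commit message.

```c
/*
 * Userspace model of per-request io_rsrc_node references (sketch only).
 * Hypothetical names; no locking; refs starts at 1 for the table slot
 * (an assumption, since io_rsrc_node_alloc() is not shown in this diff).
 */
#include <assert.h>
#include <stdlib.h>

enum { MODEL_RSRC_FILE = 0, MODEL_RSRC_BUFFER = 1, MODEL_RSRC_NR };

struct model_node {
	int refs;		/* plain int, like io_rsrc_node->refs */
	int type;		/* MODEL_RSRC_FILE or MODEL_RSRC_BUFFER */
	int *freed_counter;	/* observation hook, not in the kernel struct */
};

struct model_req {
	struct model_node *rsrc_nodes[MODEL_RSRC_NR];
};

static struct model_node *model_node_alloc(int type, int *freed_counter)
{
	struct model_node *node = calloc(1, sizeof(*node));

	if (!node)
		return NULL;
	node->refs = 1;	/* reference owned by the registered table slot */
	node->type = type;
	node->freed_counter = freed_counter;
	return node;
}

/* Like io_put_rsrc_node(): drop one ref and free on the last one. */
static void model_put_node(struct model_node *node)
{
	if (node && !--node->refs) {
		(*node->freed_counter)++;
		free(node);
	}
}

/* Like io_req_assign_rsrc_node(): the request holds its own ref. */
static void model_req_assign(struct model_req *req, struct model_node *node)
{
	node->refs++;
	req->rsrc_nodes[node->type] = node;
}

/* Like io_req_put_rsrc_nodes(): release both slots on completion. */
static void model_req_put(struct model_req *req)
{
	model_put_node(req->rsrc_nodes[MODEL_RSRC_FILE]);
	model_put_node(req->rsrc_nodes[MODEL_RSRC_BUFFER]);
	req->rsrc_nodes[MODEL_RSRC_FILE] = NULL;
	req->rsrc_nodes[MODEL_RSRC_BUFFER] = NULL;
}

/*
 * Accept-loop scenario: the listener node stays pinned by an in-flight
 * request while n short-lived file nodes are registered (allocated) and
 * unregistered (table ref dropped). Returns how many nodes were freed
 * while the listener was still pinned, or -1 if the listener allocation
 * fails.
 */
static int model_accept_loop(int n)
{
	int freed = 0, freed_while_pinned;
	struct model_req listener_req = { { NULL, NULL } };
	struct model_node *listener = model_node_alloc(MODEL_RSRC_FILE, &freed);

	if (!listener)
		return -1;
	model_req_assign(&listener_req, listener);

	for (int i = 0; i < n; i++) {
		struct model_node *tmp = model_node_alloc(MODEL_RSRC_FILE, &freed);

		if (!tmp)
			break;
		model_put_node(tmp);	/* unregister: table drops its ref */
	}
	freed_while_pinned = freed;

	model_req_put(&listener_req);	/* the listener's request completes */
	model_put_node(listener);	/* listener unregistered last */
	return freed_while_pinned;
}
```

Here model_accept_loop(100) returns 100: each short-lived node is freed as soon as its own count reaches zero, even though the listener node is still held. That independence is what the per-ring list lacked, where, per the commit message, later nodes waited for reclaim behind an earlier node that was still referenced.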
io_uring/rw.c (+4 -4)
···
 {
 	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
 	struct io_ring_ctx *ctx = req->ctx;
-	struct io_mapped_ubuf *imu;
+	struct io_rsrc_node *node;
 	struct io_async_rw *io;
 	u16 index;
 	int ret;
···
 	if (unlikely(req->buf_index >= ctx->nr_user_bufs))
 		return -EFAULT;
 	index = array_index_nospec(req->buf_index, ctx->nr_user_bufs);
-	imu = ctx->user_bufs[index];
-	io_req_set_rsrc_node(req, ctx);
+	node = ctx->user_bufs[index];
+	io_req_assign_rsrc_node(req, node);
 
 	io = req->async_data;
-	ret = io_import_fixed(ddir, &io->iter, imu, rw->addr, rw->len);
+	ret = io_import_fixed(ddir, &io->iter, node->buf, rw->addr, rw->len);
 	iov_iter_save_state(&io->iter, &io->iter_state);
 	return ret;
 }
io_uring/splice.c (+9 -7)
···
 {
 	struct io_splice *sp = io_kiocb_to_cmd(req, struct io_splice);
 
-	io_put_rsrc_node(req->ctx, sp->rsrc_node);
+	io_put_rsrc_node(sp->rsrc_node);
 }
 
 static struct file *io_splice_get_file(struct io_kiocb *req,
···
 {
 	struct io_splice *sp = io_kiocb_to_cmd(req, struct io_splice);
 	struct io_ring_ctx *ctx = req->ctx;
-	struct io_fixed_file *slot;
+	struct io_rsrc_node *node;
 	struct file *file = NULL;
 
 	if (!(sp->flags & SPLICE_F_FD_IN_FIXED))
···
 	if (unlikely(sp->splice_fd_in >= ctx->nr_user_files))
 		goto out;
 	sp->splice_fd_in = array_index_nospec(sp->splice_fd_in, ctx->nr_user_files);
-	slot = &ctx->file_table.files[sp->splice_fd_in];
-	if (!req->rsrc_node)
-		__io_req_set_rsrc_node(req, ctx);
-	file = io_slot_file(slot);
-	req->flags |= REQ_F_NEED_CLEANUP;
+	node = ctx->file_table.nodes[sp->splice_fd_in];
+	if (node) {
+		node->refs++;
+		sp->rsrc_node = node;
+		file = io_slot_file(node);
+		req->flags |= REQ_F_NEED_CLEANUP;
+	}
 out:
 	io_ring_submit_unlock(ctx, issue_flags);
 	return file;
io_uring/uring_cmd.c (+4 -8)
···
 		 * being called. This prevents destruction of the mapped buffer
 		 * we'll need at actual import time.
 		 */
-		io_req_set_rsrc_node(req, ctx);
+		io_req_assign_rsrc_node(req, ctx->user_bufs[req->buf_index]);
 	}
 	ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);
 
···
 			      struct iov_iter *iter, void *ioucmd)
 {
 	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
-	struct io_ring_ctx *ctx = req->ctx;
+	struct io_rsrc_node *node = req->rsrc_nodes[IORING_RSRC_BUFFER];
 
 	/* Must have had rsrc_node assigned at prep time */
-	if (req->rsrc_node) {
-		struct io_mapped_ubuf *imu;
-
-		imu = READ_ONCE(ctx->user_bufs[req->buf_index]);
-		return io_import_fixed(rw, iter, imu, ubuf, len);
-	}
+	if (node)
+		return io_import_fixed(rw, iter, node->buf, ubuf, len);
 
 	return -EFAULT;
 }
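The rw.c and uring_cmd.c changes move the buffer lookup to prep time: the request takes a reference on the node currently in the slot, and io_import_fixed() later uses that node's ->buf instead of re-reading ctx->user_bufs[req->buf_index]. The sketch below models this in userspace under simplifying assumptions: an integer buf_id stands in for the mapped buffer, -1 stands in for -EFAULT, the table has a fixed size of four slots, and all pin_* names are hypothetical.

```c
/*
 * Userspace model of prep-time buffer pinning (sketch only). buf_id stands
 * in for node->buf, -1 stands in for -EFAULT; all names are hypothetical.
 */
#include <assert.h>
#include <stdlib.h>

#define PIN_SLOTS 4

struct pin_node {
	int refs;
	int buf_id;
};

struct pin_table {
	struct pin_node *slots[PIN_SLOTS];	/* like ctx->user_bufs[] */
};

struct pin_req {
	struct pin_node *buf_node;	/* like req->rsrc_nodes[IORING_RSRC_BUFFER] */
};

static struct pin_node *pin_node_new(int buf_id)
{
	struct pin_node *n = calloc(1, sizeof(*n));

	if (n) {
		n->refs = 1;	/* the table slot's reference */
		n->buf_id = buf_id;
	}
	return n;
}

static void pin_put(struct pin_node *n)
{
	if (n && !--n->refs)
		free(n);
}

/* Prep: take a ref on whatever node occupies the slot right now. */
static void pin_prep(struct pin_req *req, const struct pin_table *t,
		     unsigned int idx)
{
	req->buf_node = idx < PIN_SLOTS ? t->slots[idx] : NULL;
	if (req->buf_node)
		req->buf_node->refs++;
}

/* Slot update: the table drops only its own ref on the old node. */
static void pin_update(struct pin_table *t, unsigned int idx,
		       struct pin_node *node)
{
	if (idx >= PIN_SLOTS)
		return;	/* out of range: ignored in this sketch */
	pin_put(t->slots[idx]);
	t->slots[idx] = node;
}

/* Import: use the node pinned at prep; never re-read the table. */
static int pin_import(const struct pin_req *req)
{
	return req->buf_node ? req->buf_node->buf_id : -1;
}
```

In this model the request owns its own reference, so replacing the slot between prep and import neither frees nor swaps the buffer the request will import; the old node is released only when the request drops its reference.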