Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

io_uring/net: add provided buffer support for IORING_OP_SEND

It's pretty trivial to wire up provided buffer support for the send
side, just like how it's done on the receive side. This enables setting up
a buffer ring that an application can use to push pending sends to,
and then have a send pick a buffer from that ring.

One of the challenges with async IO and networking sends is that you
can get into reordering conditions if you have more than one inflight
at the same time. Consider the following scenario where everything is
fine:

1) App queues sendA for socket1
2) App queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, completes successfully, posts CQE
5) sendB is issued, completes successfully, posts CQE

All is fine. Requests are always issued in-order, and both complete
inline as most sends do.

However, if we're flooding socket1 with sends, the following could
also result from the same sequence:

1) App queues sendA for socket1
2) App queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, socket1 is full, poll is armed for retry
5) Space frees up in socket1, this triggers sendA retry via task_work
6) sendB is issued, completes successfully, posts CQE
7) sendA is retried, completes successfully, posts CQE

Now we've sent sendB before sendA, which can make things unhappy. If
both sendA and sendB had been using provided buffers, then it would look
as follows instead:

1) App queues dataA for sendA, queues sendA for socket1
2) App queues dataB for sendB, queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, socket1 is full, poll is armed for retry
5) Space frees up in socket1, this triggers sendA retry via task_work
6) sendB is issued, picks first buffer (dataA), completes successfully,
   posts CQE (which says "I sent dataA")
7) sendA is retried, picks first buffer (dataB), completes successfully,
   posts CQE (which says "I sent dataB")

Now we've sent the data in order, and everybody is happy.
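
The three sequences above can be modelled in plain user-space C. This is
only a sketch of the ordering argument, not kernel or liburing code: the
buf_ring struct, ring_add(), ring_pick() and simulate() are hypothetical
names invented for illustration, and the "socket" is just an output string.
It assumes the flooded case, where the requests reach the socket in the
order sendB, sendA.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 8

/* Hypothetical model of a provided buffer ring: a plain FIFO of payloads. */
struct buf_ring {
	const char *slot[RING_SLOTS];
	size_t head, tail;
};

/* Application side: queue outgoing data at the tail of the ring. */
static void ring_add(struct buf_ring *r, const char *data)
{
	assert(r->tail - r->head < RING_SLOTS);
	r->slot[r->tail++ % RING_SLOTS] = data;
}

/* Send side: take the oldest queued buffer, or NULL if the ring is empty. */
static const char *ring_pick(struct buf_ring *r)
{
	if (r->head == r->tail)
		return NULL;	/* the real send path returns -ENOBUFS here */
	return r->slot[r->head++ % RING_SLOTS];
}

/*
 * Model the flooded-socket sequence: sendA stalls, sendB completes first,
 * then sendA is retried. With use_ring == 0 each request is tied to its
 * own payload. With use_ring != 0 each request picks the oldest queued
 * buffer at the moment it actually sends. The wire order is written to
 * out as two characters plus a terminating NUL.
 */
void simulate(int use_ring, char out[3])
{
	const char *payload[2] = { "A", "B" };
	const int send_order[2] = { 1, 0 };	/* sendB first, then sendA */
	struct buf_ring ring = { { 0 }, 0, 0 };
	int i;

	ring_add(&ring, payload[0]);
	ring_add(&ring, payload[1]);

	for (i = 0; i < 2; i++) {
		const char *data = use_ring ? ring_pick(&ring)
					    : payload[send_order[i]];

		assert(data != NULL);
		out[i] = data[0];
	}
	out[2] = '\0';
}
```

Calling simulate(0, out) leaves "BA" in out, the reordered case, while
simulate(1, out) leaves "AB": the order in which requests complete no longer
decides the order of the data.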

It's worth noting that this also opens the door for supporting multishot
sends, as provided buffers would be a prerequisite for that. Those can
trigger either when new buffers are added to the outgoing ring, or (if
stalled due to lack of space) when space frees up in the socket.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

+21 -5
+20 -5
io_uring/net.c
···
 		kmsg->msg.msg_name = &kmsg->addr;
 		kmsg->msg.msg_namelen = sr->addr_len;
 	}
-	ret = import_ubuf(ITER_SOURCE, sr->buf, sr->len, &kmsg->msg.msg_iter);
-	if (unlikely(ret < 0))
-		return ret;
-
+	if (!io_do_buffer_select(req)) {
+		ret = import_ubuf(ITER_SOURCE, sr->buf, sr->len,
+				  &kmsg->msg.msg_iter);
+		if (unlikely(ret < 0))
+			return ret;
+	}
 	return 0;
 }
 
···
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 	struct io_async_msghdr *kmsg = req->async_data;
 	struct socket *sock;
+	unsigned int cflags;
 	unsigned flags;
 	int min_ret = 0;
 	int ret;
···
 	if (!(req->flags & REQ_F_POLLED) &&
 	    (sr->flags & IORING_RECVSEND_POLL_FIRST))
 		return -EAGAIN;
+
+	if (io_do_buffer_select(req)) {
+		size_t len = sr->len;
+		void __user *buf;
+
+		buf = io_buffer_select(req, &len, issue_flags);
+		if (unlikely(!buf))
+			return -ENOBUFS;
+		sr->buf = buf;
+		sr->len = len;
+	}
 
 	flags = sr->msg_flags;
 	if (issue_flags & IO_URING_F_NONBLOCK)
···
 	else if (sr->done_io)
 		ret = sr->done_io;
 	io_req_msg_cleanup(req, issue_flags);
-	io_req_set_res(req, ret, 0);
+	cflags = io_put_kbuf(req, issue_flags);
+	io_req_set_res(req, ret, cflags);
 	return IOU_OK;
 }
 
+1
io_uring/opdef.c
···
 		.pollout		= 1,
 		.audit_skip		= 1,
 		.ioprio			= 1,
+		.buffer_select		= 1,
 #if defined(CONFIG_NET)
 		.async_size		= sizeof(struct io_async_msghdr),
 		.prep			= io_sendmsg_prep,
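
With this change the send completion carries cflags instead of 0, so an
application can tell which provided buffer a given send consumed. The sketch
below shows how such flags could be decoded in user space. The two constants
are copied from the io_uring UAPI header as I understand it and are redefined
here only to keep the example self-contained. cqe_buffer_id() and
make_cqe_flags() are hypothetical helper names, and make_cqe_flags() is a
simplified model of the encoding, not the kernel's io_put_kbuf().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match include/uapi/linux/io_uring.h. */
#ifndef IORING_CQE_F_BUFFER
#define IORING_CQE_F_BUFFER	(1U << 0)
#endif
#ifndef IORING_CQE_BUFFER_SHIFT
#define IORING_CQE_BUFFER_SHIFT	16
#endif

/* Model of the reported flags: buffer flag bit plus the ID in the upper 16 bits. */
uint32_t make_cqe_flags(unsigned int bid)
{
	assert(bid <= 0xffff);	/* buffer IDs are 16 bits wide */
	return IORING_CQE_F_BUFFER | ((uint32_t)bid << IORING_CQE_BUFFER_SHIFT);
}

/* Returns the buffer ID carried in cqe_flags, or -1 if no buffer was consumed. */
int cqe_buffer_id(uint32_t cqe_flags)
{
	if (!(cqe_flags & IORING_CQE_F_BUFFER))
		return -1;
	return (int)(cqe_flags >> IORING_CQE_BUFFER_SHIFT);
}
```

In the reordering example from the commit message, this is what lets the CQE
for sendB say "I sent dataA": the application looks up the returned buffer ID
rather than assuming the request sent its own data.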