Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

io_uring/net: allow multishot receive per-invocation cap

If an application is handling multiple receive streams using recv
multishot, then the number of retries and the buffer peeking done for
multishot and bundles can process too much data per socket before moving
on. This isn't directly controllable by the application. By default,
io_uring will retry a recv MULTISHOT_MAX_RETRY (32) times if the socket
keeps having data to receive. And if bundles are used, each bundle peek
will potentially map up to PEEK_MAX_IMPORT (256) iovecs of data. Once
these limits are hit, a requeue operation is done, where the request is
retried after other pending requests have had a chance to execute.

Add support for capping the per-invocation receive length, before a
requeue condition is considered for each receive. This is done by setting
sqe->mshot_len to the byte value. For example, if this is set to 1024,
then each receive will be requeued once 1024 bytes have been received.
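
The interaction between the retry limit and the new cap can be sketched as a small stand-alone C model. This is not kernel code: `struct recv_model`, `should_requeue()` and `receives_before_requeue()` are hypothetical names, and the model assumes the socket always has more data and that every receive returns the same number of bytes. Only the two conditions are taken from the patch: the cap flag is set when a single receive returns at least mshot_len bytes, and a requeue is forced when the flag is set or the retry counter has reached MULTISHOT_MAX_RETRY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MULTISHOT_MAX_RETRY	32	/* value quoted in the commit message */

/* Hypothetical stand-in for the per-request receive state. */
struct recv_model {
	unsigned mshot_len;		/* per-invocation cap in bytes, 0 = no cap */
	unsigned nr_multishot_loops;	/* inline retries since the last requeue */
	bool cap_hit;			/* models IORING_RECV_MSHOT_CAP */
};

/*
 * Called after one receive that returned `ret` bytes while the socket
 * still has data. Returns true if the request should be requeued,
 * false if it may retry inline.
 */
static bool should_requeue(struct recv_model *sr, size_t ret)
{
	assert(sr != NULL);

	if (sr->mshot_len && ret >= sr->mshot_len)
		sr->cap_hit = true;

	if (sr->nr_multishot_loops++ < MULTISHOT_MAX_RETRY && !sr->cap_hit)
		return false;

	/* retries exceeded or cap reached: reset state and requeue */
	sr->nr_multishot_loops = 0;
	sr->cap_hit = false;
	return true;
}

/* Number of receives issued before the first requeue (assumed fixed chunk size). */
static unsigned receives_before_requeue(unsigned mshot_len, size_t chunk)
{
	struct recv_model sr = { .mshot_len = mshot_len };
	unsigned n = 1;

	while (!should_requeue(&sr, chunk))
		n++;
	return n;
}
```

In this model, an uncapped request (`mshot_len` of 0) performs 33 receives, the first plus 32 retries, before it is requeued. A request capped at 1024 bytes whose receives each return 1024 bytes is requeued after the first one.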

Link: https://lore.kernel.org/io-uring/20250709203420.1321689-4-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Jens Axboe 6a8afb9f 3919b695

+17 -6
io_uring/net.c
···
 	u16				flags;
 	/* initialised and used only by !msg send variants */
 	u16				buf_group;
+	unsigned			mshot_len;
 	void __user			*msg_control;
 	/* used only for send zerocopy */
 	struct io_kiocb			*notif;
···
 enum sr_retry_flags {
 	IORING_RECV_RETRY	= (1U << 15),
 	IORING_RECV_PARTIAL_MAP	= (1U << 14),
+	IORING_RECV_MSHOT_CAP	= (1U << 13),
 
 	IORING_RECV_RETRY_CLEAR	= IORING_RECV_RETRY | IORING_RECV_PARTIAL_MAP,
-	IORING_RECV_NO_RETRY	= IORING_RECV_RETRY | IORING_RECV_PARTIAL_MAP,
+	IORING_RECV_NO_RETRY	= IORING_RECV_RETRY | IORING_RECV_PARTIAL_MAP |
+				  IORING_RECV_MSHOT_CAP,
 };
 
 /*
···
 	req->flags &= ~REQ_F_BL_EMPTY;
 	sr->done_io = 0;
 	sr->flags &= ~IORING_RECV_RETRY_CLEAR;
-	sr->len = 0; /* get from the provided buffer */
+	sr->len = sr->mshot_len;
 }
 
 static int io_net_import_vec(struct io_kiocb *req, struct io_async_msghdr *iomsg,
···
 		sr->buf_group = req->buf_index;
 		req->buf_list = NULL;
 	}
+	sr->mshot_len = 0;
 	if (sr->flags & IORING_RECV_MULTISHOT) {
 		if (!(req->flags & REQ_F_BUFFER_SELECT))
 			return -EINVAL;
 		if (sr->msg_flags & MSG_WAITALL)
 			return -EINVAL;
-		if (req->opcode == IORING_OP_RECV && sr->len)
-			return -EINVAL;
+		if (req->opcode == IORING_OP_RECV)
+			sr->mshot_len = sr->len;
 		req->flags |= REQ_F_APOLL_MULTISHOT;
 	}
 	if (sr->flags & IORING_RECVSEND_BUNDLE) {
···
 				      issue_flags);
 		if (sr->flags & IORING_RECV_RETRY)
 			cflags = req->cqe.flags | (cflags & CQE_F_MASK);
+		if (sr->mshot_len && *ret >= sr->mshot_len)
+			sr->flags |= IORING_RECV_MSHOT_CAP;
 		/* bundle with no more immediate buffers, we're done */
 		if (req->flags & REQ_F_BL_EMPTY)
 			goto finish;
···
 		io_mshot_prep_retry(req, kmsg);
 		/* Known not-empty or unknown state, retry */
 		if (cflags & IORING_CQE_F_SOCK_NONEMPTY || kmsg->msg.msg_inq < 0) {
-			if (sr->nr_multishot_loops++ < MULTISHOT_MAX_RETRY)
+			if (sr->nr_multishot_loops++ < MULTISHOT_MAX_RETRY &&
+			    !(sr->flags & IORING_RECV_MSHOT_CAP)) {
 				return false;
+			}
 			/* mshot retries exceeded, force a requeue */
 			sr->nr_multishot_loops = 0;
+			sr->flags &= ~IORING_RECV_MSHOT_CAP;
 			if (issue_flags & IO_URING_F_MULTISHOT)
 				*ret = IOU_REQUEUE;
 		}
···
 			arg.mode |= KBUF_MODE_FREE;
 		}
 
-		if (kmsg->msg.msg_inq > 1)
+		if (*len)
+			arg.max_len = *len;
+		else if (kmsg->msg.msg_inq > 1)
 			arg.max_len = min_not_zero(*len, kmsg->msg.msg_inq);
 
 		ret = io_buffers_peek(req, &arg);
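
The last hunk changes how the length limit for a bundle buffer peek is chosen: a non-zero requested length now takes precedence over the socket's queued byte count. The precedence can be sketched as a stand-alone C model. `pick_max_len()` is a hypothetical helper, and `min_not_zero()` here is a local function written to behave like the kernel macro of the same name (the smaller of two values, ignoring zeros). What the limit holds when neither branch fires is not shown in the diff, so the model simply leaves it untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel's min_not_zero(): smaller of the two, ignoring zeros. */
static size_t min_not_zero(size_t x, size_t y)
{
	if (x == 0)
		return y;
	if (y == 0)
		return x;
	return x < y ? x : y;
}

/*
 * Hypothetical model of the patched selection:
 *  - a non-zero requested length (the cap) is used as-is,
 *  - otherwise, if more than one byte is known to be queued, that count is used,
 *  - otherwise *max_len keeps whatever value the caller initialised it with.
 */
static void pick_max_len(size_t len, long msg_inq, size_t *max_len)
{
	assert(max_len != NULL);

	if (len)
		*max_len = len;
	else if (msg_inq > 1)
		*max_len = min_not_zero(len, (size_t)msg_inq);
}
```

One consequence visible in the model: with a cap of 1024 and only 100 bytes queued, the limit is still 1024, because the queued count is consulted only when no length was requested.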