aio: fix reqs_available handling

As reported by Dan Aloni, commit f8567a3845ac ("aio: fix aio request
leak when events are reaped by userspace") introduces a regression when
user code attempts to perform io_submit() with more events than are
available in the ring buffer. Reverting that commit would reintroduce a
regression when user space event reaping is used.

Fixing this bug is a bit more involved than the previous attempts to fix
this regression. Since we do not have a single point at which we can
count events as being reaped by user space and io_getevents(), we have
to track event completion by looking at the number of events left in the
event ring. So long as there are as many events in the ring buffer as
there have been completion events generate, we cannot call
put_reqs_available(). The code to check for this is now placed in
refill_reqs_available().

A test program from Dan and modified by me for verifying this bug is available
at http://www.kvack.org/~bcrl/20140824-aio_bug.c .

Reported-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: Dan Aloni <dan@kernelim.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: stable@vger.kernel.org # v3.16 and anything that f8567a3845ac was backported to
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

authored by

Benjamin LaHaise and committed by

Linus Torvalds 12 years ago d856f32a 451fd722

+73 -4

1 changed file

expand all

aio.c

+73 -4

fs/aio.c

··· 141 141 142 142 struct { 143 143 unsigned tail; 144 + unsigned completed_events; 144 145 spinlock_t completion_lock; 145 146 } ____cacheline_aligned_in_smp; 146 147 ··· 858 857 return ret; 859 858 } 860 859 860 + /* refill_reqs_available 861 + * Updates the reqs_available reference counts used for tracking the 862 + * number of free slots in the completion ring. This can be called 863 + * from aio_complete() (to optimistically update reqs_available) or 864 + * from aio_get_req() (the we're out of events case). It must be 865 + * called holding ctx->completion_lock. 866 + */ 867 + static void refill_reqs_available(struct kioctx *ctx, unsigned head, 868 + unsigned tail) 869 + { 870 + unsigned events_in_ring, completed; 871 + 872 + /* Clamp head since userland can write to it. */ 873 + head %= ctx->nr_events; 874 + if (head <= tail) 875 + events_in_ring = tail - head; 876 + else 877 + events_in_ring = ctx->nr_events - (head - tail); 878 + 879 + completed = ctx->completed_events; 880 + if (events_in_ring < completed) 881 + completed -= events_in_ring; 882 + else 883 + completed = 0; 884 + 885 + if (!completed) 886 + return; 887 + 888 + ctx->completed_events -= completed; 889 + put_reqs_available(ctx, completed); 890 + } 891 + 892 + /* user_refill_reqs_available 893 + * Called to refill reqs_available when aio_get_req() encounters an 894 + * out of space in the completion ring. 895 + */ 896 + static void user_refill_reqs_available(struct kioctx *ctx) 897 + { 898 + spin_lock_irq(&ctx->completion_lock); 899 + if (ctx->completed_events) { 900 + struct aio_ring *ring; 901 + unsigned head; 902 + 903 + /* Access of ring->head may race with aio_read_events_ring() 904 + * here, but that's okay since whether we read the old version 905 + * or the new version, and either will be valid. The important 906 + * part is that head cannot pass tail since we prevent 907 + * aio_complete() from updating tail by holding 908 + * ctx->completion_lock. Even if head is invalid, the check 909 + * against ctx->completed_events below will make sure we do the 910 + * safe/right thing. 911 + */ 912 + ring = kmap_atomic(ctx->ring_pages[0]); 913 + head = ring->head; 914 + kunmap_atomic(ring); 915 + 916 + refill_reqs_available(ctx, head, ctx->tail); 917 + } 918 + 919 + spin_unlock_irq(&ctx->completion_lock); 920 + } 921 + 861 922 /* aio_get_req 862 923 * Allocate a slot for an aio request. 863 924 * Returns NULL if no requests are free. ··· 928 865 { 929 866 struct kiocb *req; 930 867 931 - if (!get_reqs_available(ctx)) 932 - return NULL; 868 + if (!get_reqs_available(ctx)) { 869 + user_refill_reqs_available(ctx); 870 + if (!get_reqs_available(ctx)) 871 + return NULL; 872 + } 933 873 934 874 req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL|__GFP_ZERO); 935 875 if (unlikely(!req)) ··· 991 925 struct kioctx *ctx = iocb->ki_ctx; 992 926 struct aio_ring *ring; 993 927 struct io_event *ev_page, *event; 928 + unsigned tail, pos, head; 994 929 unsigned long flags; 995 - unsigned tail, pos; 996 930 997 931 /* 998 932 * Special case handling for sync iocbs: ··· 1053 987 ctx->tail = tail; 1054 988 1055 989 ring = kmap_atomic(ctx->ring_pages[0]); 990 + head = ring->head; 1056 991 ring->tail = tail; 1057 992 kunmap_atomic(ring); 1058 993 flush_dcache_page(ctx->ring_pages[0]); 1059 994 995 + ctx->completed_events++; 996 + if (ctx->completed_events > 1) 997 + refill_reqs_available(ctx, head, tail); 1060 998 spin_unlock_irqrestore(&ctx->completion_lock, flags); 1061 999 1062 1000 pr_debug("added to ring %p at [%u]\n", iocb, tail); ··· 1075 1005 1076 1006 /* everything turned out well, dispose of the aiocb. */ 1077 1007 kiocb_free(iocb); 1078 - put_reqs_available(ctx, 1); 1079 1008 1080 1009 /* 1081 1010 * We have to order our ring_info tail store above and test

Configure Feed

Configure Feed