x86-64: Fix "bytes left to copy" return value for copy_from_user()

Most users by far do not care about the exact return value (they only
really care about whether the copy succeeded in its entirety or not),
but a few special core routines actually care deeply about exactly how
many bytes were copied from user space.

And the unrolled versions of the x86-64 user copy routines would
sometimes report that it had copied more bytes than it actually had.

Very few uses actually have partial copies to begin with, but to make
this bug even harder to trigger, most x86 CPU's use the "rep string"
instructions for normal user copies, and that version didn't have this
issue.

To make it even harder to hit, the one user of this that really cared
about the return value (and used the uncached version of the copy that
doesn't use the "rep string" instructions) was the generic write
routine, which pre-populated its source, once more hiding the problem by
avoiding the exception case that triggers the bug.

In other words, very special thanks to Bron Gondwana who not only
triggered this, but created a test-program to show it, and bisected the
behavior down to commit 08291429cfa6258c4cd95d8833beb40f828b194e ("mm:
fix pagecache write deadlocks") which changed the access pattern just
enough that you can now trigger it with 'writev()' with multiple
iovec's.

That commit itself was not the cause of the bug, it just allowed all the
stars to align just right that you could trigger the problem.

[ Side note: this is just the minimal fix to make the copy routines
(with __copy_from_user_inatomic_nocache as the particular version that
was involved in showing this) have the right return values.

We really should improve on the exceptional case further - to make the
copy do a byte-accurate copy up to the exact page limit that causes it
to fail. As it is, the callers have to do extra work to handle the
limit case gracefully. ]

Reported-by: Bron Gondwana <brong@fastmail.fm>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(which didn't have this problem), and since
most users that do the carethis was very hard to trigger, but

Linus Torvalds 18 years ago 42a886af 27eaf66b

+22 -28

2 changed files

expand all

arch

x86

lib

copy_user_64.S

copy_user_nocache_64.S

+11 -14

arch/x86/lib/copy_user_64.S

··· 217 217 /* table sorted by exception address */ 218 218 .section __ex_table,"a" 219 219 .align 8 220 - .quad .Ls1,.Ls1e 221 - .quad .Ls2,.Ls2e 222 - .quad .Ls3,.Ls3e 223 - .quad .Ls4,.Ls4e 224 - .quad .Ld1,.Ls1e 220 + .quad .Ls1,.Ls1e /* Ls1-Ls4 have copied zero bytes */ 221 + .quad .Ls2,.Ls1e 222 + .quad .Ls3,.Ls1e 223 + .quad .Ls4,.Ls1e 224 + .quad .Ld1,.Ls1e /* Ld1-Ld4 have copied 0-24 bytes */ 225 225 .quad .Ld2,.Ls2e 226 226 .quad .Ld3,.Ls3e 227 227 .quad .Ld4,.Ls4e 228 - .quad .Ls5,.Ls5e 229 - .quad .Ls6,.Ls6e 230 - .quad .Ls7,.Ls7e 231 - .quad .Ls8,.Ls8e 232 - .quad .Ld5,.Ls5e 228 + .quad .Ls5,.Ls5e /* Ls5-Ls8 have copied 32 bytes */ 229 + .quad .Ls6,.Ls5e 230 + .quad .Ls7,.Ls5e 231 + .quad .Ls8,.Ls5e 232 + .quad .Ld5,.Ls5e /* Ld5-Ld8 have copied 32-56 bytes */ 233 233 .quad .Ld6,.Ls6e 234 234 .quad .Ld7,.Ls7e 235 235 .quad .Ld8,.Ls8e ··· 244 244 .quad .Le5,.Le_zero 245 245 .previous 246 246 247 - /* compute 64-offset for main loop. 8 bytes accuracy with error on the 248 - pessimistic side. this is gross. it would be better to fix the 249 - interface. */ 250 247 /* eax: zero, ebx: 64 */ 251 - .Ls1e: addl $8,%eax 248 + .Ls1e: addl $8,%eax /* eax is bytes left uncopied within the loop (Ls1e: 64 .. Ls8e: 8) */ 252 249 .Ls2e: addl $8,%eax 253 250 .Ls3e: addl $8,%eax 254 251 .Ls4e: addl $8,%eax

+11 -14

arch/x86/lib/copy_user_nocache_64.S

··· 145 145 /* table sorted by exception address */ 146 146 .section __ex_table,"a" 147 147 .align 8 148 - .quad .Ls1,.Ls1e 149 - .quad .Ls2,.Ls2e 150 - .quad .Ls3,.Ls3e 151 - .quad .Ls4,.Ls4e 152 - .quad .Ld1,.Ls1e 148 + .quad .Ls1,.Ls1e /* .Ls[1-4] - 0 bytes copied */ 149 + .quad .Ls2,.Ls1e 150 + .quad .Ls3,.Ls1e 151 + .quad .Ls4,.Ls1e 152 + .quad .Ld1,.Ls1e /* .Ld[1-4] - 0..24 bytes coped */ 153 153 .quad .Ld2,.Ls2e 154 154 .quad .Ld3,.Ls3e 155 155 .quad .Ld4,.Ls4e 156 - .quad .Ls5,.Ls5e 157 - .quad .Ls6,.Ls6e 158 - .quad .Ls7,.Ls7e 159 - .quad .Ls8,.Ls8e 160 - .quad .Ld5,.Ls5e 156 + .quad .Ls5,.Ls5e /* .Ls[5-8] - 32 bytes copied */ 157 + .quad .Ls6,.Ls5e 158 + .quad .Ls7,.Ls5e 159 + .quad .Ls8,.Ls5e 160 + .quad .Ld5,.Ls5e /* .Ld[5-8] - 32..56 bytes copied */ 161 161 .quad .Ld6,.Ls6e 162 162 .quad .Ld7,.Ls7e 163 163 .quad .Ld8,.Ls8e ··· 172 172 .quad .Le5,.Le_zero 173 173 .previous 174 174 175 - /* compute 64-offset for main loop. 8 bytes accuracy with error on the 176 - pessimistic side. this is gross. it would be better to fix the 177 - interface. */ 178 175 /* eax: zero, ebx: 64 */ 179 - .Ls1e: addl $8,%eax 176 + .Ls1e: addl $8,%eax /* eax: bytes left uncopied: Ls1e: 64 .. Ls8e: 8 */ 180 177 .Ls2e: addl $8,%eax 181 178 .Ls3e: addl $8,%eax 182 179 .Ls4e: addl $8,%eax

Configure Feed

Configure Feed