Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

vfs: avoid duplicating creds in faccessat if possible

access(2) remains commonly used, for example on exec:
access("/etc/ld.so.preload", R_OK)

or when running gcc: strace -c gcc empty.c

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0        42        26 access
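
For reference, the two flavours of the check look like this from userspace. This is a minimal sketch; the helper names are hypothetical and only the standard access(2) and faccessat(2) calls are used. Plain access() checks against the real uid/gid, while faccessat() with AT_EACCESS checks against the effective ids.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: readability judged with the real uid/gid. */
static bool readable_as_real_ids(const char *path)
{
	return access(path, R_OK) == 0;
}

/* Hypothetical helper: readability judged with the effective uid/gid. */
static bool readable_as_effective_ids(const char *path)
{
	return faccessat(AT_FDCWD, path, R_OK, AT_EACCESS) == 0;
}
```

For a process that is not setuid/setgid both helpers are expected to agree, since real and effective ids are the same.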

access(2) ends up in do_faccessat without the AT_EACCESS flag, which in
turn allocates new creds in order to modify fsuid/fsgid and caps. This is
very expensive single-threaded and even more so multi-threaded, because
numerous structures get refed and then unrefed again when the new creds
are destroyed shortly afterwards.

Turns out for typical consumers the resulting creds would be identical
and this can be checked upfront, avoiding the hard work.
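
The upfront check can be modelled in plain userspace C. This is a simplified sketch, not kernel code: struct model_cred, model_need_override and MODEL_AT_EACCESS are hypothetical names, capabilities are flattened to a 64-bit mask, user namespaces are ignored (root is simply uid 0), and the no_setuid_fixup field stands in for issecure(SECURE_NO_SETUID_FIXUP).

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bit standing in for AT_EACCESS. */
#define MODEL_AT_EACCESS 0x200

/* Hypothetical, flattened stand-in for the kernel's struct cred. */
struct model_cred {
	unsigned int uid, gid;       /* real ids */
	unsigned int fsuid, fsgid;   /* ids used for filesystem checks */
	uint64_t cap_effective;      /* simplified capability masks */
	uint64_t cap_permitted;
	bool no_setuid_fixup;        /* models SECURE_NO_SETUID_FIXUP */
};

/* Returns true if temporary creds would differ from the current ones. */
static bool model_need_override(const struct model_cred *c, int flags)
{
	/* AT_EACCESS asks for the effective ids: nothing to override. */
	if (flags & MODEL_AT_EACCESS)
		return false;

	/* The override would set fsuid/fsgid to the real ids. */
	if (c->fsuid != c->uid || c->fsgid != c->gid)
		return true;

	if (!c->no_setuid_fixup) {
		/* Non-root: the override would clear effective caps. */
		if (c->uid != 0)
			return c->cap_effective != 0;
		/* Root: the override would set effective = permitted. */
		return c->cap_effective != c->cap_permitted;
	}

	return false;
}
```

An ordinary unprivileged process (ids all equal, no effective capabilities) takes the cheap path in this model, which is the typical consumer described above.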

An access benchmark plugged into will-it-scale running on Cascade Lake
shows:

test     proc  before   after
access1  1     1310582  2908735   (+121%)   # distinct files
access1  24    4716491  63822173  (+1253%)  # distinct files
access2  24    2378041  5370335   (+125%)   # same file

The above benchmarks are not integrated into will-it-scale, but can be
found in a pull request:

https://github.com/antonblanchard/will-it-scale/pull/36/files
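
A single-threaded approximation of such a measurement can be sketched as follows. This is not the will-it-scale test from the pull request; the helper name access_calls_per_sec is hypothetical and the loop only illustrates the idea of hammering access(2) on one path.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: call access(path, R_OK) `iters` times and return
 * the observed calls per second, or -1.0 if any call fails.
 */
static double access_calls_per_sec(const char *path, long iters)
{
	struct timespec start, end;
	double secs;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long i = 0; i < iters; i++) {
		if (access(path, R_OK) != 0)
			return -1.0;
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	secs = (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
	/* Guard against a zero interval on very coarse clocks. */
	return secs > 0.0 ? (double)iters / secs : 0.0;
}
```

Running one such loop per process, each on its own file, would roughly correspond to the "distinct files" rows above, while pointing all of them at one path would correspond to "same file".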

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Mateusz Guzik and committed by Linus Torvalds
981ee95c a4eecbae

+37 -1
fs/open.c
@@ -368,7 +368,37 @@
  * access() needs to use the real uid/gid, not the effective uid/gid.
  * We do this by temporarily clearing all FS-related capabilities and
  * switching the fsuid/fsgid around to the real ones.
+ *
+ * Creating new credentials is expensive, so we try to skip doing it,
+ * which we can if the result would match what we already got.
  */
+static bool access_need_override_creds(int flags)
+{
+	const struct cred *cred;
+
+	if (flags & AT_EACCESS)
+		return false;
+
+	cred = current_cred();
+	if (!uid_eq(cred->fsuid, cred->uid) ||
+	    !gid_eq(cred->fsgid, cred->gid))
+		return true;
+
+	if (!issecure(SECURE_NO_SETUID_FIXUP)) {
+		kuid_t root_uid = make_kuid(cred->user_ns, 0);
+		if (!uid_eq(cred->uid, root_uid)) {
+			if (!cap_isclear(cred->cap_effective))
+				return true;
+		} else {
+			if (!cap_isidentical(cred->cap_effective,
+			    cred->cap_permitted))
+				return true;
+		}
+	}
+
+	return false;
+}
+
 static const struct cred *access_override_creds(void)
 {
 	const struct cred *old_cred;
@@ -377,6 +407,12 @@
 	override_cred = prepare_creds();
 	if (!override_cred)
 		return NULL;
+
+	/*
+	 * XXX access_need_override_creds performs checks in hopes of skipping
+	 * this work. Make sure it stays in sync if making any changes in this
+	 * routine.
+	 */
 
 	override_cred->fsuid = override_cred->uid;
 	override_cred->fsgid = override_cred->gid;
@@ -437,7 +473,7 @@
 	if (flags & AT_EMPTY_PATH)
 		lookup_flags |= LOOKUP_EMPTY;
 
-	if (!(flags & AT_EACCESS)) {
+	if (access_need_override_creds(flags)) {
 		old_cred = access_override_creds();
 		if (!old_cred)
 			return -ENOMEM;