Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'landlock-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux

Pull landlock updates from Mickaël Salaün:

- extend Landlock to enforce restrictions on a whole process, similarly
to seccomp's TSYNC flag

- refactor data structures to simplify code and improve performance

- add documentation to cover missing parts

* tag 'landlock-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
mailmap: Add entry for Mickaël Salaün
landlock: Transpose the layer masks data structure
landlock: Add access_mask_subset() helper
selftests/landlock: Add filesystem access benchmark
landlock: Document audit blocker field format
landlock: Add errata documentation section
landlock: Add backwards compatibility for restrict flags
landlock: Refactor TCP socket type check
landlock: Minor reword of docs for TCP access rights
landlock: Document LANDLOCK_RESTRICT_SELF_TSYNC
selftests/landlock: Add LANDLOCK_RESTRICT_SELF_TSYNC tests
landlock: Multithreading support for landlock_restrict_self()

+1493 -406
+1
.mailmap
···
 Michel Lespinasse <michel@lespinasse.org>
 Michel Lespinasse <michel@lespinasse.org> <walken@google.com>
 Michel Lespinasse <michel@lespinasse.org> <walken@zoy.org>
+Mickaël Salaün <mic@digikod.net> <mic@linux.microsoft.com>
 Miguel Ojeda <ojeda@kernel.org> <miguel.ojeda.sandonis@gmail.com>
 Mike Rapoport <rppt@kernel.org> <mike@compulab.co.il>
 Mike Rapoport <rppt@kernel.org> <mike.rapoport@gmail.com>
+33 -2
Documentation/admin-guide/LSM/landlock.rst
··· 6 6 ================================ 7 7 8 8 :Author: Mickaël Salaün 9 - :Date: March 2025 9 + :Date: January 2026 10 10 11 11 Landlock can leverage the audit framework to log events. 12 12 ··· 37 37 38 38 domain=195ba459b blockers=fs.refer path="/usr/bin" dev="vda2" ino=351 39 39 domain=195ba459b blockers=fs.make_reg,fs.refer path="/usr/local" dev="vda2" ino=365 40 + 41 + 42 + The ``blockers`` field uses dot-separated prefixes to indicate the type of 43 + restriction that caused the denial: 44 + 45 + **fs.*** - Filesystem access rights (ABI 1+): 46 + - fs.execute, fs.write_file, fs.read_file, fs.read_dir 47 + - fs.remove_dir, fs.remove_file 48 + - fs.make_char, fs.make_dir, fs.make_reg, fs.make_sock 49 + - fs.make_fifo, fs.make_block, fs.make_sym 50 + - fs.refer (ABI 2+) 51 + - fs.truncate (ABI 3+) 52 + - fs.ioctl_dev (ABI 5+) 53 + 54 + **net.*** - Network access rights (ABI 4+): 55 + - net.bind_tcp - TCP port binding was denied 56 + - net.connect_tcp - TCP connection was denied 57 + 58 + **scope.*** - IPC scoping restrictions (ABI 6+): 59 + - scope.abstract_unix_socket - Abstract UNIX socket connection denied 60 + - scope.signal - Signal sending denied 61 + 62 + Multiple blockers can appear in a single event (comma-separated) when 63 + multiple access rights are missing. For example, creating a regular file 64 + in a directory that lacks both ``make_reg`` and ``refer`` rights would show 65 + ``blockers=fs.make_reg,fs.refer``. 66 + 67 + The object identification fields (path, dev, ino for filesystem; opid, 68 + ocomm for signals) depend on the type of access being blocked and provide 69 + context about what resource was involved in the denial. 70 + 40 71 41 72 AUDIT_LANDLOCK_DOMAIN 42 73 This record type describes the status of a Landlock domain. The ``status`` ··· 117 86 number following a timestamp (``msg=audit(1729738800.268:30)``). The first 118 87 event (serial ``30``) contains 4 records. 
The first record 119 88 (``type=LANDLOCK_ACCESS``) shows an access denied by the domain `1a6fdc66f`. 120 - The cause of this denial is signal scopping restriction 89 + The cause of this denial is signal scoping restriction 121 90 (``blockers=scope.signal``). The process that would have receive this signal 122 91 is the init process (``opid=1 ocomm="systemd"``). 123 92
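The ``blockers`` format documented above (comma-separated entries, each with an ``fs.``, ``net.`` or ``scope.`` prefix) is easy to classify in a userspace log consumer. The following is a minimal sketch only: `enum blocker_kind`, `has_prefix()` and `classify_blockers()` are hypothetical names invented for this illustration and are not part of Landlock or the audit tooling.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical categories, one per documented blocker prefix. */
enum blocker_kind {
	BLOCKER_FS = 1 << 0,      /* "fs.*" entries */
	BLOCKER_NET = 1 << 1,     /* "net.*" entries */
	BLOCKER_SCOPE = 1 << 2,   /* "scope.*" entries */
	BLOCKER_UNKNOWN = 1 << 3, /* anything else */
};

/* Returns non-zero if the first @len bytes of @tok start with @prefix. */
static int has_prefix(const char *tok, size_t len, const char *prefix)
{
	size_t plen = strlen(prefix);

	return len >= plen && strncmp(tok, prefix, plen) == 0;
}

/*
 * Takes the value of a blockers= field (for example "fs.make_reg,fs.refer")
 * and returns a bitmask of the blocker categories it contains.
 */
static unsigned int classify_blockers(const char *value)
{
	unsigned int kinds = 0;

	while (*value) {
		/* Length of the current comma-separated entry. */
		size_t len = strcspn(value, ",");

		if (has_prefix(value, len, "fs."))
			kinds |= BLOCKER_FS;
		else if (has_prefix(value, len, "net."))
			kinds |= BLOCKER_NET;
		else if (has_prefix(value, len, "scope."))
			kinds |= BLOCKER_SCOPE;
		else if (len > 0)
			kinds |= BLOCKER_UNKNOWN;

		value += len;
		if (*value == ',')
			value++;
	}
	return kinds;
}
```

Keeping an explicit "unknown" category means the sketch does not silently misfile blocker names that a later ABI might add.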
+93 -12
Documentation/userspace-api/landlock.rst
··· 8 8 ===================================== 9 9 10 10 :Author: Mickaël Salaün 11 - :Date: March 2025 11 + :Date: January 2026 12 12 13 13 The goal of Landlock is to enable restriction of ambient rights (e.g. global 14 14 filesystem or network access) for a set of processes. Because Landlock ··· 142 142 } 143 143 144 144 We can now add a new rule to this ruleset thanks to the returned file 145 - descriptor referring to this ruleset. The rule will only allow reading the 146 - file hierarchy ``/usr``. Without another rule, write actions would then be 147 - denied by the ruleset. To add ``/usr`` to the ruleset, we open it with the 148 - ``O_PATH`` flag and fill the &struct landlock_path_beneath_attr with this file 149 - descriptor. 145 + descriptor referring to this ruleset. The rule will allow reading and 146 + executing the file hierarchy ``/usr``. Without another rule, write actions 147 + would then be denied by the ruleset. To add ``/usr`` to the ruleset, we open 148 + it with the ``O_PATH`` flag and fill the &struct landlock_path_beneath_attr with 149 + this file descriptor. 150 150 151 151 .. code-block:: c 152 152 ··· 191 191 err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, 192 192 &net_port, 0); 193 193 194 + When passing a non-zero ``flags`` argument to ``landlock_restrict_self()``, a 195 + similar backwards compatibility check is needed for the restrict flags 196 + (see sys_landlock_restrict_self() documentation for available flags): 197 + 198 + .. code-block:: c 199 + 200 + __u32 restrict_flags = LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON; 201 + if (abi < 7) { 202 + /* Clear logging flags unsupported before ABI 7. */ 203 + restrict_flags &= ~(LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF | 204 + LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON | 205 + LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF); 206 + } 207 + 194 208 The next step is to restrict the current thread from gaining more privileges 195 209 (e.g. through a SUID binary). 
We now have a ruleset with the first rule 196 - allowing read access to ``/usr`` while denying all other handled accesses for 197 - the filesystem, and a second rule allowing HTTPS connections. 210 + allowing read and execute access to ``/usr`` while denying all other handled 211 + accesses for the filesystem, and a second rule allowing HTTPS connections. 198 212 199 213 .. code-block:: c 200 214 ··· 222 208 223 209 .. code-block:: c 224 210 225 - if (landlock_restrict_self(ruleset_fd, 0)) { 211 + if (landlock_restrict_self(ruleset_fd, restrict_flags)) { 226 212 perror("Failed to enforce ruleset"); 227 213 close(ruleset_fd); 228 214 return 1; ··· 445 431 printf("Landlock supports LANDLOCK_ACCESS_FS_REFER.\n"); 446 432 } 447 433 448 - The following kernel interfaces are implicitly supported by the first ABI 449 - version. Features only supported from a specific version are explicitly marked 450 - as such. 434 + All Landlock kernel interfaces are supported by the first ABI version unless 435 + explicitly noted in their documentation. 436 + 437 + Landlock errata 438 + --------------- 439 + 440 + In addition to ABI versions, Landlock provides an errata mechanism to track 441 + fixes for issues that may affect backwards compatibility or require userspace 442 + awareness. The errata bitmask can be queried using: 443 + 444 + .. code-block:: c 445 + 446 + int errata; 447 + 448 + errata = landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_ERRATA); 449 + if (errata < 0) { 450 + /* Landlock not available or disabled */ 451 + return 0; 452 + } 453 + 454 + The returned value is a bitmask where each bit represents a specific erratum. 455 + If bit N is set (``errata & (1 << (N - 1))``), then erratum N has been fixed 456 + in the running kernel. 457 + 458 + .. warning:: 459 + 460 + **Most applications should NOT check errata.** In 99.9% of cases, checking 461 + errata is unnecessary, increases code complexity, and can potentially 462 + decrease protection if misused. 
For example, disabling the sandbox when an 463 + erratum is not fixed could leave the system less secure than using 464 + Landlock's best-effort protection. When in doubt, ignore errata. 465 + 466 + .. kernel-doc:: security/landlock/errata/abi-4.h 467 + :doc: erratum_1 468 + 469 + .. kernel-doc:: security/landlock/errata/abi-6.h 470 + :doc: erratum_2 471 + 472 + .. kernel-doc:: security/landlock/errata/abi-1.h 473 + :doc: erratum_3 474 + 475 + How to check for errata 476 + ~~~~~~~~~~~~~~~~~~~~~~~ 477 + 478 + If you determine that your application needs to check for specific errata, 479 + use this pattern: 480 + 481 + .. code-block:: c 482 + 483 + int errata = landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_ERRATA); 484 + if (errata >= 0) { 485 + /* Check for specific erratum (1-indexed) */ 486 + if (errata & (1 << (erratum_number - 1))) { 487 + /* Erratum N is fixed in this kernel */ 488 + } else { 489 + /* Erratum N is NOT fixed - consider implications for your use case */ 490 + } 491 + } 492 + 493 + **Important:** Only check errata if your application specifically relies on 494 + behavior that changed due to the fix. The fixes generally make Landlock less 495 + restrictive or more correct, not more restrictive. 451 496 452 497 Kernel interface 453 498 ================ ··· 676 603 ``LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF`` flags passed to 677 604 sys_landlock_restrict_self(). See Documentation/admin-guide/LSM/landlock.rst 678 605 for more details on audit. 606 + 607 + Thread synchronization (ABI < 8) 608 + -------------------------------- 609 + 610 + Starting with the Landlock ABI version 8, it is now possible to 611 + enforce Landlock rulesets across all threads of the calling process 612 + using the ``LANDLOCK_RESTRICT_SELF_TSYNC`` flag passed to 613 + sys_landlock_restrict_self(). 679 614 680 615 .. _kernel_support: 681 616
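The documentation changes above describe two compatibility checks: masking restrict flags by ABI version, and testing a 1-indexed erratum bit. A minimal sketch of both follows, assuming the ABI thresholds stated in the text (logging flags from ABI 7, ``LANDLOCK_RESTRICT_SELF_TSYNC`` from ABI 8). The helpers `compat_restrict_flags()` and `erratum_is_fixed()` are hypothetical names; the flag values are copied from the header diff in this merge and are only defined here when the real UAPI header has not already provided them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as shown in include/uapi/linux/landlock.h in this merge. */
#ifndef LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF
#define LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF (1U << 0)
#define LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON (1U << 1)
#define LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF (1U << 2)
#endif
#ifndef LANDLOCK_RESTRICT_SELF_TSYNC
#define LANDLOCK_RESTRICT_SELF_TSYNC (1U << 3)
#endif

/* Hypothetical helper: clears restrict flags that @abi does not support. */
static uint32_t compat_restrict_flags(uint32_t flags, int abi)
{
	if (abi < 7) {
		/* Logging flags are documented as unsupported before ABI 7. */
		flags &= ~(LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF |
			   LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON |
			   LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF);
	}
	if (abi < 8) {
		/* TSYNC is documented as available starting with ABI 8. */
		flags &= ~LANDLOCK_RESTRICT_SELF_TSYNC;
	}
	return flags;
}

/*
 * Hypothetical helper: @errata is the value returned by
 * landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_ERRATA),
 * @n is the 1-indexed erratum number.
 */
static bool erratum_is_fixed(int errata, unsigned int n)
{
	if (errata < 0 || n < 1 || n > 31)
		return false;
	return ((unsigned int)errata & (1U << (n - 1))) != 0;
}
```

Silently dropping ``LANDLOCK_RESTRICT_SELF_TSYNC`` changes behavior: only the calling thread ends up restricted. An application that depends on process-wide enforcement may prefer to fail, or to restrict itself before creating threads, rather than use this best-effort masking.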
+22 -8
include/uapi/linux/landlock.h
··· 117 117 * future nested domains, not the one being created. It can also be used 118 118 * with a @ruleset_fd value of -1 to mute subdomain logs without creating a 119 119 * domain. 120 + * 121 + * The following flag supports policy enforcement in multithreaded processes: 122 + * 123 + * %LANDLOCK_RESTRICT_SELF_TSYNC 124 + * Applies the new Landlock configuration atomically to all threads of the 125 + * current process, including the Landlock domain and logging 126 + * configuration. This overrides the Landlock configuration of sibling 127 + * threads, irrespective of previously established Landlock domains and 128 + * logging configurations on these threads. 129 + * 130 + * If the calling thread is running with no_new_privs, this operation 131 + * enables no_new_privs on the sibling threads as well. 120 132 */ 121 133 /* clang-format off */ 122 134 #define LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF (1U << 0) 123 135 #define LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON (1U << 1) 124 136 #define LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF (1U << 2) 137 + #define LANDLOCK_RESTRICT_SELF_TSYNC (1U << 3) 125 138 /* clang-format on */ 126 139 127 140 /** ··· 195 182 * It should be noted that port 0 passed to :manpage:`bind(2)` will bind 196 183 * to an available port from the ephemeral port range. This can be 197 184 * configured with the ``/proc/sys/net/ipv4/ip_local_port_range`` sysctl 198 - * (also used for IPv6). 185 + * (also used for IPv6), and within that range, on a per-socket basis 186 + * with ``setsockopt(IP_LOCAL_PORT_RANGE)``. 199 187 * 200 - * A Landlock rule with port 0 and the ``LANDLOCK_ACCESS_NET_BIND_TCP`` 188 + * A Landlock rule with port 0 and the %LANDLOCK_ACCESS_NET_BIND_TCP 201 189 * right means that requesting to bind on port 0 is allowed and it will 202 - * automatically translate to binding on the related port range. 190 + * automatically translate to binding on a kernel-assigned ephemeral 191 + * port. 
203 192 */ 204 193 __u64 port; 205 194 }; ··· 344 329 * These flags enable to restrict a sandboxed process to a set of network 345 330 * actions. 346 331 * 347 - * This is supported since Landlock ABI version 4. 348 - * 349 332 * The following access rights apply to TCP port numbers: 350 333 * 351 - * - %LANDLOCK_ACCESS_NET_BIND_TCP: Bind a TCP socket to a local port. 352 - * - %LANDLOCK_ACCESS_NET_CONNECT_TCP: Connect an active TCP socket to 353 - * a remote port. 334 + * - %LANDLOCK_ACCESS_NET_BIND_TCP: Bind TCP sockets to the given local 335 + * port. Support added in Landlock ABI version 4. 336 + * - %LANDLOCK_ACCESS_NET_CONNECT_TCP: Connect TCP sockets to the given 337 + * remote port. Support added in Landlock ABI version 4. 354 338 */ 355 339 /* clang-format off */ 356 340 #define LANDLOCK_ACCESS_NET_BIND_TCP (1ULL << 0)
+9 -2
security/landlock/Makefile
···
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
-landlock-y := setup.o syscalls.o object.o ruleset.o \
-	cred.o task.o fs.o
+landlock-y := \
+	setup.o \
+	syscalls.o \
+	object.o \
+	ruleset.o \
+	cred.o \
+	task.o \
+	fs.o \
+	tsync.o
 
 landlock-$(CONFIG_INET) += net.o
+29 -6
security/landlock/access.h
···
 static_assert(sizeof(typeof_member(union access_masks_all, masks)) ==
 	      sizeof(typeof_member(union access_masks_all, all)));
 
-typedef u16 layer_mask_t;
-
-/* Makes sure all layers can be checked. */
-static_assert(BITS_PER_TYPE(layer_mask_t) >= LANDLOCK_MAX_NUM_LAYERS);
+/**
+ * struct layer_access_masks - A boolean matrix of layers and access rights
+ *
+ * This has a bit for each combination of layer numbers and access rights.
+ * During access checks, it is used to represent the access rights for each
+ * layer which still need to be fulfilled. When all bits are 0, the access
+ * request is considered to be fulfilled.
+ */
+struct layer_access_masks {
+	/**
+	 * @access: The unfulfilled access rights for each layer.
+	 */
+	access_mask_t access[LANDLOCK_MAX_NUM_LAYERS];
+};
 
 /*
- * Tracks domains responsible of a denied access. This is required to avoid
- * storing in each object the full layer_masks[] required by update_request().
+ * Tracks domains responsible of a denied access. This avoids storing in each
+ * object the full matrix of per-layer unfulfilled access rights, which is
+ * required by update_request().
+ *
+ * Each nibble represents the layer index of the newest layer which denied a
+ * certain access right. For file system access rights, the upper four bits are
+ * the index of the layer which denies LANDLOCK_ACCESS_FS_IOCTL_DEV and the
+ * lower nibble represents LANDLOCK_ACCESS_FS_TRUNCATE.
  */
 typedef u8 deny_masks_t;
 
···
 	access_masks.fs |= _LANDLOCK_ACCESS_FS_INITIALLY_DENIED;
 
 	return access_masks;
+}
+
+/* Checks the subset relation between access masks. */
+static inline bool access_mask_subset(access_mask_t subset,
+				      access_mask_t superset)
+{
+	return (subset | superset) == superset;
 }
 
 #endif /* _SECURITY_LANDLOCK_ACCESS_H */
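The transposed structure and the subset helper above can be tried outside the kernel. This is a userspace sketch under stated assumptions: `access_mask_t` is replaced by a plain `uint16_t`, `MAX_LAYERS` stands in for `LANDLOCK_MAX_NUM_LAYERS`, and `request_fulfilled()` is a hypothetical helper that spells out the "all bits are 0" rule from the structure's comment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LAYERS 16 /* stand-in for LANDLOCK_MAX_NUM_LAYERS */

typedef uint16_t access_mask_t; /* stand-in for the kernel typedef */

/* One mask of still-unfulfilled access rights per layer. */
struct layer_access_masks {
	access_mask_t access[MAX_LAYERS];
};

/* Same expression as the kernel helper: no bit of @subset is outside @superset. */
static inline bool access_mask_subset(access_mask_t subset,
				      access_mask_t superset)
{
	return (subset | superset) == superset;
}

/* Hypothetical helper: a request is fulfilled once no layer has bits left. */
static bool request_fulfilled(const struct layer_access_masks *masks)
{
	for (size_t i = 0; i < MAX_LAYERS; i++) {
		if (masks->access[i])
			return false;
	}
	return true;
}
```

With one word per layer rather than one word per access right, operations such as "clear everything that was not requested" become a single AND per layer, which is what `scope_to_request()` does in the `fs.c` changes of this merge.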
+25 -56
security/landlock/audit.c
··· 180 180 181 181 #endif /* CONFIG_SECURITY_LANDLOCK_KUNIT_TEST */ 182 182 183 + /* Get the youngest layer that denied the access_request. */ 183 184 static size_t get_denied_layer(const struct landlock_ruleset *const domain, 184 185 access_mask_t *const access_request, 185 - const layer_mask_t (*const layer_masks)[], 186 - const size_t layer_masks_size) 186 + const struct layer_access_masks *masks) 187 187 { 188 - const unsigned long access_req = *access_request; 189 - unsigned long access_bit; 190 - access_mask_t missing = 0; 191 - long youngest_layer = -1; 192 - 193 - for_each_set_bit(access_bit, &access_req, layer_masks_size) { 194 - const layer_mask_t mask = (*layer_masks)[access_bit]; 195 - long layer; 196 - 197 - if (!mask) 198 - continue; 199 - 200 - /* __fls(1) == 0 */ 201 - layer = __fls(mask); 202 - if (layer > youngest_layer) { 203 - youngest_layer = layer; 204 - missing = BIT(access_bit); 205 - } else if (layer == youngest_layer) { 206 - missing |= BIT(access_bit); 188 + for (ssize_t i = ARRAY_SIZE(masks->access) - 1; i >= 0; i--) { 189 + if (masks->access[i] & *access_request) { 190 + *access_request &= masks->access[i]; 191 + return i; 207 192 } 208 193 } 209 194 210 - *access_request = missing; 211 - if (youngest_layer == -1) 212 - return domain->num_layers - 1; 213 - 214 - return youngest_layer; 195 + /* Not found - fall back to default values */ 196 + *access_request = 0; 197 + return domain->num_layers - 1; 215 198 } 216 199 217 200 #ifdef CONFIG_SECURITY_LANDLOCK_KUNIT_TEST ··· 204 221 const struct landlock_ruleset dom = { 205 222 .num_layers = 5, 206 223 }; 207 - const layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = { 208 - [BIT_INDEX(LANDLOCK_ACCESS_FS_EXECUTE)] = BIT(0), 209 - [BIT_INDEX(LANDLOCK_ACCESS_FS_READ_FILE)] = BIT(1), 210 - [BIT_INDEX(LANDLOCK_ACCESS_FS_READ_DIR)] = BIT(1) | BIT(0), 211 - [BIT_INDEX(LANDLOCK_ACCESS_FS_REMOVE_DIR)] = BIT(2), 224 + const struct layer_access_masks masks = { 225 + .access[0] = 
LANDLOCK_ACCESS_FS_EXECUTE | 226 + LANDLOCK_ACCESS_FS_READ_DIR, 227 + .access[1] = LANDLOCK_ACCESS_FS_READ_FILE | 228 + LANDLOCK_ACCESS_FS_READ_DIR, 229 + .access[2] = LANDLOCK_ACCESS_FS_REMOVE_DIR, 212 230 }; 213 231 access_mask_t access; 214 232 215 233 access = LANDLOCK_ACCESS_FS_EXECUTE; 216 - KUNIT_EXPECT_EQ(test, 0, 217 - get_denied_layer(&dom, &access, &layer_masks, 218 - sizeof(layer_masks))); 234 + KUNIT_EXPECT_EQ(test, 0, get_denied_layer(&dom, &access, &masks)); 219 235 KUNIT_EXPECT_EQ(test, access, LANDLOCK_ACCESS_FS_EXECUTE); 220 236 221 237 access = LANDLOCK_ACCESS_FS_READ_FILE; 222 - KUNIT_EXPECT_EQ(test, 1, 223 - get_denied_layer(&dom, &access, &layer_masks, 224 - sizeof(layer_masks))); 238 + KUNIT_EXPECT_EQ(test, 1, get_denied_layer(&dom, &access, &masks)); 225 239 KUNIT_EXPECT_EQ(test, access, LANDLOCK_ACCESS_FS_READ_FILE); 226 240 227 241 access = LANDLOCK_ACCESS_FS_READ_DIR; 228 - KUNIT_EXPECT_EQ(test, 1, 229 - get_denied_layer(&dom, &access, &layer_masks, 230 - sizeof(layer_masks))); 242 + KUNIT_EXPECT_EQ(test, 1, get_denied_layer(&dom, &access, &masks)); 231 243 KUNIT_EXPECT_EQ(test, access, LANDLOCK_ACCESS_FS_READ_DIR); 232 244 233 245 access = LANDLOCK_ACCESS_FS_READ_FILE | LANDLOCK_ACCESS_FS_READ_DIR; 234 - KUNIT_EXPECT_EQ(test, 1, 235 - get_denied_layer(&dom, &access, &layer_masks, 236 - sizeof(layer_masks))); 246 + KUNIT_EXPECT_EQ(test, 1, get_denied_layer(&dom, &access, &masks)); 237 247 KUNIT_EXPECT_EQ(test, access, 238 248 LANDLOCK_ACCESS_FS_READ_FILE | 239 249 LANDLOCK_ACCESS_FS_READ_DIR); 240 250 241 251 access = LANDLOCK_ACCESS_FS_EXECUTE | LANDLOCK_ACCESS_FS_READ_DIR; 242 - KUNIT_EXPECT_EQ(test, 1, 243 - get_denied_layer(&dom, &access, &layer_masks, 244 - sizeof(layer_masks))); 252 + KUNIT_EXPECT_EQ(test, 1, get_denied_layer(&dom, &access, &masks)); 245 253 KUNIT_EXPECT_EQ(test, access, LANDLOCK_ACCESS_FS_READ_DIR); 246 254 247 255 access = LANDLOCK_ACCESS_FS_WRITE_FILE; 248 - KUNIT_EXPECT_EQ(test, 4, 249 - get_denied_layer(&dom, 
&access, &layer_masks, 250 - sizeof(layer_masks))); 256 + KUNIT_EXPECT_EQ(test, 4, get_denied_layer(&dom, &access, &masks)); 251 257 KUNIT_EXPECT_EQ(test, access, 0); 252 258 } 253 259 ··· 342 370 return false; 343 371 } 344 372 345 - if (WARN_ON_ONCE(!!request->layer_masks ^ !!request->layer_masks_size)) 346 - return false; 347 - 348 373 if (request->deny_masks) { 349 374 if (WARN_ON_ONCE(!request->all_existing_optional_access)) 350 375 return false; ··· 375 406 if (missing) { 376 407 /* Gets the nearest domain that denies the request. */ 377 408 if (request->layer_masks) { 378 - youngest_layer = get_denied_layer( 379 - subject->domain, &missing, request->layer_masks, 380 - request->layer_masks_size); 409 + youngest_layer = get_denied_layer(subject->domain, 410 + &missing, 411 + request->layer_masks); 381 412 } else { 382 413 youngest_layer = get_layer_from_deny_masks( 383 - &missing, request->all_existing_optional_access, 414 + &missing, _LANDLOCK_ACCESS_FS_OPTIONAL, 384 415 request->deny_masks); 385 416 } 386 417 youngest_denied =
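The rewritten `get_denied_layer()` walks layers from youngest to oldest and stops at the first one that still denies part of the request. Below is a standalone sketch of that loop with the same expectations as the KUnit test above. It makes several assumptions: `uint16_t` stands in for `access_mask_t`, the `FS_*` constants mirror the low bits of the `LANDLOCK_ACCESS_FS_*` UAPI values, and the domain is reduced to its `num_layers` count. `youngest_denying_layer()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LAYERS 16 /* stand-in for LANDLOCK_MAX_NUM_LAYERS */

typedef uint16_t access_mask_t; /* stand-in for the kernel typedef */

/* Stand-ins for the first LANDLOCK_ACCESS_FS_* bits. */
#define FS_EXECUTE (1U << 0)
#define FS_WRITE_FILE (1U << 1)
#define FS_READ_FILE (1U << 2)
#define FS_READ_DIR (1U << 3)
#define FS_REMOVE_DIR (1U << 4)

struct layer_access_masks {
	access_mask_t access[MAX_LAYERS];
};

/*
 * Returns the index of the youngest layer denying any right in @request and
 * narrows @request to the rights denied by that layer. If no layer denies
 * anything, clears @request and falls back to the last layer of the domain.
 */
static size_t youngest_denying_layer(size_t num_layers, access_mask_t *request,
				     const struct layer_access_masks *masks)
{
	for (size_t i = MAX_LAYERS; i-- > 0;) {
		if (masks->access[i] & *request) {
			*request &= masks->access[i];
			return i;
		}
	}

	*request = 0;
	return num_layers - 1;
}
```

The countdown loop uses an unsigned index (`i-- > 0`) instead of the kernel's `ssize_t`, purely to keep the sketch within standard C headers.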
+1 -2
security/landlock/audit.h
···
 	access_mask_t access;
 
 	/* Required fields for requests with layer masks. */
-	const layer_mask_t (*layer_masks)[];
-	size_t layer_masks_size;
+	const struct layer_access_masks *layer_masks;
 
 	/* Required fields for requests with deny masks. */
 	const access_mask_t all_existing_optional_access;
+12
security/landlock/cred.h
···
  * This structure is packed to minimize the size of struct
  * landlock_file_security. However, it is always aligned in the LSM cred blob,
  * see lsm_set_blob_size().
+ *
+ * When updating this, also update landlock_cred_copy() if needed.
  */
 struct landlock_cred_security {
 	/**
···
 landlock_cred(const struct cred *cred)
 {
 	return cred->security + landlock_blob_sizes.lbs_cred;
+}
+
+static inline void landlock_cred_copy(struct landlock_cred_security *dst,
+				      const struct landlock_cred_security *src)
+{
+	landlock_put_ruleset(dst->domain);
+
+	*dst = *src;
+
+	landlock_get_ruleset(src->domain);
 }
 
 static inline struct landlock_ruleset *landlock_get_current_domain(void)
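The ordering in `landlock_cred_copy()` (drop the destination's reference, copy the structure, take a reference for the new holder) can be illustrated with a toy reference count. This is a sketch only: `struct ruleset`, `struct cred_security`, `get_ruleset()`, `put_ruleset()` and `cred_copy()` are simplified stand-ins invented here, the counter is a plain `int` instead of a kernel `refcount_t`, and nothing is freed when the count reaches zero.

```c
#include <stddef.h>

/* Toy stand-in for struct landlock_ruleset: only the usage counter. */
struct ruleset {
	int usage;
};

/* Toy stand-in for struct landlock_cred_security. */
struct cred_security {
	struct ruleset *domain;
};

/* Takes a reference; a NULL domain means "not sandboxed" and is ignored. */
static void get_ruleset(struct ruleset *r)
{
	if (r)
		r->usage++;
}

/* Drops a reference; a real implementation would free at zero. */
static void put_ruleset(struct ruleset *r)
{
	if (r)
		r->usage--;
}

/* Same three steps as the kernel helper shown in the diff. */
static void cred_copy(struct cred_security *dst,
		      const struct cred_security *src)
{
	put_ruleset(dst->domain); /* dst stops using its old domain */
	*dst = *src;              /* copy every field, including the pointer */
	get_ruleset(src->domain); /* account for dst's new reference */
}
```

After the call both structures point at the same domain, whose count has grown by one, while the destination's previous domain has lost one reference.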
+24 -20
security/landlock/domain.c
··· 182 182 deny_masks_t 183 183 landlock_get_deny_masks(const access_mask_t all_existing_optional_access, 184 184 const access_mask_t optional_access, 185 - const layer_mask_t (*const layer_masks)[], 186 - const size_t layer_masks_size) 185 + const struct layer_access_masks *const masks) 187 186 { 188 187 const unsigned long access_opt = optional_access; 189 188 unsigned long access_bit; 190 189 deny_masks_t deny_masks = 0; 190 + access_mask_t all_denied = 0; 191 191 192 192 /* This may require change with new object types. */ 193 - WARN_ON_ONCE(access_opt != 194 - (optional_access & all_existing_optional_access)); 193 + WARN_ON_ONCE(!access_mask_subset(optional_access, 194 + all_existing_optional_access)); 195 195 196 - if (WARN_ON_ONCE(!layer_masks)) 196 + if (WARN_ON_ONCE(!masks)) 197 197 return 0; 198 198 199 199 if (WARN_ON_ONCE(!access_opt)) 200 200 return 0; 201 201 202 - for_each_set_bit(access_bit, &access_opt, layer_masks_size) { 203 - const layer_mask_t mask = (*layer_masks)[access_bit]; 202 + for (ssize_t i = ARRAY_SIZE(masks->access) - 1; i >= 0; i--) { 203 + const access_mask_t denied = masks->access[i] & optional_access; 204 + const unsigned long newly_denied = denied & ~all_denied; 204 205 205 - if (!mask) 206 + if (!newly_denied) 206 207 continue; 207 208 208 - /* __fls(1) == 0 */ 209 - deny_masks |= get_layer_deny_mask(all_existing_optional_access, 210 - access_bit, __fls(mask)); 209 + for_each_set_bit(access_bit, &newly_denied, 210 + 8 * sizeof(access_mask_t)) { 211 + deny_masks |= get_layer_deny_mask( 212 + all_existing_optional_access, access_bit, i); 213 + } 214 + all_denied |= denied; 211 215 } 212 216 return deny_masks; 213 217 } ··· 220 216 221 217 static void test_landlock_get_deny_masks(struct kunit *const test) 222 218 { 223 - const layer_mask_t layers1[BITS_PER_TYPE(access_mask_t)] = { 224 - [BIT_INDEX(LANDLOCK_ACCESS_FS_EXECUTE)] = BIT_ULL(0) | 225 - BIT_ULL(9), 226 - [BIT_INDEX(LANDLOCK_ACCESS_FS_TRUNCATE)] = BIT_ULL(1), 227 - 
[BIT_INDEX(LANDLOCK_ACCESS_FS_IOCTL_DEV)] = BIT_ULL(2) | 228 - BIT_ULL(0), 219 + const struct layer_access_masks layers1 = { 220 + .access[0] = LANDLOCK_ACCESS_FS_EXECUTE | 221 + LANDLOCK_ACCESS_FS_IOCTL_DEV, 222 + .access[1] = LANDLOCK_ACCESS_FS_TRUNCATE, 223 + .access[2] = LANDLOCK_ACCESS_FS_IOCTL_DEV, 224 + .access[9] = LANDLOCK_ACCESS_FS_EXECUTE, 229 225 }; 230 226 231 227 KUNIT_EXPECT_EQ(test, 0x1, 232 228 landlock_get_deny_masks(_LANDLOCK_ACCESS_FS_OPTIONAL, 233 229 LANDLOCK_ACCESS_FS_TRUNCATE, 234 - &layers1, ARRAY_SIZE(layers1))); 230 + &layers1)); 235 231 KUNIT_EXPECT_EQ(test, 0x20, 236 232 landlock_get_deny_masks(_LANDLOCK_ACCESS_FS_OPTIONAL, 237 233 LANDLOCK_ACCESS_FS_IOCTL_DEV, 238 - &layers1, ARRAY_SIZE(layers1))); 234 + &layers1)); 239 235 KUNIT_EXPECT_EQ( 240 236 test, 0x21, 241 237 landlock_get_deny_masks(_LANDLOCK_ACCESS_FS_OPTIONAL, 242 238 LANDLOCK_ACCESS_FS_TRUNCATE | 243 239 LANDLOCK_ACCESS_FS_IOCTL_DEV, 244 - &layers1, ARRAY_SIZE(layers1))); 240 + &layers1)); 245 241 } 246 242 247 243 #endif /* CONFIG_SECURITY_LANDLOCK_KUNIT_TEST */
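The new `landlock_get_deny_masks()` loop records, for each optional access right, the youngest layer that denies it. The sketch below reproduces the three KUnit expectations above in userspace. Note the assumption: `get_layer_deny_mask()` is not shown in this diff, so `layer_deny_mask()` here is a reconstruction from the nibble layout described in the `access.h` comment (lower nibble for truncate, upper nibble for ioctl_dev), not the kernel's code. `uint16_t` stands in for `access_mask_t`, and the `FS_*` constants mirror the corresponding UAPI bit positions.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LAYERS 16 /* stand-in for LANDLOCK_MAX_NUM_LAYERS */

typedef uint16_t access_mask_t; /* stand-in for the kernel typedef */
typedef uint8_t deny_masks_t;

/* Stand-ins for the LANDLOCK_ACCESS_FS_* bits used by the test. */
#define FS_EXECUTE (1U << 0)
#define FS_TRUNCATE (1U << 14)
#define FS_IOCTL_DEV (1U << 15)
#define FS_OPTIONAL (FS_TRUNCATE | FS_IOCTL_DEV)

struct layer_access_masks {
	access_mask_t access[MAX_LAYERS];
};

/*
 * Assumed encoding: the n-th optional access right (counting from the lowest
 * bit) owns the n-th nibble, which stores the denying layer index.
 */
static deny_masks_t layer_deny_mask(access_mask_t all_optional,
				    unsigned int access_bit, size_t layer)
{
	unsigned int slot = 0;

	for (unsigned int b = 0; b < access_bit; b++) {
		if (all_optional & (1U << b))
			slot++;
	}
	return (deny_masks_t)(layer << (slot * 4));
}

/* Youngest layer first, so each right is recorded only once. */
static deny_masks_t get_deny_masks(access_mask_t all_optional,
				   access_mask_t optional_access,
				   const struct layer_access_masks *masks)
{
	deny_masks_t deny_masks = 0;
	access_mask_t all_denied = 0;

	for (size_t i = MAX_LAYERS; i-- > 0;) {
		const access_mask_t denied = masks->access[i] & optional_access;
		const access_mask_t newly_denied =
			(access_mask_t)(denied & ~all_denied);

		for (unsigned int bit = 0; bit < 16; bit++) {
			if (newly_denied & (1U << bit))
				deny_masks |= layer_deny_mask(all_optional,
							      bit, i);
		}
		all_denied |= denied;
	}
	return deny_masks;
}
```

The `all_denied` accumulator is what keeps an older layer from overwriting the nibble already written by a younger one; in the test data, layer 0 also denies ioctl_dev but layer 2 is the one recorded.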
+1 -2
security/landlock/domain.h
···
 deny_masks_t
 landlock_get_deny_masks(const access_mask_t all_existing_optional_access,
 			const access_mask_t optional_access,
-			const layer_mask_t (*const layer_masks)[],
-			size_t layer_masks_size);
+			const struct layer_access_masks *const masks);
 
 int landlock_init_hierarchy_log(struct landlock_hierarchy *const hierarchy);
+8
security/landlock/errata/abi-1.h
···
  * hierarchy down to its filesystem root and those from the related mount point
  * hierarchy. This prevents access right widening through rename or link
  * actions.
+ *
+ * Impact:
+ *
+ * Without this fix, it was possible to widen access rights through rename or
+ * link actions involving disconnected directories, potentially bypassing
+ * ``LANDLOCK_ACCESS_FS_REFER`` restrictions. This could allow privilege
+ * escalation in complex mount scenarios where directories become disconnected
+ * from their original mount points.
  */
 LANDLOCK_ERRATUM(3)
+7
security/landlock/errata/abi-4.h
···
  * :manpage:`bind(2)` and :manpage:`connect(2)` operations. This change ensures
  * that only TCP sockets are subject to TCP access rights, allowing other
  * protocols to operate without unnecessary restrictions.
+ *
+ * Impact:
+ *
+ * In kernels without this fix, using ``LANDLOCK_ACCESS_NET_BIND_TCP`` or
+ * ``LANDLOCK_ACCESS_NET_CONNECT_TCP`` would incorrectly restrict non-TCP
+ * stream protocols (SMC, MPTCP, SCTP), potentially breaking applications
+ * that rely on these protocols while using Landlock network restrictions.
  */
 LANDLOCK_ERRATUM(1)
+10
security/landlock/errata/abi-6.h
···
  * interaction between threads of the same process should always be allowed.
  * This change ensures that any thread is allowed to send signals to any other
  * thread within the same process, regardless of their domain.
+ *
+ * Impact:
+ *
+ * This problem only manifests when the userspace process is itself using
+ * :manpage:`libpsx(3)` or an equivalent mechanism to enforce a Landlock policy
+ * on multiple already-running threads at once. Programs which enforce a
+ * Landlock policy at startup time and only then become multithreaded are not
+ * affected. Without this fix, signal scoping could break multi-threaded
+ * applications that expect threads within the same process to freely signal
+ * each other.
  */
 LANDLOCK_ERRATUM(2)
+159 -193
security/landlock/fs.c
··· 331 331 332 332 /* Files only get access rights that make sense. */ 333 333 if (!d_is_dir(path->dentry) && 334 - (access_rights | ACCESS_FILE) != ACCESS_FILE) 334 + !access_mask_subset(access_rights, ACCESS_FILE)) 335 335 return -EINVAL; 336 336 if (WARN_ON_ONCE(ruleset->num_layers != 1)) 337 337 return -EINVAL; ··· 399 399 }; 400 400 401 401 /* 402 + * Returns true iff the child file with the given src_child access rights under 403 + * src_parent would result in having the same or fewer access rights if it were 404 + * moved under new_parent. 405 + */ 406 + static bool may_refer(const struct layer_access_masks *const src_parent, 407 + const struct layer_access_masks *const src_child, 408 + const struct layer_access_masks *const new_parent, 409 + const bool child_is_dir) 410 + { 411 + for (size_t i = 0; i < ARRAY_SIZE(new_parent->access); i++) { 412 + access_mask_t child_access = src_parent->access[i] & 413 + src_child->access[i]; 414 + access_mask_t parent_access = new_parent->access[i]; 415 + 416 + if (!child_is_dir) { 417 + child_access &= ACCESS_FILE; 418 + parent_access &= ACCESS_FILE; 419 + } 420 + 421 + if (!access_mask_subset(child_access, parent_access)) 422 + return false; 423 + } 424 + return true; 425 + } 426 + 427 + /* 402 428 * Check that a destination file hierarchy has more restrictions than a source 403 429 * file hierarchy. This is only used for link and rename actions. 404 430 * 405 - * @layer_masks_child2: Optional child masks. 431 + * Returns: true if child1 may be moved from parent1 to parent2 without 432 + * increasing its access rights. If child2 is set, an additional condition is 433 + * that child2 may be used from parent2 to parent1 without increasing its access 434 + * rights. 
406 435 */ 407 - static bool no_more_access( 408 - const layer_mask_t (*const layer_masks_parent1)[LANDLOCK_NUM_ACCESS_FS], 409 - const layer_mask_t (*const layer_masks_child1)[LANDLOCK_NUM_ACCESS_FS], 410 - const bool child1_is_directory, 411 - const layer_mask_t (*const layer_masks_parent2)[LANDLOCK_NUM_ACCESS_FS], 412 - const layer_mask_t (*const layer_masks_child2)[LANDLOCK_NUM_ACCESS_FS], 413 - const bool child2_is_directory) 436 + static bool no_more_access(const struct layer_access_masks *const parent1, 437 + const struct layer_access_masks *const child1, 438 + const bool child1_is_dir, 439 + const struct layer_access_masks *const parent2, 440 + const struct layer_access_masks *const child2, 441 + const bool child2_is_dir) 414 442 { 415 - unsigned long access_bit; 443 + if (!may_refer(parent1, child1, parent2, child1_is_dir)) 444 + return false; 416 445 417 - for (access_bit = 0; access_bit < ARRAY_SIZE(*layer_masks_parent2); 418 - access_bit++) { 419 - /* Ignores accesses that only make sense for directories. */ 420 - const bool is_file_access = 421 - !!(BIT_ULL(access_bit) & ACCESS_FILE); 446 + if (!child2) 447 + return true; 422 448 423 - if (child1_is_directory || is_file_access) { 424 - /* 425 - * Checks if the destination restrictions are a 426 - * superset of the source ones (i.e. 
inherited access 427 - * rights without child exceptions): 428 - * restrictions(parent2) >= restrictions(child1) 429 - */ 430 - if ((((*layer_masks_parent1)[access_bit] & 431 - (*layer_masks_child1)[access_bit]) | 432 - (*layer_masks_parent2)[access_bit]) != 433 - (*layer_masks_parent2)[access_bit]) 434 - return false; 435 - } 436 - 437 - if (!layer_masks_child2) 438 - continue; 439 - if (child2_is_directory || is_file_access) { 440 - /* 441 - * Checks inverted restrictions for RENAME_EXCHANGE: 442 - * restrictions(parent1) >= restrictions(child2) 443 - */ 444 - if ((((*layer_masks_parent2)[access_bit] & 445 - (*layer_masks_child2)[access_bit]) | 446 - (*layer_masks_parent1)[access_bit]) != 447 - (*layer_masks_parent1)[access_bit]) 448 - return false; 449 - } 450 - } 451 - return true; 449 + return may_refer(parent2, child2, parent1, child2_is_dir); 452 450 } 453 451 454 452 #define NMA_TRUE(...) KUNIT_EXPECT_TRUE(test, no_more_access(__VA_ARGS__)) ··· 456 458 457 459 static void test_no_more_access(struct kunit *const test) 458 460 { 459 - const layer_mask_t rx0[LANDLOCK_NUM_ACCESS_FS] = { 460 - [BIT_INDEX(LANDLOCK_ACCESS_FS_EXECUTE)] = BIT_ULL(0), 461 - [BIT_INDEX(LANDLOCK_ACCESS_FS_READ_FILE)] = BIT_ULL(0), 461 + const struct layer_access_masks rx0 = { 462 + .access[0] = LANDLOCK_ACCESS_FS_EXECUTE | 463 + LANDLOCK_ACCESS_FS_READ_FILE, 462 464 }; 463 - const layer_mask_t mx0[LANDLOCK_NUM_ACCESS_FS] = { 464 - [BIT_INDEX(LANDLOCK_ACCESS_FS_EXECUTE)] = BIT_ULL(0), 465 - [BIT_INDEX(LANDLOCK_ACCESS_FS_MAKE_REG)] = BIT_ULL(0), 465 + const struct layer_access_masks mx0 = { 466 + .access[0] = LANDLOCK_ACCESS_FS_EXECUTE | 467 + LANDLOCK_ACCESS_FS_MAKE_REG, 466 468 }; 467 - const layer_mask_t x0[LANDLOCK_NUM_ACCESS_FS] = { 468 - [BIT_INDEX(LANDLOCK_ACCESS_FS_EXECUTE)] = BIT_ULL(0), 469 + const struct layer_access_masks x0 = { 470 + .access[0] = LANDLOCK_ACCESS_FS_EXECUTE, 469 471 }; 470 - const layer_mask_t x1[LANDLOCK_NUM_ACCESS_FS] = { 471 - 
[BIT_INDEX(LANDLOCK_ACCESS_FS_EXECUTE)] = BIT_ULL(1), 472 + const struct layer_access_masks x1 = { 473 + .access[1] = LANDLOCK_ACCESS_FS_EXECUTE, 472 474 }; 473 - const layer_mask_t x01[LANDLOCK_NUM_ACCESS_FS] = { 474 - [BIT_INDEX(LANDLOCK_ACCESS_FS_EXECUTE)] = BIT_ULL(0) | 475 - BIT_ULL(1), 475 + const struct layer_access_masks x01 = { 476 + .access[0] = LANDLOCK_ACCESS_FS_EXECUTE, 477 + .access[1] = LANDLOCK_ACCESS_FS_EXECUTE, 476 478 }; 477 - const layer_mask_t allows_all[LANDLOCK_NUM_ACCESS_FS] = {}; 479 + const struct layer_access_masks allows_all = {}; 478 480 479 481 /* Checks without restriction. */ 480 482 NMA_TRUE(&x0, &allows_all, false, &allows_all, NULL, false); ··· 562 564 #undef NMA_TRUE 563 565 #undef NMA_FALSE 564 566 565 - static bool is_layer_masks_allowed( 566 - layer_mask_t (*const layer_masks)[LANDLOCK_NUM_ACCESS_FS]) 567 + static bool is_layer_masks_allowed(const struct layer_access_masks *masks) 567 568 { 568 - return !memchr_inv(layer_masks, 0, sizeof(*layer_masks)); 569 + return !memchr_inv(&masks->access, 0, sizeof(masks->access)); 569 570 } 570 571 571 572 /* 572 - * Removes @layer_masks accesses that are not requested. 573 + * Removes @masks accesses that are not requested. 573 574 * 574 575 * Returns true if the request is allowed, false otherwise. 
575 576 */ 576 - static bool 577 - scope_to_request(const access_mask_t access_request, 578 - layer_mask_t (*const layer_masks)[LANDLOCK_NUM_ACCESS_FS]) 577 + static bool scope_to_request(const access_mask_t access_request, 578 + struct layer_access_masks *masks) 579 579 { 580 - const unsigned long access_req = access_request; 581 - unsigned long access_bit; 580 + bool saw_unfulfilled_access = false; 582 581 583 - if (WARN_ON_ONCE(!layer_masks)) 582 + if (WARN_ON_ONCE(!masks)) 584 583 return true; 585 584 586 - for_each_clear_bit(access_bit, &access_req, ARRAY_SIZE(*layer_masks)) 587 - (*layer_masks)[access_bit] = 0; 588 - 589 - return is_layer_masks_allowed(layer_masks); 585 + for (size_t i = 0; i < ARRAY_SIZE(masks->access); i++) { 586 + masks->access[i] &= access_request; 587 + if (masks->access[i]) 588 + saw_unfulfilled_access = true; 589 + } 590 + return !saw_unfulfilled_access; 590 591 } 591 592 592 593 #ifdef CONFIG_SECURITY_LANDLOCK_KUNIT_TEST ··· 593 596 static void test_scope_to_request_with_exec_none(struct kunit *const test) 594 597 { 595 598 /* Allows everything. */ 596 - layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {}; 599 + struct layer_access_masks masks = {}; 597 600 598 601 /* Checks and scopes with execute. */ 599 - KUNIT_EXPECT_TRUE(test, scope_to_request(LANDLOCK_ACCESS_FS_EXECUTE, 600 - &layer_masks)); 601 - KUNIT_EXPECT_EQ(test, 0, 602 - layer_masks[BIT_INDEX(LANDLOCK_ACCESS_FS_EXECUTE)]); 603 - KUNIT_EXPECT_EQ(test, 0, 604 - layer_masks[BIT_INDEX(LANDLOCK_ACCESS_FS_WRITE_FILE)]); 602 + KUNIT_EXPECT_TRUE(test, 603 + scope_to_request(LANDLOCK_ACCESS_FS_EXECUTE, &masks)); 604 + KUNIT_EXPECT_EQ(test, 0, masks.access[0]); 605 605 } 606 606 607 607 static void test_scope_to_request_with_exec_some(struct kunit *const test) 608 608 { 609 609 /* Denies execute and write. 
*/ 610 - layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = { 611 - [BIT_INDEX(LANDLOCK_ACCESS_FS_EXECUTE)] = BIT_ULL(0), 612 - [BIT_INDEX(LANDLOCK_ACCESS_FS_WRITE_FILE)] = BIT_ULL(1), 610 + struct layer_access_masks masks = { 611 + .access[0] = LANDLOCK_ACCESS_FS_EXECUTE, 612 + .access[1] = LANDLOCK_ACCESS_FS_WRITE_FILE, 613 613 }; 614 614 615 615 /* Checks and scopes with execute. */ 616 616 KUNIT_EXPECT_FALSE(test, scope_to_request(LANDLOCK_ACCESS_FS_EXECUTE, 617 - &layer_masks)); 618 - KUNIT_EXPECT_EQ(test, BIT_ULL(0), 619 - layer_masks[BIT_INDEX(LANDLOCK_ACCESS_FS_EXECUTE)]); 620 - KUNIT_EXPECT_EQ(test, 0, 621 - layer_masks[BIT_INDEX(LANDLOCK_ACCESS_FS_WRITE_FILE)]); 617 + &masks)); 618 + KUNIT_EXPECT_EQ(test, LANDLOCK_ACCESS_FS_EXECUTE, masks.access[0]); 619 + KUNIT_EXPECT_EQ(test, 0, masks.access[1]); 622 620 } 623 621 624 622 static void test_scope_to_request_without_access(struct kunit *const test) 625 623 { 626 624 /* Denies execute and write. */ 627 - layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = { 628 - [BIT_INDEX(LANDLOCK_ACCESS_FS_EXECUTE)] = BIT_ULL(0), 629 - [BIT_INDEX(LANDLOCK_ACCESS_FS_WRITE_FILE)] = BIT_ULL(1), 625 + struct layer_access_masks masks = { 626 + .access[0] = LANDLOCK_ACCESS_FS_EXECUTE, 627 + .access[1] = LANDLOCK_ACCESS_FS_WRITE_FILE, 630 628 }; 631 629 632 630 /* Checks and scopes without access request. */ 633 - KUNIT_EXPECT_TRUE(test, scope_to_request(0, &layer_masks)); 634 - KUNIT_EXPECT_EQ(test, 0, 635 - layer_masks[BIT_INDEX(LANDLOCK_ACCESS_FS_EXECUTE)]); 636 - KUNIT_EXPECT_EQ(test, 0, 637 - layer_masks[BIT_INDEX(LANDLOCK_ACCESS_FS_WRITE_FILE)]); 631 + KUNIT_EXPECT_TRUE(test, scope_to_request(0, &masks)); 632 + KUNIT_EXPECT_EQ(test, 0, masks.access[0]); 633 + KUNIT_EXPECT_EQ(test, 0, masks.access[1]); 638 634 } 639 635 640 636 #endif /* CONFIG_SECURITY_LANDLOCK_KUNIT_TEST */ ··· 636 646 * Returns true if there is at least one access right different than 637 647 * LANDLOCK_ACCESS_FS_REFER. 
638 648 */ 639 - static bool 640 - is_eacces(const layer_mask_t (*const layer_masks)[LANDLOCK_NUM_ACCESS_FS], 641 - const access_mask_t access_request) 649 + static bool is_eacces(const struct layer_access_masks *masks, 650 + const access_mask_t access_request) 642 651 { 643 - unsigned long access_bit; 644 - /* LANDLOCK_ACCESS_FS_REFER alone must return -EXDEV. */ 645 - const unsigned long access_check = access_request & 646 - ~LANDLOCK_ACCESS_FS_REFER; 647 - 648 - if (!layer_masks) 652 + if (!masks) 649 653 return false; 650 654 651 - for_each_set_bit(access_bit, &access_check, ARRAY_SIZE(*layer_masks)) { 652 - if ((*layer_masks)[access_bit]) 655 + for (size_t i = 0; i < ARRAY_SIZE(masks->access); i++) { 656 + /* LANDLOCK_ACCESS_FS_REFER alone must return -EXDEV. */ 657 + if (masks->access[i] & access_request & 658 + ~LANDLOCK_ACCESS_FS_REFER) 653 659 return true; 654 660 } 655 661 return false; ··· 658 672 659 673 static void test_is_eacces_with_none(struct kunit *const test) 660 674 { 661 - const layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {}; 675 + const struct layer_access_masks masks = {}; 662 676 663 - IE_FALSE(&layer_masks, 0); 664 - IE_FALSE(&layer_masks, LANDLOCK_ACCESS_FS_REFER); 665 - IE_FALSE(&layer_masks, LANDLOCK_ACCESS_FS_EXECUTE); 666 - IE_FALSE(&layer_masks, LANDLOCK_ACCESS_FS_WRITE_FILE); 677 + IE_FALSE(&masks, 0); 678 + IE_FALSE(&masks, LANDLOCK_ACCESS_FS_REFER); 679 + IE_FALSE(&masks, LANDLOCK_ACCESS_FS_EXECUTE); 680 + IE_FALSE(&masks, LANDLOCK_ACCESS_FS_WRITE_FILE); 667 681 } 668 682 669 683 static void test_is_eacces_with_refer(struct kunit *const test) 670 684 { 671 - const layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = { 672 - [BIT_INDEX(LANDLOCK_ACCESS_FS_REFER)] = BIT_ULL(0), 685 + const struct layer_access_masks masks = { 686 + .access[0] = LANDLOCK_ACCESS_FS_REFER, 673 687 }; 674 688 675 - IE_FALSE(&layer_masks, 0); 676 - IE_FALSE(&layer_masks, LANDLOCK_ACCESS_FS_REFER); 677 - IE_FALSE(&layer_masks, 
LANDLOCK_ACCESS_FS_EXECUTE); 678 - IE_FALSE(&layer_masks, LANDLOCK_ACCESS_FS_WRITE_FILE); 689 + IE_FALSE(&masks, 0); 690 + IE_FALSE(&masks, LANDLOCK_ACCESS_FS_REFER); 691 + IE_FALSE(&masks, LANDLOCK_ACCESS_FS_EXECUTE); 692 + IE_FALSE(&masks, LANDLOCK_ACCESS_FS_WRITE_FILE); 679 693 } 680 694 681 695 static void test_is_eacces_with_write(struct kunit *const test) 682 696 { 683 - const layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = { 684 - [BIT_INDEX(LANDLOCK_ACCESS_FS_WRITE_FILE)] = BIT_ULL(0), 697 + const struct layer_access_masks masks = { 698 + .access[0] = LANDLOCK_ACCESS_FS_WRITE_FILE, 685 699 }; 686 700 687 - IE_FALSE(&layer_masks, 0); 688 - IE_FALSE(&layer_masks, LANDLOCK_ACCESS_FS_REFER); 689 - IE_FALSE(&layer_masks, LANDLOCK_ACCESS_FS_EXECUTE); 701 + IE_FALSE(&masks, 0); 702 + IE_FALSE(&masks, LANDLOCK_ACCESS_FS_REFER); 703 + IE_FALSE(&masks, LANDLOCK_ACCESS_FS_EXECUTE); 690 704 691 - IE_TRUE(&layer_masks, LANDLOCK_ACCESS_FS_WRITE_FILE); 705 + IE_TRUE(&masks, LANDLOCK_ACCESS_FS_WRITE_FILE); 692 706 } 693 707 694 708 #endif /* CONFIG_SECURITY_LANDLOCK_KUNIT_TEST */ ··· 738 752 * - true if the access request is granted; 739 753 * - false otherwise. 
740 754 */ 741 - static bool is_access_to_paths_allowed( 742 - const struct landlock_ruleset *const domain, 743 - const struct path *const path, 744 - const access_mask_t access_request_parent1, 745 - layer_mask_t (*const layer_masks_parent1)[LANDLOCK_NUM_ACCESS_FS], 746 - struct landlock_request *const log_request_parent1, 747 - struct dentry *const dentry_child1, 748 - const access_mask_t access_request_parent2, 749 - layer_mask_t (*const layer_masks_parent2)[LANDLOCK_NUM_ACCESS_FS], 750 - struct landlock_request *const log_request_parent2, 751 - struct dentry *const dentry_child2) 755 + static bool 756 + is_access_to_paths_allowed(const struct landlock_ruleset *const domain, 757 + const struct path *const path, 758 + const access_mask_t access_request_parent1, 759 + struct layer_access_masks *layer_masks_parent1, 760 + struct landlock_request *const log_request_parent1, 761 + struct dentry *const dentry_child1, 762 + const access_mask_t access_request_parent2, 763 + struct layer_access_masks *layer_masks_parent2, 764 + struct landlock_request *const log_request_parent2, 765 + struct dentry *const dentry_child2) 752 766 { 753 767 bool allowed_parent1 = false, allowed_parent2 = false, is_dom_check, 754 768 child1_is_directory = true, child2_is_directory = true; 755 769 struct path walker_path; 756 770 access_mask_t access_masked_parent1, access_masked_parent2; 757 - layer_mask_t _layer_masks_child1[LANDLOCK_NUM_ACCESS_FS], 758 - _layer_masks_child2[LANDLOCK_NUM_ACCESS_FS]; 759 - layer_mask_t(*layer_masks_child1)[LANDLOCK_NUM_ACCESS_FS] = NULL, 760 - (*layer_masks_child2)[LANDLOCK_NUM_ACCESS_FS] = NULL; 771 + struct layer_access_masks _layer_masks_child1, _layer_masks_child2; 772 + struct layer_access_masks *layer_masks_child1 = NULL, 773 + *layer_masks_child2 = NULL; 761 774 762 775 if (!access_request_parent1 && !access_request_parent2) 763 776 return true; ··· 796 811 } 797 812 798 813 if (unlikely(dentry_child1)) { 799 - landlock_unmask_layers( 800 - 
find_rule(domain, dentry_child1), 801 - landlock_init_layer_masks( 802 - domain, LANDLOCK_MASK_ACCESS_FS, 803 - &_layer_masks_child1, LANDLOCK_KEY_INODE), 804 - &_layer_masks_child1, ARRAY_SIZE(_layer_masks_child1)); 814 + if (landlock_init_layer_masks(domain, LANDLOCK_MASK_ACCESS_FS, 815 + &_layer_masks_child1, 816 + LANDLOCK_KEY_INODE)) 817 + landlock_unmask_layers(find_rule(domain, dentry_child1), 818 + &_layer_masks_child1); 805 819 layer_masks_child1 = &_layer_masks_child1; 806 820 child1_is_directory = d_is_dir(dentry_child1); 807 821 } 808 822 if (unlikely(dentry_child2)) { 809 - landlock_unmask_layers( 810 - find_rule(domain, dentry_child2), 811 - landlock_init_layer_masks( 812 - domain, LANDLOCK_MASK_ACCESS_FS, 813 - &_layer_masks_child2, LANDLOCK_KEY_INODE), 814 - &_layer_masks_child2, ARRAY_SIZE(_layer_masks_child2)); 823 + if (landlock_init_layer_masks(domain, LANDLOCK_MASK_ACCESS_FS, 824 + &_layer_masks_child2, 825 + LANDLOCK_KEY_INODE)) 826 + landlock_unmask_layers(find_rule(domain, dentry_child2), 827 + &_layer_masks_child2); 815 828 layer_masks_child2 = &_layer_masks_child2; 816 829 child2_is_directory = d_is_dir(dentry_child2); 817 830 } ··· 864 881 } 865 882 866 883 rule = find_rule(domain, walker_path.dentry); 867 - allowed_parent1 = allowed_parent1 || 868 - landlock_unmask_layers( 869 - rule, access_masked_parent1, 870 - layer_masks_parent1, 871 - ARRAY_SIZE(*layer_masks_parent1)); 872 - allowed_parent2 = allowed_parent2 || 873 - landlock_unmask_layers( 874 - rule, access_masked_parent2, 875 - layer_masks_parent2, 876 - ARRAY_SIZE(*layer_masks_parent2)); 884 + allowed_parent1 = 885 + allowed_parent1 || 886 + landlock_unmask_layers(rule, layer_masks_parent1); 887 + allowed_parent2 = 888 + allowed_parent2 || 889 + landlock_unmask_layers(rule, layer_masks_parent2); 877 890 878 891 /* Stops when a rule from each layer grants access. 
*/ 879 892 if (allowed_parent1 && allowed_parent2) ··· 929 950 log_request_parent1->audit.u.path = *path; 930 951 log_request_parent1->access = access_masked_parent1; 931 952 log_request_parent1->layer_masks = layer_masks_parent1; 932 - log_request_parent1->layer_masks_size = 933 - ARRAY_SIZE(*layer_masks_parent1); 934 953 } 935 954 936 955 if (!allowed_parent2 && log_request_parent2) { ··· 937 960 log_request_parent2->audit.u.path = *path; 938 961 log_request_parent2->access = access_masked_parent2; 939 962 log_request_parent2->layer_masks = layer_masks_parent2; 940 - log_request_parent2->layer_masks_size = 941 - ARRAY_SIZE(*layer_masks_parent2); 942 963 } 943 964 #endif /* CONFIG_AUDIT */ 944 965 ··· 951 976 }; 952 977 const struct landlock_cred_security *const subject = 953 978 landlock_get_applicable_subject(current_cred(), masks, NULL); 954 - layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {}; 979 + struct layer_access_masks layer_masks; 955 980 struct landlock_request request = {}; 956 981 957 982 if (!subject) ··· 1026 1051 * - true if all the domain access rights are allowed for @dir; 1027 1052 * - false if the walk reached @mnt_root. 
1028 1053 */ 1029 - static bool collect_domain_accesses( 1030 - const struct landlock_ruleset *const domain, 1031 - const struct dentry *const mnt_root, struct dentry *dir, 1032 - layer_mask_t (*const layer_masks_dom)[LANDLOCK_NUM_ACCESS_FS]) 1054 + static bool collect_domain_accesses(const struct landlock_ruleset *const domain, 1055 + const struct dentry *const mnt_root, 1056 + struct dentry *dir, 1057 + struct layer_access_masks *layer_masks_dom) 1033 1058 { 1034 - unsigned long access_dom; 1035 1059 bool ret = false; 1036 1060 1037 1061 if (WARN_ON_ONCE(!domain || !mnt_root || !dir || !layer_masks_dom)) ··· 1038 1064 if (is_nouser_or_private(dir)) 1039 1065 return true; 1040 1066 1041 - access_dom = landlock_init_layer_masks(domain, LANDLOCK_MASK_ACCESS_FS, 1042 - layer_masks_dom, 1043 - LANDLOCK_KEY_INODE); 1067 + if (!landlock_init_layer_masks(domain, LANDLOCK_MASK_ACCESS_FS, 1068 + layer_masks_dom, LANDLOCK_KEY_INODE)) 1069 + return true; 1044 1070 1045 1071 dget(dir); 1046 1072 while (true) { 1047 1073 struct dentry *parent_dentry; 1048 1074 1049 1075 /* Gets all layers allowing all domain accesses. */ 1050 - if (landlock_unmask_layers(find_rule(domain, dir), access_dom, 1051 - layer_masks_dom, 1052 - ARRAY_SIZE(*layer_masks_dom))) { 1076 + if (landlock_unmask_layers(find_rule(domain, dir), 1077 + layer_masks_dom)) { 1053 1078 /* 1054 1079 * Stops when all handled accesses are allowed by at 1055 1080 * least one rule in each layer. 
··· 1136 1163 access_mask_t access_request_parent1, access_request_parent2; 1137 1164 struct path mnt_dir; 1138 1165 struct dentry *old_parent; 1139 - layer_mask_t layer_masks_parent1[LANDLOCK_NUM_ACCESS_FS] = {}, 1140 - layer_masks_parent2[LANDLOCK_NUM_ACCESS_FS] = {}; 1166 + struct layer_access_masks layer_masks_parent1 = {}, 1167 + layer_masks_parent2 = {}; 1141 1168 struct landlock_request request1 = {}, request2 = {}; 1142 1169 1143 1170 if (!subject) ··· 1613 1640 1614 1641 static int hook_file_open(struct file *const file) 1615 1642 { 1616 - layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {}; 1643 + struct layer_access_masks layer_masks = {}; 1617 1644 access_mask_t open_access_request, full_access_request, allowed_access, 1618 1645 optional_access; 1619 1646 const struct landlock_cred_security *const subject = ··· 1648 1675 &layer_masks, &request, NULL, 0, NULL, NULL, NULL)) { 1649 1676 allowed_access = full_access_request; 1650 1677 } else { 1651 - unsigned long access_bit; 1652 - const unsigned long access_req = full_access_request; 1653 - 1654 1678 /* 1655 1679 * Calculate the actual allowed access rights from layer_masks. 1656 - * Add each access right to allowed_access which has not been 1657 - * vetoed by any layer. 1680 + * Remove the access rights from the full access request which 1681 + * are still unfulfilled in any of the layers. 
1658 1682 */ 1659 - allowed_access = 0; 1660 - for_each_set_bit(access_bit, &access_req, 1661 - ARRAY_SIZE(layer_masks)) { 1662 - if (!layer_masks[access_bit]) 1663 - allowed_access |= BIT_ULL(access_bit); 1664 - } 1683 + allowed_access = full_access_request; 1684 + for (size_t i = 0; i < ARRAY_SIZE(layer_masks.access); i++) 1685 + allowed_access &= ~layer_masks.access[i]; 1665 1686 } 1666 1687 1667 1688 /* ··· 1667 1700 landlock_file(file)->allowed_access = allowed_access; 1668 1701 #ifdef CONFIG_AUDIT 1669 1702 landlock_file(file)->deny_masks = landlock_get_deny_masks( 1670 - _LANDLOCK_ACCESS_FS_OPTIONAL, optional_access, &layer_masks, 1671 - ARRAY_SIZE(layer_masks)); 1703 + _LANDLOCK_ACCESS_FS_OPTIONAL, optional_access, &layer_masks); 1672 1704 #endif /* CONFIG_AUDIT */ 1673 1705 1674 - if ((open_access_request & allowed_access) == open_access_request) 1706 + if (access_mask_subset(open_access_request, allowed_access)) 1675 1707 return 0; 1676 1708 1677 1709 /* Sets access to reflect the actual request. */
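Note on the fs.c changes: the per-access-right arrays of layer bits are replaced by one access mask per layer, so most loops over access bits become plain bitwise operations. The sketch below is a userspace approximation of three of the rewritten pieces, not the kernel code. `MAX_LAYERS`, the `uint32_t` stand-in for `access_mask_t`, the struct layout, and the names `scope_masks`, `allowed_from_masks` and `mask_subset` are assumptions made for illustration; the definition of `struct layer_access_masks` itself is not shown in this part of the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for illustration: layer count and mask width are placeholders. */
#define MAX_LAYERS 16
typedef uint32_t access_mask_t;

#define FS_EXECUTE    (1u << 0)
#define FS_WRITE_FILE (1u << 1)
#define FS_READ_FILE  (1u << 2)

/* One entry per layer; a set bit is a right that layer has not granted yet. */
struct layer_access_masks {
	access_mask_t access[MAX_LAYERS];
};

/*
 * Follows scope_to_request(): drop bits outside the request, then report
 * whether anything requested is still unfulfilled in some layer.
 */
bool scope_masks(access_mask_t request, struct layer_access_masks *masks)
{
	bool saw_unfulfilled = false;

	assert(masks != NULL); /* the kernel uses WARN_ON_ONCE() here */
	for (size_t i = 0; i < MAX_LAYERS; i++) {
		masks->access[i] &= request;
		if (masks->access[i])
			saw_unfulfilled = true;
	}
	return !saw_unfulfilled;
}

/*
 * Follows the fallback path in hook_file_open(): start from the full request
 * and remove every right that at least one layer still withholds.
 */
access_mask_t allowed_from_masks(access_mask_t full_request,
				 const struct layer_access_masks *masks)
{
	access_mask_t allowed = full_request;

	for (size_t i = 0; i < MAX_LAYERS; i++)
		allowed &= ~masks->access[i];
	return allowed;
}

/* Same truth table as the expression that access_mask_subset() replaces. */
bool mask_subset(access_mask_t subset, access_mask_t superset)
{
	return (subset & superset) == subset;
}
```

With layer 0 still missing execute and layer 1 still missing write, a request for execute plus read would be narrowed to read only, which matches the "remove the access rights which are still unfulfilled in any of the layers" comment in the new hook_file_open() code.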
+1 -1
security/landlock/limits.h
@@ -31,7 +31,7 @@
 #define LANDLOCK_MASK_SCOPE ((LANDLOCK_LAST_SCOPE << 1) - 1)
 #define LANDLOCK_NUM_SCOPE __const_hweight64(LANDLOCK_MASK_SCOPE)
 
-#define LANDLOCK_LAST_RESTRICT_SELF LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF
+#define LANDLOCK_LAST_RESTRICT_SELF LANDLOCK_RESTRICT_SELF_TSYNC
 #define LANDLOCK_MASK_RESTRICT_SELF ((LANDLOCK_LAST_RESTRICT_SELF << 1) - 1)
 
 /* clang-format on */
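Note on the limits.h change: moving LANDLOCK_LAST_RESTRICT_SELF to the new flag widens LANDLOCK_MASK_RESTRICT_SELF by one bit, because the mask is built as `(last << 1) - 1`. The sketch below works through that arithmetic in userspace. The two flag values are assumptions (the authoritative values live in the UAPI header, which is not part of this section), and `mask_up_to` and `flags_are_known` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for the two flags involved; check the UAPI header. */
#define RESTRICT_SELF_LOG_SUBDOMAINS_OFF (1u << 2)
#define RESTRICT_SELF_TSYNC              (1u << 3)

/*
 * Same construction as LANDLOCK_MASK_RESTRICT_SELF: every bit up to and
 * including the last defined flag.
 */
uint32_t mask_up_to(uint32_t last_flag)
{
	/* The construction only makes sense when last_flag is a single bit. */
	assert(last_flag != 0 && (last_flag & (last_flag - 1)) == 0);
	return (last_flag << 1) - 1;
}

/* A flags word is acceptable when it sets no bit outside the mask. */
bool flags_are_known(uint32_t flags, uint32_t last_flag)
{
	return (flags & ~mask_up_to(last_flag)) == 0;
}
```

Under these assumed values the mask grows from 0x7 to 0xf, so a flags word containing the TSYNC bit stops being rejected as unknown once the last flag is updated.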
+21 -9
security/landlock/net.c
··· 47 47 access_mask_t access_request) 48 48 { 49 49 __be16 port; 50 - layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_NET] = {}; 50 + struct layer_access_masks layer_masks = {}; 51 51 const struct landlock_rule *rule; 52 52 struct landlock_id id = { 53 53 .type = LANDLOCK_KEY_NET_PORT, ··· 60 60 struct lsm_network_audit audit_net = {}; 61 61 62 62 if (!subject) 63 - return 0; 64 - 65 - if (!sk_is_tcp(sock->sk)) 66 63 return 0; 67 64 68 65 /* Checks for minimal header length to safely read sa_family. */ ··· 191 194 access_request = landlock_init_layer_masks(subject->domain, 192 195 access_request, &layer_masks, 193 196 LANDLOCK_KEY_NET_PORT); 194 - if (landlock_unmask_layers(rule, access_request, &layer_masks, 195 - ARRAY_SIZE(layer_masks))) 197 + if (!access_request) 198 + return 0; 199 + 200 + if (landlock_unmask_layers(rule, &layer_masks)) 196 201 return 0; 197 202 198 203 audit_net.family = address->sa_family; ··· 205 206 .audit.u.net = &audit_net, 206 207 .access = access_request, 207 208 .layer_masks = &layer_masks, 208 - .layer_masks_size = ARRAY_SIZE(layer_masks), 209 209 }); 210 210 return -EACCES; 211 211 } ··· 212 214 static int hook_socket_bind(struct socket *const sock, 213 215 struct sockaddr *const address, const int addrlen) 214 216 { 217 + access_mask_t access_request; 218 + 219 + if (sk_is_tcp(sock->sk)) 220 + access_request = LANDLOCK_ACCESS_NET_BIND_TCP; 221 + else 222 + return 0; 223 + 215 224 return current_check_access_socket(sock, address, addrlen, 216 - LANDLOCK_ACCESS_NET_BIND_TCP); 225 + access_request); 217 226 } 218 227 219 228 static int hook_socket_connect(struct socket *const sock, 220 229 struct sockaddr *const address, 221 230 const int addrlen) 222 231 { 232 + access_mask_t access_request; 233 + 234 + if (sk_is_tcp(sock->sk)) 235 + access_request = LANDLOCK_ACCESS_NET_CONNECT_TCP; 236 + else 237 + return 0; 238 + 223 239 return current_check_access_socket(sock, address, addrlen, 224 - LANDLOCK_ACCESS_NET_CONNECT_TCP); 240 + 
access_request); 225 241 } 226 242 227 243 static struct security_hook_list landlock_hooks[] __ro_after_init = {
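Note on the net.c refactor: the TCP check moves out of current_check_access_socket() and into each hook, so the hook now decides which access right applies and returns early for sockets Landlock does not handle. A minimal sketch of that dispatch follows. The enums `sock_kind` and `sock_op` and the function `access_for` are hypothetical stand-ins for `sk_is_tcp()` and the two hooks; the two access-right values are assumed to match the UAPI definitions.

```c
#include <stdint.h>

typedef uint32_t access_mask_t;

/* Assumed to match LANDLOCK_ACCESS_NET_BIND_TCP / _CONNECT_TCP. */
#define NET_BIND_TCP    (1u << 0)
#define NET_CONNECT_TCP (1u << 1)

/* Hypothetical stand-ins for the socket type test and the calling hook. */
enum sock_kind { SOCK_KIND_TCP, SOCK_KIND_OTHER };
enum sock_op { SOCK_OP_BIND, SOCK_OP_CONNECT };

/*
 * Returns the access right the hook should check, or 0 when the socket is
 * not subject to any Landlock network right and the hook should allow it.
 */
access_mask_t access_for(enum sock_kind kind, enum sock_op op)
{
	if (kind != SOCK_KIND_TCP)
		return 0;
	return op == SOCK_OP_BIND ? NET_BIND_TCP : NET_CONNECT_TCP;
}
```

Keeping the mapping in the hooks leaves the shared checker independent of the socket type, which is presumably what makes it easier to extend later; the diff itself does not state that motivation.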
+34 -55
security/landlock/ruleset.c
··· 612 612 return NULL; 613 613 } 614 614 615 - /* 616 - * @layer_masks is read and may be updated according to the access request and 617 - * the matching rule. 618 - * @masks_array_size must be equal to ARRAY_SIZE(*layer_masks). 615 + /** 616 + * landlock_unmask_layers - Remove the access rights in @masks 617 + * which are granted in @rule 619 618 * 620 - * Returns true if the request is allowed (i.e. relevant layer masks for the 621 - * request are empty). 619 + * Updates the set of (per-layer) unfulfilled access rights @masks 620 + * so that all the access rights granted in @rule are removed from it 621 + * (because they are now fulfilled). 622 + * 623 + * @rule: A rule that grants a set of access rights for each layer 624 + * @masks: A matrix of unfulfilled access rights for each layer 625 + * 626 + * Returns true if the request is allowed (i.e. the access rights granted all 627 + * remaining unfulfilled access rights and masks has no leftover set bits). 622 628 */ 623 629 bool landlock_unmask_layers(const struct landlock_rule *const rule, 624 - const access_mask_t access_request, 625 - layer_mask_t (*const layer_masks)[], 626 - const size_t masks_array_size) 630 + struct layer_access_masks *masks) 627 631 { 628 - size_t layer_level; 629 - 630 - if (!access_request || !layer_masks) 632 + if (!masks) 631 633 return true; 632 634 if (!rule) 633 635 return false; ··· 644 642 * by only one rule, but by the union (binary OR) of multiple rules. 645 643 * E.g. 
/a/b <execute> + /a <read> => /a/b <execute + read> 646 644 */ 647 - for (layer_level = 0; layer_level < rule->num_layers; layer_level++) { 648 - const struct landlock_layer *const layer = 649 - &rule->layers[layer_level]; 650 - const layer_mask_t layer_bit = BIT_ULL(layer->level - 1); 651 - const unsigned long access_req = access_request; 652 - unsigned long access_bit; 653 - bool is_empty; 645 + for (size_t i = 0; i < rule->num_layers; i++) { 646 + const struct landlock_layer *const layer = &rule->layers[i]; 654 647 655 - /* 656 - * Records in @layer_masks which layer grants access to each requested 657 - * access: bit cleared if the related layer grants access. 658 - */ 659 - is_empty = true; 660 - for_each_set_bit(access_bit, &access_req, masks_array_size) { 661 - if (layer->access & BIT_ULL(access_bit)) 662 - (*layer_masks)[access_bit] &= ~layer_bit; 663 - is_empty = is_empty && !(*layer_masks)[access_bit]; 664 - } 665 - if (is_empty) 666 - return true; 648 + /* Clear the bits where the layer in the rule grants access. */ 649 + masks->access[layer->level - 1] &= ~layer->access; 667 650 } 668 - return false; 651 + 652 + for (size_t i = 0; i < ARRAY_SIZE(masks->access); i++) { 653 + if (masks->access[i]) 654 + return false; 655 + } 656 + return true; 669 657 } 670 658 671 659 typedef access_mask_t ··· 665 673 /** 666 674 * landlock_init_layer_masks - Initialize layer masks from an access request 667 675 * 668 - * Populates @layer_masks such that for each access right in @access_request, 676 + * Populates @masks such that for each access right in @access_request, 669 677 * the bits for all the layers are set where this access right is handled. 670 678 * 671 679 * @domain: The domain that defines the current restrictions. 672 680 * @access_request: The requested access rights to check. 673 - * @layer_masks: It must contain %LANDLOCK_NUM_ACCESS_FS or 674 - * %LANDLOCK_NUM_ACCESS_NET elements according to @key_type. 681 + * @masks: Layer access masks to populate. 
675 682 * @key_type: The key type to switch between access masks of different types. 676 683 * 677 684 * Returns: An access mask where each access right bit is set which is handled ··· 679 688 access_mask_t 680 689 landlock_init_layer_masks(const struct landlock_ruleset *const domain, 681 690 const access_mask_t access_request, 682 - layer_mask_t (*const layer_masks)[], 691 + struct layer_access_masks *const masks, 683 692 const enum landlock_key_type key_type) 684 693 { 685 694 access_mask_t handled_accesses = 0; 686 - size_t layer_level, num_access; 687 695 get_access_mask_t *get_access_mask; 688 696 689 697 switch (key_type) { 690 698 case LANDLOCK_KEY_INODE: 691 699 get_access_mask = landlock_get_fs_access_mask; 692 - num_access = LANDLOCK_NUM_ACCESS_FS; 693 700 break; 694 701 695 702 #if IS_ENABLED(CONFIG_INET) 696 703 case LANDLOCK_KEY_NET_PORT: 697 704 get_access_mask = landlock_get_net_access_mask; 698 - num_access = LANDLOCK_NUM_ACCESS_NET; 699 705 break; 700 706 #endif /* IS_ENABLED(CONFIG_INET) */ 701 707 ··· 701 713 return 0; 702 714 } 703 715 704 - memset(layer_masks, 0, 705 - array_size(sizeof((*layer_masks)[0]), num_access)); 706 - 707 716 /* An empty access request can happen because of O_WRONLY | O_RDWR. */ 708 717 if (!access_request) 709 718 return 0; 710 719 711 - /* Saves all handled accesses per layer. 
*/ 712 - for (layer_level = 0; layer_level < domain->num_layers; layer_level++) { 713 - const unsigned long access_req = access_request; 714 - const access_mask_t access_mask = 715 - get_access_mask(domain, layer_level); 716 - unsigned long access_bit; 720 + for (size_t i = 0; i < domain->num_layers; i++) { 721 + const access_mask_t handled = get_access_mask(domain, i); 717 722 718 - for_each_set_bit(access_bit, &access_req, num_access) { 719 - if (BIT_ULL(access_bit) & access_mask) { 720 - (*layer_masks)[access_bit] |= 721 - BIT_ULL(layer_level); 722 - handled_accesses |= BIT_ULL(access_bit); 723 - } 724 - } 723 + masks->access[i] = access_request & handled; 724 + handled_accesses |= masks->access[i]; 725 725 } 726 + for (size_t i = domain->num_layers; i < ARRAY_SIZE(masks->access); i++) 727 + masks->access[i] = 0; 728 + 726 729 return handled_accesses; 727 730 }
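Note on the ruleset.c changes: after the transposition, initialising the masks is one AND per layer, and applying a rule is one AND-NOT per layer the rule belongs to. The sketch below models both functions in userspace so the `/a/b <execute> + /a <read>` case from the comment above can be traced by hand. It is an approximation, not the kernel code: `MAX_LAYERS`, the `uint32_t` stand-in for `access_mask_t`, `struct layer_grant` (standing in for `struct landlock_layer`), and the names `init_masks` and `unmask` are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for kernel types; sizes are illustrative only. */
#define MAX_LAYERS 16
typedef uint32_t access_mask_t;

#define ACCESS_EXECUTE (1u << 0)
#define ACCESS_READ    (1u << 2)

/* One row per layer: the rights still unfulfilled in that layer. */
struct layer_access_masks {
	access_mask_t access[MAX_LAYERS];
};

/* Hypothetical equivalent of one entry in rule->layers[]. */
struct layer_grant {
	unsigned int level;   /* 1-based, hence "level - 1" when indexing */
	access_mask_t access; /* rights this rule grants in that layer */
};

/*
 * Models landlock_init_layer_masks(): handled[i] is the set of rights that
 * layer i handles. Returns the union of requested rights handled anywhere.
 */
access_mask_t init_masks(struct layer_access_masks *masks,
			 const access_mask_t *handled, size_t num_layers,
			 access_mask_t request)
{
	access_mask_t handled_accesses = 0;

	assert(num_layers <= MAX_LAYERS);
	for (size_t i = 0; i < num_layers; i++) {
		masks->access[i] = request & handled[i];
		handled_accesses |= masks->access[i];
	}
	/* Layers beyond the domain depth never restrict anything. */
	for (size_t i = num_layers; i < MAX_LAYERS; i++)
		masks->access[i] = 0;
	return handled_accesses;
}

/*
 * Models landlock_unmask_layers(): clear what the rule grants, then report
 * whether every layer is fully satisfied.
 */
bool unmask(struct layer_access_masks *masks,
	    const struct layer_grant *grants, size_t num_grants)
{
	for (size_t i = 0; i < num_grants; i++) {
		assert(grants[i].level >= 1 && grants[i].level <= MAX_LAYERS);
		masks->access[grants[i].level - 1] &= ~grants[i].access;
	}
	for (size_t i = 0; i < MAX_LAYERS; i++) {
		if (masks->access[i])
			return false;
	}
	return true;
}
```

As a worked case, take two layers where layer 1 handles execute and read and layer 2 handles execute only, and request execute plus read. A rule on `/a/b` granting execute in both layers leaves read outstanding in layer 1, so `unmask` returns false; a rule on `/a` granting read in layer 1 then clears the last bit and `unmask` returns true. Unlike the removed loop, this version no longer needs the access request or an array size as parameters, since the masks already encode both.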
+2 -4
security/landlock/ruleset.h
@@ -302,14 +302,12 @@
 }
 
 bool landlock_unmask_layers(const struct landlock_rule *const rule,
-                            const access_mask_t access_request,
-                            layer_mask_t (*const layer_masks)[],
-                            const size_t masks_array_size);
+                            struct layer_access_masks *masks);
 
 access_mask_t
 landlock_init_layer_masks(const struct landlock_ruleset *const domain,
                           const access_mask_t access_request,
-                          layer_mask_t (*const layer_masks)[],
+                          struct layer_access_masks *masks,
                           const enum landlock_key_type key_type);
 
 #endif /* _SECURITY_LANDLOCK_RULESET_H */
+43 -30
security/landlock/syscalls.c
··· 36 36 #include "net.h" 37 37 #include "ruleset.h" 38 38 #include "setup.h" 39 + #include "tsync.h" 39 40 40 41 static bool is_initialized(void) 41 42 { ··· 158 157 /* 159 158 * The Landlock ABI version should be incremented for each new Landlock-related 160 159 * user space visible change (e.g. Landlock syscalls). This version should 161 - * only be incremented once per Linux release, and the date in 160 + * only be incremented once per Linux release. When incrementing, the date in 162 161 * Documentation/userspace-api/landlock.rst should be updated to reflect the 163 162 * UAPI change. 163 + * If the change involves a fix that requires userspace awareness, also update 164 + * the errata documentation in Documentation/userspace-api/landlock.rst . 164 165 */ 165 - const int landlock_abi_version = 7; 166 + const int landlock_abi_version = 8; 166 167 167 168 /** 168 169 * sys_landlock_create_ruleset - Create a new ruleset ··· 457 454 * - %LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF 458 455 * - %LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON 459 456 * - %LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF 457 + * - %LANDLOCK_RESTRICT_SELF_TSYNC 460 458 * 461 - * This system call enables to enforce a Landlock ruleset on the current 462 - * thread. Enforcing a ruleset requires that the task has %CAP_SYS_ADMIN in its 459 + * This system call enforces a Landlock ruleset on the current thread. 460 + * Enforcing a ruleset requires that the task has %CAP_SYS_ADMIN in its 463 461 * namespace or is running with no_new_privs. This avoids scenarios where 464 462 * unprivileged tasks can affect the behavior of privileged children. 
465 463 * ··· 482 478 SYSCALL_DEFINE2(landlock_restrict_self, const int, ruleset_fd, const __u32, 483 479 flags) 484 480 { 485 - struct landlock_ruleset *new_dom, 486 - *ruleset __free(landlock_put_ruleset) = NULL; 481 + struct landlock_ruleset *ruleset __free(landlock_put_ruleset) = NULL; 487 482 struct cred *new_cred; 488 483 struct landlock_cred_security *new_llcred; 489 484 bool __maybe_unused log_same_exec, log_new_exec, log_subdomains, ··· 541 538 * We could optimize this case by not calling commit_creds() if this flag 542 539 * was already set, but it is not worth the complexity. 543 540 */ 544 - if (!ruleset) 545 - return commit_creds(new_cred); 541 + if (ruleset) { 542 + /* 543 + * There is no possible race condition while copying and 544 + * manipulating the current credentials because they are 545 + * dedicated per thread. 546 + */ 547 + struct landlock_ruleset *const new_dom = 548 + landlock_merge_ruleset(new_llcred->domain, ruleset); 549 + if (IS_ERR(new_dom)) { 550 + abort_creds(new_cred); 551 + return PTR_ERR(new_dom); 552 + } 546 553 547 - /* 548 - * There is no possible race condition while copying and manipulating 549 - * the current credentials because they are dedicated per thread. 550 - */ 551 - new_dom = landlock_merge_ruleset(new_llcred->domain, ruleset); 552 - if (IS_ERR(new_dom)) { 553 - abort_creds(new_cred); 554 - return PTR_ERR(new_dom); 554 + #ifdef CONFIG_AUDIT 555 + new_dom->hierarchy->log_same_exec = log_same_exec; 556 + new_dom->hierarchy->log_new_exec = log_new_exec; 557 + if ((!log_same_exec && !log_new_exec) || !prev_log_subdomains) 558 + new_dom->hierarchy->log_status = LANDLOCK_LOG_DISABLED; 559 + #endif /* CONFIG_AUDIT */ 560 + 561 + /* Replaces the old (prepared) domain. 
*/ 562 + landlock_put_ruleset(new_llcred->domain); 563 + new_llcred->domain = new_dom; 564 + 565 + #ifdef CONFIG_AUDIT 566 + new_llcred->domain_exec |= BIT(new_dom->num_layers - 1); 567 + #endif /* CONFIG_AUDIT */ 555 568 } 556 569 557 - #ifdef CONFIG_AUDIT 558 - new_dom->hierarchy->log_same_exec = log_same_exec; 559 - new_dom->hierarchy->log_new_exec = log_new_exec; 560 - if ((!log_same_exec && !log_new_exec) || !prev_log_subdomains) 561 - new_dom->hierarchy->log_status = LANDLOCK_LOG_DISABLED; 562 - #endif /* CONFIG_AUDIT */ 563 - 564 - /* Replaces the old (prepared) domain. */ 565 - landlock_put_ruleset(new_llcred->domain); 566 - new_llcred->domain = new_dom; 567 - 568 - #ifdef CONFIG_AUDIT 569 - new_llcred->domain_exec |= BIT(new_dom->num_layers - 1); 570 - #endif /* CONFIG_AUDIT */ 570 + if (flags & LANDLOCK_RESTRICT_SELF_TSYNC) { 571 + const int err = landlock_restrict_sibling_threads( 572 + current_cred(), new_cred); 573 + if (err) { 574 + abort_creds(new_cred); 575 + return err; 576 + } 577 + } 571 578 572 579 return commit_creds(new_cred); 573 580 }
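Note on the syscalls.c changes: the ABI version moves to 8 in the same change that adds LANDLOCK_RESTRICT_SELF_TSYNC, so userspace can gate the flag on the version reported by `landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_VERSION)`. The sketch below shows one way to do that check as a pure function. The flag values and the claim that the logging flags arrived with ABI 7 are assumptions not confirmed by this diff, and `known_restrict_flags` and `can_use_flags` are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values; the authoritative ones are in the UAPI header. */
#define RESTRICT_SELF_LOG_SAME_EXEC_OFF  (1u << 0)
#define RESTRICT_SELF_LOG_NEW_EXEC_ON    (1u << 1)
#define RESTRICT_SELF_LOG_SUBDOMAINS_OFF (1u << 2)
#define RESTRICT_SELF_TSYNC              (1u << 3)

/* Hypothetical helper: flags accepted by a kernel reporting ABI `abi`. */
uint32_t known_restrict_flags(int abi)
{
	uint32_t known = 0;

	if (abi >= 7) /* assumption: logging flags were introduced with ABI 7 */
		known |= RESTRICT_SELF_LOG_SAME_EXEC_OFF |
			 RESTRICT_SELF_LOG_NEW_EXEC_ON |
			 RESTRICT_SELF_LOG_SUBDOMAINS_OFF;
	if (abi >= 8) /* ABI 8 is the version set in this diff */
		known |= RESTRICT_SELF_TSYNC;
	return known;
}

/*
 * Hypothetical policy check: report unsupported flags instead of silently
 * dropping them, since losing TSYNC changes which threads get restricted.
 */
bool can_use_flags(int abi, uint32_t wanted)
{
	return (wanted & ~known_restrict_flags(abi)) == 0;
}
```

A caller that gets `false` for a request including the TSYNC bit has to choose between failing and restricting only the calling thread; the second option is weaker, so it is probably best made an explicit decision rather than a default.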
+561
security/landlock/tsync.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Landlock - Cross-thread ruleset enforcement 4 + * 5 + * Copyright © 2025 Google LLC 6 + */ 7 + 8 + #include <linux/atomic.h> 9 + #include <linux/cleanup.h> 10 + #include <linux/completion.h> 11 + #include <linux/cred.h> 12 + #include <linux/errno.h> 13 + #include <linux/overflow.h> 14 + #include <linux/rcupdate.h> 15 + #include <linux/sched.h> 16 + #include <linux/sched/signal.h> 17 + #include <linux/sched/task.h> 18 + #include <linux/slab.h> 19 + #include <linux/task_work.h> 20 + 21 + #include "cred.h" 22 + #include "tsync.h" 23 + 24 + /* 25 + * Shared state between multiple threads which are enforcing Landlock rulesets 26 + * in lockstep with each other. 27 + */ 28 + struct tsync_shared_context { 29 + /* The old and tentative new creds of the calling thread. */ 30 + const struct cred *old_cred; 31 + const struct cred *new_cred; 32 + 33 + /* True if sibling tasks need to set the no_new_privs flag. */ 34 + bool set_no_new_privs; 35 + 36 + /* An error encountered in preparation step, or 0. */ 37 + atomic_t preparation_error; 38 + 39 + /* 40 + * Barrier after preparation step in restrict_one_thread. 41 + * The calling thread waits for completion. 42 + * 43 + * Re-initialized on every round of looking for newly spawned threads. 44 + */ 45 + atomic_t num_preparing; 46 + struct completion all_prepared; 47 + 48 + /* Sibling threads wait for completion. */ 49 + struct completion ready_to_commit; 50 + 51 + /* 52 + * Barrier after commit step (used by syscall impl to wait for 53 + * completion). 
54 + */ 55 + atomic_t num_unfinished; 56 + struct completion all_finished; 57 + }; 58 + 59 + struct tsync_work { 60 + struct callback_head work; 61 + struct task_struct *task; 62 + struct tsync_shared_context *shared_ctx; 63 + }; 64 + 65 + /* 66 + * restrict_one_thread - update a thread's Landlock domain in lockstep with the 67 + * other threads in the same process 68 + * 69 + * When this is run, the same function gets run in all other threads in the same 70 + * process (except for the calling thread which called landlock_restrict_self). 71 + * The concurrently running invocations of restrict_one_thread coordinate 72 + * through the shared ctx object to do their work in lockstep to implement 73 + * all-or-nothing semantics for enforcing the new Landlock domain. 74 + * 75 + * Afterwards, depending on the presence of an error, all threads either commit 76 + * or abort the prepared credentials. The commit operation can not fail any 77 + * more. 78 + */ 79 + static void restrict_one_thread(struct tsync_shared_context *ctx) 80 + { 81 + int err; 82 + struct cred *cred = NULL; 83 + 84 + if (current_cred() == ctx->old_cred) { 85 + /* 86 + * Switch out old_cred with new_cred, if possible. 87 + * 88 + * In the common case, where all threads initially point to the same 89 + * struct cred, this optimization avoids creating separate redundant 90 + * credentials objects for each, which would all have the same contents. 91 + * 92 + * Note: We are intentionally dropping the const qualifier here, because 93 + * it is required by commit_creds() and abort_creds(). 94 + */ 95 + cred = (struct cred *)get_cred(ctx->new_cred); 96 + } else { 97 + /* Else, prepare new creds and populate them. */ 98 + cred = prepare_creds(); 99 + 100 + if (!cred) { 101 + atomic_set(&ctx->preparation_error, -ENOMEM); 102 + 103 + /* 104 + * Even on error, we need to adhere to the protocol and coordinate 105 + * with concurrently running invocations. 
106 + */ 107 + if (atomic_dec_return(&ctx->num_preparing) == 0) 108 + complete_all(&ctx->all_prepared); 109 + 110 + goto out; 111 + } 112 + 113 + landlock_cred_copy(landlock_cred(cred), 114 + landlock_cred(ctx->new_cred)); 115 + } 116 + 117 + /* 118 + * Barrier: Wait until all threads are done preparing. 119 + * After this point, we can have no more failures. 120 + */ 121 + if (atomic_dec_return(&ctx->num_preparing) == 0) 122 + complete_all(&ctx->all_prepared); 123 + 124 + /* 125 + * Wait for signal from calling thread that it's safe to read the 126 + * preparation error now and we are ready to commit (or abort). 127 + */ 128 + wait_for_completion(&ctx->ready_to_commit); 129 + 130 + /* Abort the commit if any of the other threads had an error. */ 131 + err = atomic_read(&ctx->preparation_error); 132 + if (err) { 133 + abort_creds(cred); 134 + goto out; 135 + } 136 + 137 + /* 138 + * Make sure that all sibling tasks fulfill the no_new_privs prerequisite. 139 + * (This is in line with Seccomp's SECCOMP_FILTER_FLAG_TSYNC logic in 140 + * kernel/seccomp.c) 141 + */ 142 + if (ctx->set_no_new_privs) 143 + task_set_no_new_privs(current); 144 + 145 + commit_creds(cred); 146 + 147 + out: 148 + /* Notify the calling thread once all threads are done. */ 149 + if (atomic_dec_return(&ctx->num_unfinished) == 0) 150 + complete_all(&ctx->all_finished); 151 + } 152 + 153 + /* 154 + * restrict_one_thread_callback - task_work callback for restricting a thread 155 + * 156 + * Calls restrict_one_thread with the struct tsync_shared_context. 157 + */ 158 + static void restrict_one_thread_callback(struct callback_head *work) 159 + { 160 + struct tsync_work *ctx = container_of(work, struct tsync_work, work); 161 + 162 + restrict_one_thread(ctx->shared_ctx); 163 + } 164 + 165 + /* 166 + * struct tsync_works - a growable array of per-task contexts 167 + * 168 + * The zero-initialized struct represents the empty array.
169 + */ 170 + struct tsync_works { 171 + struct tsync_work **works; 172 + size_t size; 173 + size_t capacity; 174 + }; 175 + 176 + /* 177 + * tsync_works_provide - provides a preallocated tsync_work for the given task 178 + * 179 + * This also stores a task pointer in the context and increments the reference 180 + * count of the task. 181 + * 182 + * This function may fail in the case where we did not preallocate sufficient 183 + * capacity. This can legitimately happen if new threads get started after we 184 + * grew the capacity. 185 + * 186 + * Returns: 187 + * A pointer to the preallocated context struct, with task filled in. 188 + * 189 + * NULL, if we ran out of preallocated context structs. 190 + */ 191 + static struct tsync_work *tsync_works_provide(struct tsync_works *s, 192 + struct task_struct *task) 193 + { 194 + struct tsync_work *ctx; 195 + 196 + if (s->size >= s->capacity) 197 + return NULL; 198 + 199 + ctx = s->works[s->size]; 200 + s->size++; 201 + 202 + ctx->task = get_task_struct(task); 203 + return ctx; 204 + } 205 + 206 + /* 207 + * tsync_works_grow_by - preallocates space for n more contexts in s 208 + * 209 + * On a successful return, the subsequent n calls to tsync_works_provide() are 210 + * guaranteed to succeed. (size + n <= capacity) 211 + * 212 + * Returns: 213 + * -ENOMEM if the (re)allocation fails, -EOVERFLOW if size + n overflows 214 + * 215 + * 0 if the allocation succeeds or no reallocation 216 + * was needed 217 + */ 218 + static int tsync_works_grow_by(struct tsync_works *s, size_t n, gfp_t flags) 219 + { 220 + size_t i; 221 + size_t new_capacity; 222 + struct tsync_work **works; 223 + struct tsync_work *work; 224 + 225 + if (check_add_overflow(s->size, n, &new_capacity)) 226 + return -EOVERFLOW; 227 + 228 + /* No need to reallocate if s already has sufficient capacity.
*/ 229 + if (new_capacity <= s->capacity) 230 + return 0; 231 + 232 + works = krealloc_array(s->works, new_capacity, sizeof(s->works[0]), 233 + flags); 234 + if (!works) 235 + return -ENOMEM; 236 + 237 + s->works = works; 238 + 239 + for (i = s->capacity; i < new_capacity; i++) { 240 + work = kzalloc(sizeof(*work), flags); 241 + if (!work) { 242 + /* 243 + * Leave the object in a consistent state, 244 + * but return an error. 245 + */ 246 + s->capacity = i; 247 + return -ENOMEM; 248 + } 249 + s->works[i] = work; 250 + } 251 + s->capacity = new_capacity; 252 + return 0; 253 + } 254 + 255 + /* 256 + * tsync_works_contains_task - checks for presence of task in s 257 + */ 258 + static bool tsync_works_contains_task(const struct tsync_works *s, 259 + struct task_struct *task) 260 + { 261 + size_t i; 262 + 263 + for (i = 0; i < s->size; i++) 264 + if (s->works[i]->task == task) 265 + return true; 266 + return false; 267 + } 268 + 269 + /* 270 + * tsync_works_release - frees memory held by s and drops all task references 271 + * 272 + * This does not free s itself, only the data structures held by it. 273 + */ 274 + static void tsync_works_release(struct tsync_works *s) 275 + { 276 + size_t i; 277 + 278 + for (i = 0; i < s->size; i++) { 279 + if (!s->works[i]->task) 280 + continue; 281 + 282 + put_task_struct(s->works[i]->task); 283 + } 284 + 285 + for (i = 0; i < s->capacity; i++) 286 + kfree(s->works[i]); 287 + kfree(s->works); 288 + s->works = NULL; 289 + s->size = 0; 290 + s->capacity = 0; 291 + } 292 + 293 + /* 294 + * count_additional_threads - counts the sibling threads that are not in works 295 + */ 296 + static size_t count_additional_threads(const struct tsync_works *works) 297 + { 298 + struct task_struct *thread, *caller; 299 + size_t n = 0; 300 + 301 + caller = current; 302 + 303 + guard(rcu)(); 304 + 305 + for_each_thread(caller, thread) { 306 + /* Skip current, since it is initiating the sync.
*/ 307 + if (thread == caller) 308 + continue; 309 + 310 + /* Skip exited threads. */ 311 + if (thread->flags & PF_EXITING) 312 + continue; 313 + 314 + /* Skip threads that we have already seen. */ 315 + if (tsync_works_contains_task(works, thread)) 316 + continue; 317 + 318 + n++; 319 + } 320 + return n; 321 + } 322 + 323 + /* 324 + * schedule_task_work - adds task_work for all eligible sibling threads 325 + * which have not been scheduled yet 326 + * 327 + * For each added task_work, atomically increments shared_ctx->num_preparing and 328 + * shared_ctx->num_unfinished. 329 + * 330 + * Returns: 331 + * true, if at least one eligible sibling thread was found 332 + */ 333 + static bool schedule_task_work(struct tsync_works *works, 334 + struct tsync_shared_context *shared_ctx) 335 + { 336 + int err; 337 + struct task_struct *thread, *caller; 338 + struct tsync_work *ctx; 339 + bool found_more_threads = false; 340 + 341 + caller = current; 342 + 343 + guard(rcu)(); 344 + 345 + for_each_thread(caller, thread) { 346 + /* Skip current, since it is initiating the sync. */ 347 + if (thread == caller) 348 + continue; 349 + 350 + /* Skip exited threads. */ 351 + if (thread->flags & PF_EXITING) 352 + continue; 353 + 354 + /* Skip threads that we already looked at. */ 355 + if (tsync_works_contains_task(works, thread)) 356 + continue; 357 + 358 + /* 359 + * We found a sibling thread that is not doing its task_work yet, and 360 + * which might spawn new threads before our task work runs, so we need 361 + * at least one more round in the outer loop. 362 + */ 363 + found_more_threads = true; 364 + 365 + ctx = tsync_works_provide(works, thread); 366 + if (!ctx) { 367 + /* 368 + * We ran out of preallocated contexts -- we need to try again with 369 + * this thread at a later time! 370 + * found_more_threads is already true at this point. 
371 + */ 372 + break; 373 + } 374 + 375 + ctx->shared_ctx = shared_ctx; 376 + 377 + atomic_inc(&shared_ctx->num_preparing); 378 + atomic_inc(&shared_ctx->num_unfinished); 379 + 380 + init_task_work(&ctx->work, restrict_one_thread_callback); 381 + err = task_work_add(thread, &ctx->work, TWA_SIGNAL); 382 + if (err) { 383 + /* 384 + * task_work_add() only fails if the task is about to exit. We 385 + * checked that earlier, but it can happen as a race. Resume 386 + * without setting an error, as the task is probably gone in the 387 + * next loop iteration. For consistency, remove the task from ctx 388 + * so that it does not look like we handed it a task_work. 389 + */ 390 + put_task_struct(ctx->task); 391 + ctx->task = NULL; 392 + 393 + atomic_dec(&shared_ctx->num_preparing); 394 + atomic_dec(&shared_ctx->num_unfinished); 395 + } 396 + } 397 + 398 + return found_more_threads; 399 + } 400 + 401 + /* 402 + * cancel_tsync_works - cancel all task works that can still be canceled 403 + * 404 + * Task works can be canceled as long as they are still queued and have not 405 + * started running. If they get canceled, we decrement 406 + * shared_ctx->num_preparing and shared_ctx->num_unfinished and mark the two 407 + * completions if needed, as if the task was never scheduled. 408 + */ 409 + static void cancel_tsync_works(struct tsync_works *works, 410 + struct tsync_shared_context *shared_ctx) 411 + { 412 + int i; 413 + 414 + for (i = 0; i < works->size; i++) { 415 + if (!task_work_cancel(works->works[i]->task, 416 + &works->works[i]->work)) 417 + continue; 418 + 419 + /* After dequeueing, act as if the task work had executed.
*/ 420 + 421 + if (atomic_dec_return(&shared_ctx->num_preparing) == 0) 422 + complete_all(&shared_ctx->all_prepared); 423 + 424 + if (atomic_dec_return(&shared_ctx->num_unfinished) == 0) 425 + complete_all(&shared_ctx->all_finished); 426 + } 427 + } 428 + 429 + /* 430 + * landlock_restrict_sibling_threads - enables a Landlock policy for all sibling threads 431 + */ 432 + int landlock_restrict_sibling_threads(const struct cred *old_cred, 433 + const struct cred *new_cred) 434 + { 435 + int err; 436 + struct tsync_shared_context shared_ctx; 437 + struct tsync_works works = {}; 438 + size_t newly_discovered_threads; 439 + bool found_more_threads; 440 + 441 + atomic_set(&shared_ctx.preparation_error, 0); 442 + init_completion(&shared_ctx.all_prepared); 443 + init_completion(&shared_ctx.ready_to_commit); 444 + atomic_set(&shared_ctx.num_unfinished, 1); 445 + init_completion(&shared_ctx.all_finished); 446 + shared_ctx.old_cred = old_cred; 447 + shared_ctx.new_cred = new_cred; 448 + shared_ctx.set_no_new_privs = task_no_new_privs(current); 449 + 450 + /* 451 + * We schedule a pseudo-signal task_work for each of the calling task's 452 + * sibling threads. In the task work, each thread: 453 + * 454 + * 1) runs prepare_creds() and writes back the error to 455 + * shared_ctx.preparation_error, if needed. 456 + * 457 + * 2) signals that it's done with prepare_creds() to the calling task. 458 + * (completion "all_prepared"). 459 + * 460 + * 3) waits for the completion "ready_to_commit". This is sent by the 461 + * calling task after ensuring that all sibling threads are done 462 + * with the "preparation" stage. 463 + * 464 + * After this barrier is reached, it's safe to read 465 + * shared_ctx.preparation_error. 466 + * 467 + * 4) reads shared_ctx.preparation_error and then either does commit_creds() 468 + * or abort_creds().
469 + * 470 + * 5) signals that it's done altogether (barrier synchronization 471 + * "all_finished") 472 + * 473 + * Unlike seccomp, which modifies sibling tasks directly, we do not need to 474 + * acquire the cred_guard_mutex and sighand->siglock: 475 + * 476 + * - As in our case, all threads are themselves exchanging their own struct 477 + * cred through the credentials API, no locks are needed for that. 478 + * - Our for_each_thread() loops are protected by RCU. 479 + * - We do not acquire a lock to keep the list of sibling threads stable 480 + * between our for_each_thread loops. If the list of available sibling 481 + * threads changes between these for_each_thread loops, we make up for 482 + * that by continuing to look for threads until they are all discovered 483 + * and have entered their task_work, where they are unable to spawn new 484 + * threads. 485 + */ 486 + do { 487 + /* In RCU read-lock, count the threads we need. */ 488 + newly_discovered_threads = count_additional_threads(&works); 489 + 490 + if (newly_discovered_threads == 0) 491 + break; /* done */ 492 + 493 + err = tsync_works_grow_by(&works, newly_discovered_threads, 494 + GFP_KERNEL_ACCOUNT); 495 + if (err) { 496 + atomic_set(&shared_ctx.preparation_error, err); 497 + break; 498 + } 499 + 500 + /* 501 + * The "all_prepared" barrier is used locally to the loop body, this use 502 + * of for_each_thread(). We can reset it on each loop iteration because 503 + * all previous loop iterations are done with it already. 504 + * 505 + * num_preparing is initialized to 1 so that the counter can not go to 0 506 + * and mark the completion as done before all task works are registered. 507 + * We decrement it at the end of the loop body. 508 + */ 509 + atomic_set(&shared_ctx.num_preparing, 1); 510 + reinit_completion(&shared_ctx.all_prepared); 511 + 512 + /* 513 + * In RCU read-lock, schedule task work on newly discovered sibling 514 + * tasks. 
515 + */ 516 + found_more_threads = schedule_task_work(&works, &shared_ctx); 517 + 518 + /* 519 + * Decrement num_preparing for current, to undo that we initialized it 520 + * to 1 a few lines above. 521 + */ 522 + if (atomic_dec_return(&shared_ctx.num_preparing) > 0) { 523 + if (wait_for_completion_interruptible( 524 + &shared_ctx.all_prepared)) { 525 + /* In case of interruption, we need to retry the system call. */ 526 + atomic_set(&shared_ctx.preparation_error, 527 + -ERESTARTNOINTR); 528 + 529 + /* 530 + * Cancel task works for tasks that did not start running yet, 531 + * and decrement num_preparing and num_unfinished accordingly. 532 + */ 533 + cancel_tsync_works(&works, &shared_ctx); 534 + 535 + /* 536 + * The remaining task works have started running, so waiting for 537 + * their completion will finish. 538 + */ 539 + wait_for_completion(&shared_ctx.all_prepared); 540 + } 541 + } 542 + } while (found_more_threads && 543 + !atomic_read(&shared_ctx.preparation_error)); 544 + 545 + /* 546 + * We now have all sibling threads blocking and in "prepared" state in the 547 + * task work. Ask all threads to commit. 548 + */ 549 + complete_all(&shared_ctx.ready_to_commit); 550 + 551 + /* 552 + * Decrement num_unfinished for current, to undo that we initialized it to 1 553 + * at the beginning. 554 + */ 555 + if (atomic_dec_return(&shared_ctx.num_unfinished) > 0) 556 + wait_for_completion(&shared_ctx.all_finished); 557 + 558 + tsync_works_release(&works); 559 + 560 + return atomic_read(&shared_ctx.preparation_error); 561 + }
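Note on the protocol in security/landlock/tsync.c: the five steps listed in the comment inside landlock_restrict_sibling_threads() can be approximated in userspace. The following is only a rough sketch under stated assumptions, not the kernel implementation: it uses pthreads and C11 atomics, the struct completion here is a hypothetical stand-in built from a mutex and a condition variable, and the names shared_ctx, worker, worker_main and run_lockstep are invented for illustration. Credential preparation is simulated by a per-worker result code. Note that atomic_fetch_sub() returns the old value, so "== 1" and "> 1" here correspond to the kernel's atomic_dec_return() "== 0" and "> 0".

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete_all(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

struct shared_ctx {
	atomic_int error, num_preparing, num_unfinished, committed;
	struct completion all_prepared, ready_to_commit, all_finished;
};

struct worker {
	struct shared_ctx *ctx;
	int prepare_result; /* 0, or a negative value to simulate failure */
};

static void *worker_main(void *arg)
{
	struct worker *w = arg;
	struct shared_ctx *ctx = w->ctx;

	/* Steps 1-2: "prepare", publish any error, then hit the barrier. */
	if (w->prepare_result)
		atomic_store(&ctx->error, w->prepare_result);
	if (atomic_fetch_sub(&ctx->num_preparing, 1) == 1)
		complete_all(&ctx->all_prepared);
	/* Step 3: wait until the caller says the error is safe to read. */
	wait_for_completion(&ctx->ready_to_commit);
	/* Step 4: commit only if no participant failed. */
	if (!atomic_load(&ctx->error))
		atomic_fetch_add(&ctx->committed, 1);
	/* Step 5: report that this worker is finished. */
	if (atomic_fetch_sub(&ctx->num_unfinished, 1) == 1)
		complete_all(&ctx->all_finished);
	return NULL;
}

/* Runs n workers (n <= 16) in lockstep; returns how many committed. */
static int run_lockstep(const int *prepare_results, int n)
{
	struct shared_ctx ctx;
	struct worker workers[16];
	pthread_t threads[16];
	int i;

	atomic_init(&ctx.error, 0);
	atomic_init(&ctx.committed, 0);
	/* Both counters start at 1 so they cannot reach 0 while spawning. */
	atomic_init(&ctx.num_preparing, 1);
	atomic_init(&ctx.num_unfinished, 1);
	init_completion(&ctx.all_prepared);
	init_completion(&ctx.ready_to_commit);
	init_completion(&ctx.all_finished);

	for (i = 0; i < n; i++) {
		workers[i] = (struct worker){ &ctx, prepare_results[i] };
		atomic_fetch_add(&ctx.num_preparing, 1);
		atomic_fetch_add(&ctx.num_unfinished, 1);
		pthread_create(&threads[i], NULL, worker_main, &workers[i]);
	}
	/* Drop the caller's own count; wait only if workers are pending. */
	if (atomic_fetch_sub(&ctx.num_preparing, 1) > 1)
		wait_for_completion(&ctx.all_prepared);
	complete_all(&ctx.ready_to_commit);
	if (atomic_fetch_sub(&ctx.num_unfinished, 1) > 1)
		wait_for_completion(&ctx.all_finished);
	for (i = 0; i < n; i++)
		pthread_join(threads[i], NULL);
	return atomic_load(&ctx.committed);
}
```

In this sketch a single failing worker makes every worker skip its commit, which mirrors the all-or-nothing behaviour the kernel comments describe. The kernel code additionally handles interruption, task_work cancellation and threads that appear between scan rounds, none of which is modelled here.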
+16
security/landlock/tsync.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-only */ 2 + /* 3 + * Landlock - Cross-thread ruleset enforcement 4 + * 5 + * Copyright © 2025 Google LLC 6 + */ 7 + 8 + #ifndef _SECURITY_LANDLOCK_TSYNC_H 9 + #define _SECURITY_LANDLOCK_TSYNC_H 10 + 11 + #include <linux/cred.h> 12 + 13 + int landlock_restrict_sibling_threads(const struct cred *old_cred, 14 + const struct cred *new_cred); 15 + 16 + #endif /* _SECURITY_LANDLOCK_TSYNC_H */
+1
tools/testing/selftests/landlock/.gitignore
··· 1 1 /*_test 2 + /fs_bench 2 3 /sandbox-and-launch 3 4 /true 4 5 /wait-pipe
+1
tools/testing/selftests/landlock/Makefile
··· 9 9 src_test := $(wildcard *_test.c) 10 10 11 11 TEST_GEN_PROGS := $(src_test:.c=) 12 + TEST_GEN_PROGS += fs_bench 12 13 13 14 TEST_GEN_PROGS_EXTENDED := \ 14 15 true \
+4 -4
tools/testing/selftests/landlock/base_test.c
··· 76 76 const struct landlock_ruleset_attr ruleset_attr = { 77 77 .handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE, 78 78 }; 79 - ASSERT_EQ(7, landlock_create_ruleset(NULL, 0, 79 + ASSERT_EQ(8, landlock_create_ruleset(NULL, 0, 80 80 LANDLOCK_CREATE_RULESET_VERSION)); 81 81 82 82 ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, 0, ··· 288 288 EXPECT_EQ(EBADFD, errno); 289 289 } 290 290 291 - TEST(restrict_self_fd_flags) 291 + TEST(restrict_self_fd_logging_flags) 292 292 { 293 293 int fd; 294 294 ··· 304 304 EXPECT_EQ(EBADFD, errno); 305 305 } 306 306 307 - TEST(restrict_self_flags) 307 + TEST(restrict_self_logging_flags) 308 308 { 309 - const __u32 last_flag = LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF; 309 + const __u32 last_flag = LANDLOCK_RESTRICT_SELF_TSYNC; 310 310 311 311 /* Tests invalid flag combinations. */ 312 312
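Note on the version bump in base_test.c: the expected ABI version changes from 7 to 8 and the last known restrict flag becomes LANDLOCK_RESTRICT_SELF_TSYNC. A userspace caller could probe the running kernel before relying on the flag, roughly as sketched below. This is an illustration under assumptions: that TSYNC first becomes available with ABI version 8 is inferred from this diff rather than stated in it, and the helper names landlock_abi_version(), abi_supports_tsync() and TSYNC_MIN_ABI are hypothetical. The fallback defines only matter when the installed headers predate Landlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older headers; 444 is the generic syscall number. */
#ifndef SYS_landlock_create_ruleset
#define SYS_landlock_create_ruleset 444
#endif
#ifndef LANDLOCK_CREATE_RULESET_VERSION
#define LANDLOCK_CREATE_RULESET_VERSION (1U << 0)
#endif

/* Assumption: TSYNC arrives with the ABI version this diff bumps to. */
#define TSYNC_MIN_ABI 8

/* Returns the Landlock ABI version, or -1 with errno set if unavailable. */
static long landlock_abi_version(void)
{
	return syscall(SYS_landlock_create_ruleset, NULL, 0,
		       LANDLOCK_CREATE_RULESET_VERSION);
}

/* True if the given ABI version is expected to accept the TSYNC flag. */
static bool abi_supports_tsync(long abi)
{
	return abi >= TSYNC_MIN_ABI;
}
```

A caller would typically fall back to restricting each thread individually, or to restricting before any threads are spawned, when abi_supports_tsync(landlock_abi_version()) is false.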
+214
tools/testing/selftests/landlock/fs_bench.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * Landlock filesystem benchmark 4 + * 5 + * This program benchmarks the time required for file access checks. We use a 6 + * large number (-d flag) of nested directories where each directory inode has 7 + * an associated Landlock rule, and we repeatedly (-n flag) exercise a file 8 + * access for which Landlock has to walk the path all the way up to the root. 9 + * 10 + * With an increasing number of nested subdirectories, Landlock's portion of the 11 + * overall system call time increases, which makes the effects of Landlock 12 + * refactorings more measurable. 13 + * 14 + * This benchmark does *not* measure the building of the Landlock ruleset. The 15 + * time required to add all these rules is not large enough to be easily 16 + * measurable. A separate benchmark tool would be better to test that, and that 17 + * tool could then also use a simpler file system layout. 18 + * 19 + * Copyright © 2026 Google LLC 20 + */ 21 + 22 + #define _GNU_SOURCE 23 + #include <err.h> 24 + #include <errno.h> 25 + #include <fcntl.h> 26 + #include <linux/landlock.h> 27 + #include <linux/prctl.h> 28 + #include <stdbool.h> 29 + #include <stdio.h> 30 + #include <stdlib.h> 31 + #include <string.h> 32 + #include <sys/prctl.h> 33 + #include <sys/stat.h> 34 + #include <sys/times.h> 35 + #include <time.h> 36 + #include <unistd.h> 37 + 38 + #include "wrappers.h" 39 + 40 + static void usage(const char *const argv0) 41 + { 42 + printf("Usage:\n"); 43 + printf(" %s [OPTIONS]\n", argv0); 44 + printf("\n"); 45 + printf(" Benchmark expensive Landlock checks for D nested dirs\n"); 46 + printf("\n"); 47 + printf("Options:\n"); 48 + printf(" -h help\n"); 49 + printf(" -L disable Landlock (as a baseline)\n"); 50 + printf(" -d D set directory depth to D\n"); 51 + printf(" -n N set number of benchmark iterations to N\n"); 52 + } 53 + 54 + /* 55 + * Build a deep directory, enforce Landlock and return the FD to the 56 + * deepest dir. 
On any failure, exit the process with an error. 57 + */ 58 + static int build_directory(size_t depth, const bool use_landlock) 59 + { 60 + const char *path = "d"; /* directory name */ 61 + int abi, ruleset_fd, curr, prev; 62 + 63 + if (use_landlock) { 64 + abi = landlock_create_ruleset(NULL, 0, 65 + LANDLOCK_CREATE_RULESET_VERSION); 66 + if (abi < 7) 67 + err(1, "Landlock ABI too low: got %d, wanted 7+", abi); 68 + } 69 + 70 + ruleset_fd = -1; 71 + if (use_landlock) { 72 + struct landlock_ruleset_attr attr = { 73 + .handled_access_fs = LANDLOCK_ACCESS_FS_IOCTL_DEV | 74 + LANDLOCK_ACCESS_FS_WRITE_FILE | 75 + LANDLOCK_ACCESS_FS_MAKE_REG, 76 + }; 77 + ruleset_fd = landlock_create_ruleset(&attr, sizeof(attr), 0U); 78 + if (ruleset_fd < 0) 79 + err(1, "landlock_create_ruleset"); 80 + } 81 + 82 + curr = open(".", O_PATH); 83 + if (curr < 0) 84 + err(1, "open(.)"); 85 + 86 + while (depth--) { 87 + if (use_landlock) { 88 + struct landlock_path_beneath_attr attr = { 89 + .allowed_access = LANDLOCK_ACCESS_FS_IOCTL_DEV, 90 + .parent_fd = curr, 91 + }; 92 + if (landlock_add_rule(ruleset_fd, 93 + LANDLOCK_RULE_PATH_BENEATH, &attr, 94 + 0) < 0) 95 + err(1, "landlock_add_rule"); 96 + } 97 + 98 + if (mkdirat(curr, path, 0700) < 0) 99 + err(1, "mkdirat(%s)", path); 100 + 101 + prev = curr; 102 + curr = openat(curr, path, O_PATH); 103 + if (curr < 0) 104 + err(1, "openat(%s)", path); 105 + 106 + close(prev); 107 + } 108 + 109 + if (use_landlock) { 110 + if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0) 111 + err(1, "prctl"); 112 + 113 + if (landlock_restrict_self(ruleset_fd, 0) < 0) 114 + err(1, "landlock_restrict_self"); 115 + } 116 + 117 + close(ruleset_fd); 118 + return curr; 119 + } 120 + 121 + static void remove_recursively(const size_t depth) 122 + { 123 + const char *path = "d"; /* directory name */ 124 + 125 + int fd = openat(AT_FDCWD, ".", O_PATH); 126 + 127 + if (fd < 0) 128 + err(1, "openat(.)"); 129 + 130 + for (size_t i = 0; i < depth - 1; i++) { 131 + int oldfd = fd; 132 
+ 133 + fd = openat(fd, path, O_PATH); 134 + if (fd < 0) 135 + err(1, "openat(%s)", path); 136 + close(oldfd); 137 + } 138 + 139 + for (size_t i = 0; i < depth; i++) { 140 + if (unlinkat(fd, path, AT_REMOVEDIR) < 0) 141 + err(1, "unlinkat(%s)", path); 142 + int newfd = openat(fd, "..", O_PATH); 143 + 144 + close(fd); 145 + fd = newfd; 146 + } 147 + close(fd); 148 + } 149 + 150 + int main(int argc, char *argv[]) 151 + { 152 + bool use_landlock = true; 153 + size_t num_iterations = 100000; 154 + size_t num_subdirs = 10000; 155 + int c, curr, fd; 156 + struct tms start_time, end_time; 157 + 158 + setbuf(stdout, NULL); 159 + while ((c = getopt(argc, argv, "hLd:n:")) != -1) { 160 + switch (c) { 161 + case 'h': 162 + usage(argv[0]); 163 + return EXIT_SUCCESS; 164 + case 'L': 165 + use_landlock = false; 166 + break; 167 + case 'd': 168 + num_subdirs = atoi(optarg); 169 + break; 170 + case 'n': 171 + num_iterations = atoi(optarg); 172 + break; 173 + default: 174 + usage(argv[0]); 175 + return EXIT_FAILURE; 176 + } 177 + } 178 + 179 + printf("*** Benchmark ***\n"); 180 + printf("%zu dirs, %zu iterations, %s Landlock\n", num_subdirs, 181 + num_iterations, use_landlock ? 
"with" : "without"); 182 + 183 + if (times(&start_time) == -1) 184 + err(1, "times"); 185 + 186 + curr = build_directory(num_subdirs, use_landlock); 187 + 188 + for (int i = 0; i < num_iterations; i++) { 189 + fd = openat(curr, "file.txt", O_CREAT | O_TRUNC | O_WRONLY, 190 + 0600); 191 + if (use_landlock) { 192 + if (fd == 0) 193 + errx(1, "openat succeeded, expected EACCES"); 194 + if (errno != EACCES) 195 + err(1, "openat expected EACCES, but got"); 196 + } 197 + if (fd != -1) 198 + close(fd); 199 + } 200 + 201 + if (times(&end_time) == -1) 202 + err(1, "times"); 203 + 204 + printf("*** Benchmark concluded ***\n"); 205 + printf("System: %ld clocks\n", 206 + end_time.tms_stime - start_time.tms_stime); 207 + printf("User : %ld clocks\n", 208 + end_time.tms_utime - start_time.tms_utime); 209 + printf("Clocks per second: %ld\n", CLOCKS_PER_SEC); 210 + 211 + close(curr); 212 + 213 + remove_recursively(num_subdirs); 214 + }
+161
tools/testing/selftests/landlock/tsync_test.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * Landlock tests - Enforcing the same restrictions across multiple threads 4 + * 5 + * Copyright © 2025 Günther Noack <gnoack3000@gmail.com> 6 + */ 7 + 8 + #define _GNU_SOURCE 9 + #include <pthread.h> 10 + #include <sys/prctl.h> 11 + #include <linux/landlock.h> 12 + 13 + #include "common.h" 14 + 15 + /* create_ruleset - Create a simple ruleset FD common to all tests */ 16 + static int create_ruleset(struct __test_metadata *const _metadata) 17 + { 18 + struct landlock_ruleset_attr ruleset_attr = { 19 + .handled_access_fs = (LANDLOCK_ACCESS_FS_WRITE_FILE | 20 + LANDLOCK_ACCESS_FS_TRUNCATE), 21 + }; 22 + const int ruleset_fd = 23 + landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0); 24 + 25 + ASSERT_LE(0, ruleset_fd) 26 + { 27 + TH_LOG("landlock_create_ruleset: %s", strerror(errno)); 28 + } 29 + return ruleset_fd; 30 + } 31 + 32 + TEST(single_threaded_success) 33 + { 34 + const int ruleset_fd = create_ruleset(_metadata); 35 + 36 + disable_caps(_metadata); 37 + 38 + ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)); 39 + ASSERT_EQ(0, landlock_restrict_self(ruleset_fd, 40 + LANDLOCK_RESTRICT_SELF_TSYNC)); 41 + 42 + EXPECT_EQ(0, close(ruleset_fd)); 43 + } 44 + 45 + static void store_no_new_privs(void *data) 46 + { 47 + bool *nnp = data; 48 + 49 + if (!nnp) 50 + return; 51 + *nnp = prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0); 52 + } 53 + 54 + static void *idle(void *data) 55 + { 56 + pthread_cleanup_push(store_no_new_privs, data); 57 + 58 + while (true) 59 + sleep(1); 60 + 61 + pthread_cleanup_pop(1); 62 + } 63 + 64 + TEST(multi_threaded_success) 65 + { 66 + pthread_t t1, t2; 67 + bool no_new_privs1, no_new_privs2; 68 + const int ruleset_fd = create_ruleset(_metadata); 69 + 70 + disable_caps(_metadata); 71 + 72 + ASSERT_EQ(0, pthread_create(&t1, NULL, idle, &no_new_privs1)); 73 + ASSERT_EQ(0, pthread_create(&t2, NULL, idle, &no_new_privs2)); 74 + 75 + ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)); 76 
+ 77 + EXPECT_EQ(0, landlock_restrict_self(ruleset_fd, 78 + LANDLOCK_RESTRICT_SELF_TSYNC)); 79 + 80 + ASSERT_EQ(0, pthread_cancel(t1)); 81 + ASSERT_EQ(0, pthread_cancel(t2)); 82 + ASSERT_EQ(0, pthread_join(t1, NULL)); 83 + ASSERT_EQ(0, pthread_join(t2, NULL)); 84 + 85 + /* The no_new_privs flag was implicitly enabled on all threads. */ 86 + EXPECT_TRUE(no_new_privs1); 87 + EXPECT_TRUE(no_new_privs2); 88 + 89 + EXPECT_EQ(0, close(ruleset_fd)); 90 + } 91 + 92 + TEST(multi_threaded_success_despite_diverging_domains) 93 + { 94 + pthread_t t1, t2; 95 + const int ruleset_fd = create_ruleset(_metadata); 96 + 97 + disable_caps(_metadata); 98 + 99 + ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)); 100 + 101 + ASSERT_EQ(0, pthread_create(&t1, NULL, idle, NULL)); 102 + ASSERT_EQ(0, pthread_create(&t2, NULL, idle, NULL)); 103 + 104 + /* 105 + * The main thread enforces a ruleset, 106 + * thereby bringing the threads' Landlock domains out of sync. 107 + */ 108 + EXPECT_EQ(0, landlock_restrict_self(ruleset_fd, 0)); 109 + 110 + /* Still, TSYNC succeeds, bringing the threads in sync again. 
*/ 111 + EXPECT_EQ(0, landlock_restrict_self(ruleset_fd, 112 + LANDLOCK_RESTRICT_SELF_TSYNC)); 113 + 114 + ASSERT_EQ(0, pthread_cancel(t1)); 115 + ASSERT_EQ(0, pthread_cancel(t2)); 116 + ASSERT_EQ(0, pthread_join(t1, NULL)); 117 + ASSERT_EQ(0, pthread_join(t2, NULL)); 118 + EXPECT_EQ(0, close(ruleset_fd)); 119 + } 120 + 121 + struct thread_restrict_data { 122 + pthread_t t; 123 + int ruleset_fd; 124 + int result; 125 + }; 126 + 127 + static void *thread_restrict(void *data) 128 + { 129 + struct thread_restrict_data *d = data; 130 + 131 + d->result = landlock_restrict_self(d->ruleset_fd, 132 + LANDLOCK_RESTRICT_SELF_TSYNC); 133 + return NULL; 134 + } 135 + 136 + TEST(competing_enablement) 137 + { 138 + const int ruleset_fd = create_ruleset(_metadata); 139 + struct thread_restrict_data d[] = { 140 + { .ruleset_fd = ruleset_fd }, 141 + { .ruleset_fd = ruleset_fd }, 142 + }; 143 + 144 + disable_caps(_metadata); 145 + 146 + ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)); 147 + ASSERT_EQ(0, pthread_create(&d[0].t, NULL, thread_restrict, &d[0])); 148 + ASSERT_EQ(0, pthread_create(&d[1].t, NULL, thread_restrict, &d[1])); 149 + 150 + /* Wait for threads to finish. */ 151 + ASSERT_EQ(0, pthread_join(d[0].t, NULL)); 152 + ASSERT_EQ(0, pthread_join(d[1].t, NULL)); 153 + 154 + /* Expect that both succeeded. */ 155 + EXPECT_EQ(0, d[0].result); 156 + EXPECT_EQ(0, d[1].result); 157 + 158 + EXPECT_EQ(0, close(ruleset_fd)); 159 + } 160 + 161 + TEST_HARNESS_MAIN
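Note on the test technique in tsync_test.c: the idle() thread never returns on its own, so the tests read back per-thread state from a cleanup handler that runs when the thread is cancelled. The sketch below isolates that pattern outside the kselftest harness. It is a simplified illustration, not the test code: record_exit() and cancel_and_observe() are hypothetical names, and the handler stores a plain flag instead of the result of prctl(PR_GET_NO_NEW_PRIVS).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* Cleanup handler: runs in the cancelled thread before it terminates. */
static void record_exit(void *data)
{
	bool *flag = data;

	if (flag)
		*flag = true;
}

static void *idle(void *data)
{
	pthread_cleanup_push(record_exit, data);

	/* sleep() is a cancellation point, so pthread_cancel() takes effect. */
	while (true)
		sleep(1);

	pthread_cleanup_pop(1);
	return NULL;
}

/* Starts an idle thread, cancels it, and reports whether the handler ran. */
static bool cancel_and_observe(void)
{
	pthread_t t;
	bool observed = false;

	if (pthread_create(&t, NULL, idle, &observed))
		return false;
	if (pthread_cancel(t))
		return false;
	if (pthread_join(t, NULL))
		return false;
	return observed;
}
```

Because the handler executes in the context of the cancelled thread, anything it queries (such as the no_new_privs bit in the real test) reflects that thread's state rather than the state of the thread calling pthread_cancel().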