Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

syscall_user_dispatch: Add PR_SYS_DISPATCH_INCLUSIVE_ON

There are two possible scenarios for syscall filtering:
- having a trusted/allowed range of PCs, and intercepting everything else
- or the opposite: a single untrusted/intercepted range and allowing
everything else (this is relevant for any kind of sandboxing scenario,
or monitoring behavior of a single library)

The current API only allows the former use case due to the allowed
range wrap-around check. Add PR_SYS_DISPATCH_INCLUSIVE_ON, which
enables the second use case.

Add PR_SYS_DISPATCH_EXCLUSIVE_ON alias for PR_SYS_DISPATCH_ON
to make it clear how it's different from the new
PR_SYS_DISPATCH_INCLUSIVE_ON.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/97947cc8e205ff49675826d7b0327ef2e2c66eea.1747839857.git.dvyukov@google.com


Authored by Dmitry Vyukov and committed by Thomas Gleixner
a2fc422e b89732c8

+48 -23
+13 -8
Documentation/admin-guide/syscall-user-dispatch.rst
@@ -53,19 +53,24 @@

   prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <offset>, <length>, [selector])

-<op> is either PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF, to enable and
-disable the mechanism globally for that thread. When
-PR_SYS_DISPATCH_OFF is used, the other fields must be zero.
+<op> is either PR_SYS_DISPATCH_EXCLUSIVE_ON/PR_SYS_DISPATCH_INCLUSIVE_ON
+or PR_SYS_DISPATCH_OFF, to enable and disable the mechanism globally for
+that thread. When PR_SYS_DISPATCH_OFF is used, the other fields must be zero.

-[<offset>, <offset>+<length>) delimit a memory region interval
-from which syscalls are always executed directly, regardless of the
-userspace selector. This provides a fast path for the C library, which
-includes the most common syscall dispatchers in the native code
-applications, and also provides a way for the signal handler to return
+For PR_SYS_DISPATCH_EXCLUSIVE_ON [<offset>, <offset>+<length>) delimit
+a memory region interval from which syscalls are always executed directly,
+regardless of the userspace selector. This provides a fast path for the
+C library, which includes the most common syscall dispatchers in the native
+code applications, and also provides a way for the signal handler to return
 without triggering a nested SIGSYS on (rt\_)sigreturn. Users of this
 interface should make sure that at least the signal trampoline code is
 included in this region. In addition, for syscalls that implement the
 trampoline code on the vDSO, that trampoline is never intercepted.
+
+For PR_SYS_DISPATCH_INCLUSIVE_ON [<offset>, <offset>+<length>) delimit
+a memory region interval from which syscalls are dispatched based on
+the userspace selector. Syscalls from outside of the range are always
+executed directly.

 [selector] is a pointer to a char-sized region in the process memory
 region, that provides a quick way to enable disable syscall redirection
+6 -1
include/uapi/linux/prctl.h
@@ -255,7 +255,12 @@
 /* Dispatch syscalls to a userspace handler */
 #define PR_SET_SYSCALL_USER_DISPATCH	59
 # define PR_SYS_DISPATCH_OFF		0
-# define PR_SYS_DISPATCH_ON		1
+/* Enable dispatch except for the specified range */
+# define PR_SYS_DISPATCH_EXCLUSIVE_ON	1
+/* Enable dispatch for the specified range */
+# define PR_SYS_DISPATCH_INCLUSIVE_ON	2
+/* Legacy name for backwards compatibility */
+# define PR_SYS_DISPATCH_ON		PR_SYS_DISPATCH_EXCLUSIVE_ON
 /* The control values for the user space selector when dispatch is enabled */
 # define SYSCALL_DISPATCH_FILTER_ALLOW	0
 # define SYSCALL_DISPATCH_FILTER_BLOCK	1
+23 -13
kernel/entry/syscall_user_dispatch.c
@@ -78,7 +78,7 @@
 		if (offset || len || selector)
 			return -EINVAL;
 		break;
-	case PR_SYS_DISPATCH_ON:
+	case PR_SYS_DISPATCH_EXCLUSIVE_ON:
 		/*
 		 * Validate the direct dispatcher region just for basic
 		 * sanity against overflow and a 0-sized dispatcher
@@ -87,30 +87,40 @@
 		 */
 		if (offset && offset + len <= offset)
 			return -EINVAL;
-
+		break;
+	case PR_SYS_DISPATCH_INCLUSIVE_ON:
+		if (len == 0 || offset + len <= offset)
+			return -EINVAL;
 		/*
-		 * access_ok() will clear memory tags for tagged addresses
-		 * if current has memory tagging enabled.
-
-		 * To enable a tracer to set a tracees selector the
-		 * selector address must be untagged for access_ok(),
-		 * otherwise an untagged tracer will always fail to set a
-		 * tagged tracees selector.
+		 * Invert the range, the check in syscall_user_dispatch()
+		 * supports wrap-around.
 		 */
-		if (selector && !access_ok(untagged_addr(selector), sizeof(*selector)))
-			return -EFAULT;
-
+		offset = offset + len;
+		len = -len;
 		break;
 	default:
 		return -EINVAL;
 	}
+
+	/*
+	 * access_ok() will clear memory tags for tagged addresses
+	 * if current has memory tagging enabled.
+	 *
+	 * To enable a tracer to set a tracees selector the
+	 * selector address must be untagged for access_ok(),
+	 * otherwise an untagged tracer will always fail to set a
+	 * tagged tracees selector.
+	 */
+	if (mode != PR_SYS_DISPATCH_OFF && selector &&
+		!access_ok(untagged_addr(selector), sizeof(*selector)))
+		return -EFAULT;

 	task->syscall_dispatch.selector = selector;
 	task->syscall_dispatch.offset = offset;
 	task->syscall_dispatch.len = len;
 	task->syscall_dispatch.on_dispatch = false;

-	if (mode == PR_SYS_DISPATCH_ON)
+	if (mode != PR_SYS_DISPATCH_OFF)
 		set_task_syscall_work(task, SYSCALL_USER_DISPATCH);
 	else
 		clear_task_syscall_work(task, SYSCALL_USER_DISPATCH);
+6 -1
tools/include/uapi/linux/prctl.h
@@ -255,7 +255,12 @@
 /* Dispatch syscalls to a userspace handler */
 #define PR_SET_SYSCALL_USER_DISPATCH	59
 # define PR_SYS_DISPATCH_OFF		0
-# define PR_SYS_DISPATCH_ON		1
+/* Enable dispatch except for the specified range */
+# define PR_SYS_DISPATCH_EXCLUSIVE_ON	1
+/* Enable dispatch for the specified range */
+# define PR_SYS_DISPATCH_INCLUSIVE_ON	2
+/* Legacy name for backwards compatibility */
+# define PR_SYS_DISPATCH_ON		PR_SYS_DISPATCH_EXCLUSIVE_ON
 /* The control values for the user space selector when dispatch is enabled */
 # define SYSCALL_DISPATCH_FILTER_ALLOW	0
 # define SYSCALL_DISPATCH_FILTER_BLOCK	1