Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'seccomp-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp updates from Kees Cook:
"Major additions:

- sysctl and seccomp operation to discover available actions
  (tyhicks)

- new per-filter configurable logging infrastructure and sysctl
  (tyhicks)

- SECCOMP_RET_LOG to log allowed syscalls (tyhicks)

- SECCOMP_RET_KILL_PROCESS as the new strictest possible action

- self-tests for new behaviors"

[ This is the seccomp part of the security pull request during the merge
window that was nixed due to unrelated problems - Linus ]

* tag 'seccomp-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
samples: Unrename SECCOMP_RET_KILL
selftests/seccomp: Test thread vs process killing
seccomp: Implement SECCOMP_RET_KILL_PROCESS action
seccomp: Introduce SECCOMP_RET_KILL_PROCESS
seccomp: Rename SECCOMP_RET_KILL to SECCOMP_RET_KILL_THREAD
seccomp: Action to log before allowing
seccomp: Filter flag to log all actions except SECCOMP_RET_ALLOW
seccomp: Selftest for detection of filter flag support
seccomp: Sysctl to configure actions that are allowed to be logged
seccomp: Operation for checking if an action is available
seccomp: Sysctl to display available actions
seccomp: Provide matching filter for introspection
selftests/seccomp: Refactor RET_ERRNO tests
selftests/seccomp: Add simple seccomp overhead benchmark
selftests/seccomp: Add tests for basic ptrace actions
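
The SECCOMP_RET_LOG action listed in the pull message can be tried from userspace with a one-instruction filter. The sketch below follows the shape of the log_all selftest added in this series; the helper names `build_log_all_filter` and `install_filter` are hypothetical, and the fallback #define is only there so the block builds against headers that predate this merge.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>

#ifndef SECCOMP_RET_LOG
#define SECCOMP_RET_LOG 0x7ffc0000U /* allow after logging */
#endif

/* One-instruction BPF program: return SECCOMP_RET_LOG for every syscall. */
static struct sock_filter log_all_insns[] = {
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_LOG),
};

/* Hypothetical helper: describe the program above in a sock_fprog. */
void build_log_all_filter(struct sock_fprog *prog)
{
    prog->len = (unsigned short)(sizeof(log_all_insns) /
                                 sizeof(log_all_insns[0]));
    prog->filter = log_all_insns;
}

/*
 * Hypothetical helper: install prog on the calling thread.
 * Returns 0 on success, -1 with errno set on failure.
 */
int install_filter(struct sock_fprog *prog)
{
    /* Needed so an unprivileged task may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog);
}
```

Once installed, every syscall made by the thread is still executed; whether a log record appears depends on "log" being present in the actions_logged sysctl.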

+1007 -130
+1 -1
Documentation/networking/filter.txt
···
   jeq #14, good /* __NR_rt_sigprocmask */
   jeq #13, good /* __NR_rt_sigaction */
   jeq #35, good /* __NR_nanosleep */
-  bad: ret #0 /* SECCOMP_RET_KILL */
+  bad: ret #0 /* SECCOMP_RET_KILL_THREAD */
   good: ret #0x7fff0000 /* SECCOMP_RET_ALLOW */

 The above example code can be placed into a file (here called "foo"), and
+1
Documentation/sysctl/kernel.txt
···
 - reboot-cmd [ SPARC only ]
 - rtsig-max
 - rtsig-nr
+- seccomp/ ==> Documentation/userspace-api/seccomp_filter.rst
 - sem
 - sem_next_id [ sysv ipc ]
 - sg-big-buff [ generic SCSI device (sg) ]
+50 -2
Documentation/userspace-api/seccomp_filter.rst
···
 A seccomp filter may return any of the following values. If multiple
 filters exist, the return value for the evaluation of a given system
 call will always use the highest precedent value. (For example,
-``SECCOMP_RET_KILL`` will always take precedence.)
+``SECCOMP_RET_KILL_PROCESS`` will always take precedence.)

 In precedence order, they are:

-``SECCOMP_RET_KILL``:
+``SECCOMP_RET_KILL_PROCESS``:
+	Results in the entire process exiting immediately without executing
+	the system call. The exit status of the task (``status & 0x7f``)
+	will be ``SIGSYS``, not ``SIGKILL``.
+
+``SECCOMP_RET_KILL_THREAD``:
 	Results in the task exiting immediately without executing the
 	system call. The exit status of the task (``status & 0x7f``) will
 	be ``SIGSYS``, not ``SIGKILL``.
···
 	allow use of ptrace, even of other sandboxed processes, without
 	extreme care; ptracers can use this mechanism to escape.)

+``SECCOMP_RET_LOG``:
+	Results in the system call being executed after it is logged. This
+	should be used by application developers to learn which syscalls their
+	application needs without having to iterate through multiple test and
+	development cycles to build the list.
+
+	This action will only be logged if "log" is present in the
+	actions_logged sysctl string.
+
 ``SECCOMP_RET_ALLOW``:
 	Results in the system call being executed.

···
 and a more generic example of a higher level macro interface for BPF
 program generation.

+Sysctls
+=======

+Seccomp's sysctl files can be found in the ``/proc/sys/kernel/seccomp/``
+directory. Here's a description of each file in that directory:
+
+``actions_avail``:
+	A read-only ordered list of seccomp return values (refer to the
+	``SECCOMP_RET_*`` macros above) in string form. The ordering, from
+	left-to-right, is the least permissive return value to the most
+	permissive return value.
+
+	The list represents the set of seccomp return values supported
+	by the kernel. A userspace program may use this list to
+	determine if the actions found in the ``seccomp.h``, when the
+	program was built, differs from the set of actions actually
+	supported in the current running kernel.
+
+``actions_logged``:
+	A read-write ordered list of seccomp return values (refer to the
+	``SECCOMP_RET_*`` macros above) that are allowed to be logged. Writes
+	to the file do not need to be in ordered form but reads from the file
+	will be ordered in the same way as the actions_avail sysctl.
+
+	It is important to note that the value of ``actions_logged`` does not
+	prevent certain actions from being logged when the audit subsystem is
+	configured to audit a task. If the action is not found in
+	``actions_logged`` list, the final decision on whether to audit the
+	action for that task is ultimately left up to the audit subsystem to
+	decide for all seccomp return values other than ``SECCOMP_RET_ALLOW``.
+
+	The ``allow`` string is not accepted in the ``actions_logged`` sysctl
+	as it is not possible to log ``SECCOMP_RET_ALLOW`` actions. Attempting
+	to write ``allow`` to the sysctl will result in an EINVAL being
+	returned.

 Adding architecture support
 ===========================
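
The actions_avail file documented in this hunk is a single space-separated line, ordered from least to most permissive. A minimal sketch of checking it for a given action name follows; `action_in_list` and `action_in_actions_avail` are hypothetical helper names, and the /proc path is the one given in the documentation text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if name appears as a whole
 * whitespace-separated word in list, 0 otherwise.
 */
int action_in_list(const char *list, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = list;

    if (name_len == 0)
        return 0;

    while (*p) {
        size_t word_len;

        p += strspn(p, " \t\n");        /* skip separators */
        word_len = strcspn(p, " \t\n"); /* length of the next word */
        if (word_len == name_len && strncmp(p, name, name_len) == 0)
            return 1;
        p += word_len;
    }
    return 0;
}

/*
 * Hypothetical helper: 1 if the running kernel lists the action,
 * 0 if it does not, -1 if the sysctl file cannot be read (for
 * example on a kernel that predates these sysctls).
 */
int action_in_actions_avail(const char *name)
{
    char buf[256];
    FILE *f = fopen("/proc/sys/kernel/seccomp/actions_avail", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return action_in_list(buf, name);
}
```

Matching whole words matters here because one action name can be a prefix of another: a plain substring search for "kill" would wrongly match "kill_thread".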
+1 -5
include/linux/audit.h
···

 static inline void audit_seccomp(unsigned long syscall, long signr, int code)
 {
-	if (!audit_enabled)
-		return;
-
-	/* Force a record to be reported if a signal was delivered. */
-	if (signr || unlikely(!audit_dummy_context()))
+	if (audit_enabled && unlikely(!audit_dummy_context()))
 		__audit_seccomp(syscall, signr, code);
 }

+2 -1
include/linux/seccomp.h
···

 #include <uapi/linux/seccomp.h>

-#define SECCOMP_FILTER_FLAG_MASK	(SECCOMP_FILTER_FLAG_TSYNC)
+#define SECCOMP_FILTER_FLAG_MASK	(SECCOMP_FILTER_FLAG_TSYNC | \
+					 SECCOMP_FILTER_FLAG_LOG)

 #ifdef CONFIG_SECCOMP

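
With SECCOMP_FILTER_FLAG_LOG added to the mask, userspace can probe for the flag without installing anything. The sketch below assumes the behaviour the flag-detection selftest in this series relies on: an unknown flag is rejected with EINVAL before the filter pointer is read, while a known flag with a NULL filter fails with EFAULT. `filter_flag_supported` is a hypothetical name, and an EINVAL result can also mean the kernel was built without filter support.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

#ifndef SECCOMP_SET_MODE_FILTER
#define SECCOMP_SET_MODE_FILTER 1
#endif
#ifndef SECCOMP_FILTER_FLAG_LOG
#define SECCOMP_FILTER_FLAG_LOG 2
#endif

/*
 * Hypothetical helper: 1 if the kernel accepts flag, 0 if it rejects
 * it as unknown, -1 if the result cannot be interpreted (errno set).
 * No filter is ever installed because the filter pointer is NULL.
 */
int filter_flag_supported(unsigned int flag)
{
#ifdef __NR_seccomp
    long ret = syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER, flag, NULL);

    if (ret == -1 && errno == EFAULT)
        return 1;   /* flag passed validation, NULL pointer faulted */
    if (ret == -1 && errno == EINVAL)
        return 0;   /* flag (or filter mode) not supported */
    return -1;
#else
    (void)flag;
    errno = ENOSYS;
    return -1;
#endif
}
```

A caller would typically check `filter_flag_supported(SECCOMP_FILTER_FLAG_LOG)` once at startup and fall back to flags of 0 when the result is not 1.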
+15 -8
include/uapi/linux/seccomp.h
···
 #define SECCOMP_MODE_FILTER	2 /* uses user-supplied filter. */

 /* Valid operations for seccomp syscall. */
-#define SECCOMP_SET_MODE_STRICT	0
-#define SECCOMP_SET_MODE_FILTER	1
+#define SECCOMP_SET_MODE_STRICT		0
+#define SECCOMP_SET_MODE_FILTER		1
+#define SECCOMP_GET_ACTION_AVAIL	2

 /* Valid flags for SECCOMP_SET_MODE_FILTER */
 #define SECCOMP_FILTER_FLAG_TSYNC	1
+#define SECCOMP_FILTER_FLAG_LOG		2

 /*
  * All BPF programs must return a 32-bit value.
  * The bottom 16-bits are for optional return data.
- * The upper 16-bits are ordered from least permissive values to most.
+ * The upper 16-bits are ordered from least permissive values to most,
+ * as a signed value (so 0x8000000 is negative).
  *
  * The ordering ensures that a min_t() over composed return values always
  * selects the least permissive choice.
  */
-#define SECCOMP_RET_KILL	0x00000000U /* kill the task immediately */
-#define SECCOMP_RET_TRAP	0x00030000U /* disallow and force a SIGSYS */
-#define SECCOMP_RET_ERRNO	0x00050000U /* returns an errno */
-#define SECCOMP_RET_TRACE	0x7ff00000U /* pass to a tracer or disallow */
-#define SECCOMP_RET_ALLOW	0x7fff0000U /* allow */
+#define SECCOMP_RET_KILL_PROCESS 0x80000000U /* kill the process */
+#define SECCOMP_RET_KILL_THREAD	 0x00000000U /* kill the thread */
+#define SECCOMP_RET_KILL	 SECCOMP_RET_KILL_THREAD
+#define SECCOMP_RET_TRAP	 0x00030000U /* disallow and force a SIGSYS */
+#define SECCOMP_RET_ERRNO	 0x00050000U /* returns an errno */
+#define SECCOMP_RET_TRACE	 0x7ff00000U /* pass to a tracer or disallow */
+#define SECCOMP_RET_LOG		 0x7ffc0000U /* allow after logging */
+#define SECCOMP_RET_ALLOW	 0x7fff0000U /* allow */

 /* Masks for the return value sections. */
+#define SECCOMP_RET_ACTION_FULL	0xffff0000U
 #define SECCOMP_RET_ACTION	0x7fff0000U
 #define SECCOMP_RET_DATA	0x0000ffffU

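
The header comment in this hunk says the action bits are ordered as a signed value, which is what lets SECCOMP_RET_KILL_PROCESS (top bit set) rank below SECCOMP_RET_KILL_THREAD (zero). A userspace sketch of that comparison, modelled on the ACTION_ONLY() macro this series adds to kernel/seccomp.c; `compose_returns` and the shortened `RET_*` names are hypothetical, with values copied from the hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the uapi header hunk; names shortened here. */
#define RET_KILL_PROCESS 0x80000000U
#define RET_KILL_THREAD  0x00000000U
#define RET_TRAP         0x00030000U
#define RET_ERRNO        0x00050000U
#define RET_TRACE        0x7ff00000U
#define RET_LOG          0x7ffc0000U
#define RET_ALLOW        0x7fff0000U
#define RET_ACTION_FULL  0xffff0000U

/*
 * Action bits viewed as a signed value, so 0x80000000 is negative.
 * The unsigned-to-signed conversion is implementation-defined in C
 * before C23; GCC and Clang wrap, which is what this sketch assumes.
 */
static int32_t action_only(uint32_t ret)
{
    return (int32_t)(ret & RET_ACTION_FULL);
}

/*
 * Hypothetical model of composing several filters' return values:
 * start from ALLOW and keep the value with the smallest signed action.
 */
uint32_t compose_returns(const uint32_t *rets, size_t n)
{
    uint32_t ret = RET_ALLOW;
    size_t i;

    for (i = 0; i < n; i++) {
        if (action_only(rets[i]) < action_only(ret))
            ret = rets[i];
    }
    return ret;
}
```

Comparing with the older 0x7fff0000 mask instead would strip the top bit and make KILL_PROCESS indistinguishable from KILL_THREAD, which is why the full 16-bit action mask is used for the comparison.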
+311 -12
kernel/seccomp.c
···
 #include <linux/audit.h>
 #include <linux/compat.h>
 #include <linux/coredump.h>
+#include <linux/kmemleak.h>
 #include <linux/sched.h>
 #include <linux/sched/task_stack.h>
 #include <linux/seccomp.h>
 #include <linux/slab.h>
 #include <linux/syscalls.h>
+#include <linux/sysctl.h>

 #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER
 #include <asm/syscall.h>
···
  *         get/put helpers should be used when accessing an instance
  *         outside of a lifetime-guarded section. In general, this
  *         is only needed for handling filters shared across tasks.
+ * @log: true if all actions except for SECCOMP_RET_ALLOW should be logged
  * @prev: points to a previously installed, or inherited, filter
  * @prog: the BPF program to evaluate
  *
···
  */
 struct seccomp_filter {
 	refcount_t usage;
+	bool log;
 	struct seccomp_filter *prev;
 	struct bpf_prog *prog;
 };
···
 /**
  * seccomp_run_filters - evaluates all seccomp filters against @sd
  * @sd: optional seccomp data to be passed to filters
+ * @match: stores struct seccomp_filter that resulted in the return value,
+ *         unless filter returned SECCOMP_RET_ALLOW, in which case it will
+ *         be unchanged.
  *
  * Returns valid seccomp BPF response codes.
  */
-static u32 seccomp_run_filters(const struct seccomp_data *sd)
+#define ACTION_ONLY(ret) ((s32)((ret) & (SECCOMP_RET_ACTION_FULL)))
+static u32 seccomp_run_filters(const struct seccomp_data *sd,
+			       struct seccomp_filter **match)
 {
 	struct seccomp_data sd_local;
 	u32 ret = SECCOMP_RET_ALLOW;
···

 	/* Ensure unexpected behavior doesn't result in failing open. */
 	if (unlikely(WARN_ON(f == NULL)))
-		return SECCOMP_RET_KILL;
+		return SECCOMP_RET_KILL_PROCESS;

 	if (!sd) {
 		populate_seccomp_data(&sd_local);
···
 	for (; f; f = f->prev) {
 		u32 cur_ret = BPF_PROG_RUN(f->prog, sd);

-		if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
+		if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) {
 			ret = cur_ret;
+			*match = f;
+		}
 	}
 	return ret;
 }
···
 		return ret;
 	}

+	/* Set log flag, if present. */
+	if (flags & SECCOMP_FILTER_FLAG_LOG)
+		filter->log = true;
+
 	/*
 	 * If there is an existing filter, make it the prev and don't drop its
 	 * task reference.
···
 }
 #endif /* CONFIG_SECCOMP_FILTER */

+/* For use with seccomp_actions_logged */
+#define SECCOMP_LOG_KILL_PROCESS	(1 << 0)
+#define SECCOMP_LOG_KILL_THREAD		(1 << 1)
+#define SECCOMP_LOG_TRAP		(1 << 2)
+#define SECCOMP_LOG_ERRNO		(1 << 3)
+#define SECCOMP_LOG_TRACE		(1 << 4)
+#define SECCOMP_LOG_LOG			(1 << 5)
+#define SECCOMP_LOG_ALLOW		(1 << 6)
+
+static u32 seccomp_actions_logged = SECCOMP_LOG_KILL_PROCESS |
+				    SECCOMP_LOG_KILL_THREAD |
+				    SECCOMP_LOG_TRAP |
+				    SECCOMP_LOG_ERRNO |
+				    SECCOMP_LOG_TRACE |
+				    SECCOMP_LOG_LOG;
+
+static inline void seccomp_log(unsigned long syscall, long signr, u32 action,
+			       bool requested)
+{
+	bool log = false;
+
+	switch (action) {
+	case SECCOMP_RET_ALLOW:
+		break;
+	case SECCOMP_RET_TRAP:
+		log = requested && seccomp_actions_logged & SECCOMP_LOG_TRAP;
+		break;
+	case SECCOMP_RET_ERRNO:
+		log = requested && seccomp_actions_logged & SECCOMP_LOG_ERRNO;
+		break;
+	case SECCOMP_RET_TRACE:
+		log = requested && seccomp_actions_logged & SECCOMP_LOG_TRACE;
+		break;
+	case SECCOMP_RET_LOG:
+		log = seccomp_actions_logged & SECCOMP_LOG_LOG;
+		break;
+	case SECCOMP_RET_KILL_THREAD:
+		log = seccomp_actions_logged & SECCOMP_LOG_KILL_THREAD;
+		break;
+	case SECCOMP_RET_KILL_PROCESS:
+	default:
+		log = seccomp_actions_logged & SECCOMP_LOG_KILL_PROCESS;
+	}
+
+	/*
+	 * Force an audit message to be emitted when the action is RET_KILL_*,
+	 * RET_LOG, or the FILTER_FLAG_LOG bit was set and the action is
+	 * allowed to be logged by the admin.
+	 */
+	if (log)
+		return __audit_seccomp(syscall, signr, action);
+
+	/*
+	 * Let the audit subsystem decide if the action should be audited based
+	 * on whether the current task itself is being audited.
+	 */
+	return audit_seccomp(syscall, signr, action);
+}
+
 /*
  * Secure computing mode 1 allows only read/write/exit/sigreturn.
  * To be fully secure this must be combined with rlimit
···
 #ifdef SECCOMP_DEBUG
 	dump_stack();
 #endif
-	audit_seccomp(this_syscall, SIGKILL, SECCOMP_RET_KILL);
+	seccomp_log(this_syscall, SIGKILL, SECCOMP_RET_KILL_THREAD, true);
 	do_exit(SIGKILL);
 }

···
 			    const bool recheck_after_trace)
 {
 	u32 filter_ret, action;
+	struct seccomp_filter *match = NULL;
 	int data;

 	/*
···
 	 */
 	rmb();

-	filter_ret = seccomp_run_filters(sd);
+	filter_ret = seccomp_run_filters(sd, &match);
 	data = filter_ret & SECCOMP_RET_DATA;
-	action = filter_ret & SECCOMP_RET_ACTION;
+	action = filter_ret & SECCOMP_RET_ACTION_FULL;

 	switch (action) {
 	case SECCOMP_RET_ERRNO:
···

 		return 0;

-	case SECCOMP_RET_ALLOW:
+	case SECCOMP_RET_LOG:
+		seccomp_log(this_syscall, 0, action, true);
 		return 0;

-	case SECCOMP_RET_KILL:
+	case SECCOMP_RET_ALLOW:
+		/*
+		 * Note that the "match" filter will always be NULL for
+		 * this action since SECCOMP_RET_ALLOW is the starting
+		 * state in seccomp_run_filters().
+		 */
+		return 0;
+
+	case SECCOMP_RET_KILL_THREAD:
+	case SECCOMP_RET_KILL_PROCESS:
 	default:
-		audit_seccomp(this_syscall, SIGSYS, action);
+		seccomp_log(this_syscall, SIGSYS, action, true);
 		/* Dump core only if this is the last remaining thread. */
-		if (get_nr_threads(current) == 1) {
+		if (action == SECCOMP_RET_KILL_PROCESS ||
+		    get_nr_threads(current) == 1) {
 			siginfo_t info;

 			/* Show the original registers in the dump. */
···
 			seccomp_init_siginfo(&info, this_syscall, data);
 			do_coredump(&info);
 		}
-		do_exit(SIGSYS);
+		if (action == SECCOMP_RET_KILL_PROCESS)
+			do_group_exit(SIGSYS);
+		else
+			do_exit(SIGSYS);
 	}

 	unreachable();

 skip:
-	audit_seccomp(this_syscall, 0, action);
+	seccomp_log(this_syscall, 0, action, match ? match->log : false);
 	return -1;
 }
 #else
···
 }
 #endif

+static long seccomp_get_action_avail(const char __user *uaction)
+{
+	u32 action;
+
+	if (copy_from_user(&action, uaction, sizeof(action)))
+		return -EFAULT;
+
+	switch (action) {
+	case SECCOMP_RET_KILL_PROCESS:
+	case SECCOMP_RET_KILL_THREAD:
+	case SECCOMP_RET_TRAP:
+	case SECCOMP_RET_ERRNO:
+	case SECCOMP_RET_TRACE:
+	case SECCOMP_RET_LOG:
+	case SECCOMP_RET_ALLOW:
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
 /* Common entry point for both prctl and syscall. */
 static long do_seccomp(unsigned int op, unsigned int flags,
 		       const char __user *uargs)
···
 		return seccomp_set_mode_strict();
 	case SECCOMP_SET_MODE_FILTER:
 		return seccomp_set_mode_filter(flags, uargs);
+	case SECCOMP_GET_ACTION_AVAIL:
+		if (flags != 0)
+			return -EINVAL;
+
+		return seccomp_get_action_avail(uargs);
 	default:
 		return -EINVAL;
 	}
···
 	return ret;
 }
 #endif
+
+#ifdef CONFIG_SYSCTL
+
+/* Human readable action names for friendly sysctl interaction */
+#define SECCOMP_RET_KILL_PROCESS_NAME	"kill_process"
+#define SECCOMP_RET_KILL_THREAD_NAME	"kill_thread"
+#define SECCOMP_RET_TRAP_NAME		"trap"
+#define SECCOMP_RET_ERRNO_NAME		"errno"
+#define SECCOMP_RET_TRACE_NAME		"trace"
+#define SECCOMP_RET_LOG_NAME		"log"
+#define SECCOMP_RET_ALLOW_NAME		"allow"
+
+static const char seccomp_actions_avail[] =
+				SECCOMP_RET_KILL_PROCESS_NAME	" "
+				SECCOMP_RET_KILL_THREAD_NAME	" "
+				SECCOMP_RET_TRAP_NAME		" "
+				SECCOMP_RET_ERRNO_NAME		" "
+				SECCOMP_RET_TRACE_NAME		" "
+				SECCOMP_RET_LOG_NAME		" "
+				SECCOMP_RET_ALLOW_NAME;
+
+struct seccomp_log_name {
+	u32 log;
+	const char *name;
+};
+
+static const struct seccomp_log_name seccomp_log_names[] = {
+	{ SECCOMP_LOG_KILL_PROCESS, SECCOMP_RET_KILL_PROCESS_NAME },
+	{ SECCOMP_LOG_KILL_THREAD, SECCOMP_RET_KILL_THREAD_NAME },
+	{ SECCOMP_LOG_TRAP, SECCOMP_RET_TRAP_NAME },
+	{ SECCOMP_LOG_ERRNO, SECCOMP_RET_ERRNO_NAME },
+	{ SECCOMP_LOG_TRACE, SECCOMP_RET_TRACE_NAME },
+	{ SECCOMP_LOG_LOG, SECCOMP_RET_LOG_NAME },
+	{ SECCOMP_LOG_ALLOW, SECCOMP_RET_ALLOW_NAME },
+	{ }
+};
+
+static bool seccomp_names_from_actions_logged(char *names, size_t size,
+					      u32 actions_logged)
+{
+	const struct seccomp_log_name *cur;
+	bool append_space = false;
+
+	for (cur = seccomp_log_names; cur->name && size; cur++) {
+		ssize_t ret;
+
+		if (!(actions_logged & cur->log))
+			continue;
+
+		if (append_space) {
+			ret = strscpy(names, " ", size);
+			if (ret < 0)
+				return false;
+
+			names += ret;
+			size -= ret;
+		} else
+			append_space = true;
+
+		ret = strscpy(names, cur->name, size);
+		if (ret < 0)
+			return false;
+
+		names += ret;
+		size -= ret;
+	}
+
+	return true;
+}
+
+static bool seccomp_action_logged_from_name(u32 *action_logged,
+					    const char *name)
+{
+	const struct seccomp_log_name *cur;
+
+	for (cur = seccomp_log_names; cur->name; cur++) {
+		if (!strcmp(cur->name, name)) {
+			*action_logged = cur->log;
+			return true;
+		}
+	}
+
+	return false;
+}
+
+static bool seccomp_actions_logged_from_names(u32 *actions_logged, char *names)
+{
+	char *name;
+
+	*actions_logged = 0;
+	while ((name = strsep(&names, " ")) && *name) {
+		u32 action_logged = 0;
+
+		if (!seccomp_action_logged_from_name(&action_logged, name))
+			return false;
+
+		*actions_logged |= action_logged;
+	}
+
+	return true;
+}
+
+static int seccomp_actions_logged_handler(struct ctl_table *ro_table, int write,
+					  void __user *buffer, size_t *lenp,
+					  loff_t *ppos)
+{
+	char names[sizeof(seccomp_actions_avail)];
+	struct ctl_table table;
+	int ret;
+
+	if (write && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	memset(names, 0, sizeof(names));
+
+	if (!write) {
+		if (!seccomp_names_from_actions_logged(names, sizeof(names),
+						       seccomp_actions_logged))
+			return -EINVAL;
+	}
+
+	table = *ro_table;
+	table.data = names;
+	table.maxlen = sizeof(names);
+	ret = proc_dostring(&table, write, buffer, lenp, ppos);
+	if (ret)
+		return ret;
+
+	if (write) {
+		u32 actions_logged;
+
+		if (!seccomp_actions_logged_from_names(&actions_logged,
+						       table.data))
+			return -EINVAL;
+
+		if (actions_logged & SECCOMP_LOG_ALLOW)
+			return -EINVAL;
+
+		seccomp_actions_logged = actions_logged;
+	}
+
+	return 0;
+}
+
+static struct ctl_path seccomp_sysctl_path[] = {
+	{ .procname = "kernel", },
+	{ .procname = "seccomp", },
+	{ }
+};
+
+static struct ctl_table seccomp_sysctl_table[] = {
+	{
+		.procname = "actions_avail",
+		.data = (void *) &seccomp_actions_avail,
+		.maxlen = sizeof(seccomp_actions_avail),
+		.mode = 0444,
+		.proc_handler = proc_dostring,
+	},
+	{
+		.procname = "actions_logged",
+		.mode = 0644,
+		.proc_handler = seccomp_actions_logged_handler,
+	},
+	{ }
+};
+
+static int __init seccomp_sysctl_init(void)
+{
+	struct ctl_table_header *hdr;
+
+	hdr = register_sysctl_paths(seccomp_sysctl_path, seccomp_sysctl_table);
+	if (!hdr)
+		pr_warn("seccomp: sysctl registration failed\n");
+	else
+		kmemleak_not_leak(hdr);
+
+	return 0;
+}
+
+device_initcall(seccomp_sysctl_init)
+
+#endif /* CONFIG_SYSCTL */
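
From userspace, the new SECCOMP_GET_ACTION_AVAIL operation takes a pointer to a 32-bit action value and, going by seccomp_get_action_avail() in this diff, returns 0 for a known action and fails with EOPNOTSUPP otherwise; flags must be 0. A minimal sketch with a hypothetical wrapper name, `seccomp_action_avail`; on a kernel without the operation the call is expected to fail with EINVAL, which the wrapper reports as -1.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

#ifndef SECCOMP_GET_ACTION_AVAIL
#define SECCOMP_GET_ACTION_AVAIL 2
#endif

/*
 * Hypothetical wrapper: 1 if the running kernel knows the action,
 * 0 if it reports EOPNOTSUPP, -1 on any other failure (errno set),
 * including kernels that lack the operation entirely.
 */
int seccomp_action_avail(uint32_t action)
{
#ifdef __NR_seccomp
    if (syscall(__NR_seccomp, SECCOMP_GET_ACTION_AVAIL, 0, &action) == 0)
        return 1;
    if (errno == EOPNOTSUPP)
        return 0;
    return -1;
#else
    (void)action;
    errno = ENOSYS;
    return -1;
#endif
}
```

A program built against new headers could call this with SECCOMP_RET_KILL_PROCESS or SECCOMP_RET_LOG before building its filter, and fall back to SECCOMP_RET_KILL or SECCOMP_RET_ALLOW when the result is not 1.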
+13 -5
tools/testing/selftests/seccomp/Makefile
···
-TEST_GEN_PROGS := seccomp_bpf
-CFLAGS += -Wl,-no-as-needed -Wall
-LDFLAGS += -lpthread
+all:

 include ../lib.mk

-$(TEST_GEN_PROGS): seccomp_bpf.c ../kselftest_harness.h
-	$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
+.PHONY: all clean
+
+BINARIES := seccomp_bpf seccomp_benchmark
+CFLAGS += -Wl,-no-as-needed -Wall
+
+seccomp_bpf: seccomp_bpf.c ../kselftest_harness.h
+	$(CC) $(CFLAGS) $(LDFLAGS) -lpthread $< -o $@
+
+TEST_PROGS += $(BINARIES)
+EXTRA_CLEAN := $(BINARIES)
+
+all: $(BINARIES)
+99
tools/testing/selftests/seccomp/seccomp_benchmark.c
···
+/*
+ * Strictly speaking, this is not a test. But it can report during test
+ * runs so relative performace can be measured.
+ */
+#define _GNU_SOURCE
+#include <assert.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <time.h>
+#include <unistd.h>
+#include <linux/filter.h>
+#include <linux/seccomp.h>
+#include <sys/prctl.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+
+#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
+
+unsigned long long timing(clockid_t clk_id, unsigned long long samples)
+{
+	pid_t pid, ret;
+	unsigned long long i;
+	struct timespec start, finish;
+
+	pid = getpid();
+	assert(clock_gettime(clk_id, &start) == 0);
+	for (i = 0; i < samples; i++) {
+		ret = syscall(__NR_getpid);
+		assert(pid == ret);
+	}
+	assert(clock_gettime(clk_id, &finish) == 0);
+
+	i = finish.tv_sec - start.tv_sec;
+	i *= 1000000000;
+	i += finish.tv_nsec - start.tv_nsec;
+
+	printf("%lu.%09lu - %lu.%09lu = %llu\n",
+		finish.tv_sec, finish.tv_nsec,
+		start.tv_sec, start.tv_nsec,
+		i);
+
+	return i;
+}
+
+unsigned long long calibrate(void)
+{
+	unsigned long long i;
+
+	printf("Calibrating reasonable sample size...\n");
+
+	for (i = 5; ; i++) {
+		unsigned long long samples = 1 << i;
+
+		/* Find something that takes more than 5 seconds to run. */
+		if (timing(CLOCK_REALTIME, samples) / 1000000000ULL > 5)
+			return samples;
+	}
+}
+
+int main(int argc, char *argv[])
+{
+	struct sock_filter filter[] = {
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+	};
+	struct sock_fprog prog = {
+		.len = (unsigned short)ARRAY_SIZE(filter),
+		.filter = filter,
+	};
+	long ret;
+	unsigned long long samples;
+	unsigned long long native, filtered;
+
+	if (argc > 1)
+		samples = strtoull(argv[1], NULL, 0);
+	else
+		samples = calibrate();
+
+	printf("Benchmarking %llu samples...\n", samples);
+
+	native = timing(CLOCK_PROCESS_CPUTIME_ID, samples) / samples;
+	printf("getpid native: %llu ns\n", native);
+
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	assert(ret == 0);
+
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
+	assert(ret == 0);
+
+	filtered = timing(CLOCK_PROCESS_CPUTIME_ID, samples) / samples;
+	printf("getpid RET_ALLOW: %llu ns\n", filtered);
+
+	printf("Estimated seccomp overhead per syscall: %llu ns\n",
+		filtered - native);
+
+	if (filtered == native)
+		printf("Trying running again with more samples.\n");
+
+	return 0;
+}
+514 -96
tools/testing/selftests/seccomp/seccomp_bpf.c
···
 #define SECCOMP_MODE_FILTER 2
 #endif

-#ifndef SECCOMP_RET_KILL
-#define SECCOMP_RET_KILL 0x00000000U /* kill the task immediately */
-#define SECCOMP_RET_TRAP 0x00030000U /* disallow and force a SIGSYS */
-#define SECCOMP_RET_ERRNO 0x00050000U /* returns an errno */
-#define SECCOMP_RET_TRACE 0x7ff00000U /* pass to a tracer or disallow */
-#define SECCOMP_RET_ALLOW 0x7fff0000U /* allow */
-
-/* Masks for the return value sections. */
-#define SECCOMP_RET_ACTION 0x7fff0000U
-#define SECCOMP_RET_DATA 0x0000ffffU
-
+#ifndef SECCOMP_RET_ALLOW
 struct seccomp_data {
 	int nr;
 	__u32 arch;
 	__u64 instruction_pointer;
 	__u64 args[6];
 };
+#endif
+
+#ifndef SECCOMP_RET_KILL_PROCESS
+#define SECCOMP_RET_KILL_PROCESS 0x80000000U /* kill the process */
+#define SECCOMP_RET_KILL_THREAD	 0x00000000U /* kill the thread */
+#endif
+#ifndef SECCOMP_RET_KILL
+#define SECCOMP_RET_KILL	 SECCOMP_RET_KILL_THREAD
+#define SECCOMP_RET_TRAP	 0x00030000U /* disallow and force a SIGSYS */
+#define SECCOMP_RET_ERRNO	 0x00050000U /* returns an errno */
+#define SECCOMP_RET_TRACE	 0x7ff00000U /* pass to a tracer or disallow */
+#define SECCOMP_RET_ALLOW	 0x7fff0000U /* allow */
+#endif
+#ifndef SECCOMP_RET_LOG
+#define SECCOMP_RET_LOG		 0x7ffc0000U /* allow after logging */
+#endif
+
+#ifndef __NR_seccomp
+# if defined(__i386__)
+#  define __NR_seccomp 354
+# elif defined(__x86_64__)
+#  define __NR_seccomp 317
+# elif defined(__arm__)
+#  define __NR_seccomp 383
+# elif defined(__aarch64__)
+#  define __NR_seccomp 277
+# elif defined(__hppa__)
+#  define __NR_seccomp 338
+# elif defined(__powerpc__)
+#  define __NR_seccomp 358
+# elif defined(__s390__)
+#  define __NR_seccomp 348
+# else
+#  warning "seccomp syscall number unknown for this architecture"
+#  define __NR_seccomp 0xffff
+# endif
+#endif
+
+#ifndef SECCOMP_SET_MODE_STRICT
+#define SECCOMP_SET_MODE_STRICT 0
+#endif
+
+#ifndef SECCOMP_SET_MODE_FILTER
+#define SECCOMP_SET_MODE_FILTER 1
+#endif
+
+#ifndef SECCOMP_GET_ACTION_AVAIL
+#define SECCOMP_GET_ACTION_AVAIL 2
+#endif
+
+#ifndef SECCOMP_FILTER_FLAG_TSYNC
+#define SECCOMP_FILTER_FLAG_TSYNC 1
+#endif
+
+#ifndef SECCOMP_FILTER_FLAG_LOG
+#define SECCOMP_FILTER_FLAG_LOG 2
+#endif
+
+#ifndef seccomp
+int seccomp(unsigned int op, unsigned int flags, void *args)
+{
+	errno = 0;
+	return syscall(__NR_seccomp, op, flags, args);
+}
 #endif

 #if __BYTE_ORDER == __LITTLE_ENDIAN
···
 	}
 }

-/* Tests kernel support by checking for a copy_from_user() fault on * NULL. */
+/* Tests kernel support by checking for a copy_from_user() fault on NULL. */
 TEST(mode_filter_support)
 {
 	long ret;
···
 	EXPECT_EQ(EINVAL, errno);
 }

+TEST(log_all)
+{
+	struct sock_filter filter[] = {
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_LOG),
+	};
+	struct sock_fprog prog = {
+		.len = (unsigned short)ARRAY_SIZE(filter),
+		.filter = filter,
+	};
+	long ret;
+	pid_t parent = getppid();
+
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret);
+
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
+	ASSERT_EQ(0, ret);
+
+	/* getppid() should succeed and be logged (no check for logging) */
+	EXPECT_EQ(parent, syscall(__NR_getppid));
+}
+
 TEST_SIGNAL(unknown_ret_is_kill_inside, SIGSYS)
 {
 	struct sock_filter filter[] = {
···
 	close(fd);
 }

+/* This is a thread task to die via seccomp filter violation. */
+void *kill_thread(void *data)
+{
+	bool die = (bool)data;
+
+	if (die) {
+		prctl(PR_GET_SECCOMP, 0, 0, 0, 0);
+		return (void *)SIBLING_EXIT_FAILURE;
+	}
+
+	return (void *)SIBLING_EXIT_UNKILLED;
+}
+
+/* Prepare a thread that will kill itself or both of us. */
+void kill_thread_or_group(struct __test_metadata *_metadata, bool kill_process)
+{
+	pthread_t thread;
+	void *status;
+	/* Kill only when calling __NR_prctl. */
+	struct sock_filter filter_thread[] = {
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
+			offsetof(struct seccomp_data, nr)),
+		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_prctl, 0, 1),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_KILL_THREAD),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+	};
+	struct sock_fprog prog_thread = {
+		.len = (unsigned short)ARRAY_SIZE(filter_thread),
+		.filter = filter_thread,
+	};
+	struct sock_filter filter_process[] = {
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
+			offsetof(struct seccomp_data, nr)),
+		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_prctl, 0, 1),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_KILL_PROCESS),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+	};
+	struct sock_fprog prog_process = {
+		.len = (unsigned short)ARRAY_SIZE(filter_process),
+		.filter = filter_process,
+	};
+
+	ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+		TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+	}
+
+	ASSERT_EQ(0, seccomp(SECCOMP_SET_MODE_FILTER, 0,
+			     kill_process ? &prog_process : &prog_thread));
+
+	/*
+	 * Add the KILL_THREAD rule again to make sure that the KILL_PROCESS
+	 * flag cannot be downgraded by a new filter.
+	 */
+	ASSERT_EQ(0, seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog_thread));
+
+	/* Start a thread that will exit immediately. */
+	ASSERT_EQ(0, pthread_create(&thread, NULL, kill_thread, (void *)false));
+	ASSERT_EQ(0, pthread_join(thread, &status));
+	ASSERT_EQ(SIBLING_EXIT_UNKILLED, (unsigned long)status);
+
+	/* Start a thread that will die immediately. */
+	ASSERT_EQ(0, pthread_create(&thread, NULL, kill_thread, (void *)true));
+	ASSERT_EQ(0, pthread_join(thread, &status));
+	ASSERT_NE(SIBLING_EXIT_FAILURE, (unsigned long)status);
+
+	/*
+	 * If we get here, only the spawned thread died. Let the parent know
+	 * the whole process didn't die (i.e. this thread, the spawner,
+	 * stayed running).
+	 */
+	exit(42);
+}
+
+TEST(KILL_thread)
+{
+	int status;
+	pid_t child_pid;
+
+	child_pid = fork();
+	ASSERT_LE(0, child_pid);
+	if (child_pid == 0) {
+		kill_thread_or_group(_metadata, false);
+		_exit(38);
+	}
+
+	ASSERT_EQ(child_pid, waitpid(child_pid, &status, 0));
+
+	/* If only the thread was killed, we'll see exit 42. */
+	ASSERT_TRUE(WIFEXITED(status));
+	ASSERT_EQ(42, WEXITSTATUS(status));
+}
+
+TEST(KILL_process)
+{
+	int status;
+	pid_t child_pid;
+
+	child_pid = fork();
+	ASSERT_LE(0, child_pid);
+	if (child_pid == 0) {
+		kill_thread_or_group(_metadata, true);
+		_exit(38);
+	}
+
+	ASSERT_EQ(child_pid, waitpid(child_pid, &status, 0));
+
+	/* If the entire process was killed, we'll see SIGSYS. */
+	ASSERT_TRUE(WIFSIGNALED(status));
+	ASSERT_EQ(SIGSYS, WTERMSIG(status));
+}
+
 /* TODO(wad) add 64-bit versus 32-bit arg tests. */
 TEST(arg_out_of_range)
 {
···
 	EXPECT_EQ(EINVAL, errno);
 }

+#define ERRNO_FILTER(name, errno)					\
+	struct sock_filter _read_filter_##name[] = {			\
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,				\
+			offsetof(struct seccomp_data, nr)),		\
+		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1),	\
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | errno),	\
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),		\
+	};								\
+	struct sock_fprog prog_##name = {				\
+		.len = (unsigned short)ARRAY_SIZE(_read_filter_##name),	\
+		.filter = _read_filter_##name,				\
+	}
+
+/* Make sure basic errno values are correctly passed through a filter. */
 TEST(ERRNO_valid)
 {
-	struct sock_filter filter[] = {
-		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
-			offsetof(struct seccomp_data, nr)),
-		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1),
-		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | E2BIG),
-		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
-	};
-	struct sock_fprog prog = {
-		.len = (unsigned short)ARRAY_SIZE(filter),
-		.filter = filter,
-	};
+	ERRNO_FILTER(valid, E2BIG);
 	long ret;
 	pid_t parent = getppid();

 	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
 	ASSERT_EQ(0, ret);

-	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog_valid);
 	ASSERT_EQ(0, ret);

 	EXPECT_EQ(parent, syscall(__NR_getppid));
···
 	EXPECT_EQ(E2BIG, errno);
 }

+/* Make sure an errno of zero is correctly handled by the arch code. */
 TEST(ERRNO_zero)
 {
-	struct sock_filter filter[] = {
-		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
-			offsetof(struct seccomp_data, nr)),
-		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1),
-		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | 0),
-		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
-	};
-	struct sock_fprog prog = {
-		.len = (unsigned short)ARRAY_SIZE(filter),
-		.filter = filter,
-	};
+	ERRNO_FILTER(zero, 0);
 	long ret;
 	pid_t parent = getppid();

 	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
 	ASSERT_EQ(0, ret);

-	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog_zero);
 	ASSERT_EQ(0, ret);

 	EXPECT_EQ(parent, syscall(__NR_getppid));
···
 	EXPECT_EQ(0, read(0, NULL, 0));
 }

+/*
+ * The SECCOMP_RET_DATA mask is 16 bits wide, but errno is smaller.
+ * This tests that the errno value gets capped correctly, fixed by
+ * 580c57f10768 ("seccomp: cap SECCOMP_RET_ERRNO data to MAX_ERRNO").
+ */
 TEST(ERRNO_capped)
 {
-	struct sock_filter filter[] = {
-		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
-			offsetof(struct seccomp_data, nr)),
-		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_read, 0, 1),
-		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ERRNO | 4096),
-		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
-	};
-	struct sock_fprog prog = {
-		.len = (unsigned short)ARRAY_SIZE(filter),
-		.filter = filter,
-	};
+	ERRNO_FILTER(capped, 4096);
 	long ret;
 	pid_t parent = getppid();

 	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
 	ASSERT_EQ(0, ret);

-	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog_capped);
 	ASSERT_EQ(0, ret);

 	EXPECT_EQ(parent, syscall(__NR_getppid));
 	EXPECT_EQ(-1, read(0, NULL, 0));
 	EXPECT_EQ(4095, errno);
+}
+
+/*
+ * Filters are processed in reverse order: last applied is executed first.
+ * Since only the SECCOMP_RET_ACTION mask is tested for return values, the
+ * SECCOMP_RET_DATA mask results will follow the most recently applied
+ * matching filter return (and not the lowest or highest value).
+ */
+TEST(ERRNO_order)
+{
+	ERRNO_FILTER(first, 11);
+	ERRNO_FILTER(second, 13);
+	ERRNO_FILTER(third, 12);
+	long ret;
+	pid_t parent = getppid();
+
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret);
+
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog_first);
+	ASSERT_EQ(0, ret);
+
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog_second);
+	ASSERT_EQ(0, ret);
+
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog_third);
+	ASSERT_EQ(0, ret);
+
+	EXPECT_EQ(parent, syscall(__NR_getppid));
+	EXPECT_EQ(-1, read(0, NULL, 0));
+	EXPECT_EQ(12, errno);
 }

 FIXTURE_DATA(TRAP) {
···

 FIXTURE_DATA(precedence) {
 	struct sock_fprog allow;
+	struct sock_fprog log;
 	struct sock_fprog trace;
 	struct sock_fprog error;
 	struct sock_fprog trap;
···
 {
 	struct sock_filter allow_insns[] = {
 		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+	};
+	struct sock_filter log_insns[] = {
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
+			offsetof(struct seccomp_data, nr)),
+		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_getpid, 1, 0),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_LOG),
 	};
 	struct sock_filter trace_insns[] = {
 		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
···
 	memcpy(self->_x.filter, &_x##_insns, sizeof(_x##_insns)); \
 	self->_x.len = (unsigned short)ARRAY_SIZE(_x##_insns)
 	FILTER_ALLOC(allow);
+	FILTER_ALLOC(log);
 	FILTER_ALLOC(trace);
 	FILTER_ALLOC(error);
 	FILTER_ALLOC(trap);
···
 {
 #define FILTER_FREE(_x) if (self->_x.filter) free(self->_x.filter)
 	FILTER_FREE(allow);
+	FILTER_FREE(log);
 	FILTER_FREE(trace);
 	FILTER_FREE(error);
 	FILTER_FREE(trap);
···
 	ASSERT_EQ(0, ret);

 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->allow);
+	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->log);
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trace);
 	ASSERT_EQ(0, ret);
···
 	ASSERT_EQ(0, ret);

 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->allow);
+	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->log);
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trace);
 	ASSERT_EQ(0, ret);
···
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->error);
 	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->log);
+	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trace);
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trap);
···
 	ASSERT_EQ(0, ret);

 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->allow);
+	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->log);
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trace);
 	ASSERT_EQ(0, ret);
···
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trap);
 	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->log);
+	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trace);
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->error);
···

 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->allow);
 	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->log);
+	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trace);
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->error);
···
 	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
 	ASSERT_EQ(0, ret);

+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->log);
+	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->error);
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trace);
···

 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->allow);
 	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->log);
+	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trace);
 	ASSERT_EQ(0, ret);
 	/* Should work just fine. */
···
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->allow);
 	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->log);
+	ASSERT_EQ(0, ret);
 	/* Should work just fine. */
 	EXPECT_EQ(parent, syscall(__NR_getppid));
 	/* No ptracer */
 	EXPECT_EQ(-1, syscall(__NR_getpid));
+}
+
+TEST_F(precedence, log_is_fifth)
+{
+	pid_t mypid, parent;
+	long ret;
+
+	mypid = getpid();
+	parent = getppid();
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret);
+
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->allow);
+	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->log);
+	ASSERT_EQ(0, ret);
+	/* Should work just fine. */
+	EXPECT_EQ(parent, syscall(__NR_getppid));
+	/* Should also work just fine */
+	EXPECT_EQ(mypid, syscall(__NR_getpid));
+}
+
+TEST_F(precedence, log_is_fifth_in_any_order)
+{
+	pid_t mypid, parent;
+	long ret;
+
+	mypid = getpid();
+	parent = getppid();
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret);
+
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->log);
+	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->allow);
+	ASSERT_EQ(0, ret);
+	/* Should work just fine. */
+	EXPECT_EQ(parent, syscall(__NR_getppid));
+	/* Should also work just fine */
+	EXPECT_EQ(mypid, syscall(__NR_getpid));
 }

 #ifndef PTRACE_O_TRACESECCOMP
···
 # error "Do not know how to find your architecture's registers and syscalls"
 #endif

+/* When the syscall return can't be changed, stub out the tests for it. */
+#ifdef SYSCALL_NUM_RET_SHARE_REG
+# define EXPECT_SYSCALL_RETURN(val, action) EXPECT_EQ(-1, action)
+#else
+# define EXPECT_SYSCALL_RETURN(val, action) EXPECT_EQ(val, action)
+#endif
+
 /* Use PTRACE_GETREGS and PTRACE_SETREGS when available. This is useful for
  * architectures without HAVE_ARCH_TRACEHOOK (e.g. User-mode Linux).
  */
···
 #ifdef SYSCALL_NUM_RET_SHARE_REG
 		TH_LOG("Can't modify syscall return on this architecture");
 #else
-		regs.SYSCALL_RET = 1;
+		regs.SYSCALL_RET = EPERM;
 #endif

 #ifdef HAVE_GETREGS
···

 	if (nr == __NR_getpid)
 		change_syscall(_metadata, tracee, __NR_getppid);
+	if (nr == __NR_open)
+		change_syscall(_metadata, tracee, -1);
 }

 FIXTURE_DATA(TRACE_syscall) {
···
 		free(self->prog.filter);
 }

+TEST_F(TRACE_syscall, ptrace_syscall_redirected)
+{
+	/* Swap SECCOMP_RET_TRACE tracer for PTRACE_SYSCALL tracer. */
+	teardown_trace_fixture(_metadata, self->tracer);
+	self->tracer = setup_trace_fixture(_metadata, tracer_ptrace, NULL,
+					   true);
+
+	/* Tracer will redirect getpid to getppid. */
+	EXPECT_NE(self->mypid, syscall(__NR_getpid));
+}
+
+TEST_F(TRACE_syscall, ptrace_syscall_dropped)
+{
+	/* Swap SECCOMP_RET_TRACE tracer for PTRACE_SYSCALL tracer. */
+	teardown_trace_fixture(_metadata, self->tracer);
+	self->tracer = setup_trace_fixture(_metadata, tracer_ptrace, NULL,
+					   true);
+
+	/* Tracer should skip the open syscall, resulting in EPERM. */
+	EXPECT_SYSCALL_RETURN(EPERM, syscall(__NR_open));
+}
+
 TEST_F(TRACE_syscall, syscall_allowed)
 {
 	long ret;
···
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->prog, 0, 0);
 	ASSERT_EQ(0, ret);

-#ifdef SYSCALL_NUM_RET_SHARE_REG
-	/* gettid has been skipped */
-	EXPECT_EQ(-1, syscall(__NR_gettid));
-#else
 	/* gettid has been skipped and an altered return value stored. */
-	EXPECT_EQ(1, syscall(__NR_gettid));
-#endif
+	EXPECT_SYSCALL_RETURN(EPERM, syscall(__NR_gettid));
 	EXPECT_NE(self->mytid, syscall(__NR_gettid));
 }

···
 	ASSERT_EQ(0, ret);

 	/* Tracer will redirect getpid to getppid, and we should see EPERM. */
+	errno = 0;
 	EXPECT_EQ(-1, syscall(__NR_getpid));
 	EXPECT_EQ(EPERM, errno);
 }
···
 	EXPECT_NE(self->mypid, syscall(__NR_getpid));
 }

-#ifndef __NR_seccomp
-# if defined(__i386__)
-#  define __NR_seccomp 354
-# elif defined(__x86_64__)
-#  define __NR_seccomp 317
-# elif defined(__arm__)
-#  define __NR_seccomp 383
-# elif defined(__aarch64__)
-#  define __NR_seccomp 277
-# elif defined(__hppa__)
-#  define __NR_seccomp 338
-# elif defined(__powerpc__)
-#  define __NR_seccomp 358
-# elif defined(__s390__)
-#  define __NR_seccomp 348
-# else
-#  warning "seccomp syscall number unknown for this architecture"
-#  define __NR_seccomp 0xffff
-# endif
-#endif
-
-#ifndef SECCOMP_SET_MODE_STRICT
-#define SECCOMP_SET_MODE_STRICT 0
-#endif
-
-#ifndef SECCOMP_SET_MODE_FILTER
-#define SECCOMP_SET_MODE_FILTER 1
-#endif
-
-#ifndef SECCOMP_FILTER_FLAG_TSYNC
-#define SECCOMP_FILTER_FLAG_TSYNC 1
-#endif
-
-#ifndef seccomp
-int seccomp(unsigned int op, unsigned int flags, void *args)
-{
-	errno = 0;
-	return syscall(__NR_seccomp, op, flags, args);
-}
-#endif
-
 TEST(seccomp_syscall)
 {
 	struct sock_filter filter[] = {
···
 	ret = seccomp(SECCOMP_SET_MODE_STRICT, 0, NULL);
 	EXPECT_EQ(EINVAL, errno) {
 		TH_LOG("Switched to mode strict!");
+	}
+}
+
+/*
+ * Test detection of known and unknown filter flags. Userspace needs to be able
+ * to check if a filter flag is supported by the current kernel and a good way
+ * of doing that is by attempting to enter filter mode, with the flag bit in
+ * question set, and a NULL pointer for the _args_ parameter. EFAULT indicates
+ * that the flag is valid and EINVAL indicates that the flag is invalid.
+ */
+TEST(detect_seccomp_filter_flags)
+{
+	unsigned int flags[] = { SECCOMP_FILTER_FLAG_TSYNC,
+				 SECCOMP_FILTER_FLAG_LOG };
+	unsigned int flag, all_flags;
+	int i;
+	long ret;
+
+	/* Test detection of known-good filter flags */
+	for (i = 0, all_flags = 0; i < ARRAY_SIZE(flags); i++) {
+		flag = flags[i];
+		ret = seccomp(SECCOMP_SET_MODE_FILTER, flag, NULL);
+		ASSERT_NE(ENOSYS, errno) {
+			TH_LOG("Kernel does not support seccomp syscall!");
+		}
+		EXPECT_EQ(-1, ret);
+		EXPECT_EQ(EFAULT, errno) {
+			TH_LOG("Failed to detect that a known-good filter flag (0x%X) is supported!",
+			       flag);
+		}
+
+		all_flags |= flag;
+	}
+
+	/* Test detection of all known-good filter flags */
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, all_flags, NULL);
+	EXPECT_EQ(-1, ret);
+	EXPECT_EQ(EFAULT, errno) {
+		TH_LOG("Failed to detect that all known-good filter flags (0x%X) are supported!",
+		       all_flags);
+	}
+
+	/* Test detection of an unknown filter flag */
+	flag = -1;
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, flag, NULL);
+	EXPECT_EQ(-1, ret);
+	EXPECT_EQ(EINVAL, errno) {
+		TH_LOG("Failed to detect that an unknown filter flag (0x%X) is unsupported!",
+		       flag);
+	}
+
+	/*
+	 * Test detection of an unknown filter flag that may simply need to be
+	 * added to this test
+	 */
+	flag = flags[ARRAY_SIZE(flags) - 1] << 1;
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, flag, NULL);
+	EXPECT_EQ(-1, ret);
+	EXPECT_EQ(EINVAL, errno) {
+		TH_LOG("Failed to detect that an unknown filter flag (0x%X) is unsupported! Does a new flag need to be added to this test?",
+		       flag);
 	}
 }

···
 		_metadata->passed = 0;
 }

+TEST_SIGNAL(filter_flag_log, SIGSYS)
+{
+	struct sock_filter allow_filter[] = {
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+	};
+	struct sock_filter kill_filter[] = {
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
+			offsetof(struct seccomp_data, nr)),
+		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_getpid, 0, 1),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_KILL),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+	};
+	struct sock_fprog allow_prog = {
+		.len = (unsigned short)ARRAY_SIZE(allow_filter),
+		.filter = allow_filter,
+	};
+	struct sock_fprog kill_prog = {
+		.len = (unsigned short)ARRAY_SIZE(kill_filter),
+		.filter = kill_filter,
+	};
+	long ret;
+	pid_t parent = getppid();
+
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret);
+
+	/* Verify that the FILTER_FLAG_LOG flag isn't accepted in strict mode */
+	ret = seccomp(SECCOMP_SET_MODE_STRICT, SECCOMP_FILTER_FLAG_LOG,
+		      &allow_prog);
+	ASSERT_NE(ENOSYS, errno) {
+		TH_LOG("Kernel does not support seccomp syscall!");
+	}
+	EXPECT_NE(0, ret) {
+		TH_LOG("Kernel accepted FILTER_FLAG_LOG flag in strict mode!");
+	}
+	EXPECT_EQ(EINVAL, errno) {
+		TH_LOG("Kernel returned unexpected errno for FILTER_FLAG_LOG flag in strict mode!");
+	}
+
+	/* Verify that a simple, permissive filter can be added with no flags */
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, 0, &allow_prog);
+	EXPECT_EQ(0, ret);
+
+	/* See if the same filter can be added with the FILTER_FLAG_LOG flag */
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_LOG,
+		      &allow_prog);
+	ASSERT_NE(EINVAL, errno) {
+		TH_LOG("Kernel does not support the FILTER_FLAG_LOG flag!");
+	}
+	EXPECT_EQ(0, ret);
+
+	/* Ensure that the kill filter works with the FILTER_FLAG_LOG flag */
+	ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_LOG,
+		      &kill_prog);
+	EXPECT_EQ(0, ret);
+
+	EXPECT_EQ(parent, syscall(__NR_getppid));
+	/* getpid() should never return. */
+	EXPECT_EQ(0, syscall(__NR_getpid));
+}
+
+TEST(get_action_avail)
+{
+	__u32 actions[] = { SECCOMP_RET_KILL_THREAD, SECCOMP_RET_TRAP,
+			    SECCOMP_RET_ERRNO, SECCOMP_RET_TRACE,
+			    SECCOMP_RET_LOG, SECCOMP_RET_ALLOW };
+	__u32 unknown_action = 0x10000000U;
+	int i;
+	long ret;
+
+	ret = seccomp(SECCOMP_GET_ACTION_AVAIL, 0, &actions[0]);
+	ASSERT_NE(ENOSYS, errno) {
+		TH_LOG("Kernel does not support seccomp syscall!");
+	}
+	ASSERT_NE(EINVAL, errno) {
+		TH_LOG("Kernel does not support SECCOMP_GET_ACTION_AVAIL operation!");
+	}
+	EXPECT_EQ(ret, 0);
+
+	for (i = 0; i < ARRAY_SIZE(actions); i++) {
+		ret = seccomp(SECCOMP_GET_ACTION_AVAIL, 0, &actions[i]);
+		EXPECT_EQ(ret, 0) {
+			TH_LOG("Expected action (0x%X) not available!",
+			       actions[i]);
+		}
+	}
+
+	/* Check that an unknown action is handled properly (EOPNOTSUPP) */
+	ret = seccomp(SECCOMP_GET_ACTION_AVAIL, 0, &unknown_action);
+	EXPECT_EQ(ret, -1);
+	EXPECT_EQ(errno, EOPNOTSUPP);
+}
+
 /*
  * TODO:
  * - add microbenchmarks
···
  * - endianness checking when appropriate
  * - 64-bit arg prodding
  * - arch value testing (x86 modes especially)
+ * - verify that FILTER_FLAG_LOG filters generate log messages
+ * - verify that RET_LOG generates log messages
  * - ...
  */
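Several of the tests above (ERRNO_capped, ERRNO_order, the precedence fixture, and the KILL_PROCESS "cannot be downgraded" check) depend on how the return values of stacked filters are combined. The following is a minimal userspace sketch of that rule, not the kernel's implementation. It assumes the v4.14 behavior described in the test comments: filters run newest first, the strictest action wins, ties keep the most recently installed filter's data, and errno data is capped at 4095. The MODEL_ constants copy the uapi values so the sketch builds without kernel headers; all MODEL_ and model_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; values copied from the v4.14 uapi seccomp header. */
#define MODEL_RET_KILL_PROCESS	0x80000000U
#define MODEL_RET_KILL_THREAD	0x00000000U
#define MODEL_RET_TRAP		0x00030000U
#define MODEL_RET_ERRNO		0x00050000U
#define MODEL_RET_TRACE		0x7ff00000U
#define MODEL_RET_LOG		0x7ffc0000U
#define MODEL_RET_ALLOW		0x7fff0000U
#define MODEL_RET_ACTION_FULL	0xffff0000U
#define MODEL_RET_DATA		0x0000ffffU
#define MODEL_MAX_ERRNO		4095U

/*
 * Smaller rank means stricter action. The action bits are ordered as
 * signed 32-bit values (KILL_PROCESS is the most negative); flipping the
 * top bit yields the same ordering using only unsigned arithmetic.
 */
uint32_t model_rank(uint32_t ret)
{
	return (ret & MODEL_RET_ACTION_FULL) ^ 0x80000000U;
}

/* rets[] holds one return value per filter, oldest filter first. */
uint32_t model_run_filters(const uint32_t *rets, size_t n)
{
	uint32_t ret = MODEL_RET_ALLOW;
	size_t i;

	assert(rets != NULL || n == 0);
	/* Walk from the newest filter to the oldest. */
	for (i = n; i-- > 0; ) {
		/* Strictly-less comparison: on a tie the newer filter stays. */
		if (model_rank(rets[i]) < model_rank(ret))
			ret = rets[i];
	}
	return ret;
}

/* errno that userspace would observe for a SECCOMP_RET_ERRNO result. */
uint32_t model_errno(uint32_t ret)
{
	uint32_t data = ret & MODEL_RET_DATA;

	return data > MODEL_MAX_ERRNO ? MODEL_MAX_ERRNO : data;
}
```

Feeding the model the ERRNO_order stack (errno data 11, 13, 12 in installation order) selects the data from the last-installed filter, 12, which is what the test expects. A KILL_PROCESS filter followed by a KILL_THREAD filter still yields KILL_PROCESS, and LOG beats ALLOW regardless of installation order.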