Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

powerpc/64: remove system call instruction emulation

emulate_step() instruction emulation, including sc instruction
emulation, initially appeared in xmon. It was then moved into sstep.c
where kprobes could use it too, and later hw_breakpoint and uprobes
started to use it.

Until uprobes, the only instruction emulation users were for kernel
mode instructions.

- xmon only steps / breaks on kernel addresses.
- kprobes is kernel only.
- hw_breakpoint only emulates kernel instructions, single steps user.

At one point, there was support for the kernel to execute sc
instructions, although that support was removed long ago and it's not
clear whether there were any in-tree users. So system call emulation is
not required by the above users.

uprobes uses emulate_step() and it appears possible to emulate an sc
instruction in userspace. Userspace system call emulation is broken and
it's not clear it ever worked well.

The big complication is that userspace takes an interrupt to the kernel
to emulate the instruction. The user->kernel interrupt sets up registers
and interrupt stack frame expecting to return to userspace, then system
call instruction emulation re-directs that stack frame to the kernel,
early in the system call interrupt handler. This means the interrupt
return code takes the kernel->kernel restore path, which does not
restore everything as the system call interrupt handler would expect
coming from userspace. regs->iamr appears to get lost, for example,
because the kernel->kernel return does not restore the user iamr.
Accounting such as irqflags tracing and CPU accounting does not get
flipped back to user mode as the system call handler expects, so those
appear to enter the kernel twice without returning to userspace.
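
The redirect described above can be modelled in user space. The following
is a minimal sketch, not kernel code: struct model_regs, the MODEL_*
constants and both functions are hypothetical stand-ins, and only the
register assignments and the MSR_MASK value are taken from the removed
sstep.c lines. It shows that after the old emulation the saved MSR no
longer has the problem-state (user) bit set, which is why the interrupt
return takes the kernel->kernel path.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSR[PR] (problem state, i.e. user mode) is bit 14 on 64-bit powerpc. */
#define MODEL_MSR_PR     (1ULL << 14)
/* Placeholder: the real MSR_KERNEL differs, but it also has PR clear. */
#define MODEL_MSR_KERNEL 0x0ULL
/* Value of MSR_MASK as shown in sstep.c. */
#define MODEL_MSR_MASK   0xffffffff87c0ffffULL

/* Hypothetical, heavily reduced stand-in for struct pt_regs. */
struct model_regs {
	uint64_t gpr[32];
	uint64_t nip;
	uint64_t msr;
};

/*
 * Mirrors the register setup of the removed 'case SYSCALL' code:
 * the frame that was saved on a user->kernel interrupt is rewritten
 * so that "returning" from it lands in the system call entry code.
 * 'paca' and 'entry' are illustrative parameters, not kernel names.
 */
static void model_old_sc_emulation(struct model_regs *regs,
				   uint64_t paca, uint64_t entry)
{
	regs->gpr[9]  = regs->gpr[13];
	regs->gpr[10] = MODEL_MSR_KERNEL;
	regs->gpr[11] = regs->nip + 4;
	regs->gpr[12] = regs->msr & MODEL_MSR_MASK;
	regs->gpr[13] = paca;
	regs->nip = entry;
	regs->msr = MODEL_MSR_KERNEL;
}

/* The interrupt return path is chosen from MSR[PR] in the saved frame. */
static bool model_returns_to_user(const struct model_regs *regs)
{
	return (regs->msr & MODEL_MSR_PR) != 0;
}
```

Calling model_returns_to_user() on a frame with MODEL_MSR_PR set gives
true before model_old_sc_emulation() and false afterwards, while gpr[12]
still carries the user MSR for the system call handler.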

These things may be individually fixable with various complications, but
that is a lot of complexity for unclear real benefit.

Furthermore, it is not possible to single step a system call instruction
since it causes an interrupt. As such, a separate patch disables probing
on system call instructions.

This patch removes system call emulation and disables stepping system
calls.
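
The resulting behaviour can be sketched as a small user-space model. This
is an illustration under stated assumptions, not the kernel
implementation: the enum and function names are hypothetical, the sc mask
comes from the sstep.c hunk, and the scv mask is assumed (those lines are
elided in the diff shown here). The kernel's additional CPU feature and
config checks for scv are left out.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's op types; values are arbitrary. */
enum model_op {
	MODEL_OTHER,			/* primary opcode is not 17 */
	MODEL_UNKNOWN,			/* opcode 17, but not sc or scv 0 */
	MODEL_SYSCALL,			/* sc */
	MODEL_SYSCALL_VECTORED_0,	/* scv 0 */
};

/* Classify an instruction word the way 'case 17' in analyse_instr() does. */
static enum model_op model_classify(uint32_t word)
{
	if ((word >> 26) != 17)		/* primary opcode is the top 6 bits */
		return MODEL_OTHER;
	if ((word & 0xfe2) == 2)	/* mask taken from the diff */
		return MODEL_SYSCALL;
	if ((word & 0xfe3) == 1)	/* assumed: this line is elided in the diff */
		return MODEL_SYSCALL_VECTORED_0;
	return MODEL_UNKNOWN;
}

/*
 * Post-patch result for system calls: -1 means "must not be stepped".
 * Returning 0 ("not emulated") for everything else is a simplification
 * made for this sketch only.
 */
static int model_emulate_step(uint32_t word)
{
	switch (model_classify(word)) {
	case MODEL_SYSCALL:
	case MODEL_SYSCALL_VECTORED_0:
		return -1;
	default:
		return 0;
	}
}
```

For example, 0x44000002 (sc) and 0x44000001 (scv 0) both yield -1 from
model_emulate_step(), whereas before this patch the corresponding kernel
cases returned 1 after redirecting the stack frame.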

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[minor commit log edit, and also get rid of '#ifdef CONFIG_PPC64']
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a412e3b3791ed83de18704c8d90f492e7a0049c0.1648648712.git.naveen.n.rao@linux.vnet.ibm.com

Authored by Nicholas Piggin and committed by Michael Ellerman
a553476c 54cdacd7

+10 -46
-10
arch/powerpc/kernel/interrupt_64.S
···
  */
 system_call_vectored sigill 0x7ff0
 
-
-/*
- * Entered via kernel return set up by kernel/sstep.c, must match entry regs
- */
-	.globl system_call_vectored_emulate
-system_call_vectored_emulate:
-_ASM_NOKPROBE_SYMBOL(system_call_vectored_emulate)
-	li	r10,IRQS_ALL_DISABLED
-	stb	r10,PACAIRQSOFTMASK(r13)
-	b	system_call_vectored_common
 #endif /* CONFIG_PPC_BOOK3S */
 
 	.balign IFETCH_ALIGN_BYTES
+10 -36
arch/powerpc/lib/sstep.c
···
 #include <asm/cputable.h>
 #include <asm/disassemble.h>
 
-extern char system_call_common[];
-extern char system_call_vectored_emulate[];
-
 #ifdef CONFIG_PPC64
 /* Bits in SRR1 that are copied from MSR */
 #define MSR_MASK	0xffffffff87c0ffffUL
···
 		if (branch_taken(word, regs, op))
 			op->type |= BRTAKEN;
 		return 1;
-#ifdef CONFIG_PPC64
 	case 17:	/* sc */
 		if ((word & 0xfe2) == 2)
 			op->type = SYSCALL;
···
 		} else
 			op->type = UNKNOWN;
 		return 0;
-#endif
 	case 18:	/* b */
 		op->type = BRANCH | BRTAKEN;
 		imm = word & 0x03fffffc;
···
 		regs_set_return_msr(regs, (regs->msr & ~op.val) | (val & op.val));
 		goto instr_done;
 
-#ifdef CONFIG_PPC64
 	case SYSCALL:	/* sc */
 		/*
-		 * N.B. this uses knowledge about how the syscall
-		 * entry code works. If that is changed, this will
-		 * need to be changed also.
+		 * Per ISA v3.1, section 7.5.15 'Trace Interrupt', we can't
+		 * single step a system call instruction:
+		 *
+		 * Successful completion for an instruction means that the
+		 * instruction caused no other interrupt. Thus a Trace
+		 * interrupt never occurs for a System Call or System Call
+		 * Vectored instruction, or for a Trap instruction that
+		 * traps.
 		 */
-		if (IS_ENABLED(CONFIG_PPC_FAST_ENDIAN_SWITCH) &&
-				cpu_has_feature(CPU_FTR_REAL_LE) &&
-				regs->gpr[0] == 0x1ebe) {
-			regs_set_return_msr(regs, regs->msr ^ MSR_LE);
-			goto instr_done;
-		}
-		regs->gpr[9] = regs->gpr[13];
-		regs->gpr[10] = MSR_KERNEL;
-		regs->gpr[11] = regs->nip + 4;
-		regs->gpr[12] = regs->msr & MSR_MASK;
-		regs->gpr[13] = (unsigned long) get_paca();
-		regs_set_return_ip(regs, (unsigned long) &system_call_common);
-		regs_set_return_msr(regs, MSR_KERNEL);
-		return 1;
-
-#ifdef CONFIG_PPC_BOOK3S_64
+		return -1;
 	case SYSCALL_VECTORED_0:	/* scv 0 */
-		regs->gpr[9] = regs->gpr[13];
-		regs->gpr[10] = MSR_KERNEL;
-		regs->gpr[11] = regs->nip + 4;
-		regs->gpr[12] = regs->msr & MSR_MASK;
-		regs->gpr[13] = (unsigned long) get_paca();
-		regs_set_return_ip(regs, (unsigned long) &system_call_vectored_emulate);
-		regs_set_return_msr(regs, MSR_KERNEL);
-		return 1;
-#endif
-
+		return -1;
 	case RFI:
 		return -1;
-#endif
 	}
 	return 0;
 