Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

rseq: Implement rseq_grant_slice_extension()

Provide the actual decision function, which decides whether a time slice
extension is granted in the exit to user mode path when NEED_RESCHED is
evaluated.

The decision is made in two stages. First, an inline quick check avoids
going into the actual decision function. It checks whether:

#1 the functionality is enabled

#2 the exit is a return from interrupt to user mode

#3 any TIF bit that causes extra work is set. That includes TIF_RSEQ,
which means the task was already scheduled out.
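
As a rough illustration of how checks #1 and #2 can be folded into a single
test, the sketch below models the state word in plain user space C. The union
slice_state_model and the helper quick_check_passes() are hypothetical
stand-ins, and the two-byte layout is an assumption; only the "enabled &=
user_irq, then test the whole word" pattern is taken from the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's union rseq_slice_state.
 * The exact layout is an assumption made for this sketch.
 */
union slice_state_model {
	struct {
		uint8_t enabled;	/* #1: feature enabled for the task */
		uint8_t granted;	/* grant from an earlier loop iteration */
	};
	uint16_t state;			/* both flags viewed as one word */
};

/*
 * Returns true when the caller has to look further, i.e. when the
 * feature is enabled on a return from interrupt (#1 and #2), or when
 * an earlier grant is still recorded and may need to be revoked.
 */
static bool quick_check_passes(union slice_state_model st, bool user_irq)
{
	/* Fold #2 into #1 on the local copy only. */
	st.enabled &= user_irq;
	/* A single test covers both the enabled and the granted flag. */
	return st.state != 0;
}
```

Working on a local copy means the task's stored state is left untouched;
condition #3 is handled separately through the work_pending argument.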

The slow path, which implements the actual user space ABI, is invoked
when:

A) #1 is true, #2 is true and #3 is false

It checks whether user space requested a slice extension by setting
the request bit in the rseq slice_ctrl field. If so, it grants the
extension and stores the slice expiry time, so that the actual exit
code can double check whether the slice is already exhausted before
going back.

B) #1 - #3 are true _and_ a slice extension was granted in a previous
loop iteration

In this case the grant is revoked.
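
The grant and revoke cases above can be sketched as a small user space
model. Everything here is illustrative: struct slice_model, the
SLICE_REQUEST and SLICE_GRANTED bit positions and grant_slice_model() are
hypothetical names, the bit layout of slice_ctrl is an assumption, and the
user access and fault handling of the real function are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, chosen for this sketch only. */
#define SLICE_REQUEST	(1u << 0)
#define SLICE_GRANTED	(1u << 1)

/* Hypothetical model of the per-task and user-visible state. */
struct slice_model {
	bool	 granted;	/* kernel side: grant from an earlier iteration */
	uint32_t usr_ctrl;	/* stands in for rseq->slice_ctrl.all */
	uint64_t expires_ns;	/* expiry stored when a grant is issued */
};

/*
 * Returns true if the extension is granted. The caller is assumed to
 * have passed the quick check already.
 */
static bool grant_slice_model(struct slice_model *m, bool work_pending,
			      uint64_t now_ns, uint64_t ext_ns)
{
	/* Case B, or extra work pending: clear the user word and revoke. */
	if (work_pending || m->granted) {
		m->usr_ctrl = 0;
		m->granted = false;
		return false;
	}

	/* Case A without a request from user space: nothing to do. */
	if (!(m->usr_ctrl & SLICE_REQUEST))
		return false;

	/* Case A with a request: flip request to granted, store the expiry. */
	m->usr_ctrl &= ~SLICE_REQUEST;
	m->usr_ctrl |= SLICE_GRANTED;
	m->granted = true;
	m->expires_ns = now_ns + ext_ns;
	return true;
}
```

Calling the model twice in a row with a pending request shows the
behaviour described in B: the first call grants, the second call revokes
the grant and clears the user word.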

If the user space access faults or invalid state is detected, the task is
terminated with SIGSEGV.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251215155709.195303303@linutronix.de

Authored by Thomas Gleixner and committed by Peter Zijlstra
dfb630f5 7ee58f98

+108
include/linux/rseq_entry.h
···
 #ifdef CONFIG_RSEQ
 #include <linux/jump_label.h>
 #include <linux/rseq.h>
+#include <linux/sched/signal.h>
 #include <linux/uaccess.h>

 #include <linux/tracepoint-defs.h>
···
 	t->rseq.slice.state.granted = false;
 }

+static __always_inline bool rseq_grant_slice_extension(bool work_pending)
+{
+	struct task_struct *curr = current;
+	struct rseq_slice_ctrl usr_ctrl;
+	union rseq_slice_state state;
+	struct rseq __user *rseq;
+
+	if (!rseq_slice_extension_enabled())
+		return false;
+
+	/* If not enabled or not a return from interrupt, nothing to do. */
+	state = curr->rseq.slice.state;
+	state.enabled &= curr->rseq.event.user_irq;
+	if (likely(!state.state))
+		return false;
+
+	rseq = curr->rseq.usrptr;
+	scoped_user_rw_access(rseq, efault) {
+
+		/*
+		 * Quick check conditions where a grant is not possible or
+		 * needs to be revoked.
+		 *
+		 *  1) Any TIF bit which needs to do extra work aside of
+		 *     rescheduling prevents a grant.
+		 *
+		 *  2) A previous rescheduling request resulted in a slice
+		 *     extension grant.
+		 */
+		if (unlikely(work_pending || state.granted)) {
+			/* Clear user control unconditionally. No point for checking */
+			unsafe_put_user(0U, &rseq->slice_ctrl.all, efault);
+			rseq_slice_clear_grant(curr);
+			return false;
+		}
+
+		unsafe_get_user(usr_ctrl.all, &rseq->slice_ctrl.all, efault);
+		if (likely(!(usr_ctrl.request)))
+			return false;
+
+		/* Grant the slice extention */
+		usr_ctrl.request = 0;
+		usr_ctrl.granted = 1;
+		unsafe_put_user(usr_ctrl.all, &rseq->slice_ctrl.all, efault);
+	}
+
+	rseq_stat_inc(rseq_stats.s_granted);
+
+	curr->rseq.slice.state.granted = true;
+	/* Store expiry time for arming the timer on the way out */
+	curr->rseq.slice.expires = data_race(rseq_slice_ext_nsecs) + ktime_get_mono_fast_ns();
+	/*
+	 * This is racy against a remote CPU setting TIF_NEED_RESCHED in
+	 * several ways:
+	 *
+	 * 1)
+	 *	CPU0			CPU1
+	 *	clear_tsk()
+	 *				set_tsk()
+	 *	clear_preempt()
+	 *				Raise scheduler IPI on CPU0
+	 *	--> IPI
+	 *	    fold_need_resched() -> Folds correctly
+	 * 2)
+	 *	CPU0			CPU1
+	 *				set_tsk()
+	 *	clear_tsk()
+	 *	clear_preempt()
+	 *				Raise scheduler IPI on CPU0
+	 *	--> IPI
+	 *	    fold_need_resched() <- NOOP as TIF_NEED_RESCHED is false
+	 *
+	 * #1 is not any different from a regular remote reschedule as it
+	 *    sets the previously not set bit and then raises the IPI which
+	 *    folds it into the preempt counter
+	 *
+	 * #2 is obviously incorrect from a scheduler POV, but it's not
+	 *    differently incorrect than the code below clearing the
+	 *    reschedule request with the safety net of the timer.
+	 *
+	 * The important part is that the clearing is protected against the
+	 * scheduler IPI and also against any other interrupt which might
+	 * end up waking up a task and setting the bits in the middle of
+	 * the operation:
+	 *
+	 *	clear_tsk()
+	 *	---> Interrupt
+	 *		wakeup_on_this_cpu()
+	 *		set_tsk()
+	 *		set_preempt()
+	 *	clear_preempt()
+	 *
+	 * which would be inconsistent state.
+	 */
+	scoped_guard(irq) {
+		clear_tsk_need_resched(curr);
+		clear_preempt_need_resched();
+	}
+	return true;
+
+efault:
+	force_sig(SIGSEGV);
+	return false;
+}
+
 #else /* CONFIG_RSEQ_SLICE_EXTENSION */
 static inline bool rseq_slice_extension_enabled(void) { return false; }
 static inline bool rseq_arm_slice_extension_timer(void) { return false; }
 static inline void rseq_slice_clear_grant(struct task_struct *t) { }
+static inline bool rseq_grant_slice_extension(bool work_pending) { return false; }
 #endif /* !CONFIG_RSEQ_SLICE_EXTENSION */

 bool rseq_debug_update_user_cs(struct task_struct *t, struct pt_regs *regs, unsigned long csaddr);
···
 static inline void rseq_irqentry_exit_to_user_mode(void) { }
 static inline void rseq_exit_to_user_mode_legacy(void) { }
 static inline void rseq_debug_syscall_return(struct pt_regs *regs) { }
+static inline bool rseq_grant_slice_extension(bool work_pending) { return false; }
 #endif /* !CONFIG_RSEQ */

 #endif /* _LINUX_RSEQ_ENTRY_H */