Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/msm/a5xx: fix races in preemption evaluation stage

On A5XX GPUs, when preemption is used, it is inevitable that the GPU
eventually enters a soft lock-up state in which it is stuck at an empty
ring-buffer doing nothing. This appears as a full UI lockup and is not
detected as a GPU hang (because it is not one). It happens because
preemption is not triggered when it is needed. Sometimes this state can
be recovered by a new submit, but generally that won't happen because
applications are waiting for old submits to retire.

One of the reasons this happens is a race between a5xx_submit and
a5xx_preempt_trigger called from IRQ during submit retire. The former
thread updates ring->cur of a previously empty, non-current ring right
after the latter checks it for emptiness. Both threads can then simply
exit: for the first one preempt_state wasn't NONE yet, and for the
second one all rings appeared to be empty.
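
The interleaving described above can be replayed step by step in a
single-threaded user-space model. This is only a sketch under stated
assumptions: struct race_gpu, try_state(), pick_ring() and
replay_lost_trigger() are hypothetical names that stand in for the
driver state, try_preempt_state() and get_next_ring(); none of this is
the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

#define NRINGS 2

/* RACE_NONE / RACE_START mirror PREEMPT_NONE / PREEMPT_START. */
enum race_state { RACE_NONE, RACE_START };

/* Hypothetical model of the state involved, not struct a5xx_gpu. */
struct race_gpu {
	enum race_state state;
	int cur_ring;			/* ring the GPU is executing */
	bool has_work[NRINGS];		/* ring is non-empty */
};

/* Models the NONE -> START attempt (a cmpxchg in the driver). */
static bool try_state(struct race_gpu *gpu, enum race_state old_state,
		      enum race_state new_state)
{
	if (gpu->state != old_state)
		return false;
	gpu->state = new_state;
	return true;
}

/* Assumption: lowest index with work wins; -1 if all rings are empty. */
static int pick_ring(const struct race_gpu *gpu)
{
	for (int i = 0; i < NRINGS; i++)
		if (gpu->has_work[i])
			return i;
	return -1;
}

/* Returns true if work is left on a ring nobody will switch to. */
bool replay_lost_trigger(void)
{
	struct race_gpu gpu = { .state = RACE_NONE, .cur_ring = 0 };

	/* IRQ (retire) path: NONE -> START succeeds, ring scan sees nothing. */
	bool irq_started = try_state(&gpu, RACE_NONE, RACE_START);
	assert(irq_started);
	int irq_next = pick_ring(&gpu);

	/* Submit path: queues work on ring 1, but the state is not NONE yet. */
	gpu.has_work[1] = true;
	bool submit_started = try_state(&gpu, RACE_NONE, RACE_START);

	/* IRQ path resumes with its stale result and backs out to NONE. */
	if (irq_next < 0)
		gpu.state = RACE_NONE;

	return gpu.has_work[1] && gpu.cur_ring != 1 &&
	       gpu.state == RACE_NONE && !submit_started;
}
```

Calling replay_lost_trigger() returns true: ring 1 holds work, the
current ring is still ring 0, and neither path started a switch.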

To prevent such situations we need to guarantee that preempt_trigger
makes its decision after each submit or retire. To implement this we
serialize preemption initiation using a spinlock. If a switch is already
in progress, we need to re-trigger preemption when it finishes.
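
A minimal user-space sketch of this scheme follows, assuming a
single-threaded model in which a boolean stands in for the spinlock and
the lowest ring index has the highest priority. All names (struct
model_gpu, model_trigger(), model_switch_done(), model_submit(),
replay_with_retrigger()) are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stdbool.h>

#define NRINGS 2

enum model_state { MODEL_NONE, MODEL_START, MODEL_TRIGGERED };

/* Hypothetical model of the driver state, not struct a5xx_gpu. */
struct model_gpu {
	enum model_state state;
	bool start_lock_held;	/* stands in for preempt_start_lock */
	int cur_ring;
	int next_ring;
	bool has_work[NRINGS];
};

/* Assumption: lowest index with work wins; -1 if all rings are empty. */
static int pick_ring(const struct model_gpu *gpu)
{
	for (int i = 0; i < NRINGS; i++)
		if (gpu->has_work[i])
			return i;
	return -1;
}

static void model_trigger(struct model_gpu *gpu)
{
	int next;

	/* Critical section: state check and ring scan form one unit. */
	assert(!gpu->start_lock_held);
	gpu->start_lock_held = true;

	if (gpu->state != MODEL_NONE)
		goto out;	/* switch in flight; completion re-triggers */
	gpu->state = MODEL_START;

	next = pick_ring(gpu);
	if (next < 0 || next == gpu->cur_ring) {
		gpu->state = MODEL_NONE;	/* nothing to switch to */
		goto out;
	}

	gpu->start_lock_held = false;
	gpu->next_ring = next;
	gpu->state = MODEL_TRIGGERED;	/* switch is now in flight */
	return;
out:
	gpu->start_lock_held = false;
}

/* Models switch completion: finish the switch, then evaluate again. */
static void model_switch_done(struct model_gpu *gpu)
{
	gpu->cur_ring = gpu->next_ring;
	gpu->state = MODEL_NONE;
	model_trigger(gpu);	/* pick up submits that arrived mid-switch */
}

static void model_submit(struct model_gpu *gpu, int ring)
{
	gpu->has_work[ring] = true;
	model_trigger(gpu);
}

/* Returns the ring the model ends up on. */
int replay_with_retrigger(void)
{
	struct model_gpu gpu = { .state = MODEL_NONE, .cur_ring = 0 };

	model_submit(&gpu, 1);		/* starts a switch to ring 1 */
	model_submit(&gpu, 0);		/* mid-switch: trigger is a no-op */
	model_switch_done(&gpu);	/* on ring 1; re-trigger targets ring 0 */
	if (gpu.state == MODEL_TRIGGERED)
		model_switch_done(&gpu);

	return gpu.cur_ring;
}
```

In this replay the submit to ring 0 arrives while the switch to ring 1
is in flight. Without the re-trigger in model_switch_done() the model
would stay on ring 1; with it, replay_with_retrigger() returns 0.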

Fixes: b1fc2839d2f9 ("drm/msm: Implement preemption for A5XX targets")
Signed-off-by: Vladimir Lypak <vladimir.lypak@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/612045/
Signed-off-by: Rob Clark <robdclark@chromium.org>

authored by Vladimir Lypak and committed by Rob Clark
ce050f30 64fd6d01

+23 -2
+1
drivers/gpu/drm/msm/adreno/a5xx_gpu.h
···
 	uint64_t preempt_iova[MSM_GPU_MAX_RINGS];

 	atomic_t preempt_state;
+	spinlock_t preempt_start_lock;
 	struct timer_list preempt_timer;

 	struct drm_gem_object *shadow_bo;
+22 -2
drivers/gpu/drm/msm/adreno/a5xx_preempt.c
···
 		return;

 	/*
+	 * Serialize preemption start to ensure that we always make
+	 * decision on latest state. Otherwise we can get stuck in
+	 * lower priority or empty ring.
+	 */
+	spin_lock_irqsave(&a5xx_gpu->preempt_start_lock, flags);
+
+	/*
 	 * Try to start preemption by moving from NONE to START. If
 	 * unsuccessful, a preemption is already in flight
 	 */
 	if (!try_preempt_state(a5xx_gpu, PREEMPT_NONE, PREEMPT_START))
-		return;
+		goto out;

 	/* Get the next ring to preempt to */
 	ring = get_next_ring(gpu);
···
 		set_preempt_state(a5xx_gpu, PREEMPT_ABORT);
 		update_wptr(gpu, a5xx_gpu->cur_ring);
 		set_preempt_state(a5xx_gpu, PREEMPT_NONE);
-		return;
+		goto out;
 	}
+
+	spin_unlock_irqrestore(&a5xx_gpu->preempt_start_lock, flags);

 	/* Make sure the wptr doesn't update while we're in motion */
 	spin_lock_irqsave(&ring->preempt_lock, flags);
···

 	/* And actually start the preemption */
 	gpu_write(gpu, REG_A5XX_CP_CONTEXT_SWITCH_CNTL, 1);
+	return;
+
+out:
+	spin_unlock_irqrestore(&a5xx_gpu->preempt_start_lock, flags);
 }

 void a5xx_preempt_irq(struct msm_gpu *gpu)
···
 	update_wptr(gpu, a5xx_gpu->cur_ring);

 	set_preempt_state(a5xx_gpu, PREEMPT_NONE);
+
+	/*
+	 * Try to trigger preemption again in case there was a submit or
+	 * retire during ring switch
+	 */
+	a5xx_preempt_trigger(gpu);
 }

 void a5xx_preempt_hw_init(struct msm_gpu *gpu)
···
 		}
 	}

+	spin_lock_init(&a5xx_gpu->preempt_start_lock);
 	timer_setup(&a5xx_gpu->preempt_timer, a5xx_preempt_timer, 0);
 }