iommu/arm-smmu-v3: Improve CMDQ lock fairness and efficiency

The SMMU CMDQ lock is highly contended when there are multiple CPUs
issuing commands and the queue is nearly full.

The lock has the following states:
- 0: Unlocked
- >0: Shared lock held with count
- INT_MIN+N: Exclusive lock held, where N is the # of shared waiters
- INT_MIN: Exclusive lock held, no shared waiters
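
The encoding above can be read as "sign bit = exclusive holder, low 31 bits = shared count". A minimal sketch of that reading follows; the helper names `lock_is_exclusive()` and `lock_shared_count()` are hypothetical and do not exist in the driver.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (not in the driver): is the exclusive lock held? */
static bool lock_is_exclusive(int val)
{
	/* A negative value means the sign bit is set. */
	return val < 0;
}

/*
 * Hypothetical helper (not in the driver): number of shared holders when
 * the lock is free or shared, number of shared waiters when it is held
 * exclusively. Masking with INT_MAX drops only the sign bit.
 */
static int lock_shared_count(int val)
{
	return (int)((unsigned int)val & (unsigned int)INT_MAX);
}
```

For example, a raw value of `INT_MIN + 3` decodes as "exclusive lock held, three shared waiters".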

When multiple CPUs are polling for space in the queue, they attempt to
grab the exclusive lock to update the cons pointer from the hardware. If
they fail to get the lock, they spin until the cons pointer is updated
by another CPU.

The current code allows the possibility of shared lock starvation
if there is a constant stream of CPUs trying to grab the exclusive lock.
This leads to severe latency issues and soft lockups.

Consider the following scenario where CPU1's attempt to acquire the
shared lock is starved by CPU2 and CPU0 contending for the exclusive
lock.

CPU0 (exclusive)  | CPU1 (shared)     | CPU2 (exclusive)    | `cmdq->lock`
--------------------------------------------------------------------------
trylock() //takes |                   |                     | 0
                  | shared_lock()     |                     | INT_MIN
                  | fetch_inc()       |                     | INT_MIN
                  | no return         |                     | INT_MIN + 1
                  | spins // VAL >= 0 |                     | INT_MIN + 1
unlock()          | spins...          |                     | INT_MIN + 1
set_release(0)    | spins...          |                     | 0 see[NOTE]
(done)            | (sees 0)          | trylock() // takes  | 0
                  | *exits loop*      | cmpxchg(0, INT_MIN) | 0
                  |                   | *cuts in*           | INT_MIN
                  | cmpxchg(0, 1)     |                     | INT_MIN
                  | fails // != 0     |                     | INT_MIN
                  | spins // VAL >= 0 |                     | INT_MIN
                  | *starved*         |                     | INT_MIN

[NOTE] The current code resets the exclusive lock to 0 regardless of the
state of the lock. This causes two problems:
1. It opens the possibility of back-to-back exclusive locks and the
downstream effect of starving the shared lock.
2. The count of shared lock waiters is lost.

To mitigate this, we release the exclusive lock by only clearing the sign
bit while retaining the shared lock waiter count as a way to avoid
starving the shared lock waiters.
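
To illustrate the difference, here is a userspace sketch using C11 `<stdatomic.h>` rather than the kernel atomics; the function names are hypothetical. It assumes that `atomic_fetch_andnot_release(INT_MIN, ...)` is equivalent to an atomic AND with `INT_MAX`, which holds on two's-complement machines.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Old behaviour (sketch): reset to 0, discarding any waiter count. */
static int exclusive_unlock_reset(atomic_int *lock)
{
	atomic_store_explicit(lock, 0, memory_order_release);
	return atomic_load_explicit(lock, memory_order_relaxed);
}

/* New behaviour (sketch): clear only the sign bit, keep the waiter count. */
static int exclusive_unlock_keep_waiters(atomic_int *lock)
{
	/* AND with INT_MAX is the userspace stand-in for andnot(INT_MIN). */
	atomic_fetch_and_explicit(lock, INT_MAX, memory_order_release);
	return atomic_load_explicit(lock, memory_order_relaxed);
}
```

Starting from `INT_MIN + 2` (exclusive holder, two shared waiters), the first variant leaves the lock at 0, so a new exclusive locker can cut in. The second leaves it at 2, so the waiters already count as shared holders.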

Also delete the cmpxchg loop in the shared lock acquisition path, as it
is no longer needed. Waiters see the positive lock count and proceed
immediately after the exclusive lock is released.
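
The resulting handoff can be walked through single-threaded with C11 atomics. This is only a sketch under stated assumptions: the helper names are hypothetical, the spin is replaced by a one-shot check, and the exclusive trylock is modelled as a compare-and-swap from 0 to `INT_MIN`, as in the scenario table.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Sketch of the shared-lock fast path: increment and return the old value. */
static int shared_lock_announce(atomic_int *lock)
{
	return atomic_fetch_add_explicit(lock, 1, memory_order_relaxed);
}

/* Sketch of the wait condition (VAL > 0), checked once instead of spinning. */
static int shared_lock_may_proceed(atomic_int *lock)
{
	return atomic_load_explicit(lock, memory_order_relaxed) > 0;
}

/* Sketch of the exclusive trylock: succeeds only if the lock is exactly 0. */
static int exclusive_trylock(atomic_int *lock)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(lock, &expected, INT_MIN,
						       memory_order_relaxed,
						       memory_order_relaxed);
}
```

If a waiter announces itself while the lock is `INT_MIN`, the value becomes `INT_MIN + 1`. Clearing the sign bit then yields 1, so a competing `exclusive_trylock()` fails (the value is not 0) while `shared_lock_may_proceed()` succeeds without any further cmpxchg.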

The exclusive lock is not starved, because submitters try the exclusive
lock first when new space becomes available.

Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Alexander Grest <Alexander.Grest@microsoft.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
Signed-off-by: Will Deacon <will@kernel.org>

authored by Alexander Grest and committed by Will Deacon 5 months ago df180b1a 8f0b4cce

+21 -10

1 changed file

drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c

···
  */
 static void arm_smmu_cmdq_shared_lock(struct arm_smmu_cmdq *cmdq)
 {
-	int val;
-
 	/*
-	 * We can try to avoid the cmpxchg() loop by simply incrementing the
-	 * lock counter. When held in exclusive state, the lock counter is set
-	 * to INT_MIN so these increments won't hurt as the value will remain
-	 * negative.
+	 * When held in exclusive state, the lock counter is set to INT_MIN
+	 * so these increments won't hurt as the value will remain negative.
+	 * The increment will also signal the exclusive locker that there are
+	 * shared waiters.
 	 */
 	if (atomic_fetch_inc_relaxed(&cmdq->lock) >= 0)
 		return;
 
-	do {
-		val = atomic_cond_read_relaxed(&cmdq->lock, VAL >= 0);
-	} while (atomic_cmpxchg_relaxed(&cmdq->lock, val, val + 1) != val);
+	/*
+	 * Someone else is holding the lock in exclusive state, so wait
+	 * for them to finish. Since we already incremented the lock counter,
+	 * no exclusive lock can be acquired until we finish. We don't need
+	 * the return value since we only care that the exclusive lock is
+	 * released (i.e. the lock counter is non-negative).
+	 * Once the exclusive locker releases the lock, the sign bit will
+	 * be cleared and our increment will make the lock counter positive,
+	 * allowing us to proceed.
+	 */
+	atomic_cond_read_relaxed(&cmdq->lock, VAL > 0);
 }
 
 static void arm_smmu_cmdq_shared_unlock(struct arm_smmu_cmdq *cmdq)
···
 	__ret;								\
 })
 
+/*
+ * Only clear the sign bit when releasing the exclusive lock this will
+ * allow any shared_lock() waiters to proceed without the possibility
+ * of entering the exclusive lock in a tight loop.
+ */
 #define arm_smmu_cmdq_exclusive_unlock_irqrestore(cmdq, flags)		\
 ({									\
-	atomic_set_release(&cmdq->lock, 0);				\
+	atomic_fetch_andnot_release(INT_MIN, &cmdq->lock);		\
 	local_irq_restore(flags);					\
 })
 
