Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

KVM: arm64: vgic: Pick EOIcount deactivations from AP-list tail

Valentine reports that their guests fail to boot correctly, losing
interrupts, and indicates that the wrong interrupt gets deactivated.

What happens here is that if the maintenance interrupt is slow enough
to kick us out of the guest, extra interrupts can be activated from
the LRs. We then exit and proceed to handle EOIcount deactivations,
picking active interrupts from the AP list. But we start from the
top of the list, potentially deactivating interrupts that were in
the LRs, while EOIcount only denotes deactivation of interrupts that
are not present in an LR.

Solve this by tracking the last interrupt that made it into the LRs,
and start the EOIcount deactivation walk *after* that interrupt.
Since this only makes sense while the vcpu is loaded, stash this
in the per-CPU host state.

Huge thanks to Valentine for doing all the detective work and
providing an initial patch.

Fixes: 3cfd59f81e0f3 ("KVM: arm64: GICv3: Handle LR overflow when EOImode==0")
Fixes: 281c6c06e2a7b ("KVM: arm64: GICv2: Handle LR overflow when EOImode==0")
Reported-by: Valentine Burley <valentine.burley@collabora.com>
Tested-by: Valentine Burley <valentine.burley@collabora.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20260307115955.369455-1-valentine.burley@collabora.com
Link: https://patch.msgid.link/20260307191151.3781182-1-maz@kernel.org
Cc: stable@vger.kernel.org

+17 -8
arch/arm64/include/asm/kvm_host.h (+3)

@@ -784,6 +784,9 @@
 	/* Number of debug breakpoints/watchpoints for this CPU (minus 1) */
 	unsigned int debug_brps;
 	unsigned int debug_wrps;
+
+	/* Last vgic_irq part of the AP list recorded in an LR */
+	struct vgic_irq *last_lr_irq;
 };
 
 struct kvm_host_psci_config {
arch/arm64/kvm/vgic/vgic-v2.c (+2 -2)

@@ -115,7 +115,7 @@
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	struct vgic_v2_cpu_if *cpuif = &vgic_cpu->vgic_v2;
 	u32 eoicount = FIELD_GET(GICH_HCR_EOICOUNT, cpuif->vgic_hcr);
-	struct vgic_irq *irq;
+	struct vgic_irq *irq = *host_data_ptr(last_lr_irq);
 
 	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
 
@@ -123,7 +123,7 @@
 		vgic_v2_fold_lr(vcpu, cpuif->vgic_lr[lr]);
 
 	/* See the GICv3 equivalent for the EOIcount handling rationale */
-	list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
+	list_for_each_entry_continue(irq, &vgic_cpu->ap_list_head, ap_list) {
 		u32 lr;
 
 		if (!eoicount) {
arch/arm64/kvm/vgic/vgic-v3.c (+6 -6)

@@ -148,7 +148,7 @@
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	struct vgic_v3_cpu_if *cpuif = &vgic_cpu->vgic_v3;
 	u32 eoicount = FIELD_GET(ICH_HCR_EL2_EOIcount, cpuif->vgic_hcr);
-	struct vgic_irq *irq;
+	struct vgic_irq *irq = *host_data_ptr(last_lr_irq);
 
 	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
 
@@ -158,12 +158,12 @@
 	/*
 	 * EOIMode=0: use EOIcount to emulate deactivation. We are
 	 * guaranteed to deactivate in reverse order of the activation, so
-	 * just pick one active interrupt after the other in the ap_list,
-	 * and replay the deactivation as if the CPU was doing it. We also
-	 * rely on priority drop to have taken place, and the list to be
-	 * sorted by priority.
+	 * just pick one active interrupt after the other in the tail part
+	 * of the ap_list, past the LRs, and replay the deactivation as if
+	 * the CPU was doing it. We also rely on priority drop to have taken
+	 * place, and the list to be sorted by priority.
 	 */
-	list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
+	list_for_each_entry_continue(irq, &vgic_cpu->ap_list_head, ap_list) {
 		u64 lr;
 
 		/*
arch/arm64/kvm/vgic/vgic.c (+6)

@@ -814,6 +814,9 @@
 
 static inline void vgic_fold_lr_state(struct kvm_vcpu *vcpu)
 {
+	if (!*host_data_ptr(last_lr_irq))
+		return;
+
 	if (kvm_vgic_global_state.type == VGIC_V2)
 		vgic_v2_fold_lr_state(vcpu);
 	else
@@ -963,10 +966,13 @@
 	if (irqs_outside_lrs(&als))
 		vgic_sort_ap_list(vcpu);
 
+	*host_data_ptr(last_lr_irq) = NULL;
+
 	list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
 		scoped_guard(raw_spinlock, &irq->irq_lock) {
 			if (likely(vgic_target_oracle(irq) == vcpu)) {
 				vgic_populate_lr(vcpu, irq, count++);
+				*host_data_ptr(last_lr_irq) = irq;
 			}
 		}
 