Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/panthor: Make sure we handle 'unknown group state' case properly

When we check for state values returned by the FW, we only cover part of
the 0:7 range. Make sure we catch FW inconsistencies by adding a default
to the switch statement, and flagging the group state as unknown in that
case.

When an unknown state is detected, we trigger a reset and consider the
group unusable from that point on, to prevent potential corruption from
creeping into other places if we keep executing work on this context.
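
The behaviour described above can be modelled in user space with a small sketch. It is not the driver code: the enum and function names below are hypothetical stand-ins, and the numeric CSG_STATE_* values are assumptions chosen for illustration (the real definitions live in the panthor headers). It only shows the idea of a default case that catches every unhandled value in the masked 0:7 range, and of treating such a group as unusable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; see the driver headers for the real ones. */
#define CSG_STATE_MASK      0x7u
#define CSG_STATE_TERMINATE 0u
#define CSG_STATE_START     1u
#define CSG_STATE_SUSPEND   2u
#define CSG_STATE_RESUME    3u

/* Hypothetical stand-in for the driver's group state enum. */
enum group_state {
	GROUP_ACTIVE,
	GROUP_SUSPENDED,
	GROUP_TERMINATED,
	GROUP_UNKNOWN_STATE,
};

/* Map a raw FW ack word to a group state; unhandled values are unknown. */
enum group_state decode_csg_state(uint32_t ack)
{
	switch (ack & CSG_STATE_MASK) {
	case CSG_STATE_START:
	case CSG_STATE_RESUME:
		return GROUP_ACTIVE;
	case CSG_STATE_TERMINATE:
		return GROUP_TERMINATED;
	case CSG_STATE_SUSPEND:
		return GROUP_SUSPENDED;
	default:
		/* Values 4..7 are not expected from the FW. */
		return GROUP_UNKNOWN_STATE;
	}
}

/* A terminated or unknown-state group must not be scheduled again. */
bool group_usable(enum group_state state)
{
	return state != GROUP_TERMINATED && state != GROUP_UNKNOWN_STATE;
}

/* Only an unknown state calls for a FW reset in this model. */
bool needs_fw_reset(enum group_state state)
{
	return state == GROUP_UNKNOWN_STATE;
}

/* Suspend usable groups; terminate the rest so nothing is saved from them. */
uint32_t pick_stop_request(enum group_state state)
{
	return group_usable(state) ? CSG_STATE_SUSPEND : CSG_STATE_TERMINATE;
}

/* Walk the whole masked range and check that every value is classified. */
void self_check(void)
{
	for (uint32_t ack = 0; ack <= CSG_STATE_MASK; ack++) {
		enum group_state s = decode_csg_state(ack);

		if (ack > CSG_STATE_RESUME) {
			assert(s == GROUP_UNKNOWN_STATE);
			assert(needs_fw_reset(s));
			assert(pick_stop_request(s) == CSG_STATE_TERMINATE);
		} else {
			assert(s != GROUP_UNKNOWN_STATE);
			assert(!needs_fw_reset(s));
		}
	}
}
```

Calling self_check() from a test harness exercises all eight masked values. In this model an unknown-state group is never suspended, which mirrors the concern that a SUSPEND could copy corrupted state into the suspend buffers.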

v2:
- Add Steve's R-b
- Fix commit message

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/dri-devel/3b7fd2f2-679e-440c-81cd-42fc2573b515@moroto.mountain/T/#u
Suggested-by: Steven Price <steven.price@arm.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240502155248.1430582-1-boris.brezillon@collabora.com

+35 -2
drivers/gpu/drm/panthor/panthor_sched.c
···
 	 * Can no longer be scheduled. The only allowed action is a destruction.
 	 */
 	PANTHOR_CS_GROUP_TERMINATED,
+
+	/**
+	 * @PANTHOR_CS_GROUP_UNKNOWN_STATE: Group is an unknown state.
+	 *
+	 * The FW returned an inconsistent state. The group is flagged unusable
+	 * and can no longer be scheduled. The only allowed action is a
+	 * destruction.
+	 *
+	 * When that happens, we also schedule a FW reset, to start from a fresh
+	 * state.
+	 */
+	PANTHOR_CS_GROUP_UNKNOWN_STATE,
 };
 
 /**
···
 	struct panthor_fw_csg_iface *csg_iface;
 	struct panthor_group *group;
 	enum panthor_group_state new_state, old_state;
+	u32 csg_state;
 
 	lockdep_assert_held(&ptdev->scheduler->lock);
 
···
 		return;
 
 	old_state = group->state;
-	switch (csg_iface->output->ack & CSG_STATE_MASK) {
+	csg_state = csg_iface->output->ack & CSG_STATE_MASK;
+	switch (csg_state) {
 	case CSG_STATE_START:
 	case CSG_STATE_RESUME:
 		new_state = PANTHOR_CS_GROUP_ACTIVE;
···
 	case CSG_STATE_SUSPEND:
 		new_state = PANTHOR_CS_GROUP_SUSPENDED;
 		break;
+	default:
+		/* The unknown state might be caused by a FW state corruption,
+		 * which means the group metadata can't be trusted anymore, and
+		 * the SUSPEND operation might propagate the corruption to the
+		 * suspend buffers. Flag the group state as unknown to make
+		 * sure it's unusable after that point.
+		 */
+		drm_err(&ptdev->base, "Invalid state on CSG %d (state=%d)",
+			csg_id, csg_state);
+		new_state = PANTHOR_CS_GROUP_UNKNOWN_STATE;
+		break;
 	}
 
 	if (old_state == new_state)
 		return;
+
+	/* The unknown state might be caused by a FW issue, reset the FW to
+	 * take a fresh start.
+	 */
+	if (new_state == PANTHOR_CS_GROUP_UNKNOWN_STATE)
+		panthor_device_schedule_reset(ptdev);
 
 	if (new_state == PANTHOR_CS_GROUP_SUSPENDED)
 		csg_slot_sync_queues_state_locked(ptdev, csg_id);
···
 group_can_run(struct panthor_group *group)
 {
 	return group->state != PANTHOR_CS_GROUP_TERMINATED &&
+	       group->state != PANTHOR_CS_GROUP_UNKNOWN_STATE &&
 	       !group->destroyed && group->fatal_queues == 0 &&
 	       !group->timedout;
 }
···
 
 		if (csg_slot->group) {
 			csgs_upd_ctx_queue_reqs(ptdev, &upd_ctx, i,
-						CSG_STATE_SUSPEND,
+						group_can_run(csg_slot->group) ?
+						CSG_STATE_SUSPEND : CSG_STATE_TERMINATE,
 						CSG_STATE_MASK);
 		}
 	}