Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

scsi: ufs: core: Fix EH failure after W-LUN resume error

When a W-LUN resume fails, its parent devices in the SCSI hierarchy,
including the scsi_target, may be runtime suspended. Subsequently, the
error handler in ufshcd_recover_pm_error() fails to set the W-LUN device
back to active because the parent target is not active. This results in
the following errors:

google-ufshcd 3c2d0000.ufs: ufshcd_err_handler started; HBA state eh_fatal; ...
ufs_device_wlun 0:0:0:49488: START_STOP failed for power mode: 1, result 40000
ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -5
...
ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active

Address this by:

1. Ensuring the W-LUN's parent scsi_target is runtime resumed before
attempting to set the W-LUN to active within
ufshcd_recover_pm_error().

2. Explicitly checking for power.runtime_error on the HBA and W-LUN
devices before calling pm_runtime_set_active() to clear the error
state.

3. Adding pm_runtime_get_sync(hba->dev) in
ufshcd_err_handling_prepare() to ensure the HBA itself is active
during error recovery, even if a child device resume failed.

These changes ensure the device power states are managed correctly
during error recovery.
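
The ordering constraint behind steps 1 and 2 can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `struct toy_dev`, `toy_set_active()`, `toy_resume()`, `toy_recover()` and `toy_demo()` are hypothetical names invented for illustration. The model assumes that marking a child active fails with -EBUSY while its parent is not active, and it ignores usage counts entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for runtime PM state; NOT the kernel's struct device. */
enum toy_rpm_status { TOY_RPM_ACTIVE, TOY_RPM_SUSPENDED };

struct toy_dev {
	struct toy_dev *parent;		/* NULL for the top of the hierarchy */
	enum toy_rpm_status status;
	int runtime_error;		/* nonzero after a failed resume */
};

/* Assumed rule: a child cannot be marked active under an inactive parent. */
static int toy_set_active(struct toy_dev *dev)
{
	if (dev->parent && dev->parent->status != TOY_RPM_ACTIVE)
		return -EBUSY;
	dev->status = TOY_RPM_ACTIVE;
	dev->runtime_error = 0;		/* marking active also clears the error */
	return 0;
}

/* Resume a device, resuming its ancestors first. */
static int toy_resume(struct toy_dev *dev)
{
	int ret;

	if (dev->runtime_error)
		return -EINVAL;		/* a pending error blocks a normal resume */
	if (dev->parent) {
		ret = toy_resume(dev->parent);
		if (ret)
			return ret;
	}
	dev->status = TOY_RPM_ACTIVE;
	return 0;
}

/* Recovery in parent-before-child order: HBA, then target, then W-LUN. */
static bool toy_recover(struct toy_dev *hba, struct toy_dev *starget,
			struct toy_dev *wlun)
{
	bool recovered = false;

	if (hba->runtime_error) {
		toy_set_active(hba);	/* no parent, so this cannot fail here */
		recovered = true;
	}
	if (wlun->runtime_error) {
		toy_resume(starget);	/* make the parent active first */
		if (toy_set_active(wlun) == 0)
			recovered = true;
	}
	return recovered;
}

/* State after a failed W-LUN resume: target suspended, W-LUN in error. */
static void toy_demo(void)
{
	struct toy_dev hba = { NULL, TOY_RPM_ACTIVE, 0 };
	struct toy_dev starget = { &hba, TOY_RPM_SUSPENDED, 0 };
	struct toy_dev wlun = { &starget, TOY_RPM_SUSPENDED, -EIO };

	/* Activating the child directly fails: the parent is not active. */
	assert(toy_set_active(&wlun) == -EBUSY);

	/* Resuming the parent first lets the child be marked active. */
	assert(toy_recover(&hba, &starget, &wlun));
	assert(wlun.status == TOY_RPM_ACTIVE && wlun.runtime_error == 0);
}
```

In this model the direct call fails for the same reason the log above reports, and the recovery path succeeds only because the target is resumed before the W-LUN is marked active.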

Signed-off-by: Brian Kao <powenkao@google.com>
Tested-by: Brian Kao <powenkao@google.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251112063214.1195761-1-powenkao@google.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Authored by Brian Kao and committed by Martin K. Petersen
b4bb6daf 82f78acd

+30 -10
drivers/ufs/core/ufshcd.c
···
 
 static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
 {
+	/*
+	 * A WLUN resume failure could potentially lead to the HBA being
+	 * runtime suspended, so take an extra reference on hba->dev.
+	 */
+	pm_runtime_get_sync(hba->dev);
 	ufshcd_rpm_get_sync(hba);
 	if (pm_runtime_status_suspended(&hba->ufs_device_wlun->sdev_gendev) ||
 	    hba->is_sys_suspended) {
···
 	if (ufshcd_is_clkscaling_supported(hba))
 		ufshcd_clk_scaling_suspend(hba, false);
 	ufshcd_rpm_put(hba);
+	pm_runtime_put(hba->dev);
 }
 
 static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
···
 #ifdef CONFIG_PM
 static void ufshcd_recover_pm_error(struct ufs_hba *hba)
 {
+	struct scsi_target *starget = hba->ufs_device_wlun->sdev_target;
 	struct Scsi_Host *shost = hba->host;
 	struct scsi_device *sdev;
 	struct request_queue *q;
-	int ret;
+	bool resume_sdev_queues = false;
 
 	hba->is_sys_suspended = false;
-	/*
-	 * Set RPM status of wlun device to RPM_ACTIVE,
-	 * this also clears its runtime error.
-	 */
-	ret = pm_runtime_set_active(&hba->ufs_device_wlun->sdev_gendev);
 
-	/* hba device might have a runtime error otherwise */
-	if (ret)
-		ret = pm_runtime_set_active(hba->dev);
+	/*
+	 * Ensure the parent's error status is cleared before proceeding
+	 * to the child, as the parent must be active to activate the child.
+	 */
+	if (hba->dev->power.runtime_error) {
+		/* hba->dev has no functional parent thus simplily set RPM_ACTIVE */
+		pm_runtime_set_active(hba->dev);
+		resume_sdev_queues = true;
+	}
+
+	if (hba->ufs_device_wlun->sdev_gendev.power.runtime_error) {
+		/*
+		 * starget, parent of wlun, might be suspended if wlun resume failed.
+		 * Make sure parent is resumed before set child (wlun) active.
+		 */
+		pm_runtime_get_sync(&starget->dev);
+		pm_runtime_set_active(&hba->ufs_device_wlun->sdev_gendev);
+		pm_runtime_put_sync(&starget->dev);
+		resume_sdev_queues = true;
+	}
+
 	/*
 	 * If wlun device had runtime error, we also need to resume those
 	 * consumer scsi devices in case any of them has failed to be
 	 * resumed due to supplier runtime resume failure. This is to unblock
 	 * blk_queue_enter in case there are bios waiting inside it.
 	 */
-	if (!ret) {
+	if (resume_sdev_queues) {
 		shost_for_each_device(sdev, shost) {
 			q = sdev->request_queue;
 			if (q->dev && (q->rpm_status == RPM_SUSPENDED ||