Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

net: macb: Shuffle the tx ring before enabling tx

Quanyang observed that when using an NFS rootfs on an AMD ZynqMp board,
the rootfs may take an extended time to recover after a suspend.
Upon investigation, it was determined that the issue originates from a
problem in the macb driver.

According to the Zynq UltraScale TRM [1], when transmit is disabled,
the transmit buffer queue pointer resets to point to the address
specified by the transmit buffer queue base address register.

In the current implementation, the code merely resets `queue->tx_head`
and `queue->tx_tail` to '0'. This approach presents several issues:

- Packets already queued in the tx ring are silently lost,
leading to memory leaks since the associated skbs cannot be released.

- Concurrent write access to `queue->tx_head` and `queue->tx_tail` may
occur from `macb_tx_poll()` or `macb_start_xmit()` when these values
are reset to '0'.

- The transmission may become stuck on a packet that has already been sent
out, with its 'TX_USED' bit set, but has not yet been processed. However,
due to the manipulation of 'queue->tx_head' and 'queue->tx_tail',
`macb_tx_poll()` incorrectly assumes there are no packets to handle
because `queue->tx_head == queue->tx_tail`. This issue is only resolved
when a new packet is placed at this position. This is the root cause of
the prolonged recovery time observed for the NFS root filesystem.

To resolve this issue, shuffle the tx ring and tx skb array so that
the first unsent packet is positioned at the start of the tx ring.
Additionally, ensure that updates to `queue->tx_head` and
`queue->tx_tail` are properly protected with the appropriate lock.

[1] https://docs.amd.com/v/u/en-US/ug1085-zynq-ultrascale-trm

Fixes: bf9cf80cab81 ("net: macb: Fix tx/rx malfunction after phy link down and up")
Reported-by: Quanyang Wang <quanyang.wang@windriver.com>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260307-zynqmp-v2-1-6ef98a70e1d0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

authored by

Kevin Hao and committed by
Jakub Kicinski
881a0263 73aefba4

+95 -3
+95 -3
drivers/net/ethernet/cadence/macb_main.c
··· 36 36 #include <linux/tcp.h> 37 37 #include <linux/types.h> 38 38 #include <linux/udp.h> 39 + #include <linux/gcd.h> 39 40 #include <net/pkt_sched.h> 40 41 #include "macb.h" 41 42 ··· 669 668 netif_tx_stop_all_queues(ndev); 670 669 } 671 670 671 + /* Use juggling algorithm to left rotate tx ring and tx skb array */ 672 + static void gem_shuffle_tx_one_ring(struct macb_queue *queue) 673 + { 674 + unsigned int head, tail, count, ring_size, desc_size; 675 + struct macb_tx_skb tx_skb, *skb_curr, *skb_next; 676 + struct macb_dma_desc *desc_curr, *desc_next; 677 + unsigned int i, cycles, shift, curr, next; 678 + struct macb *bp = queue->bp; 679 + unsigned char desc[24]; 680 + unsigned long flags; 681 + 682 + desc_size = macb_dma_desc_get_size(bp); 683 + 684 + if (WARN_ON_ONCE(desc_size > ARRAY_SIZE(desc))) 685 + return; 686 + 687 + spin_lock_irqsave(&queue->tx_ptr_lock, flags); 688 + head = queue->tx_head; 689 + tail = queue->tx_tail; 690 + ring_size = bp->tx_ring_size; 691 + count = CIRC_CNT(head, tail, ring_size); 692 + 693 + if (!(tail % ring_size)) 694 + goto unlock; 695 + 696 + if (!count) { 697 + queue->tx_head = 0; 698 + queue->tx_tail = 0; 699 + goto unlock; 700 + } 701 + 702 + shift = tail % ring_size; 703 + cycles = gcd(ring_size, shift); 704 + 705 + for (i = 0; i < cycles; i++) { 706 + memcpy(&desc, macb_tx_desc(queue, i), desc_size); 707 + memcpy(&tx_skb, macb_tx_skb(queue, i), 708 + sizeof(struct macb_tx_skb)); 709 + 710 + curr = i; 711 + next = (curr + shift) % ring_size; 712 + 713 + while (next != i) { 714 + desc_curr = macb_tx_desc(queue, curr); 715 + desc_next = macb_tx_desc(queue, next); 716 + 717 + memcpy(desc_curr, desc_next, desc_size); 718 + 719 + if (next == ring_size - 1) 720 + desc_curr->ctrl &= ~MACB_BIT(TX_WRAP); 721 + if (curr == ring_size - 1) 722 + desc_curr->ctrl |= MACB_BIT(TX_WRAP); 723 + 724 + skb_curr = macb_tx_skb(queue, curr); 725 + skb_next = macb_tx_skb(queue, next); 726 + memcpy(skb_curr, skb_next, sizeof(struct macb_tx_skb)); 727 + 728 + curr = next; 729 + next = (curr + shift) % ring_size; 730 + } 731 + 732 + desc_curr = macb_tx_desc(queue, curr); 733 + memcpy(desc_curr, &desc, desc_size); 734 + if (i == ring_size - 1) 735 + desc_curr->ctrl &= ~MACB_BIT(TX_WRAP); 736 + if (curr == ring_size - 1) 737 + desc_curr->ctrl |= MACB_BIT(TX_WRAP); 738 + memcpy(macb_tx_skb(queue, curr), &tx_skb, 739 + sizeof(struct macb_tx_skb)); 740 + } 741 + 742 + queue->tx_head = count; 743 + queue->tx_tail = 0; 744 + 745 + /* Make descriptor updates visible to hardware */ 746 + wmb(); 747 + 748 + unlock: 749 + spin_unlock_irqrestore(&queue->tx_ptr_lock, flags); 750 + } 751 + 752 + /* Rotate the queue so that the tail is at index 0 */ 753 + static void gem_shuffle_tx_rings(struct macb *bp) 754 + { 755 + struct macb_queue *queue; 756 + int q; 757 + 758 + for (q = 0, queue = bp->queues; q < bp->num_queues; q++, queue++) 759 + gem_shuffle_tx_one_ring(queue); 760 + } 761 + 672 762 static void macb_mac_link_up(struct phylink_config *config, 673 763 struct phy_device *phy, 674 764 unsigned int mode, phy_interface_t interface, ··· 798 706 ctrl |= MACB_BIT(PAE); 799 707 800 708 for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) { 801 - queue->tx_head = 0; 802 - queue->tx_tail = 0; 803 709 queue_writel(queue, IER, 804 710 bp->rx_intr_mask | MACB_TX_INT_FLAGS | MACB_BIT(HRESP)); 805 711 } ··· 811 721 812 722 spin_unlock_irqrestore(&bp->lock, flags); 813 723 814 - if (!(bp->caps & MACB_CAPS_MACB_IS_EMAC)) 724 + if (!(bp->caps & MACB_CAPS_MACB_IS_EMAC)) { 815 725 macb_set_tx_clk(bp, speed); 726 + gem_shuffle_tx_rings(bp); 727 + } 816 728 817 729 /* Enable Rx and Tx; Enable PTP unicast */ 818 730 ctrl = macb_readl(bp, NCR);