Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

net: mscc: ocelot: be resilient to loss of PTP packets during transmission

The Felix DSA driver presents unique challenges that make the simplistic
ocelot PTP TX timestamping procedure unreliable: any transmitted packet
may be lost in hardware before it ever leaves our local system.

This may happen because there is congestion on the DSA conduit, the
switch CPU port or even user port (Qdiscs like taprio may delay packets
indefinitely by design).

The technical problem is that the kernel, i.e. ocelot_port_add_txtstamp_skb(),
runs out of timestamp IDs eventually, because it never detects that
packets are lost, and keeps the IDs of the lost packets on hold
indefinitely. The manifestation of the issue once the entire timestamp
ID range becomes busy looks like this in dmesg:

mscc_felix 0000:00:00.5: port 0 delivering skb without TX timestamp
mscc_felix 0000:00:00.5: port 1 delivering skb without TX timestamp

At the surface level, we need a timeout timer so that the kernel knows a
timestamp ID is available again. But there is a deeper problem with the
implementation, which is the monotonically increasing ocelot_port->ts_id.
In the presence of packet loss, it will be impossible to detect that and
reuse one of the holes created in the range of free timestamp IDs.

What we actually need is a bitmap of 63 timestamp IDs tracking which one
is available. That is able to use up holes caused by packet loss, but
also gives us a unique opportunity to not implement an actual timer_list
for the timeout timer (very complicated in terms of locking).

We could only declare a timestamp ID stale on demand (lazily), aka when
there's no other timestamp ID available. There are pros and cons to this
approach: the implementation is much more simple than per-packet timers
would be, but most of the stale packets would be quasi-leaked - not
really leaked, but blocked in driver memory, since this algorithm sees
no reason to free them.

An improved technique would be to check for stale timestamp IDs every
time we allocate a new one. Assuming a constant flux of PTP packets,
this avoids stale packets being blocked in memory, but of course,
packets lost at the end of the flux are still blocked until the flux
resumes (nobody left to kick them out).

Since implementing per-packet timers is way too complicated, this should
be good enough.

Testing procedure:

Persistently block traffic class 5 and try to run PTP on it:
$ tc qdisc replace dev swp3 parent root taprio num_tc 8 \
map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 sched-entry S 0xdf 100000 flags 0x2
[ 126.948141] mscc_felix 0000:00:00.5: port 3 tc 5 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
$ ptp4l -i swp3 -2 -P -m --socket_priority 5 --fault_reset_interval ASAP --logSyncInterval -3
ptp4l[70.351]: port 1 (swp3): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[70.354]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[70.358]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
[ 70.394583] mscc_felix 0000:00:00.5: port 3 timestamp id 0
ptp4l[70.406]: timed out while polling for tx timestamp
ptp4l[70.406]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[70.406]: port 1 (swp3): send peer delay response failed
ptp4l[70.407]: port 1 (swp3): clearing fault immediately
ptp4l[70.952]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
[ 71.394858] mscc_felix 0000:00:00.5: port 3 timestamp id 1
ptp4l[71.400]: timed out while polling for tx timestamp
ptp4l[71.400]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[71.401]: port 1 (swp3): send peer delay response failed
ptp4l[71.401]: port 1 (swp3): clearing fault immediately
[ 72.393616] mscc_felix 0000:00:00.5: port 3 timestamp id 2
ptp4l[72.401]: timed out while polling for tx timestamp
ptp4l[72.402]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[72.402]: port 1 (swp3): send peer delay response failed
ptp4l[72.402]: port 1 (swp3): clearing fault immediately
ptp4l[72.952]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
[ 73.395291] mscc_felix 0000:00:00.5: port 3 timestamp id 3
ptp4l[73.400]: timed out while polling for tx timestamp
ptp4l[73.400]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[73.400]: port 1 (swp3): send peer delay response failed
ptp4l[73.400]: port 1 (swp3): clearing fault immediately
[ 74.394282] mscc_felix 0000:00:00.5: port 3 timestamp id 4
ptp4l[74.400]: timed out while polling for tx timestamp
ptp4l[74.401]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[74.401]: port 1 (swp3): send peer delay response failed
ptp4l[74.401]: port 1 (swp3): clearing fault immediately
ptp4l[74.953]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
[ 75.396830] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 0 which seems lost
[ 75.405760] mscc_felix 0000:00:00.5: port 3 timestamp id 0
ptp4l[75.410]: timed out while polling for tx timestamp
ptp4l[75.411]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[75.411]: port 1 (swp3): send peer delay response failed
ptp4l[75.411]: port 1 (swp3): clearing fault immediately
(...)

Remove the blocking condition and see that the port recovers:
$ same tc command as above, but use "sched-entry S 0xff" instead
$ same ptp4l command as above
ptp4l[99.489]: port 1 (swp3): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[99.490]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[99.492]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
[ 100.403768] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 0 which seems lost
[ 100.412545] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 1 which seems lost
[ 100.421283] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 2 which seems lost
[ 100.430015] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 3 which seems lost
[ 100.438744] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 4 which seems lost
[ 100.447470] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 100.505919] mscc_felix 0000:00:00.5: port 3 timestamp id 0
ptp4l[100.963]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
[ 101.405077] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 101.507953] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 102.405405] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 102.509391] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 103.406003] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 103.510011] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 104.405601] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 104.510624] mscc_felix 0000:00:00.5: port 3 timestamp id 0
ptp4l[104.965]: selected best master clock d858d7.fffe.00ca6d
ptp4l[104.966]: port 1 (swp3): assuming the grand master role
ptp4l[104.967]: port 1 (swp3): LISTENING to GRAND_MASTER on RS_GRAND_MASTER
[ 105.106201] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.232420] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.359001] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.405500] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.485356] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.511220] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.610938] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.737237] mscc_felix 0000:00:00.5: port 3 timestamp id 0
(...)

Notice that in this new usage pattern, a non-congested port should
basically use timestamp ID 0 all the time, progressing to higher numbers
only if there are unacknowledged timestamps in flight. Compare this to
the old usage, where the timestamp ID used to monotonically increase
modulo OCELOT_MAX_PTP_ID.

In terms of implementation, this simplifies the bookkeeping of the
ocelot_port :: ts_id and ptp_skbs_in_flight. Since we need to traverse
the list of two-step timestampable skbs for each new packet anyway, the
information can already be computed and does not need to be stored.
Also, ocelot_port->tx_skbs is always accessed under the switch-wide
ocelot->ts_id_lock IRQ-unsafe spinlock, so we don't need the skb queue's
lock and can use the unlocked primitives safely.

This problem was actually detected using the tc-taprio offload, and is
causing trouble in TSN scenarios, which Felix (NXP LS1028A / VSC9959)
supports but Ocelot (VSC7514) does not. Thus, I've selected the commit
to blame as the one adding initial timestamping support for the Felix
switch.

Fixes: c0bcf537667c ("net: dsa: ocelot: add hardware timestamping support for Felix")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241205145519.1236778-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

authored by

Vladimir Oltean and committed by
Jakub Kicinski
b454abfa 0c53cdb9

+81 -58
+80 -56
drivers/net/ethernet/mscc/ocelot_ptp.c
··· 14 14 #include <soc/mscc/ocelot.h> 15 15 #include "ocelot.h" 16 16 17 + #define OCELOT_PTP_TX_TSTAMP_TIMEOUT (5 * HZ) 18 + 17 19 int ocelot_ptp_gettime64(struct ptp_clock_info *ptp, struct timespec64 *ts) 18 20 { 19 21 struct ocelot *ocelot = container_of(ptp, struct ocelot, ptp_info); ··· 605 603 } 606 604 EXPORT_SYMBOL(ocelot_get_ts_info); 607 605 608 - static int ocelot_port_add_txtstamp_skb(struct ocelot *ocelot, int port, 609 - struct sk_buff *clone) 606 + static struct sk_buff *ocelot_port_dequeue_ptp_tx_skb(struct ocelot *ocelot, 607 + int port, u8 ts_id, 608 + u32 seqid) 610 609 { 611 610 struct ocelot_port *ocelot_port = ocelot->ports[port]; 611 + struct sk_buff *skb, *skb_tmp, *skb_match = NULL; 612 + struct ptp_header *hdr; 612 613 613 614 spin_lock(&ocelot->ts_id_lock); 614 615 615 - if (ocelot_port->ptp_skbs_in_flight == OCELOT_MAX_PTP_ID || 616 - ocelot->ptp_skbs_in_flight == OCELOT_PTP_FIFO_SIZE) { 616 + skb_queue_walk_safe(&ocelot_port->tx_skbs, skb, skb_tmp) { 617 + if (OCELOT_SKB_CB(skb)->ts_id != ts_id) 618 + continue; 619 + 620 + /* Check that the timestamp ID is for the expected PTP 621 + * sequenceId. We don't have to test ptp_parse_header() against 622 + * NULL, because we've pre-validated the packet's ptp_class. 623 + */ 624 + hdr = ptp_parse_header(skb, OCELOT_SKB_CB(skb)->ptp_class); 625 + if (seqid != ntohs(hdr->sequence_id)) 626 + continue; 627 + 628 + __skb_unlink(skb, &ocelot_port->tx_skbs); 629 + ocelot->ptp_skbs_in_flight--; 630 + skb_match = skb; 631 + break; 632 + } 633 + 634 + spin_unlock(&ocelot->ts_id_lock); 635 + 636 + return skb_match; 637 + } 638 + 639 + static int ocelot_port_queue_ptp_tx_skb(struct ocelot *ocelot, int port, 640 + struct sk_buff *clone) 641 + { 642 + struct ocelot_port *ocelot_port = ocelot->ports[port]; 643 + DECLARE_BITMAP(ts_id_in_flight, OCELOT_MAX_PTP_ID); 644 + struct sk_buff *skb, *skb_tmp; 645 + unsigned long n; 646 + 647 + spin_lock(&ocelot->ts_id_lock); 648 + 649 + /* To get a better chance of acquiring a timestamp ID, first flush the 650 + * stale packets still waiting in the TX timestamping queue. They are 651 + * probably lost. 652 + */ 653 + skb_queue_walk_safe(&ocelot_port->tx_skbs, skb, skb_tmp) { 654 + if (time_before(OCELOT_SKB_CB(skb)->ptp_tx_time + 655 + OCELOT_PTP_TX_TSTAMP_TIMEOUT, jiffies)) { 656 + dev_warn_ratelimited(ocelot->dev, 657 + "port %d invalidating stale timestamp ID %u which seems lost\n", 658 + port, OCELOT_SKB_CB(skb)->ts_id); 659 + __skb_unlink(skb, &ocelot_port->tx_skbs); 660 + kfree_skb(skb); 661 + ocelot->ptp_skbs_in_flight--; 662 + } else { 663 + __set_bit(OCELOT_SKB_CB(skb)->ts_id, ts_id_in_flight); 664 + } 665 + } 666 + 667 + if (ocelot->ptp_skbs_in_flight == OCELOT_PTP_FIFO_SIZE) { 617 668 spin_unlock(&ocelot->ts_id_lock); 618 669 return -EBUSY; 619 670 } 620 671 621 - skb_shinfo(clone)->tx_flags |= SKBTX_IN_PROGRESS; 622 - /* Store timestamp ID in OCELOT_SKB_CB(clone)->ts_id */ 623 - OCELOT_SKB_CB(clone)->ts_id = ocelot_port->ts_id; 672 + n = find_first_zero_bit(ts_id_in_flight, OCELOT_MAX_PTP_ID); 673 + if (n == OCELOT_MAX_PTP_ID) { 674 + spin_unlock(&ocelot->ts_id_lock); 675 + return -EBUSY; 676 + } 624 677 625 - ocelot_port->ts_id++; 626 - if (ocelot_port->ts_id == OCELOT_MAX_PTP_ID) 627 - ocelot_port->ts_id = 0; 628 - 629 - ocelot_port->ptp_skbs_in_flight++; 678 + /* Found an available timestamp ID, use it */ 679 + OCELOT_SKB_CB(clone)->ts_id = n; 680 + OCELOT_SKB_CB(clone)->ptp_tx_time = jiffies; 630 681 ocelot->ptp_skbs_in_flight++; 631 - 632 - skb_queue_tail(&ocelot_port->tx_skbs, clone); 682 + __skb_queue_tail(&ocelot_port->tx_skbs, clone); 633 683 634 684 spin_unlock(&ocelot->ts_id_lock); 685 + 686 + dev_dbg_ratelimited(ocelot->dev, "port %d timestamp id %lu\n", port, n); 635 687 636 688 return 0; 637 689 } ··· 742 686 if (!(*clone)) 743 687 return -ENOMEM; 744 688 745 - err = ocelot_port_add_txtstamp_skb(ocelot, port, *clone); 689 + /* Store timestamp ID in OCELOT_SKB_CB(clone)->ts_id */ 690 + err = ocelot_port_queue_ptp_tx_skb(ocelot, port, *clone); 746 691 if (err) { 747 692 kfree_skb(*clone); 748 693 return err; 749 694 } 750 695 696 + skb_shinfo(*clone)->tx_flags |= SKBTX_IN_PROGRESS; 751 697 OCELOT_SKB_CB(skb)->ptp_cmd = ptp_cmd; 752 698 OCELOT_SKB_CB(*clone)->ptp_class = ptp_class; 753 699 } ··· 785 727 spin_unlock_irqrestore(&ocelot->ptp_clock_lock, flags); 786 728 } 787 729 788 - static bool ocelot_validate_ptp_skb(struct sk_buff *clone, u16 seqid) 789 - { 790 - struct ptp_header *hdr; 791 - 792 - hdr = ptp_parse_header(clone, OCELOT_SKB_CB(clone)->ptp_class); 793 - if (WARN_ON(!hdr)) 794 - return false; 795 - 796 - return seqid == ntohs(hdr->sequence_id); 797 - } 798 - 799 730 void ocelot_get_txtstamp(struct ocelot *ocelot) 800 731 { 801 732 int budget = OCELOT_PTP_QUEUE_SZ; 802 733 803 734 while (budget--) { 804 - struct sk_buff *skb, *skb_tmp, *skb_match = NULL; 805 735 struct skb_shared_hwtstamps shhwtstamps; 806 736 u32 val, id, seqid, txport; 807 - struct ocelot_port *port; 737 + struct sk_buff *skb_match; 808 738 struct timespec64 ts; 809 739 810 740 val = ocelot_read(ocelot, SYS_PTP_STATUS); ··· 808 762 txport = SYS_PTP_STATUS_PTP_MESS_TXPORT_X(val); 809 763 seqid = SYS_PTP_STATUS_PTP_MESS_SEQ_ID(val); 810 764 811 - port = ocelot->ports[txport]; 812 - 813 - spin_lock(&ocelot->ts_id_lock); 814 - port->ptp_skbs_in_flight--; 815 - ocelot->ptp_skbs_in_flight--; 816 - spin_unlock(&ocelot->ts_id_lock); 817 - 818 765 /* Retrieve its associated skb */ 819 - try_again: 820 - spin_lock(&port->tx_skbs.lock); 821 - 822 - skb_queue_walk_safe(&port->tx_skbs, skb, skb_tmp) { 823 - if (OCELOT_SKB_CB(skb)->ts_id != id) 824 - continue; 825 - __skb_unlink(skb, &port->tx_skbs); 826 - skb_match = skb; 827 - break; 828 - } 829 - 830 - spin_unlock(&port->tx_skbs.lock); 831 - 832 - if (WARN_ON(!skb_match)) 766 + skb_match = ocelot_port_dequeue_ptp_tx_skb(ocelot, txport, id, 767 + seqid); 768 + if (!skb_match) { 769 + dev_warn_ratelimited(ocelot->dev, 770 + "port %d received TX timestamp (seqid %d, ts id %u) for packet previously declared stale\n", 771 + txport, seqid, id); 833 772 goto next_ts; 834 - 835 - if (!ocelot_validate_ptp_skb(skb_match, seqid)) { 836 - dev_err_ratelimited(ocelot->dev, 837 - "port %d received stale TX timestamp for seqid %d, discarding\n", 838 - txport, seqid); 839 - kfree_skb(skb); 840 - goto try_again; 841 773 } 842 774 843 775 /* Get the h/w timestamp */
+1
include/linux/dsa/ocelot.h
··· 15 15 struct ocelot_skb_cb { 16 16 struct sk_buff *clone; 17 17 unsigned int ptp_class; /* valid only for clones */ 18 + unsigned long ptp_tx_time; /* valid only for clones */ 18 19 u32 tstamp_lo; 19 20 u8 ptp_cmd; 20 21 u8 ts_id;
-2
include/soc/mscc/ocelot.h
··· 778 778 779 779 phy_interface_t phy_mode; 780 780 781 - unsigned int ptp_skbs_in_flight; 782 781 struct sk_buff_head tx_skbs; 783 782 784 783 unsigned int trap_proto; ··· 785 786 u16 mrp_ring_id; 786 787 787 788 u8 ptp_cmd; 788 - u8 ts_id; 789 789 790 790 u8 index; 791 791