Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'improve-gbeth-performance-on-renesas-rz-g2l-and-related-socs'

Paul Barker says:

====================
Improve GbEth performance on Renesas RZ/G2L and related SoCs

This series aims to improve performance of the GbEth IP in the Renesas
RZ/G2L SoC family and the RZ/G3S SoC, which use the ravb driver. Along
the way, we do some refactoring and ensure that napi_complete_done() is
used in accordance with the NAPI documentation for both GbEth and R-Car
code paths.
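
As an illustration of the completion rule mentioned above, here is a minimal
userspace sketch in C. The mock_* names and struct mock_napi are hypothetical
stand-ins, not kernel or ravb APIs; only the final condition mirrors the
ravb_poll() change in this merge.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for driver state; not a ravb data structure. */
struct mock_napi {
	int pending;          /* packets waiting in the RX ring */
	bool irq_enabled;     /* true once the device interrupt is unmasked */
	bool scheduled_again; /* makes the completion call report "keep polling" */
};

/* Models the RX handler: process at most budget packets, return the count. */
static int mock_rx(struct mock_napi *n, int budget)
{
	int done = n->pending < budget ? n->pending : budget;

	n->pending -= done;
	return done;
}

/* Models napi_complete_done(): false means polling must continue. */
static bool mock_complete_done(struct mock_napi *n, int work_done)
{
	(void)work_done;
	return !n->scheduled_again;
}

/* Models the poll callback: returns the amount of work actually done. */
static int mock_poll(struct mock_napi *n, int budget)
{
	int work_done;

	assert(budget > 0);
	work_done = mock_rx(n, budget);

	/* Unmask interrupts only if the budget was not exhausted and
	 * completion was accepted.
	 */
	if (work_done < budget && mock_complete_done(n, work_done))
		n->irq_enabled = true;

	return work_done;
}
```

With 10 pending packets and a budget of 64, mock_poll() returns 10 and unmasks
the interrupt; with 100 pending packets it returns 64 and leaves the interrupt
masked so that polling continues.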

Much of the performance improvement comes from enabling SW IRQ
Coalescing for all SoCs using the GbEth IP, and NAPI Threaded mode for
single core SoCs using the GbEth IP. These can be enabled/disabled at
runtime via sysfs, but our goal is to set sensible defaults which get
good performance on the affected SoCs.
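
For readers who want to change these settings at runtime, the following is a
rough C sketch that writes a per-device sysfs attribute. It assumes the usual
/sys/class/net/<ifname>/ layout and the attribute names threaded,
gro_flush_timeout and napi_defer_hard_irqs; the helper names and the interface
name "eth0" used below are illustrative and are not part of the series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/class/net/<ifname>/<attr>" into buf.
 * Returns 0 on success, -1 if the result would not fit.
 */
static int netdev_attr_path(char *buf, size_t len, const char *ifname,
			    const char *attr)
{
	int n;

	assert(buf && ifname && attr);
	n = snprintf(buf, len, "/sys/class/net/%s/%s", ifname, attr);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write an integer to a per-device attribute.
 * Needs sufficient privileges on a real system; returns 0 on success.
 */
static int netdev_attr_write(const char *ifname, const char *attr, long value)
{
	char path[128];
	FILE *f;
	int ok;

	if (netdev_attr_path(path, sizeof(path), ifname, attr))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%ld\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, netdev_attr_write("eth0", "threaded", 0) would turn threaded NAPI
off for that interface, and writing 0 to both gro_flush_timeout and
napi_defer_hard_irqs should disable software IRQ coalescing again.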

The rest of the performance improvement comes from using a page pool to
allocate RX buffers, and reducing the allocation size from >8kB to 2kB.
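
The arithmetic behind the 2kB buffers can be sketched in userspace C. The
formula follows the ds_cc calculation in ravb_alloc_rx_buffer() from this
merge; the function names are hypothetical, and the 320-byte shared info size
and 64-byte cache line used in the example are assumptions that vary with
architecture and kernel configuration.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_FCS_LEN 4 /* Ethernet frame check sequence length */
#define HW_CSUM_LEN 2 /* sizeof(__sum16) in the kernel */

/* Round x up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t x, size_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Number of bytes the hardware may write into one RX buffer, leaving room
 * at the end of the buffer for the skb shared info.
 */
static size_t rx_desc_data_size(size_t rx_buffer_size, size_t shinfo_size,
				size_t cache_line)
{
	size_t tailroom = align_up(shinfo_size, cache_line);

	assert(rx_buffer_size > tailroom + ETH_FCS_LEN);
	return rx_buffer_size - tailroom - ETH_FCS_LEN + HW_CSUM_LEN;
}
```

Under those assumptions, rx_desc_data_size(2048, 320, 64) gives 1726 bytes of
usable descriptor space in each 2kB buffer.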

The overall performance impact of this patch series seen in testing with
iperf3 is as follows (see patches 5-7 for more detailed results):
* RZ/G2L:
  * TCP TX: +1.8% bandwidth
  * TCP RX: +1% bandwidth at 47% less CPU load
  * UDP RX: +1% bandwidth at 26% less CPU load

* RZ/G2UL:
  * TCP TX: +37% bandwidth
  * TCP RX: +43% bandwidth
  * UDP TX: -8% bandwidth
  * UDP RX: +32500% bandwidth (!)

* RZ/G3S:
  * TCP TX: +25% bandwidth
  * TCP RX: +76% bandwidth
  * UDP TX: -9% bandwidth
  * UDP RX: +37900% bandwidth (!)

* RZ/Five:
  * TCP TX: +18% bandwidth
  * TCP RX: +212% bandwidth
  * UDP TX: +2% bandwidth
  * UDP RX: +inf bandwidth (test no longer crashes)

There is no significant impact on bandwidth or CPU load in testing on
RZ/G2H or R-Car M3N.

Fixing the crash in UDP RX testing for RZ/Five is a cumulative effect of
patches 1, 2, 5 & 6 so this is very difficult to break out as a bugfix
for backporting.

Changes v4->v5:
* Added Sergey's Reviewed-by tags.
* Improved the commit message for patch 2/7.
* Re-wrapped to 80 cols, except where this would significantly impact
readability.
* Use lower case `skb` consistently in comments.
* Included <net/page_pool/types.h> in ravb.h.
* Moved rx_buffer_size so it is in the same place in ravb_hw_info as
rx_max_desc_use was previously.
* Used reverse xmas tree ordering in variable declarations.
* Split lines after binary operators, instead of before.
* Factor subtraction of sizeof(__sum16) out of the if condition in
ravb_rx_csum_gbeth().
* Add blank lines after variable declarations where needed.
* Used goto instead of break to handle napi_build_skb() failure in
ravb_rx_gbeth(). Break was incorrectly scoped to the surrounding
switch statement, when it's the outer loop we really want to break
out of.
* Used continue instead of break to handle NULL priv->rx_1st_skb in
ravb_rx_gbeth() as we may still be able to process further
descriptors.
* Unconditionally set priv->rx_1st_skb = NULL after processing a
packet in ravb_rx_gbeth(). We don't need to check die_dt as this
will be a no-op for single descriptor packets.
* Moved napi_build_skb() call after dma_sync_single_for_cpu() in
ravb_rx_rcar() to align the order of operations with ravb_rx_gbeth()
and ensure the data is sync'd before it is accessed.
* Moved zeroing of rx_buff->page to the end of packet processing in
ravb_rx_rcar() to align the order of operations with
ravb_rx_gbeth().
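
The break/goto item in the list above is a general C scoping point. The
following self-contained sketch (hypothetical names, not driver code) shows
that a break inside a switch only leaves the switch, so a goto is needed to
leave the surrounding loop.

```c
#include <assert.h>

enum desc_state { DESC_OK, DESC_FAIL };

/* Handles a failure with break: only the switch is exited, so every
 * descriptor is still visited.
 */
static int walk_with_break(const enum desc_state *ring, int n)
{
	int visited = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		switch (ring[i]) {
		case DESC_FAIL:
			break; /* leaves the switch, not the loop */
		default:
			break;
		}
		visited++;
	}
	return visited;
}

/* Handles a failure with goto: the loop itself is abandoned. */
static int walk_with_goto(const enum desc_state *ring, int n)
{
	int visited = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		switch (ring[i]) {
		case DESC_FAIL:
			goto out; /* leaves both the switch and the loop */
		default:
			break;
		}
		visited++;
	}
out:
	return visited;
}
```

For a ring of { DESC_OK, DESC_FAIL, DESC_OK }, walk_with_break() counts all
three entries while walk_with_goto() stops after the first.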

Changes v3->v4:
* Dependency patches have merged so this is no longer an RFC.
* Fixed update of stats->rx_packets.
* Simplified refactoring following feedback from Niklas and Sergey.
* Renamed needs_irq_coalesce -> coalesce_irqs.
* Used a separate page pool for each RX queue.
* Passed struct ravb_rx_desc to ravb_alloc_rx_buffer() so that we can
simplify the calling function.
* Explained the calculation of rx_desc->ds_cc.
* Added handling of nonlinear SKBs in ravb_rx_csum_gbeth().
* Used Niklas' suggested commit message for patch 2/7.
* Added Sergey's Reviewed-by tags to patches 5/7 and 6/7.
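
The nonlinear SKB handling listed above comes down to reading the hardware
checksum words from the end of the last fragment rather than from the linear
tail. A simplified userspace model is sketched below; struct frag and the
function names are hypothetical, and, like the driver code in this merge, it
assumes the four trailing bytes sit entirely within the last fragment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one RX fragment. */
struct frag {
	const uint8_t *addr;
	size_t size;
};

/* Read an unaligned little-endian 16-bit value. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read the two trailing 16-bit checksum words from the last fragment,
 * then trim them from the fragment length.
 */
static void read_hw_csum(struct frag *last, uint16_t *csum_ip_hdr,
			 uint16_t *csum_proto)
{
	const uint8_t *hw_csum;

	assert(last->size >= 4);
	hw_csum = last->addr + last->size;

	hw_csum -= 2;
	*csum_proto = get_le16(hw_csum);

	hw_csum -= 2;
	*csum_ip_hdr = get_le16(hw_csum);

	last->size -= 4;
}
```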

Changes v2->v3:
* Incorporated feedback on RFC v2 from Sergey.
* Split out bugfixes and rebased. This changed the order of what was
the first 5 patches of v2 and things look a little different so I've
not picked up Reviewed-by tags from v2.
* Further refactoring and tidy up of RX ring refill and
ravb_rx_gbeth().
* Switched to using a page pool to allocate RX buffers.
* Re-tested and provided updated performance figures.
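
The refactored RX ring refill can be modelled in a few lines of userspace C.
This is only a sketch of the control flow in ravb_rx_ring_refill(): the
struct, the mock allocator and the fixed ring size are assumptions made for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8

/* Hypothetical model of one RX queue. */
struct mock_ring {
	bool has_buffer[RING_SIZE]; /* a buffer is attached to this entry */
	bool ready[RING_SIZE];      /* entry has been handed back to hardware */
	uint32_t cur_rx;            /* free-running count of processed entries */
	uint32_t dirty_rx;          /* free-running count of refilled entries */
	int alloc_credit;           /* allocations that succeed before failing */
};

/* Mock allocator: succeeds while credit remains. */
static bool mock_alloc(struct mock_ring *r, uint32_t entry)
{
	if (r->alloc_credit <= 0)
		return false;
	r->alloc_credit--;
	r->has_buffer[entry] = true;
	return true;
}

/* Refill up to count entries starting at dirty_rx. Stops at the first
 * allocation failure and returns how many entries were made ready.
 */
static uint32_t mock_refill(struct mock_ring *r, uint32_t count)
{
	uint32_t i, entry;

	assert(count <= RING_SIZE);
	for (i = 0; i < count; i++) {
		entry = (r->dirty_rx + i) % RING_SIZE;
		if (!r->has_buffer[entry] && !mock_alloc(r, entry))
			break;
		r->ready[entry] = true;
	}
	return i;
}
```

The caller advances dirty_rx by the returned count, as the RX handlers in this
merge do, so a failed allocation is simply retried on the next poll.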

Changes v1->v2:
* Marked as RFC as the series depends on unmerged patches.
* Refactored R-Car code paths as well as GbEth code paths.
* Updated references to the patches this series depends on.
====================

Link: https://lore.kernel.org/r/20240604072825.7490-1-paul.barker.ct@bp.renesas.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

+271 -222
+11 -3
drivers/net/ethernet/renesas/ravb.h
···
 #include <linux/phy.h>
 #include <linux/platform_device.h>
 #include <linux/ptp_clock_kernel.h>
+#include <net/page_pool/types.h>

 #define BE_TX_RING_SIZE	64	/* TX ring size for Best Effort */
 #define BE_RX_RING_SIZE	1024	/* RX ring size for Best Effort */
···
 };

 struct ravb_hw_info {
-	bool (*receive)(struct net_device *ndev, int *quota, int q);
+	int (*receive)(struct net_device *ndev, int budget, int q);
 	void (*set_rate)(struct net_device *ndev);
 	int (*set_feature)(struct net_device *ndev, netdev_features_t features);
 	int (*dmac_init)(struct net_device *ndev);
···
 	int stats_len;
 	u32 tccr_mask;
 	u32 rx_max_frame_size;
-	u32 rx_max_desc_use;
+	u32 rx_buffer_size;
 	u32 rx_desc_size;
 	unsigned aligned_tx: 1;
+	unsigned coalesce_irqs:1;	/* Needs software IRQ coalescing */

 	/* hardware features */
 	unsigned internal_delay:1;	/* AVB-DMAC has internal delays */
···
 	unsigned nc_queues:1;		/* AVB-DMAC has RX and TX NC queues */
 	unsigned magic_pkt:1;		/* E-MAC supports magic packet detection */
 	unsigned half_duplex:1;		/* E-MAC supports half duplex mode */
+};
+
+struct ravb_rx_buffer {
+	struct page *page;
+	unsigned int offset;
 };

 struct ravb_private {
···
 	struct ravb_tx_desc *tx_ring[NUM_TX_QUEUE];
 	void *tx_align[NUM_TX_QUEUE];
 	struct sk_buff *rx_1st_skb;
-	struct sk_buff **rx_skb[NUM_RX_QUEUE];
+	struct page_pool *rx_pool[NUM_RX_QUEUE];
+	struct ravb_rx_buffer *rx_buffers[NUM_RX_QUEUE];
 	struct sk_buff **tx_skb[NUM_TX_QUEUE];
 	u32 rx_over_errors;
 	u32 rx_fifo_errors;
+260 -219
drivers/net/ethernet/renesas/ravb_main.c
···
 #include <linux/reset.h>
 #include <linux/math64.h>
 #include <net/ip.h>
+#include <net/page_pool/helpers.h>

 #include "ravb.h"

···
 		ravb_write(ndev, GECMR_SPEED_1000, GECMR);
 		break;
 	}
-}
-
-static struct sk_buff *
-ravb_alloc_skb(struct net_device *ndev, const struct ravb_hw_info *info,
-	       gfp_t gfp_mask)
-{
-	struct sk_buff *skb;
-	u32 reserve;
-
-	skb = __netdev_alloc_skb(ndev, info->rx_max_frame_size + RAVB_ALIGN - 1,
-				 gfp_mask);
-	if (!skb)
-		return NULL;
-
-	reserve = (unsigned long)skb->data & (RAVB_ALIGN - 1);
-	if (reserve)
-		skb_reserve(skb, RAVB_ALIGN - reserve);
-
-	return skb;
 }

 /* Get MAC address from the MAC address registers
···
 {
 	struct ravb_private *priv = netdev_priv(ndev);
 	unsigned int ring_size;
-	unsigned int i;

 	if (!priv->rx_ring[q].raw)
 		return;

-	for (i = 0; i < priv->num_rx_ring[q]; i++) {
-		struct ravb_rx_desc *desc = ravb_rx_get_desc(priv, q, i);
-
-		if (!dma_mapping_error(ndev->dev.parent,
-				       le32_to_cpu(desc->dptr)))
-			dma_unmap_single(ndev->dev.parent,
-					 le32_to_cpu(desc->dptr),
-					 priv->info->rx_max_frame_size,
-					 DMA_FROM_DEVICE);
-	}
 	ring_size = priv->info->rx_desc_size * (priv->num_rx_ring[q] + 1);
 	dma_free_coherent(ndev->dev.parent, ring_size, priv->rx_ring[q].raw,
 			  priv->rx_desc_dma[q]);
···
 		priv->tx_ring[q] = NULL;
 	}

-	/* Free RX skb ringbuffer */
-	if (priv->rx_skb[q]) {
-		for (i = 0; i < priv->num_rx_ring[q]; i++)
-			dev_kfree_skb(priv->rx_skb[q][i]);
+	/* Free RX buffers */
+	for (i = 0; i < priv->num_rx_ring[q]; i++) {
+		if (priv->rx_buffers[q][i].page)
+			page_pool_put_page(priv->rx_pool[q],
+					   priv->rx_buffers[q][i].page,
+					   0, true);
 	}
-	kfree(priv->rx_skb[q]);
-	priv->rx_skb[q] = NULL;
+	kfree(priv->rx_buffers[q]);
+	priv->rx_buffers[q] = NULL;
+	page_pool_destroy(priv->rx_pool[q]);

 	/* Free aligned TX buffers */
 	kfree(priv->tx_align[q]);
···
 	priv->tx_skb[q] = NULL;
 }

-static void ravb_rx_ring_format(struct net_device *ndev, int q)
+static int
+ravb_alloc_rx_buffer(struct net_device *ndev, int q, u32 entry, gfp_t gfp_mask,
+		     struct ravb_rx_desc *rx_desc)
 {
 	struct ravb_private *priv = netdev_priv(ndev);
-	struct ravb_rx_desc *rx_desc;
-	unsigned int rx_ring_size;
+	const struct ravb_hw_info *info = priv->info;
+	struct ravb_rx_buffer *rx_buff;
 	dma_addr_t dma_addr;
-	unsigned int i;
+	unsigned int size;

-	rx_ring_size = priv->info->rx_desc_size * priv->num_rx_ring[q];
-	memset(priv->rx_ring[q].raw, 0, rx_ring_size);
-	/* Build RX ring buffer */
-	for (i = 0; i < priv->num_rx_ring[q]; i++) {
-		/* RX descriptor */
-		rx_desc = ravb_rx_get_desc(priv, q, i);
-		rx_desc->ds_cc = cpu_to_le16(priv->info->rx_max_desc_use);
-		dma_addr = dma_map_single(ndev->dev.parent, priv->rx_skb[q][i]->data,
-					  priv->info->rx_max_frame_size,
-					  DMA_FROM_DEVICE);
+	rx_buff = &priv->rx_buffers[q][entry];
+	size = info->rx_buffer_size;
+	rx_buff->page = page_pool_alloc(priv->rx_pool[q], &rx_buff->offset,
+					&size, gfp_mask);
+	if (unlikely(!rx_buff->page)) {
 		/* We just set the data size to 0 for a failed mapping which
 		 * should prevent DMA from happening...
 		 */
-		if (dma_mapping_error(ndev->dev.parent, dma_addr))
-			rx_desc->ds_cc = cpu_to_le16(0);
-		rx_desc->dptr = cpu_to_le32(dma_addr);
+		rx_desc->ds_cc = cpu_to_le16(0);
+		return -ENOMEM;
+	}
+
+	dma_addr = page_pool_get_dma_addr(rx_buff->page) + rx_buff->offset;
+	dma_sync_single_for_device(ndev->dev.parent, dma_addr,
+				   info->rx_buffer_size, DMA_FROM_DEVICE);
+	rx_desc->dptr = cpu_to_le32(dma_addr);
+
+	/* The end of the RX buffer is used to store skb shared data, so we need
+	 * to ensure that the hardware leaves enough space for this.
+	 */
+	rx_desc->ds_cc = cpu_to_le16(info->rx_buffer_size -
+				     SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) -
+				     ETH_FCS_LEN + sizeof(__sum16));
+	return 0;
+}
+
+static u32
+ravb_rx_ring_refill(struct net_device *ndev, int q, u32 count, gfp_t gfp_mask)
+{
+	struct ravb_private *priv = netdev_priv(ndev);
+	struct ravb_rx_desc *rx_desc;
+	u32 i, entry;
+
+	for (i = 0; i < count; i++) {
+		entry = (priv->dirty_rx[q] + i) % priv->num_rx_ring[q];
+		rx_desc = ravb_rx_get_desc(priv, q, entry);
+
+		if (!priv->rx_buffers[q][entry].page) {
+			if (unlikely(ravb_alloc_rx_buffer(ndev, q, entry,
+							  gfp_mask, rx_desc)))
+				break;
+		}
+		/* Descriptor type must be set after all the above writes */
+		dma_wmb();
 		rx_desc->die_dt = DT_FEMPTY;
 	}
-	rx_desc = ravb_rx_get_desc(priv, q, i);
-	rx_desc->dptr = cpu_to_le32((u32)priv->rx_desc_dma[q]);
-	rx_desc->die_dt = DT_LINKFIX; /* type */
+
+	return i;
 }

 /* Format skb and descriptor buffer for Ethernet AVB */
···
 {
 	struct ravb_private *priv = netdev_priv(ndev);
 	unsigned int num_tx_desc = priv->num_tx_desc;
+	struct ravb_rx_desc *rx_desc;
 	struct ravb_tx_desc *tx_desc;
 	struct ravb_desc *desc;
 	unsigned int tx_ring_size = sizeof(*tx_desc) * priv->num_tx_ring[q] *
···
 	priv->dirty_rx[q] = 0;
 	priv->dirty_tx[q] = 0;

-	ravb_rx_ring_format(ndev, q);
+	/* Regular RX descriptors have already been initialized by
+	 * ravb_rx_ring_refill(), we just need to initialize the final link
+	 * descriptor.
+	 */
+	rx_desc = ravb_rx_get_desc(priv, q, priv->num_rx_ring[q]);
+	rx_desc->dptr = cpu_to_le32((u32)priv->rx_desc_dma[q]);
+	rx_desc->die_dt = DT_LINKFIX; /* type */

 	memset(priv->tx_ring[q], 0, tx_ring_size);
 	/* Build TX ring buffer */
···
 static int ravb_ring_init(struct net_device *ndev, int q)
 {
 	struct ravb_private *priv = netdev_priv(ndev);
-	const struct ravb_hw_info *info = priv->info;
 	unsigned int num_tx_desc = priv->num_tx_desc;
+	struct page_pool_params params = {
+		.order = 0,
+		.flags = PP_FLAG_DMA_MAP,
+		.pool_size = priv->num_rx_ring[q],
+		.nid = NUMA_NO_NODE,
+		.dev = ndev->dev.parent,
+		.dma_dir = DMA_FROM_DEVICE,
+	};
 	unsigned int ring_size;
-	struct sk_buff *skb;
-	unsigned int i;
+	u32 num_filled;

-	/* Allocate RX and TX skb rings */
-	priv->rx_skb[q] = kcalloc(priv->num_rx_ring[q],
-				  sizeof(*priv->rx_skb[q]), GFP_KERNEL);
-	priv->tx_skb[q] = kcalloc(priv->num_tx_ring[q],
-				  sizeof(*priv->tx_skb[q]), GFP_KERNEL);
-	if (!priv->rx_skb[q] || !priv->tx_skb[q])
+	/* Allocate RX page pool and buffers */
+	priv->rx_pool[q] = page_pool_create(&params);
+	if (IS_ERR(priv->rx_pool[q]))
 		goto error;

-	for (i = 0; i < priv->num_rx_ring[q]; i++) {
-		skb = ravb_alloc_skb(ndev, info, GFP_KERNEL);
-		if (!skb)
-			goto error;
-		priv->rx_skb[q][i] = skb;
-	}
+	/* Allocate RX buffers */
+	priv->rx_buffers[q] = kcalloc(priv->num_rx_ring[q],
+				      sizeof(*priv->rx_buffers[q]), GFP_KERNEL);
+	if (!priv->rx_buffers[q])
+		goto error;
+
+	/* Allocate TX skb rings */
+	priv->tx_skb[q] = kcalloc(priv->num_tx_ring[q],
+				  sizeof(*priv->tx_skb[q]), GFP_KERNEL);
+	if (!priv->tx_skb[q])
+		goto error;
+
+	/* Allocate all RX descriptors. */
+	if (!ravb_alloc_rx_desc(ndev, q))
+		goto error;
+
+	/* Populate RX ring buffer. */
+	priv->dirty_rx[q] = 0;
+	ring_size = priv->info->rx_desc_size * priv->num_rx_ring[q];
+	memset(priv->rx_ring[q].raw, 0, ring_size);
+	num_filled = ravb_rx_ring_refill(ndev, q, priv->num_rx_ring[q],
+					 GFP_KERNEL);
+	if (num_filled != priv->num_rx_ring[q])
+		goto error;

 	if (num_tx_desc > 1) {
 		/* Allocate rings for the aligned buffers */
···
 		if (!priv->tx_align[q])
 			goto error;
 	}
-
-	/* Allocate all RX descriptors. */
-	if (!ravb_alloc_rx_desc(ndev, q))
-		goto error;
-
-	priv->dirty_rx[q] = 0;

 	/* Allocate all TX descriptors. */
 	ring_size = sizeof(struct ravb_tx_desc) *
···

 static void ravb_rx_csum_gbeth(struct sk_buff *skb)
 {
+	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	__wsum csum_ip_hdr, csum_proto;
+	skb_frag_t *last_frag;
 	u8 *hw_csum;

 	/* The hardware checksum status is contained in sizeof(__sum16) * 2 = 4
···
 	if (unlikely(skb->len < sizeof(__sum16) * 2))
 		return;

-	hw_csum = skb_tail_pointer(skb) - sizeof(__sum16);
+	if (skb_is_nonlinear(skb)) {
+		last_frag = &shinfo->frags[shinfo->nr_frags - 1];
+		hw_csum = skb_frag_address(last_frag) +
+			  skb_frag_size(last_frag);
+	} else {
+		hw_csum = skb_tail_pointer(skb);
+	}
+
+	hw_csum -= sizeof(__sum16);
 	csum_proto = csum_unfold((__force __sum16)get_unaligned_le16(hw_csum));

 	hw_csum -= sizeof(__sum16);
 	csum_ip_hdr = csum_unfold((__force __sum16)get_unaligned_le16(hw_csum));
-	skb_trim(skb, skb->len - 2 * sizeof(__sum16));
+
+	if (skb_is_nonlinear(skb))
+		skb_frag_size_sub(last_frag, 2 * sizeof(__sum16));
+	else
+		skb_trim(skb, skb->len - 2 * sizeof(__sum16));

 	/* TODO: IPV6 Rx checksum */
 	if (skb->protocol == htons(ETH_P_IP) && !csum_ip_hdr && !csum_proto)
···
 	skb_trim(skb, skb->len - sizeof(__sum16));
 }

-static struct sk_buff *ravb_get_skb_gbeth(struct net_device *ndev, int entry,
-					  struct ravb_rx_desc *desc)
-{
-	struct ravb_private *priv = netdev_priv(ndev);
-	struct sk_buff *skb;
-
-	skb = priv->rx_skb[RAVB_BE][entry];
-	priv->rx_skb[RAVB_BE][entry] = NULL;
-	dma_unmap_single(ndev->dev.parent, le32_to_cpu(desc->dptr),
-			 ALIGN(priv->info->rx_max_frame_size, 16),
-			 DMA_FROM_DEVICE);
-
-	return skb;
-}
-
 /* Packet receive function for Gigabit Ethernet */
-static bool ravb_rx_gbeth(struct net_device *ndev, int *quota, int q)
+static int ravb_rx_gbeth(struct net_device *ndev, int budget, int q)
 {
 	struct ravb_private *priv = netdev_priv(ndev);
 	const struct ravb_hw_info *info = priv->info;
 	struct net_device_stats *stats;
 	struct ravb_rx_desc *desc;
 	struct sk_buff *skb;
-	dma_addr_t dma_addr;
 	int rx_packets = 0;
 	u8 desc_status;
 	u16 desc_len;
···
 	for (i = 0; i < limit; i++, priv->cur_rx[q]++) {
 		entry = priv->cur_rx[q] % priv->num_rx_ring[q];
 		desc = &priv->rx_ring[q].desc[entry];
-		if (rx_packets == *quota || desc->die_dt == DT_FEMPTY)
+		if (rx_packets == budget || desc->die_dt == DT_FEMPTY)
 			break;

 		/* Descriptor type must be checked before all other reads */
···
 			if (desc_status & MSC_CEEF)
 				stats->rx_missed_errors++;
 		} else {
+			struct ravb_rx_buffer *rx_buff;
+			void *rx_addr;
+
+			rx_buff = &priv->rx_buffers[q][entry];
+			rx_addr = page_address(rx_buff->page) + rx_buff->offset;
 			die_dt = desc->die_dt & 0xF0;
+			dma_sync_single_for_cpu(ndev->dev.parent,
+						le32_to_cpu(desc->dptr),
+						desc_len, DMA_FROM_DEVICE);
+
 			switch (die_dt) {
 			case DT_FSINGLE:
-				skb = ravb_get_skb_gbeth(ndev, entry, desc);
+			case DT_FSTART:
+				/* Start of packet: Set initial data length. */
+				skb = napi_build_skb(rx_addr,
+						     info->rx_buffer_size);
+				if (unlikely(!skb)) {
+					stats->rx_errors++;
+					page_pool_put_page(priv->rx_pool[q],
+							   rx_buff->page, 0,
+							   true);
+					goto refill;
+				}
+				skb_mark_for_recycle(skb);
 				skb_put(skb, desc_len);
+
+				/* Save this skb if the packet spans multiple
+				 * descriptors.
+				 */
+				if (die_dt == DT_FSTART)
+					priv->rx_1st_skb = skb;
+				break;
+
+			case DT_FMID:
+			case DT_FEND:
+				/* Continuing a packet: Add this buffer as an RX
+				 * frag.
+				 */
+
+				/* rx_1st_skb will be NULL if napi_build_skb()
+				 * failed for the first descriptor of a
+				 * multi-descriptor packet.
+				 */
+				if (unlikely(!priv->rx_1st_skb)) {
+					stats->rx_errors++;
+					page_pool_put_page(priv->rx_pool[q],
+							   rx_buff->page, 0,
+							   true);
+
+					/* We may find a DT_FSINGLE or DT_FSTART
+					 * descriptor in the queue which we can
+					 * process, so don't give up yet.
+					 */
+					continue;
+				}
+				skb_add_rx_frag(priv->rx_1st_skb,
+						skb_shinfo(priv->rx_1st_skb)->nr_frags,
+						rx_buff->page, rx_buff->offset,
+						desc_len, info->rx_buffer_size);
+
+				/* Set skb to point at the whole packet so that
+				 * we only need one code path for finishing a
+				 * packet.
+				 */
+				skb = priv->rx_1st_skb;
+			}
+
+			switch (die_dt) {
+			case DT_FSINGLE:
+			case DT_FEND:
+				/* Finishing a packet: Determine protocol &
+				 * checksum, hand off to NAPI and update our
+				 * stats.
+				 */
 				skb->protocol = eth_type_trans(skb, ndev);
 				if (ndev->features & NETIF_F_RXCSUM)
 					ravb_rx_csum_gbeth(skb);
+				stats->rx_bytes += skb->len;
 				napi_gro_receive(&priv->napi[q], skb);
 				rx_packets++;
-				stats->rx_bytes += desc_len;
-				break;
-			case DT_FSTART:
-				priv->rx_1st_skb = ravb_get_skb_gbeth(ndev, entry, desc);
-				skb_put(priv->rx_1st_skb, desc_len);
-				break;
-			case DT_FMID:
-				skb = ravb_get_skb_gbeth(ndev, entry, desc);
-				skb_copy_to_linear_data_offset(priv->rx_1st_skb,
-							       priv->rx_1st_skb->len,
-							       skb->data,
-							       desc_len);
-				skb_put(priv->rx_1st_skb, desc_len);
-				dev_kfree_skb(skb);
-				break;
-			case DT_FEND:
-				skb = ravb_get_skb_gbeth(ndev, entry, desc);
-				skb_copy_to_linear_data_offset(priv->rx_1st_skb,
-							       priv->rx_1st_skb->len,
-							       skb->data,
-							       desc_len);
-				skb_put(priv->rx_1st_skb, desc_len);
-				dev_kfree_skb(skb);
-				priv->rx_1st_skb->protocol =
-					eth_type_trans(priv->rx_1st_skb, ndev);
-				if (ndev->features & NETIF_F_RXCSUM)
-					ravb_rx_csum_gbeth(priv->rx_1st_skb);
-				stats->rx_bytes += priv->rx_1st_skb->len;
-				napi_gro_receive(&priv->napi[q],
-						 priv->rx_1st_skb);
-				rx_packets++;
-				break;
+
+				/* Clear rx_1st_skb so that it will only be
+				 * non-NULL when valid.
+				 */
+				priv->rx_1st_skb = NULL;
 			}
+
+			/* Mark this RX buffer as consumed. */
+			rx_buff->page = NULL;
 		}
 	}

+refill:
 	/* Refill the RX ring buffers. */
-	for (; priv->cur_rx[q] - priv->dirty_rx[q] > 0; priv->dirty_rx[q]++) {
-		entry = priv->dirty_rx[q] % priv->num_rx_ring[q];
-		desc = &priv->rx_ring[q].desc[entry];
-		desc->ds_cc = cpu_to_le16(priv->info->rx_max_desc_use);
-
-		if (!priv->rx_skb[q][entry]) {
-			skb = ravb_alloc_skb(ndev, info, GFP_ATOMIC);
-			if (!skb)
-				break;
-			dma_addr = dma_map_single(ndev->dev.parent,
-						  skb->data,
-						  priv->info->rx_max_frame_size,
-						  DMA_FROM_DEVICE);
-			skb_checksum_none_assert(skb);
-			/* We just set the data size to 0 for a failed mapping
-			 * which should prevent DMA from happening...
-			 */
-			if (dma_mapping_error(ndev->dev.parent, dma_addr))
-				desc->ds_cc = cpu_to_le16(0);
-			desc->dptr = cpu_to_le32(dma_addr);
-			priv->rx_skb[q][entry] = skb;
-		}
-		/* Descriptor type must be set after all the above writes */
-		dma_wmb();
-		desc->die_dt = DT_FEMPTY;
-	}
+	priv->dirty_rx[q] += ravb_rx_ring_refill(ndev, q,
+						 priv->cur_rx[q] - priv->dirty_rx[q],
+						 GFP_ATOMIC);

 	stats->rx_packets += rx_packets;
-	*quota -= rx_packets;
-	return *quota == 0;
+	return rx_packets;
 }

 /* Packet receive function for Ethernet AVB */
-static bool ravb_rx_rcar(struct net_device *ndev, int *quota, int q)
+static int ravb_rx_rcar(struct net_device *ndev, int budget, int q)
 {
 	struct ravb_private *priv = netdev_priv(ndev);
 	const struct ravb_hw_info *info = priv->info;
···
 	struct ravb_ex_rx_desc *desc;
 	unsigned int limit, i;
 	struct sk_buff *skb;
-	dma_addr_t dma_addr;
 	struct timespec64 ts;
 	int rx_packets = 0;
 	u8 desc_status;
···
 	for (i = 0; i < limit; i++, priv->cur_rx[q]++) {
 		entry = priv->cur_rx[q] % priv->num_rx_ring[q];
 		desc = &priv->rx_ring[q].ex_desc[entry];
-		if (rx_packets == *quota || desc->die_dt == DT_FEMPTY)
+		if (rx_packets == budget || desc->die_dt == DT_FEMPTY)
 			break;

 		/* Descriptor type must be checked before all other reads */
···
 				stats->rx_missed_errors++;
 		} else {
 			u32 get_ts = priv->tstamp_rx_ctrl & RAVB_RXTSTAMP_TYPE;
+			struct ravb_rx_buffer *rx_buff;
+			void *rx_addr;

-			skb = priv->rx_skb[q][entry];
-			priv->rx_skb[q][entry] = NULL;
-			dma_unmap_single(ndev->dev.parent, le32_to_cpu(desc->dptr),
-					 priv->info->rx_max_frame_size,
-					 DMA_FROM_DEVICE);
+			rx_buff = &priv->rx_buffers[q][entry];
+			rx_addr = page_address(rx_buff->page) + rx_buff->offset;
+			dma_sync_single_for_cpu(ndev->dev.parent,
+						le32_to_cpu(desc->dptr),
+						pkt_len, DMA_FROM_DEVICE);
+
+			skb = napi_build_skb(rx_addr, info->rx_buffer_size);
+			if (unlikely(!skb)) {
+				stats->rx_errors++;
+				page_pool_put_page(priv->rx_pool[q],
+						   rx_buff->page, 0, true);
+				break;
+			}
+			skb_mark_for_recycle(skb);
 			get_ts &= (q == RAVB_NC) ?
 					RAVB_RXTSTAMP_TYPE_V2_L2_EVENT :
 					~RAVB_RXTSTAMP_TYPE_V2_L2_EVENT;
···
 			napi_gro_receive(&priv->napi[q], skb);
 			rx_packets++;
 			stats->rx_bytes += pkt_len;
+
+			/* Mark this RX buffer as consumed. */
+			rx_buff->page = NULL;
 		}
 	}

 	/* Refill the RX ring buffers. */
-	for (; priv->cur_rx[q] - priv->dirty_rx[q] > 0; priv->dirty_rx[q]++) {
-		entry = priv->dirty_rx[q] % priv->num_rx_ring[q];
-		desc = &priv->rx_ring[q].ex_desc[entry];
-		desc->ds_cc = cpu_to_le16(priv->info->rx_max_desc_use);
-
-		if (!priv->rx_skb[q][entry]) {
-			skb = ravb_alloc_skb(ndev, info, GFP_ATOMIC);
-			if (!skb)
-				break;	/* Better luck next round. */
-			dma_addr = dma_map_single(ndev->dev.parent, skb->data,
-						  priv->info->rx_max_frame_size,
-						  DMA_FROM_DEVICE);
-			skb_checksum_none_assert(skb);
-			/* We just set the data size to 0 for a failed mapping
-			 * which should prevent DMA from happening...
-			 */
-			if (dma_mapping_error(ndev->dev.parent, dma_addr))
-				desc->ds_cc = cpu_to_le16(0);
-			desc->dptr = cpu_to_le32(dma_addr);
-			priv->rx_skb[q][entry] = skb;
-		}
-		/* Descriptor type must be set after all the above writes */
-		dma_wmb();
-		desc->die_dt = DT_FEMPTY;
-	}
+	priv->dirty_rx[q] += ravb_rx_ring_refill(ndev, q,
+						 priv->cur_rx[q] - priv->dirty_rx[q],
+						 GFP_ATOMIC);

 	stats->rx_packets += rx_packets;
-	*quota -= rx_packets;
-	return *quota == 0;
+	return rx_packets;
 }

 /* Packet receive function for Ethernet AVB */
-static bool ravb_rx(struct net_device *ndev, int *quota, int q)
+static int ravb_rx(struct net_device *ndev, int budget, int q)
 {
 	struct ravb_private *priv = netdev_priv(ndev);
 	const struct ravb_hw_info *info = priv->info;

-	return info->receive(ndev, quota, q);
+	return info->receive(ndev, budget, q);
 }

 static void ravb_rcv_snd_disable(struct net_device *ndev)
···
 	unsigned long flags;
 	int q = napi - priv->napi;
 	int mask = BIT(q);
-	int quota = budget;
-	bool unmask;
+	int work_done;

 	/* Processing RX Descriptor Ring */
 	/* Clear RX interrupt */
 	ravb_write(ndev, ~(mask | RIS0_RESERVED), RIS0);
-	unmask = !ravb_rx(ndev, &quota, q);
+	work_done = ravb_rx(ndev, budget, q);

 	/* Processing TX Descriptor Ring */
 	spin_lock_irqsave(&priv->lock, flags);
···
 	if (priv->rx_fifo_errors != ndev->stats.rx_fifo_errors)
 		ndev->stats.rx_fifo_errors = priv->rx_fifo_errors;

-	if (!unmask)
-		goto out;
-
-	napi_complete(napi);
-
-	/* Re-enable RX/TX interrupts */
-	spin_lock_irqsave(&priv->lock, flags);
-	if (!info->irq_en_dis) {
-		ravb_modify(ndev, RIC0, mask, mask);
-		ravb_modify(ndev, TIC, mask, mask);
-	} else {
-		ravb_write(ndev, mask, RIE0);
-		ravb_write(ndev, mask, TIE);
+	if (work_done < budget && napi_complete_done(napi, work_done)) {
+		/* Re-enable RX/TX interrupts */
+		spin_lock_irqsave(&priv->lock, flags);
+		if (!info->irq_en_dis) {
+			ravb_modify(ndev, RIC0, mask, mask);
+			ravb_modify(ndev, TIC, mask, mask);
+		} else {
+			ravb_write(ndev, mask, RIE0);
+			ravb_write(ndev, mask, TIE);
+		}
+		spin_unlock_irqrestore(&priv->lock, flags);
 	}
-	spin_unlock_irqrestore(&priv->lock, flags);

-out:
-	return budget - quota;
+	return work_done;
 }

 static void ravb_set_duplex_gbeth(struct net_device *ndev)
···
 	.stats_len = ARRAY_SIZE(ravb_gstrings_stats),
 	.tccr_mask = TCCR_TSRQ0 | TCCR_TSRQ1 | TCCR_TSRQ2 | TCCR_TSRQ3,
 	.rx_max_frame_size = SZ_2K,
-	.rx_max_desc_use = SZ_2K - ETH_FCS_LEN + sizeof(__sum16),
+	.rx_buffer_size = SZ_2K +
+			  SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
 	.rx_desc_size = sizeof(struct ravb_ex_rx_desc),
 	.internal_delay = 1,
 	.tx_counters = 1,
···
 	.stats_len = ARRAY_SIZE(ravb_gstrings_stats),
 	.tccr_mask = TCCR_TSRQ0 | TCCR_TSRQ1 | TCCR_TSRQ2 | TCCR_TSRQ3,
 	.rx_max_frame_size = SZ_2K,
-	.rx_max_desc_use = SZ_2K - ETH_FCS_LEN + sizeof(__sum16),
+	.rx_buffer_size = SZ_2K +
+			  SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
 	.rx_desc_size = sizeof(struct ravb_ex_rx_desc),
 	.aligned_tx = 1,
 	.gptp = 1,
···
 	.stats_len = ARRAY_SIZE(ravb_gstrings_stats),
 	.tccr_mask = TCCR_TSRQ0 | TCCR_TSRQ1 | TCCR_TSRQ2 | TCCR_TSRQ3,
 	.rx_max_frame_size = SZ_2K,
-	.rx_max_desc_use = SZ_2K - ETH_FCS_LEN + sizeof(__sum16),
+	.rx_buffer_size = SZ_2K +
+			  SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
 	.rx_desc_size = sizeof(struct ravb_ex_rx_desc),
 	.multi_irqs = 1,
 	.err_mgmt_irqs = 1,
···
 	.stats_len = ARRAY_SIZE(ravb_gstrings_stats_gbeth),
 	.tccr_mask = TCCR_TSRQ0,
 	.rx_max_frame_size = SZ_8K,
-	.rx_max_desc_use = 4080,
+	.rx_buffer_size = SZ_2K,
 	.rx_desc_size = sizeof(struct ravb_rx_desc),
 	.aligned_tx = 1,
+	.coalesce_irqs = 1,
 	.tx_counters = 1,
 	.carrier_counters = 1,
 	.half_duplex = 1,
···
 	netif_napi_add(ndev, &priv->napi[RAVB_BE], ravb_poll);
 	if (info->nc_queues)
 		netif_napi_add(ndev, &priv->napi[RAVB_NC], ravb_poll);
+
+	if (info->coalesce_irqs) {
+		netdev_sw_irq_coalesce_default_on(ndev);
+		if (num_present_cpus() == 1)
+			dev_set_threaded(ndev, true);
+	}

 	/* Network device register */
 	error = register_netdev(ndev);