Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'intel-wired-lan-driver-updates-2025-01-06-igb-igc-ixgbe-ixgbevf-i40e-fm10k'

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-01-06 (igb, igc, ixgbe, ixgbevf, i40e, fm10k)

For igb:

Sriram Yagnaraman and Kurt Kanzenbach add support for AF_XDP
zero-copy.

Original cover letter:
The first couple of patches add helper functions to prepare for AF_XDP
zero-copy support, which comes in the last couple of patches, one each
for the Rx and Tx paths.

As mentioned in the v1 patchset [0], I don't have access to an actual IGB
device to provide correct performance numbers. I have used the Intel 82576EB
emulator in QEMU [1] to test the changes to the IGB driver.

The tests use one isolated vCPU for RX/TX and one isolated vCPU for the
xdp-sock application [2]. I hope these measurements provide at least
some indication of the increase in performance when using ZC, especially
in the TX path. It would be awesome if someone with a real IGB NIC could
test the patch.

AF_XDP performance using 64 byte packets in Kpps.
Benchmark:      XDP-SKB         XDP-DRV         XDP-DRV(ZC)
rxdrop          220             235             350
txpush          1.000           1.000           410
l2fwd           1.000           1.000           200

AF_XDP performance using 1500 byte packets in Kpps.
Benchmark:      XDP-SKB         XDP-DRV         XDP-DRV(ZC)
rxdrop          200             210             310
txpush          1.000           1.000           410
l2fwd           0.900           1.000           160

[0]: https://lore.kernel.org/intel-wired-lan/20230704095915.9750-1-sriram.yagnaraman@est.tech/
[1]: https://www.qemu.org/docs/master/system/devices/igb.html
[2]: https://github.com/xdp-project/bpf-examples/tree/master/AF_XDP-example

Subsequent changes and information can be found here:
https://lore.kernel.org/intel-wired-lan/20241018-b4-igb_zero_copy-v9-0-da139d78d796@linutronix.de/

Yue Haibing converts use of ERR_PTR return to traditional error code
which resolves a smatch warning.

For igc:

Song Yoong Siang allows for the XDP program to be hot-swapped.

Yue Haibing converts use of ERR_PTR return to traditional error code
which resolves a smatch warning.

Joe Damato links IRQs and queues to NAPI instances to allow for
reporting via the netdev-genl API.

For ixgbe:

Yue Haibing converts use of ERR_PTR return to traditional error code
which resolves a smatch warning.

For ixgbevf:

Yue Haibing converts use of ERR_PTR return to traditional error code
which resolves a smatch warning.

For i40e:

Alex implements "mdd-auto-reset-vf" private flag to automatically reset
VFs when encountering an MDD event.

For fm10k:

Dr. David Alan Gilbert removes an unused function.
====================

Link: https://patch.msgid.link/20250106221929.956999-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
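
The zero-copy gain implied by the benchmark tables in the cover letter can be worked out directly. The following is a minimal C sketch, not part of the patch set: `zc_speedup` is a hypothetical helper, and the inputs are the emulated QEMU figures quoted above, so the ratios indicate a trend rather than real hardware performance.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the patch set): ratio of zero-copy
 * throughput to copy-mode native XDP throughput, both given in Kpps.
 */
static double zc_speedup(double xdp_drv_zc_kpps, double xdp_drv_kpps)
{
	/* A zero or negative baseline would make the ratio meaningless. */
	assert(xdp_drv_kpps > 0.0);
	return xdp_drv_zc_kpps / xdp_drv_kpps;
}
```

With the 64-byte figures, rxdrop improves by roughly 1.5x (350 / 235), while txpush goes from 1.000 Kpps to 410 Kpps.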

+1002 -287
+12
Documentation/networking/device_drivers/ethernet/intel/i40e.rst
···
   ethtool --show-priv-flags ethX
   ethtool --set-priv-flags ethX link-down-on-close [on|off]

+Setting the mdd-auto-reset-vf Private Flag
+------------------------------------------
+
+When the mdd-auto-reset-vf private flag is set to "on", the problematic VF will
+be automatically reset if a malformed descriptor is detected. If the flag is
+set to "off", the problematic VF will be disabled.
+
+Use ethtool to view and set mdd-auto-reset-vf, as follows::
+
+  ethtool --show-priv-flags ethX
+  ethtool --set-priv-flags ethX mdd-auto-reset-vf [on|off]
+
 Viewing Link Messages
 ---------------------
 Link messages will not be displayed to the console if the distribution is
-120
drivers/net/ethernet/intel/fm10k/fm10k_pf.c
···
 }

 /**
- * fm10k_iov_msg_mac_vlan_pf - Message handler for MAC/VLAN request from VF
- * @hw: Pointer to hardware structure
- * @results: Pointer array to message, results[0] is pointer to message
- * @mbx: Pointer to mailbox information structure
- *
- * This function is a default handler for MAC/VLAN requests from the VF.
- * The assumption is that in this case it is acceptable to just directly
- * hand off the message from the VF to the underlying shared code.
- **/
-s32 fm10k_iov_msg_mac_vlan_pf(struct fm10k_hw *hw, u32 **results,
-			      struct fm10k_mbx_info *mbx)
-{
-	struct fm10k_vf_info *vf_info = (struct fm10k_vf_info *)mbx;
-	u8 mac[ETH_ALEN];
-	u32 *result;
-	int err = 0;
-	bool set;
-	u16 vlan;
-	u32 vid;
-
-	/* we shouldn't be updating rules on a disabled interface */
-	if (!FM10K_VF_FLAG_ENABLED(vf_info))
-		err = FM10K_ERR_PARAM;
-
-	if (!err && !!results[FM10K_MAC_VLAN_MSG_VLAN]) {
-		result = results[FM10K_MAC_VLAN_MSG_VLAN];
-
-		/* record VLAN id requested */
-		err = fm10k_tlv_attr_get_u32(result, &vid);
-		if (err)
-			return err;
-
-		set = !(vid & FM10K_VLAN_CLEAR);
-		vid &= ~FM10K_VLAN_CLEAR;
-
-		/* if the length field has been set, this is a multi-bit
-		 * update request. For multi-bit requests, simply disallow
-		 * them when the pf_vid has been set. In this case, the PF
-		 * should have already cleared the VLAN_TABLE, and if we
-		 * allowed them, it could allow a rogue VF to receive traffic
-		 * on a VLAN it was not assigned. In the single-bit case, we
-		 * need to modify requests for VLAN 0 to use the default PF or
-		 * SW vid when assigned.
-		 */
-
-		if (vid >> 16) {
-			/* prevent multi-bit requests when PF has
-			 * administratively set the VLAN for this VF
-			 */
-			if (vf_info->pf_vid)
-				return FM10K_ERR_PARAM;
-		} else {
-			err = fm10k_iov_select_vid(vf_info, (u16)vid);
-			if (err < 0)
-				return err;
-
-			vid = err;
-		}
-
-		/* update VSI info for VF in regards to VLAN table */
-		err = hw->mac.ops.update_vlan(hw, vid, vf_info->vsi, set);
-	}
-
-	if (!err && !!results[FM10K_MAC_VLAN_MSG_MAC]) {
-		result = results[FM10K_MAC_VLAN_MSG_MAC];
-
-		/* record unicast MAC address requested */
-		err = fm10k_tlv_attr_get_mac_vlan(result, mac, &vlan);
-		if (err)
-			return err;
-
-		/* block attempts to set MAC for a locked device */
-		if (is_valid_ether_addr(vf_info->mac) &&
-		    !ether_addr_equal(mac, vf_info->mac))
-			return FM10K_ERR_PARAM;
-
-		set = !(vlan & FM10K_VLAN_CLEAR);
-		vlan &= ~FM10K_VLAN_CLEAR;
-
-		err = fm10k_iov_select_vid(vf_info, vlan);
-		if (err < 0)
-			return err;
-
-		vlan = (u16)err;
-
-		/* notify switch of request for new unicast address */
-		err = hw->mac.ops.update_uc_addr(hw, vf_info->glort,
-						 mac, vlan, set, 0);
-	}
-
-	if (!err && !!results[FM10K_MAC_VLAN_MSG_MULTICAST]) {
-		result = results[FM10K_MAC_VLAN_MSG_MULTICAST];
-
-		/* record multicast MAC address requested */
-		err = fm10k_tlv_attr_get_mac_vlan(result, mac, &vlan);
-		if (err)
-			return err;
-
-		/* verify that the VF is allowed to request multicast */
-		if (!(vf_info->vf_flags & FM10K_VF_FLAG_MULTI_ENABLED))
-			return FM10K_ERR_PARAM;
-
-		set = !(vlan & FM10K_VLAN_CLEAR);
-		vlan &= ~FM10K_VLAN_CLEAR;
-
-		err = fm10k_iov_select_vid(vf_info, vlan);
-		if (err < 0)
-			return err;
-
-		vlan = (u16)err;
-
-		/* notify switch of request for new multicast address */
-		err = hw->mac.ops.update_mc_addr(hw, vf_info->glort,
-						 mac, vlan, set);
-	}
-
-	return err;
-}
-
-/**
  * fm10k_iov_supported_xcast_mode_pf - Determine best match for xcast mode
  * @vf_info: VF info structure containing capability flags
  * @mode: Requested xcast mode
-2
drivers/net/ethernet/intel/fm10k/fm10k_pf.h
···

 s32 fm10k_iov_select_vid(struct fm10k_vf_info *vf_info, u16 vid);
 s32 fm10k_iov_msg_msix_pf(struct fm10k_hw *, u32 **, struct fm10k_mbx_info *);
-s32 fm10k_iov_msg_mac_vlan_pf(struct fm10k_hw *, u32 **,
-			      struct fm10k_mbx_info *);
 s32 fm10k_iov_msg_lport_state_pf(struct fm10k_hw *, u32 **,
 				 struct fm10k_mbx_info *);

+3 -1
drivers/net/ethernet/intel/i40e/i40e.h
···
 	__I40E_SERVICE_SCHED,
 	__I40E_ADMINQ_EVENT_PENDING,
 	__I40E_MDD_EVENT_PENDING,
+	__I40E_MDD_VF_PRINT_PENDING,
 	__I40E_VFLR_EVENT_PENDING,
 	__I40E_RESET_RECOVERY_PENDING,
 	__I40E_TIMEOUT_RECOVERY_PENDING,
···
 	 */
 	I40E_FLAG_TOTAL_PORT_SHUTDOWN_ENA,
 	I40E_FLAG_VF_VLAN_PRUNING_ENA,
+	I40E_FLAG_MDD_AUTO_RESET_VF,
 	I40E_PF_FLAGS_NBITS, /* must be last */
 };

···
 	int num_alloc_vfs; /* actual number of VFs allocated */
 	u32 vf_aq_requests;
 	u32 arq_overflows; /* Not fatal, possibly indicative of problems */
-
+	struct ratelimit_state mdd_message_rate_limit;
 	/* DCBx/DCBNL capability for PF that indicates
 	 * whether DCBx is managed by firmware or host
 	 * based agent (LLDPAD). Also, indicates what
+1 -1
drivers/net/ethernet/intel/i40e/i40e_debugfs.c
···
 		dev_info(&pf->pdev->dev, "vf %2d: VSI id=%d, seid=%d, qps=%d\n",
 			 vf_id, vf->lan_vsi_id, vsi->seid, vf->num_queue_pairs);
 		dev_info(&pf->pdev->dev, " num MDD=%lld\n",
-			 vf->num_mdd_events);
+			 vf->mdd_tx_events.count + vf->mdd_rx_events.count);
 	} else {
 		dev_info(&pf->pdev->dev, "invalid VF id %d\n", vf_id);
 	}
+2
drivers/net/ethernet/intel/i40e/i40e_ethtool.c
···
 	I40E_PRIV_FLAG("base-r-fec", I40E_FLAG_BASE_R_FEC, 0),
 	I40E_PRIV_FLAG("vf-vlan-pruning",
 		       I40E_FLAG_VF_VLAN_PRUNING_ENA, 0),
+	I40E_PRIV_FLAG("mdd-auto-reset-vf",
+		       I40E_FLAG_MDD_AUTO_RESET_VF, 0),
 };

 #define I40E_PRIV_FLAGS_STR_LEN ARRAY_SIZE(i40e_gstrings_priv_flags)
+94 -13
drivers/net/ethernet/intel/i40e/i40e_main.c
···
 }

 /**
+ * i40e_print_vf_mdd_event - print VF Tx/Rx malicious driver detect event
+ * @pf: board private structure
+ * @vf: pointer to the VF structure
+ * @is_tx: true - for Tx event, false - for Rx
+ */
+static void i40e_print_vf_mdd_event(struct i40e_pf *pf, struct i40e_vf *vf,
+				    bool is_tx)
+{
+	dev_err(&pf->pdev->dev, is_tx ?
+		"%lld Tx Malicious Driver Detection events detected on PF %d VF %d MAC %pm. mdd-auto-reset-vfs=%s\n" :
+		"%lld Rx Malicious Driver Detection events detected on PF %d VF %d MAC %pm. mdd-auto-reset-vfs=%s\n",
+		is_tx ? vf->mdd_tx_events.count : vf->mdd_rx_events.count,
+		pf->hw.pf_id,
+		vf->vf_id,
+		vf->default_lan_addr.addr,
+		str_on_off(test_bit(I40E_FLAG_MDD_AUTO_RESET_VF, pf->flags)));
+}
+
+/**
+ * i40e_print_vfs_mdd_events - print VFs malicious driver detect event
+ * @pf: pointer to the PF structure
+ *
+ * Called from i40e_handle_mdd_event to rate limit and print VFs MDD events.
+ */
+static void i40e_print_vfs_mdd_events(struct i40e_pf *pf)
+{
+	unsigned int i;
+
+	/* check that there are pending MDD events to print */
+	if (!test_and_clear_bit(__I40E_MDD_VF_PRINT_PENDING, pf->state))
+		return;
+
+	if (!__ratelimit(&pf->mdd_message_rate_limit))
+		return;
+
+	for (i = 0; i < pf->num_alloc_vfs; i++) {
+		struct i40e_vf *vf = &pf->vf[i];
+		bool is_printed = false;
+
+		/* only print Rx MDD event message if there are new events */
+		if (vf->mdd_rx_events.count != vf->mdd_rx_events.last_printed) {
+			vf->mdd_rx_events.last_printed = vf->mdd_rx_events.count;
+			i40e_print_vf_mdd_event(pf, vf, false);
+			is_printed = true;
+		}
+
+		/* only print Tx MDD event message if there are new events */
+		if (vf->mdd_tx_events.count != vf->mdd_tx_events.last_printed) {
+			vf->mdd_tx_events.last_printed = vf->mdd_tx_events.count;
+			i40e_print_vf_mdd_event(pf, vf, true);
+			is_printed = true;
+		}
+
+		if (is_printed && !test_bit(I40E_FLAG_MDD_AUTO_RESET_VF, pf->flags))
+			dev_info(&pf->pdev->dev,
+				 "Use PF Control I/F to re-enable the VF #%d\n",
+				 i);
+	}
+}
+
+/**
  * i40e_handle_mdd_event
  * @pf: pointer to the PF structure
  *
···
 	u32 reg;
 	int i;

-	if (!test_bit(__I40E_MDD_EVENT_PENDING, pf->state))
+	if (!test_and_clear_bit(__I40E_MDD_EVENT_PENDING, pf->state)) {
+		/* Since the VF MDD event logging is rate limited, check if
+		 * there are pending MDD events.
+		 */
+		i40e_print_vfs_mdd_events(pf);
 		return;
+	}

 	/* find what triggered the MDD event */
 	reg = rd32(hw, I40E_GL_MDET_TX);
···

 	/* see if one of the VFs needs its hand slapped */
 	for (i = 0; i < pf->num_alloc_vfs && mdd_detected; i++) {
+		bool is_mdd_on_tx = false;
+		bool is_mdd_on_rx = false;
+
 		vf = &(pf->vf[i]);
 		reg = rd32(hw, I40E_VP_MDET_TX(i));
 		if (reg & I40E_VP_MDET_TX_VALID_MASK) {
+			set_bit(__I40E_MDD_VF_PRINT_PENDING, pf->state);
 			wr32(hw, I40E_VP_MDET_TX(i), 0xFFFF);
-			vf->num_mdd_events++;
-			dev_info(&pf->pdev->dev, "TX driver issue detected on VF %d\n",
-				 i);
-			dev_info(&pf->pdev->dev,
-				 "Use PF Control I/F to re-enable the VF\n");
+			vf->mdd_tx_events.count++;
 			set_bit(I40E_VF_STATE_DISABLED, &vf->vf_states);
+			is_mdd_on_tx = true;
 		}

 		reg = rd32(hw, I40E_VP_MDET_RX(i));
 		if (reg & I40E_VP_MDET_RX_VALID_MASK) {
+			set_bit(__I40E_MDD_VF_PRINT_PENDING, pf->state);
 			wr32(hw, I40E_VP_MDET_RX(i), 0xFFFF);
-			vf->num_mdd_events++;
-			dev_info(&pf->pdev->dev, "RX driver issue detected on VF %d\n",
-				 i);
-			dev_info(&pf->pdev->dev,
-				 "Use PF Control I/F to re-enable the VF\n");
+			vf->mdd_rx_events.count++;
 			set_bit(I40E_VF_STATE_DISABLED, &vf->vf_states);
+			is_mdd_on_rx = true;
+		}
+
+		if ((is_mdd_on_tx || is_mdd_on_rx) &&
+		    test_bit(I40E_FLAG_MDD_AUTO_RESET_VF, pf->flags)) {
+			/* VF MDD event counters will be cleared by
+			 * reset, so print the event prior to reset.
+			 */
+			if (is_mdd_on_rx)
+				i40e_print_vf_mdd_event(pf, vf, false);
+			if (is_mdd_on_tx)
+				i40e_print_vf_mdd_event(pf, vf, true);
+
+			i40e_vc_reset_vf(vf, true);
 		}
 	}

-	/* re-enable mdd interrupt cause */
-	clear_bit(__I40E_MDD_EVENT_PENDING, pf->state);
 	reg = rd32(hw, I40E_PFINT_ICR0_ENA);
 	reg |= I40E_PFINT_ICR0_ENA_MAL_DETECT_MASK;
 	wr32(hw, I40E_PFINT_ICR0_ENA, reg);
 	i40e_flush(hw);
+
+	i40e_print_vfs_mdd_events(pf);
 }

 /**
···
 		dev_info(&pf->pdev->dev, "set phy mask fail, err %pe aq_err %s\n",
 			 ERR_PTR(err),
 			 i40e_aq_str(&pf->hw, pf->hw.aq.asq_last_status));
+
+	/* VF MDD event logs are rate limited to one second intervals */
+	ratelimit_state_init(&pf->mdd_message_rate_limit, 1 * HZ, 1);

 	/* Reconfigure hardware for allowing smaller MSS in the case
 	 * of TSO, so that we avoid the MDD being fired and causing
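
The i40e_handle_mdd_event() change above replaces a test_bit() check followed by a later clear_bit() with a single test_and_clear_bit() at the top of the handler. A userspace sketch of that test-and-clear idea using C11 atomics follows. It is an illustration only: `test_and_clear_flag`, `set_flag` and the enum values are hypothetical stand-ins, not the kernel bitops API.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers standing in for the driver's __I40E_* state bits. */
enum {
	MDD_EVENT_PENDING = 0,
	MDD_VF_PRINT_PENDING = 1,
};

/* Atomically set bit `nr` in the shared state word. */
static void set_flag(atomic_ulong *state, unsigned int nr)
{
	assert(nr < sizeof(unsigned long) * CHAR_BIT);
	atomic_fetch_or(state, 1UL << nr);
}

/*
 * Atomically clear bit `nr` and report whether it was set beforehand.
 * Only one caller can observe "true" for a given set_flag(), so the
 * pending work is claimed exactly once.
 */
static bool test_and_clear_flag(atomic_ulong *state, unsigned int nr)
{
	unsigned long mask;
	unsigned long old;

	assert(nr < sizeof(unsigned long) * CHAR_BIT);
	mask = 1UL << nr;
	old = atomic_fetch_and(state, ~mask);
	return (old & mask) != 0;
}
```

Claiming the flag in one atomic step means a second event that sets the bit while the handler is still running is not lost when the handler finishes, which a separate test followed by a late clear could not guarantee.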
+1 -1
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
···
  * @notify_vf: notify vf about reset or not
  * Reset VF handler.
  **/
-static void i40e_vc_reset_vf(struct i40e_vf *vf, bool notify_vf)
+void i40e_vc_reset_vf(struct i40e_vf *vf, bool notify_vf)
 {
 	struct i40e_pf *pf = vf->pf;
 	int i;
+10 -1
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
···
 	u64 max_tx_rate; /* bandwidth rate allocation for VSIs */
 };

+struct i40e_mdd_vf_events {
+	u64 count; /* total count of Rx|Tx events */
+	/* count number of the last printed event */
+	u64 last_printed;
+};
+
 /* VF information structure */
 struct i40e_vf {
 	struct i40e_pf *pf;
···

 	u8 num_queue_pairs; /* num of qps assigned to VF vsis */
 	u8 num_req_queues; /* num of requested qps */
-	u64 num_mdd_events; /* num of mdd events detected */
+	/* num of mdd tx and rx events detected */
+	struct i40e_mdd_vf_events mdd_rx_events;
+	struct i40e_mdd_vf_events mdd_tx_events;

 	unsigned long vf_caps; /* vf's adv. capabilities */
 	unsigned long vf_states; /* vf's runtime states */
···
 int i40e_vc_process_vf_msg(struct i40e_pf *pf, s16 vf_id, u32 v_opcode,
 			   u32 v_retval, u8 *msg, u16 msglen);
 int i40e_vc_process_vflr_event(struct i40e_pf *pf);
+void i40e_vc_reset_vf(struct i40e_vf *vf, bool notify_vf);
 bool i40e_reset_vf(struct i40e_vf *vf, bool flr);
 bool i40e_reset_all_vfs(struct i40e_pf *pf, bool flr);
 void i40e_vc_notify_vf_reset(struct i40e_vf *vf);
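
The new struct i40e_mdd_vf_events pairs a running event count with the count at the time of the last log message, so the driver can tell whether anything new happened since it last printed. A minimal userspace sketch of that bookkeeping, combined with a simple fixed-window limiter, is shown below. All names are hypothetical, and the limiter is a simplified stand-in for illustration, not the kernel's __ratelimit() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same shape as struct i40e_mdd_vf_events in the diff. */
struct mdd_events {
	uint64_t count;        /* total events seen */
	uint64_t last_printed; /* value of count when last reported */
};

/* Hypothetical limiter: at most `burst` messages per `interval` ticks. */
struct rate_limit {
	uint64_t interval;
	unsigned int burst;
	uint64_t window_start;
	unsigned int printed;
};

/* Returns true if a message may be emitted at time `now`. */
static bool rate_limit_ok(struct rate_limit *rl, uint64_t now)
{
	assert(rl->interval > 0);

	/* Start a fresh window once the current one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}

/*
 * Returns true if events arrived since the last report, and marks
 * them as reported so the same events are not logged twice.
 */
static bool mdd_take_unreported(struct mdd_events *ev)
{
	if (ev->count == ev->last_printed)
		return false;
	ev->last_printed = ev->count;
	return true;
}
```

Because only the counter is updated on the hot path, a burst of events collapses into one message carrying the latest total once the limiter allows output again.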
+1 -1
drivers/net/ethernet/intel/igb/Makefile
···

 igb-y := igb_main.o igb_ethtool.o e1000_82575.o \
 	 e1000_mac.o e1000_nvm.o e1000_phy.o e1000_mbx.o \
-	 e1000_i210.o igb_ptp.o igb_hwmon.o
+	 e1000_i210.o igb_ptp.o igb_hwmon.o igb_xsk.o
+57 -1
drivers/net/ethernet/intel/igb/igb.h
···
 #include <linux/i2c-algo-bit.h>
 #include <linux/pci.h>
 #include <linux/mdio.h>
+#include <linux/lockdep.h>

 #include <net/xdp.h>
+#include <net/xdp_sock_drv.h>

 struct igb_adapter;

···
 #define IGB_XDP_CONSUMED BIT(0)
 #define IGB_XDP_TX BIT(1)
 #define IGB_XDP_REDIR BIT(2)
+#define IGB_XDP_EXIT BIT(3)

 struct vf_data_storage {
 	unsigned char vf_mac_addresses[ETH_ALEN];
···
 enum igb_tx_buf_type {
 	IGB_TYPE_SKB = 0,
 	IGB_TYPE_XDP,
+	IGB_TYPE_XSK
 };

 /* wrapper around a pointer to a socket buffer,
···
 	union { /* array of buffer info structs */
 		struct igb_tx_buffer *tx_buffer_info;
 		struct igb_rx_buffer *rx_buffer_info;
+		struct xdp_buff **rx_buffer_info_zc;
 	};
 	void *desc; /* descriptor ring memory */
 	unsigned long flags; /* ring specific flags */
···
 		};
 	};
 	struct xdp_rxq_info xdp_rxq;
+	struct xsk_buff_pool *xsk_pool;
 } ____cacheline_internodealigned_in_smp;

 struct igb_q_vector {
···
 	IGB_RING_FLAG_RX_SCTP_CSUM,
 	IGB_RING_FLAG_RX_LB_VLAN_BSWAP,
 	IGB_RING_FLAG_TX_CTX_IDX,
-	IGB_RING_FLAG_TX_DETECT_HANG
+	IGB_RING_FLAG_TX_DETECT_HANG,
+	IGB_RING_FLAG_TX_DISABLED
 };

 #define ring_uses_large_buffer(ring) \
···
 int igb_setup_rx_resources(struct igb_ring *);
 void igb_free_tx_resources(struct igb_ring *);
 void igb_free_rx_resources(struct igb_ring *);
+void igb_clean_tx_ring(struct igb_ring *tx_ring);
+void igb_clean_rx_ring(struct igb_ring *rx_ring);
 void igb_configure_tx_ring(struct igb_adapter *, struct igb_ring *);
 void igb_configure_rx_ring(struct igb_adapter *, struct igb_ring *);
+void igb_finalize_xdp(struct igb_adapter *adapter, unsigned int status);
+void igb_update_rx_stats(struct igb_q_vector *q_vector, unsigned int packets,
+			 unsigned int bytes);
 void igb_setup_tctl(struct igb_adapter *);
 void igb_setup_rctl(struct igb_adapter *);
 void igb_setup_srrctl(struct igb_adapter *, struct igb_ring *);
 netdev_tx_t igb_xmit_frame_ring(struct sk_buff *, struct igb_ring *);
+int igb_xdp_xmit_back(struct igb_adapter *adapter, struct xdp_buff *xdp);
+void igb_process_skb_fields(struct igb_ring *rx_ring,
+			    union e1000_adv_rx_desc *rx_desc,
+			    struct sk_buff *skb);
 void igb_alloc_rx_buffers(struct igb_ring *, u16);
 void igb_update_stats(struct igb_adapter *);
 bool igb_has_link(struct igb_adapter *adapter);
···
 	return netdev_get_tx_queue(tx_ring->netdev, tx_ring->queue_index);
 }

+/* This function assumes __netif_tx_lock is held by the caller. */
+static inline void igb_xdp_ring_update_tail(struct igb_ring *ring)
+{
+	lockdep_assert_held(&txring_txq(ring)->_xmit_lock);
+
+	/* Force memory writes to complete before letting h/w know there
+	 * are new descriptors to fetch.
+	 */
+	wmb();
+	writel(ring->next_to_use, ring->tail);
+}
+
+static inline struct igb_ring *igb_xdp_tx_queue_mapping(struct igb_adapter *adapter)
+{
+	unsigned int r_idx = smp_processor_id();
+
+	if (r_idx >= adapter->num_tx_queues)
+		r_idx = r_idx % adapter->num_tx_queues;
+
+	return adapter->tx_ring[r_idx];
+}
+
+static inline bool igb_xdp_is_enabled(struct igb_adapter *adapter)
+{
+	return !!READ_ONCE(adapter->xdp_prog);
+}
+
 int igb_add_filter(struct igb_adapter *adapter,
 		   struct igb_nfc_filter *input);
 int igb_erase_filter(struct igb_adapter *adapter,
···
 			       const u8 *addr, u8 queue, u8 flags);
 int igb_del_mac_steering_filter(struct igb_adapter *adapter,
 				const u8 *addr, u8 queue, u8 flags);
+
+struct xsk_buff_pool *igb_xsk_pool(struct igb_adapter *adapter,
+				   struct igb_ring *ring);
+int igb_xsk_pool_setup(struct igb_adapter *adapter,
+		       struct xsk_buff_pool *pool,
+		       u16 qid);
+bool igb_alloc_rx_buffers_zc(struct igb_ring *rx_ring,
+			     struct xsk_buff_pool *xsk_pool, u16 count);
+void igb_clean_rx_ring_zc(struct igb_ring *rx_ring);
+int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector,
+			struct xsk_buff_pool *xsk_pool, const int budget);
+bool igb_xmit_zc(struct igb_ring *tx_ring, struct xsk_buff_pool *xsk_pool);
+int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags);

 #endif /* _IGB_H_ */
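
The igb_xdp_tx_queue_mapping() helper moved into igb.h above picks a Tx ring from the current CPU number, wrapping around when there are more CPUs than Tx queues. The index arithmetic can be sketched in isolation as follows; `tx_queue_for_cpu` is a hypothetical name used only for this illustration.

```c
#include <assert.h>

/*
 * Hypothetical stand-alone version of the index calculation in
 * igb_xdp_tx_queue_mapping(): CPUs beyond the number of Tx queues
 * wrap around, so several CPUs may share one queue.
 */
static unsigned int tx_queue_for_cpu(unsigned int cpu,
				     unsigned int num_tx_queues)
{
	assert(num_tx_queues > 0);

	if (cpu >= num_tx_queues)
		cpu %= num_tx_queues;

	return cpu;
}
```

Since two CPUs can map to the same ring under this scheme, transmissions on a shared ring have to be serialized, which is consistent with the lockdep assertion on the Tx queue lock in igb_xdp_ring_update_tail().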
+175 -95
drivers/net/ethernet/intel/igb/igb_main.c
···
 #include <linux/bpf_trace.h>
 #include <linux/pm_runtime.h>
 #include <linux/etherdevice.h>
-#include <linux/lockdep.h>
 #ifdef CONFIG_IGB_DCA
 #include <linux/dca.h>
 #endif
···
 static void igb_configure_rx(struct igb_adapter *);
 static void igb_clean_all_tx_rings(struct igb_adapter *);
 static void igb_clean_all_rx_rings(struct igb_adapter *);
-static void igb_clean_tx_ring(struct igb_ring *);
-static void igb_clean_rx_ring(struct igb_ring *);
 static void igb_set_rx_mode(struct net_device *);
 static void igb_update_phy_info(struct timer_list *);
 static void igb_watchdog(struct timer_list *);
···

 		for (i = 0; i < rx_ring->count; i++) {
 			const char *next_desc;
-			struct igb_rx_buffer *buffer_info;
-			buffer_info = &rx_ring->rx_buffer_info[i];
+			dma_addr_t dma = (dma_addr_t)0;
+			struct igb_rx_buffer *buffer_info = NULL;
 			rx_desc = IGB_RX_DESC(rx_ring, i);
 			u0 = (struct my_u0 *)rx_desc;
 			staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
+
+			if (!rx_ring->xsk_pool) {
+				buffer_info = &rx_ring->rx_buffer_info[i];
+				dma = buffer_info->dma;
+			}

 			if (i == rx_ring->next_to_use)
 				next_desc = " NTU";
···
 					"R ", i,
 					le64_to_cpu(u0->a),
 					le64_to_cpu(u0->b),
-					(u64)buffer_info->dma,
+					(u64)dma,
 					next_desc);

 				if (netif_msg_pktdata(adapter) &&
-				    buffer_info->dma && buffer_info->page) {
+				    buffer_info && dma && buffer_info->page) {
 					print_hex_dump(KERN_INFO, "",
 					  DUMP_PREFIX_ADDRESS,
 					  16, 1,
···
 	 */
 	for (i = 0; i < adapter->num_rx_queues; i++) {
 		struct igb_ring *ring = adapter->rx_ring[i];
-		igb_alloc_rx_buffers(ring, igb_desc_unused(ring));
+		if (ring->xsk_pool)
+			igb_alloc_rx_buffers_zc(ring, ring->xsk_pool,
+						igb_desc_unused(ring));
+		else
+			igb_alloc_rx_buffers(ring, igb_desc_unused(ring));
 	}
 }

···

 static int igb_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 {
+	struct igb_adapter *adapter = netdev_priv(dev);
+
 	switch (xdp->command) {
 	case XDP_SETUP_PROG:
 		return igb_xdp_setup(dev, xdp);
+	case XDP_SETUP_XSK_POOL:
+		return igb_xsk_pool_setup(adapter, xdp->xsk.pool,
+					  xdp->xsk.queue_id);
 	default:
 		return -EINVAL;
 	}
 }

-/* This function assumes __netif_tx_lock is held by the caller. */
-static void igb_xdp_ring_update_tail(struct igb_ring *ring)
-{
-	lockdep_assert_held(&txring_txq(ring)->_xmit_lock);
-
-	/* Force memory writes to complete before letting h/w know there
-	 * are new descriptors to fetch.
-	 */
-	wmb();
-	writel(ring->next_to_use, ring->tail);
-}
-
-static struct igb_ring *igb_xdp_tx_queue_mapping(struct igb_adapter *adapter)
-{
-	unsigned int r_idx = smp_processor_id();
-
-	if (r_idx >= adapter->num_tx_queues)
-		r_idx = r_idx % adapter->num_tx_queues;
-
-	return adapter->tx_ring[r_idx];
-}
-
-static int igb_xdp_xmit_back(struct igb_adapter *adapter, struct xdp_buff *xdp)
+int igb_xdp_xmit_back(struct igb_adapter *adapter, struct xdp_buff *xdp)
 {
 	struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
 	int cpu = smp_processor_id();
···
 	/* During program transitions its possible adapter->xdp_prog is assigned
 	 * but ring has not been configured yet. In this case simply abort xmit.
 	 */
-	tx_ring = adapter->xdp_prog ? igb_xdp_tx_queue_mapping(adapter) : NULL;
+	tx_ring = igb_xdp_is_enabled(adapter) ?
+		igb_xdp_tx_queue_mapping(adapter) : NULL;
 	if (unlikely(!tx_ring))
 		return IGB_XDP_CONSUMED;

···
 	/* During program transitions its possible adapter->xdp_prog is assigned
 	 * but ring has not been configured yet. In this case simply abort xmit.
 	 */
-	tx_ring = adapter->xdp_prog ? igb_xdp_tx_queue_mapping(adapter) : NULL;
+	tx_ring = igb_xdp_is_enabled(adapter) ?
+		igb_xdp_tx_queue_mapping(adapter) : NULL;
 	if (unlikely(!tx_ring))
+		return -ENXIO;
+
+	if (unlikely(test_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags)))
 		return -ENXIO;

 	nq = txring_txq(tx_ring);
···
 	.ndo_setup_tc = igb_setup_tc,
 	.ndo_bpf = igb_xdp,
 	.ndo_xdp_xmit = igb_xdp_xmit,
+	.ndo_xsk_wakeup = igb_xsk_wakeup,
 };

 /**
···
 	netdev->priv_flags |= IFF_SUPP_NOFCS;

 	netdev->priv_flags |= IFF_UNICAST_FLT;
-	netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT;
+	netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
+			       NETDEV_XDP_ACT_XSK_ZEROCOPY;

 	/* MTU range: 68 - 9216 */
 	netdev->min_mtu = ETH_MIN_MTU;
···
 	u64 tdba = ring->dma;
 	int reg_idx = ring->reg_idx;

+	WRITE_ONCE(ring->xsk_pool, igb_xsk_pool(adapter, ring));
+
 	wr32(E1000_TDLEN(reg_idx),
 	     ring->count * sizeof(union e1000_adv_tx_desc));
 	wr32(E1000_TDBAL(reg_idx),
···
 	if (xdp_rxq_info_is_reg(&rx_ring->xdp_rxq))
 		xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
 	res = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
-			       rx_ring->queue_index, 0);
+			       rx_ring->queue_index,
+			       rx_ring->q_vector->napi.napi_id);
 	if (res < 0) {
 		dev_err(dev, "Failed to register xdp_rxq index %u\n",
 			rx_ring->queue_index);
···
 	struct e1000_hw *hw = &adapter->hw;
 	int reg_idx = ring->reg_idx;
 	u32 srrctl = 0;
+	u32 buf_size;
+
+	if (ring->xsk_pool)
+		buf_size = xsk_pool_get_rx_frame_size(ring->xsk_pool);
+	else if (ring_uses_large_buffer(ring))
+		buf_size = IGB_RXBUFFER_3072;
+	else
+		buf_size = IGB_RXBUFFER_2048;

 	srrctl = IGB_RX_HDR_LEN << E1000_SRRCTL_BSIZEHDRSIZE_SHIFT;
-	if (ring_uses_large_buffer(ring))
-		srrctl |= IGB_RXBUFFER_3072 >> E1000_SRRCTL_BSIZEPKT_SHIFT;
-	else
-		srrctl |= IGB_RXBUFFER_2048 >> E1000_SRRCTL_BSIZEPKT_SHIFT;
+	srrctl |= buf_size >> E1000_SRRCTL_BSIZEPKT_SHIFT;
 	srrctl |= E1000_SRRCTL_DESCTYPE_ADV_ONEBUF;
 	if (hw->mac.type >= e1000_82580)
 		srrctl |= E1000_SRRCTL_TIMESTAMP;
···
 	u32 rxdctl = 0;

 	xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
-	WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
-					   MEM_TYPE_PAGE_SHARED, NULL));
+	WRITE_ONCE(ring->xsk_pool, igb_xsk_pool(adapter, ring));
+	if (ring->xsk_pool) {
+		WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
+						   MEM_TYPE_XSK_BUFF_POOL,
+						   NULL));
+		xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq);
+	} else {
+		WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
+						   MEM_TYPE_PAGE_SHARED,
+						   NULL));
+	}

 	/* disable the queue */
 	wr32(E1000_RXDCTL(reg_idx), 0);
···
 	rxdctl |= IGB_RX_HTHRESH << 8;
 	rxdctl |= IGB_RX_WTHRESH << 16;

-	/* initialize rx_buffer_info */
-	memset(ring->rx_buffer_info, 0,
-	       sizeof(struct igb_rx_buffer) * ring->count);
+	if (ring->xsk_pool)
+		memset(ring->rx_buffer_info_zc, 0,
+		       sizeof(*ring->rx_buffer_info_zc) * ring->count);
+	else
+		memset(ring->rx_buffer_info, 0,
+		       sizeof(*ring->rx_buffer_info) * ring->count);

 	/* initialize Rx descriptor 0 */
 	rx_desc = IGB_RX_DESC(ring, 0);
···
  * igb_clean_tx_ring - Free Tx Buffers
  * @tx_ring: ring to be cleaned
  **/
-static void igb_clean_tx_ring(struct igb_ring *tx_ring)
+void igb_clean_tx_ring(struct igb_ring *tx_ring)
 {
 	u16 i = tx_ring->next_to_clean;
 	struct igb_tx_buffer *tx_buffer = &tx_ring->tx_buffer_info[i];
+	u32 xsk_frames = 0;

 	while (i != tx_ring->next_to_use) {
 		union e1000_adv_tx_desc *eop_desc, *tx_desc;

 		/* Free all the Tx ring sk_buffs or xdp frames */
-		if (tx_buffer->type == IGB_TYPE_SKB)
+		if (tx_buffer->type == IGB_TYPE_SKB) {
 			dev_kfree_skb_any(tx_buffer->skb);
-		else
+		} else if (tx_buffer->type == IGB_TYPE_XDP) {
 			xdp_return_frame(tx_buffer->xdpf);
+		} else if (tx_buffer->type == IGB_TYPE_XSK) {
+			xsk_frames++;
+			goto skip_for_xsk;
+		}

 		/* unmap skb header data */
 		dma_unmap_single(tx_ring->dev,
···
 					       DMA_TO_DEVICE);
 		}

+skip_for_xsk:
 		tx_buffer->next_to_watch = NULL;

 		/* move us one more past the eop_desc for start of next pkt */
···

 	/* reset BQL for queue */
 	netdev_tx_reset_queue(txring_txq(tx_ring));
+
+	if (tx_ring->xsk_pool && xsk_frames)
+		xsk_tx_completed(tx_ring->xsk_pool, xsk_frames);

 	/* reset next_to_use and next_to_clean */
 	tx_ring->next_to_use = 0;
···

 	rx_ring->xdp_prog = NULL;
 	xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
-	vfree(rx_ring->rx_buffer_info);
-	rx_ring->rx_buffer_info = NULL;
+	if (rx_ring->xsk_pool) {
+		vfree(rx_ring->rx_buffer_info_zc);
+		rx_ring->rx_buffer_info_zc = NULL;
+	} else {
+		vfree(rx_ring->rx_buffer_info);
+		rx_ring->rx_buffer_info = NULL;
+	}

 	/* if not set, then don't free */
 	if (!rx_ring->desc)
···
  * igb_clean_rx_ring - Free Rx Buffers per Queue
  * @rx_ring: ring to free buffers from
  **/
-static void igb_clean_rx_ring(struct igb_ring *rx_ring)
+void igb_clean_rx_ring(struct igb_ring *rx_ring)
 {
 	u16 i = rx_ring->next_to_clean;

 	dev_kfree_skb(rx_ring->skb);
 	rx_ring->skb = NULL;
+
+	if (rx_ring->xsk_pool) {
+		igb_clean_rx_ring_zc(rx_ring);
+		goto skip_for_xsk;
+	}

 	/* Free all the Rx ring sk_buffs */
 	while (i != rx_ring->next_to_alloc) {
···
 			i = 0;
 	}

+skip_for_xsk:
 	rx_ring->next_to_alloc = 0;
 	rx_ring->next_to_clean = 0;
 	rx_ring->next_to_use = 0;
···
 		return NETDEV_TX_BUSY;
 	}

+	if (unlikely(test_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags)))
+		return NETDEV_TX_BUSY;
+
 	/* record the location of the first descriptor for this packet */
 	first = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
 	first->type = IGB_TYPE_SKB;
···
 	struct igb_adapter *adapter = netdev_priv(netdev);
 	int max_frame = new_mtu + IGB_ETH_PKT_HDR_PAD;

-	if (adapter->xdp_prog) {
+	if (igb_xdp_is_enabled(adapter)) {
 		int i;

 		for (i = 0; i < adapter->num_rx_queues; i++) {
···
 	struct igb_q_vector *q_vector = container_of(napi,
 						     struct igb_q_vector,
 						     napi);
+	struct xsk_buff_pool *xsk_pool;
 	bool clean_complete = true;
 	int work_done = 0;

···
 		clean_complete = igb_clean_tx_irq(q_vector, budget);

 	if (q_vector->rx.ring) {
-		int cleaned = igb_clean_rx_irq(q_vector, budget);
+		int cleaned;
+
+		xsk_pool = READ_ONCE(q_vector->rx.ring->xsk_pool);
+		cleaned = xsk_pool ?
+			igb_clean_rx_irq_zc(q_vector, xsk_pool, budget) :
+			igb_clean_rx_irq(q_vector, budget);

 		work_done += cleaned;
 		if (cleaned >= budget)
···
  **/
 static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
 {
-	struct igb_adapter *adapter = q_vector->adapter;
-	struct igb_ring *tx_ring = q_vector->tx.ring;
-	struct igb_tx_buffer *tx_buffer;
-	union e1000_adv_tx_desc *tx_desc;
 	unsigned int total_bytes = 0, total_packets = 0;
+	struct igb_adapter *adapter = q_vector->adapter;
 	unsigned int budget = q_vector->tx.work_limit;
+	struct igb_ring *tx_ring = q_vector->tx.ring;
 	unsigned int i = tx_ring->next_to_clean;
+	union e1000_adv_tx_desc *tx_desc;
+	struct igb_tx_buffer *tx_buffer;
+	struct xsk_buff_pool *xsk_pool;
+	int cpu = smp_processor_id();
+	bool xsk_xmit_done = true;
+	struct netdev_queue *nq;
+	u32 xsk_frames = 0;

 	if (test_bit(__IGB_DOWN, &adapter->state))
 		return true;
···
 		total_packets += tx_buffer->gso_segs;

 		/* free the skb */
-		if (tx_buffer->type == IGB_TYPE_SKB)
+		if (tx_buffer->type == IGB_TYPE_SKB) {
 			napi_consume_skb(tx_buffer->skb, napi_budget);
-		else
+		} else if (tx_buffer->type == IGB_TYPE_XDP) {
 			xdp_return_frame(tx_buffer->xdpf);
+		} else if (tx_buffer->type == IGB_TYPE_XSK) {
+			xsk_frames++;
+			goto skip_for_xsk;
+		}

 		/* unmap skb header data */
 		dma_unmap_single(tx_ring->dev,
···
 			}
 		}

+skip_for_xsk:
 		/* move us one more past the eop_desc for start of next pkt */
 		tx_buffer++;
 		tx_desc++;
···
 	u64_stats_update_end(&tx_ring->tx_syncp);
 	q_vector->tx.total_bytes += total_bytes;
 	q_vector->tx.total_packets += total_packets;
+
+
xsk_pool = READ_ONCE(tx_ring->xsk_pool); 8339 + if (xsk_pool) { 8340 + if (xsk_frames) 8341 + xsk_tx_completed(xsk_pool, xsk_frames); 8342 + if (xsk_uses_need_wakeup(xsk_pool)) 8343 + xsk_set_tx_need_wakeup(xsk_pool); 8344 + 8345 + nq = txring_txq(tx_ring); 8346 + __netif_tx_lock(nq, cpu); 8347 + /* Avoid transmit queue timeout since we share it with the slow path */ 8348 + txq_trans_cond_update(nq); 8349 + xsk_xmit_done = igb_xmit_zc(tx_ring, xsk_pool); 8350 + __netif_tx_unlock(nq); 8351 + } 8392 8352 8393 8353 if (test_bit(IGB_RING_FLAG_TX_DETECT_HANG, &tx_ring->flags)) { 8394 8354 struct e1000_hw *hw = &adapter->hw; ··· 8467 8397 } 8468 8398 } 8469 8399 8470 - return !!budget; 8400 + return !!budget && xsk_xmit_done; 8471 8401 } 8472 8402 8473 8403 /** ··· 8658 8588 return skb; 8659 8589 } 8660 8590 8661 - static struct sk_buff *igb_run_xdp(struct igb_adapter *adapter, 8662 - struct igb_ring *rx_ring, 8663 - struct xdp_buff *xdp) 8591 + static int igb_run_xdp(struct igb_adapter *adapter, struct igb_ring *rx_ring, 8592 + struct xdp_buff *xdp) 8664 8593 { 8665 8594 int err, result = IGB_XDP_PASS; 8666 8595 struct bpf_prog *xdp_prog; ··· 8699 8630 break; 8700 8631 } 8701 8632 xdp_out: 8702 - return ERR_PTR(-result); 8633 + return result; 8703 8634 } 8704 8635 8705 8636 static unsigned int igb_rx_frame_truesize(struct igb_ring *rx_ring, ··· 8825 8756 union e1000_adv_rx_desc *rx_desc, 8826 8757 struct sk_buff *skb) 8827 8758 { 8828 - /* XDP packets use error pointer so abort at this point */ 8829 - if (IS_ERR(skb)) 8830 - return true; 8831 - 8832 8759 if (unlikely((igb_test_staterr(rx_desc, 8833 8760 E1000_RXDEXT_ERR_FRAME_ERR_MASK)))) { 8834 8761 struct net_device *netdev = rx_ring->netdev; ··· 8851 8786 * order to populate the hash, checksum, VLAN, timestamp, protocol, and 8852 8787 * other fields within the skb. 
8853 8788 **/ 8854 - static void igb_process_skb_fields(struct igb_ring *rx_ring, 8855 - union e1000_adv_rx_desc *rx_desc, 8856 - struct sk_buff *skb) 8789 + void igb_process_skb_fields(struct igb_ring *rx_ring, 8790 + union e1000_adv_rx_desc *rx_desc, 8791 + struct sk_buff *skb) 8857 8792 { 8858 8793 struct net_device *dev = rx_ring->netdev; 8859 8794 ··· 8935 8870 rx_buffer->page = NULL; 8936 8871 } 8937 8872 8873 + void igb_finalize_xdp(struct igb_adapter *adapter, unsigned int status) 8874 + { 8875 + int cpu = smp_processor_id(); 8876 + struct netdev_queue *nq; 8877 + 8878 + if (status & IGB_XDP_REDIR) 8879 + xdp_do_flush(); 8880 + 8881 + if (status & IGB_XDP_TX) { 8882 + struct igb_ring *tx_ring = igb_xdp_tx_queue_mapping(adapter); 8883 + 8884 + nq = txring_txq(tx_ring); 8885 + __netif_tx_lock(nq, cpu); 8886 + igb_xdp_ring_update_tail(tx_ring); 8887 + __netif_tx_unlock(nq); 8888 + } 8889 + } 8890 + 8891 + void igb_update_rx_stats(struct igb_q_vector *q_vector, unsigned int packets, 8892 + unsigned int bytes) 8893 + { 8894 + struct igb_ring *ring = q_vector->rx.ring; 8895 + 8896 + u64_stats_update_begin(&ring->rx_syncp); 8897 + ring->rx_stats.packets += packets; 8898 + ring->rx_stats.bytes += bytes; 8899 + u64_stats_update_end(&ring->rx_syncp); 8900 + 8901 + q_vector->rx.total_packets += packets; 8902 + q_vector->rx.total_bytes += bytes; 8903 + } 8904 + 8938 8905 static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget) 8939 8906 { 8940 8907 unsigned int total_bytes = 0, total_packets = 0; ··· 8974 8877 struct igb_ring *rx_ring = q_vector->rx.ring; 8975 8878 u16 cleaned_count = igb_desc_unused(rx_ring); 8976 8879 struct sk_buff *skb = rx_ring->skb; 8977 - int cpu = smp_processor_id(); 8978 8880 unsigned int xdp_xmit = 0; 8979 - struct netdev_queue *nq; 8980 8881 struct xdp_buff xdp; 8981 8882 u32 frame_sz = 0; 8982 8883 int rx_buf_pgcnt; 8884 + int xdp_res = 0; 8983 8885 8984 8886 /* Frame size depend on rx_ring setup when PAGE_SIZE=4K */ 
8985 8887 #if (PAGE_SIZE < 8192) ··· 9036 8940 /* At larger PAGE_SIZE, frame_sz depend on len size */ 9037 8941 xdp.frame_sz = igb_rx_frame_truesize(rx_ring, size); 9038 8942 #endif 9039 - skb = igb_run_xdp(adapter, rx_ring, &xdp); 8943 + xdp_res = igb_run_xdp(adapter, rx_ring, &xdp); 9040 8944 } 9041 8945 9042 - if (IS_ERR(skb)) { 9043 - unsigned int xdp_res = -PTR_ERR(skb); 9044 - 8946 + if (xdp_res) { 9045 8947 if (xdp_res & (IGB_XDP_TX | IGB_XDP_REDIR)) { 9046 8948 xdp_xmit |= xdp_res; 9047 8949 igb_rx_buffer_flip(rx_ring, rx_buffer, size); ··· 9058 8964 &xdp, timestamp); 9059 8965 9060 8966 /* exit if we failed to retrieve a buffer */ 9061 - if (!skb) { 8967 + if (!xdp_res && !skb) { 9062 8968 rx_ring->rx_stats.alloc_failed++; 9063 8969 rx_buffer->pagecnt_bias++; 9064 8970 break; ··· 9072 8978 continue; 9073 8979 9074 8980 /* verify the packet layout is correct */ 9075 - if (igb_cleanup_headers(rx_ring, rx_desc, skb)) { 8981 + if (xdp_res || igb_cleanup_headers(rx_ring, rx_desc, skb)) { 9076 8982 skb = NULL; 9077 8983 continue; 9078 8984 } ··· 9095 9001 /* place incomplete frames back on ring for completion */ 9096 9002 rx_ring->skb = skb; 9097 9003 9098 - if (xdp_xmit & IGB_XDP_REDIR) 9099 - xdp_do_flush(); 9004 + if (xdp_xmit) 9005 + igb_finalize_xdp(adapter, xdp_xmit); 9100 9006 9101 - if (xdp_xmit & IGB_XDP_TX) { 9102 - struct igb_ring *tx_ring = igb_xdp_tx_queue_mapping(adapter); 9103 - 9104 - nq = txring_txq(tx_ring); 9105 - __netif_tx_lock(nq, cpu); 9106 - igb_xdp_ring_update_tail(tx_ring); 9107 - __netif_tx_unlock(nq); 9108 - } 9109 - 9110 - u64_stats_update_begin(&rx_ring->rx_syncp); 9111 - rx_ring->rx_stats.packets += total_packets; 9112 - rx_ring->rx_stats.bytes += total_bytes; 9113 - u64_stats_update_end(&rx_ring->rx_syncp); 9114 - q_vector->rx.total_packets += total_packets; 9115 - q_vector->rx.total_bytes += total_bytes; 9007 + igb_update_rx_stats(q_vector, total_packets, total_bytes); 9116 9008 9117 9009 if (cleaned_count) 9118 9010 
igb_alloc_rx_buffers(rx_ring, cleaned_count);
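The igb_main.c hunks above replace the ERR_PTR-encoded XDP verdict with a plain integer that the Rx loop tests and ORs into a mask. The following is a minimal userspace sketch of that control flow, not the driver code: the `SK_XDP_*` values, `struct sk_poll_result` and `sk_poll()` are hypothetical stand-ins, and the flag values are assumed rather than taken from igb.h.

```c
#include <assert.h>

/* Hypothetical verdict flags; the driver's real values are assumed here. */
enum {
	SK_XDP_PASS     = 0,
	SK_XDP_CONSUMED = 1 << 0,
	SK_XDP_TX       = 1 << 1,
	SK_XDP_REDIR    = 1 << 2,
};

struct sk_poll_result {
	unsigned int xmit_mask;	/* union of TX/REDIR verdicts seen in this poll */
	unsigned int to_stack;	/* frames that would go on to become skbs */
};

/* Walk a list of per-frame verdicts the way the Rx loop tests xdp_res. */
static struct sk_poll_result sk_poll(const int *verdicts, int n)
{
	struct sk_poll_result r = { 0, 0 };

	for (int i = 0; i < n; i++) {
		int xdp_res = verdicts[i];

		if (xdp_res) {
			/* Non-zero means XDP took the frame: no skb is built. */
			if (xdp_res & (SK_XDP_TX | SK_XDP_REDIR))
				r.xmit_mask |= (unsigned int)xdp_res;
			continue;
		}
		r.to_stack++;
	}
	return r;
}
```

Because the verdict is an ordinary int, the caller can accumulate it with `|=` and hand the mask to a single finalize step after the loop, which is the role `igb_finalize_xdp()` plays in the diff.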
+562
drivers/net/ethernet/intel/igb/igb_xsk.c
···
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2018 Intel Corporation. */
+
+#include <linux/bpf_trace.h>
+#include <net/xdp_sock_drv.h>
+#include <net/xdp.h>
+
+#include "e1000_hw.h"
+#include "igb.h"
+
+static int igb_realloc_rx_buffer_info(struct igb_ring *ring, bool pool_present)
+{
+	int size = pool_present ?
+		sizeof(*ring->rx_buffer_info_zc) * ring->count :
+		sizeof(*ring->rx_buffer_info) * ring->count;
+	void *buff_info = vmalloc(size);
+
+	if (!buff_info)
+		return -ENOMEM;
+
+	if (pool_present) {
+		vfree(ring->rx_buffer_info);
+		ring->rx_buffer_info = NULL;
+		ring->rx_buffer_info_zc = buff_info;
+	} else {
+		vfree(ring->rx_buffer_info_zc);
+		ring->rx_buffer_info_zc = NULL;
+		ring->rx_buffer_info = buff_info;
+	}
+
+	return 0;
+}
+
+static void igb_txrx_ring_disable(struct igb_adapter *adapter, u16 qid)
+{
+	struct igb_ring *tx_ring = adapter->tx_ring[qid];
+	struct igb_ring *rx_ring = adapter->rx_ring[qid];
+	struct e1000_hw *hw = &adapter->hw;
+
+	set_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags);
+
+	wr32(E1000_TXDCTL(tx_ring->reg_idx), 0);
+	wr32(E1000_RXDCTL(rx_ring->reg_idx), 0);
+
+	synchronize_net();
+
+	/* Rx/Tx share the same napi context. */
+	napi_disable(&rx_ring->q_vector->napi);
+
+	igb_clean_tx_ring(tx_ring);
+	igb_clean_rx_ring(rx_ring);
+
+	memset(&rx_ring->rx_stats, 0, sizeof(rx_ring->rx_stats));
+	memset(&tx_ring->tx_stats, 0, sizeof(tx_ring->tx_stats));
+}
+
+static void igb_txrx_ring_enable(struct igb_adapter *adapter, u16 qid)
+{
+	struct igb_ring *tx_ring = adapter->tx_ring[qid];
+	struct igb_ring *rx_ring = adapter->rx_ring[qid];
+
+	igb_configure_tx_ring(adapter, tx_ring);
+	igb_configure_rx_ring(adapter, rx_ring);
+
+	synchronize_net();
+
+	clear_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags);
+
+	/* call igb_desc_unused which always leaves
+	 * at least 1 descriptor unused to make sure
+	 * next_to_use != next_to_clean
+	 */
+	if (rx_ring->xsk_pool)
+		igb_alloc_rx_buffers_zc(rx_ring, rx_ring->xsk_pool,
+					igb_desc_unused(rx_ring));
+	else
+		igb_alloc_rx_buffers(rx_ring, igb_desc_unused(rx_ring));
+
+	/* Rx/Tx share the same napi context. */
+	napi_enable(&rx_ring->q_vector->napi);
+}
+
+struct xsk_buff_pool *igb_xsk_pool(struct igb_adapter *adapter,
+				   struct igb_ring *ring)
+{
+	int qid = ring->queue_index;
+	struct xsk_buff_pool *pool;
+
+	pool = xsk_get_pool_from_qid(adapter->netdev, qid);
+
+	if (!igb_xdp_is_enabled(adapter))
+		return NULL;
+
+	return (pool && pool->dev) ? pool : NULL;
+}
+
+static int igb_xsk_pool_enable(struct igb_adapter *adapter,
+			       struct xsk_buff_pool *pool,
+			       u16 qid)
+{
+	struct net_device *netdev = adapter->netdev;
+	struct igb_ring *rx_ring;
+	bool if_running;
+	int err;
+
+	if (qid >= adapter->num_rx_queues)
+		return -EINVAL;
+
+	if (qid >= netdev->real_num_rx_queues ||
+	    qid >= netdev->real_num_tx_queues)
+		return -EINVAL;
+
+	err = xsk_pool_dma_map(pool, &adapter->pdev->dev, IGB_RX_DMA_ATTR);
+	if (err)
+		return err;
+
+	rx_ring = adapter->rx_ring[qid];
+	if_running = netif_running(adapter->netdev) && igb_xdp_is_enabled(adapter);
+	if (if_running)
+		igb_txrx_ring_disable(adapter, qid);
+
+	if (if_running) {
+		err = igb_realloc_rx_buffer_info(rx_ring, true);
+		if (!err) {
+			igb_txrx_ring_enable(adapter, qid);
+			/* Kick start the NAPI context so that receiving will start */
+			err = igb_xsk_wakeup(adapter->netdev, qid, XDP_WAKEUP_RX);
+		}
+
+		if (err) {
+			xsk_pool_dma_unmap(pool, IGB_RX_DMA_ATTR);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+static int igb_xsk_pool_disable(struct igb_adapter *adapter, u16 qid)
+{
+	struct xsk_buff_pool *pool;
+	struct igb_ring *rx_ring;
+	bool if_running;
+	int err;
+
+	pool = xsk_get_pool_from_qid(adapter->netdev, qid);
+	if (!pool)
+		return -EINVAL;
+
+	rx_ring = adapter->rx_ring[qid];
+	if_running = netif_running(adapter->netdev) && igb_xdp_is_enabled(adapter);
+	if (if_running)
+		igb_txrx_ring_disable(adapter, qid);
+
+	xsk_pool_dma_unmap(pool, IGB_RX_DMA_ATTR);
+
+	if (if_running) {
+		err = igb_realloc_rx_buffer_info(rx_ring, false);
+		if (err)
+			return err;
+
+		igb_txrx_ring_enable(adapter, qid);
+	}
+
+	return 0;
+}
+
+int igb_xsk_pool_setup(struct igb_adapter *adapter,
+		       struct xsk_buff_pool *pool,
+		       u16 qid)
+{
+	return pool ? igb_xsk_pool_enable(adapter, pool, qid) :
+		igb_xsk_pool_disable(adapter, qid);
+}
+
+static u16 igb_fill_rx_descs(struct xsk_buff_pool *pool, struct xdp_buff **xdp,
+			     union e1000_adv_rx_desc *rx_desc, u16 count)
+{
+	dma_addr_t dma;
+	u16 buffs;
+	int i;
+
+	/* nothing to do */
+	if (!count)
+		return 0;
+
+	buffs = xsk_buff_alloc_batch(pool, xdp, count);
+	for (i = 0; i < buffs; i++) {
+		dma = xsk_buff_xdp_get_dma(*xdp);
+		rx_desc->read.pkt_addr = cpu_to_le64(dma);
+		rx_desc->wb.upper.length = 0;
+
+		rx_desc++;
+		xdp++;
+	}
+
+	return buffs;
+}
+
+bool igb_alloc_rx_buffers_zc(struct igb_ring *rx_ring,
+			     struct xsk_buff_pool *xsk_pool, u16 count)
+{
+	u32 nb_buffs_extra = 0, nb_buffs = 0;
+	union e1000_adv_rx_desc *rx_desc;
+	u16 ntu = rx_ring->next_to_use;
+	u16 total_count = count;
+	struct xdp_buff **xdp;
+
+	rx_desc = IGB_RX_DESC(rx_ring, ntu);
+	xdp = &rx_ring->rx_buffer_info_zc[ntu];
+
+	if (ntu + count >= rx_ring->count) {
+		nb_buffs_extra = igb_fill_rx_descs(xsk_pool, xdp, rx_desc,
+						   rx_ring->count - ntu);
+		if (nb_buffs_extra != rx_ring->count - ntu) {
+			ntu += nb_buffs_extra;
+			goto exit;
+		}
+		rx_desc = IGB_RX_DESC(rx_ring, 0);
+		xdp = rx_ring->rx_buffer_info_zc;
+		ntu = 0;
+		count -= nb_buffs_extra;
+	}
+
+	nb_buffs = igb_fill_rx_descs(xsk_pool, xdp, rx_desc, count);
+	ntu += nb_buffs;
+	if (ntu == rx_ring->count)
+		ntu = 0;
+
+	/* clear the length for the next_to_use descriptor */
+	rx_desc = IGB_RX_DESC(rx_ring, ntu);
+	rx_desc->wb.upper.length = 0;
+
+exit:
+	if (rx_ring->next_to_use != ntu) {
+		rx_ring->next_to_use = ntu;
+
+		/* Force memory writes to complete before letting h/w
+		 * know there are new descriptors to fetch. (Only
+		 * applicable for weak-ordered memory model archs,
+		 * such as IA-64).
+		 */
+		wmb();
+		writel(ntu, rx_ring->tail);
+	}
+
+	return total_count == (nb_buffs + nb_buffs_extra);
+}
+
+void igb_clean_rx_ring_zc(struct igb_ring *rx_ring)
+{
+	u16 ntc = rx_ring->next_to_clean;
+	u16 ntu = rx_ring->next_to_use;
+
+	while (ntc != ntu) {
+		struct xdp_buff *xdp = rx_ring->rx_buffer_info_zc[ntc];
+
+		xsk_buff_free(xdp);
+		ntc++;
+		if (ntc >= rx_ring->count)
+			ntc = 0;
+	}
+}
+
+static struct sk_buff *igb_construct_skb_zc(struct igb_ring *rx_ring,
+					    struct xdp_buff *xdp,
+					    ktime_t timestamp)
+{
+	unsigned int totalsize = xdp->data_end - xdp->data_meta;
+	unsigned int metasize = xdp->data - xdp->data_meta;
+	struct sk_buff *skb;
+
+	net_prefetch(xdp->data_meta);
+
+	/* allocate a skb to store the frags */
+	skb = napi_alloc_skb(&rx_ring->q_vector->napi, totalsize);
+	if (unlikely(!skb))
+		return NULL;
+
+	if (timestamp)
+		skb_hwtstamps(skb)->hwtstamp = timestamp;
+
+	memcpy(__skb_put(skb, totalsize), xdp->data_meta,
+	       ALIGN(totalsize, sizeof(long)));
+
+	if (metasize) {
+		skb_metadata_set(skb, metasize);
+		__skb_pull(skb, metasize);
+	}
+
+	return skb;
+}
+
+static int igb_run_xdp_zc(struct igb_adapter *adapter, struct igb_ring *rx_ring,
+			  struct xdp_buff *xdp, struct xsk_buff_pool *xsk_pool,
+			  struct bpf_prog *xdp_prog)
+{
+	int err, result = IGB_XDP_PASS;
+	u32 act;
+
+	prefetchw(xdp->data_hard_start); /* xdp_frame write */
+
+	act = bpf_prog_run_xdp(xdp_prog, xdp);
+
+	if (likely(act == XDP_REDIRECT)) {
+		err = xdp_do_redirect(adapter->netdev, xdp, xdp_prog);
+		if (!err)
+			return IGB_XDP_REDIR;
+
+		if (xsk_uses_need_wakeup(xsk_pool) &&
+		    err == -ENOBUFS)
+			result = IGB_XDP_EXIT;
+		else
+			result = IGB_XDP_CONSUMED;
+		goto out_failure;
+	}
+
+	switch (act) {
+	case XDP_PASS:
+		break;
+	case XDP_TX:
+		result = igb_xdp_xmit_back(adapter, xdp);
+		if (result == IGB_XDP_CONSUMED)
+			goto out_failure;
+		break;
+	default:
+		bpf_warn_invalid_xdp_action(adapter->netdev, xdp_prog, act);
+		fallthrough;
+	case XDP_ABORTED:
+out_failure:
+		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
+		fallthrough;
+	case XDP_DROP:
+		result = IGB_XDP_CONSUMED;
+		break;
+	}
+
+	return result;
+}
+
+int igb_clean_rx_irq_zc(struct igb_q_vector *q_vector,
+			struct xsk_buff_pool *xsk_pool, const int budget)
+{
+	struct igb_adapter *adapter = q_vector->adapter;
+	unsigned int total_bytes = 0, total_packets = 0;
+	struct igb_ring *rx_ring = q_vector->rx.ring;
+	u32 ntc = rx_ring->next_to_clean;
+	struct bpf_prog *xdp_prog;
+	unsigned int xdp_xmit = 0;
+	bool failure = false;
+	u16 entries_to_alloc;
+	struct sk_buff *skb;
+
+	/* xdp_prog cannot be NULL in the ZC path */
+	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
+
+	while (likely(total_packets < budget)) {
+		union e1000_adv_rx_desc *rx_desc;
+		ktime_t timestamp = 0;
+		struct xdp_buff *xdp;
+		unsigned int size;
+		int xdp_res = 0;
+
+		rx_desc = IGB_RX_DESC(rx_ring, ntc);
+		size = le16_to_cpu(rx_desc->wb.upper.length);
+		if (!size)
+			break;
+
+		/* This memory barrier is needed to keep us from reading
+		 * any other fields out of the rx_desc until we know the
+		 * descriptor has been written back
+		 */
+		dma_rmb();
+
+		xdp = rx_ring->rx_buffer_info_zc[ntc];
+		xsk_buff_set_size(xdp, size);
+		xsk_buff_dma_sync_for_cpu(xdp);
+
+		/* pull rx packet timestamp if available and valid */
+		if (igb_test_staterr(rx_desc, E1000_RXDADV_STAT_TSIP)) {
+			int ts_hdr_len;
+
+			ts_hdr_len = igb_ptp_rx_pktstamp(rx_ring->q_vector,
+							 xdp->data,
+							 &timestamp);
+
+			xdp->data += ts_hdr_len;
+			xdp->data_meta += ts_hdr_len;
+			size -= ts_hdr_len;
+		}
+
+		xdp_res = igb_run_xdp_zc(adapter, rx_ring, xdp, xsk_pool,
+					 xdp_prog);
+
+		if (xdp_res) {
+			if (likely(xdp_res & (IGB_XDP_TX | IGB_XDP_REDIR))) {
+				xdp_xmit |= xdp_res;
+			} else if (xdp_res == IGB_XDP_EXIT) {
+				failure = true;
+				break;
+			} else if (xdp_res == IGB_XDP_CONSUMED) {
+				xsk_buff_free(xdp);
+			}
+
+			total_packets++;
+			total_bytes += size;
+			ntc++;
+			if (ntc == rx_ring->count)
+				ntc = 0;
+			continue;
+		}
+
+		skb = igb_construct_skb_zc(rx_ring, xdp, timestamp);
+
+		/* exit if we failed to retrieve a buffer */
+		if (!skb) {
+			rx_ring->rx_stats.alloc_failed++;
+			break;
+		}
+
+		xsk_buff_free(xdp);
+		ntc++;
+		if (ntc == rx_ring->count)
+			ntc = 0;
+
+		if (eth_skb_pad(skb))
+			continue;
+
+		/* probably a little skewed due to removing CRC */
+		total_bytes += skb->len;
+
+		/* populate checksum, timestamp, VLAN, and protocol */
+		igb_process_skb_fields(rx_ring, rx_desc, skb);
+
+		napi_gro_receive(&q_vector->napi, skb);
+
+		/* update budget accounting */
+		total_packets++;
+	}
+
+	rx_ring->next_to_clean = ntc;
+
+	if (xdp_xmit)
+		igb_finalize_xdp(adapter, xdp_xmit);
+
+	igb_update_rx_stats(q_vector, total_packets, total_bytes);
+
+	entries_to_alloc = igb_desc_unused(rx_ring);
+	if (entries_to_alloc >= IGB_RX_BUFFER_WRITE)
+		failure |= !igb_alloc_rx_buffers_zc(rx_ring, xsk_pool,
+						    entries_to_alloc);
+
+	if (xsk_uses_need_wakeup(xsk_pool)) {
+		if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
+			xsk_set_rx_need_wakeup(xsk_pool);
+		else
+			xsk_clear_rx_need_wakeup(xsk_pool);
+
+		return (int)total_packets;
+	}
+	return failure ? budget : (int)total_packets;
+}
+
+bool igb_xmit_zc(struct igb_ring *tx_ring, struct xsk_buff_pool *xsk_pool)
+{
+	unsigned int budget = igb_desc_unused(tx_ring);
+	u32 cmd_type, olinfo_status, nb_pkts, i = 0;
+	struct xdp_desc *descs = xsk_pool->tx_descs;
+	union e1000_adv_tx_desc *tx_desc = NULL;
+	struct igb_tx_buffer *tx_buffer_info;
+	unsigned int total_bytes = 0;
+	dma_addr_t dma;
+
+	if (!netif_carrier_ok(tx_ring->netdev))
+		return true;
+
+	if (test_bit(IGB_RING_FLAG_TX_DISABLED, &tx_ring->flags))
+		return true;
+
+	nb_pkts = xsk_tx_peek_release_desc_batch(xsk_pool, budget);
+	if (!nb_pkts)
+		return true;
+
+	while (nb_pkts-- > 0) {
+		dma = xsk_buff_raw_get_dma(xsk_pool, descs[i].addr);
+		xsk_buff_raw_dma_sync_for_device(xsk_pool, dma, descs[i].len);
+
+		tx_buffer_info = &tx_ring->tx_buffer_info[tx_ring->next_to_use];
+		tx_buffer_info->bytecount = descs[i].len;
+		tx_buffer_info->type = IGB_TYPE_XSK;
+		tx_buffer_info->xdpf = NULL;
+		tx_buffer_info->gso_segs = 1;
+		tx_buffer_info->time_stamp = jiffies;
+
+		tx_desc = IGB_TX_DESC(tx_ring, tx_ring->next_to_use);
+		tx_desc->read.buffer_addr = cpu_to_le64(dma);
+
+		/* put descriptor type bits */
+		cmd_type = E1000_ADVTXD_DTYP_DATA | E1000_ADVTXD_DCMD_DEXT |
+			   E1000_ADVTXD_DCMD_IFCS;
+		olinfo_status = descs[i].len << E1000_ADVTXD_PAYLEN_SHIFT;
+
+		/* FIXME: This sets the Report Status (RS) bit for every
+		 * descriptor. One nice to have optimization would be to set it
+		 * only for the last descriptor in the whole batch. See Intel
+		 * ice driver for an example on how to do it.
+		 */
+		cmd_type |= descs[i].len | IGB_TXD_DCMD;
+		tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
+		tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
+
+		total_bytes += descs[i].len;
+
+		i++;
+		tx_ring->next_to_use++;
+		tx_buffer_info->next_to_watch = tx_desc;
+		if (tx_ring->next_to_use == tx_ring->count)
+			tx_ring->next_to_use = 0;
+	}
+
+	netdev_tx_sent_queue(txring_txq(tx_ring), total_bytes);
+	igb_xdp_ring_update_tail(tx_ring);
+
+	return nb_pkts < budget;
+}
+
+int igb_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
+{
+	struct igb_adapter *adapter = netdev_priv(dev);
+	struct e1000_hw *hw = &adapter->hw;
+	struct igb_ring *ring;
+	u32 eics = 0;
+
+	if (test_bit(__IGB_DOWN, &adapter->state))
+		return -ENETDOWN;
+
+	if (!igb_xdp_is_enabled(adapter))
+		return -EINVAL;
+
+	if (qid >= adapter->num_tx_queues)
+		return -EINVAL;
+
+	ring = adapter->tx_ring[qid];
+
+	if (test_bit(IGB_RING_FLAG_TX_DISABLED, &ring->flags))
+		return -ENETDOWN;
+
+	if (!READ_ONCE(ring->xsk_pool))
+		return -EINVAL;
+
+	if (!napi_if_scheduled_mark_missed(&ring->q_vector->napi)) {
+		/* Cause software interrupt */
+		if (adapter->flags & IGB_FLAG_HAS_MSIX) {
+			eics |= ring->q_vector->eims_value;
+			wr32(E1000_EICS, eics);
+		} else {
+			wr32(E1000_ICS, E1000_ICS_RXDMT0);
+		}
+	}
+
+	return 0;
+}
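igb_alloc_rx_buffers_zc() in the file above refills the Rx ring in up to two batches: first up to the end of the ring, then from index 0, stopping early if the pool runs dry. A rough userspace sketch of that wrap-around logic follows, under stated assumptions: `RING_COUNT`, `struct sk_ring`, `sk_fill()` and `sk_refill()` are hypothetical names, and the allocator is a stand-in for the XSK pool batch allocation, not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_COUNT 8u	/* hypothetical ring size */

struct sk_ring {
	uint16_t next_to_use;
	uint64_t desc[RING_COUNT];
};

/* Hypothetical allocator: hands out at most *avail buffers. */
static uint16_t sk_fill(uint64_t *desc, uint16_t count, uint16_t *avail)
{
	uint16_t n = count < *avail ? count : *avail;

	for (uint16_t i = 0; i < n; i++)
		desc[i] = 0x1000u + i;	/* pretend DMA address */
	*avail -= n;
	return n;
}

/* Returns true only if every requested descriptor was filled. */
static bool sk_refill(struct sk_ring *ring, uint16_t count, uint16_t *avail)
{
	uint16_t ntu = ring->next_to_use;
	uint16_t total = count;
	uint32_t extra = 0, got = 0;

	if (ntu + count >= RING_COUNT) {
		uint16_t tail = RING_COUNT - ntu;

		/* First batch: from next_to_use to the end of the ring. */
		extra = sk_fill(&ring->desc[ntu], tail, avail);
		if (extra != tail) {
			ntu += extra;	/* pool ran dry before the wrap */
			goto out;
		}
		ntu = 0;
		count -= extra;
	}

	/* Second (or only) batch: contiguous from ntu. */
	got = sk_fill(&ring->desc[ntu], count, avail);
	ntu += got;
	if (ntu == RING_COUNT)
		ntu = 0;
out:
	ring->next_to_use = ntu;
	return total == got + extra;
}
```

The sketch leaves out the tail register write and memory barrier; it only illustrates how the index and the "fully refilled" result are derived.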
+2
drivers/net/ethernet/intel/igc/igc.h
···
 	struct igc_led_classdev *leds;
 };

+void igc_set_queue_napi(struct igc_adapter *adapter, int q_idx,
+			struct napi_struct *napi);
 void igc_up(struct igc_adapter *adapter);
 void igc_down(struct igc_adapter *adapter);
 int igc_open(struct net_device *netdev);
+57 -22
drivers/net/ethernet/intel/igc/igc_main.c
··· 2123 2123 union igc_adv_rx_desc *rx_desc, 2124 2124 struct sk_buff *skb) 2125 2125 { 2126 - /* XDP packets use error pointer so abort at this point */ 2127 - if (IS_ERR(skb)) 2128 - return true; 2129 - 2130 2126 if (unlikely(igc_test_staterr(rx_desc, IGC_RXDEXT_STATERR_RXE))) { 2131 2127 struct net_device *netdev = rx_ring->netdev; 2132 2128 ··· 2511 2515 } 2512 2516 } 2513 2517 2514 - static struct sk_buff *igc_xdp_run_prog(struct igc_adapter *adapter, 2515 - struct xdp_buff *xdp) 2518 + static int igc_xdp_run_prog(struct igc_adapter *adapter, struct xdp_buff *xdp) 2516 2519 { 2517 2520 struct bpf_prog *prog; 2518 2521 int res; ··· 2525 2530 res = __igc_xdp_run_prog(adapter, prog, xdp); 2526 2531 2527 2532 out: 2528 - return ERR_PTR(-res); 2533 + return res; 2529 2534 } 2530 2535 2531 2536 /* This function assumes __netif_tx_lock is held by the caller. */ ··· 2580 2585 struct sk_buff *skb = rx_ring->skb; 2581 2586 u16 cleaned_count = igc_desc_unused(rx_ring); 2582 2587 int xdp_status = 0, rx_buffer_pgcnt; 2588 + int xdp_res = 0; 2583 2589 2584 2590 while (likely(total_packets < budget)) { 2585 2591 struct igc_xdp_buff ctx = { .rx_ts = NULL }; ··· 2626 2630 xdp_buff_clear_frags_flag(&ctx.xdp); 2627 2631 ctx.rx_desc = rx_desc; 2628 2632 2629 - skb = igc_xdp_run_prog(adapter, &ctx.xdp); 2633 + xdp_res = igc_xdp_run_prog(adapter, &ctx.xdp); 2630 2634 } 2631 2635 2632 - if (IS_ERR(skb)) { 2633 - unsigned int xdp_res = -PTR_ERR(skb); 2634 - 2636 + if (xdp_res) { 2635 2637 switch (xdp_res) { 2636 2638 case IGC_XDP_CONSUMED: 2637 2639 rx_buffer->pagecnt_bias++; ··· 2651 2657 skb = igc_construct_skb(rx_ring, rx_buffer, &ctx); 2652 2658 2653 2659 /* exit if we failed to retrieve a buffer */ 2654 - if (!skb) { 2660 + if (!xdp_res && !skb) { 2655 2661 rx_ring->rx_stats.alloc_failed++; 2656 2662 rx_buffer->pagecnt_bias++; 2657 2663 set_bit(IGC_RING_FLAG_RX_ALLOC_FAILED, &rx_ring->flags); ··· 2666 2672 continue; 2667 2673 2668 2674 /* verify the packet layout is correct */ 
2669 - if (igc_cleanup_headers(rx_ring, rx_desc, skb)) { 2675 + if (xdp_res || igc_cleanup_headers(rx_ring, rx_desc, skb)) { 2670 2676 skb = NULL; 2671 2677 continue; 2672 2678 } ··· 4942 4948 return 0; 4943 4949 } 4944 4950 4951 + void igc_set_queue_napi(struct igc_adapter *adapter, int vector, 4952 + struct napi_struct *napi) 4953 + { 4954 + struct igc_q_vector *q_vector = adapter->q_vector[vector]; 4955 + 4956 + if (q_vector->rx.ring) 4957 + netif_queue_set_napi(adapter->netdev, 4958 + q_vector->rx.ring->queue_index, 4959 + NETDEV_QUEUE_TYPE_RX, napi); 4960 + 4961 + if (q_vector->tx.ring) 4962 + netif_queue_set_napi(adapter->netdev, 4963 + q_vector->tx.ring->queue_index, 4964 + NETDEV_QUEUE_TYPE_TX, napi); 4965 + } 4966 + 4945 4967 /** 4946 4968 * igc_up - Open the interface and prepare it to handle traffic 4947 4969 * @adapter: board private structure ··· 4965 4955 void igc_up(struct igc_adapter *adapter) 4966 4956 { 4967 4957 struct igc_hw *hw = &adapter->hw; 4958 + struct napi_struct *napi; 4968 4959 int i = 0; 4969 4960 4970 4961 /* hardware has been reset, we need to reload some things */ ··· 4973 4962 4974 4963 clear_bit(__IGC_DOWN, &adapter->state); 4975 4964 4976 - for (i = 0; i < adapter->num_q_vectors; i++) 4977 - napi_enable(&adapter->q_vector[i]->napi); 4965 + for (i = 0; i < adapter->num_q_vectors; i++) { 4966 + napi = &adapter->q_vector[i]->napi; 4967 + napi_enable(napi); 4968 + igc_set_queue_napi(adapter, i, napi); 4969 + } 4978 4970 4979 4971 if (adapter->msix_entries) 4980 4972 igc_configure_msix(adapter); ··· 5206 5192 for (i = 0; i < adapter->num_q_vectors; i++) { 5207 5193 if (adapter->q_vector[i]) { 5208 5194 napi_synchronize(&adapter->q_vector[i]->napi); 5195 + igc_set_queue_napi(adapter, i, NULL); 5209 5196 napi_disable(&adapter->q_vector[i]->napi); 5210 5197 } 5211 5198 } ··· 5591 5576 q_vector); 5592 5577 if (err) 5593 5578 goto err_free; 5579 + 5580 + netif_napi_set_irq(&q_vector->napi, 5581 + adapter->msix_entries[vector].vector); 5594 
 	}
 
 	igc_configure_msix(adapter);
···
 	struct igc_adapter *adapter = netdev_priv(netdev);
 	struct pci_dev *pdev = adapter->pdev;
 	struct igc_hw *hw = &adapter->hw;
+	struct napi_struct *napi;
 	int err = 0;
 	int i = 0;
 
···
 
 	clear_bit(__IGC_DOWN, &adapter->state);
 
-	for (i = 0; i < adapter->num_q_vectors; i++)
-		napi_enable(&adapter->q_vector[i]->napi);
+	for (i = 0; i < adapter->num_q_vectors; i++) {
+		napi = &adapter->q_vector[i]->napi;
+		napi_enable(napi);
+		igc_set_queue_napi(adapter, i, napi);
+	}
 
 	/* Clear any pending interrupts. */
 	rd32(IGC_ICR);
···
 	netif_rx(skb);
 }
 
-static int igc_resume(struct device *dev)
+static int __igc_resume(struct device *dev, bool rpm)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct net_device *netdev = pci_get_drvdata(pdev);
···
 	wr32(IGC_WUS, ~0);
 
 	if (netif_running(netdev)) {
+		if (!rpm)
+			rtnl_lock();
 		err = __igc_open(netdev, true);
+		if (!rpm)
+			rtnl_unlock();
 		if (!err)
 			netif_device_attach(netdev);
 	}
···
 	return err;
 }
 
+static int igc_resume(struct device *dev)
+{
+	return __igc_resume(dev, false);
+}
+
 static int igc_runtime_resume(struct device *dev)
 {
-	return igc_resume(dev);
+	return __igc_resume(dev, true);
 }
 
 static int igc_suspend(struct device *dev)
···
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct igc_adapter *adapter = netdev_priv(netdev);
 
+	rtnl_lock();
 	netif_device_detach(netdev);
 
-	if (state == pci_channel_io_perm_failure)
+	if (state == pci_channel_io_perm_failure) {
+		rtnl_unlock();
 		return PCI_ERS_RESULT_DISCONNECT;
+	}
 
 	if (netif_running(netdev))
 		igc_down(adapter);
 	pci_disable_device(pdev);
+	rtnl_unlock();
 
 	/* Request a slot reset. */
 	return PCI_ERS_RESULT_NEED_RESET;
···
  * @pdev: Pointer to PCI device
  *
  * Restart the card from scratch, as if from a cold-boot. Implementation
- * resembles the first-half of the igc_resume routine.
+ * resembles the first-half of the __igc_resume routine.
  **/
 static pci_ers_result_t igc_io_slot_reset(struct pci_dev *pdev)
 {
···
  *
  * This callback is called when the error recovery driver tells us that
  * its OK to resume normal operation. Implementation resembles the
- * second-half of the igc_resume routine.
+ * second-half of the __igc_resume routine.
  */
 static void igc_io_resume(struct pci_dev *pdev)
 {
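The `rpm` flag added to `__igc_resume()` makes the locking depend on the caller. A minimal userspace sketch of that shape follows. It assumes the runtime-PM path is entered with RTNL already held, which the hunk suggests but does not state. Every `fake_*` name is hypothetical, and the "lock" is a plain flag rather than a real mutex.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the RTNL mutex: a flag, not a real lock. */
static bool fake_rtnl_held;

static void fake_rtnl_lock(void)
{
	assert(!fake_rtnl_held);	/* taking it twice would deadlock */
	fake_rtnl_held = true;
}

static void fake_rtnl_unlock(void)
{
	assert(fake_rtnl_held);
	fake_rtnl_held = false;
}

/* Stand-in for __igc_open(): succeeds only when the lock is held. */
static int fake_open(void)
{
	return fake_rtnl_held ? 0 : -1;
}

/* Same shape as __igc_resume(dev, rpm): lock only when the caller did not. */
static int fake_resume(bool rpm)
{
	int err;

	if (!rpm)
		fake_rtnl_lock();
	err = fake_open();
	if (!rpm)
		fake_rtnl_unlock();

	return err;
}

/* System resume: the caller does not hold the lock. */
static int fake_system_resume(void)
{
	return fake_resume(false);
}

/* Runtime resume: assumed to be entered with the lock already held. */
static int fake_runtime_resume(void)
{
	return fake_resume(true);
}
```

Both wrappers reach `fake_open()` with the flag set, and neither changes the lock state its caller observes.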
+6 -2
drivers/net/ethernet/intel/igc/igc_xdp.c
···
 	struct net_device *dev = adapter->netdev;
 	bool if_running = netif_running(dev);
 	struct bpf_prog *old_prog;
+	bool need_update;
 
 	if (dev->mtu > ETH_DATA_LEN) {
 		/* For now, the driver doesn't support XDP functionality with
···
 		return -EOPNOTSUPP;
 	}
 
-	if (if_running)
+	need_update = !!adapter->xdp_prog != !!prog;
+	if (if_running && need_update)
 		igc_close(dev);
 
 	old_prog = xchg(&adapter->xdp_prog, prog);
···
 	else
 		xdp_features_clear_redirect_target(dev);
 
-	if (if_running)
+	if (if_running && need_update)
 		igc_open(dev);
 
 	return 0;
···
 		napi_disable(napi);
 	}
 
+	igc_set_queue_napi(adapter, queue_id, NULL);
 	set_bit(IGC_RING_FLAG_AF_XDP_ZC, &rx_ring->flags);
 	set_bit(IGC_RING_FLAG_AF_XDP_ZC, &tx_ring->flags);
 
···
 	xsk_pool_dma_unmap(pool, IGC_RX_DMA_ATTR);
 	clear_bit(IGC_RING_FLAG_AF_XDP_ZC, &rx_ring->flags);
 	clear_bit(IGC_RING_FLAG_AF_XDP_ZC, &tx_ring->flags);
+	igc_set_queue_napi(adapter, queue_id, napi);
 
 	if (needs_reset) {
 		napi_enable(napi);
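The hot-swap behaviour rests on the `need_update` expression: the interface is closed and reopened only when XDP goes from off to on or from on to off. A minimal standalone sketch of that predicate follows. `struct fake_prog` and both function names are hypothetical stand-ins, not driver symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_prog; only NULL vs non-NULL matters. */
struct fake_prog {
	int id;
};

/* True only when XDP switches between "no program" and "some program". */
static bool xdp_needs_reset(const struct fake_prog *old_prog,
			    const struct fake_prog *new_prog)
{
	return !!old_prog != !!new_prog;
}

/* Usage: replacing one program with another is the hot-swap case. */
static void xdp_needs_reset_examples(void)
{
	struct fake_prog a = { .id = 1 }, b = { .id = 2 };

	assert(xdp_needs_reset(NULL, &a));	/* attach: reset */
	assert(xdp_needs_reset(&a, NULL));	/* detach: reset */
	assert(!xdp_needs_reset(&a, &b));	/* swap: no reset */
	assert(!xdp_needs_reset(NULL, NULL));	/* nothing changes */
}
```

The double negation normalises each pointer to 0 or 1 before the comparison, so two different non-NULL programs compare equal and skip the close/open cycle.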
+9 -14
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
···
 {
 	struct net_device *netdev = rx_ring->netdev;
 
-	/* XDP packets use error pointer so abort at this point */
-	if (IS_ERR(skb))
-		return true;
-
 	/* Verify netdev is present, and that packet does not have any
 	 * errors that would be unacceptable to the netdev.
 	 */
···
 	return skb;
 }
 
-static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
-				     struct ixgbe_ring *rx_ring,
-				     struct xdp_buff *xdp)
+static int ixgbe_run_xdp(struct ixgbe_adapter *adapter,
+			 struct ixgbe_ring *rx_ring,
+			 struct xdp_buff *xdp)
 {
 	int err, result = IXGBE_XDP_PASS;
 	struct bpf_prog *xdp_prog;
···
 		break;
 	}
 xdp_out:
-	return ERR_PTR(-result);
+	return result;
 }
 
 static unsigned int ixgbe_rx_frame_truesize(struct ixgbe_ring *rx_ring,
···
 	unsigned int offset = rx_ring->rx_offset;
 	unsigned int xdp_xmit = 0;
 	struct xdp_buff xdp;
+	int xdp_res = 0;
 
 	/* Frame size depend on rx_ring setup when PAGE_SIZE=4K */
 #if (PAGE_SIZE < 8192)
···
 			/* At larger PAGE_SIZE, frame_sz depend on len size */
 			xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, size);
 #endif
-			skb = ixgbe_run_xdp(adapter, rx_ring, &xdp);
+			xdp_res = ixgbe_run_xdp(adapter, rx_ring, &xdp);
 		}
 
-		if (IS_ERR(skb)) {
-			unsigned int xdp_res = -PTR_ERR(skb);
-
+		if (xdp_res) {
 			if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR)) {
 				xdp_xmit |= xdp_res;
 				ixgbe_rx_buffer_flip(rx_ring, rx_buffer, size);
···
 		}
 
 		/* exit if we failed to retrieve a buffer */
-		if (!skb) {
+		if (!xdp_res && !skb) {
 			rx_ring->rx_stats.alloc_rx_buff_failed++;
 			rx_buffer->pagecnt_bias++;
 			break;
···
 			continue;
 
 		/* verify the packet layout is correct */
-		if (ixgbe_cleanup_headers(rx_ring, rx_desc, skb))
+		if (xdp_res || ixgbe_cleanup_headers(rx_ring, rx_desc, skb))
 			continue;
 
 		/* probably a little skewed due to removing CRC */
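With the verdict returned as a plain `int` instead of being packed into the `skb` pointer, the Rx loop has to treat "no skb" differently depending on whether XDP already handled the frame. A simplified userspace sketch of that decision follows. The `FAKE_XDP_*` values, `struct fake_skb`, and `rx_classify()` are assumptions made for illustration and are not taken from the driver headers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed verdict bits, for illustration only. */
#define FAKE_XDP_PASS		0
#define FAKE_XDP_CONSUMED	(1 << 0)
#define FAKE_XDP_TX		(1 << 1)
#define FAKE_XDP_REDIR		(1 << 2)

/* Hypothetical stand-in for struct sk_buff. */
struct fake_skb {
	int len;
};

enum rx_outcome {
	RX_XDP_OWNS_BUFFER,	/* TX or redirect: buffer handed to XDP */
	RX_XDP_DROPPED,		/* consumed by XDP, no skb expected */
	RX_ALLOC_FAILED,	/* verdict was PASS but no skb could be built */
	RX_DELIVER_SKB,		/* verdict was PASS and an skb exists */
};

static enum rx_outcome rx_classify(int xdp_res, const struct fake_skb *skb)
{
	assert(xdp_res >= 0);	/* verdicts are a non-negative bitmask here */

	if (xdp_res) {
		if (xdp_res & (FAKE_XDP_TX | FAKE_XDP_REDIR))
			return RX_XDP_OWNS_BUFFER;
		return RX_XDP_DROPPED;
	}

	/* Only a PASS verdict turns a missing skb into an allocation failure. */
	if (!skb)
		return RX_ALLOC_FAILED;

	return RX_DELIVER_SKB;
}
```

This is why the hunk changes `if (!skb)` to `if (!xdp_res && !skb)`: once the verdict no longer lives in the pointer, a NULL `skb` alone no longer means a buffer could not be retrieved.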
+10 -13
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
···
 				    union ixgbe_adv_rx_desc *rx_desc,
 				    struct sk_buff *skb)
 {
-	/* XDP packets use error pointer so abort at this point */
-	if (IS_ERR(skb))
-		return true;
-
 	/* verify that the packet does not have any known errors */
 	if (unlikely(ixgbevf_test_staterr(rx_desc,
 					  IXGBE_RXDADV_ERR_FRAME_ERR_MASK))) {
···
 	return IXGBEVF_XDP_TX;
 }
 
-static struct sk_buff *ixgbevf_run_xdp(struct ixgbevf_adapter *adapter,
-				       struct ixgbevf_ring *rx_ring,
-				       struct xdp_buff *xdp)
+static int ixgbevf_run_xdp(struct ixgbevf_adapter *adapter,
+			   struct ixgbevf_ring *rx_ring,
+			   struct xdp_buff *xdp)
 {
 	int result = IXGBEVF_XDP_PASS;
 	struct ixgbevf_ring *xdp_ring;
···
 		break;
 	}
 xdp_out:
-	return ERR_PTR(-result);
+	return result;
 }
 
 static unsigned int ixgbevf_rx_frame_truesize(struct ixgbevf_ring *rx_ring,
···
 	struct sk_buff *skb = rx_ring->skb;
 	bool xdp_xmit = false;
 	struct xdp_buff xdp;
+	int xdp_res = 0;
 
 	/* Frame size depend on rx_ring setup when PAGE_SIZE=4K */
 #if (PAGE_SIZE < 8192)
···
 			/* At larger PAGE_SIZE, frame_sz depend on len size */
 			xdp.frame_sz = ixgbevf_rx_frame_truesize(rx_ring, size);
 #endif
-			skb = ixgbevf_run_xdp(adapter, rx_ring, &xdp);
+			xdp_res = ixgbevf_run_xdp(adapter, rx_ring, &xdp);
 		}
 
-		if (IS_ERR(skb)) {
-			if (PTR_ERR(skb) == -IXGBEVF_XDP_TX) {
+		if (xdp_res) {
+			if (xdp_res == IXGBEVF_XDP_TX) {
 				xdp_xmit = true;
 				ixgbevf_rx_buffer_flip(rx_ring, rx_buffer,
 						       size);
···
 		}
 
 		/* exit if we failed to retrieve a buffer */
-		if (!skb) {
+		if (!xdp_res && !skb) {
 			rx_ring->rx_stats.alloc_rx_buff_failed++;
 			rx_buffer->pagecnt_bias++;
 			break;
···
 			continue;
 
 		/* verify the packet layout is correct */
-		if (ixgbevf_cleanup_headers(rx_ring, rx_desc, skb)) {
+		if (xdp_res || ixgbevf_cleanup_headers(rx_ring, rx_desc, skb)) {
 			skb = NULL;
 			continue;
 		}