Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

net/tcp: Allow asynchronous delete for TCP-AO keys (MKTs)

Deletion becomes very fast, almost free, but after the setsockopt()
syscall returns, the key stays alive until the next RCU grace period.
That is fine for listen sockets: userspace already has to be aware of
the race between setsockopt(TCP_AO) and accept(), and to resolve it by
verifying the keys with getsockopt() after the TCP connection has been
accepted.

The benchmark results (on an unloaded box; worse with more RCU work pending):
> ok 33 Worst case delete 16384 keys: min=5ms max=10ms mean=6.93904ms stddev=0.263421
> ok 34 Add a new key 16384 keys: min=1ms max=4ms mean=2.17751ms stddev=0.147564
> ok 35 Remove random-search 16384 keys: min=5ms max=10ms mean=6.50243ms stddev=0.254999
> ok 36 Remove async 16384 keys: min=0ms max=0ms mean=0.0296107ms stddev=0.0172078
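
To put the numbers above in proportion, the ratio of the reported means can be worked out directly. This is only a sketch: the helper name speedup() and the constant names are made up for illustration, and the inputs are the mean values copied from the benchmark lines above.

```c
#include <assert.h>

/* Mean times copied from the benchmark output above (milliseconds). */
static const double mean_remove_worst_ms  = 6.93904;   /* test 33 */
static const double mean_remove_random_ms = 6.50243;   /* test 35 */
static const double mean_remove_async_ms  = 0.0296107; /* test 36 */

/* Hypothetical helper: how many times faster the candidate is. */
static double speedup(double baseline_ms, double candidate_ms)
{
	return baseline_ms / candidate_ms;
}
```

With these inputs, speedup(mean_remove_random_ms, mean_remove_async_ms) comes out at roughly 220, and the worst-case comparison at roughly 234. These are ratios of means on one unloaded machine, not a general guarantee.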

Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Dmitry Safonov and committed by David S. Miller
d6732b95 ef84703a

+20 -4
+2 -1
include/uapi/linux/tcp.h
···
 	__s32	ifindex;		/* L3 dev index for VRF */
 	__u32	set_current	:1,	/* corresponding ::current_key */
 		set_rnext	:1,	/* corresponding ::rnext */
-		reserved	:30;	/* must be 0 */
+		del_async	:1,	/* only valid for listen sockets */
+		reserved	:29;	/* must be 0 */
 	__u16	reserved2;		/* padding, must be 0 */
 	__u8	prefix;			/* peer's address prefix */
 	__u8	sndid;			/* SendID for outgoing segments */
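
The hunk above takes one bit out of the reserved field, so the flag word keeps its size, and callers that filled reserved with zeros (as the "must be 0" comment requires) end up with del_async clear. The sketch below illustrates that with two hypothetical stand-in structs (flags_before, flags_after) that model only the 32-bit flag word, not the real uapi structure. Bitfield layout is implementation-defined in C; the check assumes the sequential allocation used by GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the flag word only; not the real uapi struct. */
struct flags_before {
	uint32_t set_current :1,
		 set_rnext   :1,
		 reserved    :30;
};

struct flags_after {
	uint32_t set_current :1,
		 set_rnext   :1,
		 del_async   :1,	/* carved out of the old reserved bits */
		 reserved    :29;
};

/* Taking a bit from reserved must not change the size of the word. */
static_assert(sizeof(struct flags_before) == sizeof(struct flags_after),
	      "flag word size changed");

/* Reinterpret bytes written with the old layout using the new layout. */
static struct flags_after reinterpret_old(struct flags_before old)
{
	struct flags_after now;

	memcpy(&now, &old, sizeof(now));
	return now;
}
```

Passing a flags_before value with reserved set to 0 through reinterpret_old() yields del_async == 0 under the stated layout assumption, which is why existing callers keep the synchronous behaviour.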
+18 -3
net/ipv4/tcp_ao.c
···
 }

 static int tcp_ao_delete_key(struct sock *sk, struct tcp_ao_info *ao_info,
-			     struct tcp_ao_key *key,
+			     bool del_async, struct tcp_ao_key *key,
 			     struct tcp_ao_key *new_current,
 			     struct tcp_ao_key *new_rnext)
 {
···

 	hlist_del_rcu(&key->node);

+	/* Support for async delete on listening sockets: as they don't
+	 * need current_key/rnext_key maintaining, we don't need to check
+	 * them and we can just free all resources in RCU fashion.
+	 */
+	if (del_async) {
+		atomic_sub(tcp_ao_sizeof_key(key), &sk->sk_omem_alloc);
+		call_rcu(&key->rcu, tcp_ao_key_free_rcu);
+		return 0;
+	}
+
 	/* At this moment another CPU could have looked this key up
 	 * while it was unlinked from the list. Wait for RCU grace period,
 	 * after which the key is off-list and can't be looked up again;
 	 * the rx path [just before RCU came] might have used it and set it
 	 * as current_key (very unlikely).
+	 * Free the key with next RCU grace period (in case it was
+	 * current_key before tcp_ao_current_rnext() might have
+	 * changed it in forced-delete).
 	 */
 	synchronize_rcu();
 	if (new_current)
···
 		if (!new_rnext)
 			return -ENOENT;
 	}
+	if (cmd.del_async && sk->sk_state != TCP_LISTEN)
+		return -EINVAL;

 	if (family == AF_INET) {
 		struct sockaddr_in *sin = (struct sockaddr_in *)&cmd.addr;
···
 		if (key == new_current || key == new_rnext)
 			continue;

-		return tcp_ao_delete_key(sk, ao_info, key,
-					 new_current, new_rnext);
+		return tcp_ao_delete_key(sk, ao_info, cmd.del_async, key,
+					 new_current, new_rnext);
 	}
 	return -ENOENT;
 }
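
The control flow added by this diff can be summarised with a small userspace model. This is a toy sketch, not kernel code: every toy_* name is hypothetical, the blocking_waits counter stands in for synchronize_rcu(), the queued_for_free flag stands in for call_rcu(), and the listen-state check (which the diff places in the command handler) is folded into the same function for brevity. The current_key/rnext_key handling of the synchronous path is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum toy_state { TOY_ESTABLISHED, TOY_LISTEN };

struct toy_key {
	bool linked;		/* still reachable by lookups */
	bool queued_for_free;	/* handed to a deferred free (call_rcu() stand-in) */
};

struct toy_sock {
	enum toy_state state;
	int blocking_waits;	/* counts synchronize_rcu() stand-ins */
};

static int toy_delete_key(struct toy_sock *sk, struct toy_key *key,
			  bool del_async)
{
	/* Async delete is only accepted on listening sockets. */
	if (del_async && sk->state != TOY_LISTEN)
		return -EINVAL;

	key->linked = false;		/* hlist_del_rcu() stand-in */

	if (del_async) {
		/* Fast path: defer the free and return without waiting. */
		key->queued_for_free = true;
		return 0;
	}

	/* Slow path: wait for readers first, then defer the free. */
	sk->blocking_waits++;
	key->queued_for_free = true;
	return 0;
}
```

In this model an async delete on a listening socket returns with blocking_waits unchanged, which mirrors why the "Remove async" benchmark is so much cheaper, while the same request on a non-listening socket is rejected with -EINVAL and leaves the key linked.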