Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

tcp: Destroy TCP-AO, TCP-MD5 keys in .sk_destruct()

Currently there are a couple of minor issues with destroying the keys in
tcp_v4_destroy_sock():

1. The socket is still in the TCP bind buckets, so it remains reachable
for incoming segments [on another CPU core] and may still send
late FIN/ACK/RST replies.

2. There is at least one code path where tcp_done() is called before
the RST is sent [kudos to Bob for the investigation]. This is the case of
a server that has finished sending its data and has just called close().

The socket is in TCP_FIN_WAIT2 and has RCV_SHUTDOWN (set by
__tcp_close()):

tcp_v4_do_rcv()/tcp_v6_do_rcv()
  tcp_rcv_state_process()            /* LINUX_MIB_TCPABORTONDATA */
    tcp_reset()
      tcp_done_with_error()
        tcp_done()
          inet_csk_destroy_sock()    /* Destroys AO/MD5 keys */
  /* tcp_rcv_state_process() returns SKB_DROP_REASON_TCP_ABORT_ON_DATA */
  tcp_v4_send_reset()                /* Sends an unsigned RST segment */

tcpdump:
> 22:53:15.399377 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 33929, offset 0, flags [DF], proto TCP (6), length 60)
> 1.0.0.1.34567 > 1.0.0.2.49848: Flags [F.], seq 2185658590, ack 3969644355, win 502, options [nop,nop,md5 valid], length 0
> 22:53:15.399396 00:00:01:01:00:00 > 00:00:b2:1f:00:00, ethertype IPv4 (0x0800), length 86: (tos 0x0, ttl 64, id 51951, offset 0, flags [DF], proto TCP (6), length 72)
> 1.0.0.2.49848 > 1.0.0.1.34567: Flags [.], seq 3969644375, ack 2185658591, win 128, options [nop,nop,md5 valid,nop,nop,sack 1 {2185658590:2185658591}], length 0
> 22:53:16.429588 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
> 1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658590, win 0, length 0
> 22:53:16.664725 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
> 1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658591, win 0, options [nop,nop,md5 valid], length 0
> 22:53:17.289832 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
> 1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658591, win 0, options [nop,nop,md5 valid], length 0

Note the signed RSTs later in the dump: those are sent by the server's
listener socket once the fin-wait socket has been removed from the
hash buckets.
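
The ordering problem in the call trace above can be modelled in a few lines
of userspace C. This is only a toy sketch: struct toy_sock, struct toy_segment
and every toy_*() function are hypothetical names invented for illustration,
not kernel code. It assumes that a reply can be signed only while the key is
still attached to the socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a TCP socket; NOT the kernel's struct sock. */
struct toy_sock {
	const char *md5_key;	/* NULL once the key has been destroyed */
};

/* Toy model of a reply segment: only records whether it was signed. */
struct toy_segment {
	bool is_rst;
	bool is_signed;
};

/* Models the old behaviour: key teardown happens as part of "done". */
static void toy_done_old(struct toy_sock *sk)
{
	sk->md5_key = NULL;
}

/* Models sending a RST: it is signed only if a key is still present. */
static struct toy_segment toy_send_reset(const struct toy_sock *sk)
{
	struct toy_segment seg = {
		.is_rst = true,
		.is_signed = sk->md5_key != NULL,
	};

	return seg;
}

/* Replays the abort-on-data path in the order shown in the call trace. */
static struct toy_segment toy_abort_on_data_old(struct toy_sock *sk)
{
	toy_done_old(sk);		/* the key is gone here ... */
	return toy_send_reset(sk);	/* ... so the RST goes out unsigned */
}
```

Calling toy_abort_on_data_old() on a socket that was created with a key
yields a RST with is_signed == false, which corresponds to the first,
option-less RST in the tcpdump output.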

Instead of destroying the AO/MD5 info and keys in inet_csk_destroy_sock(),
slightly delay it until the actual socket .sk_destruct(). A shut-down
socket can still send non-data replies, and they should be signed in order
for the peer to process them. This now also matches how AO/MD5 info gets
destroyed for TIME-WAIT sockets (in tcp_twsk_destructor()).
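
The idea of tying key lifetime to the final destructor can be sketched in
userspace C as follows. Everything here is hypothetical (struct demo_sock and
the demo_*() helpers are made up for this example); the plain integer
reference count merely stands in for the kernel's socket refcounting. The
sketch assumes that the destructor callback runs only when the last reference
is dropped, so any path that still holds the socket can sign its reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy socket with a reference count and a destructor callback. */
struct demo_sock {
	int refcnt;
	const char *key;			/* signing key, NULL when freed */
	bool base_destructed;			/* set by the generic destructor */
	void (*destruct)(struct demo_sock *sk);
};

/* Stands in for the generic, protocol-independent destructor. */
static void demo_base_destruct(struct demo_sock *sk)
{
	sk->base_destructed = true;
}

/* Protocol destructor: drop the key first, then chain to the generic one. */
static void demo_tcp_destruct(struct demo_sock *sk)
{
	sk->key = NULL;
	demo_base_destruct(sk);
}

static void demo_sock_init(struct demo_sock *sk, const char *key)
{
	sk->refcnt = 1;
	sk->key = key;
	sk->base_destructed = false;
	sk->destruct = demo_tcp_destruct;
}

static void demo_sock_hold(struct demo_sock *sk)
{
	sk->refcnt++;
}

/* Dropping the last reference is the only thing that runs the destructor. */
static void demo_sock_put(struct demo_sock *sk)
{
	if (--sk->refcnt == 0 && sk->destruct)
		sk->destruct(sk);
}

/* Returns true if a reset sent right now could still be signed. */
static bool demo_send_reset(const struct demo_sock *sk)
{
	return sk->key != NULL;
}
```

In this model, a receive path that holds a reference can still send a signed
reset after the closing side has dropped its own reference; the key goes away
only on the final demo_sock_put().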

This seems optimal for TCP-MD5, while for TCP-AO it leaves an open
problem: once the RST is sent and the socket is actually destructed,
there is no information left on the initial sequence numbers. So, in case
this last RST gets lost in the network, the server's listener socket
won't be able to properly sign another RST. Nothing in RFC 1122
prescribes keeping any local state after a non-graceful reset.
Luckily, BGP implementations are known to use keepalives.

While the issue is quite minor/cosmetic, these days monitoring network
counters is common practice, and receiving invalidly signed segments from
a trusted BGP peer can get customers worried.

Investigated-by: Bob Gilligan <gilligan@arista.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://patch.msgid.link/20250909-b4-tcp-ao-md5-rst-finwait2-v5-1-9ffaaaf8b236@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Authored by Dmitry Safonov and committed by Jakub Kicinski
9e472d9e bf2650d0

+47 -25
+4
include/net/tcp.h
···
 }
 
 #define tcp_twsk_md5_key(twsk) ((twsk)->tw_md5_key)
+void tcp_md5_destruct_sock(struct sock *sk);
 #else
 static inline struct tcp_md5sig_key *
 tcp_md5_do_lookup(const struct sock *sk, int l3index,
···
 }
 
 #define tcp_twsk_md5_key(twsk) NULL
+static inline void tcp_md5_destruct_sock(struct sock *sk)
+{
+}
 #endif
 
 int tcp_md5_alloc_sigpool(void);
+27
net/ipv4/tcp.c
···
 	return rate64;
 }
 
+#ifdef CONFIG_TCP_MD5SIG
+static void tcp_md5sig_info_free_rcu(struct rcu_head *head)
+{
+	struct tcp_md5sig_info *md5sig;
+
+	md5sig = container_of(head, struct tcp_md5sig_info, rcu);
+	kfree(md5sig);
+	static_branch_slow_dec_deferred(&tcp_md5_needed);
+	tcp_md5_release_sigpool();
+}
+
+void tcp_md5_destruct_sock(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (tp->md5sig_info) {
+		struct tcp_md5sig_info *md5sig;
+
+		md5sig = rcu_dereference_protected(tp->md5sig_info, 1);
+		tcp_clear_md5_list(sk);
+		rcu_assign_pointer(tp->md5sig_info, NULL);
+		call_rcu(&md5sig->rcu, tcp_md5sig_info_free_rcu);
+	}
+}
+EXPORT_IPV6_MOD_GPL(tcp_md5_destruct_sock);
+#endif
+
 /* Address-family independent initialization for a tcp_sock.
  *
  * NOTE: A lot of things set to zero explicitly by call to
+8 -25
net/ipv4/tcp_ipv4.c
···
 	.ao_calc_key_sk = tcp_v4_ao_calc_key_sk,
 #endif
 };
+
+static void tcp4_destruct_sock(struct sock *sk)
+{
+	tcp_md5_destruct_sock(sk);
+	tcp_ao_destroy_sock(sk, false);
+	inet_sock_destruct(sk);
+}
 #endif
 
 /* NOTE: A lot of things set to zero explicitly by call to
···
 
 #if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO)
 	tcp_sk(sk)->af_specific = &tcp_sock_ipv4_specific;
+	sk->sk_destruct = tcp4_destruct_sock;
 #endif
 
 	return 0;
 }
-
-#ifdef CONFIG_TCP_MD5SIG
-static void tcp_md5sig_info_free_rcu(struct rcu_head *head)
-{
-	struct tcp_md5sig_info *md5sig;
-
-	md5sig = container_of(head, struct tcp_md5sig_info, rcu);
-	kfree(md5sig);
-	static_branch_slow_dec_deferred(&tcp_md5_needed);
-	tcp_md5_release_sigpool();
-}
-#endif
 
 static void tcp_release_user_frags(struct sock *sk)
 {
···
 
 	/* Cleans up our, hopefully empty, out_of_order_queue. */
 	skb_rbtree_purge(&tp->out_of_order_queue);
-
-#ifdef CONFIG_TCP_MD5SIG
-	/* Clean up the MD5 key list, if any */
-	if (tp->md5sig_info) {
-		struct tcp_md5sig_info *md5sig;
-
-		md5sig = rcu_dereference_protected(tp->md5sig_info, 1);
-		tcp_clear_md5_list(sk);
-		call_rcu(&md5sig->rcu, tcp_md5sig_info_free_rcu);
-		rcu_assign_pointer(tp->md5sig_info, NULL);
-	}
-#endif
-	tcp_ao_destroy_sock(sk, false);
 
 	/* Clean up a referenced TCP bind bucket. */
 	if (inet_csk(sk)->icsk_bind_hash)
+8
net/ipv6/tcp_ipv6.c
···
 	.ao_calc_key_sk = tcp_v4_ao_calc_key_sk,
 #endif
 };
+
+static void tcp6_destruct_sock(struct sock *sk)
+{
+	tcp_md5_destruct_sock(sk);
+	tcp_ao_destroy_sock(sk, false);
+	inet6_sock_destruct(sk);
+}
 #endif
 
 /* NOTE: A lot of things set to zero explicitly by call to
···
 
 #if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO)
 	tcp_sk(sk)->af_specific = &tcp_sock_ipv6_specific;
+	sk->sk_destruct = tcp6_destruct_sock;
 #endif
 
 	return 0;