Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'rxrpc-implement-jumbo-data-transmission-and-rack-tlp'

David Howells says:

====================
rxrpc: Implement jumbo DATA transmission and RACK-TLP

Here's a series of patches to implement two main features:

(1) The transmission of jumbo data packets whereby several DATA packets of
a particular size can be glued together into a single UDP packet,
allowing us to make use of larger MTU sizes. The basic jumbo
subpacket capacity is 1412 bytes (RXRPC_JUMBO_DATALEN) and, say, an
MTU of 8192 allows five of them to be transmitted as one.

An alternative (and possibly more efficient) way would be to
expand/shrink the capacity of each DATA packet to match the MTU and
thus save on header and tail-gap overhead, but the Rx protocol does
not provide a mechanism for splitting the data - especially as the
transported data is encrypted per-packet - and so UDP fragmentation
would be the only way to handle this.

In fact, in the future, AF_RXRPC also needs to look at shrinking the
packet size where the MTU is smaller - for instance in the case of
being carried by IPv6 over wifi where there isn't room for a 1412-byte
subpacket.

(2) RACK-TLP to manage packet loss and retransmission in conjunction with
the congestion control algorithm.
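
As a rough illustration of the arithmetic in (1), the sketch below counts how many full-size subpackets fit in one UDP packet. Only RXRPC_JUMBO_DATALEN comes from the text above. The header sizes (20-byte IPv4 plus 8-byte UDP, a 28-byte Rx wire header, a 4-byte jumbo header between subpackets) and the function name are assumptions made for the example, not taken from the patches.

```c
/* From the cover letter. */
#define RXRPC_JUMBO_DATALEN	1412

/* Assumed sizes, for illustration only. */
#define SKETCH_IP_UDP_HDR_LEN	28	/* 20-byte IPv4 header + 8-byte UDP header */
#define SKETCH_RX_HDR_LEN	28	/* Rx wire header at the front of the packet */
#define SKETCH_JUMBO_HDR_LEN	4	/* Header placed between glued subpackets */

/* Number of full-size subpackets that fit in a single UDP packet. */
static unsigned int jumbo_subpackets_for_mtu(unsigned int mtu)
{
	unsigned int overhead = SKETCH_IP_UDP_HDR_LEN + SKETCH_RX_HDR_LEN;
	unsigned int room;

	if (mtu < overhead + RXRPC_JUMBO_DATALEN)
		return 0;	/* Not even one full-size subpacket fits */

	room = mtu - overhead;

	/* N subpackets need N * 1412 bytes of data plus N - 1 jumbo headers,
	 * so credit one header back before dividing.
	 */
	return (room + SKETCH_JUMBO_HDR_LEN) /
		(RXRPC_JUMBO_DATALEN + SKETCH_JUMBO_HDR_LEN);
}
```

With these assumed header sizes, jumbo_subpackets_for_mtu(8192) returns 5, which agrees with the figure quoted in (1), and a 1500-byte MTU yields a single subpacket.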

These allow for better data throughput and work towards being able to have
larger transmission windows.
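
RACK-TLP is specified for TCP in RFC 8985. The sketch below shows only its time-based loss test: an unacknowledged packet is deemed lost once a packet sent after it has been acknowledged and the RTT plus a reordering window has elapsed since it was sent. The struct and function names are hypothetical and do not correspond to the rxrpc implementation in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-packet record, not the kernel's transmission buffer. */
struct pkt_sketch {
	uint32_t seq;
	uint64_t xmit_us;	/* Time of last (re)transmission, in microseconds */
	bool	 acked;
};

/* Hypothetical RACK state, named loosely after the RFC 8985 variables. */
struct rack_sketch {
	uint64_t xmit_us;	/* Send time of the latest-sent packet that was ACK'd */
	uint64_t rtt_us;	/* RTT measured from that packet */
	uint64_t reo_wnd_us;	/* Reordering window */
};

/* Time-based loss test: lost if sent before the most recently delivered
 * packet and at least rtt + reo_wnd has passed since it was sent.
 */
static bool rack_is_lost(const struct rack_sketch *rack,
			 const struct pkt_sketch *p, uint64_t now_us)
{
	if (p->acked)
		return false;
	if (p->xmit_us >= rack->xmit_us)
		return false;	/* Nothing sent later has been delivered yet */
	return now_us >= p->xmit_us &&
		now_us - p->xmit_us >= rack->rtt_us + rack->reo_wnd_us;
}
```

A packet that fails the test only because the window has not yet expired is the case a reordering timer covers; the tail-loss probe (TLP) part of the scheme is not shown here.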

To this end, the following changes are also made:

(1) Use a single large array of kvec structs for the I/O thread rather
than having one per transmission buffer. We need a much bigger
collection of kvecs for ping padding.

(2) Implement path-MTU probing by sending padded PING ACK packets and
monitoring for PING RESPONSE ACKs. The pmtud value determined is used
to configure the construction of jumbo DATA packets.

(3) The transmission queue is changed from a linked list of transmission
buffer structs to a linked list of transmission-queue structs, each of
which points to either 32 or 64 transmission buffers (depending on cpu
word size) and various bits of metadata are concentrated in the queue
structs rather than the buffers to make better use of the cpu cache.

(4) SACK data is stored in the transmission-queue structures in batches of
32 or 64 making it faster to process rather than being spread amongst
all the individual packet buffers.

(5) Don't change the DF flag on the UDP socket unless we need to - and
basically only enable it for path-MTU probing.
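
The queue layout described in (3) and (4) can be pictured with a small userspace sketch: each queue struct covers one machine word's worth of buffers, and the soft-ACK state for those buffers is a single bitmask that can be counted or inverted in one operation. The field names qbase and segment_acked appear in the tracepoints of this series; everything else (the struct name, helper names, and the use of the GCC/Clang builtin __builtin_popcountl) is an assumption for illustration.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* 32 or 64 slots, depending on the width of unsigned long. */
#define TQ_SKETCH_NR_BUFS	(sizeof(unsigned long) * CHAR_BIT)

struct txqueue_sketch {
	struct txqueue_sketch *next;	/* Next queue struct in the list */
	uint32_t	qbase;		/* Sequence number held in slot 0 */
	unsigned long	segment_acked;	/* Bit n set: slot n has been soft-ACK'd */
	void		*bufs[TQ_SKETCH_NR_BUFS];
};

/* Does this queue struct cover the given sequence number? */
static bool tq_contains(const struct txqueue_sketch *tq, uint32_t seq)
{
	return (uint32_t)(seq - tq->qbase) < TQ_SKETCH_NR_BUFS;
}

/* Record a soft-ACK; the caller must have checked tq_contains(). */
static void tq_mark_acked(struct txqueue_sketch *tq, uint32_t seq)
{
	tq->segment_acked |= 1UL << (seq - tq->qbase);
}

/* Count ACK'd slots with one population count rather than a buffer walk. */
static unsigned int tq_count_acked(const struct txqueue_sketch *tq)
{
	return (unsigned int)__builtin_popcountl(tq->segment_acked);
}

/* Bitmask of the first nr_used slots that have not been ACK'd. */
static unsigned long tq_unacked_mask(const struct txqueue_sketch *tq,
				     unsigned int nr_used)
{
	unsigned long used = nr_used >= TQ_SKETCH_NR_BUFS ?
		~0UL : (1UL << nr_used) - 1;

	return ~tq->segment_acked & used;
}
```

Keeping the bitmask in the queue struct means the ACK state for a whole batch sits in one cache line, which is the locality benefit that (3) and (4) describe.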

There are also some additional bits:

(1) Fix the handling of connection aborts to poke the aborted connections.

(2) Don't set the MORE-PACKETS Rx header flag on the wire. No one
actually checks it and it is, in any case, generated inconsistently
between implementations.

(3) Request an ACK when, during call transmission, there's a stall in the
app generating the data to be transmitted.

(4) Fix attention starvation in the I/O thread by making sure we go
through all outstanding events rather than returning to the beginning
of the check cycle each time we process an event.

(5) Don't use the skbuff timestamp in the calculation of timeouts and RTT
as we really should include local processing time in that too.
Further, getting receive skbuff timestamps may be expensive.

(6) Make RTT tracking per-call, with the value saved between calls,
even within the same connection channel. The initial call timeout
starts off large to allow the server time to set up its state before
the initial reply.

(7) Don't allocate txbuf structs for ACK packets, but rather use page
frags and MSG_SPLICE_PAGES.

(8) Use irq-disabling locks for interactions between app threads and I/O
threads so that the I/O thread doesn't get held up.

(9) Make rxrpc set the REQUEST-ACK flag on an outgoing packet when cwnd is
at RXRPC_MIN_CWND (currently 4), not at 2 which it can never reach.

(10) Add some tracing bits and pieces (including displaying the userStatus
field in an ACK header) and some more stats counters (including
different sizes of jumbo packets sent/received).
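
The starvation problem in (4) can be shown with a toy event loop. In the sketch below, one variant services the first pending event and then restarts the scan, so a busy first event can shut out the later ones; the other services everything that was pending when the pass began. The event names and functions are hypothetical and only mirror the idea, not the actual I/O thread code.

```c
#include <stdbool.h>

enum io_event { EV_RX_PACKET, EV_TIMER, EV_TX_DATA, NR_IO_EVENTS };

struct io_sketch {
	unsigned int pending;			/* One bit per outstanding event */
	unsigned int handled[NR_IO_EVENTS];	/* How often each was serviced */
};

static void io_handle(struct io_sketch *io, enum io_event ev)
{
	io->pending &= ~(1U << ev);
	io->handled[ev]++;
}

/* Starvation-prone: service the first pending event, then restart the scan. */
static bool io_pass_restarting(struct io_sketch *io)
{
	for (int ev = 0; ev < NR_IO_EVENTS; ev++) {
		if (io->pending & (1U << ev)) {
			io_handle(io, (enum io_event)ev);
			return true;	/* Caller loops back to the first check */
		}
	}
	return false;
}

/* Fairer: service every event that was pending when the pass began. */
static unsigned int io_pass_all(struct io_sketch *io)
{
	unsigned int todo = io->pending, nr = 0;

	for (int ev = 0; ev < NR_IO_EVENTS; ev++) {
		if (todo & (1U << ev)) {
			io_handle(io, (enum io_event)ev);
			nr++;
		}
	}
	return nr;
}
```

If EV_RX_PACKET is raised again after every pass, io_pass_restarting() never reaches EV_TX_DATA, whereas io_pass_all() services it on the first pass.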

Link: https://lore.kernel.org/r/20240306000655.1100294-1-dhowells@redhat.com/ [1]
====================

Link: https://patch.msgid.link/20241204074710.990092-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+3006 -1269
+5
include/linux/ktime.h
···
 	return ns;
 }

+static inline ktime_t us_to_ktime(u64 us)
+{
+	return us * NSEC_PER_USEC;
+}
+
 static inline ktime_t ms_to_ktime(u64 ms)
 {
 	return ms * NSEC_PER_MSEC;
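
For readers outside the kernel tree, the new helper can be tried in userspace with stand-in definitions. The typedefs and the two NSEC_PER_* constants below are assumptions that mimic the kernel's (ktime_t as a signed 64-bit nanosecond count); only the bodies of the two functions are taken from the hunk above.

```c
#include <stdint.h>

/* Userspace stand-ins for kernel types and constants (assumed values). */
typedef int64_t ktime_t;	/* Nanoseconds */
typedef uint64_t u64;
#define NSEC_PER_USEC	1000L
#define NSEC_PER_MSEC	1000000L

/* Convert microseconds to a nanosecond-based ktime_t. */
static inline ktime_t us_to_ktime(u64 us)
{
	return us * NSEC_PER_USEC;
}

/* Existing millisecond counterpart, shown for comparison. */
static inline ktime_t ms_to_ktime(u64 ms)
{
	return ms * NSEC_PER_MSEC;
}
```

Under these definitions us_to_ktime(2000) and ms_to_ktime(2) produce the same value.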
+766 -114
include/trace/events/rxrpc.h
···
 #define rxrpc_call_poke_traces \
 	EM(rxrpc_call_poke_abort, "Abort") \
 	EM(rxrpc_call_poke_complete, "Compl") \
+	EM(rxrpc_call_poke_conn_abort, "Conn-abort") \
 	EM(rxrpc_call_poke_error, "Error") \
 	EM(rxrpc_call_poke_idle, "Idle") \
+	EM(rxrpc_call_poke_rx_packet, "Rx-packet") \
 	EM(rxrpc_call_poke_set_timeout, "Set-timo") \
 	EM(rxrpc_call_poke_start, "Start") \
 	EM(rxrpc_call_poke_timer, "Timer") \
···
 #define rxrpc_skb_traces \
 	EM(rxrpc_skb_eaten_by_unshare, "ETN unshare ") \
 	EM(rxrpc_skb_eaten_by_unshare_nomem, "ETN unshar-nm") \
+	EM(rxrpc_skb_get_call_rx, "GET call-rx ") \
 	EM(rxrpc_skb_get_conn_secured, "GET conn-secd") \
 	EM(rxrpc_skb_get_conn_work, "GET conn-work") \
-	EM(rxrpc_skb_get_last_nack, "GET last-nack") \
 	EM(rxrpc_skb_get_local_work, "GET locl-work") \
 	EM(rxrpc_skb_get_reject_work, "GET rej-work ") \
 	EM(rxrpc_skb_get_to_recvmsg, "GET to-recv ") \
···
 	EM(rxrpc_skb_new_error_report, "NEW error-rpt") \
 	EM(rxrpc_skb_new_jumbo_subpacket, "NEW jumbo-sub") \
 	EM(rxrpc_skb_new_unshared, "NEW unshared ") \
+	EM(rxrpc_skb_put_call_rx, "PUT call-rx ") \
 	EM(rxrpc_skb_put_conn_secured, "PUT conn-secd") \
 	EM(rxrpc_skb_put_conn_work, "PUT conn-work") \
 	EM(rxrpc_skb_put_error_report, "PUT error-rep") \
 	EM(rxrpc_skb_put_input, "PUT input ") \
 	EM(rxrpc_skb_put_jumbo_subpacket, "PUT jumbo-sub") \
-	EM(rxrpc_skb_put_last_nack, "PUT last-nack") \
 	EM(rxrpc_skb_put_purge, "PUT purge ") \
 	EM(rxrpc_skb_put_rotate, "PUT rotate ") \
 	EM(rxrpc_skb_put_unknown, "PUT unknown ") \
···
 	EM(rxrpc_call_see_activate_client, "SEE act-clnt") \
 	EM(rxrpc_call_see_connect_failed, "SEE con-fail") \
 	EM(rxrpc_call_see_connected, "SEE connect ") \
+	EM(rxrpc_call_see_conn_abort, "SEE conn-abt") \
 	EM(rxrpc_call_see_disconnected, "SEE disconn ") \
 	EM(rxrpc_call_see_distribute_error, "SEE dist-err") \
 	EM(rxrpc_call_see_input, "SEE input ") \
···

 #define rxrpc_txqueue_traces \
 	EM(rxrpc_txqueue_await_reply, "AWR") \
-	EM(rxrpc_txqueue_dequeue, "DEQ") \
 	EM(rxrpc_txqueue_end, "END") \
 	EM(rxrpc_txqueue_queue, "QUE") \
 	EM(rxrpc_txqueue_queue_last, "QLS") \
 	EM(rxrpc_txqueue_rotate, "ROT") \
 	EM(rxrpc_txqueue_rotate_last, "RLS") \
 	E_(rxrpc_txqueue_wait, "WAI")
+
+#define rxrpc_txdata_traces \
+	EM(rxrpc_txdata_inject_loss, " *INJ-LOSS*") \
+	EM(rxrpc_txdata_new_data, " ") \
+	EM(rxrpc_txdata_retransmit, " *RETRANS*") \
+	EM(rxrpc_txdata_tlp_new_data, " *TLP-NEW*") \
+	E_(rxrpc_txdata_tlp_retransmit, " *TLP-RETRANS*")

 #define rxrpc_receive_traces \
 	EM(rxrpc_receive_end, "END") \
···
 	E_(rxrpc_rtt_tx_ping, "PING")

 #define rxrpc_rtt_rx_traces \
-	EM(rxrpc_rtt_rx_other_ack, "OACK") \
+	EM(rxrpc_rtt_rx_data_ack, "DACK") \
 	EM(rxrpc_rtt_rx_obsolete, "OBSL") \
 	EM(rxrpc_rtt_rx_lost, "LOST") \
-	EM(rxrpc_rtt_rx_ping_response, "PONG") \
-	E_(rxrpc_rtt_rx_requested_ack, "RACK")
+	E_(rxrpc_rtt_rx_ping_response, "PONG")

 #define rxrpc_timer_traces \
 	EM(rxrpc_timer_trace_delayed_ack, "DelayAck ") \
···
 	EM(rxrpc_timer_trace_hard, "HardLimit") \
 	EM(rxrpc_timer_trace_idle, "IdleLimit") \
 	EM(rxrpc_timer_trace_keepalive, "KeepAlive") \
-	EM(rxrpc_timer_trace_lost_ack, "LostAck ") \
 	EM(rxrpc_timer_trace_ping, "DelayPing") \
-	EM(rxrpc_timer_trace_resend, "Resend ") \
-	EM(rxrpc_timer_trace_resend_reset, "ResendRst") \
-	E_(rxrpc_timer_trace_resend_tx, "ResendTx ")
+	EM(rxrpc_timer_trace_rack_off, "RACK-OFF ") \
+	EM(rxrpc_timer_trace_rack_zwp, "RACK-ZWP ") \
+	EM(rxrpc_timer_trace_rack_reo, "RACK-Reo ") \
+	EM(rxrpc_timer_trace_rack_tlp_pto, "TLP-PTO ") \
+	E_(rxrpc_timer_trace_rack_rto, "RTO ")

 #define rxrpc_propose_ack_traces \
 	EM(rxrpc_propose_ack_client_tx_end, "ClTxEnd") \
···
 	EM(rxrpc_propose_ack_ping_for_lost_ack, "LostAck") \
 	EM(rxrpc_propose_ack_ping_for_lost_reply, "LostRpl") \
 	EM(rxrpc_propose_ack_ping_for_0_retrans, "0-Retrn") \
+	EM(rxrpc_propose_ack_ping_for_mtu_probe, "MTUProb") \
 	EM(rxrpc_propose_ack_ping_for_old_rtt, "OldRtt ") \
 	EM(rxrpc_propose_ack_ping_for_params, "Params ") \
 	EM(rxrpc_propose_ack_ping_for_rtt, "Rtt ") \
 	EM(rxrpc_propose_ack_processing_op, "ProcOp ") \
 	EM(rxrpc_propose_ack_respond_to_ack, "Rsp2Ack") \
 	EM(rxrpc_propose_ack_respond_to_ping, "Rsp2Png") \
+	EM(rxrpc_propose_ack_retransmit, "Retrans") \
 	EM(rxrpc_propose_ack_retry_tx, "RetryTx") \
 	EM(rxrpc_propose_ack_rotate_rx, "RxAck ") \
 	EM(rxrpc_propose_ack_rx_idle, "RxIdle ") \
 	E_(rxrpc_propose_ack_terminal_ack, "ClTerm ")

-#define rxrpc_congest_modes \
-	EM(RXRPC_CALL_CONGEST_AVOIDANCE, "CongAvoid") \
-	EM(RXRPC_CALL_FAST_RETRANSMIT, "FastReTx ") \
-	EM(RXRPC_CALL_PACKET_LOSS, "PktLoss ") \
-	E_(RXRPC_CALL_SLOW_START, "SlowStart")
+#define rxrpc_ca_states \
+	EM(RXRPC_CA_CONGEST_AVOIDANCE, "CongAvoid") \
+	EM(RXRPC_CA_FAST_RETRANSMIT, "FastReTx ") \
+	EM(RXRPC_CA_PACKET_LOSS, "PktLoss ") \
+	E_(RXRPC_CA_SLOW_START, "SlowStart")

 #define rxrpc_congest_changes \
 	EM(rxrpc_cong_begin_retransmission, " Retrans") \
···

 #define rxrpc_req_ack_traces \
 	EM(rxrpc_reqack_ack_lost, "ACK-LOST ") \
-	EM(rxrpc_reqack_already_on, "ALREADY-ON") \
+	EM(rxrpc_reqack_app_stall, "APP-STALL ") \
 	EM(rxrpc_reqack_more_rtt, "MORE-RTT ") \
 	EM(rxrpc_reqack_no_srv_last, "NO-SRVLAST") \
 	EM(rxrpc_reqack_old_rtt, "OLD-RTT ") \
···
 /* ---- Must update size of stat_why_req_ack[] if more are added! */

 #define rxrpc_txbuf_traces \
-	EM(rxrpc_txbuf_alloc_ack, "ALLOC ACK ") \
 	EM(rxrpc_txbuf_alloc_data, "ALLOC DATA ") \
 	EM(rxrpc_txbuf_free, "FREE ") \
 	EM(rxrpc_txbuf_get_buffer, "GET BUFFER ") \
 	EM(rxrpc_txbuf_get_trans, "GET TRANS ") \
 	EM(rxrpc_txbuf_get_retrans, "GET RETRANS") \
-	EM(rxrpc_txbuf_put_ack_tx, "PUT ACK TX ") \
 	EM(rxrpc_txbuf_put_cleaned, "PUT CLEANED") \
 	EM(rxrpc_txbuf_put_nomem, "PUT NOMEM ") \
 	EM(rxrpc_txbuf_put_rotated, "PUT ROTATED") \
 	EM(rxrpc_txbuf_put_send_aborted, "PUT SEND-X ") \
 	EM(rxrpc_txbuf_put_trans, "PUT TRANS ") \
+	EM(rxrpc_txbuf_see_lost, "SEE LOST ") \
 	EM(rxrpc_txbuf_see_out_of_step, "OUT-OF-STEP") \
-	EM(rxrpc_txbuf_see_send_more, "SEE SEND+ ") \
-	E_(rxrpc_txbuf_see_unacked, "SEE UNACKED")
+	E_(rxrpc_txbuf_see_send_more, "SEE SEND+ ")
+
+#define rxrpc_tq_traces \
+	EM(rxrpc_tq_alloc, "ALLOC") \
+	EM(rxrpc_tq_cleaned, "CLEAN") \
+	EM(rxrpc_tq_decant, "DCNT ") \
+	EM(rxrpc_tq_decant_advance, "DCNT>") \
+	EM(rxrpc_tq_queue, "QUEUE") \
+	EM(rxrpc_tq_queue_dup, "QUE!!") \
+	EM(rxrpc_tq_rotate, "ROT ") \
+	EM(rxrpc_tq_rotate_and_free, "ROT-F") \
+	EM(rxrpc_tq_rotate_and_keep, "ROT-K") \
+	EM(rxrpc_tq_transmit, "XMIT ") \
+	E_(rxrpc_tq_transmit_advance, "XMIT>")
+
+#define rxrpc_pmtud_reduce_traces \
+	EM(rxrpc_pmtud_reduce_ack, "Ack ") \
+	EM(rxrpc_pmtud_reduce_icmp, "Icmp ") \
+	E_(rxrpc_pmtud_reduce_route, "Route")
+
+#define rxrpc_rotate_traces \
+	EM(rxrpc_rotate_trace_hack, "hard-ack") \
+	EM(rxrpc_rotate_trace_sack, "soft-ack") \
+	E_(rxrpc_rotate_trace_snak, "soft-nack")
+
+#define rxrpc_rack_timer_modes \
+	EM(RXRPC_CALL_RACKTIMER_OFF, "---") \
+	EM(RXRPC_CALL_RACKTIMER_RACK_REORDER, "REO") \
+	EM(RXRPC_CALL_RACKTIMER_TLP_PTO, "TLP") \
+	E_(RXRPC_CALL_RACKTIMER_RTO, "RTO")
+
+#define rxrpc_tlp_probe_traces \
+	EM(rxrpc_tlp_probe_trace_busy, "busy") \
+	EM(rxrpc_tlp_probe_trace_transmit_new, "transmit-new") \
+	E_(rxrpc_tlp_probe_trace_retransmit, "retransmit")
+
+#define rxrpc_tlp_ack_traces \
+	EM(rxrpc_tlp_ack_trace_acked, "acked") \
+	EM(rxrpc_tlp_ack_trace_dup_acked, "dup-acked") \
+	EM(rxrpc_tlp_ack_trace_hard_beyond, "hard-beyond") \
+	EM(rxrpc_tlp_ack_trace_incomplete, "incomplete") \
+	E_(rxrpc_tlp_ack_trace_new_data, "new-data")

 /*
  * Generate enums for tracing information.
···
 enum rxrpc_conn_trace { rxrpc_conn_traces } __mode(byte);
 enum rxrpc_local_trace { rxrpc_local_traces } __mode(byte);
 enum rxrpc_peer_trace { rxrpc_peer_traces } __mode(byte);
+enum rxrpc_pmtud_reduce_trace { rxrpc_pmtud_reduce_traces } __mode(byte);
 enum rxrpc_propose_ack_outcome { rxrpc_propose_ack_outcomes } __mode(byte);
 enum rxrpc_propose_ack_trace { rxrpc_propose_ack_traces } __mode(byte);
 enum rxrpc_receive_trace { rxrpc_receive_traces } __mode(byte);
 enum rxrpc_recvmsg_trace { rxrpc_recvmsg_traces } __mode(byte);
 enum rxrpc_req_ack_trace { rxrpc_req_ack_traces } __mode(byte);
+enum rxrpc_rotate_trace { rxrpc_rotate_traces } __mode(byte);
 enum rxrpc_rtt_rx_trace { rxrpc_rtt_rx_traces } __mode(byte);
 enum rxrpc_rtt_tx_trace { rxrpc_rtt_tx_traces } __mode(byte);
 enum rxrpc_sack_trace { rxrpc_sack_traces } __mode(byte);
 enum rxrpc_skb_trace { rxrpc_skb_traces } __mode(byte);
 enum rxrpc_timer_trace { rxrpc_timer_traces } __mode(byte);
+enum rxrpc_tlp_ack_trace { rxrpc_tlp_ack_traces } __mode(byte);
+enum rxrpc_tlp_probe_trace { rxrpc_tlp_probe_traces } __mode(byte);
+enum rxrpc_tq_trace { rxrpc_tq_traces } __mode(byte);
 enum rxrpc_tx_point { rxrpc_tx_points } __mode(byte);
 enum rxrpc_txbuf_trace { rxrpc_txbuf_traces } __mode(byte);
+enum rxrpc_txdata_trace { rxrpc_txdata_traces } __mode(byte);
 enum rxrpc_txqueue_trace { rxrpc_txqueue_traces } __mode(byte);

 #endif /* end __RXRPC_DECLARE_TRACE_ENUMS_ONCE_ONLY */
···

 rxrpc_abort_reasons;
 rxrpc_bundle_traces;
+rxrpc_ca_states;
 rxrpc_call_poke_traces;
 rxrpc_call_traces;
 rxrpc_client_traces;
 rxrpc_congest_changes;
-rxrpc_congest_modes;
 rxrpc_conn_traces;
 rxrpc_local_traces;
+rxrpc_pmtud_reduce_traces;
 rxrpc_propose_ack_traces;
+rxrpc_rack_timer_modes;
 rxrpc_receive_traces;
 rxrpc_recvmsg_traces;
 rxrpc_req_ack_traces;
+rxrpc_rotate_traces;
 rxrpc_rtt_rx_traces;
 rxrpc_rtt_tx_traces;
 rxrpc_sack_traces;
 rxrpc_skb_traces;
 rxrpc_timer_traces;
+rxrpc_tlp_ack_traces;
+rxrpc_tlp_probe_traces;
+rxrpc_tq_traces;
 rxrpc_tx_points;
 rxrpc_txbuf_traces;
+rxrpc_txdata_traces;
 rxrpc_txqueue_traces;

 /*
···
 		__print_symbolic(__entry->op, rxrpc_local_traces),
 		__entry->ref,
 		__entry->usage)
+	);
+
+TRACE_EVENT(rxrpc_iothread_rx,
+	TP_PROTO(struct rxrpc_local *local, unsigned int nr_rx),
+	TP_ARGS(local, nr_rx),
+	TP_STRUCT__entry(
+		__field(unsigned int, local)
+		__field(unsigned int, nr_rx)
+	),
+	TP_fast_assign(
+		__entry->local = local->debug_id;
+		__entry->nr_rx = nr_rx;
+	),
+	TP_printk("L=%08x nrx=%u", __entry->local, __entry->nr_rx)
 	);

 TRACE_EVENT(rxrpc_peer,
···
 	TP_STRUCT__entry(
 		__field(unsigned int, call)
 		__field(enum rxrpc_txqueue_trace, why)
-		__field(rxrpc_seq_t, acks_hard_ack)
 		__field(rxrpc_seq_t, tx_bottom)
+		__field(rxrpc_seq_t, acks_hard_ack)
 		__field(rxrpc_seq_t, tx_top)
-		__field(rxrpc_seq_t, tx_prepared)
+		__field(rxrpc_seq_t, send_top)
 		__field(int, tx_winsize)
 	),

 	TP_fast_assign(
 		__entry->call = call->debug_id;
 		__entry->why = why;
-		__entry->acks_hard_ack = call->acks_hard_ack;
 		__entry->tx_bottom = call->tx_bottom;
+		__entry->acks_hard_ack = call->acks_hard_ack;
 		__entry->tx_top = call->tx_top;
-		__entry->tx_prepared = call->tx_prepared;
+		__entry->send_top = call->send_top;
 		__entry->tx_winsize = call->tx_winsize;
 	),

-	TP_printk("c=%08x %s f=%08x h=%08x n=%u/%u/%u/%u",
+	TP_printk("c=%08x %s b=%08x h=%08x n=%u/%u/%u/%u",
 		__entry->call,
 		__print_symbolic(__entry->why, rxrpc_txqueue_traces),
 		__entry->tx_bottom,
 		__entry->acks_hard_ack,
-		__entry->tx_top - __entry->tx_bottom,
+		__entry->acks_hard_ack - __entry->tx_bottom,
 		__entry->tx_top - __entry->acks_hard_ack,
-		__entry->tx_prepared - __entry->tx_bottom,
+		__entry->send_top - __entry->tx_top,
 		__entry->tx_winsize)
+	);
+
+TRACE_EVENT(rxrpc_transmit,
+	TP_PROTO(struct rxrpc_call *call, rxrpc_seq_t send_top, int space),
+
+	TP_ARGS(call, send_top, space),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, call)
+		__field(rxrpc_seq_t, seq)
+		__field(u16, space)
+		__field(u16, tx_winsize)
+		__field(u16, cong_cwnd)
+		__field(u16, cong_extra)
+		__field(u16, in_flight)
+		__field(u16, prepared)
+		__field(u16, pmtud_jumbo)
+	),
+
+	TP_fast_assign(
+		__entry->call = call->debug_id;
+		__entry->seq = call->tx_top + 1;
+		__entry->space = space;
+		__entry->tx_winsize = call->tx_winsize;
+		__entry->cong_cwnd = call->cong_cwnd;
+		__entry->cong_extra = call->cong_extra;
+		__entry->prepared = send_top - call->tx_bottom;
+		__entry->in_flight = call->tx_top - call->tx_bottom;
+		__entry->pmtud_jumbo = call->peer->pmtud_jumbo;
+	),
+
+	TP_printk("c=%08x q=%08x sp=%u tw=%u cw=%u+%u pr=%u if=%u pj=%u",
+		__entry->call,
+		__entry->seq,
+		__entry->space,
+		__entry->tx_winsize,
+		__entry->cong_cwnd,
+		__entry->cong_extra,
+		__entry->prepared,
+		__entry->in_flight,
+		__entry->pmtud_jumbo)
+	);
+
+TRACE_EVENT(rxrpc_tx_rotate,
+	TP_PROTO(struct rxrpc_call *call, rxrpc_seq_t seq, rxrpc_seq_t to),
+
+	TP_ARGS(call, seq, to),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, call)
+		__field(rxrpc_seq_t, seq)
+		__field(rxrpc_seq_t, to)
+		__field(rxrpc_seq_t, top)
+	),
+
+	TP_fast_assign(
+		__entry->call = call->debug_id;
+		__entry->seq = seq;
+		__entry->to = to;
+		__entry->top = call->tx_top;
+	),
+
+	TP_printk("c=%08x q=%08x-%08x-%08x",
+		__entry->call,
+		__entry->seq,
+		__entry->to,
+		__entry->top)
 	);

 TRACE_EVENT(rxrpc_rx_data,
···
 	);

 TRACE_EVENT(rxrpc_rx_ack,
-	TP_PROTO(struct rxrpc_call *call,
-		rxrpc_serial_t serial, rxrpc_serial_t ack_serial,
-		rxrpc_seq_t first, rxrpc_seq_t prev, u8 reason, u8 n_acks),
+	TP_PROTO(struct rxrpc_call *call, struct rxrpc_skb_priv *sp),

-	TP_ARGS(call, serial, ack_serial, first, prev, reason, n_acks),
+	TP_ARGS(call, sp),

 	TP_STRUCT__entry(
 		__field(unsigned int, call)
···
 		__field(rxrpc_seq_t, prev)
 		__field(u8, reason)
 		__field(u8, n_acks)
+		__field(u8, user_status)
 	),

 	TP_fast_assign(
-		__entry->call = call->debug_id;
-		__entry->serial = serial;
-		__entry->ack_serial = ack_serial;
-		__entry->first = first;
-		__entry->prev = prev;
-		__entry->reason = reason;
-		__entry->n_acks = n_acks;
+		__entry->call = call->debug_id;
+		__entry->serial = sp->hdr.serial;
+		__entry->user_status = sp->hdr.userStatus;
+		__entry->ack_serial = sp->ack.acked_serial;
+		__entry->first = sp->ack.first_ack;
+		__entry->prev = sp->ack.prev_ack;
+		__entry->reason = sp->ack.reason;
+		__entry->n_acks = sp->ack.nr_acks;
 	),

-	TP_printk("c=%08x %08x %s r=%08x f=%08x p=%08x n=%u",
+	TP_printk("c=%08x %08x %s r=%08x us=%02x f=%08x p=%08x n=%u",
 		__entry->call,
 		__entry->serial,
 		__print_symbolic(__entry->reason, rxrpc_ack_names),
 		__entry->ack_serial,
+		__entry->user_status,
 		__entry->first,
 		__entry->prev,
 		__entry->n_acks)
···

 	TP_printk("c=%08x ABORT %08x ac=%d",
 		__entry->call,
+		__entry->serial,
+		__entry->abort_code)
+	);
+
+TRACE_EVENT(rxrpc_rx_conn_abort,
+	TP_PROTO(const struct rxrpc_connection *conn, const struct sk_buff *skb),
+
+	TP_ARGS(conn, skb),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, conn)
+		__field(rxrpc_serial_t, serial)
+		__field(u32, abort_code)
+	),
+
+	TP_fast_assign(
+		__entry->conn = conn->debug_id;
+		__entry->serial = rxrpc_skb(skb)->hdr.serial;
+		__entry->abort_code = skb->priority;
+	),
+
+	TP_printk("C=%08x ABORT %08x ac=%d",
+		__entry->conn,
 		__entry->serial,
 		__entry->abort_code)
 	);
···

 TRACE_EVENT(rxrpc_tx_data,
 	TP_PROTO(struct rxrpc_call *call, rxrpc_seq_t seq,
-		rxrpc_serial_t serial, unsigned int flags, bool lose),
+		rxrpc_serial_t serial, unsigned int flags,
+		enum rxrpc_txdata_trace trace),

-	TP_ARGS(call, seq, serial, flags, lose),
+	TP_ARGS(call, seq, serial, flags, trace),

 	TP_STRUCT__entry(
 		__field(unsigned int, call)
···
 		__field(u32, cid)
 		__field(u32, call_id)
 		__field(u16, flags)
-		__field(bool, lose)
+		__field(enum rxrpc_txdata_trace, trace)
 	),

 	TP_fast_assign(
···
 		__entry->seq = seq;
 		__entry->serial = serial;
 		__entry->flags = flags;
-		__entry->lose = lose;
+		__entry->trace = trace;
 	),

-	TP_printk("c=%08x DATA %08x:%08x %08x q=%08x fl=%02x%s%s",
+	TP_printk("c=%08x DATA %08x:%08x %08x q=%08x fl=%02x%s",
 		__entry->call,
 		__entry->cid,
 		__entry->call_id,
 		__entry->serial,
 		__entry->seq,
 		__entry->flags & RXRPC_TXBUF_WIRE_FLAGS,
-		__entry->flags & RXRPC_TXBUF_RESENT ? " *RETRANS*" : "",
-		__entry->lose ? " *LOSE*" : "")
+		__print_symbolic(__entry->trace, rxrpc_txdata_traces))
 	);

 TRACE_EVENT(rxrpc_tx_ack,
 	TP_PROTO(unsigned int call, rxrpc_serial_t serial,
 		rxrpc_seq_t ack_first, rxrpc_serial_t ack_serial,
-		u8 reason, u8 n_acks, u16 rwind),
+		u8 reason, u8 n_acks, u16 rwind,
+		enum rxrpc_propose_ack_trace trace),

-	TP_ARGS(call, serial, ack_first, ack_serial, reason, n_acks, rwind),
+	TP_ARGS(call, serial, ack_first, ack_serial, reason, n_acks, rwind, trace),

 	TP_STRUCT__entry(
 		__field(unsigned int, call)
···
 		__field(u8, reason)
 		__field(u8, n_acks)
 		__field(u16, rwind)
+		__field(enum rxrpc_propose_ack_trace, trace)
 	),

 	TP_fast_assign(
···
 		__entry->reason = reason;
 		__entry->n_acks = n_acks;
 		__entry->rwind = rwind;
+		__entry->trace = trace;
 	),

-	TP_printk(" c=%08x ACK %08x %s f=%08x r=%08x n=%u rw=%u",
+	TP_printk(" c=%08x ACK %08x %s f=%08x r=%08x n=%u rw=%u %s",
 		__entry->call,
 		__entry->serial,
 		__print_symbolic(__entry->reason, rxrpc_ack_names),
 		__entry->ack_first,
 		__entry->ack_serial,
 		__entry->n_acks,
-		__entry->rwind)
+		__entry->rwind,
+		__print_symbolic(__entry->trace, rxrpc_propose_ack_traces))
 	);

 TRACE_EVENT(rxrpc_receive,
···
 	TP_PROTO(struct rxrpc_call *call, enum rxrpc_rtt_rx_trace why,
 		int slot,
 		rxrpc_serial_t send_serial, rxrpc_serial_t resp_serial,
-		u32 rtt, u32 rto),
+		u32 rtt, u32 srtt, u32 rto),

-	TP_ARGS(call, why, slot, send_serial, resp_serial, rtt, rto),
+	TP_ARGS(call, why, slot, send_serial, resp_serial, rtt, srtt, rto),

 	TP_STRUCT__entry(
 		__field(unsigned int, call)
···
 		__field(rxrpc_serial_t, send_serial)
 		__field(rxrpc_serial_t, resp_serial)
 		__field(u32, rtt)
+		__field(u32, srtt)
 		__field(u32, rto)
+		__field(u32, min_rtt)
 	),

 	TP_fast_assign(
···
 		__entry->send_serial = send_serial;
 		__entry->resp_serial = resp_serial;
 		__entry->rtt = rtt;
+		__entry->srtt = srtt;
 		__entry->rto = rto;
+		__entry->min_rtt = minmax_get(&call->min_rtt)
 	),

-	TP_printk("c=%08x [%d] %s sr=%08x rr=%08x rtt=%u rto=%u",
+	TP_printk("c=%08x [%d] %s sr=%08x rr=%08x rtt=%u srtt=%u rto=%u min=%u",
 		__entry->call,
 		__entry->slot,
 		__print_symbolic(__entry->why, rxrpc_rtt_rx_traces),
 		__entry->send_serial,
 		__entry->resp_serial,
 		__entry->rtt,
-		__entry->rto)
+		__entry->srtt / 8,
+		__entry->rto,
+		__entry->min_rtt)
 	);

 TRACE_EVENT(rxrpc_timer_set,
···
 	);

 TRACE_EVENT(rxrpc_retransmit,
-	TP_PROTO(struct rxrpc_call *call, rxrpc_seq_t seq,
-		rxrpc_serial_t serial, ktime_t expiry),
+	TP_PROTO(struct rxrpc_call *call,
+		struct rxrpc_send_data_req *req,
+		struct rxrpc_txbuf *txb),

-	TP_ARGS(call, seq, serial, expiry),
+	TP_ARGS(call, req, txb),

 	TP_STRUCT__entry(
 		__field(unsigned int, call)
+		__field(unsigned int, qbase)
 		__field(rxrpc_seq_t, seq)
 		__field(rxrpc_serial_t, serial)
-		__field(ktime_t, expiry)
 	),

 	TP_fast_assign(
 		__entry->call = call->debug_id;
-		__entry->seq = seq;
-		__entry->serial = serial;
-		__entry->expiry = expiry;
+		__entry->qbase = req->tq->qbase;
+		__entry->seq = req->seq;
+		__entry->serial = txb->serial;
 	),

-	TP_printk("c=%08x q=%x r=%x xp=%lld",
+	TP_printk("c=%08x tq=%x q=%x r=%x",
 		__entry->call,
+		__entry->qbase,
 		__entry->seq,
-		__entry->serial,
-		ktime_to_us(__entry->expiry))
+		__entry->serial)
 	);

 TRACE_EVENT(rxrpc_congest,
-	TP_PROTO(struct rxrpc_call *call, struct rxrpc_ack_summary *summary,
-		rxrpc_serial_t ack_serial, enum rxrpc_congest_change change),
+	TP_PROTO(struct rxrpc_call *call, struct rxrpc_ack_summary *summary),

-	TP_ARGS(call, summary, ack_serial, change),
+	TP_ARGS(call, summary),

 	TP_STRUCT__entry(
 		__field(unsigned int, call)
-		__field(enum rxrpc_congest_change, change)
+		__field(enum rxrpc_ca_state, ca_state)
 		__field(rxrpc_seq_t, hard_ack)
 		__field(rxrpc_seq_t, top)
 		__field(rxrpc_seq_t, lowest_nak)
-		__field(rxrpc_serial_t, ack_serial)
+		__field(u16, nr_sacks)
+		__field(u16, nr_snacks)
+		__field(u16, cwnd)
+		__field(u16, ssthresh)
+		__field(u16, cumul_acks)
+		__field(u16, dup_acks)
 		__field_struct(struct rxrpc_ack_summary, sum)
 	),

 	TP_fast_assign(
 		__entry->call = call->debug_id;
-		__entry->change = change;
+		__entry->ca_state = call->cong_ca_state;
 		__entry->hard_ack = call->acks_hard_ack;
 		__entry->top = call->tx_top;
 		__entry->lowest_nak = call->acks_lowest_nak;
-		__entry->ack_serial = ack_serial;
+		__entry->nr_sacks = call->acks_nr_sacks;
+		__entry->nr_snacks = call->acks_nr_snacks;
+		__entry->cwnd = call->cong_cwnd;
+		__entry->ssthresh = call->cong_ssthresh;
+		__entry->cumul_acks = call->cong_cumul_acks;
+		__entry->dup_acks = call->cong_dup_acks;
 		memcpy(&__entry->sum, summary, sizeof(__entry->sum));
 	),

-	TP_printk("c=%08x r=%08x %s q=%08x %s cw=%u ss=%u nA=%u,%u+%u,%u b=%u u=%u d=%u l=%x%s%s%s",
+	TP_printk("c=%08x r=%08x %s q=%08x %s cw=%u ss=%u A=%u+%u/%u+%u r=%u b=%u u=%u d=%u l=%x%s%s%s",
 		__entry->call,
-		__entry->ack_serial,
+		__entry->sum.acked_serial,
 		__print_symbolic(__entry->sum.ack_reason, rxrpc_ack_names),
 		__entry->hard_ack,
-		__print_symbolic(__entry->sum.mode, rxrpc_congest_modes),
-		__entry->sum.cwnd,
-		__entry->sum.ssthresh,
-		__entry->sum.nr_acks, __entry->sum.nr_retained_nacks,
-		__entry->sum.nr_new_acks,
-		__entry->sum.nr_new_nacks,
+		__print_symbolic(__entry->ca_state, rxrpc_ca_states),
+		__entry->cwnd,
+		__entry->ssthresh,
+		__entry->nr_sacks, __entry->sum.nr_new_sacks,
+		__entry->nr_snacks, __entry->sum.nr_new_snacks,
+		__entry->sum.nr_new_hacks,
 		__entry->top - __entry->hard_ack,
-		__entry->sum.cumulative_acks,
-		__entry->sum.dup_acks,
-		__entry->lowest_nak, __entry->sum.new_low_nack ? "!" : "",
-		__print_symbolic(__entry->change, rxrpc_congest_changes),
+		__entry->cumul_acks,
+		__entry->dup_acks,
+		__entry->lowest_nak, __entry->sum.new_low_snack ? "!" : "",
+		__print_symbolic(__entry->sum.change, rxrpc_congest_changes),
 		__entry->sum.retrans_timeo ? " rTxTo" : "")
 	);

 TRACE_EVENT(rxrpc_reset_cwnd,
-	TP_PROTO(struct rxrpc_call *call, ktime_t now),
+	TP_PROTO(struct rxrpc_call *call, ktime_t since_last_tx, ktime_t rtt),

-	TP_ARGS(call, now),
+	TP_ARGS(call, since_last_tx, rtt),

 	TP_STRUCT__entry(
 		__field(unsigned int, call)
-		__field(enum rxrpc_congest_mode, mode)
+		__field(enum rxrpc_ca_state, ca_state)
 		__field(unsigned short, cwnd)
 		__field(unsigned short, extra)
 		__field(rxrpc_seq_t, hard_ack)
 		__field(rxrpc_seq_t, prepared)
 		__field(ktime_t, since_last_tx)
+		__field(ktime_t, rtt)
 		__field(bool, has_data)
 	),

 	TP_fast_assign(
 		__entry->call = call->debug_id;
-		__entry->mode = call->cong_mode;
+		__entry->ca_state = call->cong_ca_state;
 		__entry->cwnd = call->cong_cwnd;
 		__entry->extra = call->cong_extra;
 		__entry->hard_ack = call->acks_hard_ack;
-		__entry->prepared = call->tx_prepared - call->tx_bottom;
-		__entry->since_last_tx = ktime_sub(now, call->tx_last_sent);
-		__entry->has_data = !list_empty(&call->tx_sendmsg);
+		__entry->prepared = call->send_top - call->tx_bottom;
+		__entry->since_last_tx = since_last_tx;
+		__entry->rtt = rtt;
+		__entry->has_data = call->tx_bottom != call->tx_top;
 	),

-	TP_printk("c=%08x q=%08x %s cw=%u+%u pr=%u tm=%llu d=%u",
+	TP_printk("c=%08x q=%08x %s cw=%u+%u pr=%u tm=%llu/%llu d=%u",
 		__entry->call,
 		__entry->hard_ack,
-		__print_symbolic(__entry->mode, rxrpc_congest_modes),
+		__print_symbolic(__entry->ca_state, rxrpc_ca_states),
 		__entry->cwnd,
 		__entry->extra,
 		__entry->prepared,
-		ktime_to_ns(__entry->since_last_tx),
+		ktime_to_us(__entry->since_last_tx),
+		ktime_to_us(__entry->rtt),
 		__entry->has_data)
1667 ); 1833 1668 ··· 1913 1722 &__entry->srx.transport) 1914 1723 ); 1915 1724 1916 - TRACE_EVENT(rxrpc_resend, 1917 - TP_PROTO(struct rxrpc_call *call, struct sk_buff *ack), 1725 + TRACE_EVENT(rxrpc_apply_acks, 1726 + TP_PROTO(struct rxrpc_call *call, struct rxrpc_txqueue *tq), 1918 1727 1919 - TP_ARGS(call, ack), 1728 + TP_ARGS(call, tq), 1729 + 1730 + TP_STRUCT__entry( 1731 + __field(unsigned int, call) 1732 + __field(unsigned int, nr_rep) 1733 + __field(rxrpc_seq_t, qbase) 1734 + __field(unsigned long, acks) 1735 + ), 1736 + 1737 + TP_fast_assign( 1738 + __entry->call = call->debug_id; 1739 + __entry->qbase = tq->qbase; 1740 + __entry->acks = tq->segment_acked; 1741 + __entry->nr_rep = tq->nr_reported_acks; 1742 + ), 1743 + 1744 + TP_printk("c=%08x tq=%x acks=%016lx rep=%u", 1745 + __entry->call, 1746 + __entry->qbase, 1747 + __entry->acks, 1748 + __entry->nr_rep) 1749 + ); 1750 + 1751 + TRACE_EVENT(rxrpc_resend, 1752 + TP_PROTO(struct rxrpc_call *call, rxrpc_serial_t ack_serial), 1753 + 1754 + TP_ARGS(call, ack_serial), 1920 1755 1921 1756 TP_STRUCT__entry( 1922 1757 __field(unsigned int, call) ··· 1952 1735 ), 1953 1736 1954 1737 TP_fast_assign( 1955 - struct rxrpc_skb_priv *sp = ack ? rxrpc_skb(ack) : NULL; 1956 1738 __entry->call = call->debug_id; 1957 1739 __entry->seq = call->acks_hard_ack; 1958 1740 __entry->transmitted = call->tx_transmitted; 1959 - __entry->ack_serial = sp ? 
sp->hdr.serial : 0; 1741 + __entry->ack_serial = ack_serial; 1960 1742 ), 1961 1743 1962 1744 TP_printk("c=%08x r=%x q=%x tq=%x", ··· 1963 1747 __entry->ack_serial, 1964 1748 __entry->seq, 1965 1749 __entry->transmitted) 1750 + ); 1751 + 1752 + TRACE_EVENT(rxrpc_resend_lost, 1753 + TP_PROTO(struct rxrpc_call *call, struct rxrpc_txqueue *tq, unsigned long lost), 1754 + 1755 + TP_ARGS(call, tq, lost), 1756 + 1757 + TP_STRUCT__entry( 1758 + __field(unsigned int, call) 1759 + __field(rxrpc_seq_t, qbase) 1760 + __field(u8, nr_rep) 1761 + __field(unsigned long, lost) 1762 + ), 1763 + 1764 + TP_fast_assign( 1765 + __entry->call = call->debug_id; 1766 + __entry->qbase = tq->qbase; 1767 + __entry->nr_rep = tq->nr_reported_acks; 1768 + __entry->lost = lost; 1769 + ), 1770 + 1771 + TP_printk("c=%08x tq=%x lost=%016lx nr=%u", 1772 + __entry->call, 1773 + __entry->qbase, 1774 + __entry->lost, 1775 + __entry->nr_rep) 1776 + ); 1777 + 1778 + TRACE_EVENT(rxrpc_rotate, 1779 + TP_PROTO(struct rxrpc_call *call, struct rxrpc_txqueue *tq, 1780 + struct rxrpc_ack_summary *summary, rxrpc_seq_t seq, 1781 + enum rxrpc_rotate_trace trace), 1782 + 1783 + TP_ARGS(call, tq, summary, seq, trace), 1784 + 1785 + TP_STRUCT__entry( 1786 + __field(unsigned int, call) 1787 + __field(rxrpc_seq_t, qbase) 1788 + __field(rxrpc_seq_t, seq) 1789 + __field(unsigned int, nr_rep) 1790 + __field(enum rxrpc_rotate_trace, trace) 1791 + ), 1792 + 1793 + TP_fast_assign( 1794 + __entry->call = call->debug_id; 1795 + __entry->qbase = tq->qbase; 1796 + __entry->seq = seq; 1797 + __entry->nr_rep = tq->nr_reported_acks; 1798 + __entry->trace = trace; 1799 + ), 1800 + 1801 + TP_printk("c=%08x tq=%x q=%x nr=%x %s", 1802 + __entry->call, 1803 + __entry->qbase, 1804 + __entry->seq, 1805 + __entry->nr_rep, 1806 + __print_symbolic(__entry->trace, rxrpc_rotate_traces)) 1966 1807 ); 1967 1808 1968 1809 TRACE_EVENT(rxrpc_rx_icmp, ··· 2131 1858 ); 2132 1859 2133 1860 TRACE_EVENT(rxrpc_rx_discard_ack, 2134 - TP_PROTO(unsigned int 
debug_id, rxrpc_serial_t serial, 2135 - rxrpc_seq_t first_soft_ack, rxrpc_seq_t call_ackr_first, 2136 - rxrpc_seq_t prev_pkt, rxrpc_seq_t call_ackr_prev), 1861 + TP_PROTO(struct rxrpc_call *call, rxrpc_serial_t serial, 1862 + rxrpc_seq_t hard_ack, rxrpc_seq_t prev_pkt), 2137 1863 2138 - TP_ARGS(debug_id, serial, first_soft_ack, call_ackr_first, 2139 - prev_pkt, call_ackr_prev), 1864 + TP_ARGS(call, serial, hard_ack, prev_pkt), 2140 1865 2141 1866 TP_STRUCT__entry( 2142 1867 __field(unsigned int, debug_id) 2143 1868 __field(rxrpc_serial_t, serial) 2144 - __field(rxrpc_seq_t, first_soft_ack) 2145 - __field(rxrpc_seq_t, call_ackr_first) 1869 + __field(rxrpc_seq_t, hard_ack) 2146 1870 __field(rxrpc_seq_t, prev_pkt) 2147 - __field(rxrpc_seq_t, call_ackr_prev) 1871 + __field(rxrpc_seq_t, acks_hard_ack) 1872 + __field(rxrpc_seq_t, acks_prev_seq) 2148 1873 ), 2149 1874 2150 1875 TP_fast_assign( 2151 - __entry->debug_id = debug_id; 1876 + __entry->debug_id = call->debug_id; 2152 1877 __entry->serial = serial; 2153 - __entry->first_soft_ack = first_soft_ack; 2154 - __entry->call_ackr_first = call_ackr_first; 1878 + __entry->hard_ack = hard_ack; 2155 1879 __entry->prev_pkt = prev_pkt; 2156 - __entry->call_ackr_prev = call_ackr_prev; 1880 + __entry->acks_hard_ack = call->acks_hard_ack; 1881 + __entry->acks_prev_seq = call->acks_prev_seq; 2157 1882 ), 2158 1883 2159 1884 TP_printk("c=%08x r=%08x %08x<%08x %08x<%08x", 2160 1885 __entry->debug_id, 2161 1886 __entry->serial, 2162 - __entry->first_soft_ack, 2163 - __entry->call_ackr_first, 1887 + __entry->hard_ack, 1888 + __entry->acks_hard_ack, 2164 1889 __entry->prev_pkt, 2165 - __entry->call_ackr_prev) 1890 + __entry->acks_prev_seq) 2166 1891 ); 2167 1892 2168 1893 TRACE_EVENT(rxrpc_req_ack, ··· 2216 1945 __entry->seq, 2217 1946 __print_symbolic(__entry->what, rxrpc_txbuf_traces), 2218 1947 __entry->ref) 1948 + ); 1949 + 1950 + TRACE_EVENT(rxrpc_tq, 1951 + TP_PROTO(struct rxrpc_call *call, struct rxrpc_txqueue *tq, 1952 + 
rxrpc_seq_t seq, enum rxrpc_tq_trace trace), 1953 + 1954 + TP_ARGS(call, tq, seq, trace), 1955 + 1956 + TP_STRUCT__entry( 1957 + __field(unsigned int, call_debug_id) 1958 + __field(rxrpc_seq_t, qbase) 1959 + __field(rxrpc_seq_t, seq) 1960 + __field(enum rxrpc_tq_trace, trace) 1961 + ), 1962 + 1963 + TP_fast_assign( 1964 + __entry->call_debug_id = call->debug_id; 1965 + __entry->qbase = tq ? tq->qbase : call->tx_qbase; 1966 + __entry->seq = seq; 1967 + __entry->trace = trace; 1968 + ), 1969 + 1970 + TP_printk("c=%08x bq=%08x q=%08x %s", 1971 + __entry->call_debug_id, 1972 + __entry->qbase, 1973 + __entry->seq, 1974 + __print_symbolic(__entry->trace, rxrpc_tq_traces)) 2219 1975 ); 2220 1976 2221 1977 TRACE_EVENT(rxrpc_poke_call, ··· 2311 2013 __entry->seq, 2312 2014 __print_symbolic(__entry->what, rxrpc_sack_traces), 2313 2015 __entry->sack) 2016 + ); 2017 + 2018 + TRACE_EVENT(rxrpc_pmtud_tx, 2019 + TP_PROTO(struct rxrpc_call *call), 2020 + 2021 + TP_ARGS(call), 2022 + 2023 + TP_STRUCT__entry( 2024 + __field(unsigned int, peer_debug_id) 2025 + __field(unsigned int, call_debug_id) 2026 + __field(rxrpc_serial_t, ping_serial) 2027 + __field(unsigned short, pmtud_trial) 2028 + __field(unsigned short, pmtud_good) 2029 + __field(unsigned short, pmtud_bad) 2030 + ), 2031 + 2032 + TP_fast_assign( 2033 + __entry->peer_debug_id = call->peer->debug_id; 2034 + __entry->call_debug_id = call->debug_id; 2035 + __entry->ping_serial = call->conn->pmtud_probe; 2036 + __entry->pmtud_trial = call->peer->pmtud_trial; 2037 + __entry->pmtud_good = call->peer->pmtud_good; 2038 + __entry->pmtud_bad = call->peer->pmtud_bad; 2039 + ), 2040 + 2041 + TP_printk("P=%08x c=%08x pr=%08x %u-%u-%u", 2042 + __entry->peer_debug_id, 2043 + __entry->call_debug_id, 2044 + __entry->ping_serial, 2045 + __entry->pmtud_good, 2046 + __entry->pmtud_trial, 2047 + __entry->pmtud_bad) 2048 + ); 2049 + 2050 + TRACE_EVENT(rxrpc_pmtud_rx, 2051 + TP_PROTO(struct rxrpc_connection *conn, rxrpc_serial_t resp_serial), 2052 
+ 2053 + TP_ARGS(conn, resp_serial), 2054 + 2055 + TP_STRUCT__entry( 2056 + __field(unsigned int, peer_debug_id) 2057 + __field(unsigned int, call_debug_id) 2058 + __field(rxrpc_serial_t, ping_serial) 2059 + __field(rxrpc_serial_t, resp_serial) 2060 + __field(unsigned short, max_data) 2061 + __field(u8, jumbo_max) 2062 + ), 2063 + 2064 + TP_fast_assign( 2065 + __entry->peer_debug_id = conn->peer->debug_id; 2066 + __entry->call_debug_id = conn->pmtud_call; 2067 + __entry->ping_serial = conn->pmtud_probe; 2068 + __entry->resp_serial = resp_serial; 2069 + __entry->max_data = conn->peer->max_data; 2070 + __entry->jumbo_max = conn->peer->pmtud_jumbo; 2071 + ), 2072 + 2073 + TP_printk("P=%08x c=%08x pr=%08x rr=%08x max=%u jm=%u", 2074 + __entry->peer_debug_id, 2075 + __entry->call_debug_id, 2076 + __entry->ping_serial, 2077 + __entry->resp_serial, 2078 + __entry->max_data, 2079 + __entry->jumbo_max) 2080 + ); 2081 + 2082 + TRACE_EVENT(rxrpc_pmtud_lost, 2083 + TP_PROTO(struct rxrpc_connection *conn, rxrpc_serial_t resp_serial), 2084 + 2085 + TP_ARGS(conn, resp_serial), 2086 + 2087 + TP_STRUCT__entry( 2088 + __field(unsigned int, peer_debug_id) 2089 + __field(unsigned int, call_debug_id) 2090 + __field(rxrpc_serial_t, ping_serial) 2091 + __field(rxrpc_serial_t, resp_serial) 2092 + ), 2093 + 2094 + TP_fast_assign( 2095 + __entry->peer_debug_id = conn->peer->debug_id; 2096 + __entry->call_debug_id = conn->pmtud_call; 2097 + __entry->ping_serial = conn->pmtud_probe; 2098 + __entry->resp_serial = resp_serial; 2099 + ), 2100 + 2101 + TP_printk("P=%08x c=%08x pr=%08x rr=%08x", 2102 + __entry->peer_debug_id, 2103 + __entry->call_debug_id, 2104 + __entry->ping_serial, 2105 + __entry->resp_serial) 2106 + ); 2107 + 2108 + TRACE_EVENT(rxrpc_pmtud_reduce, 2109 + TP_PROTO(struct rxrpc_peer *peer, rxrpc_serial_t serial, 2110 + unsigned int max_data, enum rxrpc_pmtud_reduce_trace reason), 2111 + 2112 + TP_ARGS(peer, serial, max_data, reason), 2113 + 2114 + TP_STRUCT__entry( 2115 + 
__field(unsigned int, peer_debug_id) 2116 + __field(rxrpc_serial_t, serial) 2117 + __field(unsigned int, max_data) 2118 + __field(enum rxrpc_pmtud_reduce_trace, reason) 2119 + ), 2120 + 2121 + TP_fast_assign( 2122 + __entry->peer_debug_id = peer->debug_id; 2123 + __entry->serial = serial; 2124 + __entry->max_data = max_data; 2125 + __entry->reason = reason; 2126 + ), 2127 + 2128 + TP_printk("P=%08x %s r=%08x m=%u", 2129 + __entry->peer_debug_id, 2130 + __print_symbolic(__entry->reason, rxrpc_pmtud_reduce_traces), 2131 + __entry->serial, __entry->max_data) 2132 + ); 2133 + 2134 + TRACE_EVENT(rxrpc_rack, 2135 + TP_PROTO(struct rxrpc_call *call, ktime_t timo), 2136 + 2137 + TP_ARGS(call, timo), 2138 + 2139 + TP_STRUCT__entry( 2140 + __field(unsigned int, call) 2141 + __field(rxrpc_serial_t, ack_serial) 2142 + __field(rxrpc_seq_t, seq) 2143 + __field(enum rxrpc_rack_timer_mode, mode) 2144 + __field(unsigned short, nr_sent) 2145 + __field(unsigned short, nr_lost) 2146 + __field(unsigned short, nr_resent) 2147 + __field(unsigned short, nr_sacked) 2148 + __field(ktime_t, timo) 2149 + ), 2150 + 2151 + TP_fast_assign( 2152 + __entry->call = call->debug_id; 2153 + __entry->ack_serial = call->rx_serial; 2154 + __entry->seq = call->rack_end_seq; 2155 + __entry->mode = call->rack_timer_mode; 2156 + __entry->nr_sent = call->tx_nr_sent; 2157 + __entry->nr_lost = call->tx_nr_lost; 2158 + __entry->nr_resent = call->tx_nr_resent; 2159 + __entry->nr_sacked = call->acks_nr_sacks; 2160 + __entry->timo = timo; 2161 + ), 2162 + 2163 + TP_printk("c=%08x r=%08x q=%08x %s slrs=%u,%u,%u,%u t=%lld", 2164 + __entry->call, __entry->ack_serial, __entry->seq, 2165 + __print_symbolic(__entry->mode, rxrpc_rack_timer_modes), 2166 + __entry->nr_sent, __entry->nr_lost, 2167 + __entry->nr_resent, __entry->nr_sacked, 2168 + ktime_to_us(__entry->timo)) 2169 + ); 2170 + 2171 + TRACE_EVENT(rxrpc_rack_update, 2172 + TP_PROTO(struct rxrpc_call *call, struct rxrpc_ack_summary *summary), 2173 + 2174 + 
TP_ARGS(call, summary), 2175 + 2176 + TP_STRUCT__entry( 2177 + __field(unsigned int, call) 2178 + __field(rxrpc_serial_t, ack_serial) 2179 + __field(rxrpc_seq_t, seq) 2180 + __field(int, xmit_ts) 2181 + ), 2182 + 2183 + TP_fast_assign( 2184 + __entry->call = call->debug_id; 2185 + __entry->ack_serial = call->rx_serial; 2186 + __entry->seq = call->rack_end_seq; 2187 + __entry->xmit_ts = ktime_sub(call->acks_latest_ts, call->rack_xmit_ts); 2188 + ), 2189 + 2190 + TP_printk("c=%08x r=%08x q=%08x xt=%lld", 2191 + __entry->call, __entry->ack_serial, __entry->seq, 2192 + ktime_to_us(__entry->xmit_ts)) 2193 + ); 2194 + 2195 + TRACE_EVENT(rxrpc_rack_scan_loss, 2196 + TP_PROTO(struct rxrpc_call *call), 2197 + 2198 + TP_ARGS(call), 2199 + 2200 + TP_STRUCT__entry( 2201 + __field(unsigned int, call) 2202 + __field(ktime_t, rack_rtt) 2203 + __field(ktime_t, rack_reo_wnd) 2204 + ), 2205 + 2206 + TP_fast_assign( 2207 + __entry->call = call->debug_id; 2208 + __entry->rack_rtt = call->rack_rtt; 2209 + __entry->rack_reo_wnd = call->rack_reo_wnd; 2210 + ), 2211 + 2212 + TP_printk("c=%08x rtt=%lld reow=%lld", 2213 + __entry->call, ktime_to_us(__entry->rack_rtt), 2214 + ktime_to_us(__entry->rack_reo_wnd)) 2215 + ); 2216 + 2217 + TRACE_EVENT(rxrpc_rack_scan_loss_tq, 2218 + TP_PROTO(struct rxrpc_call *call, const struct rxrpc_txqueue *tq, 2219 + unsigned long nacks), 2220 + 2221 + TP_ARGS(call, tq, nacks), 2222 + 2223 + TP_STRUCT__entry( 2224 + __field(unsigned int, call) 2225 + __field(rxrpc_seq_t, qbase) 2226 + __field(unsigned long, nacks) 2227 + __field(unsigned long, lost) 2228 + __field(unsigned long, retrans) 2229 + ), 2230 + 2231 + TP_fast_assign( 2232 + __entry->call = call->debug_id; 2233 + __entry->qbase = tq->qbase; 2234 + __entry->nacks = nacks; 2235 + __entry->lost = tq->segment_lost; 2236 + __entry->retrans = tq->segment_retransmitted; 2237 + ), 2238 + 2239 + TP_printk("c=%08x q=%08x n=%lx l=%lx r=%lx", 2240 + __entry->call, __entry->qbase, 2241 + __entry->nacks, 
__entry->lost, __entry->retrans) 2242 + ); 2243 + 2244 + TRACE_EVENT(rxrpc_rack_detect_loss, 2245 + TP_PROTO(struct rxrpc_call *call, struct rxrpc_ack_summary *summary, 2246 + rxrpc_seq_t seq), 2247 + 2248 + TP_ARGS(call, summary, seq), 2249 + 2250 + TP_STRUCT__entry( 2251 + __field(unsigned int, call) 2252 + __field(rxrpc_serial_t, ack_serial) 2253 + __field(rxrpc_seq_t, seq) 2254 + ), 2255 + 2256 + TP_fast_assign( 2257 + __entry->call = call->debug_id; 2258 + __entry->ack_serial = call->rx_serial; 2259 + __entry->seq = seq; 2260 + ), 2261 + 2262 + TP_printk("c=%08x r=%08x q=%08x", 2263 + __entry->call, __entry->ack_serial, __entry->seq) 2264 + ); 2265 + 2266 + TRACE_EVENT(rxrpc_rack_mark_loss_tq, 2267 + TP_PROTO(struct rxrpc_call *call, const struct rxrpc_txqueue *tq), 2268 + 2269 + TP_ARGS(call, tq), 2270 + 2271 + TP_STRUCT__entry( 2272 + __field(unsigned int, call) 2273 + __field(rxrpc_seq_t, qbase) 2274 + __field(rxrpc_seq_t, trans) 2275 + __field(unsigned long, acked) 2276 + __field(unsigned long, lost) 2277 + __field(unsigned long, retrans) 2278 + ), 2279 + 2280 + TP_fast_assign( 2281 + __entry->call = call->debug_id; 2282 + __entry->qbase = tq->qbase; 2283 + __entry->trans = call->tx_transmitted; 2284 + __entry->acked = tq->segment_acked; 2285 + __entry->lost = tq->segment_lost; 2286 + __entry->retrans = tq->segment_retransmitted; 2287 + ), 2288 + 2289 + TP_printk("c=%08x tq=%08x txq=%08x a=%lx l=%lx r=%lx", 2290 + __entry->call, __entry->qbase, __entry->trans, 2291 + __entry->acked, __entry->lost, __entry->retrans) 2292 + ); 2293 + 2294 + TRACE_EVENT(rxrpc_tlp_probe, 2295 + TP_PROTO(struct rxrpc_call *call, enum rxrpc_tlp_probe_trace trace), 2296 + 2297 + TP_ARGS(call, trace), 2298 + 2299 + TP_STRUCT__entry( 2300 + __field(unsigned int, call) 2301 + __field(rxrpc_serial_t, serial) 2302 + __field(rxrpc_seq_t, seq) 2303 + __field(enum rxrpc_tlp_probe_trace, trace) 2304 + ), 2305 + 2306 + TP_fast_assign( 2307 + __entry->call = call->debug_id; 2308 + 
__entry->serial = call->tlp_serial; 2309 + __entry->seq = call->tlp_seq; 2310 + __entry->trace = trace; 2311 + ), 2312 + 2313 + TP_printk("c=%08x r=%08x pq=%08x %s", 2314 + __entry->call, __entry->serial, __entry->seq, 2315 + __print_symbolic(__entry->trace, rxrpc_tlp_probe_traces)) 2316 + ); 2317 + 2318 + TRACE_EVENT(rxrpc_tlp_ack, 2319 + TP_PROTO(struct rxrpc_call *call, struct rxrpc_ack_summary *summary, 2320 + enum rxrpc_tlp_ack_trace trace), 2321 + 2322 + TP_ARGS(call, summary, trace), 2323 + 2324 + TP_STRUCT__entry( 2325 + __field(unsigned int, call) 2326 + __field(rxrpc_serial_t, serial) 2327 + __field(rxrpc_seq_t, tlp_seq) 2328 + __field(rxrpc_seq_t, hard_ack) 2329 + __field(enum rxrpc_tlp_ack_trace, trace) 2330 + ), 2331 + 2332 + TP_fast_assign( 2333 + __entry->call = call->debug_id; 2334 + __entry->serial = call->tlp_serial; 2335 + __entry->tlp_seq = call->tlp_seq; 2336 + __entry->hard_ack = call->acks_hard_ack; 2337 + __entry->trace = trace; 2338 + ), 2339 + 2340 + TP_printk("c=%08x r=%08x pq=%08x hq=%08x %s", 2341 + __entry->call, __entry->serial, 2342 + __entry->tlp_seq, __entry->hard_ack, 2343 + __print_symbolic(__entry->trace, rxrpc_tlp_ack_traces)) 2344 + ); 2345 + 2346 + TRACE_EVENT(rxrpc_rack_timer, 2347 + TP_PROTO(struct rxrpc_call *call, ktime_t delay, bool exp), 2348 + 2349 + TP_ARGS(call, delay, exp), 2350 + 2351 + TP_STRUCT__entry( 2352 + __field(unsigned int, call) 2353 + __field(bool, exp) 2354 + __field(enum rxrpc_rack_timer_mode, mode) 2355 + __field(ktime_t, delay) 2356 + ), 2357 + 2358 + TP_fast_assign( 2359 + __entry->call = call->debug_id; 2360 + __entry->exp = exp; 2361 + __entry->mode = call->rack_timer_mode; 2362 + __entry->delay = delay; 2363 + ), 2364 + 2365 + TP_printk("c=%08x %s %s to=%lld", 2366 + __entry->call, 2367 + __entry->exp ? "Exp" : "Set", 2368 + __print_symbolic(__entry->mode, rxrpc_rack_timer_modes), 2369 + ktime_to_us(__entry->delay)) 2314 2370 ); 2315 2371 2316 2372 #undef EM
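Several of the new trace events (rxrpc_apply_acks, rxrpc_resend_lost, rxrpc_rack_scan_loss_tq, rxrpc_rack_mark_loss_tq) print a queue base alongside one or more per-segment bitmasks. A minimal sketch of decoding such a mask offline follows. It assumes that bit i of a mask corresponds to sequence number qbase + i and that unsigned long is 64 bits wide; neither is stated by the trace format itself. The helper name decode_tq_mask and its nr parameter are hypothetical and not part of the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed width of the per-queue bitmasks (BITS_PER_LONG on a 64-bit build). */
#define NR_TXQUEUE 64u

/*
 * Hypothetical helper: list the sequence numbers whose bit is set in a
 * traced mask, looking only at the nr low-order bits.  Assumes bit i
 * maps to sequence qbase + i.  Returns the number of bits found set.
 */
static unsigned int decode_tq_mask(uint32_t qbase, uint64_t mask, unsigned int nr)
{
	unsigned int count = 0;

	assert(nr <= NR_TXQUEUE);
	for (unsigned int i = 0; i < nr; i++) {
		if (mask & (UINT64_C(1) << i)) {
			printf("seq %08x\n", (unsigned int)(qbase + i));
			count++;
		}
	}
	return count;
}
```

For example, a line such as "tq=40 acks=0000000000000005" would, under these assumptions, name sequences 0x40 and 0x42.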
+1
lib/win_minmax.c
···

 	return minmax_subwin_update(m, win, &val);
 }
+EXPORT_SYMBOL(minmax_running_min);
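Exporting minmax_running_min() lets modular code use the windowed-minimum filter; elsewhere in this series the call structure gains a struct minmax min_rtt field for an estimated minimum RTT. The idea of a windowed minimum can be sketched in user space as below. This is a deliberately simplified illustration, not the algorithm in lib/win_minmax.c: the kernel keeps several candidate samples, whereas this sketch keeps only one. The names win_min and win_min_update are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct minmax. */
struct win_min {
	uint32_t t;	/* Time at which the current minimum was measured */
	uint32_t v;	/* Current windowed minimum */
	int valid;	/* Non-zero once a sample has been recorded */
};

/*
 * Feed one measurement taken at time 'now' and return the running minimum.
 * The stored minimum is replaced when the new value is no larger, or when
 * the stored one is older than 'win' time units.
 */
static uint32_t win_min_update(struct win_min *m, uint32_t win,
			       uint32_t now, uint32_t val)
{
	assert(win > 0);
	if (!m->valid || val <= m->v || now - m->t > win) {
		m->t = now;
		m->v = val;
		m->valid = 1;
	}
	return m->v;
}
```

Because only one sample is kept, an expired minimum is replaced by whatever value arrives next rather than by the best value still inside the window.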
+1
net/rxrpc/Makefile
···
 	conn_object.o \
 	conn_service.o \
 	input.o \
+	input_rack.o \
 	insecure.o \
 	io_thread.o \
 	key.o \
+2 -2
net/rxrpc/af_rxrpc.c
···

 		/* Make sure we're not going to call back into a kernel service */
 		if (call->notify_rx) {
-			spin_lock(&call->notify_lock);
+			spin_lock_irq(&call->notify_lock);
 			call->notify_rx = rxrpc_dummy_notify_rx;
-			spin_unlock(&call->notify_lock);
+			spin_unlock_irq(&call->notify_lock);
 		}
 	}
 	mutex_unlock(&call->user_mutex);

+268 -73
net/rxrpc/ar-internal.h
··· 30 30 struct key_preparsed_payload; 31 31 struct rxrpc_connection; 32 32 struct rxrpc_txbuf; 33 + struct rxrpc_txqueue; 33 34 34 35 /* 35 36 * Mark applied to socket buffers in skb->mark. skb->priority is used ··· 99 98 atomic_t stat_tx_data_send; 100 99 atomic_t stat_tx_data_send_frag; 101 100 atomic_t stat_tx_data_send_fail; 101 + atomic_t stat_tx_data_send_msgsize; 102 102 atomic_t stat_tx_data_underflow; 103 103 atomic_t stat_tx_data_cwnd_reset; 104 104 atomic_t stat_rx_data; ··· 111 109 atomic_t stat_tx_ack_skip; 112 110 atomic_t stat_tx_acks[256]; 113 111 atomic_t stat_rx_acks[256]; 112 + atomic_t stat_tx_jumbo[10]; 113 + atomic_t stat_rx_jumbo[10]; 114 114 115 115 atomic_t stat_why_req_ack[8]; 116 116 ··· 214 210 rxrpc_seq_t first_ack; /* First packet in acks table */ 215 211 rxrpc_seq_t prev_ack; /* Highest seq seen */ 216 212 rxrpc_serial_t acked_serial; /* Packet in response to (or 0) */ 213 + u16 nr_acks; /* Number of acks+nacks */ 217 214 u8 reason; /* Reason for ack */ 218 - u8 nr_acks; /* Number of acks+nacks */ 219 - u8 nr_nacks; /* Number of nacks */ 220 215 } ack; 221 216 }; 222 217 struct rxrpc_host_header hdr; /* RxRPC packet header from this packet */ ··· 323 320 struct list_head new_client_calls; /* Newly created client calls need connection */ 324 321 spinlock_t client_call_lock; /* Lock for ->new_client_calls */ 325 322 struct sockaddr_rxrpc srx; /* local address */ 323 + /* Provide a kvec table sufficiently large to manage either a DATA 324 + * packet with a maximum set of jumbo subpackets or a PING ACK padded 325 + * out to 64K with zeropages for PMTUD. 326 + */ 327 + struct kvec kvec[RXRPC_MAX_NR_JUMBO > 3 + 16 ? 
328 + RXRPC_MAX_NR_JUMBO : 3 + 16]; 326 329 }; 327 330 328 331 /* ··· 347 338 time64_t last_tx_at; /* Last time packet sent here */ 348 339 seqlock_t service_conn_lock; 349 340 spinlock_t lock; /* access lock */ 350 - unsigned int if_mtu; /* interface MTU for this peer */ 351 - unsigned int mtu; /* network MTU for this peer */ 352 - unsigned int maxdata; /* data size (MTU - hdrsize) */ 353 - unsigned short hdrsize; /* header size (IP + UDP + RxRPC) */ 354 341 int debug_id; /* debug ID for printks */ 355 342 struct sockaddr_rxrpc srx; /* remote address */ 356 343 357 - /* calculated RTT cache */ 358 - #define RXRPC_RTT_CACHE_SIZE 32 359 - spinlock_t rtt_input_lock; /* RTT lock for input routine */ 360 - ktime_t rtt_last_req; /* Time of last RTT request */ 361 - unsigned int rtt_count; /* Number of samples we've got */ 344 + /* Path MTU discovery [RFC8899] */ 345 + unsigned int pmtud_trial; /* Current MTU probe size */ 346 + unsigned int pmtud_good; /* Largest working MTU probe we've tried */ 347 + unsigned int pmtud_bad; /* Smallest non-working MTU probe we've tried */ 348 + bool pmtud_lost; /* T if MTU probe was lost */ 349 + bool pmtud_probing; /* T if we have an active probe outstanding */ 350 + bool pmtud_pending; /* T if a call to this peer should send a probe */ 351 + u8 pmtud_jumbo; /* Max jumbo packets for the MTU */ 352 + bool ackr_adv_pmtud; /* T if the peer advertises path-MTU */ 353 + unsigned int ackr_max_data; /* Maximum data advertised by peer */ 354 + seqcount_t mtu_lock; /* Lockless MTU access management */ 355 + unsigned int if_mtu; /* Local interface MTU (- hdrsize) for this peer */ 356 + unsigned int max_data; /* Maximum packet data capacity for this peer */ 357 + unsigned short hdrsize; /* header size (IP + UDP + RxRPC) */ 358 + unsigned short tx_seg_max; /* Maximum number of transmissable segments */ 362 359 363 - u32 srtt_us; /* smoothed round trip time << 3 in usecs */ 364 - u32 mdev_us; /* medium deviation */ 365 - u32 mdev_max_us; /* 
maximal mdev for the last rtt period */ 366 - u32 rttvar_us; /* smoothed mdev_max */ 367 - u32 rto_us; /* Retransmission timeout in usec */ 368 - u8 backoff; /* Backoff timeout (as shift) */ 360 + /* Calculated RTT cache */ 361 + unsigned int recent_srtt_us; 362 + unsigned int recent_rto_us; 369 363 370 364 u8 cong_ssthresh; /* Congestion slow-start threshold */ 371 365 }; ··· 537 525 int debug_id; /* debug ID for printks */ 538 526 rxrpc_serial_t tx_serial; /* Outgoing packet serial number counter */ 539 527 unsigned int hi_serial; /* highest serial number received */ 528 + rxrpc_serial_t pmtud_probe; /* Serial of MTU probe (or 0) */ 529 + unsigned int pmtud_call; /* ID of call used for probe */ 540 530 u32 service_id; /* Service ID, possibly upgraded */ 541 531 u32 security_level; /* Security level selected */ 542 532 u8 security_ix; /* security type */ ··· 613 599 /* 614 600 * Call Tx congestion management modes. 615 601 */ 616 - enum rxrpc_congest_mode { 617 - RXRPC_CALL_SLOW_START, 618 - RXRPC_CALL_CONGEST_AVOIDANCE, 619 - RXRPC_CALL_PACKET_LOSS, 620 - RXRPC_CALL_FAST_RETRANSMIT, 621 - NR__RXRPC_CONGEST_MODES 622 - }; 602 + enum rxrpc_ca_state { 603 + RXRPC_CA_SLOW_START, 604 + RXRPC_CA_CONGEST_AVOIDANCE, 605 + RXRPC_CA_PACKET_LOSS, 606 + RXRPC_CA_FAST_RETRANSMIT, 607 + NR__RXRPC_CA_STATES 608 + } __mode(byte); 609 + 610 + /* 611 + * Current purpose of call RACK timer. According to the RACK-TLP protocol 612 + * [RFC8985], the transmission timer (call->rack_timo_at) may only be used for 613 + * one of these at once. 
614 + */ 615 + enum rxrpc_rack_timer_mode { 616 + RXRPC_CALL_RACKTIMER_OFF, /* Timer not running */ 617 + RXRPC_CALL_RACKTIMER_RACK_REORDER, /* RACK reordering timer */ 618 + RXRPC_CALL_RACKTIMER_TLP_PTO, /* TLP timeout */ 619 + RXRPC_CALL_RACKTIMER_RTO, /* Retransmission timeout */ 620 + } __mode(byte); 623 621 624 622 /* 625 623 * RxRPC call definition ··· 650 624 struct mutex user_mutex; /* User access mutex */ 651 625 struct sockaddr_rxrpc dest_srx; /* Destination address */ 652 626 ktime_t delay_ack_at; /* When DELAY ACK needs to happen */ 653 - ktime_t ack_lost_at; /* When ACK is figured as lost */ 654 - ktime_t resend_at; /* When next resend needs to happen */ 627 + ktime_t rack_timo_at; /* When ACK is figured as lost */ 655 628 ktime_t ping_at; /* When next to send a ping */ 656 629 ktime_t keepalive_at; /* When next to send a keepalive ping */ 657 630 ktime_t expect_rx_by; /* When we expect to get a packet by */ ··· 695 670 unsigned short rx_pkt_offset; /* Current recvmsg packet offset */ 696 671 unsigned short rx_pkt_len; /* Current recvmsg packet len */ 697 672 673 + /* Sendmsg data tracking. */ 674 + rxrpc_seq_t send_top; /* Highest Tx slot filled by sendmsg. */ 675 + struct rxrpc_txqueue *send_queue; /* Queue that sendmsg is writing into */ 676 + 698 677 /* Transmitted data tracking. */ 699 - spinlock_t tx_lock; /* Transmit queue lock */ 700 - struct list_head tx_sendmsg; /* Sendmsg prepared packets */ 701 - struct list_head tx_buffer; /* Buffer of transmissible packets */ 678 + struct rxrpc_txqueue *tx_queue; /* Start of transmission buffers */ 679 + struct rxrpc_txqueue *tx_qtail; /* End of transmission buffers */ 680 + rxrpc_seq_t tx_qbase; /* First slot in tx_queue */ 702 681 rxrpc_seq_t tx_bottom; /* First packet in buffer */ 703 682 rxrpc_seq_t tx_transmitted; /* Highest packet transmitted */ 704 - rxrpc_seq_t tx_prepared; /* Highest Tx slot prepared. */ 705 683 rxrpc_seq_t tx_top; /* Highest Tx slot allocated. 
*/ 684 + rxrpc_serial_t tx_last_serial; /* Serial of last DATA transmitted */ 706 685 u16 tx_backoff; /* Delay to insert due to Tx failure (ms) */ 707 - u8 tx_winsize; /* Maximum size of Tx window */ 686 + u16 tx_nr_sent; /* Number of packets sent, but unacked */ 687 + u16 tx_nr_lost; /* Number of packets marked lost */ 688 + u16 tx_nr_resent; /* Number of packets resent, but unacked */ 689 + u16 tx_winsize; /* Maximum size of Tx window */ 708 690 #define RXRPC_TX_MAX_WINDOW 128 691 + u8 tx_jumbo_max; /* Maximum subpkts peer will accept */ 709 692 ktime_t tx_last_sent; /* Last time a transmission occurred */ 710 693 711 694 /* Received data tracking */ 712 695 struct sk_buff_head recvmsg_queue; /* Queue of packets ready for recvmsg() */ 696 + struct sk_buff_head rx_queue; /* Queue of packets for this call to receive */ 713 697 struct sk_buff_head rx_oos_queue; /* Queue of out of sequence packets */ 714 698 715 699 rxrpc_seq_t rx_highest_seq; /* Higest sequence number received */ ··· 732 698 */ 733 699 #define RXRPC_TX_SMSS RXRPC_JUMBO_DATALEN 734 700 #define RXRPC_MIN_CWND 4 735 - u8 cong_cwnd; /* Congestion window size */ 701 + enum rxrpc_ca_state cong_ca_state; /* Congestion control state */ 736 702 u8 cong_extra; /* Extra to send for congestion management */ 737 - u8 cong_ssthresh; /* Slow-start threshold */ 738 - enum rxrpc_congest_mode cong_mode:8; /* Congestion management mode */ 739 - u8 cong_dup_acks; /* Count of ACKs showing missing packets */ 740 - u8 cong_cumul_acks; /* Cumulative ACK count */ 703 + u16 cong_cwnd; /* Congestion window size */ 704 + u16 cong_ssthresh; /* Slow-start threshold */ 705 + u16 cong_dup_acks; /* Count of ACKs showing missing packets */ 706 + u16 cong_cumul_acks; /* Cumulative ACK count */ 741 707 ktime_t cong_tstamp; /* Last time cwnd was changed */ 742 - struct sk_buff *cong_last_nack; /* Last ACK with nacks received */ 708 + 709 + /* RACK-TLP [RFC8985] state. 
*/ 710 + ktime_t rack_xmit_ts; /* Latest transmission timestamp */ 711 + ktime_t rack_rtt; /* RTT of most recently ACK'd segment */ 712 + ktime_t rack_rtt_ts; /* Timestamp of rack_rtt */ 713 + ktime_t rack_reo_wnd; /* Reordering window */ 714 + unsigned int rack_reo_wnd_mult; /* Multiplier applied to rack_reo_wnd */ 715 + int rack_reo_wnd_persist; /* Num loss recoveries before reset reo_wnd */ 716 + rxrpc_seq_t rack_fack; /* Highest sequence so far ACK'd */ 717 + rxrpc_seq_t rack_end_seq; /* Highest sequence seen */ 718 + rxrpc_seq_t rack_dsack_round; /* DSACK opt recv'd in latest roundtrip */ 719 + bool rack_dsack_round_none; /* T if dsack_round is "None" */ 720 + bool rack_reordering_seen; /* T if detected reordering event */ 721 + enum rxrpc_rack_timer_mode rack_timer_mode; /* Current mode of RACK timer */ 722 + bool tlp_is_retrans; /* T if unacked TLP retransmission */ 723 + rxrpc_serial_t tlp_serial; /* Serial of TLP probe (or 0 if none in progress) */ 724 + rxrpc_seq_t tlp_seq; /* Sequence of TLP probe */ 725 + unsigned int tlp_rtt_taken; /* Last time RTT taken */ 726 + ktime_t tlp_max_ack_delay; /* Sender budget for max delayed ACK interval */ 743 727 744 728 /* Receive-phase ACK management (ACKs we send). */ 745 729 u8 ackr_reason; /* reason to ACK */ ··· 782 730 783 731 /* Transmission-phase ACK management (ACKs we've received). 
*/ 784 732 ktime_t acks_latest_ts; /* Timestamp of latest ACK received */ 785 - rxrpc_seq_t acks_first_seq; /* first sequence number received */ 733 + rxrpc_seq_t acks_hard_ack; /* Highest sequence hard acked */ 786 734 rxrpc_seq_t acks_prev_seq; /* Highest previousPacket received */ 787 - rxrpc_seq_t acks_hard_ack; /* Latest hard-ack point */ 788 735 rxrpc_seq_t acks_lowest_nak; /* Lowest NACK in the buffer (or ==tx_hard_ack) */ 789 736 rxrpc_serial_t acks_highest_serial; /* Highest serial number ACK'd */ 737 + unsigned short acks_nr_sacks; /* Number of soft acks recorded */ 738 + unsigned short acks_nr_snacks; /* Number of soft nacks recorded */ 739 + 740 + /* Calculated RTT cache */ 741 + ktime_t rtt_last_req; /* Time of last RTT request */ 742 + unsigned int rtt_count; /* Number of samples we've got */ 743 + unsigned int rtt_taken; /* Number of samples taken (wrapping) */ 744 + struct minmax min_rtt; /* Estimated minimum RTT */ 745 + u32 srtt_us; /* smoothed round trip time << 3 in usecs */ 746 + u32 mdev_us; /* medium deviation */ 747 + u32 mdev_max_us; /* maximal mdev for the last rtt period */ 748 + u32 rttvar_us; /* smoothed mdev_max */ 749 + u32 rto_us; /* Retransmission timeout in usec */ 750 + u8 backoff; /* Backoff timeout (as shift) */ 790 751 }; 791 752 792 753 /* 793 754 * Summary of a new ACK and the changes it made to the Tx buffer packet states. 
794 755 */ 795 756 struct rxrpc_ack_summary { 796 - u16 nr_acks; /* Number of ACKs in packet */ 797 - u16 nr_new_acks; /* Number of new ACKs in packet */ 798 - u16 nr_new_nacks; /* Number of new nacks in packet */ 799 - u16 nr_retained_nacks; /* Number of nacks retained between ACKs */ 800 - u8 ack_reason; 801 - bool saw_nacks; /* Saw NACKs in packet */ 802 - bool new_low_nack; /* T if new low NACK found */ 803 - bool retrans_timeo; /* T if reTx due to timeout happened */ 804 - u8 flight_size; /* Number of unreceived transmissions */ 805 - /* Place to stash values for tracing */ 806 - enum rxrpc_congest_mode mode:8; 807 - u8 cwnd; 808 - u8 ssthresh; 809 - u8 dup_acks; 810 - u8 cumulative_acks; 757 + rxrpc_serial_t ack_serial; /* Serial number of ACK */ 758 + rxrpc_serial_t acked_serial; /* Serial number ACK'd */ 759 + u16 in_flight; /* Number of unreceived transmissions */ 760 + u16 nr_new_hacks; /* Number of rotated new ACKs */ 761 + u16 nr_new_sacks; /* Number of new soft ACKs in packet */ 762 + u16 nr_new_snacks; /* Number of new soft nacks in packet */ 763 + u8 ack_reason; 764 + bool new_low_snack:1; /* T if new low soft NACK found */ 765 + bool retrans_timeo:1; /* T if reTx due to timeout happened */ 766 + bool need_retransmit:1; /* T if we need transmission */ 767 + bool rtt_sample_avail:1; /* T if RTT sample available */ 768 + bool in_fast_or_rto_recovery:1; 769 + bool exiting_fast_or_rto_recovery:1; 770 + bool tlp_probe_acked:1; /* T if the TLP probe seq was acked */ 771 + u8 /*enum rxrpc_congest_change*/ change; 811 772 }; 812 773 813 774 /* ··· 858 793 * Buffer of data to be output as a packet. 
859 794 */ 860 795 struct rxrpc_txbuf { 861 - struct list_head call_link; /* Link in call->tx_sendmsg/tx_buffer */ 862 - struct list_head tx_link; /* Link in live Enc queue or Tx queue */ 863 - ktime_t last_sent; /* Time at which last transmitted */ 864 796 refcount_t ref; 865 797 rxrpc_seq_t seq; /* Sequence number of this packet */ 866 798 rxrpc_serial_t serial; /* Last serial number transmitted with */ 867 799 unsigned int call_debug_id; 868 800 unsigned int debug_id; 869 - unsigned int len; /* Amount of data in buffer */ 870 - unsigned int space; /* Remaining data space */ 871 - unsigned int offset; /* Offset of fill point */ 801 + unsigned short len; /* Amount of data in buffer */ 802 + unsigned short space; /* Remaining data space */ 803 + unsigned short offset; /* Offset of fill point */ 804 + unsigned short pkt_len; /* Size of packet content */ 805 + unsigned short alloc_size; /* Amount of bufferage allocated */ 872 806 unsigned int flags; 873 807 #define RXRPC_TXBUF_WIRE_FLAGS 0xff /* The wire protocol flags */ 874 808 #define RXRPC_TXBUF_RESENT 0x100 /* Set if has been resent */ 875 809 __be16 cksum; /* Checksum to go in header */ 876 - unsigned short ack_rwind; /* ACK receive window */ 877 - u8 /*enum rxrpc_propose_ack_trace*/ ack_why; /* If ack, why */ 810 + bool jumboable; /* Can be non-terminal jumbo subpacket */ 878 811 u8 nr_kvec; /* Amount of kvec[] used */ 879 - struct kvec kvec[3]; 812 + struct kvec kvec[1]; 880 813 }; 881 814 882 815 static inline bool rxrpc_sending_to_server(const struct rxrpc_txbuf *txb) ··· 886 823 { 887 824 return !rxrpc_sending_to_server(txb); 888 825 } 826 + 827 + /* 828 + * Transmit queue element, including RACK [RFC8985] per-segment metadata. The 829 + * transmission timestamp is in usec from the base. 830 + */ 831 + struct rxrpc_txqueue { 832 + /* Start with the members we want to prefetch. 
*/ 833 + struct rxrpc_txqueue *next; 834 + ktime_t xmit_ts_base; 835 + rxrpc_seq_t qbase; 836 + u8 nr_reported_acks; /* Number of segments explicitly acked/nacked */ 837 + unsigned long segment_acked; /* Bit-per-buf: Set if ACK'd */ 838 + unsigned long segment_lost; /* Bit-per-buf: Set if declared lost */ 839 + unsigned long segment_retransmitted; /* Bit-per-buf: Set if retransmitted */ 840 + unsigned long rtt_samples; /* Bit-per-buf: Set if available for RTT */ 841 + unsigned long ever_retransmitted; /* Bit-per-buf: Set if ever retransmitted */ 842 + 843 + /* The arrays we want to pack into as few cache lines as possible. */ 844 + struct { 845 + #define RXRPC_NR_TXQUEUE BITS_PER_LONG 846 + #define RXRPC_TXQ_MASK (RXRPC_NR_TXQUEUE - 1) 847 + struct rxrpc_txbuf *bufs[RXRPC_NR_TXQUEUE]; 848 + unsigned int segment_serial[RXRPC_NR_TXQUEUE]; 849 + unsigned int segment_xmit_ts[RXRPC_NR_TXQUEUE]; 850 + } ____cacheline_aligned; 851 + }; 852 + 853 + /* 854 + * Data transmission request. 855 + */ 856 + struct rxrpc_send_data_req { 857 + ktime_t now; /* Current time */ 858 + struct rxrpc_txqueue *tq; /* Tx queue segment holding first DATA */ 859 + rxrpc_seq_t seq; /* Sequence of first data */ 860 + int n; /* Number of DATA packets to glue into jumbo */ 861 + bool retrans; /* T if this is a retransmission */ 862 + bool did_send; /* T if did actually send */ 863 + bool tlp_probe; /* T if this is a TLP probe */ 864 + int /* enum rxrpc_txdata_trace */ trace; 865 + }; 889 866 890 867 #include <trace/events/rxrpc.h> 891 868 ··· 940 837 if (serial == 0) 941 838 serial = 1; 942 839 conn->tx_serial = serial + 1; 840 + return serial; 841 + } 842 + 843 + /* 844 + * Allocate the next serial n numbers on a connection. 0 must be skipped. 
845 + */ 846 + static inline rxrpc_serial_t rxrpc_get_next_serials(struct rxrpc_connection *conn, 847 + unsigned int n) 848 + { 849 + rxrpc_serial_t serial; 850 + 851 + serial = conn->tx_serial; 852 + if (serial + n <= n) 853 + serial = 1; 854 + conn->tx_serial = serial + n; 943 855 return serial; 944 856 } 945 857 ··· 983 865 enum rxrpc_propose_ack_trace why); 984 866 void rxrpc_propose_delay_ACK(struct rxrpc_call *, rxrpc_serial_t, 985 867 enum rxrpc_propose_ack_trace); 986 - void rxrpc_shrink_call_tx_buffer(struct rxrpc_call *); 987 - void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb); 988 - 989 - bool rxrpc_input_call_event(struct rxrpc_call *call, struct sk_buff *skb); 868 + void rxrpc_resend_tlp(struct rxrpc_call *call); 869 + void rxrpc_transmit_some_data(struct rxrpc_call *call, unsigned int limit, 870 + enum rxrpc_txdata_trace trace); 871 + bool rxrpc_input_call_event(struct rxrpc_call *call); 990 872 991 873 /* 992 874 * call_object.c ··· 1165 1047 void rxrpc_implicit_end_call(struct rxrpc_call *, struct sk_buff *); 1166 1048 1167 1049 /* 1050 + * input_rack.c 1051 + */ 1052 + void rxrpc_input_rack_one(struct rxrpc_call *call, 1053 + struct rxrpc_ack_summary *summary, 1054 + struct rxrpc_txqueue *tq, 1055 + unsigned int ix); 1056 + void rxrpc_input_rack(struct rxrpc_call *call, 1057 + struct rxrpc_ack_summary *summary, 1058 + struct rxrpc_txqueue *tq, 1059 + unsigned long new_acks); 1060 + void rxrpc_rack_detect_loss_and_arm_timer(struct rxrpc_call *call, 1061 + struct rxrpc_ack_summary *summary); 1062 + ktime_t rxrpc_tlp_calc_pto(struct rxrpc_call *call, ktime_t now); 1063 + void rxrpc_tlp_send_probe(struct rxrpc_call *call); 1064 + void rxrpc_tlp_process_ack(struct rxrpc_call *call, struct rxrpc_ack_summary *summary); 1065 + void rxrpc_rack_timer_expired(struct rxrpc_call *call, ktime_t overran_by); 1066 + 1067 + /* Initialise TLP state [RFC8958 7.1]. 
*/ 1068 + static inline void rxrpc_tlp_init(struct rxrpc_call *call) 1069 + { 1070 + call->tlp_serial = 0; 1071 + call->tlp_seq = call->acks_hard_ack; 1072 + call->tlp_is_retrans = false; 1073 + } 1074 + 1075 + /* 1168 1076 * io_thread.c 1169 1077 */ 1170 1078 int rxrpc_encap_rcv(struct sock *, struct sk_buff *); ··· 1293 1149 */ 1294 1150 void rxrpc_send_ACK(struct rxrpc_call *call, u8 ack_reason, 1295 1151 rxrpc_serial_t serial, enum rxrpc_propose_ack_trace why); 1152 + void rxrpc_send_probe_for_pmtud(struct rxrpc_call *call); 1296 1153 int rxrpc_send_abort_packet(struct rxrpc_call *); 1154 + void rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_send_data_req *req); 1297 1155 void rxrpc_send_conn_abort(struct rxrpc_connection *conn); 1298 1156 void rxrpc_reject_packet(struct rxrpc_local *local, struct sk_buff *skb); 1299 1157 void rxrpc_send_keepalive(struct rxrpc_peer *); 1300 - void rxrpc_transmit_one(struct rxrpc_call *call, struct rxrpc_txbuf *txb); 1301 1158 1302 1159 /* 1303 1160 * peer_event.c 1304 1161 */ 1305 1162 void rxrpc_input_error(struct rxrpc_local *, struct sk_buff *); 1306 1163 void rxrpc_peer_keepalive_worker(struct work_struct *); 1164 + void rxrpc_input_probe_for_pmtud(struct rxrpc_connection *conn, rxrpc_serial_t acked_serial, 1165 + bool sendmsg_fail); 1307 1166 1308 1167 /* 1309 1168 * peer_object.c ··· 1355 1208 /* 1356 1209 * rtt.c 1357 1210 */ 1358 - void rxrpc_peer_add_rtt(struct rxrpc_call *, enum rxrpc_rtt_rx_trace, int, 1359 - rxrpc_serial_t, rxrpc_serial_t, ktime_t, ktime_t); 1360 - ktime_t rxrpc_get_rto_backoff(struct rxrpc_peer *peer, bool retrans); 1361 - void rxrpc_peer_init_rtt(struct rxrpc_peer *); 1211 + void rxrpc_call_add_rtt(struct rxrpc_call *call, enum rxrpc_rtt_rx_trace why, 1212 + int rtt_slot, 1213 + rxrpc_serial_t send_serial, rxrpc_serial_t resp_serial, 1214 + ktime_t send_time, ktime_t resp_time); 1215 + ktime_t rxrpc_get_rto_backoff(struct rxrpc_call *call, bool retrans); 1216 + void 
rxrpc_call_init_rtt(struct rxrpc_call *call); 1362 1217 1363 1218 /* 1364 1219 * rxkad.c ··· 1433 1284 extern atomic_t rxrpc_nr_txbuf; 1434 1285 struct rxrpc_txbuf *rxrpc_alloc_data_txbuf(struct rxrpc_call *call, size_t data_size, 1435 1286 size_t data_align, gfp_t gfp); 1436 - struct rxrpc_txbuf *rxrpc_alloc_ack_txbuf(struct rxrpc_call *call, size_t sack_size); 1437 1287 void rxrpc_get_txbuf(struct rxrpc_txbuf *txb, enum rxrpc_txbuf_trace what); 1438 1288 void rxrpc_see_txbuf(struct rxrpc_txbuf *txb, enum rxrpc_txbuf_trace what); 1439 1289 void rxrpc_put_txbuf(struct rxrpc_txbuf *txb, enum rxrpc_txbuf_trace what); ··· 1457 1309 static inline bool after_eq(u32 seq1, u32 seq2) 1458 1310 { 1459 1311 return (s32)(seq1 - seq2) >= 0; 1312 + } 1313 + 1314 + static inline u32 earliest(u32 seq1, u32 seq2) 1315 + { 1316 + return before(seq1, seq2) ? seq1 : seq2; 1317 + } 1318 + 1319 + static inline u32 latest(u32 seq1, u32 seq2) 1320 + { 1321 + return after(seq1, seq2) ? seq1 : seq2; 1322 + } 1323 + 1324 + static inline bool rxrpc_seq_in_txq(const struct rxrpc_txqueue *tq, rxrpc_seq_t seq) 1325 + { 1326 + return (seq & (RXRPC_NR_TXQUEUE - 1)) == tq->qbase; 1327 + } 1328 + 1329 + static inline void rxrpc_queue_rx_call_packet(struct rxrpc_call *call, struct sk_buff *skb) 1330 + { 1331 + rxrpc_get_skb(skb, rxrpc_skb_get_call_rx); 1332 + __skb_queue_tail(&call->rx_queue, skb); 1333 + rxrpc_poke_call(call, rxrpc_call_poke_rx_packet); 1334 + } 1335 + 1336 + /* 1337 + * Calculate how much space there is for transmitting more DATA packets. 
1338 + */ 1339 + static inline unsigned int rxrpc_tx_window_space(const struct rxrpc_call *call) 1340 + { 1341 + int winsize = umin(call->tx_winsize, call->cong_cwnd + call->cong_extra); 1342 + int transmitted = call->tx_top - call->tx_bottom; 1343 + 1344 + return max(winsize - transmitted, 0); 1345 + } 1346 + 1347 + static inline unsigned int rxrpc_left_out(const struct rxrpc_call *call) 1348 + { 1349 + return call->acks_nr_sacks + call->tx_nr_lost; 1350 + } 1351 + 1352 + /* 1353 + * Calculate the number of transmitted DATA packets assumed to be in flight 1354 + * [approx RFC6675]. 1355 + */ 1356 + static inline unsigned int rxrpc_tx_in_flight(const struct rxrpc_call *call) 1357 + { 1358 + return call->tx_nr_sent - rxrpc_left_out(call) + call->tx_nr_resent; 1460 1359 } 1461 1360 1462 1361 /*
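The helpers added to ar-internal.h above lean on three small pieces of arithmetic: wraparound-safe sequence comparison, the transmit-window space calculation, and block allocation of serial numbers that skips 0. The following is a minimal userspace sketch of that arithmetic, not the kernel code itself. The `demo_` names and `struct demo_call` are hypothetical stand-ins introduced for illustration; only the expressions mirror what the diff shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe ordering: the sign of the 32-bit difference decides. */
static inline bool demo_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

static inline bool demo_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

static inline uint32_t demo_earliest(uint32_t a, uint32_t b)
{
	return demo_before(a, b) ? a : b;
}

static inline uint32_t demo_latest(uint32_t a, uint32_t b)
{
	return demo_after(a, b) ? a : b;
}

/* Hypothetical stand-in for the few rxrpc_call fields used below. */
struct demo_call {
	unsigned int tx_winsize;	/* Window advertised by the peer */
	unsigned int cong_cwnd;		/* Congestion window */
	unsigned int cong_extra;	/* Extra allowance on top of cwnd */
	uint32_t tx_top;		/* Highest sequence queued for Tx */
	uint32_t tx_bottom;		/* Lowest sequence still held */
};

/* Space left = min(peer window, cwnd + extra) - packets outstanding. */
static inline unsigned int demo_tx_window_space(const struct demo_call *call)
{
	unsigned int limit = call->cong_cwnd + call->cong_extra;
	int winsize = call->tx_winsize < limit ? call->tx_winsize : limit;
	int transmitted = (int)(call->tx_top - call->tx_bottom);
	int space = winsize - transmitted;

	return space > 0 ? (unsigned int)space : 0;
}

/* Reserve n consecutive serials; restart at 1 if the block would wrap. */
static inline uint32_t demo_get_next_serials(uint32_t *tx_serial, uint32_t n)
{
	uint32_t serial = *tx_serial;

	assert(n > 0);
	if (serial + n <= n)	/* True when serial is 0 or serial + n wrapped */
		serial = 1;
	*tx_serial = serial + n;
	return serial;
}
```

With a counter at 0xfffffffe, a request for five serials in this sketch restarts the block at 1 rather than straddling 0, which is the property a jumbo packet needs when each subpacket takes the next serial in turn.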
+11 -11
net/rxrpc/call_accept.c
···
 	/* Make sure that there aren't any incoming calls in progress before we
 	 * clear the preallocation buffers.
 	 */
-	spin_lock(&rx->incoming_lock);
-	spin_unlock(&rx->incoming_lock);
+	spin_lock_irq(&rx->incoming_lock);
+	spin_unlock_irq(&rx->incoming_lock);

 	head = b->peer_backlog_head;
 	tail = b->peer_backlog_tail;
···
 	if (sp->hdr.type != RXRPC_PACKET_TYPE_DATA)
 		return rxrpc_protocol_error(skb, rxrpc_eproto_no_service_call);

-	read_lock(&local->services_lock);
+	read_lock_irq(&local->services_lock);

 	/* Weed out packets to services we're not offering. Packets that would
 	 * begin a call are explicitly rejected and the rest are just
···
 	spin_unlock(&conn->state_lock);

 	spin_unlock(&rx->incoming_lock);
-	read_unlock(&local->services_lock);
+	read_unlock_irq(&local->services_lock);

 	if (hlist_unhashed(&call->error_link)) {
-		spin_lock(&call->peer->lock);
+		spin_lock_irq(&call->peer->lock);
 		hlist_add_head(&call->error_link, &call->peer->error_targets);
-		spin_unlock(&call->peer->lock);
+		spin_unlock_irq(&call->peer->lock);
 	}

 	_leave(" = %p{%d}", call, call->debug_id);
-	rxrpc_input_call_event(call, skb);
+	rxrpc_queue_rx_call_packet(call, skb);
 	rxrpc_put_call(call, rxrpc_call_put_input);
 	return true;

 unsupported_service:
-	read_unlock(&local->services_lock);
+	read_unlock_irq(&local->services_lock);
 	return rxrpc_direct_abort(skb, rxrpc_abort_service_not_offered,
 		RX_INVALID_OPERATION, -EOPNOTSUPP);
 unsupported_security:
-	read_unlock(&local->services_lock);
+	read_unlock_irq(&local->services_lock);
 	return rxrpc_direct_abort(skb, rxrpc_abort_service_not_offered,
 		RX_INVALID_OPERATION, -EKEYREJECTED);
 no_call:
 	spin_unlock(&rx->incoming_lock);
-	read_unlock(&local->services_lock);
+	read_unlock_irq(&local->services_lock);
 	_leave(" = f [%u]", skb->mark);
 	return false;
 discard:
-	read_unlock(&local->services_lock);
+	read_unlock_irq(&local->services_lock);
 	return true;
 }
+194 -201
net/rxrpc/call_event.c
··· 44 44 45 45 trace_rxrpc_propose_ack(call, why, RXRPC_ACK_DELAY, serial); 46 46 47 - if (call->peer->srtt_us) 48 - delay = (call->peer->srtt_us >> 3) * NSEC_PER_USEC; 47 + if (call->srtt_us) 48 + delay = (call->srtt_us >> 3) * NSEC_PER_USEC; 49 49 else 50 50 delay = ms_to_ktime(READ_ONCE(rxrpc_soft_ack_delay)); 51 51 ktime_add_ms(delay, call->tx_backoff); ··· 55 55 } 56 56 57 57 /* 58 - * Handle congestion being detected by the retransmit timeout. 58 + * Retransmit one or more packets. 59 59 */ 60 - static void rxrpc_congestion_timeout(struct rxrpc_call *call) 60 + static bool rxrpc_retransmit_data(struct rxrpc_call *call, 61 + struct rxrpc_send_data_req *req) 61 62 { 62 - set_bit(RXRPC_CALL_RETRANS_TIMEOUT, &call->flags); 63 + struct rxrpc_txqueue *tq = req->tq; 64 + unsigned int ix = req->seq & RXRPC_TXQ_MASK; 65 + struct rxrpc_txbuf *txb = tq->bufs[ix]; 66 + 67 + _enter("%x,%x,%x,%x", tq->qbase, req->seq, ix, txb->debug_id); 68 + 69 + req->retrans = true; 70 + trace_rxrpc_retransmit(call, req, txb); 71 + 72 + txb->flags |= RXRPC_TXBUF_RESENT; 73 + rxrpc_send_data_packet(call, req); 74 + rxrpc_inc_stat(call->rxnet, stat_tx_data_retrans); 75 + 76 + req->tq = NULL; 77 + req->n = 0; 78 + req->did_send = true; 79 + req->now = ktime_get_real(); 80 + return true; 63 81 } 64 82 65 83 /* 66 84 * Perform retransmission of NAK'd and unack'd packets. 
67 85 */ 68 - void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb) 86 + static void rxrpc_resend(struct rxrpc_call *call) 69 87 { 70 - struct rxrpc_ackpacket *ack = NULL; 71 - struct rxrpc_skb_priv *sp; 72 - struct rxrpc_txbuf *txb; 73 - rxrpc_seq_t transmitted = call->tx_transmitted; 74 - ktime_t next_resend = KTIME_MAX, rto = ns_to_ktime(call->peer->rto_us * NSEC_PER_USEC); 75 - ktime_t resend_at = KTIME_MAX, now, delay; 76 - bool unacked = false, did_send = false; 77 - unsigned int i; 88 + struct rxrpc_send_data_req req = { 89 + .now = ktime_get_real(), 90 + .trace = rxrpc_txdata_retransmit, 91 + }; 92 + struct rxrpc_txqueue *tq; 78 93 79 - _enter("{%d,%d}", call->acks_hard_ack, call->tx_top); 94 + _enter("{%d,%d}", call->tx_bottom, call->tx_top); 80 95 81 - now = ktime_get_real(); 96 + trace_rxrpc_resend(call, call->acks_highest_serial); 82 97 83 - if (list_empty(&call->tx_buffer)) 84 - goto no_resend; 98 + /* Scan the transmission queue, looking for lost packets. */ 99 + for (tq = call->tx_queue; tq; tq = tq->next) { 100 + unsigned long lost = tq->segment_lost; 85 101 86 - trace_rxrpc_resend(call, ack_skb); 87 - txb = list_first_entry(&call->tx_buffer, struct rxrpc_txbuf, call_link); 102 + if (after(tq->qbase, call->tx_transmitted)) 103 + break; 88 104 89 - /* Scan the soft ACK table without dropping the lock and resend any 90 - * explicitly NAK'd packets. 
91 - */ 92 - if (ack_skb) { 93 - sp = rxrpc_skb(ack_skb); 94 - ack = (void *)ack_skb->data + sizeof(struct rxrpc_wire_header); 105 + _debug("retr %16lx %u c=%08x [%x]", 106 + tq->segment_acked, tq->nr_reported_acks, call->debug_id, tq->qbase); 107 + _debug("lost %16lx", lost); 95 108 96 - for (i = 0; i < sp->ack.nr_acks; i++) { 97 - rxrpc_seq_t seq; 109 + trace_rxrpc_resend_lost(call, tq, lost); 110 + while (lost) { 111 + unsigned int ix = __ffs(lost); 112 + struct rxrpc_txbuf *txb = tq->bufs[ix]; 98 113 99 - if (ack->acks[i] & 1) 100 - continue; 101 - seq = sp->ack.first_ack + i; 102 - if (after(txb->seq, transmitted)) 103 - break; 104 - if (after(txb->seq, seq)) 105 - continue; /* A new hard ACK probably came in */ 106 - list_for_each_entry_from(txb, &call->tx_buffer, call_link) { 107 - if (txb->seq == seq) 108 - goto found_txb; 109 - } 110 - goto no_further_resend; 114 + __clear_bit(ix, &lost); 115 + rxrpc_see_txbuf(txb, rxrpc_txbuf_see_lost); 111 116 112 - found_txb: 113 - resend_at = ktime_add(txb->last_sent, rto); 114 - if (after(txb->serial, call->acks_highest_serial)) { 115 - if (ktime_after(resend_at, now) && 116 - ktime_before(resend_at, next_resend)) 117 - next_resend = resend_at; 118 - continue; /* Ack point not yet reached */ 119 - } 120 - 121 - rxrpc_see_txbuf(txb, rxrpc_txbuf_see_unacked); 122 - 123 - trace_rxrpc_retransmit(call, txb->seq, txb->serial, 124 - ktime_sub(resend_at, now)); 125 - 126 - txb->flags |= RXRPC_TXBUF_RESENT; 127 - rxrpc_transmit_one(call, txb); 128 - did_send = true; 129 - now = ktime_get_real(); 130 - 131 - if (list_is_last(&txb->call_link, &call->tx_buffer)) 132 - goto no_further_resend; 133 - txb = list_next_entry(txb, call_link); 117 + req.tq = tq; 118 + req.seq = tq->qbase + ix; 119 + req.n = 1; 120 + rxrpc_retransmit_data(call, &req); 134 121 } 135 122 } 136 123 137 - /* Fast-forward through the Tx queue to the point the peer says it has 138 - * seen. 
Anything between the soft-ACK table and that point will get 139 - * ACK'd or NACK'd in due course, so don't worry about it here; here we 140 - * need to consider retransmitting anything beyond that point. 141 - */ 142 - if (after_eq(call->acks_prev_seq, call->tx_transmitted)) 143 - goto no_further_resend; 144 - 145 - list_for_each_entry_from(txb, &call->tx_buffer, call_link) { 146 - resend_at = ktime_add(txb->last_sent, rto); 147 - 148 - if (before_eq(txb->seq, call->acks_prev_seq)) 149 - continue; 150 - if (after(txb->seq, call->tx_transmitted)) 151 - break; /* Not transmitted yet */ 152 - 153 - if (ack && ack->reason == RXRPC_ACK_PING_RESPONSE && 154 - before(txb->serial, ntohl(ack->serial))) 155 - goto do_resend; /* Wasn't accounted for by a more recent ping. */ 156 - 157 - if (ktime_after(resend_at, now)) { 158 - if (ktime_before(resend_at, next_resend)) 159 - next_resend = resend_at; 160 - continue; 161 - } 162 - 163 - do_resend: 164 - unacked = true; 165 - 166 - txb->flags |= RXRPC_TXBUF_RESENT; 167 - rxrpc_transmit_one(call, txb); 168 - did_send = true; 169 - rxrpc_inc_stat(call->rxnet, stat_tx_data_retrans); 170 - now = ktime_get_real(); 171 - } 172 - 173 - no_further_resend: 174 - no_resend: 175 - if (resend_at < KTIME_MAX) { 176 - delay = rxrpc_get_rto_backoff(call->peer, did_send); 177 - resend_at = ktime_add(resend_at, delay); 178 - trace_rxrpc_timer_set(call, resend_at - now, rxrpc_timer_trace_resend_reset); 179 - } 180 - call->resend_at = resend_at; 181 - 182 - if (unacked) 183 - rxrpc_congestion_timeout(call); 184 - 185 - /* If there was nothing that needed retransmission then it's likely 186 - * that an ACK got lost somewhere. Send a ping to find out instead of 187 - * retransmitting data. 
188 - */ 189 - if (!did_send) { 190 - ktime_t next_ping = ktime_add_us(call->acks_latest_ts, 191 - call->peer->srtt_us >> 3); 192 - 193 - if (ktime_sub(next_ping, now) <= 0) 194 - rxrpc_send_ACK(call, RXRPC_ACK_PING, 0, 195 - rxrpc_propose_ack_ping_for_0_retrans); 196 - } 197 - 124 + rxrpc_get_rto_backoff(call, req.did_send); 198 125 _leave(""); 126 + } 127 + 128 + /* 129 + * Resend the highest-seq DATA packet so far transmitted for RACK-TLP [RFC8985 7.3]. 130 + */ 131 + void rxrpc_resend_tlp(struct rxrpc_call *call) 132 + { 133 + struct rxrpc_send_data_req req = { 134 + .now = ktime_get_real(), 135 + .seq = call->tx_transmitted, 136 + .n = 1, 137 + .tlp_probe = true, 138 + .trace = rxrpc_txdata_tlp_retransmit, 139 + }; 140 + 141 + /* There's a chance it'll be on the tail segment of the queue. */ 142 + req.tq = READ_ONCE(call->tx_qtail); 143 + if (req.tq && 144 + before(call->tx_transmitted, req.tq->qbase + RXRPC_NR_TXQUEUE)) { 145 + rxrpc_retransmit_data(call, &req); 146 + return; 147 + } 148 + 149 + for (req.tq = call->tx_queue; req.tq; req.tq = req.tq->next) { 150 + if (after_eq(call->tx_transmitted, req.tq->qbase) && 151 + before(call->tx_transmitted, req.tq->qbase + RXRPC_NR_TXQUEUE)) { 152 + rxrpc_retransmit_data(call, &req); 153 + return; 154 + } 155 + } 199 156 } 200 157 201 158 /* ··· 188 231 } 189 232 } 190 233 191 - static bool rxrpc_tx_window_has_space(struct rxrpc_call *call) 192 - { 193 - unsigned int winsize = min_t(unsigned int, call->tx_winsize, 194 - call->cong_cwnd + call->cong_extra); 195 - rxrpc_seq_t window = call->acks_hard_ack, wtop = window + winsize; 196 - rxrpc_seq_t tx_top = call->tx_top; 197 - int space; 198 - 199 - space = wtop - tx_top; 200 - return space > 0; 201 - } 202 - 203 234 /* 204 - * Decant some if the sendmsg prepared queue into the transmission buffer. 235 + * Transmit some as-yet untransmitted data, to a maximum of the supplied limit. 
205 236 */ 206 - static void rxrpc_decant_prepared_tx(struct rxrpc_call *call) 237 + static void rxrpc_transmit_fresh_data(struct rxrpc_call *call, unsigned int limit, 238 + enum rxrpc_txdata_trace trace) 207 239 { 208 - struct rxrpc_txbuf *txb; 240 + int space = rxrpc_tx_window_space(call); 209 241 210 242 if (!test_bit(RXRPC_CALL_EXPOSED, &call->flags)) { 211 - if (list_empty(&call->tx_sendmsg)) 243 + if (call->send_top == call->tx_top) 212 244 return; 213 245 rxrpc_expose_client_call(call); 214 246 } 215 247 216 - while ((txb = list_first_entry_or_null(&call->tx_sendmsg, 217 - struct rxrpc_txbuf, call_link))) { 218 - spin_lock(&call->tx_lock); 219 - list_del(&txb->call_link); 220 - spin_unlock(&call->tx_lock); 248 + while (space > 0) { 249 + struct rxrpc_send_data_req req = { 250 + .now = ktime_get_real(), 251 + .seq = call->tx_transmitted + 1, 252 + .n = 0, 253 + .trace = trace, 254 + }; 255 + struct rxrpc_txqueue *tq; 256 + struct rxrpc_txbuf *txb; 257 + rxrpc_seq_t send_top, seq; 258 + int limit = min(space, max(call->peer->pmtud_jumbo, 1)); 221 259 222 - call->tx_top = txb->seq; 223 - list_add_tail(&txb->call_link, &call->tx_buffer); 224 - 225 - if (txb->flags & RXRPC_LAST_PACKET) 226 - rxrpc_close_tx_phase(call); 227 - 228 - rxrpc_transmit_one(call, txb); 229 - 230 - if (!rxrpc_tx_window_has_space(call)) 260 + /* Order send_top before the contents of the new txbufs and 261 + * txqueue pointers 262 + */ 263 + send_top = smp_load_acquire(&call->send_top); 264 + if (call->tx_top == send_top) 231 265 break; 266 + 267 + trace_rxrpc_transmit(call, send_top, space); 268 + 269 + tq = call->tx_qtail; 270 + seq = call->tx_top; 271 + trace_rxrpc_tq(call, tq, seq, rxrpc_tq_decant); 272 + 273 + do { 274 + int ix; 275 + 276 + seq++; 277 + ix = seq & RXRPC_TXQ_MASK; 278 + if (!ix) { 279 + tq = tq->next; 280 + trace_rxrpc_tq(call, tq, seq, rxrpc_tq_decant_advance); 281 + } 282 + if (!req.tq) 283 + req.tq = tq; 284 + txb = tq->bufs[ix]; 285 + req.n++; 286 + if 
(!txb->jumboable) 287 + break; 288 + } while (req.n < limit && before(seq, send_top)); 289 + 290 + if (txb->flags & RXRPC_LAST_PACKET) { 291 + rxrpc_close_tx_phase(call); 292 + tq = NULL; 293 + } 294 + call->tx_qtail = tq; 295 + call->tx_top = seq; 296 + 297 + space -= req.n; 298 + rxrpc_send_data_packet(call, &req); 232 299 } 233 300 } 234 301 235 - static void rxrpc_transmit_some_data(struct rxrpc_call *call) 302 + void rxrpc_transmit_some_data(struct rxrpc_call *call, unsigned int limit, 303 + enum rxrpc_txdata_trace trace) 236 304 { 237 305 switch (__rxrpc_call_state(call)) { 238 306 case RXRPC_CALL_SERVER_ACK_REQUEST: 239 - if (list_empty(&call->tx_sendmsg)) 307 + if (call->tx_bottom == READ_ONCE(call->send_top)) 240 308 return; 241 309 rxrpc_begin_service_reply(call); 242 310 fallthrough; 243 311 244 312 case RXRPC_CALL_SERVER_SEND_REPLY: 245 313 case RXRPC_CALL_CLIENT_SEND_REQUEST: 246 - if (!rxrpc_tx_window_has_space(call)) 314 + if (!rxrpc_tx_window_space(call)) 247 315 return; 248 - if (list_empty(&call->tx_sendmsg)) { 316 + if (call->tx_bottom == READ_ONCE(call->send_top)) { 249 317 rxrpc_inc_stat(call->rxnet, stat_tx_data_underflow); 250 318 return; 251 319 } 252 - rxrpc_decant_prepared_tx(call); 320 + rxrpc_transmit_fresh_data(call, limit, trace); 253 321 break; 254 322 default: 255 323 return; ··· 287 305 */ 288 306 static void rxrpc_send_initial_ping(struct rxrpc_call *call) 289 307 { 290 - if (call->peer->rtt_count < 3 || 291 - ktime_before(ktime_add_ms(call->peer->rtt_last_req, 1000), 308 + if (call->rtt_count < 3 || 309 + ktime_before(ktime_add_ms(call->rtt_last_req, 1000), 292 310 ktime_get_real())) 293 311 rxrpc_send_ACK(call, RXRPC_ACK_PING, 0, 294 312 rxrpc_propose_ack_ping_for_params); ··· 297 315 /* 298 316 * Handle retransmission and deferred ACK/abort generation. 
299 317 */ 300 - bool rxrpc_input_call_event(struct rxrpc_call *call, struct sk_buff *skb) 318 + bool rxrpc_input_call_event(struct rxrpc_call *call) 301 319 { 320 + struct sk_buff *skb; 302 321 ktime_t now, t; 303 - bool resend = false; 322 + bool did_receive = false, saw_ack = false; 304 323 s32 abort_code; 305 324 306 325 rxrpc_see_call(call, rxrpc_call_see_input); ··· 311 328 call->debug_id, rxrpc_call_states[__rxrpc_call_state(call)], 312 329 call->events); 313 330 314 - if (__rxrpc_call_is_complete(call)) 315 - goto out; 316 - 317 331 /* Handle abort request locklessly, vs rxrpc_propose_abort(). */ 318 332 abort_code = smp_load_acquire(&call->send_abort); 319 333 if (abort_code) { ··· 319 339 goto out; 320 340 } 321 341 322 - if (skb && skb->mark == RXRPC_SKB_MARK_ERROR) 323 - goto out; 342 + do { 343 + skb = __skb_dequeue(&call->rx_queue); 344 + if (skb) { 345 + struct rxrpc_skb_priv *sp = rxrpc_skb(skb); 324 346 325 - if (skb) 326 - rxrpc_input_call_packet(call, skb); 347 + if (__rxrpc_call_is_complete(call) || 348 + skb->mark == RXRPC_SKB_MARK_ERROR) { 349 + rxrpc_free_skb(skb, rxrpc_skb_put_call_rx); 350 + goto out; 351 + } 352 + 353 + saw_ack |= sp->hdr.type == RXRPC_PACKET_TYPE_ACK; 354 + 355 + rxrpc_input_call_packet(call, skb); 356 + rxrpc_free_skb(skb, rxrpc_skb_put_call_rx); 357 + did_receive = true; 358 + } 359 + 360 + t = ktime_sub(call->rack_timo_at, ktime_get_real()); 361 + if (t <= 0) { 362 + trace_rxrpc_timer_exp(call, t, 363 + rxrpc_timer_trace_rack_off + call->rack_timer_mode); 364 + call->rack_timo_at = KTIME_MAX; 365 + rxrpc_rack_timer_expired(call, t); 366 + } 367 + 368 + } while (!skb_queue_empty(&call->rx_queue)); 327 369 328 370 /* If we see our async-event poke, check for timeout trippage. 
*/ 329 371 now = ktime_get_real(); ··· 378 376 rxrpc_propose_ack_delayed_ack); 379 377 } 380 378 381 - t = ktime_sub(call->ack_lost_at, now); 382 - if (t <= 0) { 383 - trace_rxrpc_timer_exp(call, t, rxrpc_timer_trace_lost_ack); 384 - call->ack_lost_at = KTIME_MAX; 385 - set_bit(RXRPC_CALL_EV_ACK_LOST, &call->events); 386 - } 387 - 388 379 t = ktime_sub(call->ping_at, now); 389 380 if (t <= 0) { 390 381 trace_rxrpc_timer_exp(call, t, rxrpc_timer_trace_ping); ··· 385 390 rxrpc_send_ACK(call, RXRPC_ACK_PING, 0, 386 391 rxrpc_propose_ack_ping_for_keepalive); 387 392 } 388 - 389 - t = ktime_sub(call->resend_at, now); 390 - if (t <= 0) { 391 - trace_rxrpc_timer_exp(call, t, rxrpc_timer_trace_resend); 392 - call->resend_at = KTIME_MAX; 393 - resend = true; 394 - } 395 - 396 - rxrpc_transmit_some_data(call); 397 393 398 394 now = ktime_get_real(); 399 395 t = ktime_sub(call->keepalive_at, now); ··· 395 409 rxrpc_propose_ack_ping_for_keepalive); 396 410 } 397 411 398 - if (skb) { 399 - struct rxrpc_skb_priv *sp = rxrpc_skb(skb); 400 - 401 - if (sp->hdr.type == RXRPC_PACKET_TYPE_ACK) 402 - rxrpc_congestion_degrade(call); 403 - } 404 - 405 412 if (test_and_clear_bit(RXRPC_CALL_EV_INITIAL_PING, &call->events)) 406 413 rxrpc_send_initial_ping(call); 414 + 415 + rxrpc_transmit_some_data(call, UINT_MAX, rxrpc_txdata_new_data); 416 + 417 + if (saw_ack) 418 + rxrpc_congestion_degrade(call); 419 + 420 + if (did_receive && 421 + (__rxrpc_call_state(call) == RXRPC_CALL_CLIENT_SEND_REQUEST || 422 + __rxrpc_call_state(call) == RXRPC_CALL_SERVER_SEND_REPLY)) { 423 + t = ktime_sub(call->rack_timo_at, ktime_get_real()); 424 + trace_rxrpc_rack(call, t); 425 + } 407 426 408 427 /* Process events */ 409 428 if (test_and_clear_bit(RXRPC_CALL_EV_ACK_LOST, &call->events)) 410 429 rxrpc_send_ACK(call, RXRPC_ACK_PING, 0, 411 430 rxrpc_propose_ack_ping_for_lost_ack); 412 431 413 - if (resend && 432 + if (call->tx_nr_lost > 0 && 414 433 __rxrpc_call_state(call) != RXRPC_CALL_CLIENT_RECV_REPLY && 415 
434 !test_bit(RXRPC_CALL_TX_ALL_ACKED, &call->flags)) 416 - rxrpc_resend(call, NULL); 435 + rxrpc_resend(call); 417 436 418 437 if (test_and_clear_bit(RXRPC_CALL_RX_IS_IDLE, &call->flags)) 419 438 rxrpc_send_ACK(call, RXRPC_ACK_IDLE, 0, 420 439 rxrpc_propose_ack_rx_idle); 421 440 422 441 if (call->ackr_nr_unacked > 2) { 423 - if (call->peer->rtt_count < 3) 442 + if (call->rtt_count < 3) 424 443 rxrpc_send_ACK(call, RXRPC_ACK_PING, 0, 425 444 rxrpc_propose_ack_ping_for_rtt); 426 - else if (ktime_before(ktime_add_ms(call->peer->rtt_last_req, 1000), 445 + else if (ktime_before(ktime_add_ms(call->rtt_last_req, 1000), 427 446 ktime_get_real())) 428 447 rxrpc_send_ACK(call, RXRPC_ACK_PING, 0, 429 448 rxrpc_propose_ack_ping_for_old_rtt); ··· 446 455 set(call->expect_req_by); 447 456 set(call->expect_rx_by); 448 457 set(call->delay_ack_at); 449 - set(call->ack_lost_at); 450 - set(call->resend_at); 458 + set(call->rack_timo_at); 451 459 set(call->keepalive_at); 452 460 set(call->ping_at); 453 461 ··· 457 467 } else { 458 468 unsigned long nowj = jiffies, delayj, nextj; 459 469 460 - delayj = max(nsecs_to_jiffies(delay), 1); 470 + delayj = umax(nsecs_to_jiffies(delay), 1); 461 471 nextj = nowj + delayj; 462 472 if (time_before(nextj, call->timer.expires) || 463 473 !timer_pending(&call->timer)) { ··· 474 484 rxrpc_disconnect_call(call); 475 485 if (call->security) 476 486 call->security->free_call_crypto(call); 487 + } else { 488 + if (did_receive && 489 + call->peer->ackr_adv_pmtud && 490 + call->peer->pmtud_pending) 491 + rxrpc_send_probe_for_pmtud(call); 477 492 } 478 - if (call->acks_hard_ack != call->tx_bottom) 479 - rxrpc_shrink_call_tx_buffer(call); 480 493 _leave(""); 481 494 return true; 482 495
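Two loops in the call_event.c changes above carry most of the new logic: rxrpc_resend() walks the per-segment `segment_lost` bitmap to find packets to retransmit, and rxrpc_transmit_fresh_data() gathers consecutive `jumboable` buffers into one request. The sketch below reproduces the shape of both loops in userspace, under stated assumptions. The `demo_` types and functions are hypothetical, `__builtin_ctzl` (a GCC/Clang builtin) stands in for the kernel's `__ffs()`, and the check against `call->tx_transmitted` that ends the kernel's scan early is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per buffer, as with RXRPC_NR_TXQUEUE == BITS_PER_LONG. */
#define DEMO_NR_TXQUEUE (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical cut-down transmit queue segment. */
struct demo_txqueue {
	struct demo_txqueue *next;
	uint32_t qbase;			/* Sequence number of slot 0 */
	unsigned long segment_lost;	/* Bit set if slot declared lost */
};

/* Write the sequence numbers of all lost packets to out[]; return count. */
static size_t demo_collect_lost(const struct demo_txqueue *tq,
				uint32_t *out, size_t max)
{
	size_t n = 0;

	for (; tq; tq = tq->next) {
		unsigned long lost = tq->segment_lost;

		while (lost && n < max) {
			/* Index of the lowest set bit */
			unsigned int ix = (unsigned int)__builtin_ctzl(lost);

			assert(ix < DEMO_NR_TXQUEUE);
			lost &= lost - 1;	/* Clear that bit */
			out[n++] = tq->qbase + ix;
		}
	}
	return n;
}

/*
 * Count how many buffers, starting at 'first', go into one jumbo packet.
 * A buffer that is not jumboable may only be the last subpacket, so it
 * ends the batch; so does reaching 'limit' or running out of buffers.
 */
static unsigned int demo_jumbo_batch(const bool *jumboable, unsigned int first,
				     unsigned int avail, unsigned int limit)
{
	unsigned int n = 0;

	assert(limit > 0);
	while (n < avail) {
		bool can_continue = jumboable[first + n];

		n++;
		if (!can_continue || n >= limit)
			break;
	}
	return n;
}
```

In the diff, the batch limit is `min(space, max(call->peer->pmtud_jumbo, 1))`, so the path-MTU probe result and the remaining window space both cap how many subpackets are glued together.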
+36 -30
net/rxrpc/call_object.c
··· 49 49 bool busy; 50 50 51 51 if (!test_bit(RXRPC_CALL_DISCONNECTED, &call->flags)) { 52 - spin_lock_bh(&local->lock); 52 + spin_lock_irq(&local->lock); 53 53 busy = !list_empty(&call->attend_link); 54 54 trace_rxrpc_poke_call(call, busy, what); 55 55 if (!busy && !rxrpc_try_get_call(call, rxrpc_call_get_poke)) ··· 57 57 if (!busy) { 58 58 list_add_tail(&call->attend_link, &local->call_attend_q); 59 59 } 60 - spin_unlock_bh(&local->lock); 60 + spin_unlock_irq(&local->lock); 61 61 if (!busy) 62 62 rxrpc_wake_up_io_thread(local); 63 63 } ··· 146 146 INIT_LIST_HEAD(&call->recvmsg_link); 147 147 INIT_LIST_HEAD(&call->sock_link); 148 148 INIT_LIST_HEAD(&call->attend_link); 149 - INIT_LIST_HEAD(&call->tx_sendmsg); 150 - INIT_LIST_HEAD(&call->tx_buffer); 149 + skb_queue_head_init(&call->rx_queue); 151 150 skb_queue_head_init(&call->recvmsg_queue); 152 151 skb_queue_head_init(&call->rx_oos_queue); 153 152 init_waitqueue_head(&call->waitq); 154 153 spin_lock_init(&call->notify_lock); 155 - spin_lock_init(&call->tx_lock); 156 154 refcount_set(&call->ref, 1); 157 155 call->debug_id = debug_id; 158 156 call->tx_total_len = -1; 157 + call->tx_jumbo_max = 1; 159 158 call->next_rx_timo = 20 * HZ; 160 159 call->next_req_timo = 1 * HZ; 161 160 call->ackr_window = 1; 162 161 call->ackr_wtop = 1; 163 162 call->delay_ack_at = KTIME_MAX; 164 - call->ack_lost_at = KTIME_MAX; 165 - call->resend_at = KTIME_MAX; 163 + call->rack_timo_at = KTIME_MAX; 166 164 call->ping_at = KTIME_MAX; 167 165 call->keepalive_at = KTIME_MAX; 168 166 call->expect_rx_by = KTIME_MAX; ··· 174 176 175 177 call->cong_cwnd = RXRPC_MIN_CWND; 176 178 call->cong_ssthresh = RXRPC_TX_MAX_WINDOW; 179 + 180 + rxrpc_call_init_rtt(call); 177 181 178 182 call->rxnet = rxnet; 179 183 call->rtt_avail = RXRPC_CALL_RTT_AVAIL_MASK; ··· 220 220 __set_bit(RXRPC_CALL_EXCLUSIVE, &call->flags); 221 221 222 222 if (p->timeouts.normal) 223 - call->next_rx_timo = min(p->timeouts.normal, 1); 223 + call->next_rx_timo = 
umin(p->timeouts.normal, 1); 224 224 if (p->timeouts.idle) 225 - call->next_req_timo = min(p->timeouts.idle, 1); 225 + call->next_req_timo = umin(p->timeouts.idle, 1); 226 226 if (p->timeouts.hard) 227 227 call->hard_timo = p->timeouts.hard; 228 228 ··· 302 302 303 303 trace_rxrpc_client(NULL, -1, rxrpc_client_queue_new_call); 304 304 rxrpc_get_call(call, rxrpc_call_get_io_thread); 305 - spin_lock(&local->client_call_lock); 305 + spin_lock_irq(&local->client_call_lock); 306 306 list_add_tail(&call->wait_link, &local->new_client_calls); 307 - spin_unlock(&local->client_call_lock); 307 + spin_unlock_irq(&local->client_call_lock); 308 308 rxrpc_wake_up_io_thread(local); 309 309 return 0; 310 310 ··· 434 434 435 435 /* 436 436 * Set up an incoming call. call->conn points to the connection. 437 - * This is called in BH context and isn't allowed to fail. 437 + * This is called with interrupts disabled and isn't allowed to fail. 438 438 */ 439 439 void rxrpc_incoming_call(struct rxrpc_sock *rx, 440 440 struct rxrpc_call *call, ··· 531 531 } 532 532 533 533 /* 534 - * Clean up the Rx skb ring. 534 + * Clean up the transmission buffers. 535 535 */ 536 - static void rxrpc_cleanup_ring(struct rxrpc_call *call) 536 + static void rxrpc_cleanup_tx_buffers(struct rxrpc_call *call) 537 + { 538 + struct rxrpc_txqueue *tq, *next; 539 + 540 + for (tq = call->tx_queue; tq; tq = next) { 541 + next = tq->next; 542 + for (int i = 0; i < RXRPC_NR_TXQUEUE; i++) 543 + if (tq->bufs[i]) 544 + rxrpc_put_txbuf(tq->bufs[i], rxrpc_txbuf_put_cleaned); 545 + trace_rxrpc_tq(call, tq, 0, rxrpc_tq_cleaned); 546 + kfree(tq); 547 + } 548 + } 549 + 550 + /* 551 + * Clean up the receive buffers. 
552 + */ 553 + static void rxrpc_cleanup_rx_buffers(struct rxrpc_call *call) 537 554 { 538 555 rxrpc_purge_queue(&call->recvmsg_queue); 556 + rxrpc_purge_queue(&call->rx_queue); 539 557 rxrpc_purge_queue(&call->rx_oos_queue); 540 558 } 541 559 ··· 576 558 rxrpc_put_call_slot(call); 577 559 578 560 /* Make sure we don't get any more notifications */ 579 - spin_lock(&rx->recvmsg_lock); 561 + spin_lock_irq(&rx->recvmsg_lock); 580 562 581 563 if (!list_empty(&call->recvmsg_link)) { 582 564 _debug("unlinking once-pending call %p { e=%lx f=%lx }", ··· 589 571 call->recvmsg_link.next = NULL; 590 572 call->recvmsg_link.prev = NULL; 591 573 592 - spin_unlock(&rx->recvmsg_lock); 574 + spin_unlock_irq(&rx->recvmsg_lock); 593 575 if (put) 594 576 rxrpc_put_call(call, rxrpc_call_put_unnotify); 595 577 ··· 689 671 static void rxrpc_destroy_call(struct work_struct *work) 690 672 { 691 673 struct rxrpc_call *call = container_of(work, struct rxrpc_call, destroyer); 692 - struct rxrpc_txbuf *txb; 693 674 694 675 del_timer_sync(&call->timer); 695 676 696 - rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack); 697 - rxrpc_cleanup_ring(call); 698 - while ((txb = list_first_entry_or_null(&call->tx_sendmsg, 699 - struct rxrpc_txbuf, call_link))) { 700 - list_del(&txb->call_link); 701 - rxrpc_put_txbuf(txb, rxrpc_txbuf_put_cleaned); 702 - } 703 - while ((txb = list_first_entry_or_null(&call->tx_buffer, 704 - struct rxrpc_txbuf, call_link))) { 705 - list_del(&txb->call_link); 706 - rxrpc_put_txbuf(txb, rxrpc_txbuf_put_cleaned); 707 - } 708 - 677 + rxrpc_cleanup_tx_buffers(call); 678 + rxrpc_cleanup_rx_buffers(call); 709 679 rxrpc_put_txbuf(call->tx_pending, rxrpc_txbuf_put_cleaned); 710 680 rxrpc_put_connection(call->conn, rxrpc_conn_put_call); 711 681 rxrpc_deactivate_bundle(call->bundle);
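The new rxrpc_cleanup_tx_buffers() above replaces two list-draining loops with a single walk over the chain of queue segments. A minimal userspace sketch of that walk follows, assuming a simplified reference count and a four-slot segment. The `demo_` names are hypothetical and `free()` stands in for `kfree()` and the real txbuf release path.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_NR_TXQUEUE 4	/* Small for illustration only */

/* Hypothetical cut-down buffer with a plain integer refcount. */
struct demo_txbuf {
	int refs;
};

struct demo_txqueue {
	struct demo_txqueue *next;
	struct demo_txbuf *bufs[DEMO_NR_TXQUEUE];
};

/* Drop one reference; free the buffer when the last one goes. */
static void demo_put_txbuf(struct demo_txbuf *txb)
{
	assert(txb->refs > 0);
	if (--txb->refs == 0)
		free(txb);
}

/* Release every buffer and segment in the chain; return buffers released. */
static unsigned int demo_cleanup_tx_buffers(struct demo_txqueue *head)
{
	struct demo_txqueue *tq, *next;
	unsigned int released = 0;

	for (tq = head; tq; tq = next) {
		next = tq->next;	/* Read before tq is freed */
		for (int i = 0; i < DEMO_NR_TXQUEUE; i++) {
			if (tq->bufs[i]) {
				demo_put_txbuf(tq->bufs[i]);
				released++;
			}
		}
		free(tq);
	}
	return released;
}
```

Saving `next` before freeing the segment is the one step that is easy to get wrong in this pattern; the diff does the same with its `tq, *next` pair.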
+14 -12
net/rxrpc/conn_client.c
···
231 231		distance = id - id_cursor;
232 232		if (distance < 0)
233 233			distance = -distance;
234 -		limit = max_t(unsigned long, atomic_read(&rxnet->nr_conns) * 4, 1024);
234 +		limit = umax(atomic_read(&rxnet->nr_conns) * 4, 1024);
235 235		if (distance > limit)
236 236			goto mark_dont_reuse;
237 237
···
437 437		call->dest_srx.srx_service = conn->service_id;
438 438		call->cong_ssthresh = call->peer->cong_ssthresh;
439 439		if (call->cong_cwnd >= call->cong_ssthresh)
440 -			call->cong_mode = RXRPC_CALL_CONGEST_AVOIDANCE;
440 +			call->cong_ca_state = RXRPC_CA_CONGEST_AVOIDANCE;
441 441		else
442 -			call->cong_mode = RXRPC_CALL_SLOW_START;
442 +			call->cong_ca_state = RXRPC_CA_SLOW_START;
443 443
444 444		chan->call_id = call_id;
445 445		chan->call_debug_id = call->debug_id;
···
508 508	void rxrpc_connect_client_calls(struct rxrpc_local *local)
509 509	{
510 510		struct rxrpc_call *call;
511 +		LIST_HEAD(new_client_calls);
511 512
512 -		while ((call = list_first_entry_or_null(&local->new_client_calls,
513 -							struct rxrpc_call, wait_link))
514 -		       ) {
513 +		spin_lock_irq(&local->client_call_lock);
514 +		list_splice_tail_init(&local->new_client_calls, &new_client_calls);
515 +		spin_unlock_irq(&local->client_call_lock);
516 +
517 +		while ((call = list_first_entry_or_null(&new_client_calls,
518 +							struct rxrpc_call, wait_link))) {
515 519			struct rxrpc_bundle *bundle = call->bundle;
516 520
517 -			spin_lock(&local->client_call_lock);
518 521			list_move_tail(&call->wait_link, &bundle->waiting_calls);
519 522			rxrpc_see_call(call, rxrpc_call_see_waiting_call);
520 -			spin_unlock(&local->client_call_lock);
521 523
522 524			if (rxrpc_bundle_has_space(bundle))
523 525				rxrpc_activate_channels(bundle);
···
547 545			set_bit(RXRPC_CONN_DONT_REUSE, &conn->flags);
548 546			trace_rxrpc_client(conn, channel, rxrpc_client_exposed);
549 547
550 -			spin_lock(&call->peer->lock);
548 +			spin_lock_irq(&call->peer->lock);
551 549			hlist_add_head(&call->error_link, &call->peer->error_targets);
552 -			spin_unlock(&call->peer->lock);
550 +			spin_unlock_irq(&call->peer->lock);
553 551		}
554 552	}
555 553
···
590 588			ASSERTCMP(call->call_id, ==, 0);
591 589			ASSERT(!test_bit(RXRPC_CALL_EXPOSED, &call->flags));
592 590			/* May still be on ->new_client_calls. */
593 -			spin_lock(&local->client_call_lock);
591 +			spin_lock_irq(&local->client_call_lock);
594 592			list_del_init(&call->wait_link);
595 -			spin_unlock(&local->client_call_lock);
593 +			spin_unlock_irq(&local->client_call_lock);
596 594			return;
597 595		}
598 596
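Note on net/rxrpc/conn_client.c: rxrpc_connect_client_calls() now takes the lock once, splices the whole shared list onto a private list head, drops the lock, and then processes the private list with no lock held. A hedged userspace sketch of that "detach under lock, process outside it" pattern follows. It uses a pthread mutex and a singly linked list as stand-ins for the kernel spinlock and list_head; struct queue, queue_push(), queue_take_all() and drain() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct node {
	struct node *next;
	int id;
};

struct queue {
	pthread_mutex_t lock;
	struct node *head, *tail;
};

static void queue_init(struct queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = q->tail = NULL;
}

/* Producer side: append one node to the shared list under the lock. */
static void queue_push(struct queue *q, struct node *n)
{
	n->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = n;
	else
		q->head = n;
	q->tail = n;
	pthread_mutex_unlock(&q->lock);
}

/* Detach the entire shared list in one short critical section. */
static struct node *queue_take_all(struct queue *q)
{
	struct node *batch;

	pthread_mutex_lock(&q->lock);
	batch = q->head;
	q->head = q->tail = NULL;
	pthread_mutex_unlock(&q->lock);
	return batch;
}

/* Consumer side: walk the private batch without holding the lock. */
static int drain(struct queue *q)
{
	struct node *n = queue_take_all(q);
	int handled = 0;

	while (n) {
		struct node *next = n->next;

		n->next = NULL;		/* node now belongs to the consumer */
		handled++;
		n = next;
	}
	return handled;
}
```

In this sketch, anything pushed after queue_take_all() returns is simply picked up by the next drain() call, so the lock is held only for a few pointer assignments rather than once per item.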
+25 -15
net/rxrpc/conn_event.c
···
26 26		bool aborted = false;
27 27
28 28		if (conn->state != RXRPC_CONN_ABORTED) {
29 -			spin_lock(&conn->state_lock);
29 +			spin_lock_irq(&conn->state_lock);
30 30			if (conn->state != RXRPC_CONN_ABORTED) {
31 31				conn->abort_code = abort_code;
32 32				conn->error = err;
···
37 37				set_bit(RXRPC_CONN_EV_ABORT_CALLS, &conn->events);
38 38				aborted = true;
39 39			}
40 -			spin_unlock(&conn->state_lock);
40 +			spin_unlock_irq(&conn->state_lock);
41 41		}
42 42
43 43		return aborted;
···
63 63	/*
64 64	 * Mark a connection as being remotely aborted.
65 65	 */
66 -	static bool rxrpc_input_conn_abort(struct rxrpc_connection *conn,
66 +	static void rxrpc_input_conn_abort(struct rxrpc_connection *conn,
67 67					   struct sk_buff *skb)
68 68	{
69 -		return rxrpc_set_conn_aborted(conn, skb, skb->priority, -ECONNABORTED,
70 -					      RXRPC_CALL_REMOTELY_ABORTED);
69 +		trace_rxrpc_rx_conn_abort(conn, skb);
70 +		rxrpc_set_conn_aborted(conn, skb, skb->priority, -ECONNABORTED,
71 +				       RXRPC_CALL_REMOTELY_ABORTED);
71 72	}
72 73
73 74	/*
···
92 91		struct rxrpc_acktrailer trailer;
93 92		size_t len;
94 93		int ret, ioc;
95 -		u32 serial, mtu, call_id, padding;
94 +		u32 serial, max_mtu, if_mtu, call_id, padding;
96 95
97 96		_enter("%d", conn->debug_id);
98 97
···
150 149			break;
151 150
152 151		case RXRPC_PACKET_TYPE_ACK:
153 -			mtu = conn->peer->if_mtu;
154 -			mtu -= conn->peer->hdrsize;
152 +			if_mtu = conn->peer->if_mtu - conn->peer->hdrsize;
153 +			if (conn->peer->ackr_adv_pmtud) {
154 +				max_mtu = umax(conn->peer->max_data, rxrpc_rx_mtu);
155 +			} else {
156 +				if_mtu = umin(1444, if_mtu);
157 +				max_mtu = if_mtu;
158 +			}
155 159			pkt.ack.bufferSpace = 0;
156 160			pkt.ack.maxSkew = htons(skb ? skb->priority : 0);
157 161			pkt.ack.firstPacket = htonl(chan->last_seq + 1);
···
164 158			pkt.ack.serial = htonl(skb ? sp->hdr.serial : 0);
165 159			pkt.ack.reason = skb ? RXRPC_ACK_DUPLICATE : RXRPC_ACK_IDLE;
166 160			pkt.ack.nAcks = 0;
167 -			trailer.maxMTU = htonl(rxrpc_rx_mtu);
168 -			trailer.ifMTU = htonl(mtu);
161 +			trailer.maxMTU = htonl(max_mtu);
162 +			trailer.ifMTU = htonl(if_mtu);
169 163			trailer.rwind = htonl(rxrpc_rx_window_size);
170 -			trailer.jumbo_max = htonl(rxrpc_rx_jumbo_max);
164 +			trailer.jumbo_max = 0;
171 165			pkt.whdr.flags |= RXRPC_SLOW_START_OK;
172 166			padding = 0;
173 167			iov[0].iov_len += sizeof(pkt.ack);
···
177 171			trace_rxrpc_tx_ack(chan->call_debug_id, serial,
178 172					   ntohl(pkt.ack.firstPacket),
179 173					   ntohl(pkt.ack.serial),
180 -					   pkt.ack.reason, 0, rxrpc_rx_window_size);
174 +					   pkt.ack.reason, 0, rxrpc_rx_window_size,
175 +					   rxrpc_propose_ack_retransmit);
181 176			break;
182 177
183 178		default:
···
209 202
210 203		for (i = 0; i < RXRPC_MAXCALLS; i++) {
211 204			call = conn->channels[i].call;
212 -			if (call)
205 +			if (call) {
206 +				rxrpc_see_call(call, rxrpc_call_see_conn_abort);
213 207				rxrpc_set_call_completion(call,
214 208							  conn->completion,
215 209							  conn->abort_code,
216 210							  conn->error);
211 +				rxrpc_poke_call(call, rxrpc_call_poke_conn_abort);
212 +			}
217 213		}
218 214
219 215		_leave("");
···
262 252			if (ret < 0)
263 253				return ret;
264 254
265 -			spin_lock(&conn->state_lock);
255 +			spin_lock_irq(&conn->state_lock);
266 256			if (conn->state == RXRPC_CONN_SERVICE_CHALLENGING)
267 257				conn->state = RXRPC_CONN_SERVICE;
268 -			spin_unlock(&conn->state_lock);
258 +			spin_unlock_irq(&conn->state_lock);
269 259
270 260			if (conn->state == RXRPC_CONN_SERVICE) {
271 261				/* Offload call state flipping to the I/O thread. As
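Note on net/rxrpc/conn_event.c: the RXRPC_PACKET_TYPE_ACK hunk changes how the maxMTU and ifMTU trailer fields are chosen. If the peer has advertised path-MTU discovery, maxMTU becomes the larger of the peer's max_data and rxrpc_rx_mtu; otherwise ifMTU is capped at 1444 and maxMTU is set equal to it. The sketch below restates that selection as a standalone function so the two branches can be tried with numbers. The function name pick_mtu_advert(), the struct mtu_advert type, and the sample values in the usage note (a 56-byte header overhead, a receive MTU of 5692) are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mtu_advert {
	uint32_t max_mtu;	/* value that would go into trailer.maxMTU */
	uint32_t if_mtu;	/* value that would go into trailer.ifMTU */
};

static uint32_t u32_min(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t u32_max(uint32_t a, uint32_t b) { return a > b ? a : b; }

/*
 * Mirror of the selection logic in the hunk:
 *   link_mtu       - interface MTU recorded for the peer
 *   hdrsize        - bytes of header overhead subtracted from it
 *   peer_max_data  - largest data size currently believed usable
 *   rx_mtu         - local receive MTU setting
 *   peer_adv_pmtud - whether the peer advertised PMTU discovery
 */
static struct mtu_advert pick_mtu_advert(uint32_t link_mtu, uint32_t hdrsize,
					 uint32_t peer_max_data, uint32_t rx_mtu,
					 bool peer_adv_pmtud)
{
	struct mtu_advert adv;

	assert(link_mtu >= hdrsize);
	adv.if_mtu = link_mtu - hdrsize;
	if (peer_adv_pmtud) {
		adv.max_mtu = u32_max(peer_max_data, rx_mtu);
	} else {
		/* Peer without PMTU discovery: advertise at most 1444. */
		adv.if_mtu = u32_min(1444, adv.if_mtu);
		adv.max_mtu = adv.if_mtu;
	}
	return adv;
}
```

With the assumed numbers, a 9000-byte link and a peer that does not advertise discovery yields 1444 for both fields, whereas the same link with a discovery-capable peer and max_data of 8944 yields ifMTU 8944 and maxMTU 8944.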
+10 -4
net/rxrpc/conn_object.c
···
31 31		if (WARN_ON_ONCE(!local))
32 32			return;
33 33
34 -		spin_lock_bh(&local->lock);
34 +		spin_lock_irq(&local->lock);
35 35		busy = !list_empty(&conn->attend_link);
36 36		if (!busy) {
37 37			rxrpc_get_connection(conn, why);
38 38			list_add_tail(&conn->attend_link, &local->conn_attend_q);
39 39		}
40 -		spin_unlock_bh(&local->lock);
40 +		spin_unlock_irq(&local->lock);
41 41		rxrpc_wake_up_io_thread(local);
42 42	}
43 43
···
196 196		call->peer->cong_ssthresh = call->cong_ssthresh;
197 197
198 198		if (!hlist_unhashed(&call->error_link)) {
199 -			spin_lock(&call->peer->lock);
199 +			spin_lock_irq(&call->peer->lock);
200 200			hlist_del_init(&call->error_link);
201 -			spin_unlock(&call->peer->lock);
201 +			spin_unlock_irq(&call->peer->lock);
202 202		}
203 203
204 204		if (rxrpc_is_client_call(call)) {
···
320 320		write_lock(&rxnet->conn_lock);
321 321		list_del_init(&conn->proc_link);
322 322		write_unlock(&rxnet->conn_lock);
323 +
324 +		if (conn->pmtud_probe) {
325 +			trace_rxrpc_pmtud_lost(conn, 0);
326 +			conn->peer->pmtud_probing = false;
327 +			conn->peer->pmtud_pending = true;
328 +		}
323 329
324 330		rxrpc_purge_queue(&conn->rx_queue);
325 331
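Note on net/rxrpc/conn_object.c: the first hunk touches a "poke" helper that queues a connection for the I/O thread only if it is not already queued, taking a reference on behalf of the queue when it does. A small userspace sketch of that idempotent enqueue is given below. The on_queue flag stands in for the !list_empty(&conn->attend_link) test, a pthread mutex stands in for the spinlock, and struct conn, struct attend_queue and poke_conn() are hypothetical names rather than the kernel's.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct conn {
	int refs;		/* references held by users and by the queue */
	bool on_queue;		/* true while linked on the attend queue */
	struct conn *next;
};

struct attend_queue {
	pthread_mutex_t lock;
	struct conn *head, *tail;
};

static void attend_queue_init(struct attend_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = q->tail = NULL;
}

/*
 * Ask the worker to look at a connection.  The test and the insertion
 * happen under one lock so that two concurrent pokes cannot both link
 * the same connection.  Returns true if this call queued it, false if
 * it was already queued.
 */
static bool poke_conn(struct attend_queue *q, struct conn *c)
{
	bool busy;

	pthread_mutex_lock(&q->lock);
	busy = c->on_queue;
	if (!busy) {
		c->refs++;		/* the queue now holds a reference */
		c->on_queue = true;
		c->next = NULL;
		if (q->tail)
			q->tail->next = c;
		else
			q->head = c;
		q->tail = c;
	}
	pthread_mutex_unlock(&q->lock);
	return !busy;
}
```

Taking the extra reference only on the first poke keeps the count balanced: the consumer drops exactly one reference per dequeued connection however many times it was poked.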
+443 -265
net/rxrpc/input.c
··· 27 27 } 28 28 29 29 /* 30 - * Do TCP-style congestion management [RFC 5681]. 30 + * Do TCP-style congestion management [RFC5681]. 31 31 */ 32 32 static void rxrpc_congestion_management(struct rxrpc_call *call, 33 - struct sk_buff *skb, 34 - struct rxrpc_ack_summary *summary, 35 - rxrpc_serial_t acked_serial) 33 + struct rxrpc_ack_summary *summary) 36 34 { 37 - enum rxrpc_congest_change change = rxrpc_cong_no_change; 38 - unsigned int cumulative_acks = call->cong_cumul_acks; 39 - unsigned int cwnd = call->cong_cwnd; 40 - bool resend = false; 41 - 42 - summary->flight_size = 43 - (call->tx_top - call->acks_hard_ack) - summary->nr_acks; 35 + summary->change = rxrpc_cong_no_change; 36 + summary->in_flight = rxrpc_tx_in_flight(call); 44 37 45 38 if (test_and_clear_bit(RXRPC_CALL_RETRANS_TIMEOUT, &call->flags)) { 46 39 summary->retrans_timeo = true; 47 - call->cong_ssthresh = max_t(unsigned int, 48 - summary->flight_size / 2, 2); 49 - cwnd = 1; 50 - if (cwnd >= call->cong_ssthresh && 51 - call->cong_mode == RXRPC_CALL_SLOW_START) { 52 - call->cong_mode = RXRPC_CALL_CONGEST_AVOIDANCE; 53 - call->cong_tstamp = skb->tstamp; 54 - cumulative_acks = 0; 40 + call->cong_ssthresh = umax(summary->in_flight / 2, 2); 41 + call->cong_cwnd = 1; 42 + if (call->cong_cwnd >= call->cong_ssthresh && 43 + call->cong_ca_state == RXRPC_CA_SLOW_START) { 44 + call->cong_ca_state = RXRPC_CA_CONGEST_AVOIDANCE; 45 + call->cong_tstamp = call->acks_latest_ts; 46 + call->cong_cumul_acks = 0; 55 47 } 56 48 } 57 49 58 - cumulative_acks += summary->nr_new_acks; 59 - if (cumulative_acks > 255) 60 - cumulative_acks = 255; 50 + call->cong_cumul_acks += summary->nr_new_sacks; 51 + call->cong_cumul_acks += summary->nr_new_hacks; 52 + if (call->cong_cumul_acks > 255) 53 + call->cong_cumul_acks = 255; 61 54 62 - summary->cwnd = call->cong_cwnd; 63 - summary->ssthresh = call->cong_ssthresh; 64 - summary->cumulative_acks = cumulative_acks; 65 - summary->dup_acks = call->cong_dup_acks; 66 - 67 - switch 
(call->cong_mode) { 68 - case RXRPC_CALL_SLOW_START: 69 - if (summary->saw_nacks) 55 + switch (call->cong_ca_state) { 56 + case RXRPC_CA_SLOW_START: 57 + if (call->acks_nr_snacks > 0) 70 58 goto packet_loss_detected; 71 - if (summary->cumulative_acks > 0) 72 - cwnd += 1; 73 - if (cwnd >= call->cong_ssthresh) { 74 - call->cong_mode = RXRPC_CALL_CONGEST_AVOIDANCE; 75 - call->cong_tstamp = skb->tstamp; 59 + if (call->cong_cumul_acks > 0) 60 + call->cong_cwnd += 1; 61 + if (call->cong_cwnd >= call->cong_ssthresh) { 62 + call->cong_ca_state = RXRPC_CA_CONGEST_AVOIDANCE; 63 + call->cong_tstamp = call->acks_latest_ts; 76 64 } 77 65 goto out; 78 66 79 - case RXRPC_CALL_CONGEST_AVOIDANCE: 80 - if (summary->saw_nacks) 67 + case RXRPC_CA_CONGEST_AVOIDANCE: 68 + if (call->acks_nr_snacks > 0) 81 69 goto packet_loss_detected; 82 70 83 71 /* We analyse the number of packets that get ACK'd per RTT 84 72 * period and increase the window if we managed to fill it. 85 73 */ 86 - if (call->peer->rtt_count == 0) 74 + if (call->rtt_count == 0) 87 75 goto out; 88 - if (ktime_before(skb->tstamp, 76 + if (ktime_before(call->acks_latest_ts, 89 77 ktime_add_us(call->cong_tstamp, 90 - call->peer->srtt_us >> 3))) 78 + call->srtt_us >> 3))) 91 79 goto out_no_clear_ca; 92 - change = rxrpc_cong_rtt_window_end; 93 - call->cong_tstamp = skb->tstamp; 94 - if (cumulative_acks >= cwnd) 95 - cwnd++; 80 + summary->change = rxrpc_cong_rtt_window_end; 81 + call->cong_tstamp = call->acks_latest_ts; 82 + if (call->cong_cumul_acks >= call->cong_cwnd) 83 + call->cong_cwnd++; 96 84 goto out; 97 85 98 - case RXRPC_CALL_PACKET_LOSS: 99 - if (!summary->saw_nacks) 86 + case RXRPC_CA_PACKET_LOSS: 87 + if (call->acks_nr_snacks == 0) 100 88 goto resume_normality; 101 89 102 - if (summary->new_low_nack) { 103 - change = rxrpc_cong_new_low_nack; 90 + if (summary->new_low_snack) { 91 + summary->change = rxrpc_cong_new_low_nack; 104 92 call->cong_dup_acks = 1; 105 93 if (call->cong_extra > 1) 106 94 call->cong_extra = 1; 
··· 99 111 if (call->cong_dup_acks < 3) 100 112 goto send_extra_data; 101 113 102 - change = rxrpc_cong_begin_retransmission; 103 - call->cong_mode = RXRPC_CALL_FAST_RETRANSMIT; 104 - call->cong_ssthresh = max_t(unsigned int, 105 - summary->flight_size / 2, 2); 106 - cwnd = call->cong_ssthresh + 3; 114 + summary->change = rxrpc_cong_begin_retransmission; 115 + call->cong_ca_state = RXRPC_CA_FAST_RETRANSMIT; 116 + call->cong_ssthresh = umax(summary->in_flight / 2, 2); 117 + call->cong_cwnd = call->cong_ssthresh + 3; 107 118 call->cong_extra = 0; 108 119 call->cong_dup_acks = 0; 109 - resend = true; 120 + summary->need_retransmit = true; 121 + summary->in_fast_or_rto_recovery = true; 110 122 goto out; 111 123 112 - case RXRPC_CALL_FAST_RETRANSMIT: 113 - if (!summary->new_low_nack) { 114 - if (summary->nr_new_acks == 0) 115 - cwnd += 1; 124 + case RXRPC_CA_FAST_RETRANSMIT: 125 + rxrpc_tlp_init(call); 126 + summary->in_fast_or_rto_recovery = true; 127 + if (!summary->new_low_snack) { 128 + if (summary->nr_new_sacks == 0) 129 + call->cong_cwnd += 1; 116 130 call->cong_dup_acks++; 117 131 if (call->cong_dup_acks == 2) { 118 - change = rxrpc_cong_retransmit_again; 132 + summary->change = rxrpc_cong_retransmit_again; 119 133 call->cong_dup_acks = 0; 120 - resend = true; 134 + summary->need_retransmit = true; 121 135 } 122 136 } else { 123 - change = rxrpc_cong_progress; 124 - cwnd = call->cong_ssthresh; 125 - if (!summary->saw_nacks) 137 + summary->change = rxrpc_cong_progress; 138 + call->cong_cwnd = call->cong_ssthresh; 139 + if (call->acks_nr_snacks == 0) { 140 + summary->exiting_fast_or_rto_recovery = true; 126 141 goto resume_normality; 142 + } 127 143 } 128 144 goto out; 129 145 ··· 137 145 } 138 146 139 147 resume_normality: 140 - change = rxrpc_cong_cleared_nacks; 148 + summary->change = rxrpc_cong_cleared_nacks; 141 149 call->cong_dup_acks = 0; 142 150 call->cong_extra = 0; 143 - call->cong_tstamp = skb->tstamp; 144 - if (cwnd < call->cong_ssthresh) 145 - 
call->cong_mode = RXRPC_CALL_SLOW_START; 151 + call->cong_tstamp = call->acks_latest_ts; 152 + if (call->cong_cwnd < call->cong_ssthresh) 153 + call->cong_ca_state = RXRPC_CA_SLOW_START; 146 154 else 147 - call->cong_mode = RXRPC_CALL_CONGEST_AVOIDANCE; 155 + call->cong_ca_state = RXRPC_CA_CONGEST_AVOIDANCE; 148 156 out: 149 - cumulative_acks = 0; 157 + call->cong_cumul_acks = 0; 150 158 out_no_clear_ca: 151 - if (cwnd >= RXRPC_TX_MAX_WINDOW) 152 - cwnd = RXRPC_TX_MAX_WINDOW; 153 - call->cong_cwnd = cwnd; 154 - call->cong_cumul_acks = cumulative_acks; 155 - summary->mode = call->cong_mode; 156 - trace_rxrpc_congest(call, summary, acked_serial, change); 157 - if (resend) 158 - rxrpc_resend(call, skb); 159 + if (call->cong_cwnd >= RXRPC_TX_MAX_WINDOW) 160 + call->cong_cwnd = RXRPC_TX_MAX_WINDOW; 161 + trace_rxrpc_congest(call, summary); 159 162 return; 160 163 161 164 packet_loss_detected: 162 - change = rxrpc_cong_saw_nack; 163 - call->cong_mode = RXRPC_CALL_PACKET_LOSS; 165 + summary->change = rxrpc_cong_saw_nack; 166 + call->cong_ca_state = RXRPC_CA_PACKET_LOSS; 164 167 call->cong_dup_acks = 0; 165 168 goto send_extra_data; 166 169 ··· 164 177 * state. 
165 178 */ 166 179 if (test_bit(RXRPC_CALL_TX_LAST, &call->flags) || 167 - summary->nr_acks != call->tx_top - call->acks_hard_ack) { 180 + call->acks_nr_sacks != call->tx_top - call->tx_bottom) { 168 181 call->cong_extra++; 169 182 wake_up(&call->waitq); 170 183 } ··· 176 189 */ 177 190 void rxrpc_congestion_degrade(struct rxrpc_call *call) 178 191 { 179 - ktime_t rtt, now; 192 + ktime_t rtt, now, time_since; 180 193 181 - if (call->cong_mode != RXRPC_CALL_SLOW_START && 182 - call->cong_mode != RXRPC_CALL_CONGEST_AVOIDANCE) 194 + if (call->cong_ca_state != RXRPC_CA_SLOW_START && 195 + call->cong_ca_state != RXRPC_CA_CONGEST_AVOIDANCE) 183 196 return; 184 197 if (__rxrpc_call_state(call) == RXRPC_CALL_CLIENT_AWAIT_REPLY) 185 198 return; 186 199 187 - rtt = ns_to_ktime(call->peer->srtt_us * (1000 / 8)); 200 + rtt = ns_to_ktime(call->srtt_us * (NSEC_PER_USEC / 8)); 188 201 now = ktime_get_real(); 189 - if (!ktime_before(ktime_add(call->tx_last_sent, rtt), now)) 202 + time_since = ktime_sub(now, call->tx_last_sent); 203 + if (ktime_before(time_since, rtt)) 190 204 return; 191 205 192 - trace_rxrpc_reset_cwnd(call, now); 206 + trace_rxrpc_reset_cwnd(call, time_since, rtt); 193 207 rxrpc_inc_stat(call->rxnet, stat_tx_data_cwnd_reset); 194 208 call->tx_last_sent = now; 195 - call->cong_mode = RXRPC_CALL_SLOW_START; 196 - call->cong_ssthresh = max_t(unsigned int, call->cong_ssthresh, 197 - call->cong_cwnd * 3 / 4); 198 - call->cong_cwnd = max_t(unsigned int, call->cong_cwnd / 2, RXRPC_MIN_CWND); 209 + call->cong_ca_state = RXRPC_CA_SLOW_START; 210 + call->cong_ssthresh = umax(call->cong_ssthresh, call->cong_cwnd * 3 / 4); 211 + call->cong_cwnd = umax(call->cong_cwnd / 2, RXRPC_MIN_CWND); 212 + } 213 + 214 + /* 215 + * Add an RTT sample derived from an ACK'd DATA packet. 
216 + */ 217 + static void rxrpc_add_data_rtt_sample(struct rxrpc_call *call, 218 + struct rxrpc_ack_summary *summary, 219 + struct rxrpc_txqueue *tq, 220 + int ix) 221 + { 222 + ktime_t xmit_ts = ktime_add_us(tq->xmit_ts_base, tq->segment_xmit_ts[ix]); 223 + 224 + rxrpc_call_add_rtt(call, rxrpc_rtt_rx_data_ack, -1, 225 + summary->acked_serial, summary->ack_serial, 226 + xmit_ts, call->acks_latest_ts); 227 + __clear_bit(ix, &tq->rtt_samples); /* Prevent repeat RTT sample */ 199 228 } 200 229 201 230 /* ··· 220 217 static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to, 221 218 struct rxrpc_ack_summary *summary) 222 219 { 223 - struct rxrpc_txbuf *txb; 224 - bool rot_last = false; 220 + struct rxrpc_txqueue *tq = call->tx_queue; 221 + rxrpc_seq_t seq = call->tx_bottom + 1; 222 + bool rot_last = false, trace = false; 225 223 226 - list_for_each_entry_rcu(txb, &call->tx_buffer, call_link, false) { 227 - if (before_eq(txb->seq, call->acks_hard_ack)) 228 - continue; 229 - if (txb->flags & RXRPC_LAST_PACKET) { 224 + _enter("%x,%x", call->tx_bottom, to); 225 + 226 + trace_rxrpc_tx_rotate(call, seq, to); 227 + trace_rxrpc_tq(call, tq, seq, rxrpc_tq_rotate); 228 + 229 + if (call->acks_lowest_nak == call->tx_bottom) { 230 + call->acks_lowest_nak = to; 231 + } else if (after(to, call->acks_lowest_nak)) { 232 + summary->new_low_snack = true; 233 + call->acks_lowest_nak = to; 234 + } 235 + 236 + /* We may have a left over fully-consumed buffer at the front that we 237 + * couldn't drop before (rotate_and_keep below). 
238 + */ 239 + if (seq == call->tx_qbase + RXRPC_NR_TXQUEUE) { 240 + call->tx_qbase += RXRPC_NR_TXQUEUE; 241 + call->tx_queue = tq->next; 242 + trace_rxrpc_tq(call, tq, seq, rxrpc_tq_rotate_and_free); 243 + kfree(tq); 244 + tq = call->tx_queue; 245 + } 246 + 247 + do { 248 + unsigned int ix = seq - call->tx_qbase; 249 + 250 + _debug("tq=%x seq=%x i=%d f=%x", tq->qbase, seq, ix, tq->bufs[ix]->flags); 251 + if (tq->bufs[ix]->flags & RXRPC_LAST_PACKET) { 230 252 set_bit(RXRPC_CALL_TX_LAST, &call->flags); 231 253 rot_last = true; 232 254 } 233 - if (txb->seq == to) 234 - break; 235 - } 236 255 237 - if (rot_last) 256 + if (summary->acked_serial == tq->segment_serial[ix] && 257 + test_bit(ix, &tq->rtt_samples)) 258 + rxrpc_add_data_rtt_sample(call, summary, tq, ix); 259 + 260 + if (ix == tq->nr_reported_acks) { 261 + /* Packet directly hard ACK'd. */ 262 + tq->nr_reported_acks++; 263 + rxrpc_input_rack_one(call, summary, tq, ix); 264 + if (seq == call->tlp_seq) 265 + summary->tlp_probe_acked = true; 266 + summary->nr_new_hacks++; 267 + __set_bit(ix, &tq->segment_acked); 268 + trace_rxrpc_rotate(call, tq, summary, seq, rxrpc_rotate_trace_hack); 269 + } else if (test_bit(ix, &tq->segment_acked)) { 270 + /* Soft ACK -> hard ACK. */ 271 + call->acks_nr_sacks--; 272 + trace_rxrpc_rotate(call, tq, summary, seq, rxrpc_rotate_trace_sack); 273 + } else { 274 + /* Soft NAK -> hard ACK. 
*/ 275 + call->acks_nr_snacks--; 276 + rxrpc_input_rack_one(call, summary, tq, ix); 277 + if (seq == call->tlp_seq) 278 + summary->tlp_probe_acked = true; 279 + summary->nr_new_hacks++; 280 + __set_bit(ix, &tq->segment_acked); 281 + trace_rxrpc_rotate(call, tq, summary, seq, rxrpc_rotate_trace_snak); 282 + } 283 + 284 + call->tx_nr_sent--; 285 + if (__test_and_clear_bit(ix, &tq->segment_lost)) 286 + call->tx_nr_lost--; 287 + if (__test_and_clear_bit(ix, &tq->segment_retransmitted)) 288 + call->tx_nr_resent--; 289 + __clear_bit(ix, &tq->ever_retransmitted); 290 + 291 + rxrpc_put_txbuf(tq->bufs[ix], rxrpc_txbuf_put_rotated); 292 + tq->bufs[ix] = NULL; 293 + 294 + WRITE_ONCE(call->tx_bottom, seq); 295 + trace_rxrpc_txqueue(call, (rot_last ? 296 + rxrpc_txqueue_rotate_last : 297 + rxrpc_txqueue_rotate)); 298 + 299 + seq++; 300 + trace = true; 301 + if (!(seq & RXRPC_TXQ_MASK)) { 302 + trace_rxrpc_rack_update(call, summary); 303 + trace = false; 304 + prefetch(tq->next); 305 + if (tq != call->tx_qtail) { 306 + call->tx_qbase += RXRPC_NR_TXQUEUE; 307 + call->tx_queue = tq->next; 308 + trace_rxrpc_tq(call, tq, seq, rxrpc_tq_rotate_and_free); 309 + kfree(tq); 310 + tq = call->tx_queue; 311 + } else { 312 + trace_rxrpc_tq(call, tq, seq, rxrpc_tq_rotate_and_keep); 313 + tq = NULL; 314 + break; 315 + } 316 + } 317 + 318 + } while (before_eq(seq, to)); 319 + 320 + if (trace) 321 + trace_rxrpc_rack_update(call, summary); 322 + 323 + if (rot_last) { 238 324 set_bit(RXRPC_CALL_TX_ALL_ACKED, &call->flags); 239 - 240 - _enter("%x,%x,%x,%d", to, call->acks_hard_ack, call->tx_top, rot_last); 241 - 242 - if (call->acks_lowest_nak == call->acks_hard_ack) { 243 - call->acks_lowest_nak = to; 244 - } else if (after(to, call->acks_lowest_nak)) { 245 - summary->new_low_nack = true; 246 - call->acks_lowest_nak = to; 325 + if (tq) { 326 + trace_rxrpc_tq(call, tq, seq, rxrpc_tq_rotate_and_free); 327 + kfree(tq); 328 + call->tx_queue = NULL; 329 + } 247 330 } 248 331 249 - 
smp_store_release(&call->acks_hard_ack, to); 332 + _debug("%x,%x,%x,%d", to, call->tx_bottom, call->tx_top, rot_last); 250 333 251 - trace_rxrpc_txqueue(call, (rot_last ? 252 - rxrpc_txqueue_rotate_last : 253 - rxrpc_txqueue_rotate)); 254 334 wake_up(&call->waitq); 255 335 return rot_last; 256 336 } ··· 349 263 { 350 264 ASSERT(test_bit(RXRPC_CALL_TX_LAST, &call->flags)); 351 265 352 - call->resend_at = KTIME_MAX; 353 - trace_rxrpc_timer_can(call, rxrpc_timer_trace_resend); 354 - 355 - if (unlikely(call->cong_last_nack)) { 356 - rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack); 357 - call->cong_last_nack = NULL; 358 - } 266 + call->rack_timer_mode = RXRPC_CALL_RACKTIMER_OFF; 267 + call->rack_timo_at = KTIME_MAX; 268 + trace_rxrpc_rack_timer(call, 0, false); 269 + trace_rxrpc_timer_can(call, rxrpc_timer_trace_rack_off + call->rack_timer_mode); 359 270 360 271 switch (__rxrpc_call_state(call)) { 361 272 case RXRPC_CALL_CLIENT_SEND_REQUEST: ··· 448 365 struct rxrpc_skb_priv *sp = rxrpc_skb(skb); 449 366 bool last = sp->hdr.flags & RXRPC_LAST_PACKET; 450 367 451 - __skb_queue_tail(&call->recvmsg_queue, skb); 368 + skb_queue_tail(&call->recvmsg_queue, skb); 452 369 rxrpc_input_update_ack_window(call, window, wtop); 453 370 trace_rxrpc_receive(call, last ? 
why + 1 : why, sp->hdr.serial, sp->hdr.seq); 454 371 if (last) ··· 525 442 526 443 rxrpc_get_skb(skb, rxrpc_skb_get_to_recvmsg); 527 444 528 - spin_lock(&call->recvmsg_queue.lock); 529 445 rxrpc_input_queue_data(call, skb, window, wtop, rxrpc_receive_queue); 530 446 *_notify = true; 531 447 ··· 545 463 rxrpc_input_queue_data(call, oos, window, wtop, 546 464 rxrpc_receive_queue_oos); 547 465 } 548 - 549 - spin_unlock(&call->recvmsg_queue.lock); 550 466 551 467 call->ackr_sack_base = sack; 552 468 } else { ··· 610 530 unsigned int offset = sizeof(struct rxrpc_wire_header); 611 531 unsigned int len = skb->len - offset; 612 532 bool notify = false; 613 - int ack_reason = 0; 533 + int ack_reason = 0, count = 1, stat_ix; 614 534 615 535 while (sp->hdr.flags & RXRPC_JUMBO_PACKET) { 616 536 if (len < RXRPC_JUMBO_SUBPKTLEN) ··· 639 559 sp->hdr.serial++; 640 560 offset += RXRPC_JUMBO_SUBPKTLEN; 641 561 len -= RXRPC_JUMBO_SUBPKTLEN; 562 + count++; 642 563 } 643 564 644 565 sp->offset = offset; 645 566 sp->len = len; 646 567 rxrpc_input_data_one(call, skb, &notify, &ack_serial, &ack_reason); 568 + 569 + stat_ix = umin(count, ARRAY_SIZE(call->rxnet->stat_rx_jumbo)) - 1; 570 + atomic_inc(&call->rxnet->stat_rx_jumbo[stat_ix]); 647 571 648 572 if (ack_reason > 0) { 649 573 rxrpc_send_ACK(call, ack_reason, ack_serial, ··· 751 667 clear_bit(i + RXRPC_CALL_RTT_PEND_SHIFT, &call->rtt_avail); 752 668 smp_mb(); /* Read data before setting avail bit */ 753 669 set_bit(i, &call->rtt_avail); 754 - rxrpc_peer_add_rtt(call, type, i, acked_serial, ack_serial, 670 + rxrpc_call_add_rtt(call, type, i, acked_serial, ack_serial, 755 671 sent_at, resp_time); 756 672 matched = true; 757 673 } ··· 761 677 */ 762 678 if (after(acked_serial, orig_serial)) { 763 679 trace_rxrpc_rtt_rx(call, rxrpc_rtt_rx_obsolete, i, 764 - orig_serial, acked_serial, 0, 0); 680 + orig_serial, acked_serial, 0, 0, 0); 765 681 clear_bit(i + RXRPC_CALL_RTT_PEND_SHIFT, &call->rtt_avail); 766 682 smp_wmb(); 767 683 set_bit(i, 
&call->rtt_avail); ··· 769 685 } 770 686 771 687 if (!matched) 772 - trace_rxrpc_rtt_rx(call, rxrpc_rtt_rx_lost, 9, 0, acked_serial, 0, 0); 688 + trace_rxrpc_rtt_rx(call, rxrpc_rtt_rx_lost, 9, 0, acked_serial, 0, 0, 0); 773 689 } 774 690 775 691 /* ··· 779 695 struct rxrpc_acktrailer *trailer) 780 696 { 781 697 struct rxrpc_skb_priv *sp = rxrpc_skb(skb); 782 - struct rxrpc_peer *peer; 783 - unsigned int mtu; 698 + struct rxrpc_peer *peer = call->peer; 699 + unsigned int max_data, capacity; 784 700 bool wake = false; 785 - u32 rwind = ntohl(trailer->rwind); 701 + u32 max_mtu = ntohl(trailer->maxMTU); 702 + //u32 if_mtu = ntohl(trailer->ifMTU); 703 + u32 rwind = ntohl(trailer->rwind); 704 + u32 jumbo_max = ntohl(trailer->jumbo_max); 786 705 787 706 if (rwind > RXRPC_TX_MAX_WINDOW) 788 707 rwind = RXRPC_TX_MAX_WINDOW; ··· 796 709 call->tx_winsize = rwind; 797 710 } 798 711 799 - mtu = min(ntohl(trailer->maxMTU), ntohl(trailer->ifMTU)); 712 + max_mtu = clamp(max_mtu, 500, 65535); 713 + peer->ackr_max_data = max_mtu; 800 714 801 - peer = call->peer; 802 - if (mtu < peer->maxdata) { 803 - spin_lock(&peer->lock); 804 - peer->maxdata = mtu; 805 - peer->mtu = mtu + peer->hdrsize; 806 - spin_unlock(&peer->lock); 715 + if (max_mtu < peer->max_data) { 716 + trace_rxrpc_pmtud_reduce(peer, sp->hdr.serial, max_mtu, 717 + rxrpc_pmtud_reduce_ack); 718 + write_seqcount_begin(&peer->mtu_lock); 719 + peer->max_data = max_mtu; 720 + write_seqcount_end(&peer->mtu_lock); 807 721 } 722 + 723 + max_data = umin(max_mtu, peer->max_data); 724 + capacity = max_data; 725 + capacity += sizeof(struct rxrpc_jumbo_header); /* First subpacket has main hdr, not jumbo */ 726 + capacity /= sizeof(struct rxrpc_jumbo_header) + RXRPC_JUMBO_DATALEN; 727 + 728 + if (jumbo_max == 0) { 729 + /* The peer says it supports pmtu discovery */ 730 + peer->ackr_adv_pmtud = true; 731 + } else { 732 + peer->ackr_adv_pmtud = false; 733 + capacity = clamp(capacity, 1, jumbo_max); 734 + } 735 + 736 + call->tx_jumbo_max = 
capacity; 808 737 809 738 if (wake) 810 739 wake_up(&call->waitq); 811 740 } 812 741 742 + #if defined(CONFIG_X86) && __GNUC__ && !defined(__clang__) 743 + /* Clang doesn't support the %z constraint modifier */ 744 + #define shiftr_adv_rotr(shift_from, rotate_into) ({ \ 745 + asm(" shr%z1 %1\n" \ 746 + " inc %0\n" \ 747 + " rcr%z2 %2\n" \ 748 + : "+d"(shift_from), "+m"(*(shift_from)), "+rm"(rotate_into) \ 749 + ); \ 750 + }) 751 + #else 752 + #define shiftr_adv_rotr(shift_from, rotate_into) ({ \ 753 + typeof(rotate_into) __bit0 = *(shift_from) & 1; \ 754 + *(shift_from) >>= 1; \ 755 + shift_from++; \ 756 + rotate_into >>= 1; \ 757 + rotate_into |= __bit0 << (sizeof(rotate_into) * 8 - 1); \ 758 + }) 759 + #endif 760 + 813 761 /* 814 - * Determine how many nacks from the previous ACK have now been satisfied. 762 + * Deal with RTT samples from soft ACKs. 815 763 */ 816 - static rxrpc_seq_t rxrpc_input_check_prev_ack(struct rxrpc_call *call, 817 - struct rxrpc_ack_summary *summary, 818 - rxrpc_seq_t seq) 764 + static void rxrpc_input_soft_rtt(struct rxrpc_call *call, 765 + struct rxrpc_ack_summary *summary, 766 + struct rxrpc_txqueue *tq) 819 767 { 820 - struct sk_buff *skb = call->cong_last_nack; 821 - struct rxrpc_skb_priv *sp = rxrpc_skb(skb); 822 - unsigned int i, new_acks = 0, retained_nacks = 0; 823 - rxrpc_seq_t old_seq = sp->ack.first_ack; 824 - u8 *acks = skb->data + sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket); 768 + for (int ix = 0; ix < RXRPC_NR_TXQUEUE; ix++) 769 + if (summary->acked_serial == tq->segment_serial[ix]) 770 + return rxrpc_add_data_rtt_sample(call, summary, tq, ix); 771 + } 825 772 826 - if (after_eq(seq, old_seq + sp->ack.nr_acks)) { 827 - summary->nr_new_acks += sp->ack.nr_nacks; 828 - summary->nr_new_acks += seq - (old_seq + sp->ack.nr_acks); 829 - summary->nr_retained_nacks = 0; 830 - } else if (seq == old_seq) { 831 - summary->nr_retained_nacks = sp->ack.nr_nacks; 832 - } else { 833 - for (i = 0; i < sp->ack.nr_acks; 
i++) { 834 - if (acks[i] == RXRPC_ACK_TYPE_NACK) { 835 - if (before(old_seq + i, seq)) 836 - new_acks++; 837 - else 838 - retained_nacks++; 839 - } 840 - } 773 + /* 774 + * Process a batch of soft ACKs specific to a transmission queue segment. 775 + */ 776 + static void rxrpc_input_soft_ack_tq(struct rxrpc_call *call, 777 + struct rxrpc_ack_summary *summary, 778 + struct rxrpc_txqueue *tq, 779 + unsigned long extracted_acks, 780 + int nr_reported, 781 + rxrpc_seq_t seq, 782 + rxrpc_seq_t *lowest_nak) 783 + { 784 + unsigned long old_reported = 0, flipped, new_acks = 0; 785 + unsigned long a_to_n, n_to_a = 0; 786 + int new, a, n; 841 787 842 - summary->nr_new_acks += new_acks; 843 - summary->nr_retained_nacks = retained_nacks; 788 + if (tq->nr_reported_acks > 0) 789 + old_reported = ~0UL >> (RXRPC_NR_TXQUEUE - tq->nr_reported_acks); 790 + 791 + _enter("{%x,%lx,%d},%lx,%d,%x", 792 + tq->qbase, tq->segment_acked, tq->nr_reported_acks, 793 + extracted_acks, nr_reported, seq); 794 + 795 + _debug("[%x]", tq->qbase); 796 + _debug("tq %16lx %u", tq->segment_acked, tq->nr_reported_acks); 797 + _debug("sack %16lx %u", extracted_acks, nr_reported); 798 + 799 + /* See how many previously logged ACKs/NAKs have flipped. */ 800 + flipped = (tq->segment_acked ^ extracted_acks) & old_reported; 801 + if (flipped) { 802 + n_to_a = ~tq->segment_acked & flipped; /* Old NAK -> ACK */ 803 + a_to_n = tq->segment_acked & flipped; /* Old ACK -> NAK */ 804 + a = hweight_long(n_to_a); 805 + n = hweight_long(a_to_n); 806 + _debug("flip %16lx", flipped); 807 + _debug("ntoa %16lx %d", n_to_a, a); 808 + _debug("aton %16lx %d", a_to_n, n); 809 + call->acks_nr_sacks += a - n; 810 + call->acks_nr_snacks += n - a; 811 + summary->nr_new_sacks += a; 812 + summary->nr_new_snacks += n; 844 813 } 845 814 846 - return old_seq + sp->ack.nr_acks; 815 + /* See how many new ACKs/NAKs have been acquired. 
*/
+	new = nr_reported - tq->nr_reported_acks;
+	if (new > 0) {
+		new_acks = extracted_acks & ~old_reported;
+		if (new_acks) {
+			a = hweight_long(new_acks);
+			n = new - a;
+			_debug("new_a %16lx new=%d a=%d n=%d", new_acks, new, a, n);
+			call->acks_nr_sacks += a;
+			call->acks_nr_snacks += n;
+			summary->nr_new_sacks += a;
+			summary->nr_new_snacks += n;
+		} else {
+			call->acks_nr_snacks += new;
+			summary->nr_new_snacks += new;
+		}
+	}
+
+	tq->nr_reported_acks = nr_reported;
+	tq->segment_acked = extracted_acks;
+	trace_rxrpc_apply_acks(call, tq);
+
+	if (extracted_acks != ~0UL) {
+		rxrpc_seq_t lowest = seq + ffz(extracted_acks);
+
+		if (before(lowest, *lowest_nak))
+			*lowest_nak = lowest;
+	}
+
+	if (summary->acked_serial)
+		rxrpc_input_soft_rtt(call, summary, tq);
+
+	new_acks |= n_to_a;
+	if (new_acks)
+		rxrpc_input_rack(call, summary, tq, new_acks);
+
+	if (call->tlp_serial &&
+	    rxrpc_seq_in_txq(tq, call->tlp_seq) &&
+	    test_bit(call->tlp_seq - tq->qbase, &new_acks))
+		summary->tlp_probe_acked = true;
 }

 /*
···
  */
 static void rxrpc_input_soft_acks(struct rxrpc_call *call,
 				  struct rxrpc_ack_summary *summary,
-				  struct sk_buff *skb,
-				  rxrpc_seq_t seq,
-				  rxrpc_seq_t since)
+				  struct sk_buff *skb)
 {
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
-	unsigned int i, old_nacks = 0;
+	struct rxrpc_txqueue *tq = call->tx_queue;
+	unsigned long extracted = ~0UL;
+	unsigned int nr = 0;
+	rxrpc_seq_t seq = call->acks_hard_ack + 1;
 	rxrpc_seq_t lowest_nak = seq + sp->ack.nr_acks;
 	u8 *acks = skb->data + sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket);

-	for (i = 0; i < sp->ack.nr_acks; i++) {
-		if (acks[i] == RXRPC_ACK_TYPE_ACK) {
-			summary->nr_acks++;
-			if (after_eq(seq, since))
-				summary->nr_new_acks++;
-		} else {
-			summary->saw_nacks = true;
-			if (before(seq, since)) {
-				/* Overlap with previous ACK */
-				old_nacks++;
-			} else {
-				summary->nr_new_nacks++;
-				sp->ack.nr_nacks++;
-			}
+	_enter("%x,%x,%u", tq->qbase, seq, sp->ack.nr_acks);

-			if (before(seq, lowest_nak))
-				lowest_nak = seq;
+	while (after(seq, tq->qbase + RXRPC_NR_TXQUEUE - 1))
+		tq = tq->next;
+
+	for (unsigned int i = 0; i < sp->ack.nr_acks; i++) {
+		/* Decant ACKs until we hit a txqueue boundary. */
+		shiftr_adv_rotr(acks, extracted);
+		if (i == 256) {
+			acks -= i;
+			i = 0;
 		}
 		seq++;
+		nr++;
+		if ((seq & RXRPC_TXQ_MASK) != 0)
+			continue;
+
+		_debug("bound %16lx %u", extracted, nr);
+
+		rxrpc_input_soft_ack_tq(call, summary, tq, extracted, RXRPC_NR_TXQUEUE,
+					seq - RXRPC_NR_TXQUEUE, &lowest_nak);
+		extracted = ~0UL;
+		nr = 0;
+		tq = tq->next;
+		prefetch(tq);
 	}

-	if (lowest_nak != call->acks_lowest_nak) {
-		call->acks_lowest_nak = lowest_nak;
-		summary->new_low_nack = true;
+	if (nr) {
+		unsigned int nr_reported = seq & RXRPC_TXQ_MASK;
+
+		extracted >>= RXRPC_NR_TXQUEUE - nr_reported;
+		_debug("tail %16lx %u", extracted, nr_reported);
+		rxrpc_input_soft_ack_tq(call, summary, tq, extracted, nr_reported,
+					seq & ~RXRPC_TXQ_MASK, &lowest_nak);
 	}

 	/* We *can* have more nacks than we did - the peer is permitted to drop
···
 	 * possible for the nack distribution to change whilst the number of
 	 * nacks stays the same or goes down.
 	 */
-	if (old_nacks < summary->nr_retained_nacks)
-		summary->nr_new_acks += summary->nr_retained_nacks - old_nacks;
-	summary->nr_retained_nacks = old_nacks;
+	if (lowest_nak != call->acks_lowest_nak) {
+		call->acks_lowest_nak = lowest_nak;
+		summary->new_low_snack = true;
+	}
+
+	_debug("summary A=%d+%d N=%d+%d",
+	       call->acks_nr_sacks, summary->nr_new_sacks,
+	       call->acks_nr_snacks, summary->nr_new_snacks);
 }

 /*
···
  * with respect to the ack state conveyed by preceding ACKs.
  */
 static bool rxrpc_is_ack_valid(struct rxrpc_call *call,
-			       rxrpc_seq_t first_pkt, rxrpc_seq_t prev_pkt)
+			       rxrpc_seq_t hard_ack, rxrpc_seq_t prev_pkt)
 {
-	rxrpc_seq_t base = READ_ONCE(call->acks_first_seq);
+	rxrpc_seq_t base = READ_ONCE(call->acks_hard_ack);

-	if (after(first_pkt, base))
+	if (after(hard_ack, base))
 		return true; /* The window advanced */

-	if (before(first_pkt, base))
+	if (before(hard_ack, base))
 		return false; /* firstPacket regressed */

 	if (after_eq(prev_pkt, call->acks_prev_seq))
 		return true; /* previousPacket hasn't regressed. */

 	/* Some rx implementations put a serial number in previousPacket. */
-	if (after_eq(prev_pkt, base + call->tx_winsize))
+	if (after(prev_pkt, base + call->tx_winsize))
 		return false;
 	return true;
 }
···
 static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
 {
 	struct rxrpc_ack_summary summary = { 0 };
-	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	struct rxrpc_acktrailer trailer;
-	rxrpc_serial_t ack_serial, acked_serial;
-	rxrpc_seq_t first_soft_ack, hard_ack, prev_pkt, since;
+	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
+	rxrpc_seq_t first_soft_ack, hard_ack, prev_pkt;
 	int nr_acks, offset, ioffset;

 	_enter("");

 	offset = sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket);

-	ack_serial = sp->hdr.serial;
-	acked_serial = sp->ack.acked_serial;
-	first_soft_ack = sp->ack.first_ack;
-	prev_pkt = sp->ack.prev_ack;
-	nr_acks = sp->ack.nr_acks;
-	hard_ack = first_soft_ack - 1;
-	summary.ack_reason = (sp->ack.reason < RXRPC_ACK__INVALID ?
-			      sp->ack.reason : RXRPC_ACK__INVALID);
+	summary.ack_serial = sp->hdr.serial;
+	first_soft_ack = sp->ack.first_ack;
+	prev_pkt = sp->ack.prev_ack;
+	nr_acks = sp->ack.nr_acks;
+	hard_ack = first_soft_ack - 1;
+	summary.acked_serial = sp->ack.acked_serial;
+	summary.ack_reason = (sp->ack.reason < RXRPC_ACK__INVALID ?
+			      sp->ack.reason : RXRPC_ACK__INVALID);

-	trace_rxrpc_rx_ack(call, ack_serial, acked_serial,
-			   first_soft_ack, prev_pkt,
-			   summary.ack_reason, nr_acks);
+	trace_rxrpc_rx_ack(call, sp);
 	rxrpc_inc_stat(call->rxnet, stat_rx_acks[summary.ack_reason]);
-
-	if (acked_serial != 0) {
-		switch (summary.ack_reason) {
-		case RXRPC_ACK_PING_RESPONSE:
-			rxrpc_complete_rtt_probe(call, skb->tstamp, acked_serial, ack_serial,
-						 rxrpc_rtt_rx_ping_response);
-			break;
-		case RXRPC_ACK_REQUESTED:
-			rxrpc_complete_rtt_probe(call, skb->tstamp, acked_serial, ack_serial,
-						 rxrpc_rtt_rx_requested_ack);
-			break;
-		default:
-			rxrpc_complete_rtt_probe(call, skb->tstamp, acked_serial, ack_serial,
-						 rxrpc_rtt_rx_other_ack);
-			break;
-		}
-	}
+	prefetch(call->tx_queue);

 	/* If we get an EXCEEDS_WINDOW ACK from the server, it probably
 	 * indicates that the client address changed due to NAT. The server
 	 * lost the call because it switched to a different peer.
 	 */
 	if (unlikely(summary.ack_reason == RXRPC_ACK_EXCEEDS_WINDOW) &&
-	    first_soft_ack == 1 &&
+	    hard_ack == 0 &&
 	    prev_pkt == 0 &&
 	    rxrpc_is_client_call(call)) {
 		rxrpc_set_call_completion(call, RXRPC_CALL_REMOTELY_ABORTED,
···
 	 * if we still have it buffered to the beginning.
 	 */
 	if (unlikely(summary.ack_reason == RXRPC_ACK_OUT_OF_SEQUENCE) &&
-	    first_soft_ack == 1 &&
+	    hard_ack == 0 &&
 	    prev_pkt == 0 &&
-	    call->acks_hard_ack == 0 &&
+	    call->tx_bottom == 0 &&
 	    rxrpc_is_client_call(call)) {
 		rxrpc_set_call_completion(call, RXRPC_CALL_REMOTELY_ABORTED,
 					  0, -ENETRESET);
···
 	}

 	/* Discard any out-of-order or duplicate ACKs (outside lock). */
-	if (!rxrpc_is_ack_valid(call, first_soft_ack, prev_pkt)) {
-		trace_rxrpc_rx_discard_ack(call->debug_id, ack_serial,
-					   first_soft_ack, call->acks_first_seq,
-					   prev_pkt, call->acks_prev_seq);
-		goto send_response;
+	if (!rxrpc_is_ack_valid(call, hard_ack, prev_pkt)) {
+		trace_rxrpc_rx_discard_ack(call, summary.ack_serial, hard_ack, prev_pkt);
+		goto send_response; /* Still respond if requested. */
 	}

 	trailer.maxMTU = 0;
···
 	if (nr_acks > 0)
 		skb_condense(skb);

-	if (call->cong_last_nack) {
-		since = rxrpc_input_check_prev_ack(call, &summary, first_soft_ack);
-		rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack);
-		call->cong_last_nack = NULL;
-	} else {
-		summary.nr_new_acks = first_soft_ack - call->acks_first_seq;
-		call->acks_lowest_nak = first_soft_ack + nr_acks;
-		since = first_soft_ack;
-	}
-
-	call->acks_latest_ts = skb->tstamp;
-	call->acks_first_seq = first_soft_ack;
+	call->acks_latest_ts = ktime_get_real();
+	call->acks_hard_ack = hard_ack;
 	call->acks_prev_seq = prev_pkt;

-	switch (summary.ack_reason) {
-	case RXRPC_ACK_PING:
-		break;
-	default:
-		if (acked_serial && after(acked_serial, call->acks_highest_serial))
-			call->acks_highest_serial = acked_serial;
-		break;
+	if (summary.acked_serial) {
+		switch (summary.ack_reason) {
+		case RXRPC_ACK_PING_RESPONSE:
+			rxrpc_complete_rtt_probe(call, call->acks_latest_ts,
+						 summary.acked_serial, summary.ack_serial,
+						 rxrpc_rtt_rx_ping_response);
+			break;
+		default:
+			if (after(summary.acked_serial, call->acks_highest_serial))
+				call->acks_highest_serial = summary.acked_serial;
+			summary.rtt_sample_avail = true;
+			break;
+		}
 	}

 	/* Parse rwind and mtu sizes if provided. */
 	if (trailer.maxMTU)
 		rxrpc_input_ack_trailer(call, skb, &trailer);

-	if (first_soft_ack == 0)
+	if (hard_ack + 1 == 0)
 		return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_zero);

 	/* Ignore ACKs unless we are or have just been transmitting. */
···
 		goto send_response;
 	}

-	if (before(hard_ack, call->acks_hard_ack) ||
+	if (before(hard_ack, call->tx_bottom) ||
 	    after(hard_ack, call->tx_top))
 		return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_outside_window);
 	if (nr_acks > call->tx_top - hard_ack)
 		return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_sack_overflow);

-	if (after(hard_ack, call->acks_hard_ack)) {
+	if (after(hard_ack, call->tx_bottom)) {
 		if (rxrpc_rotate_tx_window(call, hard_ack, &summary)) {
 			rxrpc_end_tx_phase(call, false, rxrpc_eproto_unexpected_ack);
 			goto send_response;
···
 	if (nr_acks > 0) {
 		if (offset > (int)skb->len - nr_acks)
 			return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_short_sack);
-		rxrpc_input_soft_acks(call, &summary, skb, first_soft_ack, since);
-		rxrpc_get_skb(skb, rxrpc_skb_get_last_nack);
-		call->cong_last_nack = skb;
+		rxrpc_input_soft_acks(call, &summary, skb);
 	}

 	if (test_bit(RXRPC_CALL_TX_LAST, &call->flags) &&
-	    summary.nr_acks == call->tx_top - hard_ack &&
+	    call->acks_nr_sacks == call->tx_top - hard_ack &&
 	    rxrpc_is_client_call(call))
-		rxrpc_propose_ping(call, ack_serial,
+		rxrpc_propose_ping(call, summary.ack_serial,
 				   rxrpc_propose_ack_ping_for_lost_reply);

-	rxrpc_congestion_management(call, skb, &summary, acked_serial);
+	/* Drive the congestion management algorithm first and then RACK-TLP as
+	 * the latter depends on the state/change in state in the former.
+	 */
+	rxrpc_congestion_management(call, &summary);
+	rxrpc_rack_detect_loss_and_arm_timer(call, &summary);
+	rxrpc_tlp_process_ack(call, &summary);
+	if (call->tlp_serial && after_eq(summary.acked_serial, call->tlp_serial))
+		call->tlp_serial = 0;

 send_response:
 	if (summary.ack_reason == RXRPC_ACK_PING)
-		rxrpc_send_ACK(call, RXRPC_ACK_PING_RESPONSE, ack_serial,
+		rxrpc_send_ACK(call, RXRPC_ACK_PING_RESPONSE, summary.ack_serial,
 			       rxrpc_propose_ack_respond_to_ping);
 	else if (sp->hdr.flags & RXRPC_REQUEST_ACK)
-		rxrpc_send_ACK(call, RXRPC_ACK_REQUESTED, ack_serial,
+		rxrpc_send_ACK(call, RXRPC_ACK_REQUESTED, summary.ack_serial,
 			       rxrpc_propose_ack_respond_to_ack);
 }

···
 		break;
 	}

-	rxrpc_input_call_event(call, skb);
+	rxrpc_input_call_event(call);
 }
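The net/rxrpc/input.c hunks above replace the per-byte soft-ACK loop with per-segment bitmask arithmetic: newly reported slots are isolated with a mask, counted with a population count, and the lowest NAK is found from the first zero bit. The following is a minimal userspace sketch of that counting step only, not the kernel code. The names `sack_delta`, `low_mask` and `count_new_reports` are hypothetical, `SEG_SLOTS` is an assumed stand-in for `RXRPC_NR_TXQUEUE`, and the GCC/Clang builtins stand in for `hweight_long()` and `ffz()`.

```c
#include <stdint.h>

#define SEG_SLOTS 64	/* assumed segment width (stand-in for RXRPC_NR_TXQUEUE) */

struct sack_delta {
	int new_sacks;		/* newly reported slots that were ACK'd */
	int new_snacks;		/* newly reported slots that were NAK'd */
	int lowest_nak_ix;	/* index of the lowest clear bit, or -1 if none */
};

/* Mask covering the lowest n slots of a segment. */
static uint64_t low_mask(int n)
{
	if (n <= 0)
		return 0;
	return n >= SEG_SLOTS ? ~0ULL : (1ULL << n) - 1;
}

/*
 * Count ACKs and NAKs among the slots reported for the first time.
 * Assumes bits of 'acked' at or above nr_reported are zero, as they are in
 * the diff after the tail shift.
 */
static struct sack_delta count_new_reports(uint64_t acked, int nr_reported,
					    int old_nr_reported)
{
	struct sack_delta d = { 0, 0, -1 };
	int fresh = nr_reported - old_nr_reported;

	if (fresh > 0) {
		/* Keep only the slots not covered by the previous report. */
		uint64_t fresh_acks = acked & ~low_mask(old_nr_reported);
		int a = __builtin_popcountll(fresh_acks);

		d.new_sacks = a;
		d.new_snacks = fresh - a;
	}
	/* First zero bit, like ffz() in the diff. */
	if (acked != ~0ULL)
		d.lowest_nak_ix = __builtin_ctzll(~acked);
	return d;
}
```

For example, with four slots previously reported and a new bitmap of 0x2f over eight slots, only slot 5 is a new ACK, slots 4, 6 and 7 are new NAKs, and the lowest NAK is at index 4. Flips of previously reported slots (the `n_to_a` term in the diff) are deliberately left out of this sketch.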
+418
net/rxrpc/input_rack.c
···
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* RACK-TLP [RFC8958] Implementation
+ *
+ * Copyright (C) 2024 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include "ar-internal.h"
+
+static bool rxrpc_rack_sent_after(ktime_t t1, rxrpc_seq_t seq1,
+				  ktime_t t2, rxrpc_seq_t seq2)
+{
+	if (ktime_after(t1, t2))
+		return true;
+	return t1 == t2 && after(seq1, seq2);
+}
+
+/*
+ * Mark a packet lost.
+ */
+static void rxrpc_rack_mark_lost(struct rxrpc_call *call,
+				 struct rxrpc_txqueue *tq, unsigned int ix)
+{
+	if (__test_and_set_bit(ix, &tq->segment_lost)) {
+		if (__test_and_clear_bit(ix, &tq->segment_retransmitted))
+			call->tx_nr_resent--;
+	} else {
+		call->tx_nr_lost++;
+	}
+	tq->segment_xmit_ts[ix] = UINT_MAX;
+}
+
+/*
+ * Get the transmission time of a packet in the Tx queue.
+ */
+static ktime_t rxrpc_get_xmit_ts(const struct rxrpc_txqueue *tq, unsigned int ix)
+{
+	if (tq->segment_xmit_ts[ix] == UINT_MAX)
+		return KTIME_MAX;
+	return ktime_add_us(tq->xmit_ts_base, tq->segment_xmit_ts[ix]);
+}
+
+/*
+ * Get a bitmask of nack bits for a queue segment and mask off any that aren't
+ * yet reported.
+ */
+static unsigned long rxrpc_tq_nacks(const struct rxrpc_txqueue *tq)
+{
+	unsigned long nacks = ~tq->segment_acked;
+
+	if (tq->nr_reported_acks < RXRPC_NR_TXQUEUE)
+		nacks &= (1UL << tq->nr_reported_acks) - 1;
+	return nacks;
+}
+
+/*
+ * Update the RACK state for the most recently sent packet that has been
+ * delivered [RFC8958 6.2 Step 2].
+ */
+static void rxrpc_rack_update(struct rxrpc_call *call,
+			      struct rxrpc_ack_summary *summary,
+			      struct rxrpc_txqueue *tq,
+			      unsigned int ix)
+{
+	rxrpc_seq_t seq = tq->qbase + ix;
+	ktime_t xmit_ts = rxrpc_get_xmit_ts(tq, ix);
+	ktime_t rtt = ktime_sub(call->acks_latest_ts, xmit_ts);
+
+	if (__test_and_clear_bit(ix, &tq->segment_lost))
+		call->tx_nr_lost--;
+
+	if (test_bit(ix, &tq->segment_retransmitted)) {
+		/* Use Rx.serial instead of TCP.ACK.ts_option.echo_reply. */
+		if (before(call->acks_highest_serial, tq->segment_serial[ix]))
+			return;
+		if (rtt < minmax_get(&call->min_rtt))
+			return;
+	}
+
+	/* The RACK algorithm requires the segment ACKs to be traversed in
+	 * order of segment transmission - but the only thing this seems to
+	 * matter for is that RACK.rtt is set to the rtt of the most recently
+	 * transmitted segment. We should be able to achieve the same by only
+	 * setting RACK.rtt if the xmit time is greater.
+	 */
+	if (ktime_after(xmit_ts, call->rack_rtt_ts)) {
+		call->rack_rtt = rtt;
+		call->rack_rtt_ts = xmit_ts;
+	}
+
+	if (rxrpc_rack_sent_after(xmit_ts, seq, call->rack_xmit_ts, call->rack_end_seq)) {
+		call->rack_rtt = rtt;
+		call->rack_xmit_ts = xmit_ts;
+		call->rack_end_seq = seq;
+	}
+}
+
+/*
+ * Detect data segment reordering [RFC8958 6.2 Step 3].
+ */
+static void rxrpc_rack_detect_reordering(struct rxrpc_call *call,
+					 struct rxrpc_ack_summary *summary,
+					 struct rxrpc_txqueue *tq,
+					 unsigned int ix)
+{
+	rxrpc_seq_t seq = tq->qbase + ix;
+
+	/* Track the highest sequence number so far ACK'd. This is not
+	 * necessarily the same as ack.firstPacket + ack.nAcks - 1 as the peer
+	 * could put a NACK in the last SACK slot.
+	 */
+	if (after(seq, call->rack_fack))
+		call->rack_fack = seq;
+	else if (before(seq, call->rack_fack) &&
+		 test_bit(ix, &tq->segment_retransmitted))
+		call->rack_reordering_seen = true;
+}
+
+void rxrpc_input_rack_one(struct rxrpc_call *call,
+			  struct rxrpc_ack_summary *summary,
+			  struct rxrpc_txqueue *tq,
+			  unsigned int ix)
+{
+	rxrpc_rack_update(call, summary, tq, ix);
+	rxrpc_rack_detect_reordering(call, summary, tq, ix);
+}
+
+void rxrpc_input_rack(struct rxrpc_call *call,
+		      struct rxrpc_ack_summary *summary,
+		      struct rxrpc_txqueue *tq,
+		      unsigned long new_acks)
+{
+	while (new_acks) {
+		unsigned int ix = __ffs(new_acks);
+
+		__clear_bit(ix, &new_acks);
+		rxrpc_input_rack_one(call, summary, tq, ix);
+	}
+
+	trace_rxrpc_rack_update(call, summary);
+}
+
+/*
+ * Update the reordering window [RFC8958 6.2 Step 4]. Returns the updated
+ * duration of the reordering window.
+ *
+ * Note that the Rx protocol doesn't have a 'DSACK option' per se, but ACKs can
+ * be given a 'DUPLICATE' reason with the serial number referring to the
+ * duplicated DATA packet. Rx does not inform as to whether this was a
+ * reception of the same packet twice or of a retransmission of a packet we
+ * already received (though this could be determined by the transmitter based
+ * on the serial number).
+ */
+static ktime_t rxrpc_rack_update_reo_wnd(struct rxrpc_call *call,
+					 struct rxrpc_ack_summary *summary)
+{
+	rxrpc_seq_t snd_una = call->acks_lowest_nak; /* Lowest unack'd seq */
+	rxrpc_seq_t snd_nxt = call->tx_transmitted + 1; /* Next seq to be sent */
+	bool have_dsack_option = summary->ack_reason == RXRPC_ACK_DUPLICATE;
+	int dup_thresh = 3;
+
+	/* DSACK-based reordering window adaptation */
+	if (!call->rack_dsack_round_none &&
+	    after_eq(snd_una, call->rack_dsack_round))
+		call->rack_dsack_round_none = true;
+
+	/* Grow the reordering window per round that sees DSACK. Reset the
+	 * window after 16 DSACK-free recoveries.
+	 */
+	if (call->rack_dsack_round_none && have_dsack_option) {
+		call->rack_dsack_round_none = false;
+		call->rack_dsack_round = snd_nxt;
+		call->rack_reo_wnd_mult++;
+		call->rack_reo_wnd_persist = 16;
+	} else if (summary->exiting_fast_or_rto_recovery) {
+		call->rack_reo_wnd_persist--;
+		if (call->rack_reo_wnd_persist <= 0)
+			call->rack_reo_wnd_mult = 1;
+	}
+
+	if (!call->rack_reordering_seen) {
+		if (summary->in_fast_or_rto_recovery)
+			return 0;
+		if (call->acks_nr_sacks >= dup_thresh)
+			return 0;
+	}
+
+	return us_to_ktime(umin(call->rack_reo_wnd_mult * minmax_get(&call->min_rtt) / 4,
+				call->srtt_us >> 3));
+}
+
+/*
+ * Detect losses [RFC8958 6.2 Step 5].
+ */
+static ktime_t rxrpc_rack_detect_loss(struct rxrpc_call *call,
+				      struct rxrpc_ack_summary *summary)
+{
+	struct rxrpc_txqueue *tq;
+	ktime_t timeout = 0, lost_after, now = ktime_get_real();
+
+	call->rack_reo_wnd = rxrpc_rack_update_reo_wnd(call, summary);
+	lost_after = ktime_add(call->rack_rtt, call->rack_reo_wnd);
+	trace_rxrpc_rack_scan_loss(call);
+
+	for (tq = call->tx_queue; tq; tq = tq->next) {
+		unsigned long nacks = rxrpc_tq_nacks(tq);
+
+		if (after(tq->qbase, call->tx_transmitted))
+			break;
+		trace_rxrpc_rack_scan_loss_tq(call, tq, nacks);
+
+		/* Skip ones marked lost but not yet retransmitted */
+		nacks &= ~tq->segment_lost | tq->segment_retransmitted;
+
+		while (nacks) {
+			unsigned int ix = __ffs(nacks);
+			rxrpc_seq_t seq = tq->qbase + ix;
+			ktime_t remaining;
+			ktime_t xmit_ts = rxrpc_get_xmit_ts(tq, ix);
+
+			__clear_bit(ix, &nacks);
+
+			if (rxrpc_rack_sent_after(call->rack_xmit_ts, call->rack_end_seq,
+						  xmit_ts, seq)) {
+				remaining = ktime_sub(ktime_add(xmit_ts, lost_after), now);
+				if (remaining <= 0) {
+					rxrpc_rack_mark_lost(call, tq, ix);
+					trace_rxrpc_rack_detect_loss(call, summary, seq);
+				} else {
+					timeout = max(remaining, timeout);
+				}
+			}
+		}
+	}
+
+	return timeout;
+}
+
+/*
+ * Detect losses and set a timer to retry the detection [RFC8958 6.2 Step 5].
+ */
+void rxrpc_rack_detect_loss_and_arm_timer(struct rxrpc_call *call,
+					  struct rxrpc_ack_summary *summary)
+{
+	ktime_t timeout = rxrpc_rack_detect_loss(call, summary);
+
+	if (timeout) {
+		call->rack_timer_mode = RXRPC_CALL_RACKTIMER_RACK_REORDER;
+		call->rack_timo_at = ktime_add(ktime_get_real(), timeout);
+		trace_rxrpc_rack_timer(call, timeout, false);
+		trace_rxrpc_timer_set(call, timeout, rxrpc_timer_trace_rack_reo);
+	}
+}
+
+/*
+ * Handle RACK-TLP RTO expiration [RFC8958 6.3].
+ */
+static void rxrpc_rack_mark_losses_on_rto(struct rxrpc_call *call)
+{
+	struct rxrpc_txqueue *tq;
+	rxrpc_seq_t snd_una = call->acks_lowest_nak; /* Lowest unack'd seq */
+	ktime_t lost_after = ktime_add(call->rack_rtt, call->rack_reo_wnd);
+	ktime_t deadline = ktime_sub(ktime_get_real(), lost_after);
+
+	for (tq = call->tx_queue; tq; tq = tq->next) {
+		unsigned long unacked = ~tq->segment_acked;
+
+		trace_rxrpc_rack_mark_loss_tq(call, tq);
+		while (unacked) {
+			unsigned int ix = __ffs(unacked);
+			rxrpc_seq_t seq = tq->qbase + ix;
+			ktime_t xmit_ts = rxrpc_get_xmit_ts(tq, ix);
+
+			if (after(seq, call->tx_transmitted))
+				return;
+			__clear_bit(ix, &unacked);
+
+			if (seq == snd_una ||
+			    ktime_before(xmit_ts, deadline))
+				rxrpc_rack_mark_lost(call, tq, ix);
+		}
+	}
+}
+
+/*
+ * Calculate the TLP loss probe timeout (PTO) [RFC8958 7.2].
+ */
+ktime_t rxrpc_tlp_calc_pto(struct rxrpc_call *call, ktime_t now)
+{
+	unsigned int flight_size = rxrpc_tx_in_flight(call);
+	ktime_t rto_at = ktime_add(call->tx_last_sent,
+				   rxrpc_get_rto_backoff(call, false));
+	ktime_t pto;
+
+	if (call->rtt_count > 0) {
+		/* Use 2*SRTT as the timeout. */
+		pto = ns_to_ktime(call->srtt_us * NSEC_PER_USEC / 4);
+		if (flight_size)
+			pto = ktime_add(pto, call->tlp_max_ack_delay);
+	} else {
+		pto = NSEC_PER_SEC;
+	}
+
+	if (ktime_after(ktime_add(now, pto), rto_at))
+		pto = ktime_sub(rto_at, now);
+	return pto;
+}
+
+/*
+ * Send a TLP loss probe on PTO expiration [RFC8958 7.3].
+ */
+void rxrpc_tlp_send_probe(struct rxrpc_call *call)
+{
+	unsigned int in_flight = rxrpc_tx_in_flight(call);
+
+	if (after_eq(call->acks_hard_ack, call->tx_transmitted))
+		return; /* Everything we transmitted has been acked. */
+
+	/* There must be no other loss probe still in flight and we need to
+	 * have taken a new RTT sample since last probe or the start of
+	 * connection.
+	 */
+	if (!call->tlp_serial &&
+	    call->tlp_rtt_taken != call->rtt_taken) {
+		call->tlp_is_retrans = false;
+		if (after(call->send_top, call->tx_transmitted) &&
+		    rxrpc_tx_window_space(call) > 0) {
+			/* Transmit the lowest-sequence unsent DATA */
+			call->tx_last_serial = 0;
+			rxrpc_transmit_some_data(call, 1, rxrpc_txdata_tlp_new_data);
+			call->tlp_serial = call->tx_last_serial;
+			call->tlp_seq = call->tx_transmitted;
+			trace_rxrpc_tlp_probe(call, rxrpc_tlp_probe_trace_transmit_new);
+			in_flight = rxrpc_tx_in_flight(call);
+		} else {
+			/* Retransmit the highest-sequence DATA sent */
+			call->tx_last_serial = 0;
+			rxrpc_resend_tlp(call);
+			call->tlp_is_retrans = true;
+			trace_rxrpc_tlp_probe(call, rxrpc_tlp_probe_trace_retransmit);
+		}
+	} else {
+		trace_rxrpc_tlp_probe(call, rxrpc_tlp_probe_trace_busy);
+	}
+
+	if (in_flight != 0) {
+		ktime_t rto = rxrpc_get_rto_backoff(call, false);
+
+		call->rack_timer_mode = RXRPC_CALL_RACKTIMER_RTO;
+		call->rack_timo_at = ktime_add(ktime_get_real(), rto);
+		trace_rxrpc_rack_timer(call, rto, false);
+		trace_rxrpc_timer_set(call, rto, rxrpc_timer_trace_rack_rto);
+	}
+}
+
+/*
+ * Detect losses using the ACK of a TLP loss probe [RFC8958 7.4].
+ */
+void rxrpc_tlp_process_ack(struct rxrpc_call *call, struct rxrpc_ack_summary *summary)
+{
+	if (!call->tlp_serial || after(call->tlp_seq, call->acks_hard_ack))
+		return;
+
+	if (!call->tlp_is_retrans) {
+		/* TLP of new data delivered */
+		trace_rxrpc_tlp_ack(call, summary, rxrpc_tlp_ack_trace_new_data);
+		call->tlp_serial = 0;
+	} else if (summary->ack_reason == RXRPC_ACK_DUPLICATE &&
+		   summary->acked_serial == call->tlp_serial) {
+		/* General Case: Detected packet losses using RACK [7.4.1] */
+		trace_rxrpc_tlp_ack(call, summary, rxrpc_tlp_ack_trace_dup_acked);
+		call->tlp_serial = 0;
+	} else if (after(call->acks_hard_ack, call->tlp_seq)) {
+		/* Repaired the single loss */
+		trace_rxrpc_tlp_ack(call, summary, rxrpc_tlp_ack_trace_hard_beyond);
+		call->tlp_serial = 0;
+		// TODO: Invoke congestion control to react to the loss
+		// event the probe has repaired
+	} else if (summary->tlp_probe_acked) {
+		trace_rxrpc_tlp_ack(call, summary, rxrpc_tlp_ack_trace_acked);
+		/* Special Case: Detected a single loss repaired by the loss
+		 * probe [7.4.2]
+		 */
+		call->tlp_serial = 0;
+	} else {
+		trace_rxrpc_tlp_ack(call, summary, rxrpc_tlp_ack_trace_incomplete);
+	}
+}
+
+/*
+ * Handle RACK timer expiration; returns true to request a resend.
+ */
+void rxrpc_rack_timer_expired(struct rxrpc_call *call, ktime_t overran_by)
+{
+	struct rxrpc_ack_summary summary = {};
+	enum rxrpc_rack_timer_mode mode = call->rack_timer_mode;
+
+	trace_rxrpc_rack_timer(call, overran_by, true);
+	call->rack_timer_mode = RXRPC_CALL_RACKTIMER_OFF;
+
+	switch (mode) {
+	case RXRPC_CALL_RACKTIMER_RACK_REORDER:
+		rxrpc_rack_detect_loss_and_arm_timer(call, &summary);
+		break;
+	case RXRPC_CALL_RACKTIMER_TLP_PTO:
+		rxrpc_tlp_send_probe(call);
+		break;
+	case RXRPC_CALL_RACKTIMER_RTO:
+		// Might need to poke the congestion algo in some way
+		rxrpc_rack_mark_losses_on_rto(call);
+		break;
+	//case RXRPC_CALL_RACKTIMER_ZEROWIN:
+	default:
+		pr_warn("Unexpected rack timer %u", call->rack_timer_mode);
+	}
+}
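The loss rule in net/rxrpc/input_rack.c above can be restated in isolation. A NAK'd packet is a loss candidate only if it was sent before the most recently delivered packet, and it is declared lost once its transmit time plus RACK.rtt plus the reordering window has passed. The sketch below is a userspace illustration under stated assumptions, not the kernel implementation: times are plain microsecond integers instead of `ktime_t`, `srtt_us` is the unscaled smoothed RTT (the diff's `call->srtt_us >> 3` suggests the kernel stores it scaled), and all function names are hypothetical. Note that RACK-TLP is published as RFC 8985; the file's comments cite it as RFC8958.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is later than b" for 32-bit sequence numbers. */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/* True if packet 1 was sent after packet 2; ties are broken by sequence. */
static bool sent_after(int64_t t1_us, uint32_t seq1,
		       int64_t t2_us, uint32_t seq2)
{
	if (t1_us > t2_us)
		return true;
	return t1_us == t2_us && seq_after(seq1, seq2);
}

/* Reordering window: mult * min_rtt / 4, capped at the smoothed RTT. */
static int64_t reo_wnd_us(unsigned int mult, int64_t min_rtt_us,
			  int64_t srtt_us)
{
	int64_t wnd = (int64_t)mult * min_rtt_us / 4;

	return wnd < srtt_us ? wnd : srtt_us;
}

/*
 * Time left before a candidate packet should be declared lost.  A result
 * <= 0 means "lost now"; a positive result is how long to arm a timer for.
 */
static int64_t rack_remaining_us(int64_t xmit_us, int64_t rack_rtt_us,
				 int64_t reo_wnd, int64_t now_us)
{
	return xmit_us + rack_rtt_us + reo_wnd - now_us;
}
```

With a 40 ms minimum RTT and a multiplier of 1, the window is 10 ms. A packet sent at t = 1 ms with RACK.rtt = 40 ms is therefore treated as lost from t = 51 ms onwards, and before that the caller would keep the largest remaining time as the timer value, as the diff does with `timeout`.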
+4 -1
net/rxrpc/insecure.c
···
  */
 static struct rxrpc_txbuf *none_alloc_txbuf(struct rxrpc_call *call, size_t remain, gfp_t gfp)
 {
-	return rxrpc_alloc_data_txbuf(call, min_t(size_t, remain, RXRPC_JUMBO_DATALEN), 1, gfp);
+	return rxrpc_alloc_data_txbuf(call, umin(remain, RXRPC_JUMBO_DATALEN), 1, gfp);
 }

 static int none_secure_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb)
 {
+	txb->pkt_len = txb->len;
+	if (txb->len == RXRPC_JUMBO_DATALEN)
+		txb->jumboable = true;
 	return 0;
 }

+63 -54
net/rxrpc/io_thread.c
···
 	struct rxrpc_channel *chan;
 	struct rxrpc_call *call = NULL;
 	unsigned int channel;
-	bool ret;

 	if (sp->hdr.securityIndex != conn->security_ix)
 		return rxrpc_direct_abort(skb, rxrpc_eproto_wrong_security,
···
 	/* It's a connection-level packet if the call number is 0. */
 	if (sp->hdr.callNumber == 0)
 		return rxrpc_input_conn_packet(conn, skb);
+
+	/* Deal with path MTU discovery probing. */
+	if (sp->hdr.type == RXRPC_PACKET_TYPE_ACK &&
+	    conn->pmtud_probe &&
+	    after_eq(sp->ack.acked_serial, conn->pmtud_probe))
+		rxrpc_input_probe_for_pmtud(conn, sp->ack.acked_serial, false);

 	/* Call-bound packets are routed by connection channel. */
 	channel = sp->hdr.cid & RXRPC_CHANNELMASK;
···
 					       peer_srx, skb);
 	}

-	ret = rxrpc_input_call_event(call, skb);
+	rxrpc_queue_rx_call_packet(call, skb);
 	rxrpc_put_call(call, rxrpc_call_put_input);
-	return ret;
+	return true;
 }

 /*
···
 	ktime_t now;
 #endif
 	bool should_stop;
+	LIST_HEAD(conn_attend_q);
+	LIST_HEAD(call_attend_q);

 	complete(&local->io_thread_ready);

···
 	for (;;) {
 		rxrpc_inc_stat(local->rxnet, stat_io_loop);

-		/* Deal with connections that want immediate attention. */
-		conn = list_first_entry_or_null(&local->conn_attend_q,
-						struct rxrpc_connection,
-						attend_link);
-		if (conn) {
-			spin_lock_bh(&local->lock);
-			list_del_init(&conn->attend_link);
-			spin_unlock_bh(&local->lock);
+		/* Inject a delay into packets if requested. */
+#ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY
+		now = ktime_get_real();
+		while ((skb = skb_peek(&local->rx_delay_queue))) {
+			if (ktime_before(now, skb->tstamp))
+				break;
+			skb = skb_dequeue(&local->rx_delay_queue);
+			skb_queue_tail(&local->rx_queue, skb);
+		}
+#endif

-			rxrpc_input_conn_event(conn, NULL);
-			rxrpc_put_connection(conn, rxrpc_conn_put_poke);
-			continue;
+		if (!skb_queue_empty(&local->rx_queue)) {
+			spin_lock_irq(&local->rx_queue.lock);
+			skb_queue_splice_tail_init(&local->rx_queue, &rx_queue);
+			spin_unlock_irq(&local->rx_queue.lock);
+			trace_rxrpc_iothread_rx(local, skb_queue_len(&rx_queue));
 		}

-		if (test_and_clear_bit(RXRPC_CLIENT_CONN_REAP_TIMER,
-				       &local->client_conn_flags))
-			rxrpc_discard_expired_client_conns(local);
-
-		/* Deal with calls that want immediate attention. */
-		if ((call = list_first_entry_or_null(&local->call_attend_q,
-						     struct rxrpc_call,
-						     attend_link))) {
-			spin_lock_bh(&local->lock);
-			list_del_init(&call->attend_link);
-			spin_unlock_bh(&local->lock);
-
-			trace_rxrpc_call_poked(call);
-			rxrpc_input_call_event(call, NULL);
-			rxrpc_put_call(call, rxrpc_call_put_poke);
-			continue;
-		}
-
-		if (!list_empty(&local->new_client_calls))
-			rxrpc_connect_client_calls(local);
-
-		/* Process received packets and errors. */
-		if ((skb = __skb_dequeue(&rx_queue))) {
+		/* Distribute packets and errors. */
+		while ((skb = __skb_dequeue(&rx_queue))) {
 			struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 			switch (skb->mark) {
 			case RXRPC_SKB_MARK_PACKET:
···
 				rxrpc_free_skb(skb, rxrpc_skb_put_unknown);
 				break;
 			}
-			continue;
 		}

-		/* Inject a delay into packets if requested. */
-#ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY
-		now = ktime_get_real();
-		while ((skb = skb_peek(&local->rx_delay_queue))) {
-			if (ktime_before(now, skb->tstamp))
-				break;
-			skb = skb_dequeue(&local->rx_delay_queue);
-			skb_queue_tail(&local->rx_queue, skb);
-		}
-#endif
+		/* Deal with connections that want immediate attention. */
+		spin_lock_irq(&local->lock);
+		list_splice_tail_init(&local->conn_attend_q, &conn_attend_q);
+		spin_unlock_irq(&local->lock);

-		if (!skb_queue_empty(&local->rx_queue)) {
-			spin_lock_irq(&local->rx_queue.lock);
-			skb_queue_splice_tail_init(&local->rx_queue, &rx_queue);
-			spin_unlock_irq(&local->rx_queue.lock);
-			continue;
+		while ((conn = list_first_entry_or_null(&conn_attend_q,
+							struct rxrpc_connection,
+							attend_link))) {
+			spin_lock_bh(&local->lock);
+			list_del_init(&conn->attend_link);
+			spin_unlock_bh(&local->lock);
+			rxrpc_input_conn_event(conn, NULL);
+			rxrpc_put_connection(conn, rxrpc_conn_put_poke);
 		}
+
+		if (test_and_clear_bit(RXRPC_CLIENT_CONN_REAP_TIMER,
+				       &local->client_conn_flags))
+			rxrpc_discard_expired_client_conns(local);
+
+		/* Deal with calls that want immediate attention. */
+		spin_lock_irq(&local->lock);
+		list_splice_tail_init(&local->call_attend_q, &call_attend_q);
+		spin_unlock_irq(&local->lock);
+
+		while ((call = list_first_entry_or_null(&call_attend_q,
+							struct rxrpc_call,
+							attend_link))) {
+			spin_lock_bh(&local->lock);
+			list_del_init(&call->attend_link);
+			spin_unlock_bh(&local->lock);
+			trace_rxrpc_call_poked(call);
+			rxrpc_input_call_event(call);
+			rxrpc_put_call(call, rxrpc_call_put_poke);
+		}
+
+		if (!list_empty(&local->new_client_calls))
+			rxrpc_connect_client_calls(local);

 		set_current_state(TASK_INTERRUPTIBLE);
 		should_stop = kthread_should_stop();
···
 			}

 			timeout = nsecs_to_jiffies(delay_ns);
-			timeout = max(timeout, 1UL);
+			timeout = umax(timeout, 1);
 			schedule_timeout(timeout);
 			__set_current_state(TASK_RUNNING);
 			continue;
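The net/rxrpc/io_thread.c hunks above change the I/O loop from "handle one item, then `continue`" to "splice the whole shared list onto a private list under the lock, then work through the private list". A minimal userspace sketch of that splice-then-drain pattern follows. It is an illustration only: the singly linked `struct node`, the `shared_queue` type and the tiny `atomic_flag` spinlock are hypothetical stand-ins for the kernel's `list_head`, `list_splice_tail_init()` and `spin_lock_irq()`.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
	struct node *next;
	int id;
};

struct shared_queue {
	atomic_flag lock;		/* toy spinlock, stand-in for local->lock */
	struct node *head, *tail;
};

static void q_init(struct shared_queue *q)
{
	atomic_flag_clear(&q->lock);
	q->head = q->tail = NULL;
}

static void q_lock(struct shared_queue *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void q_unlock(struct shared_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Producer side: append one item under the lock. */
static void q_push(struct shared_queue *q, struct node *n)
{
	n->next = NULL;
	q_lock(q);
	if (q->tail)
		q->tail->next = n;
	else
		q->head = n;
	q->tail = n;
	q_unlock(q);
}

/* Consumer side: detach everything in one short critical section. */
static struct node *q_take_all(struct shared_queue *q)
{
	struct node *batch;

	q_lock(q);
	batch = q->head;
	q->head = q->tail = NULL;
	q_unlock(q);
	return batch;
}

/* Process the private batch without holding the lock; returns the count. */
static int drain(struct shared_queue *q, int *order, int max)
{
	int n = 0;

	for (struct node *p = q_take_all(q); p; p = p->next)
		if (n < max)
			order[n++] = p->id;
	return n;
}
```

The point of the pattern is that producers are blocked only for the pointer swap in `q_take_all()`, and items are handled in arrival order as one batch per loop iteration. The diff differs in one respect that this sketch does not model: it still takes `local->lock` around each `list_del_init()` when unlinking an entry from the private list.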
-3
net/rxrpc/local_object.c
···

 		/* we want to set the don't fragment bit */
 		rxrpc_local_dont_fragment(local, true);
-
-		/* We want receive timestamps. */
-		sock_enable_timestamps(usk);
 		break;

 	default:
+2 -2
net/rxrpc/misc.c
@@ -46,13 +46,13 @@
  * Maximum Rx MTU size.  This indicates to the sender the size of jumbo packet
  * made by gluing normal packets together that we're willing to handle.
  */
-unsigned int rxrpc_rx_mtu = 5692;
+unsigned int rxrpc_rx_mtu = RXRPC_JUMBO(46);
 
 /*
  * The maximum number of fragments in a received jumbo packet that we tell the
  * sender that we're willing to handle.
  */
-unsigned int rxrpc_rx_jumbo_max = 4;
+unsigned int rxrpc_rx_jumbo_max = 46;
 
 #ifdef CONFIG_AF_RXRPC_INJECT_RX_DELAY
 /*
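The arithmetic behind the new default can be sketched as follows. The definition of RXRPC_JUMBO() is not part of this hunk, so the formula here is an assumption: each glued subpacket occupies RXRPC_JUMBO_DATALEN (1412, per the cover letter) plus a jumbo header assumed to be 4 bytes, and the final subpacket carries no jumbo header of its own. All `sketch_`/`SKETCH_` names are hypothetical.

```c
#include <assert.h>

/* 1412 is quoted in the cover letter; the 4-byte header size is an assumption. */
#define SKETCH_JUMBO_DATALEN	1412u
#define SKETCH_JUMBO_HDRLEN	4u
#define SKETCH_JUMBO_SUBPKTLEN	(SKETCH_JUMBO_DATALEN + SKETCH_JUMBO_HDRLEN)

/*
 * Bytes that follow the wire header in a jumbo packet made of n subpackets,
 * under the assumed layout: n slots of data plus jumbo header, minus the
 * jumbo header that the last subpacket does not need.
 */
static unsigned int sketch_jumbo_len(unsigned int n)
{
	assert(n >= 1);
	return n * SKETCH_JUMBO_SUBPKTLEN - SKETCH_JUMBO_HDRLEN;
}
```

Under these assumptions, five subpackets need 7076 bytes and so fit an 8192-byte MTU while six (8492 bytes) do not, which agrees with the cover letter, and 46 subpackets still fit inside a single 65535-byte UDP datagram.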
+381 -215
net/rxrpc/output.c
··· 72 72 } 73 73 74 74 /* 75 - * Fill out an ACK packet. 75 + * Allocate transmission buffers for an ACK and attach them to local->kv[]. 76 76 */ 77 - static void rxrpc_fill_out_ack(struct rxrpc_call *call, 78 - struct rxrpc_txbuf *txb, 79 - u8 ack_reason, 80 - rxrpc_serial_t serial) 77 + static int rxrpc_alloc_ack(struct rxrpc_call *call, size_t sack_size) 81 78 { 82 - struct rxrpc_wire_header *whdr = txb->kvec[0].iov_base; 83 - struct rxrpc_acktrailer *trailer = txb->kvec[2].iov_base + 3; 84 - struct rxrpc_ackpacket *ack = (struct rxrpc_ackpacket *)(whdr + 1); 85 - unsigned int qsize, sack, wrap, to; 86 - rxrpc_seq_t window, wtop; 87 - int rsize; 88 - u32 mtu, jmax; 89 - u8 *filler = txb->kvec[2].iov_base; 90 - u8 *sackp = txb->kvec[1].iov_base; 79 + struct rxrpc_wire_header *whdr; 80 + struct rxrpc_acktrailer *trailer; 81 + struct rxrpc_ackpacket *ack; 82 + struct kvec *kv = call->local->kvec; 83 + gfp_t gfp = rcu_read_lock_held() ? GFP_ATOMIC | __GFP_NOWARN : GFP_NOFS; 84 + void *buf, *buf2 = NULL; 85 + u8 *filler; 91 86 92 - rxrpc_inc_stat(call->rxnet, stat_tx_ack_fill); 87 + buf = page_frag_alloc(&call->local->tx_alloc, 88 + sizeof(*whdr) + sizeof(*ack) + 1 + 3 + sizeof(*trailer), gfp); 89 + if (!buf) 90 + return -ENOMEM; 93 91 94 - window = call->ackr_window; 95 - wtop = call->ackr_wtop; 96 - sack = call->ackr_sack_base % RXRPC_SACK_SIZE; 97 - 98 - whdr->seq = 0; 99 - whdr->type = RXRPC_PACKET_TYPE_ACK; 100 - txb->flags |= RXRPC_SLOW_START_OK; 101 - ack->bufferSpace = 0; 102 - ack->maxSkew = 0; 103 - ack->firstPacket = htonl(window); 104 - ack->previousPacket = htonl(call->rx_highest_seq); 105 - ack->serial = htonl(serial); 106 - ack->reason = ack_reason; 107 - ack->nAcks = wtop - window; 108 - filler[0] = 0; 109 - filler[1] = 0; 110 - filler[2] = 0; 111 - 112 - if (ack_reason == RXRPC_ACK_PING) 113 - txb->flags |= RXRPC_REQUEST_ACK; 114 - 115 - if (after(wtop, window)) { 116 - txb->len += ack->nAcks; 117 - txb->kvec[1].iov_base = sackp; 118 - 
txb->kvec[1].iov_len = ack->nAcks; 119 - 120 - wrap = RXRPC_SACK_SIZE - sack; 121 - to = min_t(unsigned int, ack->nAcks, RXRPC_SACK_SIZE); 122 - 123 - if (sack + ack->nAcks <= RXRPC_SACK_SIZE) { 124 - memcpy(sackp, call->ackr_sack_table + sack, ack->nAcks); 125 - } else { 126 - memcpy(sackp, call->ackr_sack_table + sack, wrap); 127 - memcpy(sackp + wrap, call->ackr_sack_table, to - wrap); 92 + if (sack_size) { 93 + buf2 = page_frag_alloc(&call->local->tx_alloc, sack_size, gfp); 94 + if (!buf2) { 95 + page_frag_free(buf); 96 + return -ENOMEM; 128 97 } 129 - } else if (before(wtop, window)) { 130 - pr_warn("ack window backward %x %x", window, wtop); 131 - } else if (ack->reason == RXRPC_ACK_DELAY) { 132 - ack->reason = RXRPC_ACK_IDLE; 133 98 } 134 99 135 - mtu = call->peer->if_mtu; 136 - mtu -= call->peer->hdrsize; 137 - jmax = rxrpc_rx_jumbo_max; 138 - qsize = (window - 1) - call->rx_consumed; 139 - rsize = max_t(int, call->rx_winsize - qsize, 0); 140 - txb->ack_rwind = rsize; 141 - trailer->maxMTU = htonl(rxrpc_rx_mtu); 142 - trailer->ifMTU = htonl(mtu); 143 - trailer->rwind = htonl(rsize); 144 - trailer->jumbo_max = htonl(jmax); 100 + whdr = buf; 101 + ack = buf + sizeof(*whdr); 102 + filler = buf + sizeof(*whdr) + sizeof(*ack) + 1; 103 + trailer = buf + sizeof(*whdr) + sizeof(*ack) + 1 + 3; 104 + 105 + kv[0].iov_base = whdr; 106 + kv[0].iov_len = sizeof(*whdr) + sizeof(*ack); 107 + kv[1].iov_base = buf2; 108 + kv[1].iov_len = sack_size; 109 + kv[2].iov_base = filler; 110 + kv[2].iov_len = 3 + sizeof(*trailer); 111 + return 3; /* Number of kvec[] used. */ 112 + } 113 + 114 + static void rxrpc_free_ack(struct rxrpc_call *call) 115 + { 116 + page_frag_free(call->local->kvec[0].iov_base); 117 + if (call->local->kvec[1].iov_base) 118 + page_frag_free(call->local->kvec[1].iov_base); 145 119 } 146 120 147 121 /* ··· 147 173 } 148 174 149 175 /* 176 + * Fill out an ACK packet. 
177 + */ 178 + static int rxrpc_fill_out_ack(struct rxrpc_call *call, int nr_kv, u8 ack_reason, 179 + rxrpc_serial_t serial_to_ack, rxrpc_serial_t *_ack_serial) 180 + { 181 + struct kvec *kv = call->local->kvec; 182 + struct rxrpc_wire_header *whdr = kv[0].iov_base; 183 + struct rxrpc_acktrailer *trailer = kv[2].iov_base + 3; 184 + struct rxrpc_ackpacket *ack = (struct rxrpc_ackpacket *)(whdr + 1); 185 + unsigned int qsize, sack, wrap, to, max_mtu, if_mtu; 186 + rxrpc_seq_t window, wtop; 187 + ktime_t now = ktime_get_real(); 188 + int rsize; 189 + u8 *filler = kv[2].iov_base; 190 + u8 *sackp = kv[1].iov_base; 191 + 192 + rxrpc_inc_stat(call->rxnet, stat_tx_ack_fill); 193 + 194 + window = call->ackr_window; 195 + wtop = call->ackr_wtop; 196 + sack = call->ackr_sack_base % RXRPC_SACK_SIZE; 197 + 198 + *_ack_serial = rxrpc_get_next_serial(call->conn); 199 + 200 + whdr->epoch = htonl(call->conn->proto.epoch); 201 + whdr->cid = htonl(call->cid); 202 + whdr->callNumber = htonl(call->call_id); 203 + whdr->serial = htonl(*_ack_serial); 204 + whdr->seq = 0; 205 + whdr->type = RXRPC_PACKET_TYPE_ACK; 206 + whdr->flags = call->conn->out_clientflag | RXRPC_SLOW_START_OK; 207 + whdr->userStatus = 0; 208 + whdr->securityIndex = call->security_ix; 209 + whdr->_rsvd = 0; 210 + whdr->serviceId = htons(call->dest_srx.srx_service); 211 + 212 + ack->bufferSpace = 0; 213 + ack->maxSkew = 0; 214 + ack->firstPacket = htonl(window); 215 + ack->previousPacket = htonl(call->rx_highest_seq); 216 + ack->serial = htonl(serial_to_ack); 217 + ack->reason = ack_reason; 218 + ack->nAcks = wtop - window; 219 + filler[0] = 0; 220 + filler[1] = 0; 221 + filler[2] = 0; 222 + 223 + if (ack_reason == RXRPC_ACK_PING) 224 + whdr->flags |= RXRPC_REQUEST_ACK; 225 + 226 + if (after(wtop, window)) { 227 + kv[1].iov_len = ack->nAcks; 228 + 229 + wrap = RXRPC_SACK_SIZE - sack; 230 + to = umin(ack->nAcks, RXRPC_SACK_SIZE); 231 + 232 + if (sack + ack->nAcks <= RXRPC_SACK_SIZE) { 233 + memcpy(sackp, 
call->ackr_sack_table + sack, ack->nAcks); 234 + } else { 235 + memcpy(sackp, call->ackr_sack_table + sack, wrap); 236 + memcpy(sackp + wrap, call->ackr_sack_table, to - wrap); 237 + } 238 + } else if (before(wtop, window)) { 239 + pr_warn("ack window backward %x %x", window, wtop); 240 + } else if (ack->reason == RXRPC_ACK_DELAY) { 241 + ack->reason = RXRPC_ACK_IDLE; 242 + } 243 + 244 + qsize = (window - 1) - call->rx_consumed; 245 + rsize = max_t(int, call->rx_winsize - qsize, 0); 246 + 247 + if_mtu = call->peer->if_mtu - call->peer->hdrsize; 248 + if (call->peer->ackr_adv_pmtud) { 249 + max_mtu = umax(call->peer->max_data, rxrpc_rx_mtu); 250 + } else { 251 + if_mtu = umin(if_mtu, 1444); 252 + max_mtu = if_mtu; 253 + } 254 + 255 + trailer->maxMTU = htonl(max_mtu); 256 + trailer->ifMTU = htonl(if_mtu); 257 + trailer->rwind = htonl(rsize); 258 + trailer->jumbo_max = 0; /* Advertise pmtu discovery */ 259 + 260 + if (ack_reason == RXRPC_ACK_PING) 261 + rxrpc_begin_rtt_probe(call, *_ack_serial, now, rxrpc_rtt_tx_ping); 262 + if (whdr->flags & RXRPC_REQUEST_ACK) 263 + call->rtt_last_req = now; 264 + rxrpc_set_keepalive(call, now); 265 + return nr_kv; 266 + } 267 + 268 + /* 150 269 * Transmit an ACK packet. 
151 270 */ 152 - static void rxrpc_send_ack_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb) 271 + static void rxrpc_send_ack_packet(struct rxrpc_call *call, int nr_kv, size_t len, 272 + rxrpc_serial_t serial, enum rxrpc_propose_ack_trace why) 153 273 { 154 - struct rxrpc_wire_header *whdr = txb->kvec[0].iov_base; 274 + struct kvec *kv = call->local->kvec; 275 + struct rxrpc_wire_header *whdr = kv[0].iov_base; 276 + struct rxrpc_acktrailer *trailer = kv[2].iov_base + 3; 155 277 struct rxrpc_connection *conn; 156 278 struct rxrpc_ackpacket *ack = (struct rxrpc_ackpacket *)(whdr + 1); 157 279 struct msghdr msg; 158 - ktime_t now; 159 280 int ret; 160 281 161 282 if (test_bit(RXRPC_CALL_DISCONNECTED, &call->flags)) ··· 264 195 msg.msg_controllen = 0; 265 196 msg.msg_flags = MSG_SPLICE_PAGES; 266 197 267 - whdr->flags = txb->flags & RXRPC_TXBUF_WIRE_FLAGS; 268 - 269 - txb->serial = rxrpc_get_next_serial(conn); 270 - whdr->serial = htonl(txb->serial); 271 - trace_rxrpc_tx_ack(call->debug_id, txb->serial, 198 + trace_rxrpc_tx_ack(call->debug_id, serial, 272 199 ntohl(ack->firstPacket), 273 200 ntohl(ack->serial), ack->reason, ack->nAcks, 274 - txb->ack_rwind); 201 + ntohl(trailer->rwind), why); 275 202 276 203 rxrpc_inc_stat(call->rxnet, stat_tx_ack_send); 277 204 278 - iov_iter_kvec(&msg.msg_iter, WRITE, txb->kvec, txb->nr_kvec, txb->len); 279 - rxrpc_local_dont_fragment(conn->local, false); 280 - ret = do_udp_sendmsg(conn->local->socket, &msg, txb->len); 205 + iov_iter_kvec(&msg.msg_iter, WRITE, kv, nr_kv, len); 206 + rxrpc_local_dont_fragment(conn->local, why == rxrpc_propose_ack_ping_for_mtu_probe); 207 + 208 + ret = do_udp_sendmsg(conn->local->socket, &msg, len); 281 209 call->peer->last_tx_at = ktime_get_seconds(); 282 210 if (ret < 0) { 283 - trace_rxrpc_tx_fail(call->debug_id, txb->serial, ret, 211 + trace_rxrpc_tx_fail(call->debug_id, serial, ret, 284 212 rxrpc_tx_point_call_ack); 213 + if (why == rxrpc_propose_ack_ping_for_mtu_probe && 214 + ret == 
-EMSGSIZE) 215 + rxrpc_input_probe_for_pmtud(conn, serial, true); 285 216 } else { 286 217 trace_rxrpc_tx_packet(call->debug_id, whdr, 287 218 rxrpc_tx_point_call_ack); 288 - now = ktime_get_real(); 289 - if (ack->reason == RXRPC_ACK_PING) 290 - rxrpc_begin_rtt_probe(call, txb->serial, now, rxrpc_rtt_tx_ping); 291 - if (txb->flags & RXRPC_REQUEST_ACK) 292 - call->peer->rtt_last_req = now; 293 - rxrpc_set_keepalive(call, now); 219 + if (why == rxrpc_propose_ack_ping_for_mtu_probe) { 220 + call->peer->pmtud_pending = false; 221 + call->peer->pmtud_probing = true; 222 + call->conn->pmtud_probe = serial; 223 + call->conn->pmtud_call = call->debug_id; 224 + trace_rxrpc_pmtud_tx(call); 225 + } 294 226 } 295 227 rxrpc_tx_backoff(call, ret); 296 228 } ··· 300 230 * Queue an ACK for immediate transmission. 301 231 */ 302 232 void rxrpc_send_ACK(struct rxrpc_call *call, u8 ack_reason, 303 - rxrpc_serial_t serial, enum rxrpc_propose_ack_trace why) 233 + rxrpc_serial_t serial_to_ack, enum rxrpc_propose_ack_trace why) 304 234 { 305 - struct rxrpc_txbuf *txb; 235 + struct kvec *kv = call->local->kvec; 236 + rxrpc_serial_t ack_serial; 237 + size_t len; 238 + int nr_kv; 306 239 307 240 if (test_bit(RXRPC_CALL_DISCONNECTED, &call->flags)) 308 241 return; 309 242 310 243 rxrpc_inc_stat(call->rxnet, stat_tx_acks[ack_reason]); 311 244 312 - txb = rxrpc_alloc_ack_txbuf(call, call->ackr_wtop - call->ackr_window); 313 - if (!txb) { 245 + nr_kv = rxrpc_alloc_ack(call, call->ackr_wtop - call->ackr_window); 246 + if (nr_kv < 0) { 314 247 kleave(" = -ENOMEM"); 315 248 return; 316 249 } 317 250 318 - txb->ack_why = why; 251 + nr_kv = rxrpc_fill_out_ack(call, nr_kv, ack_reason, serial_to_ack, &ack_serial); 252 + len = kv[0].iov_len; 253 + len += kv[1].iov_len; 254 + len += kv[2].iov_len; 319 255 320 - rxrpc_fill_out_ack(call, txb, ack_reason, serial); 256 + /* Extend a path MTU probe ACK. 
*/ 257 + if (why == rxrpc_propose_ack_ping_for_mtu_probe) { 258 + size_t probe_mtu = call->peer->pmtud_trial + sizeof(struct rxrpc_wire_header); 259 + 260 + if (len > probe_mtu) 261 + goto skip; 262 + while (len < probe_mtu) { 263 + size_t part = umin(probe_mtu - len, PAGE_SIZE); 264 + 265 + kv[nr_kv].iov_base = page_address(ZERO_PAGE(0)); 266 + kv[nr_kv].iov_len = part; 267 + len += part; 268 + nr_kv++; 269 + } 270 + } 271 + 321 272 call->ackr_nr_unacked = 0; 322 273 atomic_set(&call->ackr_nr_consumed, 0); 323 274 clear_bit(RXRPC_CALL_RX_IS_IDLE, &call->flags); 324 275 325 - trace_rxrpc_send_ack(call, why, ack_reason, serial); 326 - rxrpc_send_ack_packet(call, txb); 327 - rxrpc_put_txbuf(txb, rxrpc_txbuf_put_ack_tx); 276 + trace_rxrpc_send_ack(call, why, ack_reason, ack_serial); 277 + rxrpc_send_ack_packet(call, nr_kv, len, ack_serial, why); 278 + skip: 279 + rxrpc_free_ack(call); 280 + } 281 + 282 + /* 283 + * Send an ACK probe for path MTU discovery. 284 + */ 285 + void rxrpc_send_probe_for_pmtud(struct rxrpc_call *call) 286 + { 287 + rxrpc_send_ACK(call, RXRPC_ACK_PING, 0, 288 + rxrpc_propose_ack_ping_for_mtu_probe); 328 289 } 329 290 330 291 /* ··· 425 324 /* 426 325 * Prepare a (sub)packet for transmission. 
427 326 */ 428 - static void rxrpc_prepare_data_subpacket(struct rxrpc_call *call, struct rxrpc_txbuf *txb, 429 - rxrpc_serial_t serial) 327 + static size_t rxrpc_prepare_data_subpacket(struct rxrpc_call *call, 328 + struct rxrpc_send_data_req *req, 329 + struct rxrpc_txbuf *txb, 330 + rxrpc_serial_t serial, int subpkt) 430 331 { 431 332 struct rxrpc_wire_header *whdr = txb->kvec[0].iov_base; 333 + struct rxrpc_jumbo_header *jumbo = (void *)(whdr + 1) - sizeof(*jumbo); 432 334 enum rxrpc_req_ack_trace why; 433 335 struct rxrpc_connection *conn = call->conn; 336 + struct kvec *kv = &call->local->kvec[subpkt]; 337 + size_t len = txb->pkt_len; 338 + bool last; 339 + u8 flags; 434 340 435 - _enter("%x,{%d}", txb->seq, txb->len); 341 + _enter("%x,%zd", txb->seq, len); 436 342 437 343 txb->serial = serial; 438 344 439 345 if (test_bit(RXRPC_CONN_PROBING_FOR_UPGRADE, &conn->flags) && 440 346 txb->seq == 1) 441 347 whdr->userStatus = RXRPC_USERSTATUS_SERVICE_UPGRADE; 348 + 349 + txb->flags &= ~RXRPC_REQUEST_ACK; 350 + flags = txb->flags & RXRPC_TXBUF_WIRE_FLAGS; 351 + last = txb->flags & RXRPC_LAST_PACKET; 352 + 353 + if (subpkt < req->n - 1) { 354 + len = RXRPC_JUMBO_DATALEN; 355 + goto dont_set_request_ack; 356 + } 442 357 443 358 /* If our RTT cache needs working on, request an ACK. Also request 444 359 * ACKs if a DATA packet appears to have been lost. ··· 463 346 * service call, lest OpenAFS incorrectly send us an ACK with some 464 347 * soft-ACKs in it and then never follow up with a proper hard ACK. 
465 348 */ 466 - if (txb->flags & RXRPC_REQUEST_ACK) 467 - why = rxrpc_reqack_already_on; 468 - else if ((txb->flags & RXRPC_LAST_PACKET) && rxrpc_sending_to_client(txb)) 349 + if (last && rxrpc_sending_to_client(txb)) 469 350 why = rxrpc_reqack_no_srv_last; 470 351 else if (test_and_clear_bit(RXRPC_CALL_EV_ACK_LOST, &call->events)) 471 352 why = rxrpc_reqack_ack_lost; 472 353 else if (txb->flags & RXRPC_TXBUF_RESENT) 473 354 why = rxrpc_reqack_retrans; 474 - else if (call->cong_mode == RXRPC_CALL_SLOW_START && call->cong_cwnd <= 2) 355 + else if (call->cong_ca_state == RXRPC_CA_SLOW_START && call->cong_cwnd <= RXRPC_MIN_CWND) 475 356 why = rxrpc_reqack_slow_start; 476 357 else if (call->tx_winsize <= 2) 477 358 why = rxrpc_reqack_small_txwin; 478 - else if (call->peer->rtt_count < 3 && txb->seq & 1) 359 + else if (call->rtt_count < 3) 479 360 why = rxrpc_reqack_more_rtt; 480 - else if (ktime_before(ktime_add_ms(call->peer->rtt_last_req, 1000), ktime_get_real())) 361 + else if (ktime_before(ktime_add_ms(call->rtt_last_req, 1000), ktime_get_real())) 481 362 why = rxrpc_reqack_old_rtt; 363 + else if (!last && !after(READ_ONCE(call->send_top), txb->seq)) 364 + why = rxrpc_reqack_app_stall; 482 365 else 483 366 goto dont_set_request_ack; 484 367 485 368 rxrpc_inc_stat(call->rxnet, stat_why_req_ack[why]); 486 369 trace_rxrpc_req_ack(call->debug_id, txb->seq, why); 487 - if (why != rxrpc_reqack_no_srv_last) 488 - txb->flags |= RXRPC_REQUEST_ACK; 370 + if (why != rxrpc_reqack_no_srv_last) { 371 + flags |= RXRPC_REQUEST_ACK; 372 + trace_rxrpc_rtt_tx(call, rxrpc_rtt_tx_data, -1, serial); 373 + call->rtt_last_req = req->now; 374 + } 489 375 dont_set_request_ack: 490 376 491 - whdr->flags = txb->flags & RXRPC_TXBUF_WIRE_FLAGS; 492 - whdr->serial = htonl(txb->serial); 493 - whdr->cksum = txb->cksum; 377 + /* The jumbo header overlays the wire header in the txbuf. 
*/ 378 + if (subpkt < req->n - 1) 379 + flags |= RXRPC_JUMBO_PACKET; 380 + else 381 + flags &= ~RXRPC_JUMBO_PACKET; 382 + if (subpkt == 0) { 383 + whdr->flags = flags; 384 + whdr->serial = htonl(txb->serial); 385 + whdr->cksum = txb->cksum; 386 + whdr->serviceId = htons(conn->service_id); 387 + kv->iov_base = whdr; 388 + len += sizeof(*whdr); 389 + } else { 390 + jumbo->flags = flags; 391 + jumbo->pad = 0; 392 + jumbo->cksum = txb->cksum; 393 + kv->iov_base = jumbo; 394 + len += sizeof(*jumbo); 395 + } 494 396 495 - trace_rxrpc_tx_data(call, txb->seq, txb->serial, txb->flags, false); 397 + trace_rxrpc_tx_data(call, txb->seq, txb->serial, flags, req->trace); 398 + kv->iov_len = len; 399 + return len; 496 400 } 497 401 498 402 /* 499 - * Prepare a packet for transmission. 403 + * Prepare a transmission queue object for initial transmission. Returns the 404 + * number of microseconds since the transmission queue base timestamp. 500 405 */ 501 - static size_t rxrpc_prepare_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb) 406 + static unsigned int rxrpc_prepare_txqueue(struct rxrpc_txqueue *tq, 407 + struct rxrpc_send_data_req *req) 502 408 { 409 + if (!tq) 410 + return 0; 411 + if (tq->xmit_ts_base == KTIME_MIN) { 412 + tq->xmit_ts_base = req->now; 413 + return 0; 414 + } 415 + return ktime_to_us(ktime_sub(req->now, tq->xmit_ts_base)); 416 + } 417 + 418 + /* 419 + * Prepare a (jumbo) packet for transmission. 
420 + */ 421 + static size_t rxrpc_prepare_data_packet(struct rxrpc_call *call, struct rxrpc_send_data_req *req) 422 + { 423 + struct rxrpc_txqueue *tq = req->tq; 503 424 rxrpc_serial_t serial; 425 + unsigned int xmit_ts; 426 + rxrpc_seq_t seq = req->seq; 427 + size_t len = 0; 428 + bool start_tlp = false; 429 + 430 + trace_rxrpc_tq(call, tq, seq, rxrpc_tq_transmit); 504 431 505 432 /* Each transmission of a Tx packet needs a new serial number */ 506 - serial = rxrpc_get_next_serial(call->conn); 433 + serial = rxrpc_get_next_serials(call->conn, req->n); 507 434 508 - rxrpc_prepare_data_subpacket(call, txb, serial); 435 + call->tx_last_serial = serial + req->n - 1; 436 + call->tx_last_sent = req->now; 437 + xmit_ts = rxrpc_prepare_txqueue(tq, req); 438 + prefetch(tq->next); 509 439 510 - return txb->len; 511 - } 440 + for (int i = 0;;) { 441 + int ix = seq & RXRPC_TXQ_MASK; 442 + struct rxrpc_txbuf *txb = tq->bufs[seq & RXRPC_TXQ_MASK]; 512 443 513 - /* 514 - * Set timeouts after transmitting a packet. 515 - */ 516 - static void rxrpc_tstamp_data_packets(struct rxrpc_call *call, struct rxrpc_txbuf *txb) 517 - { 518 - ktime_t now = ktime_get_real(); 519 - bool ack_requested = txb->flags & RXRPC_REQUEST_ACK; 444 + _debug("prep[%u] tq=%x q=%x", i, tq->qbase, seq); 520 445 521 - call->tx_last_sent = now; 522 - txb->last_sent = now; 523 - 524 - if (ack_requested) { 525 - rxrpc_begin_rtt_probe(call, txb->serial, now, rxrpc_rtt_tx_data); 526 - 527 - call->peer->rtt_last_req = now; 528 - if (call->peer->rtt_count > 1) { 529 - ktime_t delay = rxrpc_get_rto_backoff(call->peer, false); 530 - 531 - call->ack_lost_at = ktime_add(now, delay); 532 - trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_lost_ack); 446 + /* Record (re-)transmission for RACK [RFC8985 6.1]. 
*/ 447 + if (__test_and_clear_bit(ix, &tq->segment_lost)) 448 + call->tx_nr_lost--; 449 + if (req->retrans) { 450 + __set_bit(ix, &tq->ever_retransmitted); 451 + __set_bit(ix, &tq->segment_retransmitted); 452 + call->tx_nr_resent++; 453 + } else { 454 + call->tx_nr_sent++; 455 + start_tlp = true; 533 456 } 457 + tq->segment_xmit_ts[ix] = xmit_ts; 458 + tq->segment_serial[ix] = serial; 459 + if (i + 1 == req->n) 460 + /* Only sample the last subpacket in a jumbo. */ 461 + __set_bit(ix, &tq->rtt_samples); 462 + len += rxrpc_prepare_data_subpacket(call, req, txb, serial, i); 463 + serial++; 464 + seq++; 465 + i++; 466 + if (i >= req->n) 467 + break; 468 + if (!(seq & RXRPC_TXQ_MASK)) { 469 + tq = tq->next; 470 + trace_rxrpc_tq(call, tq, seq, rxrpc_tq_transmit_advance); 471 + xmit_ts = rxrpc_prepare_txqueue(tq, req); 472 + } 473 + } 474 + 475 + /* Set timeouts */ 476 + if (req->tlp_probe) { 477 + /* Sending TLP loss probe [RFC8985 7.3]. */ 478 + call->tlp_serial = serial - 1; 479 + call->tlp_seq = seq - 1; 480 + } else if (start_tlp) { 481 + /* Schedule TLP loss probe [RFC8985 7.2]. */ 482 + ktime_t pto; 483 + 484 + if (!test_bit(RXRPC_CALL_BEGAN_RX_TIMER, &call->flags)) 485 + /* The first packet may take longer to elicit a response. 
*/ 486 + pto = NSEC_PER_SEC; 487 + else 488 + pto = rxrpc_tlp_calc_pto(call, req->now); 489 + 490 + call->rack_timer_mode = RXRPC_CALL_RACKTIMER_TLP_PTO; 491 + call->rack_timo_at = ktime_add(req->now, pto); 492 + trace_rxrpc_rack_timer(call, pto, false); 493 + trace_rxrpc_timer_set(call, pto, rxrpc_timer_trace_rack_tlp_pto); 534 494 } 535 495 536 496 if (!test_and_set_bit(RXRPC_CALL_BEGAN_RX_TIMER, &call->flags)) { 537 497 ktime_t delay = ms_to_ktime(READ_ONCE(call->next_rx_timo)); 538 498 539 - call->expect_rx_by = ktime_add(now, delay); 499 + call->expect_rx_by = ktime_add(req->now, delay); 540 500 trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_expect_rx); 541 501 } 542 502 543 - rxrpc_set_keepalive(call, now); 503 + rxrpc_set_keepalive(call, req->now); 504 + return len; 544 505 } 545 506 546 507 /* 547 - * send a packet through the transport endpoint 508 + * Send one or more packets through the transport endpoint 548 509 */ 549 - static int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb) 510 + void rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_send_data_req *req) 550 511 { 551 - struct rxrpc_wire_header *whdr = txb->kvec[0].iov_base; 552 512 struct rxrpc_connection *conn = call->conn; 553 513 enum rxrpc_tx_point frag; 514 + struct rxrpc_txqueue *tq = req->tq; 515 + struct rxrpc_txbuf *txb; 554 516 struct msghdr msg; 517 + rxrpc_seq_t seq = req->seq; 555 518 size_t len; 556 - int ret; 519 + bool new_call = test_bit(RXRPC_CALL_BEGAN_RX_TIMER, &call->flags); 520 + int ret, stat_ix; 557 521 558 - _enter("%x,{%d}", txb->seq, txb->len); 522 + _enter("%x,%x-%x", tq->qbase, seq, seq + req->n - 1); 559 523 560 - len = rxrpc_prepare_data_packet(call, txb); 524 + stat_ix = umin(req->n, ARRAY_SIZE(call->rxnet->stat_tx_jumbo)) - 1; 525 + atomic_inc(&call->rxnet->stat_tx_jumbo[stat_ix]); 561 526 562 - if (IS_ENABLED(CONFIG_AF_RXRPC_INJECT_LOSS)) { 563 - static int lose; 564 - if ((lose++ & 7) == 7) { 565 - ret = 0; 566 - 
trace_rxrpc_tx_data(call, txb->seq, txb->serial, 567 - txb->flags, true); 568 - goto done; 569 - } 570 - } 527 + len = rxrpc_prepare_data_packet(call, req); 528 + txb = tq->bufs[seq & RXRPC_TXQ_MASK]; 571 529 572 - iov_iter_kvec(&msg.msg_iter, WRITE, txb->kvec, txb->nr_kvec, len); 530 + iov_iter_kvec(&msg.msg_iter, WRITE, call->local->kvec, req->n, len); 573 531 574 532 msg.msg_name = &call->peer->srx.transport; 575 533 msg.msg_namelen = call->peer->srx.transport_len; ··· 652 460 msg.msg_controllen = 0; 653 461 msg.msg_flags = MSG_SPLICE_PAGES; 654 462 655 - /* Track what we've attempted to transmit at least once so that the 656 - * retransmission algorithm doesn't try to resend what we haven't sent 657 - * yet. 463 + /* Send the packet with the don't fragment bit set unless we think it's 464 + * too big or if this is a retransmission. 658 465 */ 659 - if (txb->seq == call->tx_transmitted + 1) 660 - call->tx_transmitted = txb->seq; 661 - 662 - /* send the packet with the don't fragment bit set if we currently 663 - * think it's small enough */ 664 - if (txb->len >= call->peer->maxdata) { 466 + if (seq == call->tx_transmitted + 1 && 467 + len >= sizeof(struct rxrpc_wire_header) + call->peer->max_data) { 665 468 rxrpc_local_dont_fragment(conn->local, false); 666 469 frag = rxrpc_tx_point_call_data_frag; 667 470 } else { ··· 664 477 frag = rxrpc_tx_point_call_data_nofrag; 665 478 } 666 479 667 - retry: 480 + /* Track what we've attempted to transmit at least once so that the 481 + * retransmission algorithm doesn't try to resend what we haven't sent 482 + * yet. 
483 + */ 484 + if (seq == call->tx_transmitted + 1) 485 + call->tx_transmitted = seq + req->n - 1; 486 + 487 + if (IS_ENABLED(CONFIG_AF_RXRPC_INJECT_LOSS)) { 488 + static int lose; 489 + 490 + if ((lose++ & 7) == 7) { 491 + ret = 0; 492 + trace_rxrpc_tx_data(call, txb->seq, txb->serial, txb->flags, 493 + rxrpc_txdata_inject_loss); 494 + conn->peer->last_tx_at = ktime_get_seconds(); 495 + goto done; 496 + } 497 + } 498 + 668 499 /* send the packet by UDP 669 500 * - returns -EMSGSIZE if UDP would have to fragment the packet 670 501 * to go out of the interface ··· 693 488 ret = do_udp_sendmsg(conn->local->socket, &msg, len); 694 489 conn->peer->last_tx_at = ktime_get_seconds(); 695 490 696 - if (ret < 0) { 491 + if (ret == -EMSGSIZE) { 492 + rxrpc_inc_stat(call->rxnet, stat_tx_data_send_msgsize); 493 + trace_rxrpc_tx_packet(call->debug_id, call->local->kvec[0].iov_base, frag); 494 + ret = 0; 495 + } else if (ret < 0) { 697 496 rxrpc_inc_stat(call->rxnet, stat_tx_data_send_fail); 698 497 trace_rxrpc_tx_fail(call->debug_id, txb->serial, ret, frag); 699 498 } else { 700 - trace_rxrpc_tx_packet(call->debug_id, whdr, frag); 499 + trace_rxrpc_tx_packet(call->debug_id, call->local->kvec[0].iov_base, frag); 701 500 } 702 501 703 502 rxrpc_tx_backoff(call, ret); 704 - if (ret == -EMSGSIZE && frag == rxrpc_tx_point_call_data_frag) { 705 - rxrpc_local_dont_fragment(conn->local, false); 706 - frag = rxrpc_tx_point_call_data_frag; 707 - goto retry; 708 - } 709 503 710 - done: 711 - if (ret >= 0) { 712 - rxrpc_tstamp_data_packets(call, txb); 713 - } else { 714 - /* Cancel the call if the initial transmission fails, 715 - * particularly if that's due to network routing issues that 716 - * aren't going away anytime soon. The layer above can arrange 717 - * the retransmission. 504 + if (ret < 0) { 505 + /* Cancel the call if the initial transmission fails or if we 506 + * hit due to network routing issues that aren't going away 507 + * anytime soon. 
The layer above can arrange the 508 + * retransmission. 718 509 */ 719 - if (!test_and_set_bit(RXRPC_CALL_BEGAN_RX_TIMER, &call->flags)) 510 + if (new_call || 511 + ret == -ENETUNREACH || 512 + ret == -EHOSTUNREACH || 513 + ret == -ECONNREFUSED) 720 514 rxrpc_set_call_completion(call, RXRPC_CALL_LOCAL_ERROR, 721 515 RX_USER_ABORT, ret); 722 516 } 723 517 724 - _leave(" = %d [%u]", ret, call->peer->maxdata); 725 - return ret; 518 + done: 519 + _leave(" = %d [%u]", ret, call->peer->max_data); 726 520 } 727 521 728 522 /* ··· 895 691 896 692 peer->last_tx_at = ktime_get_seconds(); 897 693 _leave(""); 898 - } 899 - 900 - /* 901 - * Schedule an instant Tx resend. 902 - */ 903 - static inline void rxrpc_instant_resend(struct rxrpc_call *call, 904 - struct rxrpc_txbuf *txb) 905 - { 906 - if (!__rxrpc_call_is_complete(call)) 907 - kdebug("resend"); 908 - } 909 - 910 - /* 911 - * Transmit one packet. 912 - */ 913 - void rxrpc_transmit_one(struct rxrpc_call *call, struct rxrpc_txbuf *txb) 914 - { 915 - int ret; 916 - 917 - ret = rxrpc_send_data_packet(call, txb); 918 - if (ret < 0) { 919 - switch (ret) { 920 - case -ENETUNREACH: 921 - case -EHOSTUNREACH: 922 - case -ECONNREFUSED: 923 - rxrpc_set_call_completion(call, RXRPC_CALL_LOCAL_ERROR, 924 - 0, ret); 925 - break; 926 - default: 927 - _debug("need instant resend %d", ret); 928 - rxrpc_instant_resend(call, txb); 929 - } 930 - } else { 931 - ktime_t delay = ns_to_ktime(call->peer->rto_us * NSEC_PER_USEC); 932 - 933 - call->resend_at = ktime_add(ktime_get_real(), delay); 934 - trace_rxrpc_timer_set(call, delay, rxrpc_timer_trace_resend_tx); 935 - } 936 694 }
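The padding step in rxrpc_send_ACK() above, which stretches a PING ACK to the trial size by appending segments that all point at the same zero-filled page, can be illustrated in user space roughly like this. It is a sketch under assumptions, not the kernel code: `struct iovec` stands in for `struct kvec`, a static buffer stands in for ZERO_PAGE(0), the 4096-byte page size is assumed, and the capacity check is an addition that the hunk above does not contain. All `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* Shared padding source, standing in for the kernel's zero page. */
static unsigned char sketch_zero_page[SKETCH_PAGE_SIZE];

/*
 * Append zero-filled segments to iov[] until the datagram is `target` bytes.
 * `nr` is the number of segments already in use and `len` their total length.
 * Returns the new segment count, or -1 if the datagram is already larger
 * than the target or iov[] (capacity `max`) would overflow.
 */
static int sketch_pad_probe(struct iovec *iov, int nr, int max,
			    size_t len, size_t target)
{
	assert(iov && nr >= 0 && nr <= max);

	if (len > target)
		return -1;
	while (len < target) {
		size_t part = target - len;

		if (part > SKETCH_PAGE_SIZE)
			part = SKETCH_PAGE_SIZE;
		if (nr >= max)
			return -1;
		/* Every padding segment reuses the same page of zeros. */
		iov[nr].iov_base = sketch_zero_page;
		iov[nr].iov_len = part;
		len += part;
		nr++;
	}
	return nr;
}
```

Because every padding segment aliases one page, the probe can grow to any trial size without allocating payload memory, which is presumably why the series needs a much larger array of kvecs for the I/O thread.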
+104 -10
net/rxrpc/peer_event.c
··· 102 102 */ 103 103 static void rxrpc_adjust_mtu(struct rxrpc_peer *peer, unsigned int mtu) 104 104 { 105 + unsigned int max_data; 106 + 105 107 /* wind down the local interface MTU */ 106 108 if (mtu > 0 && peer->if_mtu == 65535 && mtu < peer->if_mtu) 107 109 peer->if_mtu = mtu; ··· 122 120 } 123 121 } 124 122 125 - if (mtu < peer->mtu) { 126 - spin_lock(&peer->lock); 127 - peer->mtu = mtu; 128 - peer->maxdata = peer->mtu - peer->hdrsize; 129 - spin_unlock(&peer->lock); 123 + max_data = max_t(int, mtu - peer->hdrsize, 500); 124 + if (max_data < peer->max_data) { 125 + if (peer->pmtud_good > max_data) 126 + peer->pmtud_good = max_data; 127 + if (peer->pmtud_bad > max_data + 1) 128 + peer->pmtud_bad = max_data + 1; 129 + 130 + trace_rxrpc_pmtud_reduce(peer, 0, max_data, rxrpc_pmtud_reduce_icmp); 131 + write_seqcount_begin(&peer->mtu_lock); 132 + peer->max_data = max_data; 133 + write_seqcount_end(&peer->mtu_lock); 130 134 } 131 135 } 132 136 ··· 213 205 struct rxrpc_call *call; 214 206 HLIST_HEAD(error_targets); 215 207 216 - spin_lock(&peer->lock); 208 + spin_lock_irq(&peer->lock); 217 209 hlist_move_list(&peer->error_targets, &error_targets); 218 210 219 211 while (!hlist_empty(&error_targets)) { 220 212 call = hlist_entry(error_targets.first, 221 213 struct rxrpc_call, error_link); 222 214 hlist_del_init(&call->error_link); 223 - spin_unlock(&peer->lock); 215 + spin_unlock_irq(&peer->lock); 224 216 225 217 rxrpc_see_call(call, rxrpc_call_see_distribute_error); 226 218 rxrpc_set_call_completion(call, compl, 0, -err); 227 - rxrpc_input_call_event(call, skb); 219 + rxrpc_input_call_event(call); 228 220 229 - spin_lock(&peer->lock); 221 + spin_lock_irq(&peer->lock); 230 222 } 231 223 232 - spin_unlock(&peer->lock); 224 + spin_unlock_irq(&peer->lock); 233 225 } 234 226 235 227 /* ··· 354 346 timer_reduce(&rxnet->peer_keepalive_timer, jiffies + delay); 355 347 356 348 _leave(""); 349 + } 350 + 351 + /* 352 + * Do path MTU probing. 
353 + */ 354 + void rxrpc_input_probe_for_pmtud(struct rxrpc_connection *conn, rxrpc_serial_t acked_serial, 355 + bool sendmsg_fail) 356 + { 357 + struct rxrpc_peer *peer = conn->peer; 358 + unsigned int max_data = peer->max_data; 359 + int good, trial, bad, jumbo; 360 + 361 + good = peer->pmtud_good; 362 + trial = peer->pmtud_trial; 363 + bad = peer->pmtud_bad; 364 + if (good >= bad - 1) { 365 + conn->pmtud_probe = 0; 366 + peer->pmtud_lost = false; 367 + return; 368 + } 369 + 370 + if (!peer->pmtud_probing) 371 + goto send_probe; 372 + 373 + if (sendmsg_fail || after(acked_serial, conn->pmtud_probe)) { 374 + /* Retry a lost probe. */ 375 + if (!peer->pmtud_lost) { 376 + trace_rxrpc_pmtud_lost(conn, acked_serial); 377 + conn->pmtud_probe = 0; 378 + peer->pmtud_lost = true; 379 + goto send_probe; 380 + } 381 + 382 + /* The probed size didn't seem to get through. */ 383 + bad = trial; 384 + peer->pmtud_bad = bad; 385 + if (bad <= max_data) 386 + max_data = bad - 1; 387 + } else { 388 + /* It did get through. 
*/ 389 + good = trial; 390 + peer->pmtud_good = good; 391 + if (good > max_data) 392 + max_data = good; 393 + } 394 + 395 + max_data = umin(max_data, peer->ackr_max_data); 396 + if (max_data != peer->max_data) { 397 + preempt_disable(); 398 + write_seqcount_begin(&peer->mtu_lock); 399 + peer->max_data = max_data; 400 + write_seqcount_end(&peer->mtu_lock); 401 + preempt_enable(); 402 + } 403 + 404 + jumbo = max_data + sizeof(struct rxrpc_jumbo_header); 405 + jumbo /= RXRPC_JUMBO_SUBPKTLEN; 406 + peer->pmtud_jumbo = jumbo; 407 + 408 + trace_rxrpc_pmtud_rx(conn, acked_serial); 409 + conn->pmtud_probe = 0; 410 + peer->pmtud_lost = false; 411 + 412 + if (good < RXRPC_JUMBO(2) && bad > RXRPC_JUMBO(2)) 413 + trial = RXRPC_JUMBO(2); 414 + else if (good < RXRPC_JUMBO(4) && bad > RXRPC_JUMBO(4)) 415 + trial = RXRPC_JUMBO(4); 416 + else if (good < RXRPC_JUMBO(3) && bad > RXRPC_JUMBO(3)) 417 + trial = RXRPC_JUMBO(3); 418 + else if (good < RXRPC_JUMBO(6) && bad > RXRPC_JUMBO(6)) 419 + trial = RXRPC_JUMBO(6); 420 + else if (good < RXRPC_JUMBO(5) && bad > RXRPC_JUMBO(5)) 421 + trial = RXRPC_JUMBO(5); 422 + else if (good < RXRPC_JUMBO(8) && bad > RXRPC_JUMBO(8)) 423 + trial = RXRPC_JUMBO(8); 424 + else if (good < RXRPC_JUMBO(7) && bad > RXRPC_JUMBO(7)) 425 + trial = RXRPC_JUMBO(7); 426 + else 427 + trial = (good + bad) / 2; 428 + peer->pmtud_trial = trial; 429 + 430 + if (good >= bad) 431 + return; 432 + 433 + send_probe: 434 + peer->pmtud_pending = true; 357 435 }
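The trial-size selection in rxrpc_input_probe_for_pmtud() above amounts to a bounded search between a known-good and a known-bad size that tries whole-jumbo boundaries first and falls back to bisection. A minimal user-space sketch of just that selection logic follows. It deliberately leaves out the single retry of a lost probe, the clamp against the peer's advertised maximum and the seqcount update, and it assumes that RXRPC_JUMBO(n) expands to n * 1416 - 4 (the macro is not defined in this diff). All `sketch_`/`SKETCH_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed expansion of RXRPC_JUMBO(n): n subpackets minus one jumbo header. */
#define SKETCH_JUMBO(n) ((n) * 1416 - 4)

/* Choose the next size to probe, given the current known-good/known-bad pair. */
static int sketch_next_trial(int good, int bad)
{
	/* Same preference order as the if/else chain in the hunk above. */
	static const int order[] = { 2, 4, 3, 6, 5, 8, 7 };
	size_t i;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		int j = SKETCH_JUMBO(order[i]);

		if (good < j && bad > j)
			return j;
	}
	return (good + bad) / 2;	/* plain bisection otherwise */
}

/*
 * Run the search against a simulated path that delivers any probe of at most
 * path_max bytes.  Returns the largest size found to be good.
 */
static int sketch_discover(int good, int bad, int path_max)
{
	assert(good <= path_max && path_max < bad);

	while (good < bad - 1) {
		int trial = sketch_next_trial(good, bad);

		if (trial <= path_max)
			good = trial;	/* probe was answered */
		else
			bad = trial;	/* probe was lost or rejected */
	}
	return good;
}
```

Each trial lies strictly between `good` and `bad`, so the interval shrinks on every step and the loop stops once `good == bad - 1`, mirroring the `good >= bad - 1` early exit at the top of the kernel function.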
+21 -9
net/rxrpc/peer_object.c
··· 162 162 #endif 163 163 164 164 peer->if_mtu = 1500; 165 + if (peer->max_data < peer->if_mtu - peer->hdrsize) { 166 + trace_rxrpc_pmtud_reduce(peer, 0, peer->if_mtu - peer->hdrsize, 167 + rxrpc_pmtud_reduce_route); 168 + peer->max_data = peer->if_mtu - peer->hdrsize; 169 + } 165 170 166 171 memset(&fl, 0, sizeof(fl)); 167 172 switch (peer->srx.transport.family) { ··· 204 199 } 205 200 206 201 peer->if_mtu = dst_mtu(dst); 202 + peer->hdrsize += dst->header_len + dst->trailer_len; 203 + peer->tx_seg_max = dst->dev->gso_max_segs; 207 204 dst_release(dst); 205 + 206 + peer->max_data = umin(RXRPC_JUMBO(1), peer->if_mtu - peer->hdrsize); 207 + peer->pmtud_good = 500; 208 + peer->pmtud_bad = peer->if_mtu - peer->hdrsize + 1; 209 + peer->pmtud_trial = umin(peer->max_data, peer->pmtud_bad - 1); 210 + peer->pmtud_pending = true; 208 211 209 212 _leave(" [if_mtu %u]", peer->if_mtu); 210 213 } ··· 235 222 peer->service_conns = RB_ROOT; 236 223 seqlock_init(&peer->service_conn_lock); 237 224 spin_lock_init(&peer->lock); 238 - spin_lock_init(&peer->rtt_input_lock); 225 + seqcount_init(&peer->mtu_lock); 239 226 peer->debug_id = atomic_inc_return(&rxrpc_debug_id); 240 - 241 - rxrpc_peer_init_rtt(peer); 242 - 227 + peer->recent_srtt_us = UINT_MAX; 243 228 peer->cong_ssthresh = RXRPC_TX_MAX_WINDOW; 244 229 trace_rxrpc_peer(peer->debug_id, 1, why); 245 230 } ··· 253 242 unsigned long hash_key) 254 243 { 255 244 peer->hash_key = hash_key; 256 - rxrpc_assess_MTU_size(local, peer); 257 - peer->mtu = peer->if_mtu; 258 - peer->rtt_last_req = ktime_get_real(); 245 + 259 246 260 247 switch (peer->srx.transport.family) { 261 248 case AF_INET: ··· 277 268 } 278 269 279 270 peer->hdrsize += sizeof(struct rxrpc_wire_header); 280 - peer->maxdata = peer->mtu - peer->hdrsize; 271 + peer->max_data = peer->if_mtu - peer->hdrsize; 272 + 273 + rxrpc_assess_MTU_size(local, peer); 281 274 } 282 275 283 276 /* ··· 315 304 * Set up a new incoming peer. 
There shouldn't be any other matching peers 316 305 * since we've already done a search in the list from the non-reentrant context 317 306 * (the data_ready handler) that is the only place we can add new peers. 307 + * Called with interrupts disabled. 318 308 */ 319 309 void rxrpc_new_incoming_peer(struct rxrpc_local *local, struct rxrpc_peer *peer) 320 310 { ··· 491 479 */ 492 480 unsigned int rxrpc_kernel_get_srtt(const struct rxrpc_peer *peer) 493 481 { 494 - return peer->rtt_count > 0 ? peer->srtt_us >> 3 : UINT_MAX; 482 + return READ_ONCE(peer->recent_srtt_us); 495 483 } 496 484 EXPORT_SYMBOL(rxrpc_kernel_get_srtt); 497 485
+43 -18
net/rxrpc/proc.c
··· 52 52 struct rxrpc_call *call; 53 53 struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq)); 54 54 enum rxrpc_call_state state; 55 - rxrpc_seq_t acks_hard_ack; 55 + rxrpc_seq_t tx_bottom; 56 56 char lbuff[50], rbuff[50]; 57 57 long timeout = 0; 58 58 ··· 79 79 if (state != RXRPC_CALL_SERVER_PREALLOC) 80 80 timeout = ktime_ms_delta(READ_ONCE(call->expect_rx_by), ktime_get_real()); 81 81 82 - acks_hard_ack = READ_ONCE(call->acks_hard_ack); 82 + tx_bottom = READ_ONCE(call->tx_bottom); 83 83 seq_printf(seq, 84 84 "UDP %-47.47s %-47.47s %4x %08x %08x %s %3u" 85 85 " %-8.8s %08x %08x %08x %02x %08x %02x %08x %02x %06lx\n", ··· 93 93 rxrpc_call_states[state], 94 94 call->abort_code, 95 95 call->debug_id, 96 - acks_hard_ack, READ_ONCE(call->tx_top) - acks_hard_ack, 96 + tx_bottom, READ_ONCE(call->tx_top) - tx_bottom, 97 97 call->ackr_window, call->ackr_wtop - call->ackr_window, 98 98 call->rx_serial, 99 99 call->cong_cwnd, ··· 283 283 284 284 if (v == SEQ_START_TOKEN) { 285 285 seq_puts(seq, 286 - "Proto Local " 287 - " Remote " 288 - " Use SST MTU LastUse RTT RTO\n" 286 + "Proto Local Remote Use SST Maxd LastUse RTT RTO\n" 289 287 ); 290 288 return 0; 291 289 } ··· 296 298 297 299 now = ktime_get_seconds(); 298 300 seq_printf(seq, 299 - "UDP %-47.47s %-47.47s %3u" 300 - " %3u %5u %6llus %8u %8u\n", 301 + "UDP %-47.47s %-47.47s %3u %4u %5u %6llus %8d %8d\n", 301 302 lbuff, 302 303 rbuff, 303 304 refcount_read(&peer->ref), 304 305 peer->cong_ssthresh, 305 - peer->mtu, 306 + peer->max_data, 306 307 now - peer->last_tx_at, 307 - peer->srtt_us >> 3, 308 - peer->rto_us); 308 + READ_ONCE(peer->recent_srtt_us), 309 + READ_ONCE(peer->recent_rto_us)); 309 310 310 311 return 0; 311 312 } ··· 473 476 struct rxrpc_net *rxnet = rxrpc_net(seq_file_single_net(seq)); 474 477 475 478 seq_printf(seq, 476 - "Data : send=%u sendf=%u fail=%u\n", 479 + "Data : send=%u sendf=%u fail=%u emsz=%u\n", 477 480 atomic_read(&rxnet->stat_tx_data_send), 478 481 
atomic_read(&rxnet->stat_tx_data_send_frag), 479 - atomic_read(&rxnet->stat_tx_data_send_fail)); 482 + atomic_read(&rxnet->stat_tx_data_send_fail), 483 + atomic_read(&rxnet->stat_tx_data_send_msgsize)); 480 484 seq_printf(seq, 481 485 "Data-Tx : nr=%u retrans=%u uf=%u cwr=%u\n", 482 486 atomic_read(&rxnet->stat_tx_data), ··· 506 508 atomic_read(&rxnet->stat_tx_acks[RXRPC_ACK_DELAY]), 507 509 atomic_read(&rxnet->stat_tx_acks[RXRPC_ACK_IDLE])); 508 510 seq_printf(seq, 509 - "Ack-Rx : req=%u dup=%u oos=%u exw=%u nos=%u png=%u prs=%u dly=%u idl=%u\n", 511 + "Ack-Rx : req=%u dup=%u oos=%u exw=%u nos=%u png=%u prs=%u dly=%u idl=%u z=%u\n", 510 512 atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_REQUESTED]), 511 513 atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_DUPLICATE]), 512 514 atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_OUT_OF_SEQUENCE]), ··· 515 517 atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_PING]), 516 518 atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_PING_RESPONSE]), 517 519 atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_DELAY]), 518 - atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_IDLE])); 520 + atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_IDLE]), 521 + atomic_read(&rxnet->stat_rx_acks[0])); 519 522 seq_printf(seq, 520 - "Why-Req-A: acklost=%u already=%u mrtt=%u ortt=%u\n", 523 + "Why-Req-A: acklost=%u mrtt=%u ortt=%u stall=%u\n", 521 524 atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_ack_lost]), 522 - atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_already_on]), 523 525 atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_more_rtt]), 524 - atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_old_rtt])); 526 + atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_old_rtt]), 527 + atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_app_stall])); 525 528 seq_printf(seq, 526 529 "Why-Req-A: nolast=%u retx=%u slows=%u smtxw=%u\n", 527 530 atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_no_srv_last]), 528 531 atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_retrans]), 529 532 
atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_slow_start]), 530 533 atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_small_txwin])); 534 + seq_printf(seq, 535 + "Jumbo-Tx : %u,%u,%u,%u,%u,%u,%u,%u,%u,%u\n", 536 + atomic_read(&rxnet->stat_tx_jumbo[0]), 537 + atomic_read(&rxnet->stat_tx_jumbo[1]), 538 + atomic_read(&rxnet->stat_tx_jumbo[2]), 539 + atomic_read(&rxnet->stat_tx_jumbo[3]), 540 + atomic_read(&rxnet->stat_tx_jumbo[4]), 541 + atomic_read(&rxnet->stat_tx_jumbo[5]), 542 + atomic_read(&rxnet->stat_tx_jumbo[6]), 543 + atomic_read(&rxnet->stat_tx_jumbo[7]), 544 + atomic_read(&rxnet->stat_tx_jumbo[8]), 545 + atomic_read(&rxnet->stat_tx_jumbo[9])); 546 + seq_printf(seq, 547 + "Jumbo-Rx : %u,%u,%u,%u,%u,%u,%u,%u,%u,%u\n", 548 + atomic_read(&rxnet->stat_rx_jumbo[0]), 549 + atomic_read(&rxnet->stat_rx_jumbo[1]), 550 + atomic_read(&rxnet->stat_rx_jumbo[2]), 551 + atomic_read(&rxnet->stat_rx_jumbo[3]), 552 + atomic_read(&rxnet->stat_rx_jumbo[4]), 553 + atomic_read(&rxnet->stat_rx_jumbo[5]), 554 + atomic_read(&rxnet->stat_rx_jumbo[6]), 555 + atomic_read(&rxnet->stat_rx_jumbo[7]), 556 + atomic_read(&rxnet->stat_rx_jumbo[8]), 557 + atomic_read(&rxnet->stat_rx_jumbo[9])); 531 558 seq_printf(seq, 532 559 "Buffers : txb=%u rxb=%u\n", 533 560 atomic_read(&rxrpc_nr_txbuf), ··· 590 567 atomic_set(&rxnet->stat_tx_ack_skip, 0); 591 568 memset(&rxnet->stat_tx_acks, 0, sizeof(rxnet->stat_tx_acks)); 592 569 memset(&rxnet->stat_rx_acks, 0, sizeof(rxnet->stat_rx_acks)); 570 + memset(&rxnet->stat_tx_jumbo, 0, sizeof(rxnet->stat_tx_jumbo)); 571 + memset(&rxnet->stat_rx_jumbo, 0, sizeof(rxnet->stat_rx_jumbo)); 593 572 594 573 memset(&rxnet->stat_why_req_ack, 0, sizeof(rxnet->stat_why_req_ack)); 595 574
+9 -4
net/rxrpc/protocol.h
···
 /*
  * The maximum number of subpackets that can possibly fit in a UDP packet is:
  *
- *	((max_IP - IP_hdr - UDP_hdr) / RXRPC_JUMBO_SUBPKTLEN) + 1
- *	= ((65535 - 28 - 28) / 1416) + 1
- *	= 46 non-terminal packets and 1 terminal packet.
+ *	(max_UDP - wirehdr + jumbohdr) / (jumbohdr + 1412)
+ *	= ((65535 - 28 + 4) / 1416)
+ *	= 45 non-terminal packets and 1 terminal packet.
  */
-#define RXRPC_MAX_NR_JUMBO	47
+#define RXRPC_MAX_NR_JUMBO	46
+
+/* Size of a jumbo packet with N subpackets, excluding UDP+IP */
+#define RXRPC_JUMBO(N) ((int)sizeof(struct rxrpc_wire_header) + \
+			RXRPC_JUMBO_DATALEN + \
+			((N) - 1) * RXRPC_JUMBO_SUBPKTLEN)

 /*****************************************************************************/
 /*
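The revised comment's arithmetic can be checked with a short standalone sketch. The helper names below (jumbo_size(), max_subpackets()) are hypothetical and exist only for illustration; the constants are the ones the comment itself uses (28-byte wire header, 4-byte jumbo header, 1412 bytes of data). The 8192-byte MTU case additionally assumes an IPv4 header without options (20 bytes) plus an 8-byte UDP header.

```c
#include <assert.h>

#define WIRE_HDR	28u
#define JUMBO_HDR	4u
#define JUMBO_DATALEN	1412u
#define JUMBO_SUBPKTLEN	(JUMBO_HDR + JUMBO_DATALEN)	/* 1416 */

/* Size of a jumbo packet with n subpackets, excluding UDP+IP headers. */
static unsigned int jumbo_size(unsigned int n)
{
	assert(n >= 1);
	return WIRE_HDR + JUMBO_DATALEN + (n - 1) * JUMBO_SUBPKTLEN;
}

/* Number of subpackets that fit in a given UDP payload, per the comment. */
static unsigned int max_subpackets(unsigned int udp_payload)
{
	if (udp_payload < WIRE_HDR)
		return 0;	/* Avoid unsigned wrap-around. */
	return (udp_payload - WIRE_HDR + JUMBO_HDR) / JUMBO_SUBPKTLEN;
}
```

Under these assumptions, a 65535-byte payload yields 46 subpackets, matching the new RXRPC_MAX_NR_JUMBO, and an 8192-byte MTU yields five, matching the cover letter.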
+9 -9
net/rxrpc/recvmsg.c
···
 	sk = &rx->sk;
 	if (rx && sk->sk_state < RXRPC_CLOSE) {
 		if (call->notify_rx) {
-			spin_lock(&call->notify_lock);
+			spin_lock_irq(&call->notify_lock);
 			call->notify_rx(sk, call, call->user_call_ID);
-			spin_unlock(&call->notify_lock);
+			spin_unlock_irq(&call->notify_lock);
 		} else {
-			spin_lock(&rx->recvmsg_lock);
+			spin_lock_irq(&rx->recvmsg_lock);
 			if (list_empty(&call->recvmsg_link)) {
 				rxrpc_get_call(call, rxrpc_call_get_notify_socket);
 				list_add_tail(&call->recvmsg_link, &rx->recvmsg_q);
 			}
-			spin_unlock(&rx->recvmsg_lock);
+			spin_unlock_irq(&rx->recvmsg_lock);

 			if (!sock_flag(sk, SOCK_DEAD)) {
 				_debug("call %ps", sk->sk_data_ready);
···
 	 * We also want to weed out calls that got requeued whilst we were
 	 * shovelling data out.
 	 */
-	spin_lock(&rx->recvmsg_lock);
+	spin_lock_irq(&rx->recvmsg_lock);
 	l = rx->recvmsg_q.next;
 	call = list_entry(l, struct rxrpc_call, recvmsg_link);

 	if (!rxrpc_call_is_complete(call) &&
 	    skb_queue_empty(&call->recvmsg_queue)) {
 		list_del_init(&call->recvmsg_link);
-		spin_unlock(&rx->recvmsg_lock);
+		spin_unlock_irq(&rx->recvmsg_lock);
 		release_sock(&rx->sk);
 		trace_rxrpc_recvmsg(call->debug_id, rxrpc_recvmsg_unqueue, 0);
 		rxrpc_put_call(call, rxrpc_call_put_recvmsg);
···
 		list_del_init(&call->recvmsg_link);
 	else
 		rxrpc_get_call(call, rxrpc_call_get_recvmsg);
-	spin_unlock(&rx->recvmsg_lock);
+	spin_unlock_irq(&rx->recvmsg_lock);

 	call_debug_id = call->debug_id;
 	trace_rxrpc_recvmsg(call_debug_id, rxrpc_recvmsg_dequeue, 0);
···

 error_requeue_call:
 	if (!(flags & MSG_PEEK)) {
-		spin_lock(&rx->recvmsg_lock);
+		spin_lock_irq(&rx->recvmsg_lock);
 		list_add(&call->recvmsg_link, &rx->recvmsg_q);
-		spin_unlock(&rx->recvmsg_lock);
+		spin_unlock_irq(&rx->recvmsg_lock);
 		trace_rxrpc_recvmsg(call_debug_id, rxrpc_recvmsg_requeue, 0);
 	} else {
 		rxrpc_put_call(call, rxrpc_call_put_recvmsg);
+58 -45
net/rxrpc/rtt.c
··· 12 12 #include "ar-internal.h" 13 13 14 14 #define RXRPC_RTO_MAX (120 * USEC_PER_SEC) 15 - #define RXRPC_TIMEOUT_INIT ((unsigned int)(1 * MSEC_PER_SEC)) /* RFC6298 2.1 initial RTO value */ 15 + #define RXRPC_TIMEOUT_INIT ((unsigned int)(1 * USEC_PER_SEC)) /* RFC6298 2.1 initial RTO value */ 16 16 #define rxrpc_jiffies32 ((u32)jiffies) /* As rxrpc_jiffies32 */ 17 17 18 - static u32 rxrpc_rto_min_us(struct rxrpc_peer *peer) 18 + static u32 rxrpc_rto_min_us(struct rxrpc_call *call) 19 19 { 20 20 return 200; 21 21 } 22 22 23 - static u32 __rxrpc_set_rto(const struct rxrpc_peer *peer) 23 + static u32 __rxrpc_set_rto(const struct rxrpc_call *call) 24 24 { 25 - return (peer->srtt_us >> 3) + peer->rttvar_us; 25 + return (call->srtt_us >> 3) + call->rttvar_us; 26 26 } 27 27 28 28 static u32 rxrpc_bound_rto(u32 rto) 29 29 { 30 - return min(rto, RXRPC_RTO_MAX); 30 + return clamp(200000, rto + 100000, RXRPC_RTO_MAX); 31 31 } 32 32 33 33 /* ··· 40 40 * To save cycles in the RFC 1323 implementation it was better to break 41 41 * it up into three procedures. -- erics 42 42 */ 43 - static void rxrpc_rtt_estimator(struct rxrpc_peer *peer, long sample_rtt_us) 43 + static void rxrpc_rtt_estimator(struct rxrpc_call *call, long sample_rtt_us) 44 44 { 45 45 long m = sample_rtt_us; /* RTT */ 46 - u32 srtt = peer->srtt_us; 46 + u32 srtt = call->srtt_us; 47 47 48 48 /* The following amusing code comes from Jacobson's 49 49 * article in SIGCOMM '88. Note that rtt and mdev ··· 66 66 srtt += m; /* rtt = 7/8 rtt + 1/8 new */ 67 67 if (m < 0) { 68 68 m = -m; /* m is now abs(error) */ 69 - m -= (peer->mdev_us >> 2); /* similar update on mdev */ 69 + m -= (call->mdev_us >> 2); /* similar update on mdev */ 70 70 /* This is similar to one of Eifel findings. 71 71 * Eifel blocks mdev updates when rtt decreases. 
72 72 * This solution is a bit different: we use finer gain ··· 78 78 if (m > 0) 79 79 m >>= 3; 80 80 } else { 81 - m -= (peer->mdev_us >> 2); /* similar update on mdev */ 81 + m -= (call->mdev_us >> 2); /* similar update on mdev */ 82 82 } 83 83 84 - peer->mdev_us += m; /* mdev = 3/4 mdev + 1/4 new */ 85 - if (peer->mdev_us > peer->mdev_max_us) { 86 - peer->mdev_max_us = peer->mdev_us; 87 - if (peer->mdev_max_us > peer->rttvar_us) 88 - peer->rttvar_us = peer->mdev_max_us; 84 + call->mdev_us += m; /* mdev = 3/4 mdev + 1/4 new */ 85 + if (call->mdev_us > call->mdev_max_us) { 86 + call->mdev_max_us = call->mdev_us; 87 + if (call->mdev_max_us > call->rttvar_us) 88 + call->rttvar_us = call->mdev_max_us; 89 89 } 90 90 } else { 91 91 /* no previous measure. */ 92 92 srtt = m << 3; /* take the measured time to be rtt */ 93 - peer->mdev_us = m << 1; /* make sure rto = 3*rtt */ 94 - peer->rttvar_us = max(peer->mdev_us, rxrpc_rto_min_us(peer)); 95 - peer->mdev_max_us = peer->rttvar_us; 93 + call->mdev_us = m << 1; /* make sure rto = 3*rtt */ 94 + call->rttvar_us = umax(call->mdev_us, rxrpc_rto_min_us(call)); 95 + call->mdev_max_us = call->rttvar_us; 96 96 } 97 97 98 - peer->srtt_us = max(1U, srtt); 98 + call->srtt_us = umax(srtt, 1); 99 99 } 100 100 101 101 /* 102 102 * Calculate rto without backoff. This is the second half of Van Jacobson's 103 103 * routine referred to above. 104 104 */ 105 - static void rxrpc_set_rto(struct rxrpc_peer *peer) 105 + static void rxrpc_set_rto(struct rxrpc_call *call) 106 106 { 107 107 u32 rto; 108 108 ··· 113 113 * is invisible. Actually, Linux-2.4 also generates erratic 114 114 * ACKs in some circumstances. 115 115 */ 116 - rto = __rxrpc_set_rto(peer); 116 + rto = __rxrpc_set_rto(call); 117 117 118 118 /* 2. Fixups made earlier cannot be right. 119 119 * If we do not estimate RTO correctly without them, ··· 124 124 /* NOTE: clamping at RXRPC_RTO_MIN is not required, current algo 125 125 * guarantees that rto is higher. 
126 126 */ 127 - peer->rto_us = rxrpc_bound_rto(rto); 127 + call->rto_us = rxrpc_bound_rto(rto); 128 128 } 129 129 130 - static void rxrpc_ack_update_rtt(struct rxrpc_peer *peer, long rtt_us) 130 + static void rxrpc_update_rtt_min(struct rxrpc_call *call, ktime_t resp_time, long rtt_us) 131 + { 132 + /* Window size 5mins in approx usec (ipv4.sysctl_tcp_min_rtt_wlen) */ 133 + u32 wlen_us = 5ULL * NSEC_PER_SEC / 1024; 134 + 135 + minmax_running_min(&call->min_rtt, wlen_us, resp_time / 1024, 136 + (u32)rtt_us ? : jiffies_to_usecs(1)); 137 + } 138 + 139 + static void rxrpc_ack_update_rtt(struct rxrpc_call *call, ktime_t resp_time, long rtt_us) 131 140 { 132 141 if (rtt_us < 0) 133 142 return; 134 143 135 - //rxrpc_update_rtt_min(peer, rtt_us); 136 - rxrpc_rtt_estimator(peer, rtt_us); 137 - rxrpc_set_rto(peer); 144 + /* Update RACK min RTT [RFC8985 6.1 Step 1]. */ 145 + rxrpc_update_rtt_min(call, resp_time, rtt_us); 138 146 139 - /* RFC6298: only reset backoff on valid RTT measurement. */ 140 - peer->backoff = 0; 147 + rxrpc_rtt_estimator(call, rtt_us); 148 + rxrpc_set_rto(call); 149 + 150 + /* Only reset backoff on valid RTT measurement [RFC6298]. */ 151 + call->backoff = 0; 141 152 } 142 153 143 154 /* 144 155 * Add RTT information to cache. This is called in softirq mode and has 145 - * exclusive access to the peer RTT data. 156 + * exclusive access to the call RTT data. 
146 157 */ 147 - void rxrpc_peer_add_rtt(struct rxrpc_call *call, enum rxrpc_rtt_rx_trace why, 158 + void rxrpc_call_add_rtt(struct rxrpc_call *call, enum rxrpc_rtt_rx_trace why, 148 159 int rtt_slot, 149 160 rxrpc_serial_t send_serial, rxrpc_serial_t resp_serial, 150 161 ktime_t send_time, ktime_t resp_time) 151 162 { 152 - struct rxrpc_peer *peer = call->peer; 153 163 s64 rtt_us; 154 164 155 165 rtt_us = ktime_to_us(ktime_sub(resp_time, send_time)); 156 166 if (rtt_us < 0) 157 167 return; 158 168 159 - spin_lock(&peer->rtt_input_lock); 160 - rxrpc_ack_update_rtt(peer, rtt_us); 161 - if (peer->rtt_count < 3) 162 - peer->rtt_count++; 163 - spin_unlock(&peer->rtt_input_lock); 169 + rxrpc_ack_update_rtt(call, resp_time, rtt_us); 170 + if (call->rtt_count < 3) 171 + call->rtt_count++; 172 + call->rtt_taken++; 173 + 174 + WRITE_ONCE(call->peer->recent_srtt_us, call->srtt_us / 8); 175 + WRITE_ONCE(call->peer->recent_rto_us, call->rto_us); 164 176 165 177 trace_rxrpc_rtt_rx(call, why, rtt_slot, send_serial, resp_serial, 166 - peer->srtt_us >> 3, peer->rto_us); 178 + rtt_us, call->srtt_us, call->rto_us); 167 179 } 168 180 169 181 /* 170 182 * Get the retransmission timeout to set in nanoseconds, backing it off each 171 183 * time we retransmit. 
172 184 */ 173 - ktime_t rxrpc_get_rto_backoff(struct rxrpc_peer *peer, bool retrans) 185 + ktime_t rxrpc_get_rto_backoff(struct rxrpc_call *call, bool retrans) 174 186 { 175 187 u64 timo_us; 176 - u32 backoff = READ_ONCE(peer->backoff); 188 + u32 backoff = READ_ONCE(call->backoff); 177 189 178 - timo_us = peer->rto_us; 190 + timo_us = call->rto_us; 179 191 timo_us <<= backoff; 180 192 if (retrans && timo_us * 2 <= RXRPC_RTO_MAX) 181 - WRITE_ONCE(peer->backoff, backoff + 1); 193 + WRITE_ONCE(call->backoff, backoff + 1); 182 194 183 195 if (timo_us < 1) 184 196 timo_us = 1; ··· 198 186 return ns_to_ktime(timo_us * NSEC_PER_USEC); 199 187 } 200 188 201 - void rxrpc_peer_init_rtt(struct rxrpc_peer *peer) 189 + void rxrpc_call_init_rtt(struct rxrpc_call *call) 202 190 { 203 - peer->rto_us = RXRPC_TIMEOUT_INIT; 204 - peer->mdev_us = RXRPC_TIMEOUT_INIT; 205 - peer->backoff = 0; 206 - //minmax_reset(&peer->rtt_min, rxrpc_jiffies32, ~0U); 191 + call->rtt_last_req = KTIME_MIN; 192 + call->rto_us = RXRPC_TIMEOUT_INIT; 193 + call->mdev_us = RXRPC_TIMEOUT_INIT; 194 + call->backoff = 0; 195 + //minmax_reset(&call->rtt_min, rxrpc_jiffies32, ~0U); 207 196 }
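The backoff behaviour of rxrpc_get_rto_backoff() after its move from the peer to the call can be illustrated with a small userspace model. This is a sketch under stated assumptions rather than the kernel function: struct rtt_state, rto_backoff_us() and timeout_after() are hypothetical names, the result is left in microseconds instead of being converted to a ktime_t, and the READ_ONCE/WRITE_ONCE annotations are dropped because the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RTO_MAX_US (120ULL * 1000000ULL)	/* Mirrors RXRPC_RTO_MAX */

/* Hypothetical stand-in for the per-call RTT fields used by the patch. */
struct rtt_state {
	uint32_t rto_us;
	uint32_t backoff;
};

/* Return the timeout to use now; bump the backoff if this is a retransmit. */
static uint64_t rto_backoff_us(struct rtt_state *s, bool retrans)
{
	uint64_t timo_us;

	assert(s->backoff < 32);	/* Keep the shift well defined. */
	timo_us = (uint64_t)s->rto_us << s->backoff;
	if (retrans && timo_us * 2 <= RTO_MAX_US)
		s->backoff++;
	if (timo_us < 1)
		timo_us = 1;
	return timo_us;
}

/* Timeout that would be used after nr_retrans consecutive retransmissions. */
static uint64_t timeout_after(uint32_t rto_us, unsigned int nr_retrans)
{
	struct rtt_state s = { .rto_us = rto_us, .backoff = 0 };

	for (unsigned int i = 0; i < nr_retrans; i++)
		rto_backoff_us(&s, true);
	return rto_backoff_us(&s, false);
}
```

Starting from the 1 s initial RTO, the timeout doubles on each retransmission and stops growing at 64 s, because the backoff is only incremented while twice the current timeout still fits within the 120 s maximum.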
+38 -21
net/rxrpc/rxkad.c
··· 148 148 static struct rxrpc_txbuf *rxkad_alloc_txbuf(struct rxrpc_call *call, size_t remain, gfp_t gfp) 149 149 { 150 150 struct rxrpc_txbuf *txb; 151 - size_t shdr, space; 151 + size_t shdr, alloc, limit, part; 152 152 153 - remain = min(remain, 65535 - sizeof(struct rxrpc_wire_header)); 153 + remain = umin(remain, 65535 - sizeof(struct rxrpc_wire_header)); 154 154 155 155 switch (call->conn->security_level) { 156 156 default: 157 - space = min_t(size_t, remain, RXRPC_JUMBO_DATALEN); 158 - return rxrpc_alloc_data_txbuf(call, space, 1, gfp); 157 + alloc = umin(remain, RXRPC_JUMBO_DATALEN); 158 + return rxrpc_alloc_data_txbuf(call, alloc, 1, gfp); 159 159 case RXRPC_SECURITY_AUTH: 160 160 shdr = sizeof(struct rxkad_level1_hdr); 161 161 break; ··· 164 164 break; 165 165 } 166 166 167 - space = min_t(size_t, round_down(RXRPC_JUMBO_DATALEN, RXKAD_ALIGN), remain + shdr); 168 - space = round_up(space, RXKAD_ALIGN); 167 + limit = round_down(RXRPC_JUMBO_DATALEN, RXKAD_ALIGN) - shdr; 168 + if (remain < limit) { 169 + part = remain; 170 + alloc = round_up(shdr + part, RXKAD_ALIGN); 171 + } else { 172 + part = limit; 173 + alloc = RXRPC_JUMBO_DATALEN; 174 + } 169 175 170 - txb = rxrpc_alloc_data_txbuf(call, space, RXKAD_ALIGN, gfp); 176 + txb = rxrpc_alloc_data_txbuf(call, alloc, RXKAD_ALIGN, gfp); 171 177 if (!txb) 172 178 return NULL; 173 179 174 180 txb->offset += shdr; 175 - txb->space -= shdr; 181 + txb->space = part; 176 182 return txb; 177 183 } 178 184 ··· 269 263 check = txb->seq ^ call->call_id; 270 264 hdr->data_size = htonl((u32)check << 16 | txb->len); 271 265 272 - txb->len += sizeof(struct rxkad_level1_hdr); 273 - pad = txb->len; 266 + txb->pkt_len = sizeof(struct rxkad_level1_hdr) + txb->len; 267 + pad = txb->pkt_len; 274 268 pad = RXKAD_ALIGN - pad; 275 269 pad &= RXKAD_ALIGN - 1; 276 270 if (pad) { 277 271 memset(txb->kvec[0].iov_base + txb->offset, 0, pad); 278 - txb->len += pad; 272 + txb->pkt_len += pad; 279 273 } 280 274 281 275 /* start the 
encryption afresh */ ··· 304 298 struct rxkad_level2_hdr *rxkhdr = (void *)(whdr + 1); 305 299 struct rxrpc_crypt iv; 306 300 struct scatterlist sg; 307 - size_t pad; 301 + size_t content, pad; 308 302 u16 check; 309 303 int ret; 310 304 ··· 315 309 rxkhdr->data_size = htonl(txb->len | (u32)check << 16); 316 310 rxkhdr->checksum = 0; 317 311 318 - txb->len += sizeof(struct rxkad_level2_hdr); 319 - pad = txb->len; 320 - pad = RXKAD_ALIGN - pad; 321 - pad &= RXKAD_ALIGN - 1; 322 - if (pad) { 312 + content = sizeof(struct rxkad_level2_hdr) + txb->len; 313 + txb->pkt_len = round_up(content, RXKAD_ALIGN); 314 + pad = txb->pkt_len - content; 315 + if (pad) 323 316 memset(txb->kvec[0].iov_base + txb->offset, 0, pad); 324 - txb->len += pad; 325 - } 326 317 327 318 /* encrypt from the session key */ 328 319 token = call->conn->key->payload.data[0]; 329 320 memcpy(&iv, token->kad->session_key, sizeof(iv)); 330 321 331 - sg_init_one(&sg, rxkhdr, txb->len); 322 + sg_init_one(&sg, rxkhdr, txb->pkt_len); 332 323 skcipher_request_set_sync_tfm(req, call->conn->rxkad.cipher); 333 324 skcipher_request_set_callback(req, 0, NULL, NULL); 334 - skcipher_request_set_crypt(req, &sg, &sg, txb->len, iv.x); 325 + skcipher_request_set_crypt(req, &sg, &sg, txb->pkt_len, iv.x); 335 326 ret = crypto_skcipher_encrypt(req); 336 327 skcipher_request_zero(req); 337 328 return ret; ··· 387 384 388 385 switch (call->conn->security_level) { 389 386 case RXRPC_SECURITY_PLAIN: 387 + txb->pkt_len = txb->len; 390 388 ret = 0; 391 389 break; 392 390 case RXRPC_SECURITY_AUTH: 393 391 ret = rxkad_secure_packet_auth(call, txb, req); 392 + if (txb->alloc_size == RXRPC_JUMBO_DATALEN) 393 + txb->jumboable = true; 394 394 break; 395 395 case RXRPC_SECURITY_ENCRYPT: 396 396 ret = rxkad_secure_packet_encrypt(call, txb, req); 397 + if (txb->alloc_size == RXRPC_JUMBO_DATALEN) 398 + txb->jumboable = true; 397 399 break; 398 400 default: 399 401 ret = -EPERM; 400 402 break; 403 + } 404 + 405 + /* Clear excess space in 
the packet */ 406 + if (txb->pkt_len < txb->alloc_size) { 407 + struct rxrpc_wire_header *whdr = txb->kvec[0].iov_base; 408 + size_t gap = txb->alloc_size - txb->pkt_len; 409 + void *p = whdr + 1; 410 + 411 + memset(p + txb->pkt_len, 0, gap); 401 412 } 402 413 403 414 skcipher_request_free(req);
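The new sizing logic in rxkad_alloc_txbuf() separates how much user data a buffer may carry (part) from how much is allocated (alloc), so that a full buffer is always exactly RXRPC_JUMBO_DATALEN and can be glued into a jumbo packet. A rough standalone sketch of that calculation follows. plan_txbuf() and struct txbuf_plan are hypothetical names, and the alignment of 8 is an assumption about RXKAD_ALIGN, as are the security header sizes used in any call (the diff only refers to them via sizeof).

```c
#include <assert.h>
#include <stddef.h>

#define JUMBO_DATALEN	1412
#define ALIGN_TO	8	/* Assumed value of RXKAD_ALIGN */

/* Hypothetical result type: data capacity and allocation size. */
struct txbuf_plan {
	size_t part;	/* User data the buffer may hold (txb->space) */
	size_t alloc;	/* Bytes to allocate for header + data + padding */
};

static size_t round_down_to(size_t v, size_t a)
{
	return v - v % a;
}

static size_t round_up_to(size_t v, size_t a)
{
	return round_down_to(v + a - 1, a);
}

/* Mirror of the AUTH/ENCRYPT branch: shdr is the security header size. */
static struct txbuf_plan plan_txbuf(size_t remain, size_t shdr)
{
	struct txbuf_plan p;
	size_t limit;

	assert(shdr < round_down_to(JUMBO_DATALEN, ALIGN_TO));
	limit = round_down_to(JUMBO_DATALEN, ALIGN_TO) - shdr;
	if (remain < limit) {
		/* Short final buffer: pad header + data to the alignment. */
		p.part = remain;
		p.alloc = round_up_to(shdr + p.part, ALIGN_TO);
	} else {
		/* Full buffer: always the jumbo-capable size. */
		p.part = limit;
		p.alloc = JUMBO_DATALEN;
	}
	return p;
}
```

With an 8-byte security header, 100 bytes of remaining data give a 112-byte allocation, while anything of 1400 bytes or more gives a 1412-byte buffer carrying 1400 bytes of data, which is the case the patch later marks as jumboable.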
+1 -1
net/rxrpc/rxperf.c
···
 			    reply_len + sizeof(rxperf_magic_cookie));

 	while (reply_len > 0) {
-		len = min_t(size_t, reply_len, PAGE_SIZE);
+		len = umin(reply_len, PAGE_SIZE);
 		bvec_set_page(&bv, ZERO_PAGE(0), len, 0);
 		iov_iter_bvec(&msg.msg_iter, WRITE, &bv, 1, len);
 		msg.msg_flags = MSG_MORE;
+2 -2
net/rxrpc/security.c
···
 	if (conn->state == RXRPC_CONN_CLIENT_UNSECURED) {
 		ret = conn->security->init_connection_security(conn, token);
 		if (ret == 0) {
-			spin_lock(&conn->state_lock);
+			spin_lock_irq(&conn->state_lock);
 			if (conn->state == RXRPC_CONN_CLIENT_UNSECURED)
 				conn->state = RXRPC_CONN_CLIENT;
-			spin_unlock(&conn->state_lock);
+			spin_unlock_irq(&conn->state_lock);
 		}
 	}
 	mutex_unlock(&conn->security_lock);
+68 -24
net/rxrpc/sendmsg.c
··· 94 94 */ 95 95 static bool rxrpc_check_tx_space(struct rxrpc_call *call, rxrpc_seq_t *_tx_win) 96 96 { 97 + rxrpc_seq_t tx_bottom = READ_ONCE(call->tx_bottom); 98 + 97 99 if (_tx_win) 98 - *_tx_win = call->tx_bottom; 99 - return call->tx_prepared - call->tx_bottom < 256; 100 + *_tx_win = tx_bottom; 101 + return call->send_top - tx_bottom < 256; 100 102 } 101 103 102 104 /* ··· 134 132 rxrpc_seq_t tx_start, tx_win; 135 133 signed long rtt, timeout; 136 134 137 - rtt = READ_ONCE(call->peer->srtt_us) >> 3; 135 + rtt = READ_ONCE(call->srtt_us) >> 3; 138 136 rtt = usecs_to_jiffies(rtt) * 2; 139 137 if (rtt < 2) 140 138 rtt = 2; 141 139 142 140 timeout = rtt; 143 - tx_start = smp_load_acquire(&call->acks_hard_ack); 141 + tx_start = READ_ONCE(call->tx_bottom); 144 142 145 143 for (;;) { 146 144 set_current_state(TASK_UNINTERRUPTIBLE); ··· 197 195 DECLARE_WAITQUEUE(myself, current); 198 196 int ret; 199 197 200 - _enter(",{%u,%u,%u,%u}", 201 - call->tx_bottom, call->acks_hard_ack, call->tx_top, call->tx_winsize); 198 + _enter(",{%u,%u,%u}", 199 + call->tx_bottom, call->tx_top, call->tx_winsize); 202 200 203 201 add_wait_queue(&call->waitq, &myself); 204 202 ··· 242 240 struct rxrpc_txbuf *txb, 243 241 rxrpc_notify_end_tx_t notify_end_tx) 244 242 { 243 + struct rxrpc_txqueue *sq = call->send_queue; 245 244 rxrpc_seq_t seq = txb->seq; 246 245 bool poke, last = txb->flags & RXRPC_LAST_PACKET; 247 - 246 + int ix = seq & RXRPC_TXQ_MASK; 248 247 rxrpc_inc_stat(call->rxnet, stat_tx_data); 249 248 250 - ASSERTCMP(txb->seq, ==, call->tx_prepared + 1); 251 - 252 - /* We have to set the timestamp before queueing as the retransmit 253 - * algorithm can see the packet as soon as we queue it. 
254 - */ 255 - txb->last_sent = ktime_get_real(); 249 + ASSERTCMP(txb->seq, ==, call->send_top + 1); 256 250 257 251 if (last) 258 252 trace_rxrpc_txqueue(call, rxrpc_txqueue_queue_last); 259 253 else 260 254 trace_rxrpc_txqueue(call, rxrpc_txqueue_queue); 261 255 256 + if (WARN_ON_ONCE(sq->bufs[ix])) 257 + trace_rxrpc_tq(call, sq, seq, rxrpc_tq_queue_dup); 258 + else 259 + trace_rxrpc_tq(call, sq, seq, rxrpc_tq_queue); 260 + 262 261 /* Add the packet to the call's output buffer */ 263 - spin_lock(&call->tx_lock); 264 - poke = list_empty(&call->tx_sendmsg); 265 - list_add_tail(&txb->call_link, &call->tx_sendmsg); 266 - call->tx_prepared = seq; 267 - if (last) 262 + poke = (READ_ONCE(call->tx_bottom) == call->send_top); 263 + sq->bufs[ix] = txb; 264 + /* Order send_top after the queue->next pointer and txb content. */ 265 + smp_store_release(&call->send_top, seq); 266 + if (last) { 268 267 rxrpc_notify_end_tx(rx, call, notify_end_tx); 269 - spin_unlock(&call->tx_lock); 268 + call->send_queue = NULL; 269 + } 270 270 271 271 if (poke) 272 272 rxrpc_poke_call(call, rxrpc_call_poke_start); 273 + } 274 + 275 + /* 276 + * Allocate a new txqueue unit and add it to the transmission queue. 277 + */ 278 + static int rxrpc_alloc_txqueue(struct sock *sk, struct rxrpc_call *call) 279 + { 280 + struct rxrpc_txqueue *tq; 281 + 282 + tq = kzalloc(sizeof(*tq), sk->sk_allocation); 283 + if (!tq) 284 + return -ENOMEM; 285 + 286 + tq->xmit_ts_base = KTIME_MIN; 287 + for (int i = 0; i < RXRPC_NR_TXQUEUE; i++) 288 + tq->segment_xmit_ts[i] = UINT_MAX; 289 + 290 + if (call->send_queue) { 291 + tq->qbase = call->send_top + 1; 292 + call->send_queue->next = tq; 293 + call->send_queue = tq; 294 + } else if (WARN_ON(call->tx_queue)) { 295 + kfree(tq); 296 + return -ENOMEM; 297 + } else { 298 + /* We start at seq 1, so pretend seq 0 is hard-acked. 
*/ 299 + tq->nr_reported_acks = 1; 300 + tq->segment_acked = 1UL; 301 + tq->qbase = 0; 302 + call->tx_qbase = 0; 303 + call->send_queue = tq; 304 + call->tx_qtail = tq; 305 + call->tx_queue = tq; 306 + } 307 + 308 + trace_rxrpc_tq(call, tq, call->send_top, rxrpc_tq_alloc); 309 + return 0; 273 310 } 274 311 275 312 /* ··· 385 344 if (!rxrpc_check_tx_space(call, NULL)) 386 345 goto wait_for_space; 387 346 347 + /* See if we need to begin/extend the Tx queue. */ 348 + if (!call->send_queue || !((call->send_top + 1) & RXRPC_TXQ_MASK)) { 349 + ret = rxrpc_alloc_txqueue(sk, call); 350 + if (ret < 0) 351 + goto maybe_error; 352 + } 353 + 388 354 /* Work out the maximum size of a packet. Assume that 389 355 * the security header is going to be in the padded 390 356 * region (enc blocksize), but the trailer is not. ··· 408 360 409 361 /* append next segment of data to the current buffer */ 410 362 if (msg_data_left(msg) > 0) { 411 - size_t copy = min_t(size_t, txb->space, msg_data_left(msg)); 363 + size_t copy = umin(txb->space, msg_data_left(msg)); 412 364 413 365 _debug("add %zu", copy); 414 366 if (!copy_from_iter_full(txb->kvec[0].iov_base + txb->offset, ··· 433 385 (msg_data_left(msg) == 0 && !more)) { 434 386 if (msg_data_left(msg) == 0 && !more) 435 387 txb->flags |= RXRPC_LAST_PACKET; 436 - else if (call->tx_top - call->acks_hard_ack < 437 - call->tx_winsize) 438 - txb->flags |= RXRPC_MORE_PACKETS; 439 388 440 389 ret = call->security->secure_packet(call, txb); 441 390 if (ret < 0) 442 391 goto out; 443 392 444 393 txb->kvec[0].iov_len += txb->len; 445 - txb->len = txb->kvec[0].iov_len; 446 394 rxrpc_queue_packet(rx, call, txb, notify_end_tx); 447 395 txb = NULL; 448 396 }
+4 -2
net/rxrpc/sysctl.c
···
 #include "ar-internal.h"

 static struct ctl_table_header *rxrpc_sysctl_reg_table;
+static const unsigned int rxrpc_rx_mtu_min = 500;
+static const unsigned int rxrpc_jumbo_max = RXRPC_MAX_NR_JUMBO;
 static const unsigned int four = 4;
 static const unsigned int max_backlog = RXRPC_BACKLOG_MAX - 1;
 static const unsigned int n_65535 = 65535;
···
 		.maxlen		= sizeof(unsigned int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
-		.extra1		= (void *)SYSCTL_ONE,
+		.extra1		= (void *)&rxrpc_rx_mtu_min,
 		.extra2		= (void *)&n_65535,
 	},
 	{
···
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= (void *)SYSCTL_ONE,
-		.extra2		= (void *)&four,
+		.extra2		= (void *)&rxrpc_jumbo_max,
 	},
 };

+5 -122
net/rxrpc/txbuf.c
··· 24 24 size_t total, hoff; 25 25 void *buf; 26 26 27 - txb = kmalloc(sizeof(*txb), gfp); 27 + txb = kzalloc(sizeof(*txb), gfp); 28 28 if (!txb) 29 29 return NULL; 30 30 ··· 43 43 44 44 whdr = buf + hoff; 45 45 46 - INIT_LIST_HEAD(&txb->call_link); 47 - INIT_LIST_HEAD(&txb->tx_link); 48 46 refcount_set(&txb->ref, 1); 49 - txb->last_sent = KTIME_MIN; 50 47 txb->call_debug_id = call->debug_id; 51 48 txb->debug_id = atomic_inc_return(&rxrpc_txbuf_debug_ids); 49 + txb->alloc_size = data_size; 52 50 txb->space = data_size; 53 - txb->len = 0; 54 51 txb->offset = sizeof(*whdr); 55 52 txb->flags = call->conn->out_clientflag; 56 - txb->ack_why = 0; 57 - txb->seq = call->tx_prepared + 1; 58 - txb->serial = 0; 59 - txb->cksum = 0; 53 + txb->seq = call->send_top + 1; 60 54 txb->nr_kvec = 1; 61 55 txb->kvec[0].iov_base = whdr; 62 56 txb->kvec[0].iov_len = sizeof(*whdr); ··· 69 75 trace_rxrpc_txbuf(txb->debug_id, txb->call_debug_id, txb->seq, 1, 70 76 rxrpc_txbuf_alloc_data); 71 77 72 - atomic_inc(&rxrpc_nr_txbuf); 73 - return txb; 74 - } 75 - 76 - /* 77 - * Allocate and partially initialise an ACK packet. 78 - */ 79 - struct rxrpc_txbuf *rxrpc_alloc_ack_txbuf(struct rxrpc_call *call, size_t sack_size) 80 - { 81 - struct rxrpc_wire_header *whdr; 82 - struct rxrpc_acktrailer *trailer; 83 - struct rxrpc_ackpacket *ack; 84 - struct rxrpc_txbuf *txb; 85 - gfp_t gfp = rcu_read_lock_held() ? 
GFP_ATOMIC | __GFP_NOWARN : GFP_NOFS; 86 - void *buf, *buf2 = NULL; 87 - u8 *filler; 88 - 89 - txb = kmalloc(sizeof(*txb), gfp); 90 - if (!txb) 91 - return NULL; 92 - 93 - buf = page_frag_alloc(&call->local->tx_alloc, 94 - sizeof(*whdr) + sizeof(*ack) + 1 + 3 + sizeof(*trailer), gfp); 95 - if (!buf) { 96 - kfree(txb); 97 - return NULL; 98 - } 99 - 100 - if (sack_size) { 101 - buf2 = page_frag_alloc(&call->local->tx_alloc, sack_size, gfp); 102 - if (!buf2) { 103 - page_frag_free(buf); 104 - kfree(txb); 105 - return NULL; 106 - } 107 - } 108 - 109 - whdr = buf; 110 - ack = buf + sizeof(*whdr); 111 - filler = buf + sizeof(*whdr) + sizeof(*ack) + 1; 112 - trailer = buf + sizeof(*whdr) + sizeof(*ack) + 1 + 3; 113 - 114 - INIT_LIST_HEAD(&txb->call_link); 115 - INIT_LIST_HEAD(&txb->tx_link); 116 - refcount_set(&txb->ref, 1); 117 - txb->call_debug_id = call->debug_id; 118 - txb->debug_id = atomic_inc_return(&rxrpc_txbuf_debug_ids); 119 - txb->space = 0; 120 - txb->len = sizeof(*whdr) + sizeof(*ack) + 3 + sizeof(*trailer); 121 - txb->offset = 0; 122 - txb->flags = call->conn->out_clientflag; 123 - txb->ack_rwind = 0; 124 - txb->seq = 0; 125 - txb->serial = 0; 126 - txb->cksum = 0; 127 - txb->nr_kvec = 3; 128 - txb->kvec[0].iov_base = whdr; 129 - txb->kvec[0].iov_len = sizeof(*whdr) + sizeof(*ack); 130 - txb->kvec[1].iov_base = buf2; 131 - txb->kvec[1].iov_len = sack_size; 132 - txb->kvec[2].iov_base = filler; 133 - txb->kvec[2].iov_len = 3 + sizeof(*trailer); 134 - 135 - whdr->epoch = htonl(call->conn->proto.epoch); 136 - whdr->cid = htonl(call->cid); 137 - whdr->callNumber = htonl(call->call_id); 138 - whdr->seq = 0; 139 - whdr->type = RXRPC_PACKET_TYPE_ACK; 140 - whdr->flags = 0; 141 - whdr->userStatus = 0; 142 - whdr->securityIndex = call->security_ix; 143 - whdr->_rsvd = 0; 144 - whdr->serviceId = htons(call->dest_srx.srx_service); 145 - 146 - get_page(virt_to_head_page(trailer)); 147 - 148 - trace_rxrpc_txbuf(txb->debug_id, txb->call_debug_id, txb->seq, 1, 149 - 
rxrpc_txbuf_alloc_ack); 150 78 atomic_inc(&rxrpc_nr_txbuf); 151 79 return txb; 152 80 } ··· 95 179 trace_rxrpc_txbuf(txb->debug_id, txb->call_debug_id, txb->seq, 0, 96 180 rxrpc_txbuf_free); 97 181 for (i = 0; i < txb->nr_kvec; i++) 98 - if (txb->kvec[i].iov_base) 182 + if (txb->kvec[i].iov_base && 183 + !is_zero_pfn(page_to_pfn(virt_to_page(txb->kvec[i].iov_base)))) 99 184 page_frag_free(txb->kvec[i].iov_base); 100 185 kfree(txb); 101 186 atomic_dec(&rxrpc_nr_txbuf); ··· 118 201 if (dead) 119 202 rxrpc_free_txbuf(txb); 120 203 } 121 - } 122 - 123 - /* 124 - * Shrink the transmit buffer. 125 - */ 126 - void rxrpc_shrink_call_tx_buffer(struct rxrpc_call *call) 127 - { 128 - struct rxrpc_txbuf *txb; 129 - rxrpc_seq_t hard_ack = smp_load_acquire(&call->acks_hard_ack); 130 - bool wake = false; 131 - 132 - _enter("%x/%x/%x", call->tx_bottom, call->acks_hard_ack, call->tx_top); 133 - 134 - while ((txb = list_first_entry_or_null(&call->tx_buffer, 135 - struct rxrpc_txbuf, call_link))) { 136 - hard_ack = smp_load_acquire(&call->acks_hard_ack); 137 - if (before(hard_ack, txb->seq)) 138 - break; 139 - 140 - if (txb->seq != call->tx_bottom + 1) 141 - rxrpc_see_txbuf(txb, rxrpc_txbuf_see_out_of_step); 142 - ASSERTCMP(txb->seq, ==, call->tx_bottom + 1); 143 - smp_store_release(&call->tx_bottom, call->tx_bottom + 1); 144 - list_del_rcu(&txb->call_link); 145 - 146 - trace_rxrpc_txqueue(call, rxrpc_txqueue_dequeue); 147 - 148 - rxrpc_put_txbuf(txb, rxrpc_txbuf_put_rotated); 149 - if (after(call->acks_hard_ack, call->tx_bottom + 128)) 150 - wake = true; 151 - } 152 - 153 - if (wake) 154 - wake_up(&call->waitq); 155 204 }