···
+			LC-trie implementation notes.
+
+Node types
+----------
+leaf
+	An end node with data. This has a copy of the relevant key, along
+	with an 'hlist' of routing table entries sorted by prefix length.
+	See struct leaf and struct leaf_info.
+
+trie node or tnode
+	An internal node, holding an array of child (leaf or tnode) pointers,
+	indexed through a subset of the key. See Level Compression.
+
+A few concepts explained
+------------------------
+Bits (tnode)
+	The number of bits in the key segment used for indexing into the
+	child array - the "child index". See Level Compression.
+
+Pos (tnode)
+	The position (in the key) of the key segment used for indexing into
+	the child array. See Path Compression.
+
+Path Compression / skipped bits
+	Any given tnode is linked to from the child array of its parent, using
+	a segment of the key specified by the parent's "pos" and "bits".
+	In certain cases, this tnode's own "pos" will not be immediately
+	adjacent to the parent (pos+bits), but there will be some bits
+	in the key skipped over because they represent a single path with no
+	deviations. These "skipped bits" constitute Path Compression.
+	Note that the search algorithm will simply skip over these bits when
+	searching, making it necessary to save the keys in the leaves to
+	verify that they actually do match the key we are searching for.
+
+Level Compression / child arrays
+	The trie is kept level balanced by moving, under certain conditions,
+	the children of a full child (see "full_children") up one level, so
+	that instead of a pure binary tree, each internal node ("tnode") may
+	contain an arbitrarily large array of links to several children.
+	Conversely, a tnode with a mostly empty child array (see empty_children)
+	may be "halved", having some of its children moved downwards one level,
+	in order to avoid ever-increasing child arrays.
+
+empty_children
+	The number of positions in the child array of a given tnode that are
+	NULL.
+
+full_children
+	The number of children of a given tnode that aren't path compressed.
+	(In other words, they aren't NULL or leaves and their "pos" is equal
+	to this tnode's "pos"+"bits").
+
+	(The word "full" here is used more in the sense of "complete" than
+	as the opposite of "empty", which might be a tad confusing.)
+
+Comments
+---------
+
+We have tried to keep the structure of the code as close to fib_hash as
+possible to allow verification and help with reviewing.
+
+fib_find_node()
+	A good start for understanding this code. This function implements a
+	straightforward trie lookup.
+
+fib_insert_node()
+	Inserts a new leaf node in the trie. This is a bit more complicated
+	than fib_find_node(). Inserting a new node means we might have to run
+	the level compression algorithm on part of the trie.
+
+trie_leaf_remove()
+	Looks up a key, deletes it and runs the level compression algorithm.
+
+trie_rebalance()
+	The key function for the dynamic trie. After any change in the trie
+	it is run to optimize and reorganize. It will walk the trie upwards
+	towards the root from a given tnode, doing a resize() at each step
+	to implement level compression.
+
+resize()
+	Analyzes a tnode and optimizes the child array size by either inflating
+	or shrinking it repeatedly until it fulfills the criteria for optimal
+	level compression. This part follows the original paper pretty closely
+	and there may be some room for experimentation here.
+
+inflate()
+	Doubles the size of the child array within a tnode. Used by resize().
+
+halve()
+	Halves the size of the child array within a tnode - the inverse of
+	inflate(). Used by resize().
+
+fn_trie_insert(), fn_trie_delete(), fn_trie_select_default()
+	The route manipulation functions. Should conform pretty closely to the
+	corresponding functions in fib_hash.
+
+fn_trie_flush()
+	This walks the full trie (using nextleaf()) and searches for empty
+	leaves which have to be removed.
+
+fn_trie_dump()
+	Dumps the routing table ordered by prefix length. This is somewhat
+	slower than the corresponding fib_hash function, as we have to walk the
+	entire trie for each prefix length. In comparison, fib_hash is organized
+	as one "zone"/hash per prefix length.
+
+Locking
+-------
+
+fib_lock is used as an RW-lock in the same way that this is done in fib_hash.
+However, the functions are somewhat separated for other possible locking
+scenarios. It might conceivably be possible to run trie_rebalance via RCU
+to avoid read_lock in the fn_trie_lookup() function.
+
+Main lookup mechanism
+---------------------
+fn_trie_lookup() is the main lookup function.
+
+The lookup is in its simplest form just like fib_find_node(). We descend the
+trie, key segment by key segment, until we find a leaf. check_leaf() does
+the fib_semantic_match in the leaf's sorted prefix hlist.
+
+If we find a match, we are done.
+
+If we don't find a match, we enter prefix matching mode. The prefix length,
+starting out at the same as the key length, is reduced one step at a time,
+and we backtrack upwards through the trie trying to find a longest matching
+prefix. The goal is always to reach a leaf and get a positive result from the
+fib_semantic_match mechanism.
+
+Inside each tnode, the search for longest matching prefix consists of searching
+through the child array, chopping off (zeroing) the least significant "1" of
+the child index until we find a match or the child index consists of nothing but
+zeros.
+
+At this point we backtrack (t->stats.backtrack++) up the trie, continuing to
+chop off part of the key in order to find the longest matching prefix.
+
+At this point we will repeatedly descend subtries to look for a match, and there
+are some optimizations available that can provide us with "shortcuts" to avoid
+descending into dead ends. Look for "HL_OPTIMIZE" sections in the code.
+
+To alleviate any doubts about the correctness of the route selection process,
+a new netlink operation has been added. Look for NETLINK_FIB_LOOKUP, which
+gives userland access to fib_lookup().
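The notes above describe indexing a tnode's child array with a key segment given by "pos" and "bits", and, in prefix matching mode, chopping off the least significant "1" of the child index. The following is a minimal sketch of those two steps, not the fib_trie code itself. The names extract_bits(), chop_lowest_one() and candidate_indices() are hypothetical, and the sketch assumes a 32-bit key whose bit positions are counted from the most significant bit and a "bits" value of at most 32.

```c
#include <stddef.h>
#include <stdint.h>

#define KEYLENGTH 32	/* assumption: 32-bit (IPv4) keys */

/* Hypothetical helper: return the 'bits'-wide key segment that starts at
 * bit position 'pos', counting from the most significant bit. This is the
 * "child index" used to select a slot in a tnode's child array.
 */
static uint32_t extract_bits(uint32_t key, unsigned int pos, unsigned int bits)
{
	if (bits == 0 || pos >= KEYLENGTH)
		return 0;
	return (uint32_t)(key << pos) >> (KEYLENGTH - bits);
}

/* Zero the least significant set bit of a child index. */
static uint32_t chop_lowest_one(uint32_t cindex)
{
	return cindex & (cindex - 1);
}

/* Hypothetical helper: list the child indices a prefix search would try
 * inside one tnode, starting with the exact index and ending with zero.
 * Writes at most 'max' entries to 'out' and returns how many were written.
 */
static size_t candidate_indices(uint32_t cindex, uint32_t *out, size_t max)
{
	size_t n = 0;

	while (n < max) {
		out[n++] = cindex;
		if (cindex == 0)
			break;
		cindex = chop_lowest_one(cindex);
	}
	return n;
}
```

For the key 0xC0A80100 (192.168.1.0) with pos = 8 and bits = 4, extract_bits() yields binary 1010, and candidate_indices() lists 1010, 1000 and 0000 in that order. A real lookup would stop at the first candidate whose child slot leads to a match.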
+16-26
drivers/net/shaper.c
···
 {
 	struct shaper *shaper = dev->priv;
 	struct sk_buff *ptr;
-
-	if (down_trylock(&shaper->sem))
-		return -1;
-
+
+	spin_lock(&shaper->lock);
 	ptr=shaper->sendq.prev;
 
 	/*
···
 		shaper->stats.collisions++;
 	}
 	shaper_kick(shaper);
-	up(&shaper->sem);
+	spin_unlock(&shaper->lock);
 	return 0;
 }
 
···
 {
 	struct shaper *shaper = (struct shaper *)data;
 
-	if (!down_trylock(&shaper->sem)) {
-		shaper_kick(shaper);
-		up(&shaper->sem);
-	} else
-		mod_timer(&shaper->timer, jiffies);
+	spin_lock(&shaper->lock);
+	shaper_kick(shaper);
+	spin_unlock(&shaper->lock);
 }
 
 /*
···
 
 
 /*
- *	Flush the shaper queues on a closedown
- */
-
-static void shaper_flush(struct shaper *shaper)
-{
-	struct sk_buff *skb;
-
-	down(&shaper->sem);
-	while((skb=skb_dequeue(&shaper->sendq))!=NULL)
-		dev_kfree_skb(skb);
-	shaper_kick(shaper);
-	up(&shaper->sem);
-}
-
-/*
  *	Bring the interface up. We just disallow this until a
  *	bind.
  */
···
 static int shaper_close(struct net_device *dev)
 {
 	struct shaper *shaper=dev->priv;
-	shaper_flush(shaper);
+	struct sk_buff *skb;
+
+	while ((skb = skb_dequeue(&shaper->sendq)) != NULL)
+		dev_kfree_skb(skb);
+
+	spin_lock_bh(&shaper->lock);
+	shaper_kick(shaper);
+	spin_unlock_bh(&shaper->lock);
+
 	del_timer_sync(&shaper->timer);
 	return 0;
 }
···
 	init_timer(&sh->timer);
 	sh->timer.function=shaper_timer;
 	sh->timer.data=(unsigned long)sh;
+	spin_lock_init(&sh->lock);
 }
 
 /*
···
 	__u32 shapeclock;
 	unsigned long recovery;	/* Time we can next clock a packet out on
 				   an empty queue */
-	struct semaphore sem;
+	spinlock_t lock;
 	struct net_device_stats stats;
 	struct net_device *dev;
 	int (*hard_start_xmit) (struct sk_buff *skb,
+9-10
include/linux/skbuff.h
···
  *	@priority: Packet queueing priority
  *	@users: User count - see {datagram,tcp}.c
  *	@protocol: Packet protocol from driver
- *	@security: Security level of packet
  *	@truesize: Buffer size
  *	@head: Head of buffer
  *	@data: Data head pointer
···
 				data_len,
 				mac_len,
 				csum;
-	unsigned char		local_df,
-				cloned:1,
-				nohdr:1,
-				pkt_type,
-				ip_summed;
 	__u32			priority;
-	unsigned short		protocol,
-				security;
+	__u8			local_df:1,
+				cloned:1,
+				ip_summed:2,
+				nohdr:1;
+				/* 3 bits spare */
+	__u8			pkt_type;
+	__u16			protocol;
 
 	void			(*destructor)(struct sk_buff *skb);
 #ifdef CONFIG_NETFILTER
-	unsigned long		nfmark;
+	unsigned long		nfmark;
 	__u32			nfcache;
 	__u32			nfctinfo;
 	struct nf_conntrack	*nfct;
···
 {
 	int hlen = skb_headlen(skb);
 
-	if (offset + len <= hlen)
+	if (hlen - offset >= len)
 		return skb->data + offset;
 
 	if (skb_copy_bits(skb, offset, buffer, len) < 0)
···
 	__u32	max_window;	/* Maximal window ever seen from peer */
 	__u32	pmtu_cookie;	/* Last pmtu seen by socket */
 	__u32	mss_cache;	/* Cached effective mss, not including SACKS */
-	__u16	mss_cache_std;	/* Like mss_cache, but without TSO */
+	__u16	xmit_size_goal;	/* Goal for segmenting output packets */
 	__u16	ext_header_len;	/* Network protocol overhead (IP/IPv6 options) */
 	__u8	ca_state;	/* State of fast-retransmit machine */
 	__u8	retransmits;	/* Number of unrecovered RTO timeouts. */
···
 };
 #define NULLSLCOMPR	(struct slcompress *)0
 
-#define __ARGS(x) x
-
 /* In slhc.c: */
-struct slcompress *slhc_init __ARGS((int rslots, int tslots));
-void slhc_free __ARGS((struct slcompress *comp));
+struct slcompress *slhc_init(int rslots, int tslots);
+void slhc_free(struct slcompress *comp);
 
-int slhc_compress __ARGS((struct slcompress *comp, unsigned char *icp,
-			  int isize, unsigned char *ocp, unsigned char **cpp,
-			  int compress_cid));
-int slhc_uncompress __ARGS((struct slcompress *comp, unsigned char *icp,
-			    int isize));
-int slhc_remember __ARGS((struct slcompress *comp, unsigned char *icp,
-			  int isize));
-int slhc_toss __ARGS((struct slcompress *comp));
+int slhc_compress(struct slcompress *comp, unsigned char *icp, int isize,
+		  unsigned char *ocp, unsigned char **cpp, int compress_cid);
+int slhc_uncompress(struct slcompress *comp, unsigned char *icp, int isize);
+int slhc_remember(struct slcompress *comp, unsigned char *icp, int isize);
+int slhc_toss(struct slcompress *comp);
 
 #endif	/* _SLHC_H */
+5-2
include/net/sock.h
···
 static inline struct sk_buff *sk_stream_alloc_pskb(struct sock *sk,
 						   int size, int mem, int gfp)
 {
-	struct sk_buff *skb = alloc_skb(size + sk->sk_prot->max_header, gfp);
+	struct sk_buff *skb;
+	int hdr_len;
 
+	hdr_len = SKB_DATA_ALIGN(sk->sk_prot->max_header);
+	skb = alloc_skb(size + hdr_len, gfp);
 	if (skb) {
 		skb->truesize += mem;
 		if (sk->sk_forward_alloc >= (int)skb->truesize ||
 		    sk_stream_mem_schedule(sk, skb->truesize, 0)) {
-			skb_reserve(skb, sk->sk_prot->max_header);
+			skb_reserve(skb, hdr_len);
 			return skb;
 		}
 		__kfree_skb(skb);
+17-139
include/net/tcp.h
···
 	return tp->ack.pending&TCP_ACK_SCHED;
 }
 
-static __inline__ void tcp_dec_quickack_mode(struct tcp_sock *tp)
+static __inline__ void tcp_dec_quickack_mode(struct tcp_sock *tp, unsigned int pkts)
 {
-	if (tp->ack.quick && --tp->ack.quick == 0) {
-		/* Leaving quickack mode we deflate ATO. */
-		tp->ack.ato = TCP_ATO_MIN;
+	if (tp->ack.quick) {
+		if (pkts >= tp->ack.quick) {
+			tp->ack.quick = 0;
+
+			/* Leaving quickack mode we deflate ATO. */
+			tp->ack.ato = TCP_ATO_MIN;
+		} else
+			tp->ack.quick -= pkts;
 	}
 }
 
···
 
 /* tcp_output.c */
 
-extern int tcp_write_xmit(struct sock *, int nonagle);
+extern void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp,
+				      unsigned int cur_mss, int nonagle);
+extern int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp);
 extern int tcp_retransmit_skb(struct sock *, struct sk_buff *);
 extern void tcp_xmit_retransmit_queue(struct sock *);
 extern void tcp_simple_retransmit(struct sock *);
···
 extern void tcp_send_fin(struct sock *sk);
 extern void tcp_send_active_reset(struct sock *sk, int priority);
 extern int tcp_send_synack(struct sock *);
-extern void tcp_push_one(struct sock *, unsigned mss_now);
+extern void tcp_push_one(struct sock *, unsigned int mss_now);
 extern void tcp_send_ack(struct sock *sk);
 extern void tcp_send_delayed_ack(struct sock *sk);
+
+/* tcp_input.c */
+extern void tcp_cwnd_application_limited(struct sock *sk);
 
 /* tcp_timer.c */
 extern void tcp_init_xmit_timers(struct sock *);
···
 static inline void tcp_initialize_rcv_mss(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	unsigned int hint = min(tp->advmss, tp->mss_cache_std);
+	unsigned int hint = min_t(unsigned int, tp->advmss, tp->mss_cache);
 
 	hint = min(hint, tp->rcv_wnd/2);
 	hint = min(hint, TCP_MIN_RCVMSS);
···
 	tp->left_out = tp->sacked_out + tp->lost_out;
 }
 
-extern void tcp_cwnd_application_limited(struct sock *sk);
-
-/* Congestion window validation. (RFC2861) */
-
-static inline void tcp_cwnd_validate(struct sock *sk, struct tcp_sock *tp)
-{
-	__u32 packets_out = tp->packets_out;
-
-	if (packets_out >= tp->snd_cwnd) {
-		/* Network is feed fully. */
-		tp->snd_cwnd_used = 0;
-		tp->snd_cwnd_stamp = tcp_time_stamp;
-	} else {
-		/* Network starves. */
-		if (tp->packets_out > tp->snd_cwnd_used)
-			tp->snd_cwnd_used = tp->packets_out;
-
-		if ((s32)(tcp_time_stamp - tp->snd_cwnd_stamp) >= tp->rto)
-			tcp_cwnd_application_limited(sk);
-	}
-}
-
 /* Set slow start threshould and cwnd not falling to slow start */
 static inline void __tcp_enter_cwr(struct tcp_sock *tp)
 {
···
 	return 3;
 }
 
-static __inline__ int tcp_minshall_check(const struct tcp_sock *tp)
-{
-	return after(tp->snd_sml,tp->snd_una) &&
-		!after(tp->snd_sml, tp->snd_nxt);
-}
-
 static __inline__ void tcp_minshall_update(struct tcp_sock *tp, int mss,
 					   const struct sk_buff *skb)
 {
 	if (skb->len < mss)
 		tp->snd_sml = TCP_SKB_CB(skb)->end_seq;
-}
-
-/* Return 0, if packet can be sent now without violation Nagle's rules:
-   1. It is full sized.
-   2. Or it contains FIN.
-   3. Or TCP_NODELAY was set.
-   4. Or TCP_CORK is not set, and all sent packets are ACKed.
-      With Minshall's modification: all sent small packets are ACKed.
- */
-
-static __inline__ int
-tcp_nagle_check(const struct tcp_sock *tp, const struct sk_buff *skb,
-		unsigned mss_now, int nonagle)
-{
-	return (skb->len < mss_now &&
-		!(TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN) &&
-		((nonagle&TCP_NAGLE_CORK) ||
-		 (!nonagle &&
-		  tp->packets_out &&
-		  tcp_minshall_check(tp))));
-}
-
-extern void tcp_set_skb_tso_segs(struct sock *, struct sk_buff *);
-
-/* This checks if the data bearing packet SKB (usually sk->sk_send_head)
- * should be put on the wire right now.
- */
-static __inline__ int tcp_snd_test(struct sock *sk,
-				   struct sk_buff *skb,
-				   unsigned cur_mss, int nonagle)
-{
-	struct tcp_sock *tp = tcp_sk(sk);
-	int pkts = tcp_skb_pcount(skb);
-
-	if (!pkts) {
-		tcp_set_skb_tso_segs(sk, skb);
-		pkts = tcp_skb_pcount(skb);
-	}
-
-	/*	RFC 1122 - section 4.2.3.4
-	 *
-	 *	We must queue if
-	 *
-	 *	a) The right edge of this frame exceeds the window
-	 *	b) There are packets in flight and we have a small segment
-	 *	   [SWS avoidance and Nagle algorithm]
-	 *	   (part of SWS is done on packetization)
-	 *	   Minshall version sounds: there are no _small_
-	 *	   segments in flight. (tcp_nagle_check)
-	 *	c) We have too many packets 'in flight'
-	 *
-	 *	Don't use the nagle rule for urgent data (or
-	 *	for the final FIN -DaveM).
-	 *
-	 *	Also, Nagle rule does not apply to frames, which
-	 *	sit in the middle of queue (they have no chances
-	 *	to get new data) and if room at tail of skb is
-	 *	not enough to save something seriously (<32 for now).
-	 */
-
-	/* Don't be strict about the congestion window for the
-	 * final FIN frame.  -DaveM
-	 */
-	return (((nonagle&TCP_NAGLE_PUSH) || tp->urg_mode
-		 || !tcp_nagle_check(tp, skb, cur_mss, nonagle)) &&
-		(((tcp_packets_in_flight(tp) + (pkts-1)) < tp->snd_cwnd) ||
-		 (TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN)) &&
-		!after(TCP_SKB_CB(skb)->end_seq, tp->snd_una + tp->snd_wnd));
 }
 
 static __inline__ void tcp_check_probe_timer(struct sock *sk, struct tcp_sock *tp)
···
 		tcp_reset_xmit_timer(sk, TCP_TIME_PROBE0, tp->rto);
 }
 
-static __inline__ int tcp_skb_is_last(const struct sock *sk,
-				      const struct sk_buff *skb)
-{
-	return skb->next == (struct sk_buff *)&sk->sk_write_queue;
-}
-
-/* Push out any pending frames which were held back due to
- * TCP_CORK or attempt at coalescing tiny packets.
- * The socket must be locked by the caller.
- */
-static __inline__ void __tcp_push_pending_frames(struct sock *sk,
-						 struct tcp_sock *tp,
-						 unsigned cur_mss,
-						 int nonagle)
-{
-	struct sk_buff *skb = sk->sk_send_head;
-
-	if (skb) {
-		if (!tcp_skb_is_last(sk, skb))
-			nonagle = TCP_NAGLE_PUSH;
-		if (!tcp_snd_test(sk, skb, cur_mss, nonagle) ||
-		    tcp_write_xmit(sk, nonagle))
-			tcp_check_probe_timer(sk, tp);
-	}
-	tcp_cwnd_validate(sk, tp);
-}
-
 static __inline__ void tcp_push_pending_frames(struct sock *sk,
 					       struct tcp_sock *tp)
 {
 	__tcp_push_pending_frames(sk, tp, tcp_current_mss(sk, 1), tp->nonagle);
-}
-
-static __inline__ int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp)
-{
-	struct sk_buff *skb = sk->sk_send_head;
-
-	return (skb &&
-		tcp_snd_test(sk, skb, tcp_current_mss(sk, 1),
-			     tcp_skb_is_last(sk, skb) ? TCP_NAGLE_PUSH : tp->nonagle));
 }
 
 static __inline__ void tcp_init_wl(struct tcp_sock *tp, u32 ack, u32 seq)
···
  * will allow a single TSO frame to consume.  Building TSO frames
  * which are too large can cause TCP streams to be bursty.
  */
-int sysctl_tcp_tso_win_divisor = 8;
+int sysctl_tcp_tso_win_divisor = 3;
 
 static inline void update_send_head(struct sock *sk, struct tcp_sock *tp,
 				    struct sk_buff *skb)
···
 		tp->ack.pingpong = 1;
 }
 
-static __inline__ void tcp_event_ack_sent(struct sock *sk)
+static __inline__ void tcp_event_ack_sent(struct sock *sk, unsigned int pkts)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	tcp_dec_quickack_mode(tp);
+	tcp_dec_quickack_mode(tp, pkts);
 	tcp_clear_xmit_timer(sk, TCP_TIME_DACK);
 }
 
···
 		tp->af_specific->send_check(sk, th, skb->len, skb);
 
 		if (tcb->flags & TCPCB_FLAG_ACK)
-			tcp_event_ack_sent(sk);
+			tcp_event_ack_sent(sk, tcp_skb_pcount(skb));
 
 		if (skb->len != tcp_header_size)
 			tcp_event_data_sent(tp, skb, sk);
···
 		sk->sk_send_head = skb;
 }
 
-static inline void tcp_tso_set_push(struct sk_buff *skb)
-{
-	/* Force push to be on for any TSO frames to workaround
-	 * problems with busted implementations like Mac OS-X that
-	 * hold off socket receive wakeups until push is seen.
-	 */
-	if (tcp_skb_pcount(skb) > 1)
-		TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_PSH;
-}
-
-/* Send _single_ skb sitting at the send head. This function requires
- * true push pending frames to setup probe timer etc.
- */
-void tcp_push_one(struct sock *sk, unsigned cur_mss)
-{
-	struct tcp_sock *tp = tcp_sk(sk);
-	struct sk_buff *skb = sk->sk_send_head;
-
-	if (tcp_snd_test(sk, skb, cur_mss, TCP_NAGLE_PUSH)) {
-		/* Send it out now. */
-		TCP_SKB_CB(skb)->when = tcp_time_stamp;
-		tcp_tso_set_push(skb);
-		if (!tcp_transmit_skb(sk, skb_clone(skb, sk->sk_allocation))) {
-			sk->sk_send_head = NULL;
-			tp->snd_nxt = TCP_SKB_CB(skb)->end_seq;
-			tcp_packets_out_inc(sk, tp, skb);
-			return;
-		}
-	}
-}
-
-void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb)
+static void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	if (skb->len <= tp->mss_cache_std ||
+	if (skb->len <= tp->mss_cache ||
 	    !(sk->sk_route_caps & NETIF_F_TSO)) {
 		/* Avoid the costly divide in the normal
 		 * non-TSO case.
···
 	} else {
 		unsigned int factor;
 
-		factor = skb->len + (tp->mss_cache_std - 1);
-		factor /= tp->mss_cache_std;
+		factor = skb->len + (tp->mss_cache - 1);
+		factor /= tp->mss_cache;
 		skb_shinfo(skb)->tso_segs = factor;
-		skb_shinfo(skb)->tso_size = tp->mss_cache_std;
+		skb_shinfo(skb)->tso_size = tp->mss_cache;
 	}
 }
 
···
 	}
 
 	/* Link BUFF into the send queue. */
+	skb_header_release(buff);
 	__skb_append(skb, buff);
 
 	return 0;
···
 
 	/* And store cached results */
 	tp->pmtu_cookie = pmtu;
-	tp->mss_cache = tp->mss_cache_std = mss_now;
+	tp->mss_cache = mss_now;
 
 	return mss_now;
 }
···
  * cannot be large. However, taking into account rare use of URG, this
  * is not a big flaw.
  */
-
-unsigned int tcp_current_mss(struct sock *sk, int large)
+unsigned int tcp_current_mss(struct sock *sk, int large_allowed)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct dst_entry *dst = __sk_dst_get(sk);
-	unsigned int do_large, mss_now;
+	u32 mss_now;
+	u16 xmit_size_goal;
+	int doing_tso = 0;
 
-	mss_now = tp->mss_cache_std;
+	mss_now = tp->mss_cache;
+
+	if (large_allowed &&
+	    (sk->sk_route_caps & NETIF_F_TSO) &&
+	    !tp->urg_mode)
+		doing_tso = 1;
+
 	if (dst) {
 		u32 mtu = dst_mtu(dst);
 		if (mtu != tp->pmtu_cookie)
 			mss_now = tcp_sync_mss(sk, mtu);
 	}
 
-	do_large = (large &&
-		    (sk->sk_route_caps & NETIF_F_TSO) &&
-		    !tp->urg_mode);
-
-	if (do_large) {
-		unsigned int large_mss, factor, limit;
-
-		large_mss = 65535 - tp->af_specific->net_header_len -
-			tp->ext_header_len - tp->tcp_header_len;
-
-		if (tp->max_window && large_mss > (tp->max_window>>1))
-			large_mss = max((tp->max_window>>1),
-					68U - tp->tcp_header_len);
-
-		factor = large_mss / mss_now;
-
-		/* Always keep large mss multiple of real mss, but
-		 * do not exceed 1/tso_win_divisor of the congestion window
-		 * so we can keep the ACK clock ticking and minimize
-		 * bursting.
-		 */
-		limit = tp->snd_cwnd;
-		if (sysctl_tcp_tso_win_divisor)
-			limit /= sysctl_tcp_tso_win_divisor;
-		limit = max(1U, limit);
-		if (factor > limit)
-			factor = limit;
-
-		tp->mss_cache = mss_now * factor;
-
-		mss_now = tp->mss_cache;
-	}
-
 	if (tp->rx_opt.eff_sacks)
 		mss_now -= (TCPOLEN_SACK_BASE_ALIGNED +
 			    (tp->rx_opt.eff_sacks * TCPOLEN_SACK_PERBLOCK));
+
+	xmit_size_goal = mss_now;
+
+	if (doing_tso) {
+		xmit_size_goal = 65535 -
+			tp->af_specific->net_header_len -
+			tp->ext_header_len - tp->tcp_header_len;
+
+		if (tp->max_window &&
+		    (xmit_size_goal > (tp->max_window >> 1)))
+			xmit_size_goal = max((tp->max_window >> 1),
+					     68U - tp->tcp_header_len);
+
+		xmit_size_goal -= (xmit_size_goal % mss_now);
+	}
+	tp->xmit_size_goal = xmit_size_goal;
+
 	return mss_now;
+}
+
+/* Congestion window validation. (RFC2861) */
+
+static inline void tcp_cwnd_validate(struct sock *sk, struct tcp_sock *tp)
+{
+	__u32 packets_out = tp->packets_out;
+
+	if (packets_out >= tp->snd_cwnd) {
+		/* Network is feed fully. */
+		tp->snd_cwnd_used = 0;
+		tp->snd_cwnd_stamp = tcp_time_stamp;
+	} else {
+		/* Network starves. */
+		if (tp->packets_out > tp->snd_cwnd_used)
+			tp->snd_cwnd_used = tp->packets_out;
+
+		if ((s32)(tcp_time_stamp - tp->snd_cwnd_stamp) >= tp->rto)
+			tcp_cwnd_application_limited(sk);
+	}
+}
+
+static unsigned int tcp_window_allows(struct tcp_sock *tp, struct sk_buff *skb, unsigned int mss_now, unsigned int cwnd)
+{
+	u32 window, cwnd_len;
+
+	window = (tp->snd_una + tp->snd_wnd - TCP_SKB_CB(skb)->seq);
+	cwnd_len = mss_now * cwnd;
+	return min(window, cwnd_len);
+}
+
+/* Can at least one segment of SKB be sent right now, according to the
+ * congestion window rules?  If so, return how many segments are allowed.
+ */
+static inline unsigned int tcp_cwnd_test(struct tcp_sock *tp, struct sk_buff *skb)
+{
+	u32 in_flight, cwnd;
+
+	/* Don't be strict about the congestion window for the final FIN. */
+	if (TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN)
+		return 1;
+
+	in_flight = tcp_packets_in_flight(tp);
+	cwnd = tp->snd_cwnd;
+	if (in_flight < cwnd)
+		return (cwnd - in_flight);
+
+	return 0;
+}
+
+/* This must be invoked the first time we consider transmitting
+ * SKB onto the wire.
+ */
+static inline int tcp_init_tso_segs(struct sock *sk, struct sk_buff *skb)
+{
+	int tso_segs = tcp_skb_pcount(skb);
+
+	if (!tso_segs) {
+		tcp_set_skb_tso_segs(sk, skb);
+		tso_segs = tcp_skb_pcount(skb);
+	}
+	return tso_segs;
+}
+
+static inline int tcp_minshall_check(const struct tcp_sock *tp)
+{
+	return after(tp->snd_sml,tp->snd_una) &&
+		!after(tp->snd_sml, tp->snd_nxt);
+}
+
+/* Return 0, if packet can be sent now without violation Nagle's rules:
+ * 1. It is full sized.
+ * 2. Or it contains FIN. (already checked by caller)
+ * 3. Or TCP_NODELAY was set.
+ * 4. Or TCP_CORK is not set, and all sent packets are ACKed.
+ *    With Minshall's modification: all sent small packets are ACKed.
+ */
+
+static inline int tcp_nagle_check(const struct tcp_sock *tp,
+				  const struct sk_buff *skb,
+				  unsigned mss_now, int nonagle)
+{
+	return (skb->len < mss_now &&
+		((nonagle&TCP_NAGLE_CORK) ||
+		 (!nonagle &&
+		  tp->packets_out &&
+		  tcp_minshall_check(tp))));
+}
+
+/* Return non-zero if the Nagle test allows this packet to be
+ * sent now.
+ */
+static inline int tcp_nagle_test(struct tcp_sock *tp, struct sk_buff *skb,
+				 unsigned int cur_mss, int nonagle)
+{
+	/* Nagle rule does not apply to frames, which sit in the middle of the
+	 * write_queue (they have no chances to get new data).
+	 *
+	 * This is implemented in the callers, where they modify the 'nonagle'
+	 * argument based upon the location of SKB in the send queue.
+	 */
+	if (nonagle & TCP_NAGLE_PUSH)
+		return 1;
+
+	/* Don't use the nagle rule for urgent data (or for the final FIN).  */
+	if (tp->urg_mode ||
+	    (TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN))
+		return 1;
+
+	if (!tcp_nagle_check(tp, skb, cur_mss, nonagle))
+		return 1;
+
+	return 0;
+}
+
+/* Does at least the first segment of SKB fit into the send window? */
+static inline int tcp_snd_wnd_test(struct tcp_sock *tp, struct sk_buff *skb, unsigned int cur_mss)
+{
+	u32 end_seq = TCP_SKB_CB(skb)->end_seq;
+
+	if (skb->len > cur_mss)
+		end_seq = TCP_SKB_CB(skb)->seq + cur_mss;
+
+	return !after(end_seq, tp->snd_una + tp->snd_wnd);
+}
+
+/* This checks if the data bearing packet SKB (usually sk->sk_send_head)
+ * should be put on the wire right now.  If so, it returns the number of
+ * packets allowed by the congestion window.
+ */
+static unsigned int tcp_snd_test(struct sock *sk, struct sk_buff *skb,
+				 unsigned int cur_mss, int nonagle)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	unsigned int cwnd_quota;
+
+	tcp_init_tso_segs(sk, skb);
+
+	if (!tcp_nagle_test(tp, skb, cur_mss, nonagle))
+		return 0;
+
+	cwnd_quota = tcp_cwnd_test(tp, skb);
+	if (cwnd_quota &&
+	    !tcp_snd_wnd_test(tp, skb, cur_mss))
+		cwnd_quota = 0;
+
+	return cwnd_quota;
+}
+
+static inline int tcp_skb_is_last(const struct sock *sk,
+				  const struct sk_buff *skb)
+{
+	return skb->next == (struct sk_buff *)&sk->sk_write_queue;
+}
+
+int tcp_may_send_now(struct sock *sk, struct tcp_sock *tp)
+{
+	struct sk_buff *skb = sk->sk_send_head;
+
+	return (skb &&
+		tcp_snd_test(sk, skb, tcp_current_mss(sk, 1),
+			     (tcp_skb_is_last(sk, skb) ?
+			      TCP_NAGLE_PUSH :
+			      tp->nonagle)));
+}
+
+/* Trim TSO SKB to LEN bytes, put the remaining data into a new packet
+ * which is put after SKB on the list.  It is very much like
+ * tcp_fragment() except that it may make several kinds of assumptions
+ * in order to speed up the splitting operation.  In particular, we
+ * know that all the data is in scatter-gather pages, and that the
+ * packet has never been sent out before (and thus is not cloned).
+ */
+static int tso_fragment(struct sock *sk, struct sk_buff *skb, unsigned int len)
+{
+	struct sk_buff *buff;
+	int nlen = skb->len - len;
+	u16 flags;
+
+	/* All of a TSO frame must be composed of paged data.  */
+	BUG_ON(skb->len != skb->data_len);
+
+	buff = sk_stream_alloc_pskb(sk, 0, 0, GFP_ATOMIC);
+	if (unlikely(buff == NULL))
+		return -ENOMEM;
+
+	buff->truesize = nlen;
+	skb->truesize -= nlen;
+
+	/* Correct the sequence numbers. */
+	TCP_SKB_CB(buff)->seq = TCP_SKB_CB(skb)->seq + len;
+	TCP_SKB_CB(buff)->end_seq = TCP_SKB_CB(skb)->end_seq;
+	TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(buff)->seq;
+
+	/* PSH and FIN should only be set in the second packet. */
+	flags = TCP_SKB_CB(skb)->flags;
+	TCP_SKB_CB(skb)->flags = flags & ~(TCPCB_FLAG_FIN|TCPCB_FLAG_PSH);
+	TCP_SKB_CB(buff)->flags = flags;
+
+	/* This packet was never sent out yet, so no SACK bits. */
+	TCP_SKB_CB(buff)->sacked = 0;
+
+	buff->ip_summed = skb->ip_summed = CHECKSUM_HW;
+	skb_split(skb, buff, len);
+
+	/* Fix up tso_factor for both original and new SKB.  */
+	tcp_set_skb_tso_segs(sk, skb);
+	tcp_set_skb_tso_segs(sk, buff);
+
+	/* Link BUFF into the send queue. */
+	skb_header_release(buff);
+	__skb_append(skb, buff);
+
+	return 0;
+}
+
+/* Try to defer sending, if possible, in order to minimize the amount
+ * of TSO splitting we do.  View it as a kind of TSO Nagle test.
+ *
+ * This algorithm is from John Heffner.
+ */
+static int tcp_tso_should_defer(struct sock *sk, struct tcp_sock *tp, struct sk_buff *skb)
+{
+	u32 send_win, cong_win, limit, in_flight;
+
+	if (TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN)
+		return 0;
+
+	if (tp->ca_state != TCP_CA_Open)
+		return 0;
+
+	in_flight = tcp_packets_in_flight(tp);
+
+	BUG_ON(tcp_skb_pcount(skb) <= 1 ||
+	       (tp->snd_cwnd <= in_flight));
+
+	send_win = (tp->snd_una + tp->snd_wnd) - TCP_SKB_CB(skb)->seq;
+
+	/* From in_flight test above, we know that cwnd > in_flight.  */
+	cong_win = (tp->snd_cwnd - in_flight) * tp->mss_cache;
+
+	limit = min(send_win, cong_win);
+
+	/* If sk_send_head can be sent fully now, just do it.  */
+	if (skb->len <= limit)
+		return 0;
+
+	if (sysctl_tcp_tso_win_divisor) {
+		u32 chunk = min(tp->snd_wnd, tp->snd_cwnd * tp->mss_cache);
+
+		/* If at least some fraction of a window is available,
+		 * just use it.
+		 */
+		chunk /= sysctl_tcp_tso_win_divisor;
+		if (limit >= chunk)
+			return 0;
+	} else {
+		/* Different approach, try not to defer past a single
+		 * ACK.  Receiver should ACK every other full sized
+		 * frame, so if we have space for more than 3 frames
+		 * then send now.
+		 */
+		if (limit > tcp_max_burst(tp) * tp->mss_cache)
+			return 0;
+	}
+
+	/* Ok, it looks like it is advisable to defer.  */
+	return 1;
 }
 
 /* This routine writes packets to the network.  It advances the
···
  * Returns 1, if no segments are in flight and we have queued segments, but
  * cannot send anything now because of SWS or another problem.
  */
-int tcp_write_xmit(struct sock *sk, int nonagle)
+static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	unsigned int mss_now;
+	struct sk_buff *skb;
+	unsigned int tso_segs, sent_pkts;
+	int cwnd_quota;
 
 	/* If we are closed, the bytes will have to remain here.
 	 * In time closedown will finish, we empty the write queue and all
 	 * will be happy.
 	 */
-	if (sk->sk_state != TCP_CLOSE) {
-		struct sk_buff *skb;
-		int sent_pkts = 0;
+	if (unlikely(sk->sk_state == TCP_CLOSE))
+		return 0;
 
-		/* Account for SACKS, we may need to fragment due to this.
-		 * It is just like the real MSS changing on us midstream.
-		 * We also handle things correctly when the user adds
some977977- * IP options mid-stream. Silly to do, but cover it.978978- */979979- mss_now = tcp_current_mss(sk, 1);746746+ skb = sk->sk_send_head;747747+ if (unlikely(!skb))748748+ return 0;980749981981- while ((skb = sk->sk_send_head) &&982982- tcp_snd_test(sk, skb, mss_now,983983- tcp_skb_is_last(sk, skb) ? nonagle :984984- TCP_NAGLE_PUSH)) {985985- if (skb->len > mss_now) {986986- if (tcp_fragment(sk, skb, mss_now))750750+ tso_segs = tcp_init_tso_segs(sk, skb);751751+ cwnd_quota = tcp_cwnd_test(tp, skb);752752+ if (unlikely(!cwnd_quota))753753+ goto out;754754+755755+ sent_pkts = 0;756756+ while (likely(tcp_snd_wnd_test(tp, skb, mss_now))) {757757+ BUG_ON(!tso_segs);758758+759759+ if (tso_segs == 1) {760760+ if (unlikely(!tcp_nagle_test(tp, skb, mss_now,761761+ (tcp_skb_is_last(sk, skb) ?762762+ nonagle : TCP_NAGLE_PUSH))))763763+ break;764764+ } else {765765+ if (tcp_tso_should_defer(sk, tp, skb))766766+ break;767767+ }768768+769769+ if (tso_segs > 1) {770770+ u32 limit = tcp_window_allows(tp, skb,771771+ mss_now, cwnd_quota);772772+773773+ if (skb->len < limit) {774774+ unsigned int trim = skb->len % mss_now;775775+776776+ if (trim)777777+ limit = skb->len - trim;778778+ }779779+ if (skb->len > limit) {780780+ if (tso_fragment(sk, skb, limit))987781 break;988782 }989989-990990- TCP_SKB_CB(skb)->when = tcp_time_stamp;991991- tcp_tso_set_push(skb);992992- if (tcp_transmit_skb(sk, skb_clone(skb, GFP_ATOMIC)))783783+ } else if (unlikely(skb->len > mss_now)) {784784+ if (unlikely(tcp_fragment(sk, skb, mss_now)))993785 break;994994-995995- /* Advance the send_head. 
This one is sent out.996996- * This call will increment packets_out.997997- */998998- update_send_head(sk, tp, skb);999999-10001000- tcp_minshall_update(tp, mss_now, skb);10011001- sent_pkts = 1;1002786 }100378710041004- if (sent_pkts) {10051005- tcp_cwnd_validate(sk, tp);10061006- return 0;10071007- }788788+ TCP_SKB_CB(skb)->when = tcp_time_stamp;100878910091009- return !tp->packets_out && sk->sk_send_head;790790+ if (unlikely(tcp_transmit_skb(sk, skb_clone(skb, GFP_ATOMIC))))791791+ break;792792+793793+ /* Advance the send_head. This one is sent out.794794+ * This call will increment packets_out.795795+ */796796+ update_send_head(sk, tp, skb);797797+798798+ tcp_minshall_update(tp, mss_now, skb);799799+ sent_pkts++;800800+801801+ /* Do not optimize this to use tso_segs. If we chopped up802802+ * the packet above, tso_segs will no longer be valid.803803+ */804804+ cwnd_quota -= tcp_skb_pcount(skb);805805+806806+ BUG_ON(cwnd_quota < 0);807807+ if (!cwnd_quota)808808+ break;809809+810810+ skb = sk->sk_send_head;811811+ if (!skb)812812+ break;813813+ tso_segs = tcp_init_tso_segs(sk, skb);1010814 }10111011- return 0;815815+816816+ if (likely(sent_pkts)) {817817+ tcp_cwnd_validate(sk, tp);818818+ return 0;819819+ }820820+out:821821+ return !tp->packets_out && sk->sk_send_head;822822+}823823+824824+/* Push out any pending frames which were held back due to825825+ * TCP_CORK or attempt at coalescing tiny packets.826826+ * The socket must be locked by the caller.827827+ */828828+void __tcp_push_pending_frames(struct sock *sk, struct tcp_sock *tp,829829+ unsigned int cur_mss, int nonagle)830830+{831831+ struct sk_buff *skb = sk->sk_send_head;832832+833833+ if (skb) {834834+ if (tcp_write_xmit(sk, cur_mss, nonagle))835835+ tcp_check_probe_timer(sk, tp);836836+ }837837+}838838+839839+/* Send _single_ skb sitting at the send head. 
This function requires840840+ * true push pending frames to setup probe timer etc.841841+ */842842+void tcp_push_one(struct sock *sk, unsigned int mss_now)843843+{844844+ struct tcp_sock *tp = tcp_sk(sk);845845+ struct sk_buff *skb = sk->sk_send_head;846846+ unsigned int tso_segs, cwnd_quota;847847+848848+ BUG_ON(!skb || skb->len < mss_now);849849+850850+ tso_segs = tcp_init_tso_segs(sk, skb);851851+ cwnd_quota = tcp_snd_test(sk, skb, mss_now, TCP_NAGLE_PUSH);852852+853853+ if (likely(cwnd_quota)) {854854+ BUG_ON(!tso_segs);855855+856856+ if (tso_segs > 1) {857857+ u32 limit = tcp_window_allows(tp, skb,858858+ mss_now, cwnd_quota);859859+860860+ if (skb->len < limit) {861861+ unsigned int trim = skb->len % mss_now;862862+863863+ if (trim)864864+ limit = skb->len - trim;865865+ }866866+ if (skb->len > limit) {867867+ if (unlikely(tso_fragment(sk, skb, limit)))868868+ return;869869+ }870870+ } else if (unlikely(skb->len > mss_now)) {871871+ if (unlikely(tcp_fragment(sk, skb, mss_now)))872872+ return;873873+ }874874+875875+ /* Send it out now. 
*/876876+ TCP_SKB_CB(skb)->when = tcp_time_stamp;877877+878878+ if (likely(!tcp_transmit_skb(sk, skb_clone(skb, sk->sk_allocation)))) {879879+ update_send_head(sk, tp, skb);880880+ tcp_cwnd_validate(sk, tp);881881+ return;882882+ }883883+ }1012884}10138851014886/* This function returns the amount that we can raise the···13691039 if (sk->sk_route_caps & NETIF_F_TSO) {13701040 sk->sk_route_caps &= ~NETIF_F_TSO;13711041 sock_set_flag(sk, SOCK_NO_LARGESEND);13721372- tp->mss_cache = tp->mss_cache_std;13731042 }1374104313751044 if (tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq))···14301101 * is still in somebody's hands, else make a clone.14311102 */14321103 TCP_SKB_CB(skb)->when = tcp_time_stamp;14331433- tcp_tso_set_push(skb);1434110414351105 err = tcp_transmit_skb(sk, (skb_cloned(skb) ?14361106 pskb_copy(skb, GFP_ATOMIC):···19981670 if (sk->sk_route_caps & NETIF_F_TSO) {19991671 sock_set_flag(sk, SOCK_NO_LARGESEND);20001672 sk->sk_route_caps &= ~NETIF_F_TSO;20012001- tp->mss_cache = tp->mss_cache_std;20021673 }20031674 } else if (!tcp_skb_pcount(skb))20041675 tcp_set_skb_tso_segs(sk, skb);2005167620061677 TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_PSH;20071678 TCP_SKB_CB(skb)->when = tcp_time_stamp;20082008- tcp_tso_set_push(skb);20091679 err = tcp_transmit_skb(sk, skb_clone(skb, GFP_ATOMIC));20101680 if (!err) {20111681 update_send_head(sk, tp, skb);
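Annotation (not part of the patch): the deferral heuristic in tcp_tso_should_defer() can be tried in user space. The sketch below is a minimal model under stated assumptions: struct tso_state, min_u32() and tso_should_defer() are hypothetical names, the fields stand in for the tcp_sock members the patch reads, max_burst stands in for whatever tcp_max_burst() returns, and the early FIN and ca_state exits are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the tcp_sock fields the heuristic reads. */
struct tso_state {
	uint32_t snd_una;     /* first unacknowledged sequence number */
	uint32_t snd_wnd;     /* peer's advertised window, in bytes */
	uint32_t snd_cwnd;    /* congestion window, in packets */
	uint32_t in_flight;   /* packets currently in flight */
	uint32_t mss;         /* current MSS, in bytes */
	uint32_t win_divisor; /* models sysctl_tcp_tso_win_divisor; 0 = off */
	uint32_t max_burst;   /* models tcp_max_burst(); value is assumed */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Returns 1 if sending should be deferred, 0 if the skb should go now. */
static int tso_should_defer(const struct tso_state *s,
			    uint32_t skb_seq, uint32_t skb_len)
{
	uint32_t send_win, cong_win, limit;

	/* Mirrors the BUG_ON(): the caller already saw cwnd quota left. */
	assert(s->snd_cwnd > s->in_flight);

	send_win = (s->snd_una + s->snd_wnd) - skb_seq;
	cong_win = (s->snd_cwnd - s->in_flight) * s->mss;
	limit = min_u32(send_win, cong_win);

	/* The whole skb fits in both windows: send it. */
	if (skb_len <= limit)
		return 0;

	if (s->win_divisor) {
		/* Send if at least 1/win_divisor of a window is open. */
		uint32_t chunk = min_u32(s->snd_wnd, s->snd_cwnd * s->mss);

		chunk /= s->win_divisor;
		if (limit >= chunk)
			return 0;
	} else {
		/* Send if there is room for more than max_burst frames. */
		if (limit > s->max_burst * s->mss)
			return 0;
	}

	return 1;
}
```

With a 10-packet cwnd, 1000-byte MSS and a divisor of 3, a 20000-byte skb is sent while 2 packets are in flight (8000 bytes open, above the 3333-byte chunk) and deferred while 9 are in flight (1000 bytes open).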
···
 {
 	int err;
 	struct rtattr *kind = tca[TCA_KIND-1];
-	void *p = NULL;
 	struct Qdisc *sch;
 	struct Qdisc_ops *ops;
-	int size;

 	ops = qdisc_lookup_ops(kind);
 #ifdef CONFIG_KMOD
···
 	if (ops == NULL)
 		goto err_out;

-	/* ensure that the Qdisc and the private data are 32-byte aligned */
-	size = ((sizeof(*sch) + QDISC_ALIGN_CONST) & ~QDISC_ALIGN_CONST);
-	size += ops->priv_size + QDISC_ALIGN_CONST;
-
-	p = kmalloc(size, GFP_KERNEL);
-	err = -ENOBUFS;
-	if (!p)
+	sch = qdisc_alloc(dev, ops);
+	if (IS_ERR(sch)) {
+		err = PTR_ERR(sch);
 		goto err_out2;
-	memset(p, 0, size);
-	sch = (struct Qdisc *)(((unsigned long)p + QDISC_ALIGN_CONST)
-			       & ~QDISC_ALIGN_CONST);
-	sch->padded = (char *)sch - (char *)p;
+	}

-	INIT_LIST_HEAD(&sch->list);
-	skb_queue_head_init(&sch->q);
-
-	if (handle == TC_H_INGRESS)
+	if (handle == TC_H_INGRESS) {
 		sch->flags |= TCQ_F_INGRESS;
-
-	sch->ops = ops;
-	sch->enqueue = ops->enqueue;
-	sch->dequeue = ops->dequeue;
-	sch->dev = dev;
-	dev_hold(dev);
-	atomic_set(&sch->refcnt, 1);
-	sch->stats_lock = &dev->queue_lock;
-	if (handle == 0) {
+		handle = TC_H_MAKE(TC_H_INGRESS, 0);
+	} else if (handle == 0) {
 		handle = qdisc_alloc_handle(dev);
 		err = -ENOMEM;
 		if (handle == 0)
 			goto err_out3;
 	}

-	if (handle == TC_H_INGRESS)
-		sch->handle =TC_H_MAKE(TC_H_INGRESS, 0);
-	else
-		sch->handle = handle;
+	sch->handle = handle;

 	if (!ops->init || (err = ops->init(sch, tca[TCA_OPTIONS-1])) == 0) {
+#ifdef CONFIG_NET_ESTIMATOR
+		if (tca[TCA_RATE-1]) {
+			err = gen_new_estimator(&sch->bstats, &sch->rate_est,
+						sch->stats_lock,
+						tca[TCA_RATE-1]);
+			if (err) {
+				/*
+				 * Any broken qdiscs that would require
+				 * a ops->reset() here? The qdisc was never
+				 * in action so it shouldn't be necessary.
+				 */
+				if (ops->destroy)
+					ops->destroy(sch);
+				goto err_out3;
+			}
+		}
+#endif
 		qdisc_lock_tree(dev);
 		list_add_tail(&sch->list, &dev->qdisc_list);
 		qdisc_unlock_tree(dev);

-#ifdef CONFIG_NET_ESTIMATOR
-		if (tca[TCA_RATE-1])
-			gen_new_estimator(&sch->bstats, &sch->rate_est,
-				sch->stats_lock, tca[TCA_RATE-1]);
-#endif
 		return sch;
 	}
 err_out3:
 	dev_put(dev);
+	kfree((char *) sch - sch->padded);
 err_out2:
 	module_put(ops->owner);
 err_out:
 	*errp = err;
-	if (p)
-		kfree(p);
 	return NULL;
 }
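Annotation (not part of the patch): the removed lines over-allocate, round the pointer up to a 32-byte boundary and record the offset in sch->padded. That is why the new error path frees (char *) sch - sch->padded rather than sch itself. A minimal user-space sketch of the same idiom follows. struct fake_qdisc, fake_qdisc_alloc() and fake_qdisc_free() are hypothetical names, ALIGN_MASK assumes QDISC_ALIGN_CONST is 31, and calloc()/free() stand in for kmalloc()+memset() and kfree(). How qdisc_alloc() itself is written is not shown in this hunk.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed to match QDISC_ALIGN_CONST, i.e. 32-byte alignment minus one. */
#define ALIGN_MASK ((size_t)31)

/* Hypothetical stand-in for struct Qdisc; only 'padded' matters here. */
struct fake_qdisc {
	int padded;          /* bytes between the raw allocation and this struct */
	unsigned int flags;
	unsigned int handle;
};

static struct fake_qdisc *fake_qdisc_alloc(size_t priv_size)
{
	struct fake_qdisc *q;
	size_t size;
	void *p;

	/* Round the header up to the alignment, then leave slack so the
	 * start of the object can be shifted forward by up to 31 bytes.
	 */
	size = (sizeof(*q) + ALIGN_MASK) & ~ALIGN_MASK;
	size += priv_size + ALIGN_MASK;

	p = calloc(1, size);
	if (!p)
		return NULL;

	/* Round the raw pointer up to the next 32-byte boundary. */
	q = (struct fake_qdisc *)(((uintptr_t)p + ALIGN_MASK) &
				  ~(uintptr_t)ALIGN_MASK);
	q->padded = (int)((char *)q - (char *)p);
	return q;
}

static void fake_qdisc_free(struct fake_qdisc *q)
{
	/* Undo the shift to recover the pointer the allocator returned. */
	free((char *)q - q->padded);
}
```

Passing the aligned pointer straight to free() would hand the allocator an address it never returned whenever padded is non-zero, which is the case the err_out3 change guards against.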
+54
net/sched/sch_blackhole.c
···
+/*
+ * net/sched/sch_blackhole.c	Black hole queue
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Thomas Graf <tgraf@suug.ch>
+ *
+ * Note: Quantum tunneling is not supported.
+ */
+
+#include <linux/config.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <net/pkt_sched.h>
+
+static int blackhole_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	qdisc_drop(skb, sch);
+	return NET_XMIT_SUCCESS;
+}
+
+static struct sk_buff *blackhole_dequeue(struct Qdisc *sch)
+{
+	return NULL;
+}
+
+static struct Qdisc_ops blackhole_qdisc_ops = {
+	.id		= "blackhole",
+	.priv_size	= 0,
+	.enqueue	= blackhole_enqueue,
+	.dequeue	= blackhole_dequeue,
+	.owner		= THIS_MODULE,
+};
+
+static int __init blackhole_module_init(void)
+{
+	return register_qdisc(&blackhole_qdisc_ops);
+}
+
+static void __exit blackhole_module_exit(void)
+{
+	unregister_qdisc(&blackhole_qdisc_ops);
+}
+
+module_init(blackhole_module_init)
+module_exit(blackhole_module_exit)
+
+MODULE_LICENSE("GPL");