switch-linux/net/ipv4
Jon Maxwell 45caeaa5ac dccp/tcp: fix routing redirect race
As Eric Dumazet pointed out, this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/IPv6 dccp patches as well.

We have seen a few incidents lately where a dst_entry has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right, a crash then ensues when the
freed dst_entry is referenced later on. A common crashing back trace is:

  [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
  [] tcp_rcv_established at ffffffff81580b64
 [] tcp_v4_do_rcv at ffffffff8158b54a
 [] tcp_v4_rcv at ffffffff8158cd02
 [] ip_local_deliver_finish at ffffffff815668f4
 [] ip_local_deliver at ffffffff81566bd9
 [] ip_rcv_finish at ffffffff8156656d
 [] ip_rcv at ffffffff81566f06
 [] __netif_receive_skb_core at ffffffff8152b3a2
 [] __netif_receive_skb at ffffffff8152b608
 [] netif_receive_skb at ffffffff8152b690
 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
 [] net_rx_action at ffffffff8152bac2
 [] __do_softirq at ffffffff81084b4f
 [] call_softirq at ffffffff8164845c
 [] do_softirq at ffffffff81016fc5
 [] irq_exit at ffffffff81084ee5
 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

The freed dst_entry was found here:

static bool tcp_in_quickack_mode(struct sock *sk)
{
	const struct inet_connection_sock *icsk = inet_csk(sk);
	const struct dst_entry *dst = __sk_dst_get(sk);

	return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
		(icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
}

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry is added for each such host, which
creates more dst_entrys with lower reference counts and makes the problem
more probable.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the reference count of the same dst_entry cached in
sk->sk_dst_cache can be decremented twice for the same socket via:

do_redirect()->__sk_dst_check()->dst_release().

This leads to the dst_entry being freed prematurely while another socket
still points to it via sk->sk_dst_cache, and to a subsequent crash.
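
The double decrement described above can be replayed in a small user-space
model. This is only a sketch under stated assumptions: every toy_* name is
hypothetical and none of it is kernel code. The check is split into two steps
so that the unlucky interleaving (both contexts read the cached pointer before
either one clears it) can be reproduced deterministically without threads.

```c
#include <stddef.h>

/* Toy stand-in for a dst_entry: a reference count and an obsolete flag. */
struct toy_dst {
	int refcnt;
	int obsolete;
};

/* Toy stand-in for a socket: only the cached route pointer. */
struct toy_sock {
	struct toy_dst *dst_cache;
};

static void toy_dst_release(struct toy_dst *dst)
{
	dst->refcnt--;
}

/* Step 1 of the check: read the cached pointer and see whether it is stale. */
static struct toy_dst *toy_check_begin(struct toy_sock *sk)
{
	struct toy_dst *dst = sk->dst_cache;

	return (dst && dst->obsolete) ? dst : NULL;
}

/* Step 2 of the check: clear the cache and drop the socket's reference. */
static void toy_check_finish(struct toy_sock *sk, struct toy_dst *dst)
{
	if (!dst)
		return;
	sk->dst_cache = NULL;
	toy_dst_release(dst);
}

/*
 * Replays the race: two contexts both run step 1 on socket "a" before
 * either runs step 2. Returns the final reference count, or -1 if socket
 * "b" unexpectedly lost its cached pointer.
 */
static int toy_replay_race(void)
{
	struct toy_dst dst = { .refcnt = 2, .obsolete = 1 }; /* held by a and b */
	struct toy_sock a = { &dst }, b = { &dst };
	struct toy_dst *seen_by_user = toy_check_begin(&a);
	struct toy_dst *seen_by_softirq = toy_check_begin(&a);

	toy_check_finish(&a, seen_by_user);    /* drops a's reference */
	toy_check_finish(&a, seen_by_softirq); /* drops b's reference by mistake */

	return (b.dst_cache == &dst) ? dst.refcnt : -1;
}
```

In this model toy_replay_race() ends with a reference count of 0 although
socket b still caches the entry, which is the dangling-pointer state the
vmcores showed.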

To fix this, skip do_redirect() if user space has the socket locked. Instead,
let the redirect take place later when user space does not have the socket
locked.
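
The shape of that guard can be sketched in user space as follows. This is an
illustration only, not the patch itself: the toy_* names, the owned_by_user
flag and the two counters are assumptions introduced here to make the
behaviour observable.

```c
#include <stdbool.h>

/* Toy socket: an "owned by user space" flag plus counters for the demo. */
struct toy_tcp_sock {
	bool owned_by_user;     /* true while user space holds the socket lock */
	int redirects_run;      /* redirects that were acted on */
	int redirects_skipped;  /* redirects ignored because the lock was held */
};

/* Stand-in for the redirect handling that may release the cached route. */
static void toy_do_redirect(struct toy_tcp_sock *sk)
{
	sk->redirects_run++;
}

/* Error-handler path: only act on a redirect when user space is not
 * holding the socket lock; otherwise leave the cached route alone. */
static void toy_err_redirect(struct toy_tcp_sock *sk)
{
	if (!sk->owned_by_user)
		toy_do_redirect(sk);
	else
		sk->redirects_skipped++;
}
```

With owned_by_user set, toy_err_redirect() leaves the socket untouched; a
redirect that arrives after the flag is cleared is handled normally.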

The dccp/IPv6 code is very similar in this respect, so fix it there too.

As Eric Garver pointed out, the following commit now invalidates routes, which
can set the dst->obsolete flag so that ipv4_dst_check() returns NULL and
triggers the dst_release().

Fixes: ceb3320610 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-13 21:55:47 -07:00
netfilter lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
af_inet.c net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
ah4.c
arp.c
cipso_ipv4.c
datagram.c
devinet.c sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
esp4.c
esp4_offload.c
fib_frontend.c net: route: add missing nla_policy entry for RTA_MARK attribute 2017-03-01 10:25:56 -08:00
fib_lookup.h
fib_rules.c
fib_semantics.c
fib_trie.c lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
fou.c
gre_demux.c
gre_offload.c
icmp.c
igmp.c
inet_connection_sock.c net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
inet_diag.c
inet_fragment.c
inet_hashtables.c
inet_timewait_sock.c
inetpeer.c
ip_forward.c
ip_fragment.c
ip_gre.c
ip_input.c
ip_options.c
ip_output.c udp: avoid ufo handling on IP payload compression packets 2017-03-09 18:28:42 -08:00
ip_sockglue.c
ip_tunnel.c
ip_tunnel_core.c
ip_vti.c
ipcomp.c
ipconfig.c
ipip.c
ipmr.c lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
Kconfig
Makefile
netfilter.c netfilter: use skb_to_full_sk in ip_route_me_harder 2017-02-28 12:49:36 +01:00
ping.c
proc.c
protocol.c
raw.c
raw_diag.c
route.c ipv4: mask tos for input route 2017-02-26 11:03:38 -05:00
syncookies.c
sysctl_net_ipv4.c
tcp.c tcp: fix potential double free issue for fastopen_req 2017-03-02 14:05:41 -08:00
tcp_bbr.c
tcp_bic.c
tcp_cdg.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> 2017-03-02 08:42:27 +01:00
tcp_cong.c
tcp_cubic.c
tcp_dctcp.c
tcp_diag.c
tcp_fastopen.c
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp/dccp: block BH for SYN processing 2017-03-01 15:03:31 -08:00
tcp_ipv4.c dccp/tcp: fix routing redirect race 2017-03-13 21:55:47 -07:00
tcp_lp.c
tcp_metrics.c
tcp_minisocks.c tcp: account for ts offset only if tsecr not zero 2017-02-22 16:35:58 -05:00
tcp_nv.c
tcp_offload.c
tcp_output.c
tcp_probe.c
tcp_rate.c
tcp_recovery.c
tcp_scalable.c
tcp_timer.c tcp: fix various issues for sockets morphing to listen state 2017-03-07 13:58:33 -08:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tunnel4.c
udp.c
udp_diag.c
udp_impl.h
udp_offload.c
udp_tunnel.c
udplite.c
xfrm4_input.c
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c
xfrm4_output.c
xfrm4_policy.c
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c