Skip to content

Commit be0502a

Browse files
Florian Westphalummakynes
authored andcommitted
netfilter: conntrack: tcp: only close if RST matches exact sequence
TCP resets cause instant transition from established to closed state provided the reset is in-window. Endpoints that implement RFC 5961 require resets to match the next expected sequence number. RST segments that are in-window (but that do not match RCV.NXT) are ignored, and a "challenge ACK" is sent back. Main problem for conntrack is that its a middlebox, i.e. whereas an end host might have ACK'd SEQ (and would thus accept an RST with this sequence number), conntrack might not have seen this ACK (yet). Therefore we can't simply flag RSTs with non-exact match as invalid. This updates RST processing as follows: 1. If the connection is in a state other than ESTABLISHED, nothing is changed, RST is subject to normal in-window check. 2. If the RSTs sequence number either matches exactly RCV.NXT, connection state moves to CLOSE. 3. The same applies if the RST sequence number aligns with a previous packet in the same direction. In all other cases, the connection remains in ESTABLISHED state. If the normal-in-window check passes, the timeout will be lowered to that of CLOSE. If the peer sends a challenge ack, connection timeout will be reset. If the challenge ACK triggers another RST (RST was valid after all), this 2nd RST will match expected sequence and conntrack state changes to CLOSE. If no challenge ACK is received, the connection will time out after CLOSE seconds (10 seconds by default), just like without this patch. Packetdrill test case: 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 // Receive a segment. 0.210 < P. 1:1001(1000) ack 1 win 46 0.210 > . 1:1(0) ack 1001 // Application writes 1000 bytes. 0.250 write(4, ..., 1000) = 1000 0.250 > P. 1:1001(1000) ack 1001 // First reset, old sequence. Conntrack (correctly) considers this // invalid due to failed window validation (regardless of this patch). 0.260 < R 2:2(0) ack 1001 win 260 // 2nd reset, but too far ahead sequence. Same: correctly handled // as invalid. 0.270 < R 99990001:99990001(0) ack 1001 win 260 // in-window, but not exact sequence. // Current Linux kernels might reply with a challenge ack, and do not // remove connection. // Without this patch, conntrack state moves to CLOSE. // With patch, timeout is lowered like CLOSE, but connection stays // in ESTABLISHED state. 0.280 < R 1010:1010(0) ack 1001 win 260 // Expect challenge ACK 0.281 > . 1001:1001(0) ack 1001 win 501 // With or without this patch, RST will cause connection // to move to CLOSE (sequence number matches) // 0.282 < R 1001:1001(0) ack 1001 win 260 // ACK 0.300 < . 1001:1001(0) ack 1001 win 257 // more data could be exchanged here, connection // is still established // Client closes the connection. 0.610 < F. 1001:1001(0) ack 1001 win 260 0.650 > . 1001:1001(0) ack 1002 // Close the connection without reading outstanding data 0.700 close(4) = 0 // so one more reset. Will be deemed acceptable with patch as well: // connection is already closing. 0.701 > R. 1001:1001(0) ack 1002 win 501 // End packetdrill test case. With patch, this generates following conntrack events: [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED] [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED] [UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED] [UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED] [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED] Without patch, first RST moves connection to close, whereas socket state does not change until FIN is received. [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED] [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED] [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED] Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
1 parent f25a9b8 commit be0502a

File tree

1 file changed

+40
-10
lines changed

1 file changed

+40
-10
lines changed

net/netfilter/nf_conntrack_proto_tcp.c

Lines changed: 40 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -828,6 +828,12 @@ static noinline bool tcp_new(struct nf_conn *ct, const struct sk_buff *skb,
828828
return true;
829829
}
830830

831+
static bool nf_conntrack_tcp_established(const struct nf_conn *ct)
832+
{
833+
return ct->proto.tcp.state == TCP_CONNTRACK_ESTABLISHED &&
834+
test_bit(IPS_ASSURED_BIT, &ct->status);
835+
}
836+
831837
/* Returns verdict for packet, or -1 for invalid. */
832838
int nf_conntrack_tcp_packet(struct nf_conn *ct,
833839
struct sk_buff *skb,
@@ -1030,16 +1036,38 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
10301036
new_state = TCP_CONNTRACK_ESTABLISHED;
10311037
break;
10321038
case TCP_CONNTRACK_CLOSE:
1033-
if (index == TCP_RST_SET
1034-
&& (ct->proto.tcp.seen[!dir].flags & IP_CT_TCP_FLAG_MAXACK_SET)
1035-
&& before(ntohl(th->seq), ct->proto.tcp.seen[!dir].td_maxack)) {
1036-
/* Invalid RST */
1037-
spin_unlock_bh(&ct->lock);
1038-
nf_ct_l4proto_log_invalid(skb, ct, "invalid rst");
1039-
return -NF_ACCEPT;
1039+
if (index != TCP_RST_SET)
1040+
break;
1041+
1042+
if (ct->proto.tcp.seen[!dir].flags & IP_CT_TCP_FLAG_MAXACK_SET) {
1043+
u32 seq = ntohl(th->seq);
1044+
1045+
if (before(seq, ct->proto.tcp.seen[!dir].td_maxack)) {
1046+
/* Invalid RST */
1047+
spin_unlock_bh(&ct->lock);
1048+
nf_ct_l4proto_log_invalid(skb, ct, "invalid rst");
1049+
return -NF_ACCEPT;
1050+
}
1051+
1052+
if (!nf_conntrack_tcp_established(ct) ||
1053+
seq == ct->proto.tcp.seen[!dir].td_maxack)
1054+
break;
1055+
1056+
/* Check if rst is part of train, such as
1057+
* foo:80 > bar:4379: P, 235946583:235946602(19) ack 42
1058+
* foo:80 > bar:4379: R, 235946602:235946602(0) ack 42
1059+
*/
1060+
if (ct->proto.tcp.last_index == TCP_ACK_SET &&
1061+
ct->proto.tcp.last_dir == dir &&
1062+
seq == ct->proto.tcp.last_end)
1063+
break;
1064+
1065+
/* ... RST sequence number doesn't match exactly, keep
1066+
* established state to allow a possible challenge ACK.
1067+
*/
1068+
new_state = old_state;
10401069
}
1041-
if (index == TCP_RST_SET
1042-
&& ((test_bit(IPS_SEEN_REPLY_BIT, &ct->status)
1070+
if (((test_bit(IPS_SEEN_REPLY_BIT, &ct->status)
10431071
&& ct->proto.tcp.last_index == TCP_SYN_SET)
10441072
|| (!test_bit(IPS_ASSURED_BIT, &ct->status)
10451073
&& ct->proto.tcp.last_index == TCP_ACK_SET))
@@ -1055,7 +1083,7 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
10551083
* segments we ignored. */
10561084
goto in_window;
10571085
}
1058-
/* Just fall through */
1086+
break;
10591087
default:
10601088
/* Keep compilers happy. */
10611089
break;
@@ -1090,6 +1118,8 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
10901118
if (ct->proto.tcp.retrans >= tn->tcp_max_retrans &&
10911119
timeouts[new_state] > timeouts[TCP_CONNTRACK_RETRANS])
10921120
timeout = timeouts[TCP_CONNTRACK_RETRANS];
1121+
else if (unlikely(index == TCP_RST_SET))
1122+
timeout = timeouts[TCP_CONNTRACK_CLOSE];
10931123
else if ((ct->proto.tcp.seen[0].flags | ct->proto.tcp.seen[1].flags) &
10941124
IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED &&
10951125
timeouts[new_state] > timeouts[TCP_CONNTRACK_UNACK])

0 commit comments

Comments
 (0)