Skip to content

Commit

Permalink
tcp/dccp: try to not exhaust ip_local_port_range in connect()
Browse files Browse the repository at this point in the history
A long standing problem on busy servers is the tiny available TCP port
range (/proc/sys/net/ipv4/ip_local_port_range) and the default
sequential allocation of source ports in connect() system call.

If a host is having a lot of active TCP sessions, chances are
very high that all ports are in use by at least one flow,
and subsequent bind(0) attempts fail, or have to scan a big portion of
space to find a slot.

In this patch, I changed the starting point in __inet_hash_connect()
so that we try to favor even [1] ports, leaving odd ports for bind()
users.

We still perform a sequential search, so there is no guarantee, but
if connect() targets are very different, end result is we leave
more ports available to bind(), and we spread them all over the range,
lowering time for both connect() and bind() to find a slot.

This strategy only works well if /proc/sys/net/ipv4/ip_local_port_range
is even, ie if start/end values have different parity.

Therefore, default /proc/sys/net/ipv4/ip_local_port_range was changed to
32768 - 60999 (instead of 32768 - 61000)

There is no change on security aspects here, only some poor hashing
schemes could be eventually impacted by this change.

[1] : The odd/even property depends on ip_local_port_range values parity

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
Eric Dumazet authored and davem330 committed May 27, 2015
1 parent 837b995 commit 07f4c90
Show file tree
Hide file tree
Showing 3 changed files with 14 additions and 6 deletions.
8 changes: 5 additions & 3 deletions Documentation/networking/ip-sysctl.txt
Original file line number Diff line number Diff line change
Expand Up @@ -751,8 +751,10 @@ IP Variables:
ip_local_port_range - 2 INTEGERS
Defines the local port range that is used by TCP and UDP to
choose the local port. The first number is the first, the
second the last local port number. The default values are
32768 and 61000 respectively.
second the last local port number.
If possible, it is better these numbers have different parity.
(one even and one odd values)
The default values are 32768 and 60999 respectively.

ip_local_reserved_ports - list of comma separated ranges
Specify the ports which are reserved for known third-party
Expand All @@ -775,7 +777,7 @@ ip_local_reserved_ports - list of comma separated ranges
ip_local_port_range, e.g.:

$ cat /proc/sys/net/ipv4/ip_local_port_range
32000 61000
32000 60999
$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
8080,9148

Expand Down
2 changes: 1 addition & 1 deletion net/ipv4/af_inet.c
Original file line number Diff line number Diff line change
Expand Up @@ -1595,7 +1595,7 @@ static __net_init int inet_init_net(struct net *net)
*/
seqlock_init(&net->ipv4.ip_local_ports.lock);
net->ipv4.ip_local_ports.range[0] = 32768;
net->ipv4.ip_local_ports.range[1] = 61000;
net->ipv4.ip_local_ports.range[1] = 60999;

seqlock_init(&net->ipv4.ping_group_range.lock);
/*
Expand Down
10 changes: 8 additions & 2 deletions net/ipv4/inet_hashtables.c
Original file line number Diff line number Diff line change
Expand Up @@ -502,8 +502,14 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
inet_get_local_port_range(net, &low, &high);
remaining = (high - low) + 1;

/* By starting with offset being an even number,
* we tend to leave about 50% of ports for other uses,
* like bind(0).
*/
offset &= ~1;

local_bh_disable();
for (i = 1; i <= remaining; i++) {
for (i = 0; i < remaining; i++) {
port = low + (i + offset) % remaining;
if (inet_is_local_reserved_port(net, port))
continue;
Expand Down Expand Up @@ -547,7 +553,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
return -EADDRNOTAVAIL;

ok:
hint += i;
hint += (i + 2) & ~1;

/* Head lock still held and bh's disabled */
inet_bind_hash(sk, tb, port);
Expand Down

0 comments on commit 07f4c90

Please sign in to comment.