Strange behaviour of tokio tcp client #383

Closed
izderadicka opened this Issue May 30, 2018 · 11 comments

izderadicka commented May 30, 2018

I'm creating this issue as a result of this discussion on the Rust forum.

I have been playing a bit with tokio to see the changes in the latest version, especially the async fs interface and blocking code. I created a simple server which just sends a couple of lines of text, and then a very simple client. The problem is with the client: when run with many concurrent connections it sometimes gets stuck. By "stuck" I mean (for 1000 or 2000 concurrent connections; more connections hit the open files limit) that it receives the majority of messages and then stops. If I look at the open connections with lsof -i4 -itcp, it shows a number (~50) of connections in the ESTABLISHED state. The client stays stuck in that state forever, even after the server is closed. The problem does not occur on every client run, but in roughly 30-40% of cases.

To confirm that the problem is in the client, I did the following:

  1. Ran the client against a simple netcat loop - while true; do echo Hey | nc -l 12345; done. I did indeed get many connection refused errors, since netcat accepts only one connection and then restarts, but I was still able to see a similar problem: the client got stuck with several (this time fewer, 5-10) established connections pending.

  2. Created a similar simple client in Python asyncio (made as similar to the tokio client as possible). The Python client worked as expected: it never got stuck and ran smoothly (the tokio client sometimes paused for half a second or a second). For 1000 concurrent connections it never had an issue; for a bit more it only complained about the open files limit, but it never showed problems like those described above.

To continue with the weird story: initially the tokio client only received bytes. I modified it (now commented out in the code) to send a small initial message, and its behavior changed - I started to get connection refused errors (although there was no reason why the server should refuse them), even for a small number of concurrent connections like 10 - about 20% of connections were refused, and this grew slightly with the number of concurrent connections. And again I checked with the Python client - no problems there, even when it sent the same message.

So here is the code for the client:

extern crate tokio;
extern crate futures;
extern crate tokio_io;
#[macro_use]
extern crate lazy_static;

use std::env;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::net::SocketAddr;
use tokio::prelude::*;
use tokio::net::TcpStream;
use std::time::Instant;

const DEFAULT_CONNECTIONS: usize = 100;
const DEFAULT_SERVER: &str = "127.0.0.1:12345";

lazy_static! {
    static ref JOKES: AtomicUsize = AtomicUsize::new(0);
    static ref BYTES: AtomicUsize = AtomicUsize::new(0);
}

fn main() {
    let count: usize = env::args().nth(1)
        .map(|x| x.parse().unwrap_or(DEFAULT_CONNECTIONS))
        .unwrap_or(DEFAULT_CONNECTIONS);
    let server = env::args().nth(2).unwrap_or(DEFAULT_SERVER.into());
    let addr: SocketAddr = server.parse().unwrap();
    let mut rt = tokio::runtime::Runtime::new().unwrap();

    let start = Instant::now();
    for _i in 0..count {
        let client = TcpStream::connect(&addr)
            .map_err(|e| eprintln!("Connection Error: {:?}", e))
            .and_then(|socket| {
                // tokio::io::write_all(socket, b"hey\n\n")
                //     .map_err(|e| eprintln!("Write error: {}", e))
                //     .and_then(|(socket, _x)| {
                tokio::io::read_to_end(socket, vec![])
                    .map(|(_, v)| {
                        let prev = JOKES.fetch_add(1, Ordering::Relaxed);
                        BYTES.fetch_add(v.len(), Ordering::Relaxed);
                        println!("Got joke  {}", prev);
                    })
                    .map_err(|e| eprintln!("Read Error: {:?}", e))
                // })
            });
        rt.spawn(client);
    }

    rt.shutdown_on_idle().wait().unwrap();

    let dur = start.elapsed();

    println!(
        "FINISHED - jokes {}, bytes {}, duration {}.{:03}",
        JOKES.load(Ordering::Relaxed),
        BYTES.load(Ordering::Relaxed),
        dur.as_secs(),
        dur.subsec_nanos() / 1_000_000
    );
}
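
For reference, the variant that sends a small initial message (mentioned above and left commented out in the listing) differs only in the body of the loop; with the write_all uncommented it reads roughly:

let client = TcpStream::connect(&addr)
    .map_err(|e| eprintln!("Connection Error: {:?}", e))
    .and_then(|socket| {
        // Send a short greeting first, then read until the server closes the socket.
        tokio::io::write_all(socket, b"hey\n\n")
            .map_err(|e| eprintln!("Write error: {}", e))
            .and_then(|(socket, _)| {
                tokio::io::read_to_end(socket, vec![])
                    .map(|(_, v)| {
                        let prev = JOKES.fetch_add(1, Ordering::Relaxed);
                        BYTES.fetch_add(v.len(), Ordering::Relaxed);
                        println!("Got joke  {}", prev);
                    })
                    .map_err(|e| eprintln!("Read Error: {:?}", e))
            })
    });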

I've tried with both stable and nightly rustc, and tokio 0.1.6. My platform is 64-bit Linux (Ubuntu 16.04). For reference, the whole project is on GitHub here. Here is the server - the server logic is in the first function, prepare_server - and the Python reference client is also there, in python/client.py.
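
The actual server code is in that repository; purely to illustrate the overall shape described here (accept a connection, write a couple of lines of text, drop the stream - not the literal prepare_server), a minimal tokio 0.1 server could look like this:

extern crate tokio;

use tokio::net::TcpListener;
use tokio::prelude::*;

fn main() {
    let addr = "127.0.0.1:12345".parse().unwrap();
    let listener = TcpListener::bind(&addr).unwrap();

    let server = listener
        .incoming()
        .map_err(|e| eprintln!("Accept error: {:?}", e))
        .for_each(|socket| {
            // Write a couple of lines of text, then drop the stream, which
            // closes the connection from the server side.
            let send = tokio::io::write_all(socket, b"line one\nline two\n\n")
                .map(|_| ())
                .map_err(|e| eprintln!("Write error: {:?}", e));
            tokio::spawn(send);
            Ok(())
        });

    tokio::run(server);
}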

Also, based on the discussion with Vitaly, we found:

  • If the code is amended with trace level logging, the problem disappears and the client runs faster
  • I also tried to recreate the client with tokio-core - basically the same problem

tanriol commented May 30, 2018

To continue with the weird story: initially the tokio client only received bytes. I modified it (now commented out in the code) to send a small initial message, and its behavior changed - I started to get connection refused errors (although there was no reason why the server should refuse them), even for a small number of concurrent connections like 10 - about 20% of connections were refused, and this grew slightly with the number of concurrent connections.

Was it really "connection refused"? Trying to reproduce this part, I see a number of (expected) "connection reset" errors, but no "refused"...

tanriol commented May 30, 2018

To me this looks like (possibly unexpected) behavior of the Linux network stack. These connections are successfully established (SYN, SYN-ACK, ACK), but never actually returned from accept. Also, increasing /proc/sys/net/core/somaxconn made this effect disappear. The Python client is likely just too slow to trigger this effect.

tanriol commented May 30, 2018

TL;DR: This is probably not a Tokio bug, but some interacting kernel features.

First, there's the socket backlog parameter that's passed to listen. For mio and thus tokio it's 1024. However, the /proc/sys/net/core/somaxconn variable is a hard cap on this number and it's 128 by default, so the backlog is limited to 128.

Second, when the backlog is reached, according to @veithen's post, the final ACK packet of the 3-way handshake is ignored. Normally this should result in the server resending the SYN-ACK packet several times with exponential backoff (the client cannot do anything, as from its point of view everything is fine: the connection is established and it is waiting for data).

Third, if this batch of SYN packets triggers the SYN cookies protection against SYN flooding, the server does not actually remember the SYN-ACK packets it sends and, obviously, never retries them, so it cannot fix this connection either. Thus the connection becomes half-open from the client's side and unknown on the server's side.

Either increasing somaxconn to avoid the drops or disabling SYN cookies is enough to avoid the hanging connections problem (but possibly causes other problems).
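
For completeness: if an application did want to request a larger backlog itself (still capped by somaxconn), one way with tokio 0.1 would be to build the listener via the net2 crate and hand it over with from_std. A rough, untested sketch, assuming a net2 dependency:

extern crate net2;
extern crate tokio;

use net2::TcpBuilder;
use std::net::SocketAddr;
use tokio::net::TcpListener;
use tokio::reactor::Handle;

// Bind with an explicit backlog request; the kernel still caps the effective
// value at /proc/sys/net/core/somaxconn.
fn listener_with_backlog(addr: &SocketAddr, backlog: i32) -> std::io::Result<TcpListener> {
    let std_listener = TcpBuilder::new_v4()?
        .reuse_address(true)?
        .bind(addr)?
        .listen(backlog)?;
    std_listener.set_nonblocking(true)?;
    TcpListener::from_std(std_listener, &Handle::default())
}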

vitalyd commented May 31, 2018

@tanriol, I won't rule out some (unexpected) kernel behavior, particularly since this is going over loopback, but:

  1. The client connections that are awaiting the completion of the TCP handshake should be accepted - the server here is doing minimal work, spread over multiple executor threads and a dedicated I/O thread (i.e. the stock tokio Runtime configuration). On the server, the client TcpStream is dropped as soon as the socket write is done, and the amount of data written is tiny. So IMO there's no good reason not to be able to handle a few thousand concurrent connections under these circumstances.
  2. As mentioned, adding TRACE level logging reliably makes the scenario complete, and it completes faster(!) than without this logging, even in cases where the original exercise completes without hiccups. This suggests some sort of timing change, although it's theoretically possible that the logging code has side effects (almost certainly unintentional). If anything, TRACE level logging should slow down the server, and a backlog of connections/IO should grow.
  3. The Python client completes without issue, although I'll take @izderadicka's word for it as I didn't try it myself.

One thing worth trying, for anyone with time and motivation, is to write a "raw" mio client and see what happens there.
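
For anyone picking that up, a rough, untested sketch of a single-connection "raw" mio (0.6) client loop might look like the code below; an actual reproduction would register many streams under distinct tokens in the same way:

extern crate mio;

use mio::net::TcpStream;
use mio::{Events, Poll, PollOpt, Ready, Token};
use std::io::{ErrorKind, Read};

fn main() {
    let addr = "127.0.0.1:12345".parse().unwrap();
    let mut stream = TcpStream::connect(&addr).expect("connect failed");

    let poll = Poll::new().expect("poll creation failed");
    poll.register(&stream, Token(0), Ready::readable(), PollOpt::edge())
        .expect("register failed");

    let mut events = Events::with_capacity(64);
    let mut buf = [0u8; 4096];
    let mut total = 0usize;

    'outer: loop {
        poll.poll(&mut events, None).expect("poll failed");
        for _event in events.iter() {
            // Edge-triggered readiness: drain the socket until it would block.
            loop {
                match stream.read(&mut buf) {
                    Ok(0) => break 'outer, // server closed the connection
                    Ok(n) => total += n,
                    Err(ref e) if e.kind() == ErrorKind::WouldBlock => break,
                    Err(e) => panic!("read error: {:?}", e),
                }
            }
        }
    }
    println!("received {} bytes", total);
}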

izderadicka commented May 31, 2018

@tanriol - re: connection reset - you're right, the actual error (when the client also sends a few bytes) is
Read Error: Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" }

I messed it up when putting together the report, sorry.

tanriol commented May 31, 2018

@vitalyd A few more observations that I've based the idea above on. Note that we're talking not just about a few thousand concurrent connections, but about a burst of a thousand connection requests within some tens or at most hundreds of milliseconds.

  • I've had an explicit message in the system log:

TCP: request_sock_TCP: Possible SYN flooding on port 12345. Sending cookies. Check SNMP counters.

  • Monitoring the loopback with Wireshark showed that, on the lost connections, the only data exchange happening is the TCP three-way handshake (all three packets). After the handshake is complete, waiting for data is the expected state of the client. Notably, no FIN or RST happens either, so the socket is never "officially" closed from the server side, even when the server is stopped.

  • For a normal loopback connection (socat-to-socat) the ss utility shows both sides of the connection (both sockets) independently. For the lost connections, only the client socket is shown. Furthermore, the socket staying ESTABLISHED with one endpoint missing is already an anomalous behavior.

  • Monitoring the server socket calls with strace shows that, for an observed hanging connection from source port S, no connection from port S was ever accepted. After the three-way handshake, the server should have the socket in the listen queue and accept it from there, but the connection is not accepted (even though the incoming connections queue is correctly drained as starting one more connection afterwards works correctly).

The timing of the exchanges also looks significant. Here I'm looking at the packet log in Wireshark. There's the initial connection surge that, on my PC, takes about 84 ms, and then a batch of SYN retransmissions (so the original SYNs of these connections have been dropped somewhere) and the corresponding exchanges at 1.05 to 1.09 s after start. If SYN cookies are allowed, that's all; if they are disabled, I also see further retransmission batches at 3.095 s, 7.148 s and 15.3 s (possibly even later ones, but in my case this was enough for the connection queue to drain successfully). If trace logging is added to and enabled in the client so that the SYN requests are more spread out, no drops occur and no retransmissions are needed, and the client finishes in some 300 ms or so; in this case it instead waits a whole second for the first retransmission wave. That's probably also the reason the Python client is actually faster in wall time -- it's too slow to trigger the drops.

@izderadicka Connection reset is what you should receive if you send data to the server and the server closes its side of the connection without reading the data you sent from the socket.

izderadicka commented May 31, 2018

Increasing somaxconn from 128 (the default) to 1024 helped on my side - now the client does not hang and works fast up to 2000 connections, where it starts to complain about too many open files, which is expected. Even in those cases it's fast.
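
As an aside, a benchmark client could read the current cap itself and warn when the requested connection burst exceeds it; a hypothetical helper (not part of the original project):

use std::fs;

// Read the current net.core.somaxconn value; returns None if the file cannot
// be read or parsed (e.g. on non-Linux systems).
fn somaxconn() -> Option<usize> {
    fs::read_to_string("/proc/sys/net/core/somaxconn")
        .ok()
        .and_then(|s| s.trim().parse().ok())
}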

The explanation by @tanriol makes sense to me (although I'm no TCP expert) - it's a combination of the server socket dropping connections from the backlog and the SYN flood protection - I also had these TCP: request_sock_TCP: Possible SYN flooding on port 12345. Sending cookies. Check SNMP counters. messages. Under normal circumstances one client is not expected to make 1000 connections in a fraction of a second (unless it's benchmarking a server or attempting a DoS attack).

Now (with increased somaxconn) the client works reliably - 100 connections in 10-30 ms, 1000 connections in 70-100 ms, 2000 connections in 150-200 ms.

So the behavior that looked weird to me makes some sense now, and it's a matter of the kernel. I also understand now why connections are reset when the client sends a short message that the server does not read - the server manages to send its part and close the connection before the client finishes sending its part (even when it is just a few bytes).

Thanks guys for educating me. I guess the issue can be closed, unless you feel you can do something about it?

vitalyd commented May 31, 2018

@tanriol, I'm convinced - terrific investigative work! 👍

carllerche (Member) commented May 31, 2018

That does indeed seem like it was super fun to debug.

Are there any steps for Tokio to take at this point?

tanriol commented May 31, 2018

Are there any steps for Tokio to take at this point?

I see none; after all, this is an aspect of the OS settings, not something individual applications should mess with by default.

carllerche (Member) commented May 31, 2018

Thanks for debugging again 🌮

carllerche closed this May 31, 2018
