Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Successive HTTP requests in lws 2.4 exhausting server ah pool #1249

Closed
rdumusc opened this issue Apr 23, 2018 · 20 comments
Closed

Successive HTTP requests in lws 2.4 exhausting server ah pool #1249

rdumusc opened this issue Apr 23, 2018 · 20 comments

Comments

@rdumusc
Copy link

rdumusc commented Apr 23, 2018

We are facing an issue with lws 2.4 in unit tests that validate the HTTP client-server communication of our C++ wrapper library. Our tests create a lws client-server pair to make a succession of simple HTTP requests (no SSL or http/2) from the client to the server.

Until lws 2.3, the test cases execute extremely fast (<< 1 sec). But since precisely this commit introducing http2 support: 904a9c0 , execution is very slow (~1 min) and the observed behavior of the unit tests is something like the following:

----BEGIN TEST------
GET /url
GET /registry
GET /ABC
GET /
GET //
--- progression stops for a few seconds, looping around lws_service() for both client and server until the HTTP connection is established ---
GET /abc/def/
etc...
----END TEST------

This keeps happening with v2.4.1, v2.4.2 and v2.4-stable.

Enabling libwebsocket's log output gave a useful indication:

lws_adopt_descriptor_vhost: waiting for ah

Indeed, it turns out that the ah pool of the lws server become exhausted after max_http_header_pool (default=5) previous HTTP requests have been made by the client. Execution continues after some internal timeout (?) frees the resources of previous connections.

More debugging allowed us to locate the exact change causing the connections to no longer be closed. It is the following 4 lines (2610-2613) in lib/server.c: 904a9c0#diff-9621bd8daa5d0a52d5af514af312fbaaR2610
Reverting back to the previous behavior of "goto fail;" here fixes the problem; Although we don't know enough about libwebsockets internals to suggest this as a proper fix.

As a workaround, we observed that forcing a return -1; from LWS_CALLBACK_HTTP_WRITEABLE in the server code also fixes the problem. However, bypassing the lws_http_transaction_completed() check as is done in test-server-http.c is unlikely to be the correct solution. We were otherwise unable to identify an appropriate change in the client code based on the existing examples and documentation. Even recreating the client context with
lws_context_destroy() + lws_create_context() before each lws_client_connect_via_info() does not make a difference.

So the question is:
a) Could the change in behavior described above be caused by a regression in libwebsockets, or:
b) Is there anything specific to do in either the client or server code to properly close the HTTP connection(s) starting from v2.4?

@lws-team
Copy link
Member

What happens if you just increase the amount of ah available with max_http_header_pool to suit your situation?

4 (the default) has no special meaning.

	short max_http_header_pool;
	/**< CONTEXT: The max number of connections with http headers that
	 * can be processed simultaneously (the corresponding memory is
	 * allocated for the lifetime of the context).  If the pool is
	 * busy new incoming connections must wait for accept until one
	 * becomes free. */

@rdumusc
Copy link
Author

rdumusc commented Apr 23, 2018

Increasing the max_http_header_pool to a value bigger than the number of requests works around the problem as well. But I think that's just hiding the fact that resources are somehow not being freed up correctly.

@lws-team
Copy link
Member

I see. Well, I will try to reproduce it tomorrow. attack.sh which is in the tree does pretty much what you describe and I tested v2.4 with that without problems before its original release. What are you using to do the tests, also lws or something else?

@rdumusc
Copy link
Author

rdumusc commented Apr 23, 2018

Thanks for your help. Yes, the unit tests are using lws for both the client and server side. This can be part of the reason, but your question got me thinking. I just tested again our server inside a more complex application with Firefox as a client, and observed similar issues of ah pool exhaustion (since lws 2.4 only).

I'm sure that the reference lws server has been well tested so there must be a subtle difference in our server code that leads to this behavior since v2.4; Unfortunately, I've been looking into this for a long time already and I really can't figure out what that could be.

In case you would like to run the tests I've been using:

git clone -b debugging --recursive https://github.com/rdumusc/Rockets.git
mkdir Rockets/build
cd Rockets/build
cmake ..
make http-slow
./tests/http-slow

@lws-team
Copy link
Member

Hm the fact there's no logging from lws cited and if I understand it, no problem with lws test apps either, along with doing bisect-type a-b testing on lws as "the problem" without actually identifying what the problem really is, reminds me of other 'issues' where the problem turned out to be the cut-and-pasted http callback code from lws samples in the user app doesn't deal with the end of http transactions properly.

Http2 requires this to work because a single connection may carry many ongoing and completed transactions. Even http/1.1 pipelining needs it to work.

In v2.4 there's a 'dummy' callback exported that you can either use with mounts and rely on completely to get this boilerplate right for you, or have your callback call through to. If this code came from older lws, you should review it against the http stuff in the dummy calkback in lib/context.c and / or the same code in the v2.4 test app.

@rdumusc
Copy link
Author

rdumusc commented Apr 23, 2018

Fair enough, I'll try and write down a simpler reproducer from scratch this week. Hopefully it will turn out that I missed something obvious. Thank you for the pointers, but excluding the CGI/PROXY parts (which we're not using), the dummy callback does little else than:

case LWS_CALLBACK_HTTP:
	if (lws_return_http_status(wsi, HTTP_STATUS_NOT_FOUND, NULL))
		return -1;
	if (lws_http_transaction_completed(wsi))
		return -1;
	break;

I'm not seeing any relevant changes in the examples for v2.4 either yet, but I'll keep looking.

@lws-team
Copy link
Member

...and...

#if !defined(LWS_NO_SERVER)
	case LWS_CALLBACK_HTTP_FILE_COMPLETION:
		if (lws_http_transaction_completed(wsi))
			return -1;
		break;
#endif

https://github.com/warmcat/libwebsockets/blob/master/lib/context.c#L301

@rdumusc
Copy link
Author

rdumusc commented Apr 24, 2018

Correct, I checked that as well but this callback is never occurring as we're not using the file serve built-in functionality.

The http_callback is the following:

static int callback_http(lws* wsi, const lws_callback_reasons reason,
                         void* user, void* in, const size_t len)
{
    std::cerr << to_string(reason) << std::endl;

    // Protocol may be null during the initial callbacks
    if (auto protocol = lws_get_protocol(wsi))
    {
        auto impl = static_cast<Server::Impl*>(protocol->user);
        const auto& handler = impl->handler;

        switch (reason)
        {
        case LWS_CALLBACK_HTTP:
            impl->connections.emplace(wsi,
                                      http::Connection{wsi, (const char*)in});
            return handler.handleNewRequest(impl->connections.at(wsi));
        case LWS_CALLBACK_HTTP_BODY:
            return handler.handleData(impl->connections.at(wsi),
                                      (const char*)in, len);
        case LWS_CALLBACK_HTTP_BODY_COMPLETION:
            return handler.respondToRequest(impl->connections.at(wsi));

        case LWS_CALLBACK_HTTP_WRITEABLE:
            if (impl->connections.count(wsi))
                return handler.respondToRequest(impl->connections.at(wsi));
            break;

#if LWS_LIBRARY_VERSION_NUMBER >= 2001000
        case LWS_CALLBACK_HTTP_DROP_PROTOCOL: // fall-through
#endif
        case LWS_CALLBACK_CLOSED_HTTP:
            impl->connections.erase(wsi);
            break;

        case LWS_CALLBACK_ADD_POLL_FD:
            impl->pollDescriptors.add(static_cast<lws_pollargs*>(in));
            break;
        case LWS_CALLBACK_DEL_POLL_FD:
            impl->pollDescriptors.remove(static_cast<lws_pollargs*>(in));
            break;
        case LWS_CALLBACK_CHANGE_MODE_POLL_FD:
            impl->pollDescriptors.update(static_cast<lws_pollargs*>(in));
            break;
        default:
            break;
        }
    }
    return lws_callback_http_dummy(wsi, reason, user, in, len);
}

Note that the POLL_FD part is irrelevant in this unit test case and can be commented out. All the functions for the HTTP callbacks (except HTTP_BODY) return -1 on lws_http_transaction_completed() after writing the response. Except that since v2.4, lws_http_transaction_completed() no longer closes the connection.

For what it's worth, here is a diff between the output of a single test testing the [GET, POST, PUT, PATCH, DELETE, OPTIONS] methods in order, comparing v2.3-stable and v2.4-stable:
http://www.mergely.com/X5Q5SPvs/

But I'll try to work on a minimal, self-contained code example that can be used to debug this.

@lws-team
Copy link
Member

If it works on the test apps, it sounds like it requires something specifically your code is doing different.

@lws-team
Copy link
Member

I'll be happy to look at a sample showing the problem.

@rdumusc
Copy link
Author

rdumusc commented Apr 25, 2018

Sure, please see below. Hopefully you can point me in the right direction, because I'm at a loss for ideas. Here is a minimal HTTP server that I put together by looking at test-server-http.c and the newer minimal-http-server examples.
If you open http://localhost:7681 in your browser, it returns an html index page which then loads 6 images using javascript. This example works as expected with lws 2.3-stable. If you try to "attack" it by refreshing the page aggressively (holding down <SHIFT>+<F5>), it reloads in < 5ms an infinite number of times. If you do the same with lws 2.4-stable, the server gets rapidly blocked, and the page takes 5-10s to reload.

#include <libwebsockets.h>
#include <signal.h>
#include <string.h>

static int interrupted;

const char *htmlPage{
    R"HTMLCONTENT(<html>
    <head>
        <meta charset="UTF-8">
        <title>Minimal Page</title>
        <script>
            function getImage(targetImg, sourceUrl)
            {
                var req = new XMLHttpRequest();
                req.onreadystatechange = function() {
                    if (req.readyState == 4 && req.status == 200) {
                        document.getElementById(targetImg).src = req.responseText;
                    }
                };
                req.open("GET", sourceUrl, true); // true for asynchronous
                req.send(null);
            }
            function getImages()
            {
                getImage("pngImg1", "img1.png");
                getImage("pngImg2", "img2.png");
                getImage("pngImg3", "img3.png");
                getImage("pngImg4", "img1.png");
                getImage("pngImg5", "img2.png");
                getImage("pngImg6", "img3.png");
            }
        </script>
    </head>
    <body onload="getImages()">
        <h1>Minimal Page</h1>
        <img id="pngImg1" />
        <img id="pngImg2" />
        <img id="pngImg3" />
        <img id="pngImg4" />
        <img id="pngImg5" />
        <img id="pngImg6" />
    </body>
</html>
)HTMLCONTENT"};

const char *pngImageBlue =
    "data:image/"
    "png;base64,"
    "iVBORw0KGgoAAAANSUhEUgAAAGQAAABkCAYAAABw4pVUAAAAnUlEQVR42u3RAQ0AAAgDIG1m/"
    "1K3hnNQgZ5KijNaiBCECEGIEIQIQYgQIUIQIgQhQhAiBCFCEIIQIQgRghAhCBGCEIQIQYgQhAh"
    "BiBCEIEQIQoQgRAhChCAEIUIQIgQhQhAiBCEIEYIQIQgRghAhCEGIEIQIQYgQhAhBCEKEIEQIQ"
    "oQgRAhCECIEIUIQIgQhQhAiRIgQhAhBiBCEfLc/reCdqegczgAAAABJRU5ErkJggg==";

const char *pngImageGreen =
    "data:image/"
    "png;base64,"
    "iVBORw0KGgoAAAANSUhEUgAAAGQAAABkCAYAAABw4pVUAAAAnUlEQVR42u3RAQ0AAAgDIO1q/"
    "wq3hnNQgZ5UijNaiBCECEGIEIQIQYgQIUIQIgQhQhAiBCFCEIIQIQgRghAhCBGCEIQIQYgQhAh"
    "BiBCEIEQIQoQgRAhChCAEIUIQIgQhQhAiBCEIEYIQIQgRghAhCEGIEIQIQYgQhAhBCEKEIEQIQ"
    "oQgRAhCECIEIUIQIgQhQhAiRIgQhAhBiBCEfLc7GO6tdYHtTQAAAABJRU5ErkJggg==";
const char *pngImageRed =
    "data:image/png;base64,"
    "iVBORw0KGgoAAAANSUhEUgAAAGQAAABkCAYAAABw4pVUAAAApElEQVR42u3RAQ0AAAjDMO5fNC"
    "CDkC5z0HTVrisFCBABASIgQAQEiIAAAQJEQIAICBABASIgQAREQIAICBABASIgQAREQIAICBAB"
    "ASIgQAREQIAICBABASIgQAREQIAICBABASIgQAREQIAICBABASIgQAREQIAICBABASIgQAREQI"
    "AICBABASIgQAREQIAICBABASIgQAQECBAgAgJEQIAIyPcGFY7HnV2aPXoAAAAASUVORK5CYII="
    "";

int getContentIndex(const char *in)
{
    if (!strcmp((const char *)in, "/img1.png"))
        return 1;
    if (!strcmp((const char *)in, "/img2.png"))
        return 2;
    if (!strcmp((const char *)in, "/img3.png"))
        return 3;
    return 0;
}

static int callback_http(struct lws *wsi, enum lws_callback_reasons reason,
                         void *user, void *in, size_t len)
{
    switch (reason)
    {
    case LWS_CALLBACK_HTTP:
    {
        uint8_t buf[LWS_PRE + 256];
        uint8_t *start = &buf[LWS_PRE];
        uint8_t *p = start;
        uint8_t *end = &buf[sizeof(buf) - 1];

        lwsl_user("lws_http_serve: %s\n", (const char *)in);

        if (!lws_hdr_total_length(wsi, WSI_TOKEN_GET_URI))
            /* not a GET */
            break;

        /* Tell writable callback what content to send. */
        static_cast<int *>(user)[0] = getContentIndex((const char *)in);

        if (!strcmp((const char *)in, "/"))
        {
            /* Write headers */
            if (lws_add_http_header_status(wsi, HTTP_STATUS_OK, &p, end))
                return 1;

            if (lws_add_http_header_by_token(wsi, WSI_TOKEN_HTTP_CONTENT_TYPE,
                                             (unsigned char *)"text/html", 9,
                                             &p, end))
                return 1;

            if (lws_add_http_header_content_length(wsi, strlen(htmlPage), &p,
                                                   end))
                return 1;

            if (lws_finalize_http_header(wsi, &p, end))
                return -1;

            const int n =
                lws_write(wsi, start, p - start,
                          LWS_WRITE_HTTP_HEADERS /*| LWS_WRITE_H2_STREAM_END*/);
            if (n < 0)
                return -1;

            lws_callback_on_writable(wsi);
            return 0;
        }
        else if (static_cast<int *>(user)[0])
        {
            /* Write headers */
            if (lws_add_http_header_status(wsi, HTTP_STATUS_OK, &p, end))
                return 1;

            if (lws_add_http_header_by_token(
                    wsi, WSI_TOKEN_HTTP_CONTENT_TYPE,
                    (unsigned char *)"application/base64", 18, &p, end))
                return 1;

            if (lws_add_http_header_content_length(wsi, strlen(pngImageBlue),
                                                   &p, end))
                return 1;

            if (lws_finalize_http_header(wsi, &p, end))
                return -1;

            const int n =
                lws_write(wsi, start, p - start,
                          LWS_WRITE_HTTP_HEADERS /*| LWS_WRITE_H2_STREAM_END*/);
            if (n < 0)
                return -1;

            lws_callback_on_writable(wsi);
            return 0;
        }
        /* not our URL, let lws_callback_http_dummy() return 404 */
    }
    break;

    case LWS_CALLBACK_HTTP_WRITEABLE:
    {
        uint8_t buffer[LWS_PRE + 8192];
        uint8_t *start = &buffer[LWS_PRE];
        size_t length = 0;

        switch (static_cast<int *>(user)[0])
        {
        case 0:
            length = strlen(htmlPage);
            memcpy(start, htmlPage, strlen(htmlPage));
            break;
        case 1:
            length = strlen(pngImageBlue);
            memcpy(start, pngImageBlue, length);
            break;
        case 2:
            length = strlen(pngImageGreen);
            memcpy(start, pngImageGreen, length);
            break;
        case 3:
            length = strlen(pngImageRed);
            memcpy(start, pngImageRed, length);
            break;
        default:
            lwsl_err("invalid content index\n");
            return -1;
        }

        /* Write body */
        const int n = lws_write(wsi, start, length, LWS_WRITE_HTTP);
        if (n < 0)
        {
            lwsl_err("write failed\n");
            /* write failed, close conn */
            return -1;
        }
        if (lws_http_transaction_completed(wsi))
            return -1;
        return 0;
    }
    break;

    default:
        break;
    }

    return lws_callback_http_dummy(wsi, reason, user, in, len);
}

static struct lws_protocols protocols[] = {
    {"http", callback_http, 4, 0, 0, 0, 0},
    {NULL, NULL, 0, 0, 0, 0, 0} /* terminator */
};

void sigint_handler(int /*sig*/)
{
    interrupted = 1;
}

int main(int /*argc*/, const char ** /*argv*/)
{
    struct lws_context_creation_info info;
    struct lws_context *context;
    int n = 0;
    int logs = LLL_USER | LLL_ERR | LLL_WARN | LLL_NOTICE /*| LLL_INFO*/;

    signal(SIGINT, sigint_handler);

    lws_set_log_level(logs, NULL);
    lwsl_user("LWS minimal http server GET | visit http://localhost:7681\n");

    memset(&info, 0, sizeof info); /* otherwise uninitialized garbage */
    info.port = 7681;
    info.protocols = protocols;
    info.max_http_header_pool = 4;

    context = lws_create_context(&info);
    if (!context)
    {
        lwsl_err("lws init failed\n");
        return 1;
    }

    while (n >= 0 && !interrupted)
        n = lws_service(context, 1000);

    lws_context_destroy(context);

    return 0;
}

@rdumusc
Copy link
Author

rdumusc commented May 3, 2018

@lws-team have you had a chance to look at the test code I posted above?

@lws-team
Copy link
Member

lws-team commented May 4, 2018

Sorry these things eat up a lot of time to understand.

So your test app is a bit unusual... I guess you are trying to confirm the ah wait queue processing which is a Good Thing. But your test procedure doesn't do what you think it does.

  1. you restrict the number to available ah to exactly 4. In the testing here to make it more clear what it happening, I restrict it to 1.

  2. your XHR js then forces 6 separate simultaneous http/1 connections to the server, in addition to the html one. Again to make the situation clearer I reduced this to 1 in your js.

With Chrome 67, hitting Ctrl-Shift R:

  1. He makes 5 x tcp connections to the server within a few 100us... that's not what you thought was happening, right :-)
[2018/05/04 10:05:55:9714] INFO: lws_get_or_create_peer: creating new peer
[2018/05/04 10:05:55:9714] INFO: lws_header_table_attach: wsi 0xfc60c0: ah (nil) (tsi 0, count = 0) in
[2018/05/04 10:05:55:9715] INFO: _lws_create_ah: created ah 0xfc6580 (size 4096): pool length 1
[2018/05/04 10:05:55:9715] INFO: lws_header_table_attach: did attach wsi 0xfc60c0: ah 0xfc6580: count 1 (on exit)
[2018/05/04 10:05:55:9715] INFO: rops_handle_POLLIN_listen: new wsi 0xfc60c0: wsistate 0x20000015, role_ops h1
[2018/05/04 10:05:55:9715] INFO: lws_header_table_attach: wsi 0xfc7960: ah (nil) (tsi 0, count = 1) in
[2018/05/04 10:05:55:9715] INFO: _lws_header_ensure_we_are_on_waiting_list: wsi: 0xfc7960
[2018/05/04 10:05:55:9715] INFO: lws_adopt_descriptor_vhost: waiting for ah
[2018/05/04 10:05:55:9715] INFO: rops_handle_POLLIN_listen: new wsi 0xfc7960: wsistate 0x20000015, role_ops h1
[2018/05/04 10:05:55:9716] INFO: lws_header_table_attach: wsi 0xfc7e20: ah (nil) (tsi 0, count = 1) in
[2018/05/04 10:05:55:9716] INFO: _lws_header_ensure_we_are_on_waiting_list: wsi: 0xfc7e20
[2018/05/04 10:05:55:9716] INFO: lws_adopt_descriptor_vhost: waiting for ah
[2018/05/04 10:05:55:9717] INFO: rops_handle_POLLIN_listen: new wsi 0xfc7e20: wsistate 0x20000015, role_ops h1
[2018/05/04 10:05:55:9718] INFO: lws_header_table_attach: wsi 0xfc82e0: ah (nil) (tsi 0, count = 1) in
[2018/05/04 10:05:55:9718] INFO: _lws_header_ensure_we_are_on_waiting_list: wsi: 0xfc82e0
[2018/05/04 10:05:55:9718] INFO: lws_adopt_descriptor_vhost: waiting for ah
[2018/05/04 10:05:55:9718] INFO: rops_handle_POLLIN_listen: new wsi 0xfc82e0: wsistate 0x20000015, role_ops h1
[2018/05/04 10:05:55:9719] INFO: lws_header_table_attach: wsi 0xfc87a0: ah (nil) (tsi 0, count = 1) in
[2018/05/04 10:05:55:9719] INFO: _lws_header_ensure_we_are_on_waiting_list: wsi: 0xfc87a0
[2018/05/04 10:05:55:9719] INFO: lws_adopt_descriptor_vhost: waiting for ah
[2018/05/04 10:05:55:9719] INFO: rops_handle_POLLIN_listen: new wsi 0xfc87a0: wsistate 0x20000015, role_ops h1

Then a request comes on wsi 0xfc60c0 (source port localhost.50096)

[2018/05/04 10:05:55:9738] INFO: lws_handshake_server: parsed count 426
[2018/05/04 10:05:55:9738] INFO: lws_select_vhost: vhost match to default based on port 7681
[2018/05/04 10:05:55:9738] INFO: lws_handshake_server: 0xfc60c0: No upgrade
[2018/05/04 10:05:55:9738] INFO: Method: 'GET' (0), request for '/'
[2018/05/04 10:05:55:9739] INFO: no hit
[2018/05/04 10:05:55:9739] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0xfc60c0
[2018/05/04 10:05:55:9739] USER: lws_http_serve: /
[2018/05/04 10:05:55:9739] INFO: lws_add_http_header_content_length: wsi 0xfc60c0: tx_content_length/remain 1234
10:05:55.973765 IP localhost.50096 > localhost.7681: Flags [P.], seq 1:427, ack 1, win 342, options [nop,nop,TS val 561972738 ecr 561972736], length 426
	0x0000:  4500 01de de5d 4000 4006 5cba 7f00 0001  E....]@.@.\.....
	0x0010:  7f00 0001 c3b0 1e01 a9de a39d 42f9 a2e6  ............B...
	0x0020:  8018 0156 ffd2 0000 0101 080a 217f 0602  ...V........!...
	0x0030:  217f 0600 4745 5420 2f20 4854 5450 2f31  !...GET./.HTTP/1
	0x0040:  2e31 0d0a 486f 7374 3a20 6c6f 6361 6c68  .1..Host:.localh
	0x0050:  6f73 743a 3736 3831 0d0a 436f 6e6e 6563  ost:7681..Connec
	0x0060:  7469 6f6e 3a20 6b65 6570 2d61 6c69 7665  tion:.keep-alive
	0x0070:  0d0a 5072 6167 6d61 3a20 6e6f 2d63 6163  ..Pragma:.no-cac
	0x0080:  6865 0d0a 4361 6368 652d 436f 6e74 726f  he..Cache-Contro
	0x0090:  6c3a 206e 6f2d 6361 6368 650d 0a55 7067  l:.no-cache..Upg
	0x00a0:  7261 6465 2d49 6e73 6563 7572 652d 5265  rade-Insecure-Re
	0x00b0:  7175 6573 7473 3a20 310d 0a55 7365 722d  quests:.1..User-
	0x00c0:  4167 656e 743a 204d 6f7a 696c 6c61 2f35  Agent:.Mozilla/5
	0x00d0:  2e30 2028 5831 313b 2046 6564 6f72 613b  .0.(X11;.Fedora;
	0x00e0:  204c 696e 7578 2078 3836 5f36 3429 2041  .Linux.x86_64).A
	0x00f0:  7070 6c65 5765 624b 6974 2f35 3337 2e33  ppleWebKit/537.3
	0x0100:  3620 284b 4854 4d4c 2c20 6c69 6b65 2047  6.(KHTML,.like.G
	0x0110:  6563 6b6f 2920 4368 726f 6d65 2f36 382e  ecko).Chrome/68.
	0x0120:  302e 3334 3039 2e32 2053 6166 6172 692f  0.3409.2.Safari/
	0x0130:  3533 372e 3336 0d0a 4163 6365 7074 3a20  537.36..Accept:.
	0x0140:  7465 7874 2f68 746d 6c2c 6170 706c 6963  text/html,applic
	0x0150:  6174 696f 6e2f 7868 746d 6c2b 786d 6c2c  ation/xhtml+xml,
	0x0160:  6170 706c 6963 6174 696f 6e2f 786d 6c3b  application/xml;
	0x0170:  713d 302e 392c 696d 6167 652f 7765 6270  q=0.9,image/webp
	0x0180:  2c69 6d61 6765 2f61 706e 672c 2a2f 2a3b  ,image/apng,*/*;
	0x0190:  713d 302e 380d 0a41 6363 6570 742d 456e  q=0.8..Accept-En
	0x01a0:  636f 6469 6e67 3a20 677a 6970 2c20 6465  coding:.gzip,.de
	0x01b0:  666c 6174 652c 2062 720d 0a41 6363 6570  flate,.br..Accep
	0x01c0:  742d 4c61 6e67 7561 6765 3a20 656e 2d55  t-Language:.en-U
	0x01d0:  532c 656e 3b71 3d30 2e39 0d0a 0d0a       S,en;q=0.9....

We service it (this is all on wsi 0xfc60c0)

10:05:55.973954 IP localhost.7681 > localhost.50096: Flags [P.], seq 1:67, ack 427, win 350, options [nop,nop,TS val 561972739 ecr 561972738], length 66
	0x0000:  4500 0076 df22 4000 4006 5d5d 7f00 0001  E..v."@.@.]]....
	0x0010:  7f00 0001 1e01 c3b0 42f9 a2e6 a9de a547  ........B......G
	0x0020:  8018 015e fe6a 0000 0101 080a 217f 0603  ...^.j......!...
	0x0030:  217f 0602 4854 5450 2f31 2e31 2032 3030  !...HTTP/1.1.200
	0x0040:  204f 4b0d 0a63 6f6e 7465 6e74 2d74 7970  .OK..content-typ
	0x0050:  653a 2074 6578 742f 6874 6d6c 0d0a 636f  e:.text/html..co
	0x0060:  6e74 656e 742d 6c65 6e67 7468 3a20 3132  ntent-length:.12
	0x0070:  3334 0d0a 0d0a                           34....
10:05:55.973988 IP localhost.7681 > localhost.50096: Flags [P.], seq 67:1301, ack 427, win 350, options [nop,nop,TS val 561972739 ecr 561972739], length 1234
	0x0000:  4500 0506 df23 4000 4006 58cc 7f00 0001  E....#@.@.X.....
	0x0010:  7f00 0001 1e01 c3b0 42f9 a328 a9de a547  ........B..(...G
	0x0020:  8018 015e 02fb 0000 0101 080a 217f 0603  ...^........!...
	0x0030:  217f 0603 3c68 746d 6c3e 0a20 2020 203c  !...<html>.....<
	0x0040:  6865 6164 3e0a 2020 2020 2020 2020 3c6d  head>.........<m
	0x0050:  6574 6120 6368 6172 7365 743d 2255 5446  eta.charset="UTF
	0x0060:  2d38 223e 0a20 2020 2020 2020 203c 7469  -8">.........<ti
	0x0070:  746c 653e 4d69 6e69 6d61 6c20 5061 6765  tle>Minimal.Page
	0x0080:  3c2f 7469 746c 653e 0a20 2020 2020 2020  </title>........
	0x0090:  203c 7363 7269 7074 3e0a 2020 2020 2020  .<script>.......
	0x00a0:  2020 2020 2020 6675 6e63 7469 6f6e 2067  ......function.g
	0x00b0:  6574 496d 6167 6528 7461 7267 6574 496d  etImage(targetIm
	0x00c0:  672c 2073 6f75 7263 6555 726c 290a 2020  g,.sourceUrl)...
	0x00d0:  2020 2020 2020 2020 2020 7b0a 2020 2020  ..........{.....
	0x00e0:  2020 2020 2020 2020 2020 2020 7661 7220  ............var.
	0x00f0:  7265 7120 3d20 6e65 7720 584d 4c48 7474  req.=.new.XMLHtt
	0x0100:  7052 6571 7565 7374 2829 3b0a 2020 2020  pRequest();.....
	0x0110:  2020 2020 2020 2020 2020 2020 7265 712e  ............req.
	0x0120:  6f6e 7265 6164 7973 7461 7465 6368 616e  onreadystatechan
	0x0130:  6765 203d 2066 756e 6374 696f 6e28 2920  ge.=.function().
	0x0140:  7b0a 2020 2020 2020 2020 2020 2020 2020  {...............
	0x0150:  2020 2020 2020 6966 2028 7265 712e 7265  ......if.(req.re
	0x0160:  6164 7953 7461 7465 203d 3d20 3420 2626  adyState.==.4.&&
	0x0170:  2072 6571 2e73 7461 7475 7320 3d3d 2032  .req.status.==.2
	0x0180:  3030 2920 7b0a 2020 2020 2020 2020 2020  00).{...........
	0x0190:  2020 2020 2020 2020 2020 2020 2020 646f  ..............do
	0x01a0:  6375 6d65 6e74 2e67 6574 456c 656d 656e  cument.getElemen
	0x01b0:  7442 7949 6428 7461 7267 6574 496d 6729  tById(targetImg)
	0x01c0:  2e73 7263 203d 2072 6571 2e72 6573 706f  .src.=.req.respo
	0x01d0:  6e73 6554 6578 743b 0a20 2020 2020 2020  nseText;........
	0x01e0:  2020 2020 2020 2020 2020 2020 207d 0a20  .............}..
	0x01f0:  2020 2020 2020 2020 2020 2020 2020 207d  ...............}
	0x0200:  3b0a 2020 2020 2020 2020 2020 2020 2020  ;...............
	0x0210:  2020 7265 712e 6f70 656e 2822 4745 5422  ..req.open("GET"
	0x0220:  2c20 736f 7572 6365 5572 6c2c 2074 7275  ,.sourceUrl,.tru
	0x0230:  6529 3b20 2f2f 2074 7275 6520 666f 7220  e);.//.true.for.
	0x0240:  6173 796e 6368 726f 6e6f 7573 0a20 2020  asynchronous....
	0x0250:  2020 2020 2020 2020 2020 2020 2072 6571  .............req
	0x0260:  2e73 656e 6428 6e75 6c6c 293b 0a20 2020  .send(null);....
	0x0270:  2020 2020 2020 2020 207d 0a20 2020 2020  .........}......
	0x0280:  2020 2020 2020 2066 756e 6374 696f 6e20  .......function.
	0x0290:  6765 7449 6d61 6765 7328 290a 2020 2020  getImages().....
	0x02a0:  2020 2020 2020 2020 7b0a 2020 2020 2020  ........{.......
	0x02b0:  2020 2020 2020 2020 2020 6765 7449 6d61  ..........getIma
	0x02c0:  6765 2822 706e 6749 6d67 3122 2c20 2269  ge("pngImg1",."i
	0x02d0:  6d67 312e 706e 6722 293b 0a20 2020 2020  mg1.png");......
	0x02e0:  2020 2020 2020 2020 2020 202f 2f67 6574  ...........//get
	0x02f0:  496d 6167 6528 2270 6e67 496d 6732 222c  Image("pngImg2",
	0x0300:  2022 696d 6732 2e70 6e67 2229 3b0a 2020  ."img2.png");...
	0x0310:  2020 2020 2020 2020 2020 2020 2020 2f2f  ..............//
	0x0320:  6765 7449 6d61 6765 2822 706e 6749 6d67  getImage("pngImg
	0x0330:  3322 2c20 2269 6d67 332e 706e 6722 293b  3",."img3.png");
	0x0340:  0a20 2020 2020 2020 2020 2020 2020 2020  ................
	0x0350:  202f 2f67 6574 496d 6167 6528 2270 6e67  .//getImage("png
	0x0360:  496d 6734 222c 2022 696d 6731 2e70 6e67  Img4",."img1.png
	0x0370:  2229 3b0a 2020 2020 2020 2020 2020 2020  ");.............
	0x0380:  2020 2020 2f2f 6765 7449 6d61 6765 2822  ....//getImage("
	0x0390:  706e 6749 6d67 3522 2c20 2269 6d67 322e  pngImg5",."img2.
	0x03a0:  706e 6722 293b 0a20 2020 2020 2020 2020  png");..........
	0x03b0:  2020 2020 2020 202f 2f67 6574 496d 6167  .......//getImag
	0x03c0:  6528 2270 6e67 496d 6736 222c 2022 696d  e("pngImg6",."im
	0x03d0:  6733 2e70 6e67 2229 3b0a 2020 2020 2020  g3.png");.......
	0x03e0:  2020 2020 2020 7d0a 2020 2020 2020 2020  ......}.........
	0x03f0:  3c2f 7363 7269 7074 3e0a 2020 2020 3c2f  </script>.....</
	0x0400:  6865 6164 3e0a 2020 2020 3c62 6f64 7920  head>.....<body.
	0x0410:  6f6e 6c6f 6164 3d22 6765 7449 6d61 6765  onload="getImage
	0x0420:  7328 2922 3e0a 2020 2020 2020 2020 3c68  s()">.........<h
	0x0430:  313e 4d69 6e69 6d61 6c20 5061 6765 3c2f  1>Minimal.Page</
	0x0440:  6831 3e0a 2020 2020 2020 2020 3c69 6d67  h1>.........<img
	0x0450:  2069 643d 2270 6e67 496d 6731 2220 2f3e  .id="pngImg1"./>
	0x0460:  0a20 2020 2020 2020 203c 696d 6720 6964  .........<img.id
	0x0470:  3d22 706e 6749 6d67 3222 202f 3e0a 2020  ="pngImg2"./>...
	0x0480:  2020 2020 2020 3c69 6d67 2069 643d 2270  ......<img.id="p
	0x0490:  6e67 496d 6733 2220 2f3e 0a20 2020 2020  ngImg3"./>......
	0x04a0:  2020 203c 696d 6720 6964 3d22 706e 6749  ...<img.id="pngI
	0x04b0:  6d67 3422 202f 3e0a 2020 2020 2020 2020  mg4"./>.........
	0x04c0:  3c69 6d67 2069 643d 2270 6e67 496d 6735  <img.id="pngImg5
	0x04d0:  2220 2f3e 0a20 2020 2020 2020 203c 696d  "./>.........<im
	0x04e0:  6720 6964 3d22 706e 6749 6d67 3622 202f  g.id="pngImg6"./
	0x04f0:  3e0a 2020 2020 3c2f 626f 6479 3e0a 3c2f  >.....</body>.</
	0x0500:  6874 6d6c 3e0a                           html>.

We see there is no pipelined request right now for wsi 0xfc60c0, so we give up the ah and have wsi 0xfc60c0 wait a bit in case something new comes.

The ah detach processing passes the (only) ah to the last (first requesting) guy in the ah wait list, wsi 0xfc7960

[2018/05/04 10:05:55:9740] INFO: lws_http_transaction_completed: wsi 0xfc60c0
[2018/05/04 10:05:55:9740] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0xfc60c0
[2018/05/04 10:05:55:9740] INFO: have prev 0xfb7f50, setting him to our next (nil)
[2018/05/04 10:05:55:9740] INFO: lws_http_transaction_completed: nothing in buflist so detaching ah
[2018/05/04 10:05:55:9740] INFO: __lws_header_table_detach: wsi 0xfc60c0: ah 0xfc6580 (tsi=0, count = 1)
[2018/05/04 10:05:55:9740] INFO: pt wait list 0xfc87a0
[2018/05/04 10:05:55:9740] INFO: __lws_header_table_detach: transferring ah to last eligible wsi in wait list 0xfc7960 (wsistate 0x20000015)
[2018/05/04 10:05:55:9740] INFO: __lws_header_table_detach: Enabling 0xfc7960 POLLIN
[2018/05/04 10:05:55:9740] INFO: __lws_header_table_detach: wsi 0xfc7960: ah 0xfc6580 (tsi=0, count = 1)
[2018/05/04 10:05:55:9740] INFO: lws_http_transaction_completed: 0xfc60c0: keep-alive await new transaction

wsi 0xfc7960 has nothing waiting for him at all; he never gets sent any headers by the browser. So he just sits there holding the only ah.

Meanwhile the browser processed the JS and decides to send the XHR on our old friend wsi 0xfc60c0, as a pipelined transaction.

10:05:55.984195 IP localhost.50096 > localhost.7681: Flags [P.], seq 427:782, ack 1301, win 1365, options [nop,nop,TS val 561972749 ecr 561972739], length 355
	0x0000:  4500 0197 de60 4000 4006 5cfe 7f00 0001  E....`@.@.\.....
	0x0010:  7f00 0001 c3b0 1e01 a9de a547 42f9 a7fa  ...........GB...
	0x0020:  8018 0555 ff8b 0000 0101 080a 217f 060d  ...U........!...
	0x0030:  217f 0603 4745 5420 2f69 6d67 312e 706e  !...GET./img1.pn
	0x0040:  6720 4854 5450 2f31 2e31 0d0a 486f 7374  g.HTTP/1.1..Host
	0x0050:  3a20 6c6f 6361 6c68 6f73 743a 3736 3831  :.localhost:7681
	0x0060:  0d0a 436f 6e6e 6563 7469 6f6e 3a20 6b65  ..Connection:.ke
	0x0070:  6570 2d61 6c69 7665 0d0a 5072 6167 6d61  ep-alive..Pragma
	0x0080:  3a20 6e6f 2d63 6163 6865 0d0a 4361 6368  :.no-cache..Cach
	0x0090:  652d 436f 6e74 726f 6c3a 206e 6f2d 6361  e-Control:.no-ca
	0x00a0:  6368 650d 0a55 7365 722d 4167 656e 743a  che..User-Agent:
	0x00b0:  204d 6f7a 696c 6c61 2f35 2e30 2028 5831  .Mozilla/5.0.(X1
	0x00c0:  313b 2046 6564 6f72 613b 204c 696e 7578  1;.Fedora;.Linux
	0x00d0:  2078 3836 5f36 3429 2041 7070 6c65 5765  .x86_64).AppleWe
	0x00e0:  624b 6974 2f35 3337 2e33 3620 284b 4854  bKit/537.36.(KHT
	0x00f0:  4d4c 2c20 6c69 6b65 2047 6563 6b6f 2920  ML,.like.Gecko).
	0x0100:  4368 726f 6d65 2f36 382e 302e 3334 3039  Chrome/68.0.3409
	0x0110:  2e32 2053 6166 6172 692f 3533 372e 3336  .2.Safari/537.36
	0x0120:  0d0a 4163 6365 7074 3a20 2a2f 2a0d 0a52  ..Accept:.*/*..R
	0x0130:  6566 6572 6572 3a20 6874 7470 3a2f 2f6c  eferer:.http://l
	0x0140:  6f63 616c 686f 7374 3a37 3638 312f 0d0a  ocalhost:7681/..
	0x0150:  4163 6365 7074 2d45 6e63 6f64 696e 673a  Accept-Encoding:
	0x0160:  2067 7a69 702c 2064 6566 6c61 7465 2c20  .gzip,.deflate,.
	0x0170:  6272 0d0a 4163 6365 7074 2d4c 616e 6775  br..Accept-Langu
	0x0180:  6167 653a 2065 6e2d 5553 2c65 6e3b 713d  age:.en-US,en;q=
	0x0190:  302e 390d 0a0d 0a                        0.9....

Since we gave the only ah up to this other speculative browser connection, we can't do anything with it but put it on the ah list.

[2018/05/04 10:05:55:9842] INFO: lws_header_table_attach: wsi 0xfc60c0: ah (nil) (tsi 0, count = 1) in
[2018/05/04 10:05:55:9842] INFO: _lws_header_ensure_we_are_on_waiting_list: wsi: 0xfc60c0
[2018/05/04 10:05:55:9842] INFO: wsi 0xfc60c0: ah not available

So then we are stuck until the unused speculative connection with our only ah times out... we sit there for 6s/

[2018/05/04 10:06:01:9907] INFO: __lws_close_free_wsi: 0xfc60c0: caller: timeout
[2018/05/04 10:06:01:9908] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0xfc60c0
[2018/05/04 10:06:01:9908] INFO: have prev 0xfb7f50, setting him to our next (nil)
[2018/05/04 10:06:01:9909] INFO: rops_destroy_role_h1: ah det due to close
[2018/05/04 10:06:01:9909] INFO: __lws_remove_from_ah_waiting_list: wsi 0xfc60c0

The browser retries on a different connection (either fresh or localhost.50100 was one of the speculative connections created at the start, tcpdump didn't catch it being established either way)

10:06:01.991193 IP localhost.50100 > localhost.7681: Flags [P.], seq 4035575824:4035576179, ack 3624293255, win 342, options [nop,nop,TS val 561978756 ecr 561972736], length 355
	0x0000:  4500 0197 7dca 4000 4006 bd94 7f00 0001  E...}.@.@.......
	0x0010:  7f00 0001 c3b4 1e01 f08a 0010 d806 5387  ..............S.
	0x0020:  8018 0156 ff8b 0000 0101 080a 217f 1d84  ...V........!...
	0x0030:  217f 0600 4745 5420 2f69 6d67 312e 706e  !...GET./img1.pn
	0x0040:  6720 4854 5450 2f31 2e31 0d0a 486f 7374  g.HTTP/1.1..Host
	0x0050:  3a20 6c6f 6361 6c68 6f73 743a 3736 3831  :.localhost:7681
	0x0060:  0d0a 436f 6e6e 6563 7469 6f6e 3a20 6b65  ..Connection:.ke
	0x0070:  6570 2d61 6c69 7665 0d0a 5072 6167 6d61  ep-alive..Pragma
	0x0080:  3a20 6e6f 2d63 6163 6865 0d0a 4361 6368  :.no-cache..Cach
	0x0090:  652d 436f 6e74 726f 6c3a 206e 6f2d 6361  e-Control:.no-ca
	0x00a0:  6368 650d 0a55 7365 722d 4167 656e 743a  che..User-Agent:
	0x00b0:  204d 6f7a 696c 6c61 2f35 2e30 2028 5831  .Mozilla/5.0.(X1
	0x00c0:  313b 2046 6564 6f72 613b 204c 696e 7578  1;.Fedora;.Linux
	0x00d0:  2078 3836 5f36 3429 2041 7070 6c65 5765  .x86_64).AppleWe
	0x00e0:  624b 6974 2f35 3337 2e33 3620 284b 4854  bKit/537.36.(KHT
	0x00f0:  4d4c 2c20 6c69 6b65 2047 6563 6b6f 2920  ML,.like.Gecko).
	0x0100:  4368 726f 6d65 2f36 382e 302e 3334 3039  Chrome/68.0.3409
	0x0110:  2e32 2053 6166 6172 692f 3533 372e 3336  .2.Safari/537.36
	0x0120:  0d0a 4163 6365 7074 3a20 2a2f 2a0d 0a52  ..Accept:.*/*..R
	0x0130:  6566 6572 6572 3a20 6874 7470 3a2f 2f6c  eferer:.http://l
	0x0140:  6f63 616c 686f 7374 3a37 3638 312f 0d0a  ocalhost:7681/..
	0x0150:  4163 6365 7074 2d45 6e63 6f64 696e 673a  Accept-Encoding:
	0x0160:  2067 7a69 702c 2064 6566 6c61 7465 2c20  .gzip,.deflate,.
	0x0170:  6272 0d0a 4163 6365 7074 2d4c 616e 6775  br..Accept-Langu
	0x0180:  6167 653a 2065 6e2d 5553 2c65 6e3b 713d  age:.en-US,en;q=
	0x0190:  302e 390d 0a0d 0a                        0.9....

... and we service it finally etc...

[2018/05/04 10:06:01:9912] INFO: lws_handshake_server: parsed count 355
[2018/05/04 10:06:01:9912] INFO: lws_select_vhost: vhost match to default based on port 7681
[2018/05/04 10:06:01:9912] INFO: lws_handshake_server: 0xfc7960: No upgrade
[2018/05/04 10:06:01:9912] INFO: Method: 'GET' (0), request for '/img1.png'

So this test with an ah limit of 4 is unreasonable is the client is going to open 5 connections speculatively and leave some of them idle. You can test the pipelining meaningfully like this (the /dev/zero is just to keep nc alive until the server finishes sending everything)

echo -e -n "GET / http/1.1\x0d\x0a\x0d\x0aGET /img1.png http/1.1\x0d\x0a\x0d\x0aGET /img2.png http/1.1\x0d\x0a\x0d\x0aGET /img3.png http/1.1\x0d\x0a\x0d\x0a" | cat - /dev/zero | nc 127.0.0.1 7681

or using the new minimal example that knows how to keep itself open long enough

echo -e -n "GET / http/1.1\x0d\x0a\x0d\x0aGET /img1.png http/1.1\x0d\x0a\x0d\x0aGET /img2.png http/1.1\x0d\x0a\x0d\x0aGET /img3.png http/1.1\x0d\x0a\x0d\x0a" | ./bin/lws-minimal-raw-netcat --server 127.0.0.1 --port 7681

On master, the ah limit defaults to the number of fds the process has... for http/2, restricting ah is much more difficult and we don't support it yet (if ever). For h2, all of the simultaneous streams share one tcp connection and it is assumed you can always fire up a new stream and send headers on it, up to the general limit of streams-per-connection. So the ability to manage ah is not really there like it is for h1.... you need to allow for enough ah for as many streams as you expect.

@rdumusc
Copy link
Author

rdumusc commented May 4, 2018

Thank you for this in-depth analysis, I appreciate that you're taking time to look into this.

My example may indeed have been slightly too complex; I just wanted to see if issuing more http requests (6) than available ah (4) would work without blocking the server, which indeed works for both v2.3 and v2.4. I hoped this was a good test of TCP connection reuse in http1.1.

Doing as you suggest, I have now restricted everything to just 1 extra http request for 1 image and 1 available ah.

With both Firefox 59 and Chrome 63 (which are the versions I have at hand on Ubuntu 16.04), after pressing Ctrl-Shift-R once I get the same output for v2.4-stable:

[2018/05/04 11:19:23:7832] INFO: lws_header_table_attach: wsi 0x17604c0: ah (nil) (tsi 0, count = 0) in
[2018/05/04 11:19:23:7832] INFO: _lws_create_ah: created ah 0x17663a0 (size 4096): pool length 1
[2018/05/04 11:19:23:7832] INFO: lws_header_table_attach: did attach wsi 0x17604c0: ah 0x17663a0: count 1 (on exit)
[2018/05/04 11:19:23:7832] INFO: vhost match to default based on port 7681
[2018/05/04 11:19:23:7832] INFO: No upgrade
[2018/05/04 11:19:23:7832] INFO: Method: 'GET' (0), request for '/'
[2018/05/04 11:19:23:7832] INFO: no hit
[2018/05/04 11:19:23:7832] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0x17604c0
[2018/05/04 11:19:23:7832] USER: lws_http_serve: /
[2018/05/04 11:19:23:7833] INFO: lws_read: read_ok, used 365
[2018/05/04 11:19:23:7833] INFO: lws_http_transaction_completed: wsi 0x17604c0
[2018/05/04 11:19:23:7833] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0x17604c0
[2018/05/04 11:19:23:7833] INFO: have prev 0x1760230, setting him to our next (nil)
[2018/05/04 11:19:23:7833] INFO: lws_header_table_detach: wsi 0x17604c0: ah 0x17663a0 (tsi=0, count = 1)
[2018/05/04 11:19:23:7833] INFO: lws_header_table_detach: nobody usable waiting
[2018/05/04 11:19:23:7833] INFO: _lws_destroy_ah: freed ah 0x17663a0 : pool length 0
[2018/05/04 11:19:23:7833] INFO: lws_header_table_detach: wsi 0x17604c0: ah 0x17663a0 (tsi=0, count = 0)
[2018/05/04 11:19:23:7833] INFO: lws_http_transaction_completed: 0x17604c0: keep-alive await new transaction
[2018/05/04 11:19:23:8632] INFO: lws_header_table_attach: wsi 0x17604c0: ah (nil) (tsi 0, count = 0) in
[2018/05/04 11:19:23:8632] INFO: _lws_create_ah: created ah 0x17663a0 (size 4096): pool length 1
[2018/05/04 11:19:23:8632] INFO: lws_header_table_attach: did attach wsi 0x17604c0: ah 0x17663a0: count 1 (on exit)
[2018/05/04 11:19:23:8632] INFO: vhost match to default based on port 7681
[2018/05/04 11:19:23:8632] INFO: No upgrade
[2018/05/04 11:19:23:8632] INFO: Method: 'GET' (0), request for '/img1.png'
[2018/05/04 11:19:23:8632] INFO: no hit
[2018/05/04 11:19:23:8632] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0x17604c0
[2018/05/04 11:19:23:8632] INFO: have prev 0x1760230, setting him to our next (nil)
[2018/05/04 11:19:23:8632] USER: lws_http_serve: /img1.png
[2018/05/04 11:19:23:8632] INFO: lws_read: read_ok, used 273
[2018/05/04 11:19:23:8632] INFO: lws_http_transaction_completed: wsi 0x17604c0
[2018/05/04 11:19:23:8632] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0x17604c0
[2018/05/04 11:19:23:8632] INFO: have prev 0x1760230, setting him to our next (nil)
[2018/05/04 11:19:23:8632] INFO: lws_header_table_detach: wsi 0x17604c0: ah 0x17663a0 (tsi=0, count = 1)
[2018/05/04 11:19:23:8633] INFO: lws_header_table_detach: nobody usable waiting
[2018/05/04 11:19:23:8633] INFO: _lws_destroy_ah: freed ah 0x17663a0 : pool length 0
[2018/05/04 11:19:23:8633] INFO: lws_header_table_detach: wsi 0x17604c0: ah 0x17663a0 (tsi=0, count = 0)
[2018/05/04 11:19:23:8633] INFO: lws_http_transaction_completed: 0x17604c0: keep-alive await new transaction
---- a few seconds later ---
[2018/05/04 11:19:29:8695] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0x17604c0
[2018/05/04 11:19:29:8695] INFO: have prev 0x1760230, setting him to our next (nil)
[2018/05/04 11:19:29:8696] INFO: ah det due to close

So you're right, I wonder how you got to see 5 TCP connections... :-)

Anyway, if I press Ctrl-Shift-R twice in Firefox 59 the server blocks with v2.4-stable (matching your observations):

[2018/05/04 11:29:58:3349] INFO: lws_header_table_attach: wsi 0x17604c0: ah (nil) (tsi 0, count = 0) in
[2018/05/04 11:29:58:3349] INFO: _lws_create_ah: created ah 0x17663a0 (size 4096): pool length 1
[2018/05/04 11:29:58:3349] INFO: lws_header_table_attach: did attach wsi 0x17604c0: ah 0x17663a0: count 1 (on exit)
[2018/05/04 11:29:58:3349] INFO: vhost match to default based on port 7681
[2018/05/04 11:29:58:3349] INFO: No upgrade
[2018/05/04 11:29:58:3349] INFO: Method: 'GET' (0), request for '/'
[2018/05/04 11:29:58:3349] INFO: no hit
[2018/05/04 11:29:58:3349] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0x17604c0
[2018/05/04 11:29:58:3349] USER: lws_http_serve: /
[2018/05/04 11:29:58:3350] INFO: lws_read: read_ok, used 365
[2018/05/04 11:29:58:3350] INFO: lws_http_transaction_completed: wsi 0x17604c0
[2018/05/04 11:29:58:3350] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0x17604c0
[2018/05/04 11:29:58:3350] INFO: have prev 0x1760230, setting him to our next (nil)
[2018/05/04 11:29:58:3350] INFO: lws_header_table_detach: wsi 0x17604c0: ah 0x17663a0 (tsi=0, count = 1)
[2018/05/04 11:29:58:3350] INFO: lws_header_table_detach: nobody usable waiting
[2018/05/04 11:29:58:3350] INFO: _lws_destroy_ah: freed ah 0x17663a0 : pool length 0
[2018/05/04 11:29:58:3350] INFO: lws_header_table_detach: wsi 0x17604c0: ah 0x17663a0 (tsi=0, count = 0)
[2018/05/04 11:29:58:3350] INFO: lws_http_transaction_completed: 0x17604c0: keep-alive await new transaction
[2018/05/04 11:29:58:4551] INFO: lws_header_table_attach: wsi 0x17604c0: ah (nil) (tsi 0, count = 0) in
[2018/05/04 11:29:58:4551] INFO: _lws_create_ah: created ah 0x17663a0 (size 4096): pool length 1
[2018/05/04 11:29:58:4551] INFO: lws_header_table_attach: did attach wsi 0x17604c0: ah 0x17663a0: count 1 (on exit)
[2018/05/04 11:29:58:4552] INFO: vhost match to default based on port 7681
[2018/05/04 11:29:58:4552] INFO: No upgrade
[2018/05/04 11:29:58:4552] INFO: Method: 'GET' (0), request for '/img1.png'
[2018/05/04 11:29:58:4552] INFO: no hit
[2018/05/04 11:29:58:4552] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0x17604c0
[2018/05/04 11:29:58:4552] INFO: have prev 0x1760230, setting him to our next (nil)
[2018/05/04 11:29:58:4552] USER: lws_http_serve: /img1.png
[2018/05/04 11:29:58:4552] INFO: lws_read: read_ok, used 273
[2018/05/04 11:29:58:4552] INFO: lws_http_transaction_completed: wsi 0x17604c0
[2018/05/04 11:29:58:4552] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0x17604c0
[2018/05/04 11:29:58:4552] INFO: have prev 0x1760230, setting him to our next (nil)
[2018/05/04 11:29:58:4552] INFO: lws_header_table_detach: wsi 0x17604c0: ah 0x17663a0 (tsi=0, count = 1)
[2018/05/04 11:29:58:4552] INFO: lws_header_table_detach: nobody usable waiting
[2018/05/04 11:29:58:4552] INFO: _lws_destroy_ah: freed ah 0x17663a0 : pool length 0
[2018/05/04 11:29:58:4552] INFO: lws_header_table_detach: wsi 0x17604c0: ah 0x17663a0 (tsi=0, count = 0)
[2018/05/04 11:29:58:4552] INFO: lws_http_transaction_completed: 0x17604c0: keep-alive await new transaction
[2018/05/04 11:29:58:8826] INFO: lws_header_table_attach: wsi 0x17604c0: ah (nil) (tsi 0, count = 0) in
[2018/05/04 11:29:58:8826] INFO: _lws_create_ah: created ah 0x17663a0 (size 4096): pool length 1
[2018/05/04 11:29:58:8826] INFO: lws_header_table_attach: did attach wsi 0x17604c0: ah 0x17663a0: count 1 (on exit)
[2018/05/04 11:29:58:8826] INFO: lws_server_socket_service: read 0 len a
[2018/05/04 11:29:58:8827] INFO: lws_header_table_attach: wsi 0x1760730: ah (nil) (tsi 0, count = 1) in
[2018/05/04 11:29:58:8827] INFO: _lws_header_ensure_we_are_on_waiting_list: wsi: 0x1760730
[2018/05/04 11:29:58:8827] INFO: lws_adopt_descriptor_vhost: waiting for ah
---- waiting 5-10 seconds for the page to reload ---
[2018/05/04 11:30:09:8939] INFO: wsi 0x17604c0: TIMEDOUT WAITING on 25 (did hdr 0, ah 0x17663a0, wl 1, pfd events 0) 1525426209 vs 1525426208
[2018/05/04 11:30:09:8940] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0x17604c0
[2018/05/04 11:30:09:8940] INFO: have prev 0x1760230, setting him to our next (nil)
[2018/05/04 11:30:09:8941] INFO: ah det due to close
[2018/05/04 11:30:09:8941] INFO: lws_header_table_detach: wsi 0x17604c0: ah 0x17663a0 (tsi=0, count = 1)
[2018/05/04 11:30:09:8941] INFO: pt wait list 0x1760730
[2018/05/04 11:30:09:8941] INFO: lws_header_table_detach: last eligible wsi in wait list 0x1760730
[2018/05/04 11:30:09:8941] INFO: lws_header_table_detach: Enabling 0x1760730 POLLIN
[2018/05/04 11:30:09:8941] INFO: lws_header_table_detach: wsi 0x1760730: ah 0x17663a0 (tsi=0, count = 1)
[2018/05/04 11:30:09:8941] INFO: vhost match to default based on port 7681
[2018/05/04 11:30:09:8941] INFO: No upgrade
[2018/05/04 11:30:09:8941] INFO: Method: 'GET' (0), request for '/'
[2018/05/04 11:30:09:8941] INFO: no hit
[2018/05/04 11:30:09:8941] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0x1760730
[2018/05/04 11:30:09:8941] USER: lws_http_serve: /
[2018/05/04 11:30:09:8942] INFO: lws_read: read_ok, used 365
[2018/05/04 11:30:09:8942] INFO: lws_http_transaction_completed: wsi 0x1760730
[2018/05/04 11:30:09:8942] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0x1760730
[2018/05/04 11:30:09:8942] INFO: have prev 0x1760230, setting him to our next (nil)
[2018/05/04 11:30:09:8942] INFO: lws_header_table_detach: wsi 0x1760730: ah 0x17663a0 (tsi=0, count = 1)
[2018/05/04 11:30:09:8942] INFO: lws_header_table_detach: nobody usable waiting
[2018/05/04 11:30:09:8942] INFO: _lws_destroy_ah: freed ah 0x17663a0 : pool length 0
[2018/05/04 11:30:09:8942] INFO: lws_header_table_detach: wsi 0x1760730: ah 0x17663a0 (tsi=0, count = 0)
[2018/05/04 11:30:09:8942] INFO: lws_http_transaction_completed: 0x1760730: keep-alive await new transaction
[2018/05/04 11:30:09:9653] INFO: lws_header_table_attach: wsi 0x1760730: ah (nil) (tsi 0, count = 0) in
[2018/05/04 11:30:09:9653] INFO: _lws_create_ah: created ah 0x17663a0 (size 4096): pool length 1
[2018/05/04 11:30:09:9654] INFO: lws_header_table_attach: did attach wsi 0x1760730: ah 0x17663a0: count 1 (on exit)
[2018/05/04 11:30:09:9654] INFO: vhost match to default based on port 7681
[2018/05/04 11:30:09:9654] INFO: No upgrade
[2018/05/04 11:30:09:9654] INFO: Method: 'GET' (0), request for '/img1.png'
[2018/05/04 11:30:09:9654] INFO: no hit
[2018/05/04 11:30:09:9654] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0x1760730
[2018/05/04 11:30:09:9654] INFO: have prev 0x1760230, setting him to our next (nil)
[2018/05/04 11:30:09:9654] USER: lws_http_serve: /img1.png
[2018/05/04 11:30:09:9654] INFO: lws_read: read_ok, used 273
[2018/05/04 11:30:09:9654] INFO: lws_http_transaction_completed: wsi 0x1760730
[2018/05/04 11:30:09:9654] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0x1760730
[2018/05/04 11:30:09:9654] INFO: have prev 0x1760230, setting him to our next (nil)
[2018/05/04 11:30:09:9654] INFO: lws_header_table_detach: wsi 0x1760730: ah 0x17663a0 (tsi=0, count = 1)
[2018/05/04 11:30:09:9654] INFO: lws_header_table_detach: nobody usable waiting
[2018/05/04 11:30:09:9654] INFO: _lws_destroy_ah: freed ah 0x17663a0 : pool length 0
[2018/05/04 11:30:09:9654] INFO: lws_header_table_detach: wsi 0x1760730: ah 0x17663a0 (tsi=0, count = 0)
[2018/05/04 11:30:09:9654] INFO: lws_http_transaction_completed: 0x1760730: keep-alive await new transaction
[2018/05/04 11:30:15:9712] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0x1760730
[2018/05/04 11:30:15:9712] INFO: have prev 0x1760230, setting him to our next (nil)
[2018/05/04 11:30:15:9713] INFO: ah det due to close

Whereas I could reload an infinite number of times with v2.3-stable.

In conclusion, if I understand correctly, you suggest fixing the problem by using a larger ah pool with v2.4. Since there is no way of determining in advance how many clients will connect and how many requests each one of them will be making, that means setting an arbitrarily high limit, right? I logically assumed it was a regression in the http1 support... but seeing your last comment, you seem to present it as a necessary evolution to support http2, and the ah limit has been removed by default (max_http_header_pool=0) in the latest master branch (v3.0.0 :-)). Is that right?

@lws-team
Copy link
Member

lws-team commented May 4, 2018

No doubt what the browsers choose to do is dependent on the exact version; chrome 67 that I have acts different than your chrome 63. But if there's even one speculative connection that is idle, and we only have one ah to play with, it will block until the connection times out and the browser retries. That isn't an "lws thing" it's just what happens under the test conditions.

Whereas I could reload an infinite number of times with v2.3-stable.

I don't have time to look at it but I think I will find v2.3 using more ah-es or otherwise acting differently. v2.4 + master are doing what was asked AFAICS.

if I understand correctly, you suggest fixing the problem by using a larger ah pool with v2.4.

No... I suggest testing it with an exact test that I pasted in my previous reply, where you know exactly what stimulus you have given it, how many connections it uses, and how many ah it uses, so you are testing what you think you are.

For normal use testing holding down Ctrl-R is a fine test on $BROWSER. But under the constraint there are only one for four ah to go around, as I showed in my previous post that is not workable with every browser which can just choose to broadside 5 connections in Chrome 67 case. If you want to confirm it, install Chrome 67 :-) For those tests, there is no point restricting the ah like that with a browser.

Since there is no way of determining in advance how many clients will connect and how many requests each one of them will be making, that means setting an arbitrarily high limit, right?

No... there are two separate issues. 1a) Is the ah limit appropriate you can server ANYTHING to Chrome 67? It must be above 5 by the look of it.

1b) If you are running http/2, where the stream limit is say 64 per connection, can you handle the amount of streams it will broadside to get assets? If you have a lot of pictures in your html, the browser may reasonably say it wants to load 16 streams in parallel, each with headers and requiring 16 ah. It might go up to 64 theoretically or whatever you set as the stream limit. So you must cut your ah limit to that cloth even to service one browser client with your busy page. (The ah are malloc'd up and deleted if not needed after they are used). We can't play deferment games with the ah on h2 because there is one serialized connection with all the streams muxed that is blocked by data we don't read. If we just stop reading the connection until another ah appears, we cannot get, eg, tx credit updates that allow us to send more and just stall. So we must always go forward with h2.

  1. Separately, how many concurrent ah can you afford memory-wise? Actually they are typically around 5KiB each. If your process is allowed 1024 fd and you just keep the new default of the same number of ah as the limit, well, it's only ~5MiB we are talking about. So for most use-cases, it's negligible. When you hit this limit, whole connections get dropped.

ah limit has been removed by default

By default the ah limit == the fd limit (set by ulimit -n) now. It is not "unlimited". Like I say the memory load is perhaps not as burdensome as you are thinking...

@rdumusc
Copy link
Author

rdumusc commented May 4, 2018

OK I see, then I will increase the ah limit to 1024 when compiling against lws 2.4, which should solve my problem in practice. The default limit was simply not sufficient anymore with the new behavior. Thanks again for your help.

For the record, here is the difference between 2.3 and 2.4, calling lws-minimal-raw-netcat twice in row.

First call with 2.4:

echo -e -n "GET / http/1.1\x0d\x0a\x0d\x0aGET /img1.png http/1.1\x0d\x0a\x0d\x0aGET /img2.png http/1.1\x0d\x0a\x0d\x0aGET /img3.png http/1.1\x0d\x0a\x0d\x0a" | ./bin/lws-minimal-raw-netcat --server 127.0.0.1 --port 7681
[2018/05/04 13:48:43:9435] USER: LWS minimal raw netcat [--server ip] [--port port] [-w ms]
[2018/05/04 13:48:43:9436] NOTICE: Creating Vhost 'default' (no listener), 1 protocols, IPv6 off
[2018/05/04 13:48:43:9436] USER: Starting connect to 127.0.0.1:7681...
[2018/05/04 13:48:43:9437] USER: Connected...
[2018/05/04 13:48:43:9437] USER: LWS_CALLBACK_RAW_ADOPT
[2018/05/04 13:48:43:9437] USER: LWS_CALLBACK_RAW_ADOPT_FILE
[2018/05/04 13:48:43:9437] USER: LWS_CALLBACK_RAW_WRITEABLE
[2018/05/04 13:48:43:9437] USER: LWS_CALLBACK_RAW_RX_FILE
[2018/05/04 13:48:43:9437] NOTICE: raw file read 96
[2018/05/04 13:48:43:9437] USER: LWS_CALLBACK_RAW_CLOSE_FILE
[2018/05/04 13:48:43:9437] USER: LWS_CALLBACK_RAW_WRITEABLE
[2018/05/04 13:48:43:9438] USER: LWS_CALLBACK_RAW_RX (66)
HTTP/1.1 200 OK
content-type: text/html
content-length: 1234

[2018/05/04 13:48:43:9438] USER: LWS_CALLBACK_RAW_RX (1234)
<html>
    <head>
        <meta charset="UTF-8">
        <title>Minimal Page</title>
        <script>
            function getImage(targetImg, sourceUrl)
            {
                var req = new XMLHttpRequest();
                req.onreadystatechange = function() {
                    if (req.readyState == 4 && req.status == 200) {
                        document.getElementById(targetImg).src = req.responseText;
                    }
                };
                req.open("GET", sourceUrl, true); // true for asynchronous
                req.send(null);
            }
            function getImages()
            {
                getImage("pngImg1", "img1.png");
//                getImage("pngImg2", "img2.png");
//                getImage("pngImg3", "img3.png");
//                getImage("pngImg4", "img1.png");
//                getImage("pngImg5", "img2.png");
//                getImage("pngImg6", "img3.png");
            }
        </script>
    </head>
    <body onload="getImages()">
        <h1>Minimal Page</h1>
        <img id="pngImg1" />
        <img id="pngImg2" />
        <img id="pngImg3" />
        <img id="pngImg4" />
        <img id="pngImg5" />
        <img id="pngImg6" />
    </body>
</html>
[2018/05/04 13:48:43:9439] USER: LWS_CALLBACK_RAW_RX (384)
HTTP/1.1 200 OK
content-type: application/base64
content-length: 310

[2018/05/04 13:48:43:9439] USER: LWS_CALLBACK_RAW_RX (74)
HTTP/1.1 200 OK
content-type: application/base64
content-length: 310

[2018/05/04 13:48:43:9439] USER: LWS_CALLBACK_RAW_RX (310)
[2018/05/04 13:48:43:9439] USER: LWS_CALLBACK_RAW_RX (74)
HTTP/1.1 200 OK
content-type: application/base64
content-length: 310

[2018/05/04 13:48:43:9440] USER: LWS_CALLBACK_RAW_RX (318)
[2018/05/04 13:48:44:0437] USER: LWS_CALLBACK_TIMER
[2018/05/04 13:48:44:0437] USER: LWS_CALLBACK_RAW_CLOSE
[2018/05/04 13:48:44:0438] USER: main: destroying context

Second call with 2.4:

echo -e -n "GET / http/1.1\x0d\x0a\x0d\x0aGET /img1.png http/1.1\x0d\x0a\x0d\x0aGET /img2.png http/1.1\x0d\x0a\x0d\x0aGET /img3.png http/1.1\x0d\x0a\x0d\x0a" | ./bin/lws-minimal-raw-netcat --server 127.0.0.1 --port 7681
[2018/05/04 13:48:45:0953] USER: LWS minimal raw netcat [--server ip] [--port port] [-w ms]
[2018/05/04 13:48:45:0953] NOTICE: Creating Vhost 'default' (no listener), 1 protocols, IPv6 off
[2018/05/04 13:48:45:0954] USER: Starting connect to 127.0.0.1:7681...
[2018/05/04 13:48:45:0954] USER: Connected...
[2018/05/04 13:48:45:0954] USER: LWS_CALLBACK_RAW_ADOPT
[2018/05/04 13:48:45:0955] USER: LWS_CALLBACK_RAW_ADOPT_FILE
[2018/05/04 13:48:45:0955] USER: LWS_CALLBACK_RAW_WRITEABLE
[2018/05/04 13:48:45:0955] USER: LWS_CALLBACK_RAW_RX_FILE
[2018/05/04 13:48:45:0955] NOTICE: raw file read 96
[2018/05/04 13:48:45:0955] USER: LWS_CALLBACK_RAW_CLOSE_FILE
[2018/05/04 13:48:45:0955] USER: LWS_CALLBACK_RAW_WRITEABLE
[2018/05/04 13:48:45:1955] USER: LWS_CALLBACK_TIMER
[2018/05/04 13:48:45:1955] USER: LWS_CALLBACK_RAW_CLOSE
[2018/05/04 13:48:45:1956] USER: main: destroying context

With v2.3-stable:

echo -e -n "GET / http/1.1\x0d\x0a\x0d\x0aGET /img1.png http/1.1\x0d\x0a\x0d\x0aGET /img2.png http/1.1\x0d\x0a\x0d\x0aGET /img3.png http/1.1\x0d\x0a\x0d\x0a" | ./bin/lws-minimal-raw-netcat --server 127.0.0.1 --port 7681
[2018/05/04 14:04:49:4046] USER: LWS minimal raw netcat [--server ip] [--port port] [-w ms]
[2018/05/04 14:04:49:4046] NOTICE: Creating Vhost 'default' (no listener), 1 protocols, IPv6 off
[2018/05/04 14:04:49:4046] USER: Starting connect to 127.0.0.1:7681...
[2018/05/04 14:04:49:4047] USER: Connected...
[2018/05/04 14:04:49:4047] USER: LWS_CALLBACK_RAW_ADOPT
[2018/05/04 14:04:49:4047] USER: LWS_CALLBACK_RAW_ADOPT_FILE
[2018/05/04 14:04:49:4047] USER: LWS_CALLBACK_RAW_WRITEABLE
[2018/05/04 14:04:49:4047] USER: LWS_CALLBACK_RAW_RX_FILE
[2018/05/04 14:04:49:4047] NOTICE: raw file read 96
[2018/05/04 14:04:49:4047] USER: LWS_CALLBACK_RAW_CLOSE_FILE
[2018/05/04 14:04:49:4047] USER: LWS_CALLBACK_RAW_WRITEABLE
[2018/05/04 14:04:49:4048] USER: LWS_CALLBACK_RAW_RX (66)
HTTP/1.1 200 OK
content-type: text/html
content-length: 1234

[2018/05/04 14:04:49:4048] USER: LWS_CALLBACK_RAW_RX (1234)
<html>
    <head>
        <meta charset="UTF-8">
        <title>Minimal Page</title>
        <script>
            function getImage(targetImg, sourceUrl)
            {
                var req = new XMLHttpRequest();
                req.onreadystatechange = function() {
                    if (req.readyState == 4 && req.status == 200) {
                        document.getElementById(targetImg).src = req.responseText;
                    }
                };
                req.open("GET", sourceUrl, true); // true for asynchronous
                req.send(null);
            }
            function getImages()
            {
                getImage("pngImg1", "img1.png");
//                getImage("pngImg2", "img2.png");
//                getImage("pngImg3", "img3.png");
//                getImage("pngImg4", "img1.png");
//                getImage("pngImg5", "img2.png");
//                getImage("pngImg6", "img3.png");
            }
        </script>
    </head>
    <body onload="getImages()">
        <h1>Minimal Page</h1>
        <img id="pngImg1" />
        <img id="pngImg2" />
        <img id="pngImg3" />
        <img id="pngImg4" />
        <img id="pngImg5" />
        <img id="pngImg6" />
    </body>
</html>
[2018/05/04 14:04:49:5047] USER: LWS_CALLBACK_TIMER
[2018/05/04 14:04:49:5047] USER: LWS_CALLBACK_RAW_CLOSE
[2018/05/04 14:04:49:5048] USER: main: destroying context

Possibly the key difference is that it looks like the images are not served... matching server output:

[2018/05/04 14:04:49:4047] INFO: lws_header_table_attach: wsi 0x235f890: ah (nil) (tsi 0, count = 0) in
[2018/05/04 14:04:49:4047] INFO: lws_header_table_attach: did attach wsi 0x235f890: ah 0x2355f80: count 1 (on exit)
[2018/05/04 14:04:49:4048] INFO: no host
[2018/05/04 14:04:49:4048] INFO: No upgrade
[2018/05/04 14:04:49:4048] INFO: lws_ensure_user_space: 0x235f890 protocol 0x6394e0
[2018/05/04 14:04:49:4048] INFO: Method: GET request for '/'
[2018/05/04 14:04:49:4048] INFO: no hit
[2018/05/04 14:04:49:4048] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0x235f890
[2018/05/04 14:04:49:4048] INFO: lws_ensure_user_space: 0x235f890 protocol 0x6394e0
[2018/05/04 14:04:49:4048] USER: lws_http_serve: /
[2018/05/04 14:04:49:4048] INFO: lws_read: read_ok, used 96
[2018/05/04 14:04:49:4048] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0x235f890
[2018/05/04 14:04:49:4048] INFO: have prev 0x2356e40, setting him to our next (nil)
[2018/05/04 14:04:49:4048] INFO: lws_ensure_user_space: 0x235f890 protocol 0x6394e0
[2018/05/04 14:04:49:4048] INFO: lws_http_transaction_completed: wsi->more_rx_waiting=1
[2018/05/04 14:04:49:4048] INFO: lws_http_transaction_completed: 0x235f890: keep-alive await new transaction
[2018/05/04 14:04:49:5048] INFO: lws_server_socket_service: read 0 len
[2018/05/04 14:04:49:5048] INFO: lws_close_free_wsi: shutting down connection: 0x235f890 (sock 9, state 0)
[2018/05/04 14:04:49:5048] INFO: lws_close_free_wsi: real just_kill_connection: 0x235f890 (sockfd 9)
[2018/05/04 14:04:49:5048] INFO: lws_same_vh_protocol_remove: removing same prot wsi 0x235f890
[2018/05/04 14:04:49:5048] INFO: have prev 0x2356e40, setting him to our next (nil)
[2018/05/04 14:04:49:5049] INFO: ah det due to close
[2018/05/04 14:04:49:5049] INFO: lws_header_table_detach: wsi 0x235f890: ah 0x2355f80 (tsi=0, count = 1)
[2018/05/04 14:04:49:5049] INFO: lws_header_table_detach: wsi 0x235f890: ah 0x2355f80 (tsi=0, count = 0)

@lws-team
Copy link
Member

lws-team commented May 4, 2018

Trying it against master... which you don't mention... there is no problem running the 1 x ah test as much as you like with the minimal netcat... ie, it's good.

If you want a fix for 2.4 I think you understood what that is about when you said

Reverting back to the previous behavior of "goto fail;" here fixes the problem; Although we don't know enough about libwebsockets internals to suggest this as a proper fix.

If you want provide a patch (just paste in a comment is OK) aligning 2.4 to what I did here on master

f66a945

if it fixes it as I assume it'll be super welcome.

@rdumusc
Copy link
Author

rdumusc commented May 4, 2018

Oh wow, indeed it works! That's great news 👍
To be honest I only arrived at the "goto fail;" after some lengthy trial and error... I don't fully understand the patch, but it's small enough that I should be able to backport it nonetheless. I'll give it a try and let you know how it goes.

rdumusc pushed a commit to rdumusc/libwebsockets that referenced this issue May 4, 2018
@rdumusc
Copy link
Author

rdumusc commented May 4, 2018

OK, this patch fixes everything for me: #1265

Also, I think there is a typo here:
f66a945#diff-0e9f12ed857f9caa311eb5dc0e3fc6e9R339

wsi->ws->rx_draining_ext || wsi->ws->rx_draining_ext
should probably be:
wsi->ws->rx_draining_ext || wsi->ws->tx_draining_ext

rdumusc pushed a commit to rdumusc/Rockets that referenced this issue May 4, 2018
Increase the pool size to 1024 for lws < 3.0 as discussed here:
warmcat/libwebsockets#1249
There was no point in restricting the pool size, memory is only
allocated as-needed, and should reach max 5MB. Since lws 3.0 the
default ah limit == fd limit (typically 1024 as well).
rdumusc pushed a commit to rdumusc/Rockets that referenced this issue May 4, 2018
Increase the pool size to 1024 for lws < 3.0 as discussed here:
warmcat/libwebsockets#1249
There was no point in restricting the pool size, memory is only
allocated as-needed, and should reach max 5MB. Since lws 3.0 the
default ah limit == fd limit (typically 1024 as well).
@lws-team
Copy link
Member

lws-team commented May 4, 2018

Oh good find... I modified it accordingly on master.

rdumusc pushed a commit to BlueBrain/Rockets that referenced this issue May 7, 2018
Increase the pool size to 1024 for lws < 3.0 as discussed here:
warmcat/libwebsockets#1249
There was no point in restricting the pool size, memory is only
allocated as-needed, and should reach max 5MB. Since lws 3.0 the
default ah limit == fd limit (typically 1024 as well).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants