Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't report RDHUP as HUP #939

Merged
merged 12 commits into from May 15, 2019
Merged

Don't report RDHUP as HUP #939

merged 12 commits into from May 15, 2019

Conversation

carllerche
Copy link
Member

@carllerche carllerche commented May 13, 2019

Continuation of #933, adding a test and a fix for kqueue.

This should also fix tokio-rs/tokio#449.

@carllerche carllerche changed the base branch from master to v0.6.x May 13, 2019 19:28
carllerche and others added 7 commits May 13, 2019 17:12
Currently, EPOLLRDHUP sets UnixReady::hup(). This is incorrect behavior
because higher-level libraries like tokio (correctly) assume that
UnixReady::hup() is unclearable since it signals that both the read and
write halfs are shutdown. In reality, EPOLLRDHUP only means that the TCP
stream has been half-closed and such a half-closed stream can still be
written to.

This will fix a current issue with tokio, which is that tokio infinitely
attempts to write to a half-closed socket that's returning WouldBlock
when it's write buffer is full, an issue which manifests with excessive
CPU usage. I think this may help some of the issues discussed in
tokio-rs/tokio#449

After this change, EOF will still be propagated correctly, because
read-hangups also trigger read-readiness via EPOLLIN. However, if
handling of EPOLLRDHUP is desired to be retained, I believe it should be
implemented as another readiness kind on UnixReady, perhaps
UnixReady::read_hup().

Possible concern of a breaking change: Since it's not currently possible
for a user of mio to differentiate between EPOLLHUP and EPOLLRDHUP, it
must be that no users of mio currently are. There _may_ be applications
that test the "health" of a socket by checking for UnixRead::hup(),
which would previously trigger on EPOLLRDHUP but will no longer with
this change. This will change such applications from considering a
half-closed connection as closed to considering it open. However, I
still beleive this change is a correction of the semantics of HUP and
the desired behavior such applications was already ambiguous.

Note: if this is an agreed upon change for epoll, I think a similar
change is in order for kqueue. I _think_ this would be to only set
UnixReady::hup() if (e.filter == libc::EVFILT_READ && e.flags &
libc::EV_EOF != 0), but I will leave that change to someone more
knowledgeable with kqueue.
@carllerche carllerche changed the title Add TCP shutdown test Don't report RDHUP as HUP May 14, 2019
@carllerche carllerche marked this pull request as ready for review May 14, 2019 16:24
@carllerche
Copy link
Member Author

cc @behrat

test/test_tcp_shutdown.rs Show resolved Hide resolved
src/sys/unix/kqueue.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@Thomasdezeeuw Thomasdezeeuw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It think we should add documentation/comment in either Ready::hup or the change log to indicate that cross platform HUP might behaviour unexpectedly and link to #933 and #941.

src/sys/unix/kqueue.rs Show resolved Hide resolved
@@ -141,11 +141,6 @@ fn ioevent_to_epoll(interest: Ready, opts: PollOpt) -> u32 {
kind |= EPOLLOUT;
}

if UnixReady::from(interest).is_hup() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this means passing HUP readiness is a no-op on epoll platforms right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, HUP is an implicit interest w/ epoll (you always get it). The error here was requesting RDHUP which means something different.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right EPOLLHUP is implicit but it won't be possible to get EPOLLRDHUP anymore right?

@carllerche
Copy link
Member Author

@Thomasdezeeuw There already is a warning:

/// **Note that only readable and writable readiness is guaranteed to be
/// supported on all platforms**. This means that `hup` readiness
/// should be treated as a hint. For more details, see [readiness] in the
/// poll documentation.

I can make it stronger though.

@carllerche
Copy link
Member Author

@Thomasdezeeuw feedback should be addressed.

@SerhoLiu
Copy link

on kqueue, only set HUP when EV_EOF is received with the EVFILT_WRITE filter.

This did not fix tokio-rs/tokio#449. tokio-reactor (https://github.com/tokio-rs/tokio/blob/tokio-reactor-0.1.9/tokio-reactor/src/lib.rs#L428) will notify read task, but read still returnWouldBlock. In other words, EV_EOF with EVFILT_WRITE does not indicate EVFILT_READ.

@carllerche
Copy link
Member Author

@SerhoLiu I believe it does, EV_EOF with EVFILT_WRITE will only trigger (for TCP) when the socket is fully disconnected, in which case the read half will be ready as well.

@carllerche
Copy link
Member Author

@SerhoLiu If I am incorrect, could you submit a test case that demonstrates the bug after this PR is applied?

@carllerche carllerche deleted the rd-hup-fix branch May 15, 2019 20:04
olix0r added a commit to linkerd/linkerd2-proxy that referenced this pull request May 16, 2019
olix0r added a commit to linkerd/linkerd2-proxy that referenced this pull request May 16, 2019
olix0r added a commit to linkerd/linkerd2 that referenced this pull request May 16, 2019
commit b27dfb2d21aa8ca5466ea0edce17d27094ace7c1
Author: Takanori Ishibashi <takanori.1112@gmail.com>
Date:   Wed May 15 05:58:42 2019 +0900

    updaes->updates (#250)

    Signed-off-by: Takanori Ishibashi <takanori.1112@gmail.com>

commit 16441c25a9d423a6ab12b689b830d9ae3798fa00
Author: Eliza Weisman <eliza@buoyant.io>
Date:   Tue May 14 14:40:03 2019 -0700

     Pass router::Config directly to router::Layer (#253)

    Currently, router `layer`s are constructed with a single argument, a
    type implementing `Recognize`. Then, the entire router stack is built
    with a `router::Config`. However, in #248, it became necessary to
    provide the config up front when constructing the `router::layer`, as
    the layer is used in a fallback layer. Rather than providing a separate
    type for a preconfigured layer, @olix0r suggested we simply change all
    router layers to accept the `Config` when they're constructed (see
    linkerd/linkerd2-proxy#248 (comment)).

    This branch changes `router::Layer` to accept the config up front. The
    `router::Stack` types `make` function now requires no arguments, and the
    implementation of `Service` for `Stack` can be called with any `T` (as
    the target is now ignored).

    Signed-off-by: Eliza Weisman <eliza@buoyant.io>

commit b70c68d4504a362eac6a7828039a2e5c7fcd308a
Author: Eliza Weisman <eliza@buoyant.io>
Date:   Wed May 15 13:14:04 2019 -0700

    Load balancers fall back to ORIG_DST when no endpoints exist (#248)

    Currently, when no endpoints exist in the load balancer for a
    destination, we fail the request. This is because we expect endpoints to
    be discovered by both destination service queries _and_ DNS lookups, so
    if there are no endpoints for a destination, it is assumed to not exist.

    In #2661, we intend to remove the DNS lookup from the
    proxy and instead fall back to routing requests for which no endpoints
    exist in the destination service to their SO_ORIGINAL_DST IP address.
    This means that the current approach of failing requests when the load
    balancer has no endpoints will no longer work.

    This branch introduces a generic `fallback` layer, which composes a
    primary and secondary service builder into a new layer. The primary
    service can fail requests with an error type that propages the original
    request, allowing the fallback middleware to call the fallback service
    with the same request. Other errors returned by the primary service are
    still propagated upstream.

    In contrast to the approach used in #240, this fallback middleware is
    generic and not tied directly to a load balancer or a router, and can
    be used for other purposes in the future. It relies on the router cache
    eviction added in #247 to drain the router when it is not being used,
    rather than proactively destroying the router when endpoints are
    available for the lb, and re-creating it when they exist again.

    A new trait, `HasEndpointStatus`, is added in order to allow the
    discovery lookup to communicate the "no endpoints" state to the
    balancer. In addition, we add a new `Update::NoEndpoints` variant to
    `proxy::resolve::Update`, so that when the control plane sends a no
    endpoints update, we switch from the balancer to the no endpoints state
    _immediately_, rather than waiting for all the endpoints to be
    individually removed. When the balancer has no endpoints, it fails all
    requests with a fallback error, so that the fallback middleware

    A subsequent PR (#248) will remove the DNS lookups from the discovery
    module.

    Closes #240.

    Signed-off-by: Eliza Weisman <eliza@buoyant.io>

commit 6525b0638ad18e74510f3156269e0613f237e2f5
Author: Zahari Dichev <zaharidichev@gmail.com>
Date:   Wed May 15 23:35:09 2019 +0300

    Allow disabling tap by setting an env var (#252)

    This PR fixes #2811. Now if
    `LINKERD2_PROXY_TAP_DISABLED` is set, the tap is not served at all. The
    approach taken is that  the `ProxyParts` is changed so the
    `control_listener` is now an `Option` that will be None if tap is
    disabled as this control_listener seems to be exclusively used to serve
    the tap. Feel free to suggest a better approach.

    Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

commit 91f32db2ea6d74470fd689c713ff87dc7586222d
Author: Zahari Dichev <zaharidichev@gmail.com>
Date:   Thu May 16 00:45:23 2019 +0300

    Assert that outbound TLS works before identity is certified (#251)

    This commit introduces TLS capabilities to the support server as well as
    tests to ensure that outbound TLS works even when there is no verified
    certificate for the proxy yet.

    Fixes #2599

    Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

commit 45aadc6b1b28e6daea0c40e694a86ae518887d85
Author: Sean McArthur <sean@buoyant.io>
Date:   Wed May 15 14:25:39 2019 -0700

    Update h2 to v0.1.19

    Includes a couple HPACK fixes

    Signed-off-by: Sean McArthur <sean@buoyant.io>

commit 3e0e00c6dfbf5a9155b887cfd594f611edfc135f
Author: Oliver Gould <ver@buoyant.io>
Date:   Thu May 16 08:11:06 2019 -0700

    Update mio to 0.6.17 (#257)

    To pick up tokio-rs/mio#939
olix0r added a commit to linkerd/linkerd2 that referenced this pull request May 16, 2019
commit b27dfb2d21aa8ca5466ea0edce17d27094ace7c1
Author: Takanori Ishibashi <takanori.1112@gmail.com>
Date:   Wed May 15 05:58:42 2019 +0900

    updaes->updates (#250)

    Signed-off-by: Takanori Ishibashi <takanori.1112@gmail.com>

commit 16441c25a9d423a6ab12b689b830d9ae3798fa00
Author: Eliza Weisman <eliza@buoyant.io>
Date:   Tue May 14 14:40:03 2019 -0700

     Pass router::Config directly to router::Layer (#253)

    Currently, router `layer`s are constructed with a single argument, a
    type implementing `Recognize`. Then, the entire router stack is built
    with a `router::Config`. However, in #248, it became necessary to
    provide the config up front when constructing the `router::layer`, as
    the layer is used in a fallback layer. Rather than providing a separate
    type for a preconfigured layer, @olix0r suggested we simply change all
    router layers to accept the `Config` when they're constructed (see
    linkerd/linkerd2-proxy#248 (comment)).

    This branch changes `router::Layer` to accept the config up front. The
    `router::Stack` types `make` function now requires no arguments, and the
    implementation of `Service` for `Stack` can be called with any `T` (as
    the target is now ignored).

    Signed-off-by: Eliza Weisman <eliza@buoyant.io>

commit b70c68d4504a362eac6a7828039a2e5c7fcd308a
Author: Eliza Weisman <eliza@buoyant.io>
Date:   Wed May 15 13:14:04 2019 -0700

    Load balancers fall back to ORIG_DST when no endpoints exist (#248)

    Currently, when no endpoints exist in the load balancer for a
    destination, we fail the request. This is because we expect endpoints to
    be discovered by both destination service queries _and_ DNS lookups, so
    if there are no endpoints for a destination, it is assumed to not exist.

    In #2661, we intend to remove the DNS lookup from the
    proxy and instead fall back to routing requests for which no endpoints
    exist in the destination service to their SO_ORIGINAL_DST IP address.
    This means that the current approach of failing requests when the load
    balancer has no endpoints will no longer work.

    This branch introduces a generic `fallback` layer, which composes a
    primary and secondary service builder into a new layer. The primary
    service can fail requests with an error type that propages the original
    request, allowing the fallback middleware to call the fallback service
    with the same request. Other errors returned by the primary service are
    still propagated upstream.

    In contrast to the approach used in #240, this fallback middleware is
    generic and not tied directly to a load balancer or a router, and can
    be used for other purposes in the future. It relies on the router cache
    eviction added in #247 to drain the router when it is not being used,
    rather than proactively destroying the router when endpoints are
    available for the lb, and re-creating it when they exist again.

    A new trait, `HasEndpointStatus`, is added in order to allow the
    discovery lookup to communicate the "no endpoints" state to the
    balancer. In addition, we add a new `Update::NoEndpoints` variant to
    `proxy::resolve::Update`, so that when the control plane sends a no
    endpoints update, we switch from the balancer to the no endpoints state
    _immediately_, rather than waiting for all the endpoints to be
    individually removed. When the balancer has no endpoints, it fails all
    requests with a fallback error, so that the fallback middleware

    A subsequent PR (#248) will remove the DNS lookups from the discovery
    module.

    Closes #240.

    Signed-off-by: Eliza Weisman <eliza@buoyant.io>

commit 6525b0638ad18e74510f3156269e0613f237e2f5
Author: Zahari Dichev <zaharidichev@gmail.com>
Date:   Wed May 15 23:35:09 2019 +0300

    Allow disabling tap by setting an env var (#252)

    This PR fixes #2811. Now if
    `LINKERD2_PROXY_TAP_DISABLED` is set, the tap is not served at all. The
    approach taken is that  the `ProxyParts` is changed so the
    `control_listener` is now an `Option` that will be None if tap is
    disabled as this control_listener seems to be exclusively used to serve
    the tap. Feel free to suggest a better approach.

    Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

commit 91f32db2ea6d74470fd689c713ff87dc7586222d
Author: Zahari Dichev <zaharidichev@gmail.com>
Date:   Thu May 16 00:45:23 2019 +0300

    Assert that outbound TLS works before identity is certified (#251)

    This commit introduces TLS capabilities to the support server as well as
    tests to ensure that outbound TLS works even when there is no verified
    certificate for the proxy yet.

    Fixes #2599

    Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

commit 45aadc6b1b28e6daea0c40e694a86ae518887d85
Author: Sean McArthur <sean@buoyant.io>
Date:   Wed May 15 14:25:39 2019 -0700

    Update h2 to v0.1.19

    Includes a couple HPACK fixes

    Signed-off-by: Sean McArthur <sean@buoyant.io>

commit 3e0e00c6dfbf5a9155b887cfd594f611edfc135f
Author: Oliver Gould <ver@buoyant.io>
Date:   Thu May 16 08:11:06 2019 -0700

    Update mio to 0.6.17 (#257)

    To pick up tokio-rs/mio#939
panthervis added a commit to panthervis/linkerd2-proxy that referenced this pull request Oct 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants