Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider promoting DirectService to Service #136

Closed
carllerche opened this issue Dec 13, 2018 · 12 comments
Closed

Consider promoting DirectService to Service #136

carllerche opened this issue Dec 13, 2018 · 12 comments
Assignees

Comments

@carllerche
Copy link
Member

carllerche commented Dec 13, 2018

The DirectService trait (introduced in #118) is an alternate version of Service that is being experimented with. It provides the necessary hooks to "drive" a service's internal state machine forward. This allows avoiding to have to spawn new tasks to power service implementations.

We should consider whether or not this trait should be promoted to be the primary Service trait.

Driven vs. buffered

When introducing DirectService, we extensively discussed whether or not the functionality should just be provided by Service. We concluded that it should not. The primary reason is that middleware is not always able to drive the inner service. For example, a router cannot effectively call poll_service as needed. Because of this, there are service implementations that driven (require poll_service to be called) and there are service implementations that are buffered (requests get dispatched on a channel to another task). These services do not require poll_service to be called. Initially, we could not figure out a good way to implement middleware that could indicate which kind of service it wanted, so we opted for two completely separate traits.

It occurred to me that we could combine DirectService with Service and use Clone as an indicator of whether or not the service is driven vs. buffered. The router must already accept T: Service + Clone. The strategy taken is, when a request comes in, the request is routed to an inner T: Service and that T is cloned into the router's response future. T::poll_ready is called by the response future. Services that are unable to call poll_ready will all most likely take a T: Service + Clone.

So, the proposal is to make Service what DirectService is today. Service implementations like Buffer would have no-op poll_service and poll_close implementations.

Complexity

The other con is that implementing Service becomes more complicated. This is a real drawback, but I believe it can be mitigated by providing utilities for common patterns:

  • service_fn for easily defining leaf services.
  • Combinators and other utilities for common middleware patterns.

For example, basic logging could be implemented as:

service_fn(|request| process(request))
    .before(|request| println!("request = {:?}", request))
    .after(|response| println!("response = {:?}", response))

And more complicated:

service_fn(|request| ...)
    .around(|request, next| {
        let request = request.clone();

        next.call(request)
            .inspect(move |response| {
                println!("request = {:?}; response={:?}", request, response);
            })
    })

Roadmap

The promotion could be done in a few steps.

  1. Rename DirectService -> Service but keep it in the tower-direct-service crate.
  2. Update middleware to use tower_direct_service::Service instead of tower_service::Service
  3. Validate the API in applications like linkerd and noria.
  4. Replace tower_service::Service with tower_direct_service::Service and release tower-service 0.3.

Refs #137

@carllerche
Copy link
Member Author

cc/ @seanmonstar @jonhoo @olix0r

@jonhoo
Copy link
Collaborator

jonhoo commented Dec 13, 2018

Do we have other examples of services that cannot drive an inner Service? I'm a little worried that the Clone fix is overly specialized to router. And in the general case, I don't think we want to have to clone things all over the place.

As a secondary concern, I don't like Clone having an implied secondary meaning of "don't bother to call poll_service". That seems super brittle. Maybe we could introduce an empty trait that derives from DirectService + Clone, but even that seems hard to enforce. I can totally think of DirectService implementations that are Clone, but still need to have poll_service called. Specifically, this would be because they're Clone for some other reason, like if you want multiple independent copies of a service.

@carllerche
Copy link
Member Author

To be clear, poll_service and others are still called, but they are called on the cloned handle in the Response future. This is already the strategy used with poll_ready.

If you know cases in which this strategy does not work, let’s discuss them.

@jonhoo
Copy link
Collaborator

jonhoo commented Dec 13, 2018

That implies that you can poll_service through any of the cloned handles to the service? I don't have any DirectService where that would work.

My concern is more the opportunity for accidental misuse here. For example, consider tokio_tower::pipeline::Client. It would be totally reasonable for it to be Clone if T: Clone to make it easy to, say, set up multiple mock clients. However, each clone would be an independent Client. That now implements DirectService + Clone, but if you were to call, then Clone it, and then poll_service your clone, that poll_service wouldn't actually poll the instance that is servicing your request. Tying "whether or not you implement Clone" to "it's safe to clone and poll_service just the clone" seems super odd to me.

I also think it's unfortunate to expose poll_service to clients all the time, making them think about whether or not they have to keep calling it. In some sense, the point of Service now is to expose a service where the polling is taken care of for you, so you just need to worry about call and whether or not it is ready (with poll_ready).

@seanmonstar
Copy link
Collaborator

On a tangential note, I think the current contract of "you must call poll_ready before call" should be loosened. If you have a Service outside a task context, the contract can't be fulfilled. I think it's still fine that a Service::call return an error if it wasn't ready. Some of our existing Services should be changed to DirectService that currently require calling poll_ready to drive work forward. Things like Reconnect, Balance, etc...

@carllerche
Copy link
Member Author

I think it's still fine that a Service::call return an error if it wasn't ready

That's probably what should happen. I don't think a panic would be good.

@carllerche
Copy link
Member Author

DirectService complexity

The way I'm thinking about it, there are 4 main ways to use a T: Service.

  • Server implementation (the thing that holds a socket and a T: Service and dispatches received requests to the T: Service).
  • Clients, where a user holds a handle to a T: Service and issues requests and waits for the response.
  • "Leaf" services, which implement the service logic. For example, an "echo" service would just receive the request and respond with the same value. An HTTP "leaf" service would respond however (render a template?).
  • Middleware, which are T: Service implementations that are implemented in terms of other T: Service values.

Within the "client" category, I think there are a few categories of clients.

  • Lower level proxy type things, such as linkerd. Here, I believe that we pretty much always want access to the details of DirectService. We are still experimenting with integrating DirectService, but so far I believe it is going well (cc @olix0r)

  • "Applications" (for a lack of a better word), which would be long lived processes that initialize a client up front, then use the same client to respond to received requests. For example, a web app that needs to issue a request to a database. The DB connection should be ready before web requests are received.

  • "Tools" (again, for a lack of a better word), which are short lived processes (think CLI tools) that create the connection, use it, then discard it.

In both the "application" and the "tool" case, I don't think users should have to manage poll_service directly, but I also don't think that they should use T: Service directly either.

The way I would hope a hyper client would shape up to be is roughly something like:

struct Client<T = hyper::client::Service> {
    service: Buffer<T>,
}

// tower_http::Service is a "trait alias" for an HTTP Service
impl<T: tower_http::Service> Client<T> {
    /// Issue an HTTP GET request
    pub fn get(&self, path: &str) -> GetFuture {
        // ....
    }
}

Where hyper::client::Service is some type that implements DirectService (promoted to top level Service). The key here is most users then don't have to worry about calling poll_ready directly, nor do they have to worry about the Service trait at all, they would use an ergonomic shim provided by hyper over a T: Service. This would also allow users to bring their own T: tower_http::Service and pass it to hyper::Client. For example, a mock service :)

This pattern would extend beyond hyper, and in general client implementations would provide a shim like this to make the simple cases simple. Then, projects like linkerd2-proxy could use hyper::client::Service directly and manage all the details of poll_service and co.

Service + Clone

The idea here is not that a T: Service + Clone means that you don't have to call poll_service. It means that you do call poll_service, but you call it in the response future. So, Clone means "this is how you get a service handle for one off requests. This also means that you need to call poll_close`, etc... before completing the response future, but in theory it would be very cheap (most likely a no-op).

Your example with Clone returning independent clients should still work here because the intent would be to manage the entire Service lifecycle on a per request basis. So, poll_service and poll_close are still called.

@jonhoo
Copy link
Collaborator

jonhoo commented Dec 13, 2018

What you say all makes sense. However, my concern is that we are overloading Clone to mean "another handle to the same Service". While that is true for things like Buffer, it is not true for something like Client<MockTransport>. If you clone that, you'll get a new Client that is entirely independent of the old one (including with a dedicated transport). If you sent/received on the original MockTransport handle you had, it wouldn't become visible to the cloned Client<MockTransport> (well, depending on how the hypothetical MockTransport is implemented). It's also not entirely clear how to change Client to have a Clone impl that shares a handle to it (and allows concurrent poll_service calls) except by introducing a Buffer?

@jonhoo
Copy link
Collaborator

jonhoo commented Dec 13, 2018

Following much discussion, some perhaps more refined thoughts:

Tower assumes that if it clones a Service, that clone is fine to use in place of the original. That using either Service instance is fine (as long as you poll the same one you called on of course). To me, Clone does not carry that implication — just because you can clone something doesn't mean that the clone is "just as good" as the original. Specifically, it seems reasonable to me to have a MyService that internally contains a connection of some sort, and which, on Clone, creates a new MyService that does not have an established connection. The clone would be a sort of shallow copy of MyServicehat must be "set up" before being used. Such a clone implementation makes it easy to write code that establishes a bunch of services that are all configured the same, but have dedicated connections (for load balancing for example). While it may be unorthodox, I do not believe it to be unreasonable. However, Tower assumes that if a Service is Clone, the clone is a full-fledged Service that can be used instead of the original without any further action. Yet this is not expressed in the type Service + Clone.

I'm not necessarily arguing for Clone not being the right trait to use here. While a new trait wrapping Clone with this added assumption would be possible, I'm also totally open to the services just explicitly documenting exactly what they expect when they clone the Service.

Another tangentially related point that came up was that Tower assumes that cloning a Service is cheap. This should definitely be documented somewhere.

@carllerche
Copy link
Member Author

I updated the main issue with a suggested roadmap to implement this change while validating it along the way:

  1. Rename DirectService -> Service but keep it in the tower-direct-service crate.
  2. Update middleware to use tower_direct_service::Service instead of tower_service::Service
  3. Validate the API in applications like linkerd and noria.
  4. Replace tower_service::Service with tower_direct_service::Service and release tower-service 0.3.

@carllerche
Copy link
Member Author

The next steps are to make the change across the tower stack in a branch. This is the same process we took for Service<Request> v0.2. I assigned @hawkw to the issue.

@carllerche
Copy link
Member Author

Closing as this avenue is no longer being considered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants