Adapter tests #96

Closed
wants to merge 27 commits into
from

Projects

None yet

5 participants

@samwgoldman
Member

Could use some feedback here.

There's a few things going on. I originally wanted to see what was causing this annoying spec wart, realized that it was my fault, and set out to fix it.

The fault lies with this spec. It's not a very helpful spec, but it did catch a weird integration issue.

I decided to just make some better integration specs cover more of the adapter internals. This actually caught one issue with translating headers.

I had to change the production code in order to gracefully shut down the servers after the examples run. There might be a more clever way to do this, which wouldn't require adding an additional method to the mongrel/reel adapters' APIs.

Should I keep going with this?

samwgoldman added some commits Jan 24, 2013
@samwgoldman samwgoldman Extract test resource from rack adapter spec 26aed19
@samwgoldman samwgoldman Test mongral adapter in integration
This actually caught an issue with the way response headers are handled.
Sometimes response headers can be array-valued, which was broken.

I also split the mongrel and reel adapters #run method into two parts,
in order to support graceful shutdown.

This fixes an annoying wart where ports are left open across tests, but
it also adds unnecessary complexity to the adapters.
c836698
@samwgoldman
Member

I can't reproduce the CI failure locally, but I'm pretty sure it's unrelated to my changes.

@seancribbs seancribbs and 1 other commented on an outdated diff Jan 24, 2013
webmachine.gemspec
@@ -21,6 +21,7 @@ Gem::Specification.new do |gem|
gem.add_development_dependency(%q<rake>)
gem.add_development_dependency(%q<rack>)
gem.add_development_dependency(%q<rack-test>)
+ gem.add_development_dependency(%q<httpclient>)
@seancribbs
seancribbs Jan 24, 2013 Member

Where do you use httpclient, and could Net::HTTP be used instead?

@samwgoldman
samwgoldman Jan 24, 2013 Member

I used httpclient to drive the integration test on the mongrel adapter. Yes, I could use Net::HTTP instead.

@seancribbs seancribbs and 1 other commented on an outdated diff Jan 24, 2013
lib/webmachine/adapters/reel.rb
@@ -10,14 +10,17 @@ module Webmachine
module Adapters
class Reel < Adapter
def run
+ trap("INT"){ server.terminate; exit 0 }
@seancribbs
seancribbs Jan 24, 2013 Member

I don't see where #serve is called to start the server.

@samwgoldman
samwgoldman Jan 24, 2013 Member

Yeah this is a bug. I renamed server to serve (which is also a bad name), but didn't update this reference.

samwgoldman added some commits Jan 24, 2013
@samwgoldman samwgoldman Add a shutdown method for adapters
This is useful for integration testing, which would start and stop a
server. In normal operation, the signal handler would perform the
shutdown.
22db035
@samwgoldman samwgoldman Use net/http instead of httpclient 3193349
@samwgoldman
Member

The more I think about this, the more I feel a need for some kind of adapter "lint" shared example group. The example group would spin up a real server for a few important tests. The various response types (string, enum, callable, io) should be covered. Encodings should probably be covered. Various cases of translating the Webmachine Request/Response objects should be covered.

These would be relatively expensive tests, but I think if we choose the test cases wisely, we can gain a lot of confidence from them.

samwgoldman added some commits Jan 24, 2013
@samwgoldman samwgoldman Don't stub adapter internals to check for #to_s/#each interface 8c47393
@samwgoldman samwgoldman Remove unused lets 69688bd
@samwgoldman samwgoldman Remove outdated Rack::Lint compliance check
The behavior of Rack::Lint was changed to better reflect the RFC, so
this test isn't necessary anymore.

rack/rack@3623d04
6b970de
@samwgoldman samwgoldman Wait until the mongrel server is responsive 1d93fd5
@samwgoldman samwgoldman Extract adapter integration tests into a shared example. 0e8360e
@samwgoldman samwgoldman Run/shutdown server around all examples 2026a0c
@samwgoldman samwgoldman Use adapter lint for rack adapter 008e9e0
@samwgoldman samwgoldman Fix IO response bodies for Rack/WEBrick/Mongrel
There were three issues that needed to be resolved.

1. Precedence issue in `IOEncoder#each`
2. IOEncoder needs to define `empty?` if it has a content length
   chunked transfer encoding is not set on IO response bodies the same
   way that it is for enum/proc/fiber, so `ensure_content_length` ends
   up calling `set_content_length`, which calls `empty?`
3. Adapters shouldn't use ChunkedBody unless Transfer-Encoding says so

The fixes I put in are just workarouns. I think that some real thought
should be put into better long term fixes.
2d01994
@samwgoldman samwgoldman Get Reel to pass adapter lint 2091b2a
@samwgoldman
Member

I think I'm at a stopping point with this now. I'd love to get some feedback.

I wanted to get Hatetepe to pass the lint tests, but it looked like it required a bit too much additional code to the adapter which I didn't feel comfortable adding.

There are a number of bugfixes in this branch which are only tested in integration. I didn't write unit tests, because it feels to me like there is a design problem, and I don't want to "bake in" a problematic design with low-level tests.

Unfortunately, the tests seem to be pretty flaky in CI. I haven't been able to reproduce any failures locally, but I don't recommend merging anything until that's cleared up.

@samwgoldman
Member

Any thoughts on this? Am I mistaken or does this work show that adapters are fairly broken in the current release?

@tarcieri
Contributor
tarcieri commented Mar 1, 2013

Let me know if there's any problems with the Reel adapter

@petejohanson petejohanson reopened this Mar 6, 2013
@samwgoldman
Member

@tarcieri 2091b2a shows the only 2 issues I found in the Reel adapter.

The biggest problem among adapters is that Webmachine::Response#body can be a few different things.

  • nil
  • String
  • EnumerableEncoder
  • CallableEncoder
  • FiberEncoder
  • IOEncoder

It seems wrong to hand these types across the Adapter boundary. Web servers can and do have intelligence about handling IO vs String vs Enumerable, but I don't think it's reasonable to expect web servers to handle, say, IOEncoder. That should be handled at the Adapter level IMO.

@tarcieri
Contributor
tarcieri commented Mar 6, 2013

@samwgoldman cool, thanks

@samwgoldman
Member

Welp, I thought I had these tests reliably passing. Seems inevitable that a failure would pop up only when updating the PR. @tarcieri does it look like I'm setting up and tearing down the Reel server correctly?

@tarcieri
Contributor
tarcieri commented Apr 1, 2013

It looks to me like this (@server being nil) should only happen if Webmachine::Adapters::Reel#run is never called

@tarcieri
Contributor
tarcieri commented Apr 1, 2013

@samwgoldman it looks like this is the problematic commit:

samwgoldman@22db035#L6R81

Perhaps the adapter shutdown methods should raise an exception if they're asked to shut down without ever being initialized? And perhaps the test should swallow that exception here...

@samwgoldman
Member

I am concerned that some timing issue is the root cause of that failure, and patching over/ignoring that error will hide real failures.

It's true that @server can be nil if run is not invoked, and I agree that shutdown should handle that exceptional case, but looking at this specific instance it seems run should have been invoked. I admittedly don't have much experience with concurrent programming, so maybe I just can't see the issue.

Maybe another test left a server bound to adptr.configuration.ip and adptr.configuration.port so the test ran against that instead of adptr's server. If that's the case, I wonder why the server was running since the adapter tests are intended to shut down the server after the example group runs.

@tarcieri
Contributor
tarcieri commented Apr 1, 2013

Well, the problem here is an exception is being thrown before the adapter is initialized and the real exception that's probably meaningful to the tests is being swallowed up by the ensure block.

@samwgoldman
Member

Huh. I didn't realize ensure swallowed exceptions. I will see if I can't get that exception to bubble up and try to reproduce.

@tarcieri
Contributor
tarcieri commented Apr 1, 2013

Well, I'm not entirely sure it's actually swallowing an exception in this case. However, if an exception is being raised, and your ensure block crashes, the exception you get is from the ensure block, and the original exception is obscured.

@samwgoldman
Member

Aha. Good catch. I'm sorting this out now.

@tarcieri
Contributor
tarcieri commented Apr 1, 2013

As a hack, you can change https://github.com/samwgoldman/webmachine-ruby/blob/a777ea1d6f0c02f0f3d8cf1f1f8438d32289cf55/spec/webmachine/adapters/reel_spec.rb#L71 to be:

adptr.shutdown rescue nil

This will prevent failed shutdown from obscuring other exceptions.

Ideally instead of rescue nil the shutdown exception should be logged in some form.

@samwgoldman
Member

For now I've already started with this commit.

Once this is sorted, I can replace that with some kind of NoServerRunning exception, and handle it as you suggest.

@samwgoldman
Member

Okay, I have managed to break the tests, but I am still not sure about the root cause.

If I run the Rack (uses WEBrick), WEBrick, and Reel specs with JRuby, occasionally the tests will just hang. The output looks like this:

Webmachine::Adapters::Reel
  it should behave like adapter_lint
    handles missing pages
    handles empty response bodies
    handles enumerable response bodies
    handles fiber response bodies
    provides a string-like request body
    handles io response bodies
[2013-04-01 11:58:44] INFO  going to shutdown ...
[2013-04-01 11:58:44] INFO  Webmachine::Adapters::WEBrick::Server#start done.
[2013-04-01 11:58:44] INFO  going to shutdown ...
[2013-04-01 11:58:44] INFO  WEBrick::HTTPServer#start done.
    provides an enumerable request body (FAILED - 1)

When the adapter gets shut down, servers don't necessarily block until all listeners are cleaned up. Whenever the test fails, I can see that WEBrick is in the process of shutting down.

My thinking is to join the server thread after the example group runs. By joining the server thread, WEBrick is guaranteed to be cleaned up before the next example group runs. This works for every adapter except Reel, which can't join because it's sleeping.

Can we avoid the sleep in the Reel adapter?

@tarcieri
Contributor
tarcieri commented Apr 1, 2013

Not really: Reel runs in its own actor, and its lifecycle is managed by a Celluloid supervisor.

You can join to Reel (or its supervisor) after it shuts down using Celluloid.join(actor)

@samwgoldman
Member

I was wrong about WEBrick being related. Sometimes the Reel adapter spec hangs when run by itself. Perhaps this is related to the Celluloid::TaskThread workaround for JRuby 1.7.3? You did say it was experimental. Maybe I should just disable Reel for 1.7.3 instead.

@tarcieri
Contributor
tarcieri commented Apr 1, 2013

I can try to debug where it's hanging tonight

samwgoldman added some commits Apr 1, 2013
@samwgoldman samwgoldman Call stop from Mongrel interrupt handler
The `shutdown` method isn't actually in scope because of how Mongrel
evaluates the configurator block. I could have passed the method into
the configurator as a proc in the options hash (as dispatcher is) but
this is simple, too.
b5e3e3f
@samwgoldman samwgoldman Exit from sleep in Reel adapter's interrupt handler 66ffee3
@tarcieri
Contributor
tarcieri commented Apr 2, 2013

Oh god @ 66ffee3, are signals causing the problem somehow? o_O

@samwgoldman
Member

Nah, that signal handler is never called from test. I just realized that I need to exit when doing a manual smoke test.

@tarcieri
Contributor
tarcieri commented Apr 2, 2013

@samwgoldman I have an idea for a better fix. I can PR it separately from your PR if you like.

Your version is fine but somewhat breaks the shutdown semantics, but they were broken anyway by Celluloid 0.13, and webmachine is in need of an update ;)

@samwgoldman
Member

Cool. It's probably a good idea to put the fix on top of master in a separate PR. I'll keep an eye out and incorporate the fix into this branch as well.

@seancribbs
Member

Let me just say that I love that you guys are fixing my buggy code! Keep up the good work.

Sean Cribbs

On Apr 1, 2013, at 8:57 PM, Sam Goldman notifications@github.com wrote:

Cool. It's probably a good idea to put the fix on top of master in a separate PR. I'll keep an eye out and incorporate the fix into this branch as well.


Reply to this email directly or view it on GitHub.

@samwgoldman
Member

@tarcieri I should have been more clear. If the fix is simple to apply to master then it's probably best there. I wouldn't want a good fix to get tied up in all the complexity of these changes. If the fix is easier to apply here, then feel free to work directly on this branch. I have given you commit so have at it.

@tarcieri
Contributor
tarcieri commented Apr 3, 2013

Hmm, not sure this is helping, so the fix is simply:

require 'celluloid/autostart'

I made Reel do this automatically on master. I'm not sure Celluloid's shutdown is actually working though, at least during the tests.

@seancribbs
Member

Not sure it's related, but somehow the Gemfile is structured such that Reel is not included on MRI 2.0. @tarcieri thoughts?

@tarcieri
Contributor
tarcieri commented Apr 3, 2013

Oh, probably the :platform needs to be updated. There's also the issue of the whiny source :rubygems

@seancribbs
Member

@tarcieri Yeah, saw that too. 👎

@samwgoldman
Member

@tarcieri Have you been able to reproduce the hang? I just ran the Reel spec with JRuby 1.7.3 in a loop. Takes anywhere from 1-20 runs on my machine. I tried looking at a thread dump in VisualVM, but this is all pretty far out of my wheelhouse.

The bundler configuration is unrelated to this branch (unless allowing those builds to continue uncovers more adapter bugs), but it should be fixed. I think another branch would be best to keep this work at least a little focused.

@tarcieri
Contributor
tarcieri commented Apr 3, 2013

@samwgoldman haven't tried your branch yet. I wasn't able to reproduce it on master

@samwgoldman
Member

If I update to celluloid-io 0.13.1, I can not reproduce the hangs I was seeing before. Yay?

It still annoys me that some servers (webrick) don't fully shutdown before the next adapter's test start. I would like to join on the server thread after the example group runs, but I had issues with the Reel adapter because it sleeps instead of blocks. This commit shows one way we can work around it. Is this an acceptable kind of thing to have in the adapter?

@seancribbs
Member

@samwgoldman Yes and no. I think it should be the adapter's job to join any necessary threads after the traps/etc are setup, but if that's not able to be consistent, waiting on a thread just in the test seems ok.

@samwgoldman
Member

@seancribbs Sorry, I don't understand what you mean. Each adapter's start method does join on listener threads, or block otherwise (sleep in the case of Reel). The issue is in the shutdown method.

It seems that, while WEBrick provides it's own shutdown method, this method does not block until all listeners are closed. The listeners are opaque and we can't get at them. This same issue also affects the Rack adapter when using its WEBrick handler, and potentially other backends.

All of this is a non-issue in a production setting because shutdown would normally happen on exit.

I have looked into WEBrick and Rack's source quite a bit by now and I don't see how we can write a shutdown method with the desired semantics. That's why I join on the server thread in test: to guarantee all listeners are closed before the next test runs.

Like I said, I probably misunderstand you. I'm not thrilled with the code right now myself, so I definitely want to get on the same page.

@seancribbs
Member

@samwgoldman Re-ran this again (after rebasing over master and resolving a bunch of conflicts). No problems on 1.9, 2.0, but JRuby 1.7.4 wets the bed on Fiber streams on all three adapters. I'd really love to see this make it into 1.2.0. Here's the output on JRuby:

Failures:

  1) Webmachine::Adapters::WEBrick it should behave like adapter_lint handles fiber response bodies
     Failure/Error: response.body.should eq("Fiber response body")

       expected: "Fiber response body"
            got: "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0//EN\">\n<HTML>\n  <HEAD><TITLE>Internal Server Error</TITLE></HEAD>\n  <BODY>\n    <H1>Internal Server Error</H1>\n    \n    <HR>\n    <ADDRESS>\n     WEBrick/1.3.1 (Ruby/1.9.3/2013-05-16) at\n     0.0.0.0:65263\n    </ADDRESS>\n  </BODY>\n</HTML>\n"

       (compared using ==)

       Diff:
       @@ -1,2 +1,14 @@
       -Fiber response body
       +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
       +<HTML>
       +  <HEAD><TITLE>Internal Server Error</TITLE></HEAD>
       +  <BODY>
       +    <H1>Internal Server Error</H1>
       +    
       +    <HR>
       +    <ADDRESS>
       +     WEBrick/1.3.1 (Ruby/1.9.3/2013-05-16) at
       +     0.0.0.0:65263
       +    </ADDRESS>
       +  </BODY>
       +</HTML>
     Shared Example Group: "adapter_lint" called from ./spec/webmachine/adapters/webrick_spec.rb:5
     # ./spec/support/adapter_lint.rb:106:in `(root)'

  2) Webmachine::Adapters::Reel it should behave like adapter_lint handles fiber response bodies
     Failure/Error: response = client.request(request)
     EOFError:
       End of file reached
     Shared Example Group: "adapter_lint" called from ./spec/webmachine/adapters/reel_spec.rb:6
     # ./spec/support/adapter_lint.rb:104:in `(root)'

  3) Webmachine::Adapters::Rack it should behave like adapter_lint handles fiber response bodies
     Failure/Error: response = client.request(request)
     EOFError:
       End of file reached
     Shared Example Group: "adapter_lint" called from ./spec/webmachine/adapters/rack_spec.rb:7
     # ./spec/support/adapter_lint.rb:104:in `(root)'

@samwgoldman
Member

Thanks for checking the status of this branch. I will take a look soon to see if I can resolve those errors.

I'm still a bit concerned about flakiness. Last time I was in here I had a really hard time getting these tests to pass consistently on CI.

Also not sure about the changes I made to the Reel adapter. It would be nice to get a sign off from the original adapter authors.

Hatetepe adapter is not linted, either.

I will make some time to look into this tomorrow evening.

@tarcieri
Contributor

@samwgoldman it'd be nice to actually address the problems with Reel. It seems like your changes will break streaming responses?

samwgoldman added some commits Jul 28, 2013
@samwgoldman samwgoldman Fix fiber response bodies in JRuby 1.7.4
The fiber response body adapter test failed for Reel, WEBrick, and Rack
(WEBrick backend) adapters.

I don't know why, but replacing the 3rd Fiber.yield with a simple return
value makes the fiber response body adapter tests pass.

The closest I got to finding the root cause was an exception handled in
WEBrick, which is passed to the server's logger's `error` method. The
exception is a java.lang.NullPointerException and the backtrace doesn't
include any lines from webmachine or the standard library.
5ca7f20
@samwgoldman samwgoldman Wait for server thread to finish
This was previously not possible because I didn't know how to wait for
the Reel server to terminate. This means that servers should never be
running at the same time, so we no longer need the any_available_port
hack.
c6de140
@samwgoldman
Member

I managed to get fiber response tests passing on JRuby 1.7.4 in 5ca7f20.

c6de140 was actually a mis-step, which works well in JRuby but causes the Reel adapter specs to hang in other rubies.

At this point, I don't think this work is ready for the next release. Even if I can get the tests to pass consistently on my machine, I can't shake their occasionally failing on CI. The tests integrate so many parts that failures are a nightmare to track down.

The goal was to cover the adapters in integration with the dispatcher and the underlying web server. I still think that is a worthwhile goal (it uncovered quite a few bugs), but this approach is unmaintainable, at least by me.

@tarcieri What changes are you referring to, specifically? Do you mean the Callable->Enumerable conversion in the Reel adapter? If so, then yeah, you can't stream Callables with the Reel adapter. I would absolutely prefer a solution that permits streaming.

@samwgoldman
Member

For what it's worth, disabling the Reel adapter spec gets most CI tests passing. Only rbx1.8 and jruby1.8 still fail. The failure mode is bad: the fiber response body test hangs. I suspect the culprit is the "poor man's fiber" implementation being incompatible with rbx/jruby in 1.8 mode.

@tarcieri
Contributor

Radical proposal: abandon all 1.8 support. 1.8 is EOL. Can we move on please? This isn't Python ;)

@ghost
ghost commented Jul 29, 2013

I wouldn't miss 1.8 support.

@seancribbs
Member

@robgleeson @tarcieri @samwgoldman I'm ok with that.

@bernd
Contributor
bernd commented Jul 29, 2013

I'm okay with dropping 1.8 support as well. But I think we should do a major version bump then.

@seancribbs
Member

Closing in favor of #114

@seancribbs seancribbs closed this Aug 10, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment