inconsistent but reproducible full-run failures (specs pass when run individually) #1715

Closed
jasonfb opened this Issue Jun 28, 2016 · 15 comments


jasonfb commented Jun 28, 2016

Capybara 2.7.0 and 2.7.1 introduced some unbelievably non-deterministic timing failures (very frustrating) into our test suite.

When a single spec is run, it passes consistently. When run with other feature specs, locally or on CI, it fails (consistently, in one place, when the entire suite is run).

As far as I can tell, this behavior is not present on Capybara 2.6.2.

This seems like a pretty serious regression to me. We believe that whatever mechanism was introduced to solve timing delays is actually making things worse.

The actual failure looks like premature truncation (we're using Database Cleaner). Specifically, we think that not all requests are finished before Database Cleaner starts cleaning, because we see "Record Not Found" errors.

As this is a non-deterministic failure that requires re-running my entire test suite to see the problem, it is extraordinarily difficult to do trial and error here.


jasonfb commented Jun 28, 2016

Our actual failure; it only fails when I run the whole suite or run the suite on the CI server:


  1) cart to registration page as someone not logged in with jerry-rigged authentication for the legacy ATC experience should let me add something to my cart
     Failure/Error: @line_item = @order.contents.add(Spree::Variant.find(params[:line_item][:variant_id]), params[:line_item][:quantity])

     ActiveRecord::RecordNotFound:
       Couldn't find Spree::Variant with 'id'=2 [WHERE `spree_variants`.`deleted_at` IS NULL]
     # ./app/controllers/api/v1/checkout/line_items_controller.rb:4:in `create'
     # ------------------
     # --- Caused by: ---
     # Capybara::ExpectationNotMet:
     #   expected to find text "1x Blue Night 18 Hour Jersey has just been added to your cart" in "LOG IN REGISTER (0) $ 50 · FREE SHIPPING $ 100 · 10% OFF $ 150 · 15% OFF $ 200 · 20% OFF 18 HOUR JERSEY - A great boxer brief is all about fit, fabric, and comfort. This one takes the cake. Mesh zones keep you cool, a no-roll waistband anchors things up top, and our stay-put leg design prevents riding up below. Imported. 47.5% Long Staple Cotton / 47.5% Modal / 5% Lycra YOUR CART 18 Hour Jersey Blue Night S QTY $48.00 We don't believe in sales; we believe in rewards. Every additional item in your cart means you're getting a better deal. Item Subtotal $48.00 Subtotal $48.00 CHECKOUT Apply Promo Code Enter your email address to continue to checkout NEXT « Back Welcome back! Enter your password and you'll be that much closer to your new Mack Weldon gear. SIGN IN « Back Welcome! REGISTER « Back 18 HOUR JERSEY $ 24 Buy more save more. Blue Night / / VIEW SIZE CHART QUANTITY 1 ADD TO CART BACON IPSUM DOLOR AMET SHORT MINIM BEEF RIBS. FREE RETURNS, ALWAYS Mack Weldon is reinventing men’s basics. We believe in smart design, premium fabrics, and simple shopping. JOIN OUR MAILING LIST FAQ ABOUT CONTACT SHIPPING & RETURNS LEGAL SAVINGS REVIEWS JOBS © 2016 MACK WELDON ALL RIGHTS RESERVED"
     #   ./spec/features/shopping/add_to_cart_legacy_spec.rb:108:in `block (5 levels) in <top (required)>'

jasonfb commented Jun 28, 2016

Here's the spec; as you will see, there is already a sleep 5 at the end of my spec, which appears to do nothing and does not help whatsoever with the timing problems.
https://gist.github.com/jasonfb/986b99a459308ca2ade4212900d740f6


@jasonfb jasonfb changed the title from Capybara 2.7.0, 2.7.1 introduced consistent, reproducible full-run failures to Capybara 2.7.0, 2.7.1 introduced inconsistent but reproducible full-run failures (specs pass when run individually) Jun 28, 2016

jasonfb commented Jun 28, 2016

I am going back and forth between PhantomJS 2.1.1 and PhantomJS 2.0.0. So far, 2.0.0 looks more stable and consistent. (With 2.1.1 I was seeing the driver lock up on my CI server: it would freeze halfway through, and all subsequent tests would fail. 2.0.0 does not show this behavior.)


jasonfb commented Jun 28, 2016

I am leaving all this documentation here because it seems to us that in later versions the problems are getting worse, not better, so this is a warning to others.


adsteel commented Jun 28, 2016

@jasonfb This is probably a better question for Stack Overflow. That being said, are you checking the database for a record after creating that record through the UI?

Collaborator

twalpole commented Jun 28, 2016

@jasonfb There is no way for premature truncation to occur while you're still in a test (unless you've really screwed up your database_cleaner config somehow, or you're running parallel tests against the same database). What could happen prior to 2.7.0 is that the test could finish, and the database get cleaned, while there were still unfinished requests (whose visible results had not been waited for) being executed. Since you are still in the test (waiting for "1x Blue Night 18 Hour Jersey has just been added to your cart" to appear), waiting at the end isn't going to do anything.

If it was leftover requests from the previous test, then sleeping 5 at the beginning of the test that always fails (rather than at the end) might make a difference.

This really looks more like an issue with the mocking/doubling being done in a feature test (which is usually frowned upon)
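To illustrate the placement being suggested: a sleep at the end of a spec never runs when the expectation above it raises first, so the diagnostic pause has to come before the test's first interaction. This is a hypothetical rearrangement of the failing spec, for diagnosis only:

```ruby
# Hypothetical diagnostic version of the failing spec: the sleep moves to
# the top, before any page interaction, so that any requests still in
# flight from the previous test can finish before this test's data is
# touched. Not a fix, just a way to confirm the leftover-request theory.
it "should let me add something to my cart" do
  sleep 5 # wait out stragglers from the previous test (diagnostic only)
  # ... visit the page, add the item, assert on the cart message ...
end
```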


Collaborator

twalpole commented Jun 28, 2016

@jasonfb Also, I'm a little shocked that after running a bunch of tests your test would still be referencing a Spree::Variant with 'id'=2; a few tests in, I would expect to see ids higher than 2. Or is that seed data created with fixed ids or something?


Collaborator

twalpole commented Jun 28, 2016

@jasonfb Also, are there tables you've excluded from being cleaned by Database Cleaner? Are you sure variant: boxer_briefs_18_hour.variants.first is actually returning a variant with a null deleted_at?


jasonfb commented Jun 29, 2016

@twalpole Thanks for all that attention; much appreciated. I think the auto-increment isn't reset on the test DBs, which explains the id numbers you pointed out. I will go through your other points one by one as well. Thanks.

It certainly feels like something is funky with the DB cleaner setup. I will also try to redo those mocks and just do proper log-ins.
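For reference, a common Database Cleaner arrangement for Capybara suites looks something like the following. This is a generic sketch under RSpec, not this project's actual configuration; the point is that JS-driven specs need truncation (the app server runs in another thread and can't see uncommitted transactions), and cleaning must run after each example finishes:

```ruby
# spec/support/database_cleaner.rb
# Generic sketch of a typical Database Cleaner setup (not this project's
# actual config): transactions for plain specs, truncation for JS-driven
# feature specs.
RSpec.configure do |config|
  config.before(:suite) do
    DatabaseCleaner.clean_with(:truncation) # start from a known-empty DB
  end

  config.before(:each) do |example|
    DatabaseCleaner.strategy =
      example.metadata[:js] ? :truncation : :transaction
    DatabaseCleaner.start
  end

  # append_after runs after other after hooks, so Capybara has reset the
  # session before the data it was using is cleaned away.
  config.append_after(:each) do
    DatabaseCleaner.clean
  end
end
```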


Collaborator

twalpole commented Jun 29, 2016

@jasonfb Just to clarify: if a bunch of previous tests had occurred, I would expect a number higher than 2 (it's the fact that the number is so low that makes me question what's going on). Not resetting the auto-increment helps mitigate caching issues, since each test ends up accessing resources with unique ids.

@jasonfb jasonfb changed the title from Capybara 2.7.0, 2.7.1 introduced inconsistent but reproducible full-run failures (specs pass when run individually) to ~~Capybara 2.7.0, 2.7.1 introduced inconsistent but reproducible full-run failures (specs pass when run individually)~~ Jun 30, 2016

@jasonfb jasonfb changed the title from ~~Capybara 2.7.0, 2.7.1 introduced inconsistent but reproducible full-run failures (specs pass when run individually)~~ to inconsistent but reproducible full-run failures (specs pass when run individually) Jun 30, 2016


jasonfb commented Jun 30, 2016

We solved this. It was really hard. @twalpole You were right: the problem was in our setup.

Specifically, and importantly (documenting here): capybara-webkit will hold onto a cache of endpoints that are set to cache (which makes sense, sort of). That means that even if you Rails.cache.clear on the server (which we tried), the headless driver has already cached our endpoint from a previous spec.

On the second (subsequent) spec, the same endpoint is not fetched again by the headless driver, because it believes it already has it in its (the headless driver's) web cache. Unfortunately for us, that important endpoint (which is just a giant JSON dump) is actually specific to our data, which was changing from spec to spec.

So that's why, on subsequent runs, my front end was sending variant_ids to my backend that weren't in the database: those old variant IDs had already been deleted by Database Cleaner.

So basically, if you have endpoints with non-zero cache-control headers on them, you have to remember that the headless driver (webkit or poltergeist) will cache those responses inside itself, even if you clear your Rails cache. This is why the specs pass individually but fail when run as a group.

For now, we're going to wrap the cache-control header in if Rails.env.test?, so that in test it responds with 0 minutes (normally it responds with a 3-minute cache). However, is there any way to tell the driver (webkit or poltergeist; we switch between them) not to hang onto its cache of endpoints from spec to spec? That way, I could avoid the ugly if Rails.env.test? in my app.
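The workaround described above might look like this in a Rails controller. The controller and action names here are made up for illustration; only the Rails.env.test? guard around the cache header is the actual technique:

```ruby
# Hypothetical controller sketch of the workaround: zero out the
# Cache-Control max-age in the test environment so the headless browser
# never caches this endpoint's response between specs. Controller and
# action names are invented; the real endpoint is elsewhere in the app.
class Api::V1::CatalogController < ApplicationController
  def variants
    ttl = Rails.env.test? ? 0 : 3.minutes
    expires_in ttl, public: true # sets the Cache-Control: max-age header
    render json: Spree::Variant.all
  end
end
```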

@jasonfb jasonfb closed this Jun 30, 2016


Collaborator

twalpole commented Jun 30, 2016

@jasonfb The poltergeist 1.10.0 I released the other day added driver.clear_memory_cache, which will clear out the cache. You could try calling that in a before block. I'm not sure whether capybara-webkit has a similar option.
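Wiring that into RSpec might look like the sketch below. It assumes Poltergeist >= 1.10.0 is the active Capybara driver; the respond_to? guard is there because other drivers (capybara-webkit included) may not expose an equivalent method:

```ruby
# Sketch: clear Poltergeist's in-memory browser cache before each JS spec,
# so responses cached under a previous spec's data cannot leak forward.
# Assumes poltergeist >= 1.10.0 as the current Capybara driver.
RSpec.configure do |config|
  config.before(:each, js: true) do
    if page.driver.respond_to?(:clear_memory_cache)
      page.driver.clear_memory_cache
    end
  end
end
```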


jasonfb commented Jun 30, 2016

Nice, thanks.


jasonfb commented Jun 30, 2016

@reidcooper @jasonfb 🚞
