Yaml [BUG] Segmentation fault #4

thibaudgg opened this Issue Jun 28, 2010 · 15 comments



Hi Myron,

Since we started using VCR (1.0, with WebMock), we sometimes get this scary error:

/Users/Thibaud/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/yaml.rb:133: [BUG] Segmentation fault

ruby 1.8.7 (2010-04-19 patchlevel 253) [i686-darwin10.4.0], MBARI 0x6770, Ruby Enterprise Edition 2010.02

We have a lot of external API calls inside our app (~40 yml files). Maybe you have already seen this error and have an idea how to fix it.



myronmarston commented Jun 28, 2010

I haven't seen that error before. I took a look at the code in yaml.rb at line 133, and this is it:

def YAML.load( io )
    yp = parser.load( io )

Nothing fancy. VCR calls this when a cassette is inserted. It looks like this is a bug in ruby's built-in YAML de-serializer. Do you consistently get this error? Have you been able to pin point a particular YAML file that triggers the error? Have you edited your YAML files at all?

I'm not sure if these links help, but I googled "YAML.load seg fault", and other people have gotten YAML seg faults, too:

You might consider trying either RbYAML or psych rather than Ruby's built in YAML library. Let me know if you need any changes to VCR to support using a different YAML library.

Yes, this is a very weird and problematic issue for us. We mostly get these errors (not always) when we run all our specs (bundle exec rspec spec).

Sadly, the error occurs randomly and we haven't been able to pinpoint a particular YAML file. Yes, we have edited some YAML files to add regexes for handling timestamps, something like this:

- !ruby/struct:VCR::HTTPInteraction 
  request: !ruby/struct:VCR::Request 
    method: :get
    uri: !ruby/regexp /..../

Do you think that could be a problem?

For the moment we need to stay on ree-1.8.7 (Heroku doesn't support 1.9.2 yet), so psych is not an option, and RbYAML seems very old (2006). Have you already tried it, or do you have any info on how to use it inside a Rails 3 app? I'm interested in trying it out.


myronmarston commented Jun 29, 2010

I don't think the regexp is a problem, as long as it is valid YAML. I haven't actually used the regexp feature in an app yet, it was just requested by a few users and editing the YAML files seemed to be the easiest, cleanest way to accomplish it. My specs and features that use the regexp thing pass fine, though.

I think we need to pinpoint which YAML files are causing the issue. I created a little test script that should help. You'll need to edit it a bit, but it should get you started. The idea is to isolate the error by just loading the YAML files without VCR, so we can confirm it's a problem in YAML itself, and not a problem in VCR. Also, I wrote it to make it easy to load your YAML files using RbYAML: just set the YAML environment variable before running your script.
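A minimal sketch of that kind of script (not the original one, which isn't reproduced in this thread): it loads every YAML file under a directory repeatedly, with no VCR involved, and logs the file name before each load so that the last line printed identifies the file being parsed if the interpreter crashes. The helper name `stress_load_yaml` and its parameters are assumptions for illustration. Note that on recent Rubies `YAML.load` rejects custom tags such as `!ruby/struct` by default, which is not the case on the 1.8.7 discussed here.

```ruby
require "yaml"

# Hypothetical helper: load every .yml file under `dir`, `loops` times,
# bypassing VCR entirely. Returns the total number of loads performed.
def stress_load_yaml(dir, loops = 1000, out = $stdout)
  files = Dir.glob(File.join(dir, "**", "*.yml")).sort

  loops.times do |i|
    files.each do |file|
      # Log (and flush) before loading, so a crash leaves the culprit
      # as the last line of output.
      out.puts "loop #{i + 1}: #{file}"
      out.flush
      YAML.load(File.read(file))
    end
  end

  files.size * loops
end
```

It could be run with something like `stress_load_yaml("spec/cassettes", 1000)`, where the directory is whatever your cassette library dir happens to be.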

BTW, did you read the link I posted about YAML seg faults when loading fixtures? It appears this can happen with a very large YAML file (or a YAML file that is dynamically made to be very large when it is evaluated as ERB first). Do you have any very large YAML files?


myronmarston commented Jun 29, 2010

Hmm, after googling some more for yaml.rb:133 "[BUG] Segmentation fault", it appears that many people have had this issue, and it has to do with memory issues--either by loading a huge yaml file or by having limited memory available to MRI.

Could either of those be the issue here? That would explain the fact that it's been an intermittent problem. But it doesn't get us any closer to solving it :(.

Yeah, regex is a very good feature for dealing with timestamped URLs, and it doesn't seem to be the problem.
With your test script (thanks!) I was able to pinpoint a specific YAML piece that always triggers the [BUG] after being loaded 206 times. If I add more yml files alongside the buggy file, it fails sooner.
I'm not sure the YAML file size matters: my biggest file is ~500 KB and it works fine with the test script.

Hope this helps :).


myronmarston commented Jun 30, 2010

Well, that confirms that the bug is definitely in YAML. I don't think I'll be able to make any changes to VCR that will fix it. But hopefully we can find a way to work around it.

The fact that you don't get the error until the file has been loaded 206 times suggests that there may be a memory leak (especially given the fact that this seg fault has been known to occur in limited memory situations). Can you watch the memory of the process as you run the test script to see if it keeps increasing?

Also, what happens if you remove the yaml file you pin pointed as the problem? If it is indeed a memory leak issue, I'd expect the problem to surface when a different file gets loaded enough times.

Have you tried the test script with RbYAML?

The process memory stays constant (it only increases at the beginning of the first loop), so I'm not sure it is a memory leak. If I remove the buggy YAML file, I can run 1000 loops without any problem. But I have found that almost 50% of my YAML files can cause that bug :(

With RbYAML it's even worse, but with different bugs :(

What do you think about the stale_fish approach? http://github.com/jsmestad/stale_fish/blob/master/lib/stale_fish/fixture.rb
A single YAML header file with only the important info for each recorded request, with the related responses stored in plain files, not in a buggy yml file :-) That way we can fully control the YAML content/structure and add some other info (the update_interval feature is quite interesting).


myronmarston commented Jul 2, 2010

The update_interval feature of stale_fish is indeed interesting, and looks useful. I've opened another issue to discuss that. Please take a look and leave comments about what kind of API you'd like for this feature.

As for the YAML problem...I'm not ready to ditch YAML yet. I still think it's the most human-readable, easily-editable serialization format out there. I've never had any problems with it, and before this ticket, I had never heard of the segmentation fault bug. Ruby 1.9.2's psych parser should solve that issue as well. And of course YAML is used a ton in Rails, both for things like database.yml and fixtures. If I changed to another format, I'd force all VCR users to change as well, and I'm not willing to do that. (I already forced a migration of the cassette format once, with the 0.4 release...I don't want to do it again unless I know the YAML problem is affecting a lot of users).

That said, I'm more than happy to make it easy to change the serialization format. I just need to provide extension hooks where users can easily change it. Here's my first-pass idea for what an API could look like:

VCR.config do |c|
  c.serialize_interactions do |interactions|
    # serialize the array of VCR::HTTPInteraction to whatever format you like...
    # XML, JSON or some custom format.  Just return a single string from this block.
  end

  c.deserialize_interactions do |interactions_string|
    # do whatever you need to do to deserialize the string.
    # Return an array of VCR::HTTPInteraction from this block.
  end
end

One issue with this is that the cassette files get stored with a .yml extension. It'd be nice to be able to change this to match whatever serialization format you use, but I don't particularly like the idea of adding an additional config setting just for this.
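To illustrate what the bodies of those two blocks might look like, here is a minimal JSON round-trip sketch. Everything in it is an assumption for illustration: `Interaction` is a simplified, hypothetical stand-in for VCR::HTTPInteraction, and the two helper methods are not part of VCR. The hooks themselves are only the proposal above, not an existing API.

```ruby
require "json"

# Hypothetical, simplified stand-in for VCR::HTTPInteraction.
Interaction = Struct.new(:http_method, :uri, :body)

# Turn an array of interactions into a single JSON string.
def serialize_interactions(interactions)
  rows = interactions.map do |i|
    { "http_method" => i.http_method.to_s, "uri" => i.uri, "body" => i.body }
  end
  JSON.generate(rows)
end

# Rebuild the array of interactions from the JSON string.
def deserialize_interactions(interactions_string)
  JSON.parse(interactions_string).map do |row|
    Interaction.new(row["http_method"].to_sym, row["uri"], row["body"])
  end
end
```

The only requirement the proposal places on the pair is that deserializing a serialized array gives back equivalent interactions; the format in between is up to the user.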

What do you think of this? Will this solve your problem sufficiently? Do you have a suggestion for a better way?

You're totally right: YAML is definitely the most human-readable, easily-editable serialization format out there, no question about that. I think it's a really great format for the HTTP Request structure of VCR, but I'm not sure about the HTTP Response. Directly storing the XML or JSON (or anything else) that was returned seems like a good approach too (and is also very easy to edit). It's always the HTTP Response YAML structure that causes problems. Now it's very rare that all our specs pass without a bug :(. We have listed three kinds of YAML [BUG]:

/Users/Thibaud/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/yaml.rb:133: [BUG] Segmentation fault

/Users/Thibaud/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/yaml/rubytypes.rb:41: [BUG] Segmentation fault

/Users/Thibaud/.rvm/rubies/ree-1.8.7-2010.02/lib/ruby/1.8/yaml/rubytypes.rb:92: [BUG] Segmentation fault

And the HTTP Response YAML structure that causes the problem (simplified) always contains this kind of thing, http://gist.github.com/458355:

- !ruby/struct:VCR::HTTPInteraction 
  response: !ruby/struct:VCR::Response 
    body: &id002 !str 
      str: ""
      "@net_http_res": !ruby/object:Net::HTTPOK 
        body: *id002

We never have this "&id002 !str/*id002" in our HTTP Request structure. Do you know where it comes from? Maybe it's a WebMock-related problem?

So storing the cassette in another format seems like a hack to me (and it would be less user-friendly). I see two solutions to our problem:

  1. Find what causes this "&id002 !str/*id002" and prevent it from happening
  2. Reference the HTTP response file (xml, json…) from inside the cassette, like stale_fish

myronmarston commented Jul 5, 2010

OK, this is really useful info. I should have asked you for an example yaml file sooner. Those "&id"/"*id" things shouldn't be there. It's valid yaml (it's how yaml deals with object references) but it shouldn't be needed and apparently it's causing the problem for you.

I did some research and it looks like rest-client may be the source of the problem. Are you using it? It extends the response body string with a module that adds additional attributes and instance variables to the string, and ruby's YAML serialization serializes all of it.
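A minimal sketch of the mechanism, under stated assumptions: `ResponseExtras` and `raw_string` are hypothetical names standing in for whatever module the client library mixes in and for a possible fix. This is not rest-client's code, nor the actual VCR commit. The point is that copying the characters into a fresh String leaves the extension module and its instance variables behind, so only the text remains to be serialized.

```ruby
# Hypothetical stand-in for a module that a client library mixes into
# the response body string.
module ResponseExtras
  attr_accessor :net_http_res
end

# Return a plain String holding the same characters as `body`.
# String.new copies the content only: no singleton modules and no
# instance variables come along.
def raw_string(body)
  String.new(body)
end

body = String.new("<response/>")
body.extend(ResponseExtras)
body.net_http_res = Object.new  # extra state attached to the string

clean = raw_string(body)
```

After this, `clean` compares equal to `body` as text, but it no longer carries the extra state that YAML serialization would otherwise pick up.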

I think I have a fix, but before spending a bunch of time on writing the specs for it, I was hoping to chat with you a bit over IRC (or some other chat client) to make sure my fix really solves your problem. Are you available to chat?

Here's the commit that I think will solve the problem.

Really good news! Yes, we are using a gem that depends on rest-client :-)
I have tried your commit, and when I re-record the cassette those "&id"/"*id" things are gone, great! Now we need to re-record a lot of cassettes (with regexes), so I'll let you know tomorrow whether it's definitely OK.

Sure, we can chat (AIM: thibaudgg, Jabber: thibaud@thibaud.me). I live in Switzerland (it's 21:39 now, http://everytimezone.com, UTC+2), and I'll be available tomorrow morning. Let me know when you're free.

Thanks a lot for this commit, I'm full of hope!

OK, very good news: after cleaning all our cassettes, this bug seems to be definitely gone! I'm a little curious: how did you find out that rest-client was the problem?

You rock, congrats! I'm a very happy VCR user :-)


myronmarston commented Jul 6, 2010

That's good to hear :).

The YAML snippet you pasted gave me the clue. I googled net_http_res and rest-client was the first thing to come up.

I'll try to get an official release out in the next day or so that includes this fix.

BTW, can you recommend me on Working With Rails?


myronmarston commented Jul 6, 2010

Ensure the response body is serialized as a raw string, without any extensions on the string instance.

This is needed for rest-client. Closed by db348c3.

Recommended! Thanks again.

This issue was closed.
