Segmentation fault when adding Curl::Easy instances to a Curl::Multi #20

Open
ghost opened this Issue Dec 22, 2009 · 13 comments

ghost commented Dec 22, 2009

I'm getting a segmentation fault when I add Curl::Easy instances to a running Curl::Multi instance. The Curl::Easy instances are added from on_success / on_failure handlers. The problem is sporadic but eventually shows up after a couple of minutes when running in a continuous loop.

I'm currently running Ruby 1.9.1 and Curb 0.6.0.0. I have tested with different versions of Curb but always get the same result.

Any idea?

This is the console dump:

[BUG] Segmentation fault
ruby 1.9.1p376 (2009-12-07 revision 26041) [x86_64-linux]

-- control frame ----------
c:0013 p:---- s:0044 b:0044 l:000043 d:000043 CFUNC :add
c:0012 p:0105 s:0040 b:0040 l:001628 d:001628 METHOD feedzilla.rb:86
c:0011 p:0035 s:0033 b:0033 l:002408 d:000032 BLOCK feedzilla.rb:69
c:0010 p:---- s:0030 b:0030 l:000029 d:000029 FINISH
c:0009 p:---- s:0028 b:0028 l:000027 d:000027 CFUNC :call
c:0008 p:---- s:0026 b:0026 l:000025 d:000025 CFUNC :perform
c:0007 p:0117 s:0023 b:0023 l:000022 d:000022 METHOD feedzilla.rb:53
c:0006 p:0024 s:0016 b:0016 l:0023f8 d:000015 BLOCK feedzilla.rb:96
c:0005 p:---- s:0013 b:0013 l:000012 d:000012 FINISH
c:0004 p:---- s:0011 b:0011 l:000010 d:000010 CFUNC :each
c:0003 p:0084 s:0008 b:0008 l:0023f8 d:001418 EVAL feedzilla.rb:95
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH

c:0001 p:0000 s:0002 b:0002 l:0023f8 d:0023f8 TOP

-- Ruby level backtrace information-----------------------------------------

feedzilla.rb:86:in add
feedzilla.rb:86:in add_url_to_multi
feedzilla.rb:69:in block in add_url_to_multi
feedzilla.rb:53:in call
feedzilla.rb:53:in perform
feedzilla.rb:53:in fetch
feedzilla.rb:96:in block in


feedzilla.rb:95:in each
feedzilla.rb:95:in

-- C level backtrace information -------------------------------------------
0x4eb69b ruby(rb_vm_bugreport+0x3b) [0x4eb69b]
0x51a160 ruby [0x51a160]
0x51a2d1 ruby(rb_bug+0xb1) [0x51a2d1]
0x495cef ruby [0x495cef]
0x2aaaaacda0f0 /lib/libpthread.so.0 [0x2aaaaacda0f0]
0x49bb1e ruby(st_lookup+0xe) [0x49bb1e]
0x4d7f4f ruby [0x4d7f4f]
0x4d7fb3 ruby(rb_get_method_body+0x23) [0x4d7fb3]
0x4db653 ruby [0x4db653]
0x4dbed7 ruby(rb_funcall+0x147) [0x4dbed7]
0x4a0f43 ruby(rb_obj_as_string+0x83) [0x4a0f43]
0x2aaaac03952d /usr/local/ruby-1.9.1-p376/lib/ruby/gems/1.9.1/gems/curb-0.6.0.0/lib/curb_core.so(ruby_curl_easy_setup+0x80d) [0x2aaaac03952d]
0x2aaaac033e4f /usr/local/ruby-1.9.1-p376/lib/ruby/gems/1.9.1/gems/curb-0.6.0.0/lib/curb_core.so(ruby_curl_multi_add+0x7f) [0x2aaaac033e4f]
0x4d97a0 ruby [0x4d97a0]
0x4dea90 ruby [0x4dea90]
0x4dfaa4 ruby [0x4dfaa4]
0x4e4c5b ruby [0x4e4c5b]
0x4e740d ruby(rb_vm_invoke_proc+0x42d) [0x4e740d]
0x4db7d9 ruby [0x4db7d9]
0x4dbed7 ruby(rb_funcall+0x147) [0x4dbed7]
0x2aaaac033ab1 /usr/local/ruby-1.9.1-p376/lib/ruby/gems/1.9.1/gems/curb-0.6.0.0/lib/curb_core.so [0x2aaaac033ab1]
0x2aaaac033d1b /usr/local/ruby-1.9.1-p376/lib/ruby/gems/1.9.1/gems/curb-0.6.0.0/lib/curb_core.so(ruby_curl_multi_perform+0x21b) [0x2aaaac033d1b]
0x4d97a0 ruby [0x4d97a0]
0x4dea90 ruby [0x4dea90]
0x4dfaa4 ruby [0x4dfaa4]
0x4e4c5b ruby [0x4e4c5b]
0x4e5367 ruby [0x4e5367]
0x4e63cc ruby(rb_yield+0x6c) [0x4e63cc]
0x46b9f1 ruby [0x46b9f1]
0x4d97a0 ruby [0x4d97a0]
0x4dea90 ruby [0x4dea90]
0x4dfaa4 ruby [0x4dfaa4]
0x4e4c5b ruby [0x4e4c5b]
0x4e4e29 ruby(rb_iseq_eval_main+0xa9) [0x4e4e29]
0x4195dc ruby(ruby_exec_node+0xac) [0x4195dc]
0x41ad43 ruby(ruby_run_node+0x33) [0x41ad43]
0x41817d ruby(main+0x4d) [0x41817d]
0x2aaaab7d0466 /lib/libc.so.6(__libc_start_main+0xe6) [0x2aaaab7d0466]
0x418069 ruby [0x418069]

Owner

taf2 commented Jan 15, 2010

Do you have a sample use case?

ghost commented Jan 15, 2010

Not sure what you mean by use case? I have sent you the code and samples from valgrind.

Contributor

igrigorik commented Jan 29, 2010

Andre, are you by any chance setting the headers anywhere in your code? We are seeing the exact same problem, and looking at our backtrace, it segfaults when we are allocating a new curl instance and are trying to initialize some headers (specifically, it's the first time we try to access the header hash)... hence the st_lookup segfault.

taf2, any ideas or suggestions?

ghost commented Jan 29, 2010

Ilya, we are setting the usual headers: accept, user_agent etc. The first time I saw this problem it was not related to setting headers but occurred when one of the callback methods was called. We were using gem v0.6.0 at the time. I did some tests with valgrind with gem v0.6.4. With this version the segfault looked like what you described.
We run on EC2. For some reason we see the problem only on 64-bit machines. Our (crappy) fix at the moment is to run the crawlers on small instances.

Contributor

igrigorik commented Jan 29, 2010

Hmm, same setup. EC2, 64-bit (xlarge). CentOS 5.1.

Latest and greatest version of curl on the server, etc.

Owner

taf2 commented Jan 29, 2010

Ilya, I'm going to use your suggestion about the first access on my 64-bit box and see if I can reproduce...

Contributor

igrigorik commented Jan 30, 2010

Todd, I should also mention that I'm seeing this problem in the exact same setup as described above: within the success callback I'm creating a new curl easy instance and queuing it up into the multi loop. The idea is to keep the multi loop always occupied, always running several hundred connections.
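
For readers following the thread, the pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from either reporter: `add_url_to_multi` is borrowed from the backtrace only as a name, and the `queue` array, the `easy_class:` parameter, and the User-Agent value are assumptions made for the example. It uses only the curb calls `Curl::Easy.new`, `headers`, `on_success`, `on_failure`, and `Curl::Multi#add`.

```ruby
# Sketch only: keep a Curl::Multi busy by adding a fresh Curl::Easy from
# inside the completion callbacks, which is the pattern reported to segfault.
#
# multi      - a Curl::Multi (or any object responding to #add)
# queue      - hypothetical Array of URL strings still to be fetched
# easy_class - pass Curl::Easy after `require "curb"`; injectable so the
#              control flow can be exercised without the gem or a network
def add_url_to_multi(multi, queue, easy_class:)
  url = queue.shift
  return nil if url.nil? # nothing left to schedule

  easy = easy_class.new(url)
  easy.headers["User-Agent"] = "example-crawler" # assumed header value

  # Each finished transfer, successful or not, schedules the next URL.
  easy.on_success { |_easy| add_url_to_multi(multi, queue, easy_class: easy_class) }
  easy.on_failure { |_easy, _code| add_url_to_multi(multi, queue, easy_class: easy_class) }

  multi.add(easy)
  easy
end
```

With curb installed, this would presumably be driven by creating a `Curl::Multi`, calling `add_url_to_multi(multi, queue, easy_class: Curl::Easy)` once per desired concurrent connection, and then calling `multi.perform`. Since the crash is reported as sporadic and only on 64-bit hosts, a sketch like this is not expected to reproduce it reliably.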

Owner

taf2 commented Jan 30, 2010

Can you try this patch: http://gist.github.com/290650

Owner

taf2 commented Jan 30, 2010

Also, in HEAD I changed the order of easy_setup and multi_add. My thought is that bad easy handles may be getting into the multi handle: previously, if an easy handle was added and then raised an exception, the multi handle could be left in a bad state. Now, if the easy handle is going to raise, it will raise before it gets into the multi handle. Maybe give this a try after the other patch above and please let me know the effects. Isolating this issue down to something repeatable would be ideal.

Contributor

igrigorik commented Jan 31, 2010

Grabbed HEAD and applied your patch -- running now. The challenge is and will be in reproducing the actual bug. We use the curl-multi interface to drive our downloaders, and sometimes they go for hours before the process falls down. I haven't been able to reproduce this problem reliably so far.

Interestingly enough though, I have run extended tests against local endpoints (nginx server) and the same code did not fail there. It's almost like it has something to do with a specific site / URL.

Last but not least: we're actually using 0.4.6.0 in production at the moment. I recently upgraded several of our production boxes to the latest gem, but for some reason our throughput dropped by more than 2x immediately following the upgrade. We couldn't spot any obvious problems after several hours of investigation, and ended up reverting to the older version -- perhaps something to look into. The same segfault problem showed up in the 0.6.x release as well though.

ig

Contributor

igrigorik commented Jan 31, 2010

Aha, I think we're on the right track. Got a different stack trace this time. Looks like it's SEGV'ing when it tries to invoke the on_failure callback:

/pr/core/app/keystone/downloader.rb:136: [BUG] Segmentation fault
ruby 1.9.2dev (2009-07-18 trunk 24186) [x86_64-linux]

-- control frame ----------
c:0030 p:---- s:0094 b:0094 l:000093 d:000093 CFUNC :on_failure
c:0029 p:0011 s:0091 b:0091 l:000228 d:000090 BLOCK /pr/core/app/keystone/downloader.rb:136
c:0028 p:---- s:0088 b:0088 l:000087 d:000087 FINISH
c:0027 p:---- s:0086 b:0086 l:000085 d:000085 CFUNC :call
c:0026 p:---- s:0084 b:0084 l:000083 d:000083 CFUNC :new

Owner

taf2 commented May 9, 2010

I wonder if this is a bug in that version of Ruby 1.9.2?

Owner

taf2 commented Jun 22, 2010

I wonder if the recent refactoring for issue 24 has had any additional impact on this issue? Also, I was looking at a few of the bug fixes in the most recent versions of libcurl and was thinking they could also be suspect...
