Pool not working after a long tight loop #3

Closed
ray-delossantos opened this issue Jan 13, 2016 · 9 comments

@ray-delossantos

I have a REST API using the pool. We're doing some stress testing on the lib and found this behavior.

This tight loop always stops working after 65536 iterations.

require "pg"
require "pool/connection"

# DDL
# DROP TABLE public.test;
#
# CREATE TABLE public.test(
#   id serial,
#   data jsonb NOT NULL
#
# );

$pg = ConnectionPool.new(capacity: 30) do
  PG.connect(ENV["DATABASE_URL"])
end

class Test
  def test_loop
    $pg.connection do |pg|
      pg.exec("DELETE FROM public.test")

      (1..1000).each do |i|
        puts "i: #{i}"
        result = pg.exec({Int32}, "INSERT INTO public.test (data) VALUES ('{}') RETURNING id")
        if rows = result.rows
          rows.each do |row|
            puts "i: #{i} id: #{row[0]}"
            # id = row[0]
          end
        end
      end
    end

    (1..1000000).each do |i|
      $pg.connection do |pg|
        puts "i: #{i}, pg.size: #{$pg.size}, pg.pending: #{$pg.pending}"
        result = pg.exec({JSON::Any}, "SELECT data FROM public.test WHERE id = $1", [100])
        if rows = result.rows
          rows.each do |row|
            if data = row[0]
              puts "data: #{data}"
            end
          end
        end
      end
    end
  end
end

t = Test.new

t.test_loop


@ray-delossantos
Copy link
Author

I suppose it's related to @w << '.' not being able to add more bytes. I supposed that in the checkout part @w should be decreased.
I don't really understand the library in detail, so I'll appreciate any help that could be provided.

@ysbaddaden
Owner

I'm just pushing a char from a coroutine to a socket, which is read by another coroutine (I use the socket to notify and to have timeouts). It should just keep writing and reading, no matter how many times it happens. Maybe there is a BufferedIO which gets filled. That would explain why it always fails at the 65536th iteration.
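
To make that concrete, here is a rough standalone sketch of the signaling pattern described above (the Notifier name and its shape are assumed for illustration, not the shard's actual code): a checkin writes one byte to a pipe, and a fiber waiting for an object reads one byte to wake up. The real pool also layers a timeout on the read, which this sketch leaves out.

class Notifier
  @r : IO::FileDescriptor
  @w : IO::FileDescriptor

  def initialize
    @r, @w = IO.pipe
  end

  # Called on checkin: push one byte into the pipe to signal availability.
  def signal
    @w << '.'
  end

  # Called by a fiber waiting in checkout: block until a byte arrives.
  def wait
    buffer = Slice(UInt8).new(1)
    @r.read(buffer)
  end
end

n = Notifier.new
spawn { n.wait; puts "woken up" }
n.signal
sleep 0.1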

@ray-delossantos
Author

@asterite, does Crystal have any limitation on IO.pipe's read(slice : Slice(UInt8)) method? I'd appreciate your help or guidance on what to do about this problem.

@ysbaddaden
Owner

I wrote some tests. The first one shows there is no limit to IO.pipe: I can write as many bytes as I want.

r, w = IO.pipe

spawn do
  i = 0
  buffer = Slice(UInt8).new(1)
  loop do
    r.read(buffer)
    i += 1
    puts "received #{i}" if i % 65536 == 0
  end
end

spawn do
  loop { w << "." }
end

sleep

The second one outlines the problem you describe: the pool stops around 65535 iterations. What's interesting is that it doesn't raise any Timeout::Error, so there must be a blocking call somewhere.

require "./src/pool"
class Obj; end
pool = Pool.new(capacity: 5, timeout: 1.0) { Obj.new }

spawn do
  loop do |i|
    puts "iteration #{i * 5}"
    obj1 = pool.checkout
    obj2 = pool.checkout
    obj3 = pool.checkout
    obj4 = pool.checkout
    obj5 = pool.checkout
    pool.checkin(obj1)
    pool.checkin(obj2)
    pool.checkin(obj3)
    pool.checkin(obj4)
    pool.checkin(obj5)
  end
end

sleep

@ysbaddaden
Owner

The problem is indeed @w << '.', which calls IO::FileDescriptor#unbuffered_write(slice), where LibC.write fails with Errno::EAGAIN, so Crystal waits for the pipe to become writable... which never happens (thus blocking indefinitely).

The problem seems to happen at the libc level, where a buffer seems to fill up. What I don't understand is why it doesn't happen in my IO.pipe tests: there I can write as many bytes as needed and never block 😞

@ysbaddaden
Owner

From the pipe(7) man page:

A pipe has a limited capacity. If the pipe is full, then a write(2) will block or fail, depending on whether the O_NONBLOCK flag is set (see below). Different implementations have different limits for the pipe capacity. (...) Since Linux 2.6.11, the pipe capacity is 65536 bytes.

I think it's starting to make sense. I'm always writing a byte (on start_one and on checkin), but I'm never reading a byte when the pool has pending objects!
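
As a sanity check (a standalone sketch, not the pool's code): write to a pipe that nothing ever reads from. Assuming a Linux pipe buffer of 65536 bytes, the writing fiber stalls right around the 65536th byte, just like the pool does.

r, w = IO.pipe

spawn do
  i = 0
  loop do
    w << '.'                          # one byte per iteration, never read back from `r`
    i += 1
    puts "wrote #{i}" if i % 8192 == 0
  end
end

sleep 2                               # output stops near 65536: the pipe buffer is full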

@ray-delossantos
Author

Hello @ysbaddaden, it's nice to know you're close to identifying the root cause. I'm following this issue by the minute; I don't want to have to restart the server every hour anymore :).

@ysbaddaden
Owner

I just pushed a branch which disables the lazy instance-start feature and always reads from the pipe first. That fixes the problem.

Now, getting lazy instances to work with the pipe fix is quite difficult 😭
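
In other words, the pipe acts as a counting semaphore, so writes and reads have to stay balanced. A rough sketch of the "read first" idea, with assumed names (TinyPool is not the shard's actual class, and timeouts and lazy starts are left out): every available object is backed by exactly one byte in the pipe, checkout reads a byte before taking an object, and checkin pushes the byte back.

class TinyPool(T)
  @r : IO::FileDescriptor
  @w : IO::FileDescriptor

  def initialize(capacity : Int32, &block : -> T)
    @r, @w = IO.pipe
    @objects = Array(T).new(capacity) { block.call }  # eager start, no lazy instances
    @objects.size.times { @w << '.' }                 # one byte per available object
  end

  def checkout : T
    buffer = Slice(UInt8).new(1)
    @r.read(buffer)          # read first: blocks until some object is available
    @objects.pop
  end

  def checkin(obj : T) : Nil
    @objects.push(obj)
    @w << '.'                # write exactly one byte back, keeping the balance
  end
end

pool = TinyPool(String).new(2) { "connection" }
conn = pool.checkout
pool.checkin(conn)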

@ysbaddaden
Owner

Fixed in 4630a55
