Pool not working after a long tight loop #3

Closed
ray-delossantos opened this issue Jan 13, 2016 · 9 comments

@ray-delossantos

I have a REST API using the pool. We're doing some stress testing on the lib and found this behavior.

This tight loop always stops working after 65536 iterations.

require "pg"
require "pool/connection"

# DDL
# DROP TABLE public.test;
#
# CREATE TABLE public.test(
#   id serial,
#   data jsonb NOT NULL
#
# );

$pg = ConnectionPool.new(capacity: 30) do
  PG.connect(ENV["DATABASE_URL"])
end

class Test
  def test_loop
    $pg.connection do |pg|
      pg.exec("DELETE FROM public.test")

      (1..1000).each do |i|
        puts "i: #{i}"
        result = pg.exec({Int32}, "INSERT INTO public.test (data) VALUES ('{}') RETURNING id")
        if rows = result.rows
          rows.each do |row|
            puts "i: #{i} id: #{row[0]}"
            # id = row[0]
          end
        end
      end
    end

    (1..1000000).each do |i|
      $pg.connection do |pg|
        puts "i: #{i}, pg.size: #{$pg.size}, pg.pending: #{$pg.pending}"
        result = pg.exec({JSON::Any}, "SELECT data FROM public.test WHERE id = $1", [100])
        if rows = result.rows
          rows.each do |row|
            if data = row[0]
              puts "data: #{data}"
            end
          end
        end
      end
    end
  end
end

t = Test.new

t.test_loop


@ray-delossantos
Copy link
Author

I suppose it's related to @w << '.' not being able to add more bytes. I supposed that in the checkout part @w should be decreased.
I don't really understand the library in detail, so I'll appreciate any help that could be provided.

@ysbaddaden
Owner

I'm just pushing a char from a coroutine to a socket, which is read by another coroutine (I use the socket to notify and to have timeouts). It should just keep writing and reading, no matter how many times it happens. Maybe there is a BufferedIO which gets filled. That would explain why it always fails at the 65536th iteration.
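
To make that concrete, here is a rough standalone sketch of the signaling pattern described above (the Notifier name and its shape are assumed for illustration, not the shard's actual code): a checkin writes one byte to a pipe, and a fiber waiting for an object reads one byte to wake up. The real pool also layers a timeout on the read, which this sketch leaves out.

class Notifier
  @r : IO::FileDescriptor
  @w : IO::FileDescriptor

  def initialize
    @r, @w = IO.pipe
  end

  # Called on checkin: push one byte into the pipe to signal availability.
  def signal
    @w << '.'
  end

  # Called by a fiber waiting in checkout: block until a byte arrives.
  def wait
    buffer = Slice(UInt8).new(1)
    @r.read(buffer)
  end
end

n = Notifier.new
spawn { n.wait; puts "woken up" }
n.signal
sleep 0.1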

@ray-delossantos
Author

@asterite, does Crystal have any limitation on IO.pipe's read(slice : Slice(UInt8)) method? I'd appreciate your help or guidance on what to do about this problem.

@ysbaddaden
Owner

I wrote some tests. The first one shows there is no limit to IO.pipe: I can write as many bytes as I want.

r, w = IO.pipe

spawn do
  i = 0
  buffer = Slice(UInt8).new(1)
  loop do
    r.read(buffer)
    i += 1
    puts "received #{i}" if i % 65536 == 0
  end
end

spawn do
  loop { w << "." }
end

sleep

The second one outlines the problem you describe: the pool stops around 65535 iterations. What's interesting is that it doesn't raise any Timeout::Error, so there must be a blocking call somewhere.

require "./src/pool"
class Obj; end
pool = Pool.new(capacity: 5, timeout: 1.0) { Obj.new }

spawn do
  loop do |i|
    puts "iteration #{i * 5}"
    obj1 = pool.checkout
    obj2 = pool.checkout
    obj3 = pool.checkout
    obj4 = pool.checkout
    obj5 = pool.checkout
    pool.checkin(obj1)
    pool.checkin(obj2)
    pool.checkin(obj3)
    pool.checkin(obj4)
    pool.checkin(obj5)
  end
end

sleep

@ysbaddaden
Owner

The problem is indeed @w << '.', which calls IO::FileDescriptor#unbuffered_write(slice), where LibC.write fails with Errno::EAGAIN, so Crystal waits for the pipe to become writable... which never happens (thus blocking indefinitely).

The problem seems to happen at the libc level, where a buffer seems to fill up. What I don't understand is why it doesn't happen in my IO.pipe tests: there I can write as many bytes as needed and never block 😞

@ysbaddaden
Owner

From the pipe(7) man page:

A pipe has a limited capacity. If the pipe is full, then a write(2) will block or fail, depending on whether the O_NONBLOCK flag is set (see below). Different implementations have different limits for the pipe capacity. (...) Since Linux 2.6.11, the pipe capacity is 65536 bytes.

I think it's starting to make sense. I'm always writing a byte (on start_one and on checkin), but I'm never reading a byte when the pool has pending objects!
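
As a sanity check (a standalone sketch, not the pool's code): write to a pipe that nothing ever reads from. Assuming a Linux pipe buffer of 65536 bytes, the writing fiber stalls right around the 65536th byte, just like the pool does.

r, w = IO.pipe

spawn do
  i = 0
  loop do
    w << '.'                          # one byte per iteration, never read back from `r`
    i += 1
    puts "wrote #{i}" if i % 8192 == 0
  end
end

sleep 2                               # output stops near 65536: the pipe buffer is full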

@ray-delossantos
Author

Hello @ysbaddaden, it's nice to know you're close to identifying the root cause. I'm following this issue by the minute; I don't want to have to restart the server every hour anymore :).

@ysbaddaden
Owner

I just pushed a branch which disables the lazy instance-start feature and always reads from the pipe first. That fixes the problem.

Now, getting lazy instances to work with the pipe fix is quite difficult 😭
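
In other words, the pipe acts as a counting semaphore, so writes and reads have to stay balanced. A rough sketch of the "read first" idea, with assumed names (TinyPool is not the shard's actual class, and timeouts and lazy starts are left out): every available object is backed by exactly one byte in the pipe, checkout reads a byte before taking an object, and checkin pushes the byte back.

class TinyPool(T)
  @r : IO::FileDescriptor
  @w : IO::FileDescriptor

  def initialize(capacity : Int32, &block : -> T)
    @r, @w = IO.pipe
    @objects = Array(T).new(capacity) { block.call }  # eager start, no lazy instances
    @objects.size.times { @w << '.' }                 # one byte per available object
  end

  def checkout : T
    buffer = Slice(UInt8).new(1)
    @r.read(buffer)          # read first: blocks until some object is available
    @objects.pop
  end

  def checkin(obj : T) : Nil
    @objects.push(obj)
    @w << '.'                # write exactly one byte back, keeping the balance
  end
end

pool = TinyPool(String).new(2) { "connection" }
conn = pool.checkout
pool.checkin(conn)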

@ysbaddaden
Owner

Fixed in 4630a55
