
close all files when PostgreSQL is not running #13

Closed
jclulow opened this issue Sep 11, 2017 · 6 comments · Fixed by #27


jclulow commented Sep 11, 2017

When PostgreSQL is not running, the prefaulter appears to retain a large number of open files within the PostgreSQL data directory. So that we can cleanly unmount and remount the data file system in Manatee, the prefaulter needs to close these descriptors promptly, and avoid opening any new ones, until it is determined that PostgreSQL is running again. It would also be good to make sure the current working directory of the prefaulter process never resides within the PostgreSQL data directory.

@jjelinek

To expand on Josh's comment: these issues are a class of bug we encounter repeatedly. Any process that holds files open, or keeps its working directory, inside a zone's file system (e.g. an operator accidentally leaving their shell cwd within a zone subdirectory) will prevent the zone from being rebooted, leaving it in the "down" state. When this happens, ops opens a ticket and an engineer has to manually track down the offending process and kill it before the zone can be restarted.

@jclulow jclulow added the bug label Sep 12, 2017
@sean- sean- closed this as completed in d1813ad Sep 12, 2017
@sean- sean- reopened this Sep 21, 2017
@davepacheco

We appear to have seen this again a few times with the prefaulter running inside the Manatee zone. This was on Manatee image dd78c1de-9783-11e7-8420-572a658a6ad3.


sean- commented Sep 21, 2017

Huh. When I wrote the initial version of this I apparently noticed this was missing and then forgot about it, but I did file an issue requesting the functionality needed to get the eviction handlers to fire on a Purge event: bluele/gcache#37. I'll add this to our copy of the library and PR it upstream.


sean- commented Oct 1, 2017

$ uname -a
SunOS fault-test 5.11 joyent_20170303T182017Z i86pc i386 i86pc

Observe pg_prefaulter's open files when the database is up and there is no lag:

$ pfiles `pgrep pg_prefaulter`
34983:  ./pg_prefaulter run --config=pg_prefaulter-follower.toml --retry-db-in
  Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0620 dev:561,174 ino:3690371815 uid:0 gid:7 rdev:6,19
      O_RDWR|O_APPEND|O_NOCTTY|O_LARGEFILE
      /dev/pts/19
      offset:3165767
   1: S_IFCHR mode:0620 dev:561,174 ino:3690371815 uid:0 gid:7 rdev:6,19
      O_RDWR|O_APPEND|O_NOCTTY|O_LARGEFILE
      /dev/pts/19
      offset:3165767
   2: S_IFCHR mode:0620 dev:561,174 ino:3690371815 uid:0 gid:7 rdev:6,19
      O_RDWR|O_APPEND|O_NOCTTY|O_LARGEFILE
      /dev/pts/19
      offset:3165767
   3: S_IFCHR mode:0644 dev:561,174 ino:4286469645 uid:0 gid:3 rdev:49,1
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /dev/urandom
      offset:16
   4: S_IFPORT mode:0000 dev:572,0 uid:907 gid:907 size:0
   5: S_IFDOOR mode:0444 dev:571,0 ino:666 uid:0 gid:0 rdev:570,0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[28218]
   6: S_IFSOCK mode:0666 dev:569,0 ino:60124 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_BROADCAST,SO_SNDBUF(128000),SO_RCVBUF(1048576)
        sockname: AF_INET 127.0.0.1  port: 43223
   7: S_IFSOCK mode:0666 dev:569,0 ino:29156 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_BROADCAST,SO_SNDBUF(16384),SO_RCVBUF(5120)
        sockname: AF_UNIX
        peername: AF_UNIX /tmp/.s.PGSQL.5433
        peer: postgres[33045] zone: 8d5fc727-270e-c4bf-dc79-874b8f3d9d9d[361]

Generate some data:

$ make gendata
2>&1 PGSSLMODE=disable PGHOST=/tmp PGUSER=postgres PGPASSWORD="`cat \".pwfile\"`" PGOPTIONS="-c synchronous_commit=off" "/opt/local/bin/psql" test -c 'INSERT INTO garbage SELECT s, md5(random()::text) FROM generate_series(1,1000000) s' | tee -a test.log
INSERT 0 1000000

Look in the prefaulter logs:

{"time":"2017-09-30T16:23:17.627980073Z","level":"debug","hit":14085,"miss":26,"lookup":14111,"hit-rate":0.9981574658068174,"message":"filehandle-stats"}
{"time":"2017-09-30T16:23:17.969357212Z","level":"debug","hit":8,"miss":18,"lookup":26,"hit-rate":0.3076923076923077,"message":"walcache-stats"}
{"time":"2017-09-30T16:23:18.146309635Z","level":"debug","hit":665194,"miss":901793,"lookup":1566987,"hit-rate":0.4245051171451965,"message":"iocache-stats"}

Look at pfiles(1):

$ pfiles `pgrep pg_prefaulter`
34983:  ./pg_prefaulter run --config=pg_prefaulter-follower.toml --retry-db-in
  Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0620 dev:561,174 ino:3690371815 uid:0 gid:7 rdev:6,19
      O_RDWR|O_APPEND|O_NOCTTY|O_LARGEFILE
      /dev/pts/19
      offset:5374604
   1: S_IFCHR mode:0620 dev:561,174 ino:3690371815 uid:0 gid:7 rdev:6,19
      O_RDWR|O_APPEND|O_NOCTTY|O_LARGEFILE
      /dev/pts/19
      offset:5374604
   2: S_IFCHR mode:0620 dev:561,174 ino:3690371815 uid:0 gid:7 rdev:6,19
      O_RDWR|O_APPEND|O_NOCTTY|O_LARGEFILE
      /dev/pts/19
      offset:5374604
   3: S_IFCHR mode:0644 dev:561,174 ino:4286469645 uid:0 gid:3 rdev:49,1
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /dev/urandom
      offset:16
   4: S_IFPORT mode:0000 dev:572,0 uid:907 gid:907 size:0
   5: S_IFDOOR mode:0444 dev:571,0 ino:666 uid:0 gid:0 rdev:570,0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[28218]
   6: S_IFSOCK mode:0666 dev:569,0 ino:60124 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_BROADCAST,SO_SNDBUF(128000),SO_RCVBUF(1048576)
        sockname: AF_INET 127.0.0.1  port: 43223
   7: S_IFSOCK mode:0666 dev:569,0 ino:29156 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_BROADCAST,SO_SNDBUF(16384),SO_RCVBUF(5120)
        sockname: AF_UNIX
        peername: AF_UNIX /tmp/.s.PGSQL.5433
        peer: postgres[33045] zone: 8d5fc727-270e-c4bf-dc79-874b8f3d9d9d[361]
   8: S_IFREG mode:0600 dev:90,68350 ino:233634 uid:907 gid:907 size:136536064
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16393/16394
      offset:0
   9: S_IFREG mode:0600 dev:90,68350 ino:233389 uid:907 gid:907 size:73728
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16393/1247
      offset:0
  10: S_IFREG mode:0600 dev:90,68350 ino:233434 uid:907 gid:907 size:16384
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16393/2703
      offset:0
  11: S_IFREG mode:0600 dev:90,68350 ino:233565 uid:907 gid:907 size:450560
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16393/2608
      offset:0
  12: S_IFREG mode:0600 dev:90,68350 ino:233485 uid:907 gid:907 size:40960
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16393/2704
      offset:0
  13: S_IFREG mode:0600 dev:90,68350 ino:233483 uid:907 gid:907 size:352256
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16393/2674
      offset:0
  14: S_IFREG mode:0600 dev:90,68350 ino:233428 uid:907 gid:907 size:319488
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16393/2673
      offset:0
  15: S_IFREG mode:0600 dev:90,68350 ino:233380 uid:907 gid:907 size:40960
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16393/2663
      offset:0
  16: S_IFREG mode:0600 dev:90,68350 ino:233412 uid:907 gid:907 size:90112
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16393/1259
      offset:0
  17: S_IFREG mode:0600 dev:90,68350 ino:233526 uid:907 gid:907 size:32768
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16393/2662
      offset:0
  18: S_IFREG mode:0600 dev:90,68350 ino:233452 uid:907 gid:907 size:106496
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16393/2658
      offset:0
  19: S_IFREG mode:0600 dev:90,68350 ino:233463 uid:907 gid:907 size:368640
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16393/1249
      offset:0
  20: S_IFREG mode:0600 dev:90,68350 ino:233367 uid:907 gid:907 size:32768
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16393/3455
      offset:0
  21: S_IFREG mode:0600 dev:90,68350 ino:233609 uid:907 gid:907 size:73728
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16393/2659
      offset:0
  22: S_IFREG mode:0600 dev:90,68350 ino:233433 uid:907 gid:907 size:16384
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16393/2679
      offset:0
  23: S_IFREG mode:0600 dev:90,68350 ino:233615 uid:907 gid:907 size:16384
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16393/2678
      offset:0
  24: S_IFREG mode:0600 dev:90,68350 ino:233475 uid:907 gid:907 size:24576
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16393/2610
      offset:0
  25: S_IFREG mode:0600 dev:90,68350 ino:233636 uid:907 gid:907 size:8192
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16393/16399
      offset:0

Terminate postgres sitter-style:

$ pkill -9 postgres

Generate more data to create a backlog and, somewhere in the middle, restart postgres (or generate load and then kill/restart postgres; both orderings were tested):

$ make gendata
2>&1 PGSSLMODE=disable PGHOST=/tmp PGUSER=postgres PGPASSWORD="`cat \".pwfile\"`" PGOPTIONS="-c synchronous_commit=off" "/opt/local/bin/psql" test -c 'INSERT INTO garbage SELECT s, md5(random()::text) FROM generate_series(1,1000000) s' | tee -a test.log
INSERT 0 1000000
$ make gendata
2>&1 PGSSLMODE=disable PGHOST=/tmp PGUSER=postgres PGPASSWORD="`cat \".pwfile\"`" PGOPTIONS="-c synchronous_commit=off" "/opt/local/bin/psql" test -c 'INSERT INTO garbage SELECT s, md5(random()::text) FROM generate_series(1,1000000) s' | tee -a test.log
INSERT 0 1000000
$ make gendata
2>&1 PGSSLMODE=disable PGHOST=/tmp PGUSER=postgres PGPASSWORD="`cat \".pwfile\"`" PGOPTIONS="-c synchronous_commit=off" "/opt/local/bin/psql" test -c 'INSERT INTO garbage SELECT s, md5(random()::text) FROM generate_series(1,1000000) s' | tee -a test.log
INSERT 0 1000000

Verify the state of the prefaulter before restarting PostgreSQL:

$ pfiles `pgrep pg_prefaulter`
20796:  ./pg_prefaulter run --config=pg_prefaulter-follower.toml --retry-db-in
  Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0620 dev:561,174 ino:3690371815 uid:0 gid:7 rdev:6,19
      O_RDWR|O_APPEND|O_NOCTTY|O_LARGEFILE
      /dev/pts/19
      offset:90414
   1: S_IFCHR mode:0620 dev:561,174 ino:3690371815 uid:0 gid:7 rdev:6,19
      O_RDWR|O_APPEND|O_NOCTTY|O_LARGEFILE
      /dev/pts/19
      offset:90414
   2: S_IFCHR mode:0620 dev:561,174 ino:3690371815 uid:0 gid:7 rdev:6,19
      O_RDWR|O_APPEND|O_NOCTTY|O_LARGEFILE
      /dev/pts/19
      offset:90414
   3: S_IFCHR mode:0644 dev:561,174 ino:4286469645 uid:0 gid:3 rdev:49,1
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /dev/urandom
      offset:16
   4: S_IFPORT mode:0000 dev:572,0 uid:907 gid:907 size:0
   5: S_IFDOOR mode:0444 dev:571,0 ino:666 uid:0 gid:0 rdev:570,0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[28218]
   6: S_IFSOCK mode:0666 dev:569,0 ino:34274 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_BROADCAST,SO_SNDBUF(128000),SO_RCVBUF(1048576)
        sockname: AF_INET 127.0.0.1  port: 35332

Look at the logs to see what's there:

{"time":"2017-09-30T20:38:46.59842733Z","level":"debug","hit":0,"miss":0,"lookup":0,"hit-rate":0,"message":"filehandle-stats"}
{"time":"2017-09-30T20:38:47.002619298Z","level":"debug","hit":0,"miss":0,"lookup":0,"hit-rate":0,"message":"iocache-stats"}
{"time":"2017-09-30T20:38:47.002642715Z","level":"debug","hit":0,"miss":0,"lookup":0,"hit-rate":0,"message":"walcache-stats"}
{"time":"2017-09-30T20:39:46.598609172Z","level":"debug","hit":7260,"miss":6,"lookup":7266,"hit-rate":0.9991742361684558,"message":"filehandle-stats"}
{"time":"2017-09-30T20:39:47.00377543Z","level":"debug","hit":6,"miss":10,"lookup":16,"hit-rate":0.375,"message":"walcache-stats"}
{"time":"2017-09-30T20:39:47.003772888Z","level":"debug","hit":343361,"miss":527554,"lookup":870915,"hit-rate":0.39425317051606645,"message":"iocache-stats"}

Check pfiles(1) again:

$ pfiles `pgrep pg_prefaulter`
33395:  ./pg_prefaulter run --config=pg_prefaulter-follower.toml --retry-db-in
  Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0620 dev:561,174 ino:3690371815 uid:0 gid:7 rdev:6,19
      O_RDWR|O_APPEND|O_NOCTTY|O_LARGEFILE
      /dev/pts/19
      offset:6152242
   1: S_IFIFO mode:0000 dev:558,0 ino:480599548 uid:907 gid:907 rdev:0,0
      O_RDWR
   2: S_IFCHR mode:0620 dev:561,174 ino:3690371815 uid:0 gid:7 rdev:6,19
      O_RDWR|O_APPEND|O_NOCTTY|O_LARGEFILE
      /dev/pts/19
      offset:6152242
   3: S_IFCHR mode:0644 dev:561,174 ino:4286469645 uid:0 gid:3 rdev:49,1
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /dev/urandom
      offset:16
   4: S_IFPORT mode:0000 dev:572,0 uid:907 gid:907 size:0
   5: S_IFDOOR mode:0444 dev:571,0 ino:666 uid:0 gid:0 rdev:570,0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[28218]
   6: S_IFSOCK mode:0666 dev:569,0 ino:55086 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_BROADCAST,SO_SNDBUF(128000),SO_RCVBUF(1048576)
        sockname: AF_INET 127.0.0.1  port: 39104
   7: S_IFSOCK mode:0666 dev:569,0 ino:10736 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_BROADCAST,SO_SNDBUF(16384),SO_RCVBUF(5120)
        sockname: AF_UNIX
        peername: AF_UNIX /tmp/.s.PGSQL.5433
        peer: postgres[34335] zone: 8d5fc727-270e-c4bf-dc79-874b8f3d9d9d[361]
   8: S_IFREG mode:0600 dev:90,68350 ino:260963 uid:907 gid:907 size:1073741824
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16384/16385
      offset:0
   9: S_IFREG mode:0600 dev:90,68350 ino:261021 uid:907 gid:907 size:147456
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16384/2619
      offset:0
  11: S_IFREG mode:0600 dev:90,68350 ino:260951 uid:907 gid:907 size:65536
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16384/2840
      offset:0
  12: S_IFREG mode:0600 dev:90,68350 ino:260927 uid:907 gid:907 size:16384
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16384/2696
      offset:0
  14: S_IFREG mode:0600 dev:90,68350 ino:260996 uid:907 gid:907 size:90112
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16384/1259
      offset:0
  15: S_IFREG mode:0600 dev:90,68350 ino:260869 uid:907 gid:907 size:16384
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /var/pgsql/go/src/github.com/joyent/pg_prefaulter/.pgdata_follower/base/16384/2841
      offset:0

Terminate PostgreSQL again and check pfiles(1):

$ pfiles `pgrep pg_prefaulter`
33395:  ./pg_prefaulter run --config=pg_prefaulter-follower.toml --retry-db-in
  Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0620 dev:561,174 ino:3690371815 uid:0 gid:7 rdev:6,19
      O_RDWR|O_APPEND|O_NOCTTY|O_LARGEFILE
      /dev/pts/19
      offset:6155692
   1: S_IFIFO mode:0000 dev:558,0 ino:480599548 uid:907 gid:907 rdev:0,0
      O_RDWR
   2: S_IFCHR mode:0620 dev:561,174 ino:3690371815 uid:0 gid:7 rdev:6,19
      O_RDWR|O_APPEND|O_NOCTTY|O_LARGEFILE
      /dev/pts/19
      offset:6155692
   3: S_IFCHR mode:0644 dev:561,174 ino:4286469645 uid:0 gid:3 rdev:49,1
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /dev/urandom
      offset:16
   4: S_IFPORT mode:0000 dev:572,0 uid:907 gid:907 size:0
   5: S_IFDOOR mode:0444 dev:571,0 ino:666 uid:0 gid:0 rdev:570,0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[28218]
   6: S_IFSOCK mode:0666 dev:569,0 ino:55086 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_BROADCAST,SO_SNDBUF(128000),SO_RCVBUF(1048576)
        sockname: AF_INET 127.0.0.1  port: 39104

Relevant log output:

{"time":"2017-09-30T20:45:08.352018215Z","level":"error","error":"unable to query PostgreSQL checkpoint information: dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory: (retriable: true, purge cache: true","next step":"retrying","message":"unable to find WAL files"}
{"time":"2017-09-30T20:45:08.352073436Z","level":"debug","message":"purging cache"}
{"time":"2017-09-30T20:45:09.558287638Z","level":"debug","walfile":"000000010000000000000067","message":"found WAL segment from /proc"}
{"time":"2017-09-30T20:45:09.86240054Z","level":"debug","walfile":"000000010000000000000067","message":"found WAL segment from /proc"}
{"time":"2017-09-30T20:45:11.144263402Z","level":"debug","walfile":"000000010000000000000067","message":"found WAL segment from /proc"}
{"time":"2017-09-30T20:45:12.379228418Z","level":"debug","walfile":"000000010000000000000068","message":"found WAL segment from /proc"}
{"time":"2017-09-30T20:45:22.760509194Z","level":"debug","backend-pid":45923,"version":"PostgreSQL 9.6.5 on x86_64-sun-solaris2.11, compiled by gcc (GCC) 4.9.4, 64-bit","message":"established DB connection"}
{"time":"2017-09-30T20:45:46.711599818Z","level":"debug","hit":15969,"miss":19,"lookup":15988,"hit-rate":0.9988116087065299,"message":"filehandle-stats"}
{"time":"2017-09-30T20:45:47.005034935Z","level":"debug","hit":785037,"miss":1131044,"lookup":1916081,"hit-rate":0.40970971477719365,"message":"iocache-stats"}
{"time":"2017-09-30T20:45:47.006109038Z","level":"debug","hit":20,"miss":20,"lookup":40,"hit-rate":0.5,"message":"walcache-stats"}
{"time":"2017-09-30T20:46:46.711745518Z","level":"debug","hit":15969,"miss":19,"lookup":15988,"hit-rate":0.9988116087065299,"message":"filehandle-stats"}
{"time":"2017-09-30T20:46:47.005231435Z","level":"debug","hit":785037,"miss":1131044,"lookup":1916081,"hit-rate":0.40970971477719365,"message":"iocache-stats"}
{"time":"2017-09-30T20:46:47.00627201Z","level":"debug","hit":20,"miss":20,"lookup":40,"hit-rate":0.5,"message":"walcache-stats"}

Of note, on non-SmartOS platforms that may lack /proc support, pargs(1) is used as a fallback:

{"time":"2017-09-30T20:33:46.849351855Z","level":"error","error":"unable to query PostgreSQL checkpoint information: dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory: (retriable: true, purge cache: true","next step":"re
trying","message":"unable to find WAL files"}
{"time":"2017-09-30T20:33:46.849404092Z","level":"debug","message":"purging cache"}
{"time":"2017-09-30T20:33:48.302207541Z","level":"debug","walfile":"000000010000000000000060","message":"found WAL segment from pargs(1)"}
{"time":"2017-09-30T20:33:49.690287117Z","level":"debug","walfile":"000000010000000000000061","message":"found WAL segment from pargs(1)"}

Unexpected errors from the database cause the process to exit. Similarly, if the open(2) count doesn't match the close(2) count, the process panics. I abused PostgreSQL mightily today and am now fairly confident in the error handling: pg_prefaulter has been run in an abusive loop in which the follower and primary are both regularly kill -9'ed, have their databases recreated regularly, and are kept under load via make gendata or make pgbench. Cheers.

@davepacheco

As I understand it, a version of the prefaulter with this fix was running in production today, and we ran into the same problem again.


sean- commented Nov 6, 2017

Manatee Failover Testing

Test Procedure

  1. Install pg_prefaulter on three nodes in a shard. Ensure the version is at
    least as new as the October 14th build:
    https://us-east.manta.joyent.com/sean.chittenden/public/pg_prefaulter/pg_prefaulter_SNAPSHOT-21e0ec67ea7d97f4334b61517cbf0983c96a274a_Illumos_64-bit.tar.gz
  2. Verify the state of the shard:
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 ~]# manatee-adm show
zookeeper:   172.25.14.11
cluster:     lab0
generation:  23 (0/7705788)
mode:        normal
freeze:      frozen since 2017-10-24T08:46:27.851Z
freeze info: test

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  cd1aecdc ok   sync  0/7991B28     0/7991B28     0/7991B28     -     
sync     3ef75d54 ok   async 0/7991B28     0/7991B28     0/7991B28     -     
async    a5ec1ae3 ok   -     -             -             -             -     
  3. primary:
[root@headnode] # cd /zones/cd1aecdc-8634-699e-a6cc-81cfaf649ec5/root/var/tmp
[root@headnode] # mkdir pg_prefaulter
[root@headnode] # cd pg_prefaulter
[root@headnode] # curl -kO https://us-east.manta.joyent.com/sean.chittenden/public/pg_prefaulter/pg_prefaulter_SNAPSHOT-21e0ec67ea7d97f4334b61517cbf0983c96a274a_Illumos_64-bit.tar.gz
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4459k  100 4459k    0     0  3519k      0  0:00:01  0:00:01 --:--:-- 3522k
[root@headnode] # zlogin cd1aecdc-8634-699e-a6cc-81cfaf649ec5
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# ll
total 18738
-rw-r--r-- 1 root root    11358 Jul  4 03:06 LICENSE
-rw-r--r-- 1 root root    17316 Oct 14 15:48 README.adoc
-rwxr-xr-x 1 root root 14508897 Oct 14 16:33 pg_prefaulter*
-rw-r--r-- 1 root root     1601 Oct  9 22:48 pg_prefaulter.defaults.toml
-rw-r--r-- 1 root root      566 Oct  1 01:01 pg_prefaulter.toml.sample
-rw-r--r-- 1 root root  4566316 Nov  6 16:56 pg_prefaulter_SNAPSHOT-21e0ec67ea7d97f4334b61517cbf0983c96a274a_Illumos_64-bit.tar.gz
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# cat <<'EOF' > pg_prefaulter.smf 
> [snip]
> EOF
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# $EDITOR pg_prefaulter.smf 
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# cat pg_prefaulter.smf 
<?xml version='1.0'?>
<!DOCTYPE service_bundle SYSTEM '/usr/share/lib/xml/dtd/service_bundle.dtd.1'>
<service_bundle type='manifest' name='export'>
  <service name='network/cloudops/agent/pg_prefaulter' type='service' version='0'>
    <create_default_instance enabled='true'/>
    <single_instance/>

    <dependency name='network' grouping='require_all' restart_on='error' type='service'>
      <service_fmri value='svc:/milestone/network:default'/>
    </dependency>

    <dependency name='filesystem' grouping='require_all' restart_on='error' type='service'>
      <service_fmri value='svc:/system/filesystem/local'/>
    </dependency>

    <method_context/>
    <exec_method name='start' type='method' exec='/var/tmp/pg_prefaulter/pg_prefaulter run --config=%{config_file}' timeout_seconds='60'>
      <method_context>
        <method_environment>
          <envvar name='HOME' value='/var/tmp'/>
          <envvar name='PATH' value='/usr/local/bin:/opt/local/bin:/usr/bin:/bin'/>
          <envvar name='USER' value='postgres'/>
          <envvar name='GOMAXPROCS' value='%{num_cpus}'/>
          <envvar name='LD_LIBRARY_PATH' value='/opt/local/lib/amd64'/>
        </method_environment>
      </method_context>
    </exec_method>
    <exec_method name='stop' type='method' exec=':kill' timeout_seconds='60'/>

    <property_group name='startd' type='framework'>
      <propval name='duration' type='astring' value='child'/>
      <propval name='ignore_error' type='astring' value='core,signal'/>
    </property_group>

    <property_group name='application' type='application'>
      <propval name='config_file' type='astring' value='/var/tmp/pg_prefaulter/pg_prefaulter.toml'/>
      <propval name='num_cpus' type='astring' value='24'/>
    </property_group>

    <stability value='Evolving'/>
    <template>
      <common_name>
        <loctext xml:lang='C'>pg_prefaulter agent</loctext>
      </common_name>
    </template>
  </service>
</service_bundle>
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# cat <<'EOF' > pg_prefaulter.toml
> [snip]
> EOF
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# $EDITOR pg_prefaulter.toml
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# cat pg_prefaulter.toml
[log]
level = "DEBUG"

[postgresql]
host = "/tmp"
mode = "auto"
port = 5432
pgdata = "/manatee/pg/data/"
user = "postgres"
database = "postgres"

[postgresql.xlog]
pg_xlogdump-path = "/opt/postgresql/9.6.3/bin/pg_xlogdump"

[circonus]
enabled = true

[circonus.api]
# token is a required value if circonus.enabled=true
token = "1234"

[circonus.broker]
# id should be set to "35" (the public HTTPTrap broker) if you have enterprise
# brokers configured in your account.
id = 35

[circonus.check]
tags = "dc1:emy1"
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# svccfg import pg_prefaulter.smf
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# svcs pg_prefaulter
STATE          STIME    FMRI
online         19:23:26 svc:/network/cloudops/agent/pg_prefaulter:default
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# head `svcs -L pg_prefaulter`
[ Nov  6 19:23:26 Enabled. ]
[ Nov  6 19:23:26 Rereading configuration. ]
[ Nov  6 19:23:26 Executing start method ("/var/tmp/pg_prefaulter/pg_prefaulter run --config=/var/tmp/pg_prefaulter/pg_prefaulter.toml"). ]
{"time":"2017-11-06T19:23:26.816903977Z","level":"debug","config-file":"/var/tmp/pg_prefaulter/pg_prefaulter.toml"}
{"time":"2017-11-06T19:23:26.817112595Z","level":"debug","message":"args: []"}
{"time":"2017-11-06T19:23:26.823318705Z","level":"debug","postgresql.pgdata":"/manatee/pg/data/","postgresql.host":"/tmp","postgresql.port":5432,"postgresql.user":"postgres","postgresql.xlog.mode":"pg","postgresql.xlog.pg_xlogdump-path":"/opt/postgresql/9.6.3/bin/pg_xlogdump","postgresql.poll-interval":1000,"message":"flags"}
{"time":"2017-11-06T19:23:26.824105750Z","level":"info","pid":75182,"message":"Starting pg_prefaulter"}
{"time":"2017-11-06T19:23:26.832152028Z","level":"debug","message":"starting gops(1) agent"}
{"time":"2017-11-06T19:23:26.832315904Z","level":"debug","pprof-port":4242,"message":"starting pprof endpoing agent"}
{"time":"2017-11-06T19:23:26.834930515Z","level":"debug","rlimit-nofile":65536,"filehandle-cache-size":2000,"filehandle-cache-ttl":300000,"message":"filehandle cache initialized"}
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# tail `svcs -L pg_prefaulter`
{"time":"2017-11-06T19:23:28.039824632Z","level":"info","input":"rmgr: Btree       len (rec/tot):      2/    72, tx:       1171, lsn: 0/070BF168, prev 0/070BF118, desc: INSERT_LEAF off 7, blkref #0: rel 1664/0/1232 blk 1","message":"database 0"}
{"time":"2017-11-06T19:23:28.039890574Z","level":"info","input":"rmgr: Btree       len (rec/tot):      2/    64, tx:       1171, lsn: 0/070BF1B0, prev 0/070BF168, desc: INSERT_LEAF off 3, blkref #0: rel 1664/0/1233 blk 1","message":"database 0"}
{"time":"2017-11-06T19:24:26.835260451Z","level":"debug","hit":90,"miss":83,"lookup":173,"hit-rate":0.5202312138728323,"message":"filehandle-stats"}
{"time":"2017-11-06T19:24:26.898057581Z","level":"debug","hit":3032,"miss":390,"lookup":3422,"hit-rate":0.886031560490941,"message":"iocache-stats"}
{"time":"2017-11-06T19:24:26.898121893Z","level":"debug","hit":59,"miss":1,"lookup":60,"hit-rate":0.9833333333333333,"message":"walcache-stats"}
{"time":"2017-11-06T19:24:26.900901716Z","level":"debug","sender-state":"","db-state":"primary","peer-sync-state":"","durability-lag":0,"flush-lag":0,"visibility-lag":0,"visibility-lag_ms":0,"message":"db-stats"}
{"time":"2017-11-06T19:25:26.835491408Z","level":"debug","hit":90,"miss":83,"lookup":173,"hit-rate":0.5202312138728323,"message":"filehandle-stats"}
{"time":"2017-11-06T19:25:26.898309091Z","level":"debug","hit":3032,"miss":390,"lookup":3422,"hit-rate":0.886031560490941,"message":"iocache-stats"}
{"time":"2017-11-06T19:25:26.898332745Z","level":"debug","hit":119,"miss":1,"lookup":120,"hit-rate":0.9916666666666667,"message":"walcache-stats"}
{"time":"2017-11-06T19:25:26.901052136Z","level":"debug","sender-state":"","db-state":"primary","peer-sync-state":"","durability-lag":0,"flush-lag":0,"visibility-lag":0,"visibility-lag_ms":0,"message":"db-stats"}
  4. sync: Repeat the procedure above on 3ef75d54-c0d8-e579-f0df-b893b768c9ed.
[root@3ef75d54-c0d8-e579-f0df-b893b768c9ed /var/tmp/pg_prefaulter]# head `svcs -L pg_prefaulter`
[ Nov  6 21:40:46 Enabled. ]
[ Nov  6 21:40:46 Rereading configuration. ]
[ Nov  6 21:40:46 Executing start method ("/var/tmp/pg_prefaulter/pg_prefaulter run --config=/var/tmp/pg_prefaulter/pg_prefaulter.toml"). ]
{"time":"2017-11-06T21:40:46.253909146Z","level":"debug","config-file":"/var/tmp/pg_prefaulter/pg_prefaulter.toml"}
{"time":"2017-11-06T21:40:46.254034038Z","level":"debug","message":"args: []"}
{"time":"2017-11-06T21:40:46.254526283Z","level":"debug","message":"starting gops(1) agent"}
{"time":"2017-11-06T21:40:46.254617217Z","level":"debug","pprof-port":4242,"message":"starting pprof endpoing agent"}
{"time":"2017-11-06T21:40:46.254504246Z","level":"debug","postgresql.pgdata":"/manatee/pg/data/","postgresql.host":"/tmp","postgresql.port":5432,"postgresql.user":"postgres","postgresql.xlog.mode":"pg","postgresql.xlog.pg_xlogdump-path":"/opt/postgresql/9.6.3/bin/pg_xlogdump","postgresql.poll-interval":1000,"message":"flags"}
{"time":"2017-11-06T21:40:46.255194657Z","level":"info","pid":97910,"message":"Starting pg_prefaulter"}
{"time":"2017-11-06T21:40:46.258205525Z","level":"debug","rlimit-nofile":65536,"filehandle-cache-size":2000,"filehandle-cache-ttl":300000,"message":"filehandle cache initialized"}
[root@3ef75d54-c0d8-e579-f0df-b893b768c9ed /var/tmp/pg_prefaulter]# tail `svcs -L pg_prefaulter`    
{"time":"2017-11-06T21:40:46.748354419Z","level":"info","input":"rmgr: Heap        len (rec/tot):      3/    80, tx:       1171, lsn: 0/070BD848, prev 0/070BD800, desc: INSERT off 7, blkref #0: rel 1664/0/1214 blk 0","message":"database 0"}
{"time":"2017-11-06T21:40:46.748410544Z","level":"info","input":"rmgr: Btree       len (rec/tot):      2/    72, tx:       1171, lsn: 0/070BD898, prev 0/070BD848, desc: INSERT_LEAF off 6, blkref #0: rel 1664/0/1232 blk 1","message":"database 0"}
{"time":"2017-11-06T21:40:46.748448014Z","level":"info","input":"rmgr: Btree       len (rec/tot):      2/    64, tx:       1171, lsn: 0/070BD8E0, prev 0/070BD898, desc: INSERT_LEAF off 3, blkref #0: rel 1664/0/1233 blk 1","message":"database 0"}
{"time":"2017-11-06T21:40:46.749013902Z","level":"info","input":"rmgr: Heap        len (rec/tot):      3/    80, tx:       1171, lsn: 0/070BF118, prev 0/070BF0D0, desc: INSERT off 8, blkref #0: rel 1664/0/1214 blk 0","message":"database 0"}
{"time":"2017-11-06T21:40:46.749079348Z","level":"info","input":"rmgr: Btree       len (rec/tot):      2/    72, tx:       1171, lsn: 0/070BF168, prev 0/070BF118, desc: INSERT_LEAF off 7, blkref #0: rel 1664/0/1232 blk 1","message":"database 0"}
{"time":"2017-11-06T21:40:46.749108909Z","level":"info","input":"rmgr: Btree       len (rec/tot):      2/    64, tx:       1171, lsn: 0/070BF1B0, prev 0/070BF168, desc: INSERT_LEAF off 3, blkref #0: rel 1664/0/1233 blk 1","message":"database 0"}
{"time":"2017-11-06T21:41:46.258454929Z","level":"debug","hit":91,"miss":82,"lookup":173,"hit-rate":0.5260115606936416,"message":"filehandle-stats"}
{"time":"2017-11-06T21:41:46.317581987Z","level":"debug","hit":118,"miss":2,"lookup":120,"hit-rate":0.9833333333333333,"message":"walcache-stats"}
{"time":"2017-11-06T21:41:46.318613289Z","level":"debug","sender-state":"receiving","db-state":"follower","peer-sync-state":"applying","durability-lag":0,"flush-lag":0,"visibility-lag":0,"visibility-lag_ms":0,"message":"db-stats"}
{"time":"2017-11-06T21:41:46.318616936Z","level":"debug","hit":3028,"miss":394,"lookup":3422,"hit-rate":0.8848626534190532,"message":"iocache-stats"}
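The periodic `filehandle-stats` lines above make it easy to track the cache hit rate over time. A minimal sketch, assuming the JSON field names shown in the log output (`fh_hit_rate` is a hypothetical helper, not part of the prefaulter):

```shell
# Extract the filehandle-cache hit rate from the prefaulter's JSON
# debug log read from stdin (field names as they appear above).
fh_hit_rate() {
  grep '"message":"filehandle-stats"' - |
    sed -n 's|.*"hit-rate":\([0-9.]*\).*|\1|p'
}

# Example with a log line captured above:
fh_hit_rate <<'EOF'
{"time":"2017-11-06T21:41:46.258454929Z","level":"debug","hit":91,"miss":82,"lookup":173,"hit-rate":0.5260115606936416,"message":"filehandle-stats"}
EOF
```

On a live zone this could be fed from `tail -f $(svcs -L pg_prefaulter)` instead of a here-document.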
  1. async: Repeat the same procedure as above on the async peer, a5ec1ae3-9d69-4ae8-9918-abaf41da24d0.
[root@a5ec1ae3-9d69-4ae8-9918-abaf41da24d0 /var/tmp/pg_prefaulter]# head `svcs -L pg_prefaulter`
[ Nov  6 21:43:18 Enabled. ]
[ Nov  6 21:43:18 Rereading configuration. ]
[ Nov  6 21:43:18 Executing start method ("/var/tmp/pg_prefaulter/pg_prefaulter run --config=/var/tmp/pg_prefaulter/pg_prefaulter.toml"). ]
{"time":"2017-11-06T21:43:18.559327506Z","level":"debug","config-file":"/var/tmp/pg_prefaulter/pg_prefaulter.toml"}
{"time":"2017-11-06T21:43:18.559471304Z","level":"debug","message":"args: []"}
{"time":"2017-11-06T21:43:18.560034653Z","level":"debug","message":"starting gops(1) agent"}
{"time":"2017-11-06T21:43:18.560316805Z","level":"debug","pprof-port":4242,"message":"starting pprof endpoing agent"}
{"time":"2017-11-06T21:43:18.559974758Z","level":"debug","postgresql.pgdata":"/manatee/pg/data/","postgresql.host":"/tmp","postgresql.port":5432,"postgresql.user":"postgres","postgresql.xlog.mode":"pg","postgresql.xlog.pg_xlogdump-path":"/opt/postgresql/9.6.3/bin/pg_xlogdump","postgresql.poll-interval":1000,"message":"flags"}
{"time":"2017-11-06T21:43:18.560727483Z","level":"info","pid":98980,"message":"Starting pg_prefaulter"}
{"time":"2017-11-06T21:43:18.563906109Z","level":"debug","rlimit-nofile":65536,"filehandle-cache-size":2000,"filehandle-cache-ttl":300000,"message":"filehandle cache initialized"}
[root@a5ec1ae3-9d69-4ae8-9918-abaf41da24d0 /var/tmp/pg_prefaulter]# tail `svcs -L pg_prefaulter`    
{"time":"2017-11-06T21:43:19.590317327Z","level":"info","input":"rmgr: Heap        len (rec/tot):      3/    80, tx:       1169, lsn: 0/070BA468, prev 0/070BA418, desc: INSERT off 6, blkref #0: rel 1664/0/1214 blk 0","message":"database 0"}
{"time":"2017-11-06T21:43:19.590401347Z","level":"info","input":"rmgr: Btree       len (rec/tot):      2/    72, tx:       1169, lsn: 0/070BA4B8, prev 0/070BA468, desc: INSERT_LEAF off 5, blkref #0: rel 1664/0/1232 blk 1","message":"database 0"}
{"time":"2017-11-06T21:43:19.590464544Z","level":"info","input":"rmgr: Btree       len (rec/tot):      2/    64, tx:       1169, lsn: 0/070BA500, prev 0/070BA4B8, desc: INSERT_LEAF off 3, blkref #0: rel 1664/0/1233 blk 1","message":"database 0"}
{"time":"2017-11-06T21:43:19.590740162Z","level":"info","input":"rmgr: Heap        len (rec/tot):     14/    79, tx:       1170, lsn: 0/070BBD50, prev 0/070BBD18, desc: HOT_UPDATE off 4 xmax 1170 ; new off 5 xmax 0, blkref #0: rel 1664/0/1260 blk 0","message":"database 0"}
{"time":"2017-11-06T21:43:19.591196494Z","level":"info","input":"rmgr: Heap        len (rec/tot):      3/    80, tx:       1171, lsn: 0/070BD848, prev 0/070BD800, desc: INSERT off 7, blkref #0: rel 1664/0/1214 blk 0","message":"database 0"}
{"time":"2017-11-06T21:43:19.591277400Z","level":"info","input":"rmgr: Btree       len (rec/tot):      2/    72, tx:       1171, lsn: 0/070BD898, prev 0/070BD848, desc: INSERT_LEAF off 6, blkref #0: rel 1664/0/1232 blk 1","message":"database 0"}
{"time":"2017-11-06T21:43:19.591324493Z","level":"info","input":"rmgr: Btree       len (rec/tot):      2/    64, tx:       1171, lsn: 0/070BD8E0, prev 0/070BD898, desc: INSERT_LEAF off 3, blkref #0: rel 1664/0/1233 blk 1","message":"database 0"}
{"time":"2017-11-06T21:43:19.591638034Z","level":"info","input":"rmgr: Heap        len (rec/tot):      3/    80, tx:       1171, lsn: 0/070BF118, prev 0/070BF0D0, desc: INSERT off 8, blkref #0: rel 1664/0/1214 blk 0","message":"database 0"}
{"time":"2017-11-06T21:43:19.591716052Z","level":"info","input":"rmgr: Btree       len (rec/tot):      2/    72, tx:       1171, lsn: 0/070BF168, prev 0/070BF118, desc: INSERT_LEAF off 7, blkref #0: rel 1664/0/1232 blk 1","message":"database 0"}
{"time":"2017-11-06T21:43:19.591775684Z","level":"info","input":"rmgr: Btree       len (rec/tot):      2/    64, tx:       1171, lsn: 0/070BF1B0, prev 0/070BF168, desc: INSERT_LEAF off 3, blkref #0: rel 1664/0/1233 blk 1","message":"database 0"}
  1. Generate load against the Manatee primary. Run the following in a
    tmux session:
$ env PGSSLMODE=disable PGHOST=/tmp PGUSER=postgres /opt/postgresql/current/bin/pgbench -i -s 10 -F 90
$ while true; do env PGSSLMODE=disable PGHOST=/tmp PGUSER=postgres /opt/postgresql/current/bin/pgbench -j 64 -P 60 -r -T 900 --no-vacuum --protocol=prepared; done
  1. Unfreeze the shard and verify WAL entries are flowing:
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# manatee-adm show
zookeeper:   172.25.14.11
cluster:     lab0
generation:  23 (0/7705788)
mode:        normal
freeze:      frozen since 2017-10-24T08:46:27.851Z
freeze info: test

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  cd1aecdc ok   sync  0/FF06FE8     0/FF06FE8     0/FF06E58     -     
sync     3ef75d54 ok   async 0/FF06FE8     0/FF06FE8     0/FF06E58     0m00s 
async    a5ec1ae3 ok   -     -             -             -             0m00s 
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# manatee-adm show
zookeeper:   172.25.14.11
cluster:     lab0
generation:  23 (0/7705788)
mode:        normal
freeze:      frozen since 2017-10-24T08:46:27.851Z
freeze info: test

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  cd1aecdc ok   sync  0/FF9FBA8     0/FF9FBA8     0/FF9FA50     -     
sync     3ef75d54 ok   async 0/FF9FED8     0/FF9FD40     0/FF9FBA8     0m00s 
async    a5ec1ae3 ok   -     -             -             -             0m00s 
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# manatee-adm unfreeze
Unfrozen.
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# manatee-adm show    
zookeeper:   172.25.14.11
cluster:     lab0
generation:  23 (0/7705788)
mode:        normal
freeze:      not frozen

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  cd1aecdc ok   sync  0/108BA3D0    0/108BA238    0/108BA0A0    -     
sync     3ef75d54 ok   async 0/108BA0A0    0/108BA0A0    0/108B9EF0    0m00s 
async    a5ec1ae3 ok   -     -             -             -             0m00s 
  1. Validate replication. Ensure updated WAL records are flowing through to:
    1. primary.
      1. Obtain a list of the open files:
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# pgrep pg_prefaulter | xargs -n1 pfiles | head -n 45
75182:  /var/tmp/pg_prefaulter/pg_prefaulter run --config=/var/tmp/pg_prefault
  Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0666 dev:559,138372 ino:2364281560 uid:0 gid:3 rdev:38,2
      O_RDONLY|O_LARGEFILE
      /dev/null
      offset:0
   1: S_IFREG mode:0644 dev:90,69592 ino:66286 uid:0 gid:0 size:109814
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
      /var/svc/log/network-cloudops-agent-pg_prefaulter:default.log
      offset:109814
   2: S_IFREG mode:0644 dev:90,69592 ino:66286 uid:0 gid:0 size:109814
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
      /var/svc/log/network-cloudops-agent-pg_prefaulter:default.log
      offset:109814
   3: S_IFCHR mode:0644 dev:559,138372 ino:1983103238 uid:0 gid:3 rdev:49,1
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /dev/urandom
      offset:16
   4: S_IFPORT mode:0000 dev:570,0 uid:0 gid:0 size:0
   5: S_IFSOCK mode:0666 dev:567,0 ino:47400 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_BROADCAST,SO_SNDBUF(128000),SO_RCVBUF(1048576)
        sockname: AF_INET 127.0.0.1  port: 4242
   6: S_IFSOCK mode:0666 dev:567,0 ino:35184 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_BROADCAST,SO_SNDBUF(128000),SO_RCVBUF(1048576)
        sockname: AF_INET 127.0.0.1  port: 47375
   7: S_IFIFO mode:0000 dev:556,0 ino:700086872 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
   8: S_IFREG mode:0600 dev:90,70221 ino:2189 uid:907 gid:907 size:106496
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /manatee/pg/data/base/12851/1259
      offset:0
   9: S_IFSOCK mode:0666 dev:567,0 ino:54637 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_BROADCAST,SO_SNDBUF(16384),SO_RCVBUF(5120)
        sockname: AF_UNIX 
        peername: AF_UNIX /tmp/.s.PGSQL.5432
        peer: postgres[50800] zone: cd1aecdc-8634-699e-a6cc-81cfaf649ec5[1576]
  10: S_IFREG mode:0600 dev:90,70221 ino:2213 uid:907 gid:907 size:40960
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /manatee/pg/data/base/12851/2663
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# pgrep pg_prefaulter | xargs -n1 pfiles | tail
 101: S_IFREG mode:0600 dev:90,70221 ino:9119 uid:907 gid:0 size:204800
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /manatee/pg/data/base/12852/16430
      offset:0
 103: S_IFREG mode:0600 dev:90,70221 ino:9111 uid:907 gid:0 size:6848512
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /manatee/pg/data/base/12852/16421
      offset:0
 105: S_IFIFO mode:0000 dev:556,0 ino:700087532 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
  2. [X] Obtain a list of the args and make sure they're changing (or examine the log output from the prefaulter):
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# pgrep postgres | 2>&1 xargs -n1 pargs | grep -i WAL && sleep 30 && pgrep postgres | 2>&1 xargs -n1 pargs | grep -i WAL
argv[0]: postgres: wal sender process postgres 172.25.14.134(41564) streaming 0/14364818
argv[0]: postgres: wal writer process
argv[0]: postgres: wal sender process postgres 172.25.14.134(41564) streaming 0/1491A758
argv[0]: postgres: wal writer process
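The pfiles(1) listings above can also be checked mechanically by counting path lines under the data directory. A minimal sketch, assuming pfiles output is piped on stdin and `/manatee/pg/data` is this lab's PGDATA (`count_pgdata_fds` is a hypothetical helper, not part of the prefaulter):

```shell
# Count descriptor path lines under a directory in pfiles(1)-style
# output read from stdin.
count_pgdata_fds() {
  grep -c "^ *$1/" -
}

# Example with a fragment captured above (prints 1):
count_pgdata_fds /manatee/pg/data <<'EOF'
   8: S_IFREG mode:0600 dev:90,70221 ino:2189 uid:907 gid:907 size:106496
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /manatee/pg/data/base/12851/1259
      offset:0
EOF
```

On the zone itself this would be driven as `pgrep pg_prefaulter | xargs -n1 pfiles | count_pgdata_fds /manatee/pg/data`; a non-zero count confirms the prefaulter is actively holding data files while PostgreSQL is running.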
  1. sync
    1. Obtain a list of the open files:
[root@3ef75d54-c0d8-e579-f0df-b893b768c9ed ~]# pgrep pg_prefaulter | xargs -n1 pfiles | head -n 45
97910:  /var/tmp/pg_prefaulter/pg_prefaulter run --config=/var/tmp/pg_prefault
  Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0666 dev:559,138373 ino:1688795282 uid:0 gid:3 rdev:38,2
      O_RDONLY|O_LARGEFILE
      /dev/null
      offset:0
   1: S_IFREG mode:0644 dev:90,69579 ino:66266 uid:0 gid:0 size:65578
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
      /var/svc/log/network-cloudops-agent-pg_prefaulter:default.log
      offset:65578
   2: S_IFREG mode:0644 dev:90,69579 ino:66266 uid:0 gid:0 size:65578
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
      /var/svc/log/network-cloudops-agent-pg_prefaulter:default.log
      offset:65578
   3: S_IFCHR mode:0644 dev:559,138373 ino:3095488064 uid:0 gid:3 rdev:49,1
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /dev/urandom
      offset:16
   4: S_IFPORT mode:0000 dev:570,0 uid:0 gid:0 size:0
   5: S_IFSOCK mode:0666 dev:567,0 ino:64242 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_BROADCAST,SO_SNDBUF(128000),SO_RCVBUF(1048576)
        sockname: AF_INET 127.0.0.1  port: 59224
   6: S_IFSOCK mode:0666 dev:567,0 ino:18969 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_BROADCAST,SO_SNDBUF(128000),SO_RCVBUF(1048576)
        sockname: AF_INET 127.0.0.1  port: 4242
   7: S_IFREG mode:0600 dev:90,70223 ino:2591 uid:907 gid:907 size:8192
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /manatee/pg/data/base/12852/2606
      offset:0
   8: S_IFREG mode:0600 dev:90,70223 ino:2213 uid:907 gid:907 size:40960
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /manatee/pg/data/base/12851/2663
      offset:0
   9: S_IFSOCK mode:0666 dev:567,0 ino:53504 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_BROADCAST,SO_SNDBUF(16384),SO_RCVBUF(5120)
        sockname: AF_UNIX 
        peername: AF_UNIX /tmp/.s.PGSQL.5432
        peer: postgres[51316] zone: 3ef75d54-c0d8-e579-f0df-b893b768c9ed[1577]
  10: S_IFREG mode:0600 dev:90,70223 ino:2189 uid:907 gid:907 size:106496
[root@3ef75d54-c0d8-e579-f0df-b893b768c9ed ~]# pgrep pg_prefaulter | xargs -n1 pfiles | tail
      /manatee/pg/data/base/12852/16436
      offset:0
 110: S_IFREG mode:0600 dev:90,70223 ino:2507 uid:907 gid:907 size:24576
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /manatee/pg/data/base/12852/2610
      offset:0
 111: S_IFREG mode:0600 dev:90,70223 ino:1443 uid:907 gid:0 size:22487040
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /manatee/pg/data/base/12852/16438
      offset:0
  2. [X] Obtain a list of the args and make sure they're changing (or examine the log output from the prefaulter):
[root@3ef75d54-c0d8-e579-f0df-b893b768c9ed ~]# pgrep postgres | 2>&1 xargs -n1 pargs | grep -i WAL && sleep 30 && pgrep postgres | 2>&1 xargs -n1 pargs | grep -i WAL
argv[0]: postgres: wal receiver process   streaming 0/1843B070
argv[0]: postgres: wal sender process postgres 172.25.14.132(62171) streaming 0/1843D648
argv[0]: postgres: wal receiver process   streaming 0/18A026B8
argv[0]: postgres: wal sender process postgres 172.25.14.132(62171) streaming 0/18A05880
[root@3ef75d54-c0d8-e579-f0df-b893b768c9ed ~]# manatee-adm show
zookeeper:   172.25.14.11
cluster:     lab0
generation:  23 (0/7705788)
mode:        normal
freeze:      not frozen

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  cd1aecdc ok   sync  0/1AB4D070    0/1AB4D070    0/1AB4D038    -     
sync     3ef75d54 ok   async 0/1AB4D070    0/1AB4D070    0/1AB4D038    0m16s 
async    a5ec1ae3 ok   -     -             -             -             0m16s 
  1. async.
    1. Obtain a list of the open files:
[root@a5ec1ae3-9d69-4ae8-9918-abaf41da24d0 ~]# pgrep pg_prefaulter | xargs -n1 pfiles | head -n 45
98980:  /var/tmp/pg_prefaulter/pg_prefaulter run --config=/var/tmp/pg_prefault
  Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0666 dev:559,138371 ino:2116658781 uid:0 gid:3 rdev:38,2
      O_RDONLY|O_LARGEFILE
      /dev/null
      offset:0
   1: S_IFREG mode:0644 dev:90,69588 ino:66346 uid:0 gid:0 size:29215
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
      /var/svc/log/network-cloudops-agent-pg_prefaulter:default.log
      offset:29215
   2: S_IFREG mode:0644 dev:90,69588 ino:66346 uid:0 gid:0 size:29215
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
      /var/svc/log/network-cloudops-agent-pg_prefaulter:default.log
      offset:29215
   3: S_IFCHR mode:0644 dev:559,138371 ino:1051367549 uid:0 gid:3 rdev:49,1
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /dev/urandom
      offset:16
   4: S_IFPORT mode:0000 dev:570,0 uid:0 gid:0 size:0
   5: S_IFSOCK mode:0666 dev:567,0 ino:6812 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_BROADCAST,SO_SNDBUF(128000),SO_RCVBUF(1048576)
        sockname: AF_INET 127.0.0.1  port: 65407
   6: S_IFSOCK mode:0666 dev:567,0 ino:56979 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_BROADCAST,SO_SNDBUF(128000),SO_RCVBUF(1048576)
        sockname: AF_INET 127.0.0.1  port: 4242
   7: S_IFIFO mode:0000 dev:556,0 ino:700121633 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
   8: S_IFREG mode:0600 dev:90,70222 ino:2213 uid:907 gid:907 size:40960
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /manatee/pg/data/base/12851/2663
      offset:0
   9: S_IFSOCK mode:0666 dev:567,0 ino:23000 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_BROADCAST,SO_SNDBUF(16384),SO_RCVBUF(5120)
        sockname: AF_UNIX 
        peername: AF_UNIX /tmp/.s.PGSQL.5432
        peer: postgres[50860] zone: a5ec1ae3-9d69-4ae8-9918-abaf41da24d0[1575]
  10: S_IFREG mode:0600 dev:90,70222 ino:2296 uid:907 gid:907 size:32768
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /manatee/pg/data/base/12851/2662
[root@a5ec1ae3-9d69-4ae8-9918-abaf41da24d0 ~]# pgrep pg_prefaulter | xargs -n1 pfiles | tail      
      /manatee/pg/data/base/12852/16434
      offset:0
 110: S_IFREG mode:0600 dev:90,70222 ino:1449 uid:907 gid:0 size:22487040
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /manatee/pg/data/base/12852/16438
      offset:0
 111: S_IFREG mode:0600 dev:90,70222 ino:1448 uid:907 gid:0 size:90112
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /manatee/pg/data/base/12852/16436
      offset:0
  2. [X] Obtain a list of the args and make sure they're changing (or examine the log output from the prefaulter):
[root@a5ec1ae3-9d69-4ae8-9918-abaf41da24d0 ~]# pgrep postgres | 2>&1 xargs -n1 pargs | grep -i WAL && sleep 30 && pgrep postgres | 2>&1 xargs -n1 pargs | grep -i WAL                                          
argv[0]: postgres: wal receiver process   streaming 0/1AB4D3F0
argv[0]: postgres: wal receiver process   streaming 0/1AB4D4D0
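Rather than eyeballing the pargs samples, the streaming LSN can be extracted and compared between samples. A sketch assuming the argv format shown above (`lsn` is a hypothetical helper):

```shell
# Pull the streaming LSN out of a "wal sender/receiver process" argv
# line read from stdin (format as in the pargs output above).
lsn() {
  sed -n 's|.*streaming \([0-9A-F]*/[0-9A-F]*\).*|\1|p'
}

# Two samples taken ~30 seconds apart; the position should advance:
first=$(echo 'argv[0]: postgres: wal receiver process   streaming 0/1AB4D3F0' | lsn)
second=$(echo 'argv[0]: postgres: wal receiver process   streaming 0/1AB4D4D0' | lsn)
[ "$first" != "$second" ] && echo "WAL position advanced: $first -> $second"
```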
  1. Stop the sync:
[root@3ef75d54-c0d8-e579-f0df-b893b768c9ed ~]# svcs manatee-sitter
STATE          STIME    FMRI
online         10:30:16 svc:/manta/application/manatee-sitter:default
[root@3ef75d54-c0d8-e579-f0df-b893b768c9ed ~]# svcadm disable manatee-sitter 
[root@3ef75d54-c0d8-e579-f0df-b893b768c9ed ~]# svcs manatee-sitter
STATE          STIME    FMRI
disabled       22:17:18 svc:/manta/application/manatee-sitter:default
[root@3ef75d54-c0d8-e579-f0df-b893b768c9ed ~]# manatee-adm show
zookeeper:   172.25.14.11
cluster:     lab0
generation:  23 (0/7705788)
mode:        normal
freeze:      not frozen

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  cd1aecdc ok   -     -             -             -             -     
sync     3ef75d54 fail -     -             -             -             -     
async    a5ec1ae3 ok   -     -             -             -             9m24s 

error: cannot query postgres on sync: peer "3ef75d54": ECONNREFUSED
[root@a5ec1ae3-9d69-4ae8-9918-abaf41da24d0 ~]# manatee-adm show
zookeeper:   172.25.14.11
cluster:     lab0
generation:  24 (0/1AB4E1D0)
mode:        normal
freeze:      not frozen

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  cd1aecdc ok   -     -             -             -             -     
sync     a5ec1ae3 fail -     -             -             -             -     

error: cannot query postgres on sync: peer "a5ec1ae3": ECONNREFUSED
warning: cluster has no async peers
[root@a5ec1ae3-9d69-4ae8-9918-abaf41da24d0 ~]# manatee-adm show
zookeeper:   172.25.14.11
cluster:     lab0
generation:  24 (0/1AB4E1D0)
mode:        normal
freeze:      not frozen

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  cd1aecdc ok   potential 0/F0E0000     0/F000000     0/E0FE358     -     
sync     a5ec1ae3 ok   -     -             -             -             27m16s

error: peer "cd1aecdc": downstream replication not yet established (expected state "streaming", found "catchup")
warning: cluster has no async peers
[root@a5ec1ae3-9d69-4ae8-9918-abaf41da24d0 ~]# manatee-adm show
zookeeper:   172.25.14.11
cluster:     lab0
generation:  24 (0/1AB4E1D0)
mode:        normal
freeze:      not frozen

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  cd1aecdc ok   potential 0/14F80000    0/14000000    0/123235C0    -     
sync     a5ec1ae3 ok   -     -             -             -             21m03s

error: peer "cd1aecdc": downstream replication not yet established (expected state "streaming", found "catchup")
warning: cluster has no async peers
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# env PGSSLMODE=disable PGHOST=/tmp PGUSER=postgres /opt/postgresql/current/bin/pgbench -j 64 -P 60 -r -T 900 --no-vacuum --protocol=prepared
progress: 60.0 s, 533.0 tps, lat 1.873 ms stddev 0.853
progress: 120.0 s, 498.7 tps, lat 2.003 ms stddev 1.007
progress: 180.0 s, 536.5 tps, lat 1.862 ms stddev 4.018
progress: 240.0 s, 514.6 tps, lat 1.941 ms stddev 1.187
progress: 300.0 s, 516.2 tps, lat 1.935 ms stddev 1.334
progress: 360.0 s, 491.0 tps, lat 2.035 ms stddev 1.137
progress: 420.0 s, 477.4 tps, lat 2.092 ms stddev 5.804
progress: 480.0 s, 535.7 tps, lat 1.864 ms stddev 1.084
progress: 540.0 s, 506.1 tps, lat 1.974 ms stddev 3.084
progress: 600.0 s, 490.5 tps, lat 2.036 ms stddev 2.268
progress: 660.0 s, 490.4 tps, lat 2.037 ms stddev 3.119
progress: 720.0 s, 490.9 tps, lat 2.035 ms stddev 1.134
progress: 780.0 s, 503.1 tps, lat 1.986 ms stddev 1.595
progress: 840.0 s, 501.1 tps, lat 1.993 ms stddev 0.907
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: prepared
number of clients: 1
number of threads: 1
duration: 900 s
number of transactions actually processed: 454097
latency average = 1.980 ms
latency stddev = 2.814 ms
tps = 504.559554 (including connections establishing)
tps = 504.574002 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.003  \set aid random(1, 100000 * :scale)
         0.001  \set bid random(1, 1 * :scale)
         0.001  \set tid random(1, 10 * :scale)
         0.001  \set delta random(-5000, 5000)
         0.066  BEGIN;
         0.194  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         0.126  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         0.181  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         0.195  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         0.127  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
         1.072  END;
  1. Watch the async take over for the now failed sync:
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 /var/tmp/pg_prefaulter]# manatee-adm show
zookeeper:   172.25.14.11
cluster:     lab0
generation:  24 (0/1AB4E1D0)
mode:        normal
freeze:      not frozen

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  cd1aecdc ok   sync  0/1AB4E9E8    0/1AB4E9E8    0/1AB4E9E8    -     
sync     a5ec1ae3 ok   async 0/1AB4E9E8    0/1AB4E9E8    0/1AB4E9E8    14m22s
async    3ef75d54 ok   -     -             -             -             -     
# echo "Lag caused because the above pgbench finished.  Restarting pgbench reset the time-based lag metric to 0."
Lag caused because the above pgbench finished.  Restarting pgbench reset the time-based lag metric to 0.
  1. Restart the stopped sync (it should rejoin the cluster as the async):
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 ~]# manatee-adm show
zookeeper:   172.25.14.11
cluster:     lab0
generation:  24 (0/1AB4E1D0)
mode:        normal
freeze:      not frozen

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  cd1aecdc ok   sync  0/1ACB2668    0/1ACB2668    0/1ACB24D8    -     
sync     a5ec1ae3 ok   async 0/1ACB2668    0/1ACB2668    0/1ACB24D8    0m00s 
async    3ef75d54 ok   -     -             -             -             0m00s 
  1. Wait until lag comes down to acceptable levels.
  2. Stop the primary:
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 ~]# svcs manatee-sitter
STATE          STIME    FMRI
online         10:29:16 svc:/manta/application/manatee-sitter:default
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 ~]# svcadm disable manatee-sitter
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 ~]# svcs manatee-sitter
STATE          STIME    FMRI
disabled       22:31:12 svc:/manta/application/manatee-sitter:default
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 ~]# env PGSSLMODE=disable PGHOST=/tmp PGUSER=postgres /opt/postgresql/current/bin/pgbench -j 64 -P 60 -r -T 900 --no-vacuum --protocol=prepared
progress: 60.0 s, 607.7 tps, lat 1.643 ms stddev 0.988
progress: 120.0 s, 498.6 tps, lat 2.003 ms stddev 3.468
progress: 180.0 s, 509.1 tps, lat 1.962 ms stddev 1.905
progress: 240.0 s, 484.3 tps, lat 2.062 ms stddev 0.958
progress: 300.0 s, 519.1 tps, lat 1.924 ms stddev 1.128
client 0 aborted in state 8; perhaps the backend died while processing
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: prepared
number of clients: 1
number of threads: 1
duration: 900 s
number of transactions actually processed: 161806
latency average = 1.911 ms
latency stddev = 1.906 ms
tps = 522.243830 (including connections establishing)
tps = 522.303625 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.003  \set aid random(1, 100000 * :scale)
         0.001  \set bid random(1, 1 * :scale)
         0.001  \set tid random(1, 10 * :scale)
         0.001  \set delta random(-5000, 5000)
         0.066  BEGIN;
         0.197  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         0.123  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         0.165  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         0.172  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         0.125  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
         1.046  END;
[root@a5ec1ae3-9d69-4ae8-9918-abaf41da24d0 ~]# manatee-adm show
zookeeper:   172.25.14.11
cluster:     lab0
generation:  25 (0/1EBADF68)
mode:        normal
freeze:      not frozen

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  a5ec1ae3 ok   sync  0/1EBAE028    0/1EBAE028    0/1EBADFD8    -     
sync     3ef75d54 ok   -     -             -             -             1m04s 
deposed  cd1aecdc ok   -     -             -             -             -     

warning: cluster has a deposed peer
warning: cluster has no async peers
[root@a5ec1ae3-9d69-4ae8-9918-abaf41da24d0 ~]# manatee-adm show
zookeeper:   172.25.14.11
cluster:     lab0
generation:  25 (0/1EBADF68)
mode:        normal
freeze:      not frozen

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  a5ec1ae3 ok   sync  0/1EE61B78    0/1EE61B78    0/1EE619E0    -     
sync     3ef75d54 ok   -     -             -             -             0m00s 
deposed  cd1aecdc ok   -     -             -             -             -     

warning: cluster has a deposed peer
warning: cluster has no async peers
  1. Watch the sync take over for the now failed primary.
  2. Watch the async take over as the new sync:
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 ~]# manatee-adm show      
zookeeper:   172.25.14.11
cluster:     lab0
generation:  25 (0/1EBADF68)
mode:        normal
freeze:      not frozen

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  a5ec1ae3 ok   sync  0/2185D210    0/2185D080    0/2185CEE8    -     
sync     3ef75d54 ok   -     -             -             -             0m00s 
deposed  cd1aecdc ok   -     -             -             -             -     

warning: cluster has a deposed peer
warning: cluster has no async peers
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 ~]# manatee-adm rebuild
This peer is a deposed former primary and may have state that needs to be removed before it can successfully rejoin the cluster.  This operation will remove all local data and then unmark this peer as deposed so that it can rejoin the cluster.  This operation is destructive, and you should only proceed after confirming that this peer's copy of the data is no longer required. (yes/no)?
prompt: no:  yes
Disabling manatee-sitter SMF service to ensure that Manatee is not running.
Waiting up to 90 seconds for this Manatee instance's ZooKeeper session to expire |
Attempting to remove ZFS dataset's contents
Enabling sitter
Watching sitter for postgres startup  
assert.js:92
  throw new assert.AssertionError({
        ^
AssertionError: operation "tcp://postgres@172.25.14.171:5432/postgres" is already pending
    at Barrier.start (/opt/smartdc/manatee/node_modules/manatee/node_modules/vasync/lib/vasync.js:466:13)
    at /opt/smartdc/manatee/node_modules/manatee/lib/adm.js:374:21
    at Array.forEach (native)
    at /opt/smartdc/manatee/node_modules/manatee/lib/adm.js:371:15
    at Array.forEach (native)
    at Object._addPostgresStatus [as func] (/opt/smartdc/manatee/node_modules/manatee/lib/adm.js:351:26)
    at Object._onImmediate (/opt/smartdc/manatee/node_modules/manatee/node_modules/vasync/lib/vasync.js:213:20)
    at processImmediate [as _immediateCallback] (timers.js:330:15)
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 ~]# pwd
/root
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 ~]# manatee-adm show   
zookeeper:   172.25.14.11
cluster:     lab0
generation:  25 (0/1EBADF68)
mode:        normal
freeze:      not frozen

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  a5ec1ae3 ok   sync  0/22D043A8    0/22D043A8    0/22D04218    -     
sync     3ef75d54 ok   async 0/206A0000    0/20000000    0/1E1F30F8    0m00s 
async    cd1aecdc ok   -     -             -             -             6m50s 

warning: peer "3ef75d54": downstream replication not yet established (expected state "streaming", found "catchup")
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 ~]# manatee-adm show
zookeeper:   172.25.14.11
cluster:     lab0
generation:  25 (0/1EBADF68)
mode:        normal
freeze:      not frozen

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  a5ec1ae3 ok   sync  0/2341C4B8    0/2341C4B8    0/2341C320    -     
sync     3ef75d54 ok   async 0/2341C4B8    0/2341C4B8    0/2341C320    0m00s 
async    cd1aecdc ok   -     -             -             -             0m00s 
  1. Stop the async:
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 ~]# pgrep pg_prefaulter | xargs -n1 pfiles pg_prefaulter
pfiles: cannot examine pg_prefaulter: no such core file
62985:  /var/tmp/pg_prefaulter/pg_prefaulter run --config=/var/tmp/pg_prefault
  Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0666 dev:559,138372 ino:2364281560 uid:0 gid:3 rdev:38,2
      O_RDONLY|O_LARGEFILE
      /dev/null
      offset:0
   1: S_IFREG mode:0644 dev:90,69592 ino:66286 uid:0 gid:0 size:8058192
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
      /var/svc/log/network-cloudops-agent-pg_prefaulter:default.log
      offset:8058192
   2: S_IFREG mode:0644 dev:90,69592 ino:66286 uid:0 gid:0 size:8058192
      O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
      /var/svc/log/network-cloudops-agent-pg_prefaulter:default.log
      offset:8058192
   3: S_IFCHR mode:0644 dev:559,138372 ino:1983103238 uid:0 gid:3 rdev:49,1
      O_RDONLY|O_NONBLOCK|O_LARGEFILE FD_CLOEXEC
      /dev/urandom
      offset:16
   4: S_IFPORT mode:0000 dev:570,0 uid:0 gid:0 size:2
   5: S_IFSOCK mode:0666 dev:567,0 ino:12914 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_BROADCAST,SO_SNDBUF(128000),SO_RCVBUF(1048576)
        sockname: AF_INET 127.0.0.1  port: 4242
   6: S_IFSOCK mode:0666 dev:567,0 ino:21818 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_BROADCAST,SO_SNDBUF(128000),SO_RCVBUF(1048576)
        sockname: AF_INET 127.0.0.1  port: 53305
   7: S_IFSOCK mode:0666 dev:567,0 ino:1628 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_DGRAM
        SO_BROADCAST,SO_DGRAM_ERRIND,SO_SNDBUF(57344),SO_RCVBUF(57344)
        sockname: AF_INET 172.25.14.133  port: 43659
        peername: AF_INET 172.25.14.11  port: 53
   8: S_IFSOCK mode:0666 dev:567,0 ino:26429 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_DGRAM
        SO_BROADCAST,SO_DGRAM_ERRIND,SO_SNDBUF(57344),SO_RCVBUF(57344)
        sockname: AF_INET 172.25.14.133  port: 42632
        peername: AF_INET 172.25.14.11  port: 53
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 ~]# manatee-adm show                                    
zookeeper:   172.25.14.11
cluster:     lab0
generation:  25 (0/1EBADF68)
mode:        normal
freeze:      not frozen

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  a5ec1ae3 ok   sync  0/240689A0    0/240689A0    0/24068810    -     
sync     3ef75d54 ok   -     -             -             -             0m00s 

warning: cluster has no async peers
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 ~]# svcadm enable manatee-sitter       
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 ~]# manatee-adm show            
zookeeper:   172.25.14.11
cluster:     lab0
generation:  25 (0/1EBADF68)
mode:        normal
freeze:      not frozen

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  a5ec1ae3 ok   sync  0/2428A500    0/2428A500    0/2428A370    -     
sync     3ef75d54 ok   -     -             -             -             0m00s 
async    cd1aecdc fail -     -             -             -             -     

warning: peer "3ef75d54": downstream replication peer not connected
[root@cd1aecdc-8634-699e-a6cc-81cfaf649ec5 ~]# manatee-adm show
zookeeper:   172.25.14.11
cluster:     lab0
generation:  25 (0/1EBADF68)
mode:        normal
freeze:      not frozen

ROLE     PEER     PG   REPL  SENT          FLUSH         REPLAY        LAG   
primary  a5ec1ae3 ok   sync  0/243AAE78    0/243AAE78    0/243AACE0    -     
sync     3ef75d54 ok   async 0/243AAE78    0/243AAE78    0/23CA4C58    0m00s 
async    cd1aecdc ok   -     -             -             -             0m31s 
  2. Restart the deposed async (it should just rejoin the cluster as the async).
