Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests hang when running MysqlIsolatedServer on Ubuntu 14 & 15 #177

Closed
anaerobic opened this issue Dec 4, 2015 · 14 comments
Closed

Tests hang when running MysqlIsolatedServer on Ubuntu 14 & 15 #177

anaerobic opened this issue Dec 4, 2015 · 14 comments

Comments

@anaerobic
Copy link

running the tests using mvn install results in the following behavior:

Running com.zendesk.maxwell.SchemaStoreTest
DEBUG MysqlIsolatedServer - booting onetimeserver: ./src/test/onetimeserver --mysql-version=5.5 --log-bin=master --binlog_format=row --innodb_flush_log_at_trx_commit=1 --server_id=123123

which then hangs forever, even though a ps aux | grep mysql shows the wrapper, onetimeserver-go and mysqld processes running along famously.

@osheroff
Copy link
Collaborator

osheroff commented Dec 4, 2015

hmmm, booted up an ubuntu 14 vm and everything seemed to go just fine ( travis runs on ubuntu too).

What do you get if you run:

./src/test/onetimeserver --mysql-version=5.5 --log-bin=master --binlog_format=row --innodb_flush_log_at_trx_commit=1 --server_id=123123 --debug

@anaerobic
Copy link
Author

replacing actual home with ~, of course:

exec:  ~/.onetimeserver/Linux-x86_64/wrapper -debug -mysql-version 5.5 -ppid 19879 -type mysql -- --log-bin=master --binlog_format=row --innodb_flush_log_at_trx_commit=1 --server_id=123123

and it hangs..

if I edit src/test/onetimeserver to use $CACHE_DIR/ontimeserver-go instead of $CACHE_DIR/wrapper, I can run it with the expected behavior on command line, but the test will still hang as described if I mvn install

@osheroff
Copy link
Collaborator

osheroff commented Dec 5, 2015

tried it out on all the ubuntu boxes I have access to, no repro. What's your gcc? gcc --version?

@osheroff
Copy link
Collaborator

osheroff commented Dec 5, 2015

oh, the output of strace would help too. strace ~/.onetimeserver/Linux-x86_64/wrapper -debug -mysql-version 5.5 -ppid 19879 -type mysql -- --log-bin=master --binlog_format=row --innodb_flush_log_at_trx_commit=1 --server_id=123123. There's gotta be something wrong with the way it communicates with the child process.

@anaerobic
Copy link
Author

The last section repeats until I Ctrl+C out to detach:

strace ~/.onetimeserver/Linux-x86_64/wrapper -debug -mysql-version 5.5 -ppid 19879 -type mysql -- --log-bin=master --binlog_format=row --innodb_flush_log_at_trx_commit=1 --server_id=123123
execve("~/.onetimeserver/Linux-x86_64/wrapper", ["!/.onetimeserver/Li"..., "-debug", "-mysql-version", "5.5", "-ppid", "19879", "-type", "mysql", "--", "--log-bin=master", "--binlog_format=row", "--innodb_flush_log_at_trx_commit"..., "--server_id=123123"], [/* 71 vars */]) = 0
brk(0)                                  = 0x1dc7000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f37b355e000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=140262, ...}) = 0
mmap(NULL, 140262, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f37b353b000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\v\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1869392, ...}) = 0
mmap(NULL, 3972864, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f37b2f73000
mprotect(0x7f37b3133000, 2097152, PROT_NONE) = 0
mmap(0x7f37b3333000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c0000) = 0x7f37b3333000
mmap(0x7f37b3339000, 16128, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f37b3339000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f37b353a000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f37b3539000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f37b3538000
arch_prctl(ARCH_SET_FS, 0x7f37b3539700) = 0
mprotect(0x7f37b3333000, 16384, PROT_READ) = 0
mprotect(0x601000, 4096, PROT_READ)     = 0
mprotect(0x7f37b3560000, 4096, PROT_READ) = 0
munmap(0x7f37b353b000, 140262)          = 0
getpid()                                = 5200
open("/tmp/onetimeserver.u0OCTP", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f37b35399d0) = 5208
fcntl(3, F_GETFL)                       = 0x8002 (flags O_RDWR|O_LARGEFILE)
brk(0)                                  = 0x1dc7000
brk(0x1de8000)                          = 0x1de8000
fstat(3, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f37b355d000
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "", 4096)                       = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, 0x7ffc11e7ca70)       = 0
wait4(5208, NULL, WNOHANG, NULL)        = 0
lseek(3, 0, SEEK_CUR)                   = 3523
read(3, "", 4096)                       = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, 0x7ffc11e7ca70)       = 0
wait4(5208, NULL, WNOHANG, NULL)        = 0
lseek(3, 0, SEEK_CUR)                   = 3594
read(3, "", 4096)                       = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, {0, 314131915})       = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = 0
wait4(5208, NULL, WNOHANG, NULL)        = 0
lseek(3, 0, SEEK_CUR)                   = 4297
read(3, "", 4096)                       = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, {0, 978452678})       = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
restart_syscall(<... resuming interrupted call ...>) = 0
wait4(5208, NULL, WNOHANG, NULL)        = 0
lseek(3, 0, SEEK_CUR)                   = 4297
read(3, "", 4096)                       = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, 0x7ffc11e7ca70)       = 0
wait4(5208, NULL, WNOHANG, NULL)        = 0
lseek(3, 0, SEEK_CUR)                   = 4297
read(3, "", 4096)                       = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, 0x7ffc11e7ca70)       = 0
wait4(5208, NULL, WNOHANG, NULL)        = 0
lseek(3, 0, SEEK_CUR)                   = 4297
read(3, "", 4096)                       = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, 0x7ffc11e7ca70)       = 0
wait4(5208, NULL, WNOHANG, NULL)        = 0
lseek(3, 0, SEEK_CUR)                   = 4297
read(3, "", 4096)                       = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, 0x7ffc11e7ca70)       = 0
wait4(5208, NULL, WNOHANG, NULL)        = 0
lseek(3, 0, SEEK_CUR)                   = 4297
read(3, "", 4096)                       = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, 0x7ffc11e7ca70)       = 0
wait4(5208, NULL, WNOHANG, NULL)        = 0
lseek(3, 0, SEEK_CUR)                   = 4297
read(3, "", 4096)                       = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, 0x7ffc11e7ca70)       = 0
wait4(5208, NULL, WNOHANG, NULL)        = 0
lseek(3, 0, SEEK_CUR)                   = 4297
read(3, "", 4096)                       = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, ^CProcess 5200 detached
 <detached ...>

@osheroff
Copy link
Collaborator

osheroff commented Dec 5, 2015

ok, what about if you do strace -ff ...

@osheroff
Copy link
Collaborator

osheroff commented Dec 5, 2015

ah, ok. I think I know what this is now.

read(3, "", 4096)                       = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, 0x7ffc11e7ca70)       = 0
wait4(5208, NULL, WNOHANG, NULL)        = 0
lseek(3, 0, SEEK_CUR)                   = 3523

the behavior of fseek(fd, SEEK_CUR,0 ) appears to be inconsistant -- in your case it's seeking to what the child has written, in my case it seems to be what we've read. patch forthcoming...

@anaerobic
Copy link
Author

Attached
wat.txt

@osheroff
Copy link
Collaborator

osheroff commented Dec 5, 2015

pushed what I think should fix this to github. can you do rm -Rf ~/.onetimeserver and retry?

Also, thanks so much for your time. Really appreciated.

@osheroff
Copy link
Collaborator

osheroff commented Dec 5, 2015

( I think it was osheroff/onetimeserver@670b8d6)

@anaerobic
Copy link
Author

likewise!

no dice on rm -Rf ~/.onetimeserver && mvn install, but running ~/.onetimeserver/Linux-x86_64/wrapper -debug -mysql-version 5.5 -ppid 19879 -type mysql -- --log-bin=master --binlog_format=row --innodb_flush_log_at_trx_commit=1 --server_id=123123 completes with:

{"extra_args":["--log-bin=master","--binlog_format=row","--innodb_flush_log_at_trx_commit=1","--server_id=123123"],"mysql_path":"~/.onetimeserver/tmp/mysql456723168","output":"/tmp/onetimeserver.1RzS1Y","parent_pid":19879,"port":30430,"server_pid":19513}

progress!

@osheroff
Copy link
Collaborator

osheroff commented Dec 5, 2015

bugger, that should have fixed it. try with the attached patch, that should turn on debug output for onetimeserver in the test suite.

patch.txt

@anaerobic
Copy link
Author

💥 we have liftoff. all tests are running, now. I believe I just didn't wait long enough for the process to come back last time.
thank you, sir! happy holidays

@osheroff
Copy link
Collaborator

osheroff commented Dec 6, 2015

Rad. Thanks again for the help tracking this down

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants