Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: PostgreSQL container fails part of the times #1619

Closed
doron-cohen opened this issue Sep 14, 2023 · 11 comments
Closed

[Bug]: PostgreSQL container fails part of the times #1619

doron-cohen opened this issue Sep 14, 2023 · 11 comments
Labels
bug An issue with the library

Comments

@doron-cohen
Copy link

Testcontainers version

0.22.0

Using the latest Testcontainers version?

No

Host OS

Mac

Host arch

ARM

Go version

1.20

Docker version

Client:
 Cloud integration: v1.0.29
 Version:           20.10.17
 API version:       1.41
 Go version:        go1.17.11
 Git commit:        100c701
 Built:             Mon Jun  6 23:04:45 2022
 OS/Arch:           darwin/arm64
 Context:           default
 Experimental:      true

Server: Docker Desktop 4.12.0 (85629)
 Engine:
  Version:          20.10.17
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.11
  Git commit:       a89b842
  Built:            Mon Jun  6 23:01:01 2022
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.8
  GitCommit:        9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Docker info

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.9.1)
  compose: Docker Compose (Docker Inc., v2.10.2)
  extension: Manages Docker extensions (Docker Inc., v0.2.9)
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc., 0.6.0)
  scan: Docker Scan (Docker Inc., v0.19.0)

Server:
 Containers: 40
  Running: 3
  Paused: 0
  Stopped: 37
 Images: 34
 Server Version: 20.10.17
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.10.124-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 5
 Total Memory: 7.653GiB
 Name: docker-desktop
 ID: RKGO:4J6V:52OX:3ZQI:TARP:KB5E:TGP4:ORLD:DT72:FTO5:TVDK:G4HS
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5000
  127.0.0.0/8
 Live Restore Enabled: false

What happened?

Some of the times the PostgreSQL container fails with the following logs / message. This is how I use the module:

	c, err := postgres.RunContainer(
		ctx,
		postgres.WithDatabase(defaultPostgresDatabase),
		postgres.WithUsername(defaultPostgresUsername),
		postgres.WithPassword(defaultPostgresPassword),
		testcontainers.WithImage("postgres:13.8"),
		testcontainers.WithWaitStrategyAndDeadline(
			time.Second*20,
			wait.ForSQL(
				nat.Port(defaultPostgresPort),
				dialect.Postgres,
				getPostgresURL,
			).WithStartupTimeout(time.Second),
		),
	)

Relevant log output

2023/09/14 11:22:47 🚧 Waiting for container id 1d70df344982 image: postgres:13.8. Waiting for: &{timeout:<nil> deadline:0x14000bc5e50 Strategies:[0x140006e0be0]}
2023/09/14 11:22:48 container logs:
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
syncing data to disk ... ok


Success. You can now start the database server using:

    pg_ctl -D /var/lib/postgresql/data -l logfile start

waiting for server to start....2023-09-14 09:22:48.208 UTC [48] LOG:  starting PostgreSQL 13.8 (Debian 13.8-1.pgdg110+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-09-14 09:22:48.208 UTC [48] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-09-14 09:22:48.210 UTC [49] LOG:  database system was shut down at 2023-09-14 09:22:48 UTC
2023-09-14 09:22:48.215 UTC [48] LOG:  database system is ready to accept connections
 done
server started

/usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*

waiting for server to shut down...2023-09-14 09:22:48.334 UTC [48] LOG:  received fast shutdown request
.2023-09-14 09:22:48.334 UTC [48] LOG:  aborting any active transactions
2023-09-14 09:22:48.335 UTC [48] LOG:  background worker "logical replication launcher" (PID 55) exited with exit code 1
2023-09-14 09:22:48.336 UTC [50] LOG:  shutting down
2023-09-14 09:22:48.341 UTC [48] LOG:  database system is shut down
 done
server stopped

PostgreSQL init process complete; ready for start up.

2023-09-14 09:22:48.447 UTC [1] LOG:  starting PostgreSQL 13.8 (Debian 13.8-1.pgdg110+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-09-14 09:22:48.447 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-09-14 09:22:48.447 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2023-09-14 09:22:48.447 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-09-14 09:22:48.448 UTC [60] LOG:  database system was shut down at 2023-09-14 09:22:48 UTC
2023-09-14 09:22:48.452 UTC [1] LOG:  database system is ready to accept connections

panic: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.41/containers/1d70df34498270a3ddc437bea4e7d7702907bb1f17c70531985225c6c334c692/json": context deadline exceeded: failed to start container

Additional information

No response

@doron-cohen doron-cohen added the bug An issue with the library label Sep 14, 2023
@mdelapenya
Copy link
Collaborator

Hey @doron-cohen I noticed you are overriding the default wait strategy that comes with the postgres module. Is there a reason for that? The default one is needed because postgres needs two occurrences of that log entry.

I say override because of

req.WaitingFor = wait.ForAll(strategies...).WithDeadline(deadline)
.

This probably is a bug though 🤔

@doron-cohen
Copy link
Author

Hey @mdelapenya I didn't see any existing wait strategies. I am looking here: https://github.com/testcontainers/testcontainers-go/blob/v0.22.0/modules/postgres/postgres.go#L113. Am I looking at the wrong place?

I also looked at the latest 0.23 version and didn't find any.

@doron-cohen
Copy link
Author

BTW, I added the SQL wait strategy since I found the PG container not to be completely ready after RunContainer returned successfully. This was the best approach to make sure I can work with it right away.

@mdelapenya
Copy link
Collaborator

Hey @mdelapenya I didn't see any existing wait strategies. I am looking here: https://github.com/testcontainers/testcontainers-go/blob/v0.22.0/modules/postgres/postgres.go#L113. Am I looking at the wrong place?

I also looked at the latest 0.23 version and didn't find any.

You are right 🤔 I thought it was in the default settings for the module (I was typing from phone at that moment, I'm now back at my desktop), and it's defined in the tests: https://github.com/testcontainers/testcontainers-go/blob/v0.23.0/modules/postgres/postgres_test.go#L35

I think we should probably do of the following things:
a. update the docs for postgres, saying that it usually needs that wait strategy when there is no data in it (see https://github.com/testcontainers/testcontainers-java/pull/327/files#diff-57a8aa7d73d96a310849a91d3d5fa311288bc8b487f13672fc5e8bba3743983aR26)
b. update the defaults

@doron-cohen
Copy link
Author

Thanks. This will clear up the confusion around waiting for it to be ready. Now the only thing remaining is failure to start from time to time. Do you need any logs or information to try and debug it?

@mdelapenya
Copy link
Collaborator

mdelapenya commented Sep 14, 2023

Do you need any logs or information to try and debug it?

Yes please, the library should fail if any of the preStart hooks fails, and the wait strategies happens there. And the container logs should be outputted to the terminal, so you should see if there is anything in there

@doron-cohen
Copy link
Author

I added most of it in the issue. The container logs and all. I couldn't find anything unusual but let me try and compare to the logs on a successful startup. Anyway, this line is suspicious:

panic: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.41/containers/1d70df34498270a3ddc437bea4e7d7702907bb1f17c70531985225c6c334c692/json": context deadline exceeded: failed to start container

I raise the panic but I don't know what causes the context deadline exceeded while talking to the Docker daemon.

@mdelapenya
Copy link
Collaborator

By the Docker endpoint, it seems that inspecting the container caused the deadline exceeded error.

Could it be that the ctx context has a timeout or something that causes it to be cancelled before expected?

@eddumelendez
Copy link
Member

I think this is related to postgres container behavior which

  1. Start the postgres service
  2. Setup... (client can connect but see next step)
  3. Shutdown the postgres service
  4. Start the postgres service
  5. Make the connection reliable

Testcontainers for Java uses the following log strategy to happen so mark the container as a ready to receive connections.

Another issue could be related to the log strategy provided by Testcontainers for Java, which would fail when postgres image (custom images) contains data already because in that case the number of occurrences for the regex is just 1. See logs attached in the issue description. Just for awareness, probably not a priority to fix it right away.

@mdelapenya
Copy link
Collaborator

@doron-cohen I think #2208 could be related, as it improves the examples with an wait.ForSQL condition.

In any case, the example provided here https://golang.testcontainers.org/modules/postgres/ is showing how to configure Postgres in a generic use case, so I'd say that we can close this issue. Please let me know if it's fine to close it.

Thanks!

@mdelapenya
Copy link
Collaborator

Closing as the example is illustrative enough. Please feel free to reopen it if you consider it.

Thanks in advance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An issue with the library
Projects
None yet
Development

No branches or pull requests

3 participants