Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exit code of restart_process is wrong #92

Closed
jani123 opened this issue Oct 13, 2020 · 5 comments
Closed

Exit code of restart_process is wrong #92

jani123 opened this issue Oct 13, 2020 · 5 comments

Comments

@jani123
Copy link

jani123 commented Oct 13, 2020

Introduction

Restart process wrapper returns exit code zero even if software exits with any other. That affects on Tilt UI side so that service seems ok(green) even though it is not. Problem lies on used entr cli command, that seem to be returning its own exit status, instead of the status code from software it runs.

Example

Lets have a script that fails on star

$ cat fail.sh 
#!/usr/bin/env bash

exit 1

And then we run restart process wrapper

$ go run tilt-restart-wrapper.go -entr_path=/usr/bin/entr -watch_file=test.go -- ./fail.sh; echo $?
hi there
0

The last line 0 is exit status of restart process wrapper, and in this case it should be 1 instead of 0.

This can be tracked to be fault(or feature depending on the view) of entr cli tool:

$ echo "./test.go" | entr -rzns ./fail.sh; echo $?
0

Here we have same failing script but the entr is only returns zeroes.

How to almost fix this

One can use Kubernetes liveness properties in case there is multiple component starting up and they depend on eachother, but the most annoying part of this is that Tilt user interface lies about everything being ok even though it is not. We have 15 to 20 components in our software that is started up/developed and if one of them fails, you don't get any indication about it.

Likely fix

My bet would be to get rid of the entr tool, since this is likely feature of the tool. It shouldn't be that hard to make golang code to run application and restart it when some files changes. What do you think? Unfortunately this is way too big of the job for now for me to do pull request since I have schedule of my own.

@nicks
Copy link
Member

nicks commented Oct 13, 2020

cc @maiamcc do you remember the history behind this? i thought we had baked in a special version of entr that returned the right exit code, but i'm probably misremembering

@maiamcc
Copy link
Contributor

maiamcc commented Oct 13, 2020

Oh ugh :-/ good catch @jani123. We don't have a special version of entr for the exit code-- @nicks you might be remembering that we baked in a version of entr with the -z flag (which wasn't yet in a release at the time the extension was published).

I (foolishly) didn't verify the exit code behavior because for a Pod, any process exit is enough to alert Tilt that something weird is going on. Are you running into this bad behavior on Jobs/other shortlived k8s objects? Or on pods? Can you tell me a little more about the behavior you're seeing?

@jani123
Copy link
Author

jani123 commented Oct 14, 2020

I (foolishly) didn't verify the exit code behavior because for a Pod, any process exit is enough to alert Tilt that something weird is going on. Are you running into this bad behavior on Jobs/other shortlived k8s objects? Or on pods? Can you tell me a little more about the behavior you're seeing?

Yes, batch jobs are the things that is the problem atm. So, the most easiest example use case would be that we have N8N put up with the Tilt. We are developing new nodes for our products to N8N, so of'course live sync would be nice to have when you make changes to nodes.

Helm chart is pretty much the one you get from helm create except few environment variables and database initialization. Database initialization batch job is the problem. If we have a bug like sql syntax error in it(for an example I made exit 1), it will show green like this:
image.
And last lines from logs:

+ /sources/conf/scripts/create-db.sh
+ exit 1
[K8s EVENT: Pod n8n-8zn4l (ns: n8n-dev)] Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "n8n-8zn4l": Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:438: writing syncT 'resume' caused \\\"write init-p: broken pipe\\\"\"": unknown

Here is the tilt file we use:

load('Tiltfile.base', 'helm_remote')

# Note that .helm is in the .tiltignore!
helm_remote('postgresql', repo_url='https://charts.bitnami.com/bitnami',namespace='n8n-dev', values=['./charts/helm/postgres/backend-postgresql.yaml'])
watch_file('./charts/helm/postgres/backend-postgresql.yaml')

local('export DOCKER_BUILDKIT=1')
load('ext://restart_process', 'docker_build_with_restart')

docker_build_with_restart(
	'registry.<censored>/<censored>/n8n',
	'./',
	ssh='default',
	target='install',
	live_update=[
		sync('./n8n-nodes', '/sources'),
		run('cd /sources && n8n-node-dev build --destination /root/.n8n/custom')
	],
	entrypoint='/sources/dev-entrypoint.sh'
)

# Deploy n8n
def deploy_service(name):
	k8s_yaml(helm(
		'./charts/helm/'+name,
		name=name,
		namespace='n8n-dev',
	))
	watch_file('./charts/'+name)

k8s_resource("n8n:job", resource_deps=['postgresql-postgresql-n8n'])
k8s_resource("n8n:deployment", resource_deps=['n8n:job'])
deploy_service('n8n')

Here is the batch job yaml for helm chart

apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-delete-policy": hook-succeeded
    "helm.sh/hook-weight": "-1"
  labels:
    job-name: {{ include "n8n.fullname" . }}
  name: {{ include "n8n.fullname" . }}
  labels:
    {{- include "n8n.labels" . | nindent 4 }}
spec:
  backoffLimit: 6
  completions: 1
  parallelism: 1
  template:
    metadata:
      labels:
        job-name: {{ include "n8n.fullname" . }}
    spec:
    {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
    {{- end }}
      serviceAccountName: {{ include "n8n.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      containers:
      - command:
          - /bin/sh
          - -c
          - /sources/entrypoint.sh
        imagePullPolicy: "{{ .Values.image.pullPolicy }}"
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        name: "{{ .Chart.Name }}-creation-job"
        readinessProbe:
          exec:
            command: ["psql", "${DATABASE_URL}", "-c", "SELECT 1"]
            failureThreshold: 3
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
        envFrom:
        - configMapRef:
            name: "{{ include "n8n.fullname" . }}-cm-envs"
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: "{{ include "n8n.fullname" . }}-db-init"
              key: DATABASE_URL
        resources:
          {{- toYaml .Values.resources | nindent 12 }}
      dnsPolicy: ClusterFirst
      restartPolicy: OnFailure
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 60
status: {}

Job description of the work

$ k describe job/n8n
Name:           n8n
Namespace:      n8n-dev
Selector:       controller-uid=930cac86-1c68-4b57-ae31-2dc7450a77e7
Labels:         app.kubernetes.io/instance=n8n
                app.kubernetes.io/managed-by=tilt
                app.kubernetes.io/name=n8n
                app.kubernetes.io/version=1.16.0
                helm.sh/chart=n8n-0.1.0
Annotations:    helm.sh/hook: pre-install
                helm.sh/hook-delete-policy: hook-succeeded
                helm.sh/hook-weight: -1
Parallelism:    1
Completions:    1
Start Time:     Wed, 14 Oct 2020 08:36:44 +0300
Completed At:   Wed, 14 Oct 2020 08:36:46 +0300
Duration:       2s
Pods Statuses:  0 Running / 1 Succeeded / 0 Failed
Pod Template:
  Labels:           app.kubernetes.io/managed-by=tilt
                    controller-uid=930cac86-1c68-4b57-ae31-2dc7450a77e7
                    job-name=n8n
                    tilt.dev/pod-template-hash=5dca3c785474d123afae
  Service Account:  n8n
  Containers:
   n8n-creation-job:
    Image:      registry.<censored>/<censored>/n8n:tilt-ba3ad5d468b1badd
    Port:       <none>
    Host Port:  <none>
    Command:
      /tilt-restart-wrapper
      --watch_file=/.restart-proc
      sh
      -c
      /sources/dev-entrypoint.sh
    Readiness:  exec [psql ${DATABASE_URL} -c SELECT 1] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      n8n-cm-envs  ConfigMap  Optional: false
    Environment:
      DATABASE_URL:  <set to the key 'DATABASE_URL' in secret 'n8n-db-init'>  Optional: false
    Mounts:          <none>
  Volumes:           <none>
Events:
  Type    Reason            Age    From            Message
  ----    ------            ----   ----            -------
  Normal  SuccessfulCreate  9m12s  job-controller  Created pod: n8n-8zn4l

dev-entrypoint.sh is just a script to execute n8n or database create(docker build restart needs entrypoint so we need some hackery to do multiple things):

#!/bin/sh

set -ex

if [ -z "${DATABASE_URL}" ]; then
        n8n
else
        /sources/conf/scripts/create-db.sh

And the create-db.sh is as follows:

#!/bin/sh

set -ex

exit 1
psql "${DATABASE_URL}" -tc "SELECT 1 FROM pg_database WHERE datname = '${DB_POSTGRESDB_DATABASE}'" | grep -q 1 || psql "${DATABASE_URL}" -c "CREATE DATABASE ${DB_POSTGRESDB_DATABASE}"

psql "${DATABASE_URL}" -tc "SELECT 1 FROM pg_roles WHERE rolname = '${DB_POSTGRESDB_USER}'" | grep -q 1 || psql "${DATABASE_URL}" -tc "CREATE USER ${DB_POSTGRESDB_USER} WITH PASSWORD '$DB_POSTGRESDB_PASSWORD'"

I hope this is enough, but let me know if I'm missing something! :)

@maiamcc
Copy link
Contributor

maiamcc commented Oct 16, 2020

@jani123 thanks for raising this--should be fixed. If you rm -rf tilt_modules/restart_process then on your next Tilt run, Tilt will pull down the latest version of the extension, which contains the fix. Let us know how it works for you!

@maiamcc maiamcc closed this as completed Oct 16, 2020
@jani123
Copy link
Author

jani123 commented Oct 19, 2020

Yes, this works great! Thanks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants