worker error when liveness_probe is set #2390

Closed
@nellaG

Description

Hello, I'm currently using Cortex 0.39.1.

I have a BatchAPI with the configuration below.

I have already set readiness_probe on my batch API and it works fine. But when I also set liveness_probe, the API's job finishes with a worker error status.

Is there a recommended practice for setting liveness_probe?

My API has only two endpoints (/ and /healthz), and here is my API configuration YAML.

batch api configuration
name: ***
kind: BatchAPI
pod:
  port: 8080
  containers:
    - name: ***
      image: ***
      env:
       [***]
      command: [./run_app.sh]
      readiness_probe:
        http_get:
          path: /healthz
          port: 8080
        initial_delay_seconds: 180
        timeout_seconds: 1
        period_seconds: 10
        success_threshold: 1
        failure_threshold: 3
      liveness_probe:
        http_get:
          path: /
          port: 8080
        initial_delay_seconds: 0
        timeout_seconds: 1
        period_seconds: 10
        success_threshold: 1
        failure_threshold: 3
      compute:
        cpu: 200m
        gpu: 1
        mem: 2G
        shm: 1Gi
networking:
  endpoint: /tracker-dev
node_groups: [gpu-spot, gpu-on-demand]
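
For reference, the two probes in this configuration differ in both path and initial delay. Assuming Cortex passes these fields through to Kubernetes with standard probe semantics (an assumption, not confirmed in this issue), a liveness probe with initial_delay_seconds: 0, period_seconds: 10 and failure_threshold: 3 would mark the container as failed after three consecutive failed checks, roughly 20 to 30 seconds after start. That is well before the 180 seconds the readiness probe waits. If the app is not yet serving / by then, the container would be killed during startup, which might be what surfaces as the worker error. A possible variant to try, with values chosen only for illustration:

```yaml
# Sketch only: the values below are assumptions, not a confirmed fix.
liveness_probe:
  http_get:
    path: /healthz              # assumption: reuse the endpoint the readiness probe already checks
    port: 8080
  initial_delay_seconds: 180    # assumption: match the readiness delay so checks begin after startup
  timeout_seconds: 1
  period_seconds: 10
  success_threshold: 1
  failure_threshold: 3
```

Whether 180 seconds is the right delay depends on how long run_app.sh actually takes before the server listens on port 8080.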

I'm always thankful for your support and cortex. 😃
