snooze [tcp] probe to user container failed: dial tcp 127.0.0.1:8080: connect: connection refused log #2389

Closed
@nellaG


Hello, I'm currently using Cortex 0.39.1.

I have a BatchAPI with the configuration shown below.
When I check the logs with AWS CloudWatch Logs Insights, there are so many "[tcp] probe to user container failed: dial tcp 127.0.0.1:8080: connect: connection refused" entries that I cannot easily check my API job status.

Is there a recommended way to silence that log message using the readiness_probe or liveness_probe config?
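
Until the probe settings are sorted out, one stopgap is to drop the noisy entries after the logs have been exported. The following is only a minimal sketch, assuming the log messages are available as plain strings; `drop_probe_noise` and the sample lines are hypothetical and not part of Cortex or AWS.

```python
import re

# Matches the probe-failure message quoted in this issue.
PROBE_NOISE = re.compile(r"\[tcp\] probe to user container failed")


def drop_probe_noise(lines):
    """Return only the log lines that do not contain the probe-failure message."""
    return [line for line in lines if not PROBE_NOISE.search(line)]


sample = [
    "[tcp] probe to user container failed: dial tcp 127.0.0.1:8080: connect: connection refused",
    "job started",  # hypothetical application log line
]
print(drop_probe_noise(sample))
```

Inside Logs Insights itself, adding a clause such as `filter @message not like /probe to user container failed/` to the query should have a similar effect, though this only hides the entries and does not stop them from being written.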

My API has only two endpoints (/ and /healthz), and here is my API configuration YAML.

batch api configuration
name: ***
kind: BatchAPI
pod:
  port: 8080
  containers:
    - name: ***
      image: ***
      env:
       [***]
      command: [./run_app.sh]
      readiness_probe:
        http_get:
          path: /healthz
          port: 8080
        initial_delay_seconds: 180
        timeout_seconds: 1
        period_seconds: 10
        success_threshold: 1
        failure_threshold: 3
      liveness_probe:
        http_get:
          path: /
          port: 8080
        initial_delay_seconds: 0
        timeout_seconds: 1
        period_seconds: 10
        success_threshold: 1
        failure_threshold: 3
      compute:
        cpu: 200m
        gpu: 1
        mem: 2G
        shm: 1Gi
networking:
  endpoint: /tracker-dev
node_groups: [gpu-spot, gpu-on-demand]

I'm always thankful for your support and for Cortex. 😃
