Context deadline exceeded when using the context passed to the activity #1424

mrkaspa · 2024-03-19T21:56:41Z

Expected Behavior

I should not receive the error "context deadline exceeded" when doing DB operations with the context passed in as the Activity parameter.

Actual Behavior

I have code like this in my app

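// CleanupParams is a hypothetical parameter type; the original snippet
// referenced param.JobID without showing where param was declared.
type CleanupParams struct {
	JobID string
}

func CleanupActivity(ctx context.Context, param CleanupParams) error {
	// this is a TemporalLogger (pkg/logger/temporal.go)
	log := activity.GetLogger(ctx)
	log.Info("running cleanup", "jobID", param.JobID)

	return cleanup.Cleanup(ctx, param.JobID)
}

func Cleanup(ctx context.Context, jobID string) error {
	db := database.GetDB()

	// Fetch the job from the database.
	j := &job.Job{ID: jobID}
	err := db.NewSelect().Model(j).WherePK().Scan(ctx)
	if err != nil {
		return wferrors.NewTaskError(errors.WithStack(err), wferrors.ErrCodeDatabase, "failed to fetch job")
	}

	// ... rest of the function omitted in the original report ...
}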

Inside the cleanup.Cleanup function I do database operations with the Bun library, which uses the context that is passed in. When it tries to run a query I get this error:

DatabaseError, failed to fetch job, context deadline exceeded

So the database query is failing with "context deadline exceeded".

Steps to Reproduce the Problem

Use the context passed in the Activity as argument for database access
Deploy on production

Specifications

Version: go.temporal.io/sdk v1.25.1
Platform: Linux


Quinn-With-Two-Ns · 2024-03-19T22:05:38Z

The context passed into an activity has documentation describing when users should expect it to be cancelled:

sdk-go/activity/doc.go

Line 77 in 3da09e0

# Context Cancellation

I would suspect in your case the activity is timing out before the database operation is complete.

mrkaspa · 2024-03-20T16:47:07Z

Yes, I thought of that, but my deadline is 1 hour per activity. Once it starts failing, for example on job 1, the subsequent jobs keep failing for the same reason. How can a new job fail for this reason if each activity has a deadline of 1 hour?

btw, this is how I have the workflow settings

activityoptions := workflow.ActivityOptions{
	// Set Activity Timeout duration
	// ScheduleToCloseTimeout: 5 * time.Second,
	StartToCloseTimeout: 60 * time.Minute,
	// ScheduleToStartTimeout: 10 * time.Second,
}
ctx = workflow.WithActivityOptions(ctx, activityoptions)
ctx = workflow.WithRetryPolicy(ctx, temporal.RetryPolicy{
	MaximumAttempts: 10,
})

Quinn-With-Two-Ns · 2024-03-20T16:58:15Z

The effective timeout can be shorter depending on which other activity and workflow options are set. It is also possible the error is coming from some internal deadline set in your database library and not from the activity context. You can check the deadline of the context using ctx.Deadline() to see when the context would expire.

https://pkg.go.dev/context#Context

mrkaspa · 2024-03-20T17:18:19Z

The problem is that this does not happen every time. In production we deployed the solution and everything works fine for some executions; then, after some days, one workflow starts to fail and the following ones always fail for the same reason.

Quinn-With-Two-Ns · 2024-03-20T17:45:55Z

On one of these occurrences, can you share the actual activity schedule event?

mrkaspa · 2024-04-09T19:09:57Z

@Quinn-With-Two-Ns where can I see that?

mrkaspa · 2024-04-09T19:10:58Z

Right now this is failing again:

error
activity error (type: PreprocessingActivity, scheduledEventID: 5, startedEventID: 6, identity: ): activity StartToClose timeout (type: StartToClose): activity StartToClose timeout (type: StartToClose)

Error
last connection error: connection error: desc = "error reading server preface: read tcp 172.17.0.12:39422->52.26.119.98:7233: use of closed network connection"

PanicError
runtime error: index out of range [4096] with length 4096

We are seeing these errors. I wonder if the connection to the Temporal servers is somehow being lost.

Quinn-With-Two-Ns · 2024-04-09T19:18:34Z

The last errors look like an issue with your application and not the SDK, but the error message alone is not enough for me to provide any insight, and I cannot tell what is wrong with your application.

To help debug any further, what I would need is a standalone reproduction of the issue showing the SDK canceling the context outside of the documented cases where users should expect it to be cancelled:

sdk-go/activity/doc.go

Line 77 in 3da09e0

# Context Cancellation

mrkaspa · 2024-04-09T19:24:58Z

We are using Nomad to deploy our containers, and when we restart them the issue is solved. This issue tends to happen every week: all the workflows start to throw timeouts. I think the reason is that they lose the connection to the Temporal servers, so the Temporal server cannot execute the activities and they time out.

mrkaspa added the potential-bug label Mar 19, 2024
