
Support pkg/errors Stacktracer interface #303

Closed
akshayjshah opened this issue Feb 16, 2017 · 4 comments · Fixed by #333

Comments

@akshayjshah
Contributor

Consider whether we should take on a third-party dependency to support pkg/errors. If we really think that this package is garnering widespread adoption, we should support its error type and include both stacktraces and causes in log output.

Thoughts, @prashantv and @jcorbin?

@jcorbin
Contributor

jcorbin commented Feb 16, 2017

I'm generally +1 on supporting anything that's "almost" canon; for me that heuristic looks like:

  • golang.org
  • not golang.org/x

In particular, I think the case of errors' Stacktracer is a good fit.

@prashantv
Collaborator

What kind of support were you thinking? Logging a separate field like "errorStackTrace" with the stack trace?

I'd prefer not to take a dependency on pkg/errors, especially since it doesn't yet seem to be the most popular third-party errors package:
https://godoc.org/?q=errors

It's unfortunate that we can't check whether there's a stacktrace we can log without depending on the package. There are discussions around changing this (pkg/errors#79). I think we should wait until a 1.0 is tagged before taking on the dependency.

There's one "trick" to getting stack traces in logs if we really want: instead of calling err.Error(), we could do fmt.Sprintf("%+v", err), which will include the stack trace. I don't think we want to do that by default, but there could be an ErrorVerbose or some other field for it.

@akshayjshah
Contributor Author

%+v might be a nice middle ground. Unfortunately, it won't work in zap.Any (or in the SugaredLogger) unless we take a dependency on pkg/errors, since we'll need to make an interface cast. Might be best to first open a PR to fix that issue, see if it's accepted, and hope that we'll be able to support them without a source dependency.

@akshayjshah
Contributor Author

Actually, it looks like we can just cast into the fmt.Formatter interface - this has the added virtue of working with a few other error-wrapping packages. Yay!

MichaelSnowden added a commit to temporalio/temporal that referenced this issue Nov 6, 2023
<!-- Describe what has changed in this PR -->
**What changed?**
I upgraded our zap version from v1.24.0 to v1.26.0, which contains
support for pkg/errors. See [this
issue](uber-go/zap#303) and [this
commit](uber-go/zap@5fc2db7).
<img width="448" alt="image"
src="https://github.com/temporalio/temporal/assets/5942963/7d15d91a-27d3-45f6-9628-30bdcac38771">


<!-- Tell your future self why have you made these changes -->
**Why?**
Before this change, our logs would only contain the stack trace from
where the logger itself was invoked, not from the source of where the
error was generated or wrapped. This provided very little useful
information.

<!-- How have you verified this change? Tested locally? Added a unit
test? Checked in staging env? -->
**How did you test it?**
I ran a custom [build of
server](https://gist.github.com/MichaelSnowden/c649dfd1efeb92f10bc72a040a792a8d)
which overwrote some deep code in history to return an error. I then set
up docker-compose to output logs -> promtail -> loki -> grafana. Then, I
queried Grafana to verify that the error log contained an "errorVerbose"
field with the stack trace from where my error was generated. As you can
see from the image below, the stack trace does appear under this field,
and with JSON parsing and newline escaping turned on it renders
correctly and can be copy-pasted.

<img width="983" alt="image"
src="https://github.com/temporalio/temporal/assets/5942963/cb9b0c83-b146-4060-9eac-3ccf9b807657">
<img width="683" alt="image"
src="https://github.com/temporalio/temporal/assets/5942963/6fa28f2f-5d04-47ae-b8a3-7512c3dd85e7">

The stack trace from Grafana:

```
oopsie woopsie
main.(*faultyShardEngine).StartWorkflowExecution
	/Users/mikey/src/temporalio/temporal/.scratches/main.go:39
go.temporal.io/server/service/history.(*Handler).StartWorkflowExecution
	/Users/mikey/src/temporalio/temporal/service/history/handler.go:595
go.temporal.io/server/api/historyservice/v1._HistoryService_StartWorkflowExecution_Handler.func1
	/Users/mikey/src/temporalio/temporal/api/historyservice/v1/service.pb.go:1300
...
```

<!-- Assuming the worst case, what can be broken when deploying this
change to production? -->
**Potential risks**
The stack traces are pretty deep because of all our gRPC interceptors.
However, we can definitely fix that later if we want by filtering the
`pkg/errors.StackTrace`. I'd rather land this initial change first and
do the filtering in a follow-up, though.

<!-- Is this PR a hotfix candidate or require that a notification be
sent to the broader community? (Yes/No) -->
**Is hotfix candidate?**
No.