-
Notifications
You must be signed in to change notification settings - Fork 254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose error codes #61
Comments
#25 Related, TCP layer doesn't have errors because out of 19 different return values, only one or two may actually happen. Those one or two different actual errors are essentially ECONNRESET; something you can catch by checking if you get a proper on_end callback before on_close. So even if you would use the Linux syscalls directly, there never were a way to get "the exact reason". And there is no need to. Sure, as bug 25 reports, we have issues with connect errors and yes they need to be improved. But even here there are only a handful of actual errors. TLS doesn't have errors because TCP doesn't. You kind of have the same reasoning here: if you do not get on_end but only get on_close then_something_ went wrong either protocol error or reset. Now, the Linux kernel can see protocol errors for TCP just like openssl can see protocol errors for TLS. Linux will just report all protocol errors as ECONNRESET or EPOLLHUP or any such generic failure. You'll never get super detailed errors such as "TCP handskar failed". Therefore this lib makes it I've of the design goals not to emit pointless errors nobody understands anyways. You either get on_end for a graceful non-error shutdown or you get only on_close when something went wrong. Use strace to log and debug. |
Under key aspects on readme:
|
You can do what you want in the plug-in, if you want to log things do it. You could define a function like register_error_handler which takes a c function then just declare it in your program and use it. Nothing is stopping you from doing that, but the public API should not have detailed errors. Maybe we can add an integer to on_close which may or may not hold extra info. Kind of like a reserved thing for use by those implementations who care |
You have a point about error codes being pretty much irrelevant when an open socket terminates. But when connecting, there are several distinct error conditions we [and probably many apps / higher level libraries] care about:
I'm not saying we have different logic for each one of these, but we use these as heuristics for whether to retry immediately, or treat the client as "offline" and wait for a network interface to go up, or give up and report an error to the user. (Some of this complexity comes from running on mobile devices, which find themselves in changing and unstable network conditions.)
That sort of thing works when developers are testing in-house, but most troubleshooting is forensic, based on logs from customer devices (iOS and Android mostly.) Also, to be honest, a lot of our enterprise customers employ developers who are not super knowledgeable at anything below the app-framework layers of the OS, so it's for the best if we can tell them to just turn on our logging, instead of having to explain what system-tracing is and how to run it. (You may roll your eyes, but our support engineers are very emphatic about this, especially after a few beers.) |
Connecting is in a pretty bad shape in this library - I've mainly focused on server side. I agree connecting is a special case where you might want at least a few possible errors. Problem is the API need to be symmetric and I do not want to introduce on_error as that makes less sense for server side. What I've thought about is passing error to on_open. Why on_open? Why not on_close? Because if you pass the error to on_close you cannot know if you were closed from open state or closed immediately, so you cannot know if you have run a proper on_open before. So it could make sense to have on_open take an integer error or even uintptr_t or something like that. Then you basically have to check if it was opened without error or if you opened with an error. Maybe it makes sense to rename on_open to on_try_open? |
Can't you [the client code] set a flag in on_open and then test it in on_close? Same advice you gave above about detecting disconnects (just with on_open instead of on_end.) |
I've experimentally added the |
Both client and server get the IP and IP length. Is_client is kind of a helper, you could have two different contexts if you want to separate them |
Hmm maybe not |
Also closing due to inactivity |
@snej I know you're not interested anymore but this is how I want to solve it:
Timeout errors are up to you to track with us_socket_timeout calls |
Sorry I vanished — I decided that for our architecture we don't need to do multiplexed or async socket I/O, since this is a client that usually only opens one socket and will never open more than a handful. So I've gone with a simpler approach that just spawns a read thread and a write thread, instead of using more complex 3rd party libraries with async I/O. |
The API now has int code and void *reason to its on_close and close functions. This allows passing TCP/TLS error codes 👍 |
The API does not expose the details of an error, e.g. error codes.
Many use cases need to know the exact reason for a connection failure, for recovery and logging and to inform the user. (The library I work on uses these codes to decide whether to retry a connection. And my employer’s support team is often found poring over error logs submitted by developers using our library, to help them figure out why the client won’t connect to the 🤬 server.)
Error codes are problematic because there are multiple namespaces (errno, gethostbyname status, OpenSSL, mbedTLS, Apple OSStatus, etc.) What I tend to do is define an enumeration covering all the namespaces, then make my error type a {namespace, code} tuple. But defining your own namespace with an enum of your own error codes works too...
(This is another task I’m willing & likely to work on, if I end up choosing uWebSockets for this work project.)
The text was updated successfully, but these errors were encountered: