Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report more info on "bad response from server" #4050

Open
gnprice opened this issue Apr 18, 2020 · 6 comments
Open

Report more info on "bad response from server" #4050

gnprice opened this issue Apr 18, 2020 · 6 comments

Comments

@gnprice
Copy link
Member

gnprice commented Apr 18, 2020

We have the following lines in our API code:

  // Server has responded, but the response is not a valid error-object.
  // (This should never happen, even on old versions of the Zulip server.)
  logging.warn(`Bad response from server: ${JSON.stringify(data) ?? 'undefined'}`);
  return new Error('Server responded with invalid message');

Turns out we get quite a lot of these reports, from a lot of distinct users, where data is undefined. So we should debug.

This will happen if the body of the server's response is not valid JSON, because of this line:

    const json = await response.json().catch(() => undefined);

Some of these are cases of #3567. So for those, we know the cause and we should just fix #3567.

Others remain puzzling and call for debugging:

  • Some are from 400 responses on posting to /api/v1/user_uploads.
  • Some are from 200 responses on /api/v1/server_settings?.
  • Some are from 404 response on posting to /api/v1/users/me/status.
  • Some are from 401 responses on posting to /api/v1/users/me/apns_device_token.

It's possible there are server bugs where we're for some reason not returning JSON in certain error cases.

It's also possible something else is going on.

A good debugging step would be to edit that logging.warn call so that it includes the text of the response. This might require also some editing of the await response.json().catch(... line.

P1 because of the number of users affected.

@chrisbobbe
Copy link
Contributor

Some of these are cases of #3567. So for those, we know the cause and we should just fix #3567.

This is open as #3819.

@gnprice
Copy link
Member Author

gnprice commented Apr 18, 2020

This is open as #3819.

Indeed, thanks! Should have looked closer. Just bumped that PR.

@chrisbobbe
Copy link
Contributor

If it's fresh on your mind and you have a moment, could you take a quick look at this question I had, over at #4033 (comment):

There's been another report of something that looks quite similar, also on Android, here. That report also said it was happening to multiple other people.

@gnprice, @ray-kraesig, do you know a way to filter our Sentry logs to see errors whose first occurrence is recent, and that only happen on Android?

@gnprice
Copy link
Member Author

gnprice commented Apr 18, 2020

do you know a way to filter our Sentry logs to see errors whose first occurrence is recent, and that only happen on Android?

I think firstSeen: and os.name: in the Sentry search box will get you those. (There's some autocomplete to help with the syntax.) If more details would help, that'd be a good question for a chat thread.

(And then maybe we should serialize the result into a bit of docs.)

@rk-for-zulip
Copy link
Contributor

It's also possible something else is going on.

I suspect some of these 4xx errors may be cases where the user is attempting to connect to a server that is not a Zulip realm at all (but, perhaps, used to be). 404s and 401s would certainly be explained by that, and I could see a 400 being a response to a file-upload attempt, too.

The 200 on /api/v1/server_settings is a bit more surprising, but might be due to the user's service provider doing that horrible thing where they return a fake DNS entry for nonexistent domains.

@gnprice
Copy link
Member Author

gnprice commented Apr 20, 2020

I suspect some of these 4xx errors may be cases where the user is attempting to connect to a server that is not a Zulip realm at all (but, perhaps, used to be).

Hmm, interesting thought. That's certainly a thing that can happen (I remember we ran across an example of that a few months ago) -- but I'd be a bit surprised if that happened on the user_uploads or users/me/status routes. If the host is no longer a Zulip realm, then register won't succeed, and the user is unlikely to go very far into the UI.

In any case, this is a question we can answer. 😉 I linked the specific events I was looking at, and they contain the URLs in question. Fetching /api/v1/server_settings for each of those (with curl -s https://chat.example/api/v1/server_settings | jq .zulip_version)...

  • Indeed the user_uploads and users/me/status cases are genuine Zulip servers (or careful imitations 😛 ): versions 2.1.0-5-g49ff894d6a and 1.9.1 respectively.

  • The server_settings case is peculiar -- the URL that's failing is like https://example@hotmail.com/api/v1/server_settings?! (Where I've edited only the username, to example.) So that's probably something we can prevent at an earlier step -- the realm URL shouldn't have a username (or password) in it.

  • The users/me/apns_device_token case is... well, the breadcrumbs show a number of successful requests to URLs like https://chat.example:1111/api/v1/register, with a port. Then the failure is on a similar URL but lacking the port, like https://chat.example/api/v1/users/me/apns_device_token.

    And indeed a server_settings request succeeds with that explicit port, and fails with a 401 error page without it.

I'll file issues for those latter two. But the user_uploads and users/me/status cases remain mysterious.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants