Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RESP response encodes all field values as strings, JSON retains types of fields #719

Closed
Kilowhisky opened this issue Feb 21, 2024 · 9 comments

Comments

@Kilowhisky
Copy link
Contributor

Kilowhisky commented Feb 21, 2024

Describe the bug
The RESP response output of a tile38 server outputs all field values as string types whereas the JSON output properly returns the fields in the types they were sent to the server in. This is a pretty significant behavior difference between the protocols that should at least be notated somewhere.

To Reproduce
Following the example from here: https://tile38.com/topics/filter-expressions

SET fleet truck1  FIELD name Andy  POINT 33 -112
SET fleet truck2  FIELD speed 90  POINT 33 -112
SET fleet truck3  FIELD inactive true  POINT 33 -112
SET fleet truck4  FIELD info '{"speed":60,"age":21,"name":"Tom"}'  POINT 33 -112

The following response is created using a TELNET client.

SCAN fleet
*2
:0
*4
*3
$6
truck1
$40
{"type":"Point","coordinates":[-112,33]}
*2
$4
name
$4
Andy
*3
$6
truck2
$40
{"type":"Point","coordinates":[-112,33]}
*2
$5
speed
$2
90
*3
$6
truck3
$40
{"type":"Point","coordinates":[-112,33]}
*2
$8
inactive
$4
true
*3
$6
truck4
$40
{"type":"Point","coordinates":[-112,33]}
*2
$4
info
$34
{"speed":60,"age":21,"name":"Tom"}

If you switch to JSON output, the following is received

OUTPUT JSON
$29
{"ok":true,"elapsed":"283ns"}
SCAN fleet
$492
{"ok":true,"fields":["inactive","info","name","speed"],"objects":[{"id":"truck1","object":{"type":"Point","coordinates":[-112,33]},"fields":[0,0,"Andy",0]},{"id":"truck2","object":{"type":"Point","coordinates":[-112,33]},"fields":[0,0,0,90]},{"id":"truck3","object":{"type":"Point","coordinates":[-112,33]},"fields":[true,0,0,0]},{"id":"truck4","object":{"type":"Point","coordinates":[-112,33]},"fields":[0,{"speed":60,"age":21,"name":"Tom"},0,0]}],"count":4,"cursor":0,"elapsed":"45.118┬╡s"}

As you can see the fields arrays retain proper encoding of the type (and even put the JSON field as a object!).

Expected behavior
The RESP response should alter the first byte to indicate number vs string.
See: https://redis.io/docs/reference/protocol-spec/#:~:text=Redis%20generally%20uses%20RESP%20as,the%20arguments%20for%20the%20command.

In RESP, the type of some data depends on the first byte:
For Simple Strings the first byte of the reply is "+"
For Errors the first byte of the reply is "-"
For Integers the first byte of the reply is ":"
For Bulk Strings the first byte of the reply is "$"
For Arrays the first byte of the reply is "*"

Logs
Not Applicable

Operating System (please complete the following information):

  • OS: Linux
  • CPU: amd64
  • Version: 1.32.1
  • Container: Kubernetes

Additional context
I'm not entirely sure how to fix this without a breaking change.
Best i can think of is adding another OUTPUT type of RESP2 or something?

@Kilowhisky
Copy link
Contributor Author

Here's another oddity... if you quote a number in FSET JSON will return a number not a string....

Here is redis-cli

FSET fleet truck1 birthday '1987'
(integer) 1
GET fleet truck1 WITHFIELDS
1) "{\"type\":\"Point\",\"coordinates\":[-112,33]}"
2) 1) "birthday"
   2) "1987"
   3) "name"
   4) "Andy"
OUTPUT json
"{\"ok\":true,\"elapsed\":\"740ns\"}"
GET fleet truck1 WITHFIELDS
"{\"ok\":true,\"object\":{\"type\":\"Point\",\"coordinates\":[-112,33]},\"fields\":{\"birthday\":1987,\"name\":\"Andy\"},\"elapsed\":\"28.275\xc2\xb5s\"}"

Here's TILE38

GET fleet truck1 WITHFIELDS
*2
$40
{"type":"Point","coordinates":[-112,33]}
*4
$8
birthday
$4
1987
$4
name
$4
Andy

@Kilowhisky
Copy link
Contributor Author

....obviously there is some coercion happing behind the scenes.

FSET fleet truck1 birthday '1987_2'
GET fleet truck1 WITHFIELDS
"{\"ok\":true,\"object\":{\"type\":\"Point\",\"coordinates\":[-112,33]},\"fields\":{\"birthday\":1987_2,\"name\":\"Andy\"},\"elapsed\":\"14.698\xc2\xb5s\"}"

FSET fleet truck1 birthday '1987.2'
GET fleet truck1 WITHFIELDS
"{\"ok\":true,\"object\":{\"type\":\"Point\",\"coordinates\":[-112,33]},\"fields\":{\"birthday\":1987.2,\"name\":\"Andy\"},\"elapsed\":\"12.124\xc2\xb5s\"}"

FSET fleet truck1 birthday '1987,2'
GET fleet truck1 WITHFIELDS
"{\"ok\":true,\"object\":{\"type\":\"Point\",\"coordinates\":[-112,33]},\"fields\":{\"birthday\":\"1987,2\",\"name\":\"Andy\"},\"elapsed\":\"9.674\xc2\xb5s\"}"

FSET fleet truck1 birthday '1987a2'
GET fleet truck1 WITHFIELDS
"{\"ok\":true,\"object\":{\"type\":\"Point\",\"coordinates\":[-112,33]},\"fields\":{\"birthday\":\"1987a2\",\"name\":\"Andy\"},\"elapsed\":\"12.686\xc2\xb5s\"}"

@tidwall
Copy link
Owner

tidwall commented Feb 22, 2024

The RESP response should alter the first byte to indicate number vs string.

Tile38 fields were originally designed to only store floating points. At that time RESPv2 wasn't available and the proper way to store floating points was to use a Bulk string ($). I avoided Simple strings (+) because I wanted to future-proof the option of fields holding arbritary types such as strings or bytes, and simple string only support short single line strings. The Integer type (:) was a worse option because it's only suppose to integers and some clients didn't like it when floating points or strings were used. So for the best compatibility at that time and the future was the Bulk string.

Since I can't control what client the user chooses, any change at this point to the RESP types may break compatibility.

obviously there is some coercion happing behind the scenes.

Yes, now that Tile38 also supports strings there needs to be detection of the field type at runtime to correctly decode and encode for JSON. But the original functionality of representing floating point fields needs to be maintained to keep compatibility.

@Kilowhisky
Copy link
Contributor Author

I figured as much.

Would you be open to RESP2 output type PR or should I just switch to JSON output?

I'm using tile38 in high scale environment so i am a bit concerned about performance hit using JSON as output.

I actually didn't notice this problem at first because I actually started off development using the json output but encountered a problem where storing a json object in a field in quoted form returned it unquoted during GET which made the C# deserializer explode.

@Kilowhisky
Copy link
Contributor Author

BTW FSET also has this problem. So serialization with REDIS based libs will also have problems occasionally.

var fArgs = new object[]
{
       "devices", serial,
       "serial", serial,

       // Using int will cause stackexchange.redis to try and encode 
       // as * (int) which will freak tile38 out
       "company", message.Headers.CompanyId.ToString(), 
        "stack", message.Headers.StackId.ToString(),
};
_tile38.ExecuteAsync("FSET", fArgs);
{ "time": "2024-02-24 00:31:29.9971", "level": "Error", "message": "Protocol error: expected '$', got '*'" }

@tidwall
Copy link
Owner

tidwall commented Feb 25, 2024

Would you be open to RESP2 output type PR or should I just switch to JSON output?

I would consider a PR for RESP2, but it'll probably be a big undertaking. I'm skeptical that it'll provide much of a performance gain. Though, I'd be happy to be surprised. I think the redis folks are on RESP3 now.

I believe that using the JSON protocol and specifically GeoJSON "feature" types with the "properties" member is the way to go because it has the broadest compatibility. But I can understand the need to squeeze every ounce of performance out of Tile38. The JSON protocol shouldn't be much slower that RESP and I'm curious of the gains you are seeing on your side by only using RESP.

Regarding the C# example using the StackExchange.Redis client. I haven't encountered that style of command before; where the command includes anything but an array of bulk strings. Is that a RESP2 feature?

@Kilowhisky
Copy link
Contributor Author

I'll give json a chance to see if it works out. I have to process 7.128e10*4 records per day at least so scale is a bit of a concern.

I'll also take a look into RESP2 and see what is needed for a PR.

As for stackexchange.Redis it follows full protocol and even does pipelining, batching, and a whole lot of other things. I'll pull the code apart when I get a chance.

@Kilowhisky
Copy link
Contributor Author

Kilowhisky commented Feb 25, 2024

I also have to use FSET because I had to split into multiple commands due payload length causing errors.

@Kilowhisky
Copy link
Contributor Author

I decided to just treat all fields coming or going to tile38 as strings and let the server do what it wants with them.

It does make where clauses a bit funky but it works.

I'll close the issue for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
@tidwall @Kilowhisky and others