Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
socketio_data:encode does not count escapes when determining the length of a JSON string #57
Comments
I'm wondering if this can be related to #56 Do you have the original Erlang data structure so that I can make a test case out of it? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Show comment Hide comment
andymck
Aug 12, 2011
sure does look related. Sorry i missed that.
let me see if i can get you the original erlang structure....but the issue
can be seen with any erlang tuple list which includes binary strings....the
resulting json will have the binary strings escaped and its those that
length does not count.
thanks
On Fri, Aug 12, 2011 at 4:36 PM, ferd <
reply@reply.github.com>wrote:
I'm wondering if this can be related to
#56Do you have the original Erlang data structure so that I can make a test
case out of it?Reply to this email directly or view it on GitHub:
#57 (comment)
andymck
commented
Aug 12, 2011
sure does look related. Sorry i missed that. let me see if i can get you the original erlang structure....but the issue thanks On Fri, Aug 12, 2011 at 4:36 PM, ferd <
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Show comment Hide comment
ferd
Aug 12, 2011
Collaborator
I have doubts on the idea of 'any binary string'. The current test suite does have these. Here's a simple counter-example:
1> socketio_data:decode(#msg{content=socketio_data:encode(#msg{content=[{<<"hi">>,<<"there \"bud\\\"dy!">>}], json=true})}).
[#msg{content = [{<<"hi">>,<<"there \"bud\\\"dy!">>}],
json = true,length = 0}]
The decoded copy is the same as the encoded copy, same length, escaping of quotation marks in binaries or not. This is why I'm more or less wondering whether there were unicode sequences in there, which might mess up the length of the string, or the original data structure. It doesn't seem like escaping is the actual problem.
There's a funny thing that you made me discover though:
2> socketio_data:encode(#msg{content=[{<<"hi">>,<<"there \"buddy">>}]}).
[126,109,126,49,126,109,126,{<<"hi">>,<<"there \"buddy">>}]
Encoding a tuplelist without the 'json' flag in the struct shouldn't work. The list right there should be caught by the encoder. I'll try to at least make this more strict. It's purely luck (and wanting to implement something simple) that the decoder hasn' caught that one.
I have doubts on the idea of 'any binary string'. The current test suite does have these. Here's a simple counter-example:
The decoded copy is the same as the encoded copy, same length, escaping of quotation marks in binaries or not. This is why I'm more or less wondering whether there were unicode sequences in there, which might mess up the length of the string, or the original data structure. It doesn't seem like escaping is the actual problem. There's a funny thing that you made me discover though:
Encoding a tuplelist without the 'json' flag in the struct shouldn't work. The list right there should be caught by the encoder. I'll try to at least make this more strict. It's purely luck (and wanting to implement something simple) that the decoder hasn' caught that one. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Show comment Hide comment
andymck
Aug 12, 2011
Did you run the bif length on that example ? it will return a count lesser
than that of the string
The issue i was pointing out is not to do with encoding and decoding, i
agree that works fine....
However after you get the encoded json back, the bif length() is called on
that string and its that value which results in the truncating of what is
sent to the client..as length() is not counting escaped sequences.....so the
encoder works fine buts the number of bytes being sent over the wire is
incorrect ( when using firefox ).
I suspect the issue is to do with the decoding, the encoded message is in
the format
?FRAME ++ Length ++ ?FRAME ++ ?JSON_FRAME ++ JSON
That same length is being used in the decode function....so as it has a
value less than the size of the encoded json string, then that is resulting
in the decoder truncating the json...
Hth,
Andy
On Fri, Aug 12, 2011 at 4:52 PM, ferd <
reply@reply.github.com>wrote:
I have doubts on the idea of 'any binary string'. The current test suite
does have these. Here's a simple counter-example:1>
socketio_data:decode(#msg{content=socketio_data:encode(#msg{content=[{<<"hi">>,<<"there
"bud\"dy!">>}], json=true})}).
[#msg{content = [{<<"hi">>,<<"there "bud\"dy!">>}],
json = true,length = 0}]The decoded copy is the same as the encoded copy, same length, escaping of
quotation marks in binaries or not. This is why I'm more or less wondering
whether there were unicode sequences in there, which might mess up the
length of the string, or the original data structure. It doesn't seem like
escaping is the actual problem.There's a funny thing that you made me discover though:
2> socketio_data:encode(#msg{content=[{<<"hi">>,<<"there "buddy">>}]}).
[126,109,126,49,126,109,126,{<<"hi">>,<<"there "buddy">>}]Encoding a tuplelist without the 'json' flag in the struct shouldn't work.
The list right there should be caught by the encoder. I'll try to at least
make this more strict. It's purely luck (and wanting to implement something
simple) that the decoder hasn' caught that one.Reply to this email directly or view it on GitHub:
#57 (comment)
andymck
commented
Aug 12, 2011
Did you run the bif length on that example ? it will return a count lesser However after you get the encoded json back, the bif length() is called on I suspect the issue is to do with the decoding, the encoded message is in ?FRAME ++ Length ++ ?FRAME ++ ?JSON_FRAME ++ JSON That same length is being used in the decode function....so as it has a Hth, Andy On Fri, Aug 12, 2011 at 4:52 PM, ferd <
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Show comment Hide comment
andymck
Aug 12, 2011
Sorry that was a rushed response... up to my eyes here
I will be looking at this in more detail on monday and will be able to get
you more solid info then
Thanks
On Fri, Aug 12, 2011 at 4:52 PM, ferd <
reply@reply.github.com>wrote:
I have doubts on the idea of 'any binary string'. The current test suite
does have these. Here's a simple counter-example:1>
socketio_data:decode(#msg{content=socketio_data:encode(#msg{content=[{<<"hi">>,<<"there
"bud\"dy!">>}], json=true})}).
[#msg{content = [{<<"hi">>,<<"there "bud\"dy!">>}],
json = true,length = 0}]The decoded copy is the same as the encoded copy, same length, escaping of
quotation marks in binaries or not. This is why I'm more or less wondering
whether there were unicode sequences in there, which might mess up the
length of the string, or the original data structure. It doesn't seem like
escaping is the actual problem.There's a funny thing that you made me discover though:
2> socketio_data:encode(#msg{content=[{<<"hi">>,<<"there "buddy">>}]}).
[126,109,126,49,126,109,126,{<<"hi">>,<<"there "buddy">>}]Encoding a tuplelist without the 'json' flag in the struct shouldn't work.
The list right there should be caught by the encoder. I'll try to at least
make this more strict. It's purely luck (and wanting to implement something
simple) that the decoder hasn' caught that one.Reply to this email directly or view it on GitHub:
#57 (comment)
andymck
commented
Aug 12, 2011
Sorry that was a rushed response... up to my eyes here I will be looking at this in more detail on monday and will be able to get Thanks On Fri, Aug 12, 2011 at 4:52 PM, ferd <
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Show comment Hide comment
andymck
Aug 15, 2011
Hi, havent had much time to play around with this but here is a description
of the problem from what i can see:
-
When the transport is polling and on the server side using
socketio_transport_polling.erl, JSON data received by the browser fails to
parse the received msg to valid JSON. -
This is due to the length() BIF being used within socket_io_data:encode,
the encoded JSON string returned from here includes a number of headers /
frames one of which is the length in bytes of the JSON string. Full
structure of the encoded msgs being ?FRAME ++ Length ++ ?FRAME ++
?JSON_FRAME ++ JSON; -
The length returned in the msg header described above does not equal the
length of the JSON string as the length() function does not count escape
sequence characters, so calling length(""name"") will resolve to 6,
whereas it needs to resolve to 8 as we wish to send the entire string
including escape sequences. -
On the client side then the msg received from the server is decoded
using the ?FRAME headers, and uses the length header included in the
received msg to construct the data payload. This can be see in
the Transport.prototype._decode function of socket.io.js. As the length
header received is less than the size of the data payload, the js at line
198 of socket.io.jsthis.base._onMessage(JSON.parse(message.substr(3)));
Fails to parse the message, as the msgs it received does not equal the fully
payload sent by the server.
Let me know what you think or if you need any further info. I could be
completely wrong on all of this as i havent got my head around all the code
yet but this is what i am seeing and the theory matches the results i see in
firefox.
Thanks
Andrew
On Fri, Aug 12, 2011 at 4:52 PM, ferd <
reply@reply.github.com>wrote:
I have doubts on the idea of 'any binary string'. The current test suite
does have these. Here's a simple counter-example:1>
socketio_data:decode(#msg{content=socketio_data:encode(#msg{content=[{<<"hi">>,<<"there
"bud\"dy!">>}], json=true})}).
[#msg{content = [{<<"hi">>,<<"there "bud\"dy!">>}],
json = true,length = 0}]The decoded copy is the same as the encoded copy, same length, escaping of
quotation marks in binaries or not. This is why I'm more or less wondering
whether there were unicode sequences in there, which might mess up the
length of the string, or the original data structure. It doesn't seem like
escaping is the actual problem.There's a funny thing that you made me discover though:
2> socketio_data:encode(#msg{content=[{<<"hi">>,<<"there "buddy">>}]}).
[126,109,126,49,126,109,126,{<<"hi">>,<<"there "buddy">>}]Encoding a tuplelist without the 'json' flag in the struct shouldn't work.
The list right there should be caught by the encoder. I'll try to at least
make this more strict. It's purely luck (and wanting to implement something
simple) that the decoder hasn' caught that one.Reply to this email directly or view it on GitHub:
#57 (comment)
andymck
commented
Aug 15, 2011
Hi, havent had much time to play around with this but here is a description
Fails to parse the message, as the msgs it received does not equal the fully Let me know what you think or if you need any further info. I could be Thanks Andrew On Fri, Aug 12, 2011 at 4:52 PM, ferd <
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Show comment Hide comment
andymck
Aug 16, 2011
Hi, wondering if you have had a chance to look at my last email?
Keen to know what think...
Thanks
Andrew
On Fri, Aug 12, 2011 at 4:52 PM, ferd <
reply@reply.github.com>wrote:
I have doubts on the idea of 'any binary string'. The current test suite
does have these. Here's a simple counter-example:1>
socketio_data:decode(#msg{content=socketio_data:encode(#msg{content=[{<<"hi">>,<<"there
"bud\"dy!">>}], json=true})}).
[#msg{content = [{<<"hi">>,<<"there "bud\"dy!">>}],
json = true,length = 0}]The decoded copy is the same as the encoded copy, same length, escaping of
quotation marks in binaries or not. This is why I'm more or less wondering
whether there were unicode sequences in there, which might mess up the
length of the string, or the original data structure. It doesn't seem like
escaping is the actual problem.There's a funny thing that you made me discover though:
2> socketio_data:encode(#msg{content=[{<<"hi">>,<<"there "buddy">>}]}).
[126,109,126,49,126,109,126,{<<"hi">>,<<"there "buddy">>}]Encoding a tuplelist without the 'json' flag in the struct shouldn't work.
The list right there should be caught by the encoder. I'll try to at least
make this more strict. It's purely luck (and wanting to implement something
simple) that the decoder hasn' caught that one.Reply to this email directly or view it on GitHub:
#57 (comment)
andymck
commented
Aug 16, 2011
Hi, wondering if you have had a chance to look at my last email? Keen to know what think... Thanks Andrew On Fri, Aug 12, 2011 at 4:52 PM, ferd <
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Show comment Hide comment
ferd
Aug 16, 2011
Collaborator
Not yet. I'm having a pretty busy week. I know this is pretty important so I'll try to find some free time ASAP.
Not yet. I'm having a pretty busy week. I know this is pretty important so I'll try to find some free time ASAP. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Show comment Hide comment
andymck
Aug 17, 2011
Hi, ok np.
FYI here is a test that will demo the issue:
simple_json_test2() ->
Msg = "{"hello":"world"}",
TestMsg = lists:concat(["~m", integer_to_list(length(Msg)), "m~~j",
Msg]),
[X] = decode(#msg{content=TestMsg}),
?assertMatch(#msg{content=[{<<"hello">>,<<"world">>}], json=true}, X).
Thanks
On Wed, Aug 17, 2011 at 12:16 AM, ferd <
reply@reply.github.com>wrote:
Not yet. I'm having a pretty busy week. I know this is pretty important so
I'll try to find some free time ASAP.Reply to this email directly or view it on GitHub:
#57 (comment)
andymck
commented
Aug 17, 2011
Hi, ok np. FYI here is a test that will demo the issue: simple_json_test2() -> Thanks On Wed, Aug 17, 2011 at 12:16 AM, ferd <
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Show comment Hide comment
ferd
Aug 17, 2011
Collaborator
Regarding the test case, that's because the correct length isn't 17. See:
20> socketio_data:encode(#msg{content=[{<<"hello">>,<<"world">>}], json=true}).
"~m~20~m~~j~{\"hello\":\"world\"}"
The 3 characters difference is due to the j framing, which counts towards the total length, but you didn't count it for the test. See https://github.com/LearnBoost/socket.io/blob/0.6.17/lib/socket.io/utils.js#L26 for the correct JS implementation as of 0.6.x.
Using the encoding functions we provide to create your messages seems to work.
Regarding the test case, that's because the correct length isn't 17. See:
The 3 characters difference is due to the Using the encoding functions we provide to create your messages seems to work. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Show comment Hide comment
ferd
Aug 17, 2011
Collaborator
Note being, I still can't produce any of these problems you had when I start by submitting Erlang Data structures rather than a JSON string and then trying to shove it in the app. Are you using the interface functions or just submitting the JSON?
Note being, I still can't produce any of these problems you had when I start by submitting Erlang Data structures rather than a JSON string and then trying to shove it in the app. Are you using the interface functions or just submitting the JSON? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Show comment Hide comment
andymck
Aug 17, 2011
k, i guess i am not communication this properly. Hopefully this example
using real data will help:
Here is the term i trying to send via socket.io
[[{<<"action">>,<<"feedback">>},
{<<"payload">>,
[{<<"context">>,<<"order_ack">>},
{<<"report_id">>,90},
{<<"report_type">>,<<"new">>},
{<<"order_id">>,76},
{<<"user_id">>,<<"stig">>},
{<<"client_order_id">>,6},
{<<"client_head_id">>,6},
{<<"order_type">>,<<"new">>},
{<<"status">>,<<"new">>},
{<<"fill_type">>,<<"day">>},
{<<"book_id">>,<<"usd_rate_outright_2">>},
{<<"side">>,<<"bid">>},
{<<"price">>,9000},
{<<"order_qty">>,120},
{<<"head_qty">>,120},
{<<"display_qty">>,<<"undefined">>},
{<<"fill_qty">>,0},
{<<"left_qty">>,120},
{<<"child_id">>,<<"undefined">>},
{<<"head_id">>,76}]}]]
Here is the resulting JSON returned from the encode function
"[{"action":"feedback","payload":{"context":"order_ack","report_id":90,"report_type":"new","order_id":76,"user_id":"stig","client_order_id":6,"client_head_id":6,"order_type":"new","status":"new","fill_type":"day","book_id":"usd_rate_outright_2","side":"bid","price":9000,"order_qty":120,"head_qty":120,"display_qty":"undefined","fill_qty":0,"left_qty":120,"child_id":"undefined","head_id":76}}]"
Now if you run length() from an erlang shell on that JSON string you will
see that it results in a length of 396 characters, whereas in fact the
length should be 459
When sending this data via socket.io api i.e.
socketio_client:send(Client, #msg{ content=Data, json=true })
If the browser is Firefox the JSON.parse in the clientside script i.e.
Transport.prototype._onMessage = function(message){
if (!this.sessionid){
this.sessionid = message;
this._onConnect();
} else if (message.substr(0, 3) == 'h'){
this._onHeartbeat(message.substr(3));
} else if (message.substr(0, 3) == 'j'){
this.base._onMessage(JSON.parse(message.substr(3)));
} else {
this.base._onMessage(message);
}
},
Fails to parse the string, if I look at the length of the string passed to
here it is 396 characters.
If you use the above term along with a firefox browser on the client ( i am
using firefox 3.6.17 on debian ), i would expect you will see the same
result. NOTE: the transport method has to be jsonp-polling. All works fine
when its websockets.
Andrew
On Wed, Aug 17, 2011 at 6:36 PM, ferd <
reply@reply.github.com>wrote:
Note being, I still can't produce any of these problems you had when I
start by submitting Erlang Data structures rather than a JSON string and
then trying to shove it in the app. Are you using the interface functions or
just submitting the JSON?Reply to this email directly or view it on GitHub:
#57 (comment)
andymck
commented
Aug 17, 2011
k, i guess i am not communication this properly. Hopefully this example Here is the term i trying to send via socket.io[[{<<"action">>,<<"feedback">>}, Here is the resulting JSON returned from the encode function"[{"action":"feedback","payload":{"context":"order_ack","report_id":90,"report_type":"new","order_id":76,"user_id":"stig","client_order_id":6,"client_head_id":6,"order_type":"new","status":"new","fill_type":"day","book_id":"usd_rate_outright_2","side":"bid","price":9000,"order_qty":120,"head_qty":120,"display_qty":"undefined","fill_qty":0,"left_qty":120,"child_id":"undefined","head_id":76}}]" Now if you run length() from an erlang shell on that JSON string you will When sending this data via socket.io api i.e. socketio_client:send(Client, #msg{ content=Data, json=true }) If the browser is Firefox the JSON.parse in the clientside script i.e. Transport.prototype._onMessage = function(message){ Fails to parse the string, if I look at the length of the string passed to If you use the above term along with a firefox browser on the client ( i am Andrew On Wed, Aug 17, 2011 at 6:36 PM, ferd <
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Show comment Hide comment
ferd
Aug 17, 2011
Collaborator
On Wed, Aug 17, 2011 at 2:22 PM, andymck <
reply@reply.github.com>wrote:
k, i guess i am not communication this properly. Hopefully this example
using real data will help:
Here is the term i trying to send via socket.io
[[{<<"action">>,<<"feedback">>},
{<<"payload">>,
[{<<"context">>,<<"order_ack">>},
{<<"report_id">>,90},
{<<"report_type">>,<<"new">>},
{<<"order_id">>,76},
{<<"user_id">>,<<"stig">>},
{<<"client_order_id">>,6},
{<<"client_head_id">>,6},
{<<"order_type">>,<<"new">>},
{<<"status">>,<<"new">>},
{<<"fill_type">>,<<"day">>},
{<<"book_id">>,<<"usd_rate_outright_2">>},
{<<"side">>,<<"bid">>},
{<<"price">>,9000},
{<<"order_qty">>,120},
{<<"head_qty">>,120},
{<<"display_qty">>,<<"undefined">>},
{<<"fill_qty">>,0},
{<<"left_qty">>,120},
{<<"child_id">>,<<"undefined">>},
{<<"head_id">>,76}]}]]Here is the resulting JSON returned from the encode function
"[{"action":"feedback","payload":{"context":"order_ack","report_id":90,"report_type":"new","order_id":76,"user_id":"stig","client_order_id":6,"client_head_id":6,"order_type":"new","status":"new","fill_type":"day","book_id":"usd_rate_outright_2","side":"bid","price":9000,"order_qty":120,"head_qty":120,"display_qty":"undefined","fill_qty":0,"left_qty":120,"child_id":"undefined","head_id":76}}]"
Now if you run length() from an erlang shell on that JSON string you will
see that it results in a length of 396 characters, whereas in fact the
length should be 459The length is 396, including the
jframe, yes. The \ you see are
_not_something you want to count. They're only a visual representation
in the
shell for you:
30> "m396m~~j["++[Char0,Char1,Char2|_] = JSON.
"m396m~~j[{"action":"feedback","payload":{"context":"order_ack","report_id":90,"report_type":"new","order_id":76,"user_id":"stig","client_order_id":6,"client_head_id":6,"order_type":"new","status":"new","fill_type":"day","book_id":"usd_rate_outright_2","side":"bid","price":9000,"order_qty":120,"head_qty":120,"display_qty":"undefined","fill_qty":0,"left_qty":120,"child_id":"undefined","head_id":76}}]"
31> {Char0,Char1,Char2}.
{123,34,97}
32> $".
34
The backslash characters are not part of the string and it's normal for
length/1 not to count them. They're just there so it makes sense when you
read it as a string so you get """ and not """.
When sending this data via socket.io api i.e.
socketio_client:send(Client, #msg{ content=Data, json=true })
If the browser is Firefox the JSON.parse in the clientside script i.e.
Transport.prototype._onMessage = function(message){
if (!this.sessionid){
this.sessionid = message;
this._onConnect();
} else if (message.substr(0, 3) == 'h'){
this._onHeartbeat(message.substr(3));
} else if (message.substr(0, 3) == 'j'){this.base._onMessage(JSON.parse(message.substr(3)));
} else {
this.base._onMessage(message);
}
},Fails to parse the string, if I look at the length of the string passed to
here it is 396 characters.If you use the above term along with a firefox browser on the client ( i am
using firefox 3.6.17 on debian ), i would expect you will see the same
result. NOTE: the transport method has to be jsonp-polling. All works
fine
when its websockets.
I'm trying it with jsonp-polling and I am able to reproduce the bug, even
with FF6. That sounds like a good way to start looking into it. The error is
likely not about the length of the string and how we encode the data, but
likely has to do with the transport or the client. More likely the transport
module.
I inserted some debugging code in the client and got this as the output:
j[{"action":"feedback","payload":{"context":"order_ack","report_id":90,"report_type":"new","order_id":76,"user_id":"stig","client_order_id":6,"client_head_id":6,"order_type":"new","status":"new","fill_type":"day","book_id":"usd_rate_outright_2","side":"bid","price":9000,"order_qty":120,"head_qty":120,"display_qty":"undefined","fill_qty
There we have backslashes added when they need not to, rather than length
not taking them into account.
We'll start to work from there. Thanks for the report and the useful use
case there.
Andrew
On Wed, Aug 17, 2011 at 6:36 PM, ferd <
reply@reply.github.com>wrote:
Note being, I still can't produce any of these problems you had when I
start by submitting Erlang Data structures rather than a JSON string and
then trying to shove it in the app. Are you using the interface functions
or
just submitting the JSON?Reply to this email directly or view it on GitHub:
Reply to this email directly or view it on GitHub:
On Wed, Aug 17, 2011 at 2:22 PM, andymck < k, i guess i am not communication this properly. Hopefully this example
30> " 31> {Char0,Char1,Char2}. The backslash characters are not part of the string and it's normal for
I'm trying it with jsonp-polling and I am able to reproduce the bug, even I inserted some debugging code in the client and got this as the output:
There we have backslashes added when they need not to, rather than length We'll start to work from there. Thanks for the report and the useful use
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Show comment Hide comment
andymck
Aug 17, 2011
yes, but in firefox running jsonp-polling here is what the browser receives
for that same msg
j[{"action":"feedback","payload":{"context":"order_ack","report_id":90,"report_type":"new","order_id":76,"user_id":"stig","client_order_id":6,"client_head_id":6,"order_type":"new","status":"new","fill_type":"day","book_id":"usd_rate_outright_2","side":"bid","price":9000,"order_qty":120,"head_qty":120,"display_qty":"undefined","fill_
As you can see the backslashes are present....and the string terminates at
the exact same length as what the erlang shell returns for the length of the
json string....
Firefox throws the following error, as it cannot obviously parse the
truncated string...
JSON.parse
[Break On This Error] this.base._onMessage(JSON.parse(message.substr(3)));
socket.io.js (line 199)
On Wed, Aug 17, 2011 at 7:47 PM, ferd <
reply@reply.github.com>wrote:
On Wed, Aug 17, 2011 at 2:22 PM, andymck <
reply@reply.github.com>wrote:k, i guess i am not communication this properly. Hopefully this example
using real data will help:
Here is the term i trying to send via socket.io
[[{<<"action">>,<<"feedback">>},
{<<"payload">>,
[{<<"context">>,<<"order_ack">>},
{<<"report_id">>,90},
{<<"report_type">>,<<"new">>},
{<<"order_id">>,76},
{<<"user_id">>,<<"stig">>},
{<<"client_order_id">>,6},
{<<"client_head_id">>,6},
{<<"order_type">>,<<"new">>},
{<<"status">>,<<"new">>},
{<<"fill_type">>,<<"day">>},
{<<"book_id">>,<<"usd_rate_outright_2">>},
{<<"side">>,<<"bid">>},
{<<"price">>,9000},
{<<"order_qty">>,120},
{<<"head_qty">>,120},
{<<"display_qty">>,<<"undefined">>},
{<<"fill_qty">>,0},
{<<"left_qty">>,120},
{<<"child_id">>,<<"undefined">>},
{<<"head_id">>,76}]}]]Here is the resulting JSON returned from the encode function
"[{"action":"feedback","payload":{"context":"order_ack","report_id":90,"report_type":"new","order_id":76,"user_id":"stig","client_order_id":6,"client_head_id":6,"order_type":"new","status":"new","fill_type":"day","book_id":"usd_rate_outright_2","side":"bid","price":9000,"order_qty":120,"head_qty":120,"display_qty":"undefined","fill_qty":0,"left_qty":120,"child_id":"undefined","head_id":76}}]"
Now if you run length() from an erlang shell on that JSON string you will
see that it results in a length of 396 characters, whereas in fact the
length should be 459The length is 396, including the
jframe, yes. The \ you see are
_not_something you want to count. They're only a visual representation
in the
shell for you:30> "
m396m~~j["++[Char0,Char1,Char2|_] = JSON."
m396m~~j[{"action":"feedback","payload":{"context":"order_ack","report_id":90,"report_type":"new","order_id":76,"user_id":"stig","client_order_id":6,"client_head_id":6,"order_type":"new","status":"new","fill_type":"day","book_id":"usd_rate_outright_2","side":"bid","price":9000,"order_qty":120,"head_qty":120,"display_qty":"undefined","fill_qty":0,"left_qty":120,"child_id":"undefined","head_id":76}}]"31> {Char0,Char1,Char2}.
{123,34,97}
32> $".
34The backslash characters are not part of the string and it's normal for
length/1 not to count them. They're just there so it makes sense when you
read it as a string so you get """ and not """.When sending this data via socket.io api i.e.
socketio_client:send(Client, #msg{ content=Data, json=true })
If the browser is Firefox the JSON.parse in the clientside script i.e.
Transport.prototype._onMessage = function(message){
if (!this.sessionid){
this.sessionid = message;
this._onConnect();
} else if (message.substr(0, 3) == 'h'){
this._onHeartbeat(message.substr(3));
} else if (message.substr(0, 3) == 'j'){this.base._onMessage(JSON.parse(message.substr(3)));
} else {
this.base._onMessage(message);
}
},Fails to parse the string, if I look at the length of the string passed
to
here it is 396 characters.If you use the above term along with a firefox browser on the client ( i
am
using firefox 3.6.17 on debian ), i would expect you will see the same
result. NOTE: the transport method has to be jsonp-polling. All works
fine
when its websockets.I'm trying it with jsonp-polling and I am able to reproduce the bug, even
with FF6. That sounds like a good way to start looking into it. The error
is
likely not about the length of the string and how we encode the data, but
likely has to do with the transport or the client. More likely the
transport
module.I inserted some debugging code in the client and got this as the output:
j[{"action":"feedback","payload":{"context":"order_ack","report_id":90,"report_type":"new","order_id":76,"user_id":"stig","client_order_id":6,"client_head_id":6,"order_type":"new","status":"new","fill_type":"day","book_id":"usd_rate_outright_2","side":"bid","price":9000,"order_qty":120,"head_qty":120,"display_qty":"undefined","fill_qtyThere we have backslashes added when they need not to, rather than length
not taking them into account.We'll start to work from there. Thanks for the report and the useful use
case there.Andrew
On Wed, Aug 17, 2011 at 6:36 PM, ferd <
reply@reply.github.com>wrote:
Note being, I still can't produce any of these problems you had when I
start by submitting Erlang Data structures rather than a JSON string
and
then trying to shove it in the app. Are you using the interface
functions
or
just submitting the JSON?Reply to this email directly or view it on GitHub:
Reply to this email directly or view it on GitHub:
Reply to this email directly or view it on GitHub:
#57 (comment)
andymck
commented
Aug 17, 2011
yes, but in firefox running jsonp-polling here is what the browser receives
As you can see the backslashes are present....and the string terminates at Firefox throws the following error, as it cannot obviously parse the JSON.parse On Wed, Aug 17, 2011 at 7:47 PM, ferd <
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Show comment Hide comment
andymck
Aug 17, 2011
Sorry, just noticed the bottom half of your email....which correlates with
what i just send
So ignore that please. Glad to see that you are able to reproduce the
problem
On Wed, Aug 17, 2011 at 7:47 PM, ferd <
reply@reply.github.com>wrote:
On Wed, Aug 17, 2011 at 2:22 PM, andymck <
reply@reply.github.com>wrote:k, i guess i am not communication this properly. Hopefully this example
using real data will help:
Here is the term i trying to send via socket.io
[[{<<"action">>,<<"feedback">>},
{<<"payload">>,
[{<<"context">>,<<"order_ack">>},
{<<"report_id">>,90},
{<<"report_type">>,<<"new">>},
{<<"order_id">>,76},
{<<"user_id">>,<<"stig">>},
{<<"client_order_id">>,6},
{<<"client_head_id">>,6},
{<<"order_type">>,<<"new">>},
{<<"status">>,<<"new">>},
{<<"fill_type">>,<<"day">>},
{<<"book_id">>,<<"usd_rate_outright_2">>},
{<<"side">>,<<"bid">>},
{<<"price">>,9000},
{<<"order_qty">>,120},
{<<"head_qty">>,120},
{<<"display_qty">>,<<"undefined">>},
{<<"fill_qty">>,0},
{<<"left_qty">>,120},
{<<"child_id">>,<<"undefined">>},
{<<"head_id">>,76}]}]]Here is the resulting JSON returned from the encode function
"[{"action":"feedback","payload":{"context":"order_ack","report_id":90,"report_type":"new","order_id":76,"user_id":"stig","client_order_id":6,"client_head_id":6,"order_type":"new","status":"new","fill_type":"day","book_id":"usd_rate_outright_2","side":"bid","price":9000,"order_qty":120,"head_qty":120,"display_qty":"undefined","fill_qty":0,"left_qty":120,"child_id":"undefined","head_id":76}}]"
Now if you run length() from an erlang shell on that JSON string you will
see that it results in a length of 396 characters, whereas in fact the
length should be 459The length is 396, including the
jframe, yes. The \ you see are
_not_something you want to count. They're only a visual representation
in the
shell for you:30> "
m396m~~j["++[Char0,Char1,Char2|_] = JSON."
m396m~~j[{"action":"feedback","payload":{"context":"order_ack","report_id":90,"report_type":"new","order_id":76,"user_id":"stig","client_order_id":6,"client_head_id":6,"order_type":"new","status":"new","fill_type":"day","book_id":"usd_rate_outright_2","side":"bid","price":9000,"order_qty":120,"head_qty":120,"display_qty":"undefined","fill_qty":0,"left_qty":120,"child_id":"undefined","head_id":76}}]"31> {Char0,Char1,Char2}.
{123,34,97}
32> $".
34The backslash characters are not part of the string and it's normal for
length/1 not to count them. They're just there so it makes sense when you
read it as a string so you get """ and not """.When sending this data via socket.io api i.e.
socketio_client:send(Client, #msg{ content=Data, json=true })
If the browser is Firefox the JSON.parse in the clientside script i.e.
Transport.prototype._onMessage = function(message){
if (!this.sessionid){
this.sessionid = message;
this._onConnect();
} else if (message.substr(0, 3) == 'h'){
this._onHeartbeat(message.substr(3));
} else if (message.substr(0, 3) == 'j'){this.base._onMessage(JSON.parse(message.substr(3)));
} else {
this.base._onMessage(message);
}
},Fails to parse the string, if I look at the length of the string passed
to
here it is 396 characters.If you use the above term along with a firefox browser on the client ( i
am
using firefox 3.6.17 on debian ), i would expect you will see the same
result. NOTE: the transport method has to be jsonp-polling. All works
fine
when its websockets.I'm trying it with jsonp-polling and I am able to reproduce the bug, even
with FF6. That sounds like a good way to start looking into it. The error
is
likely not about the length of the string and how we encode the data, but
likely has to do with the transport or the client. More likely the
transport
module.I inserted some debugging code in the client and got this as the output:
j[{"action":"feedback","payload":{"context":"order_ack","report_id":90,"report_type":"new","order_id":76,"user_id":"stig","client_order_id":6,"client_head_id":6,"order_type":"new","status":"new","fill_type":"day","book_id":"usd_rate_outright_2","side":"bid","price":9000,"order_qty":120,"head_qty":120,"display_qty":"undefined","fill_qtyThere we have backslashes added when they need not to, rather than length
not taking them into account.We'll start to work from there. Thanks for the report and the useful use
case there.Andrew
On Wed, Aug 17, 2011 at 6:36 PM, ferd <
reply@reply.github.com>wrote:
Note being, I still can't produce any of these problems you had when I
start by submitting Erlang Data structures rather than a JSON string
and
then trying to shove it in the app. Are you using the interface
functions
or
just submitting the JSON?Reply to this email directly or view it on GitHub:
Reply to this email directly or view it on GitHub:
Reply to this email directly or view it on GitHub:
#57 (comment)
andymck
commented
Aug 17, 2011
Sorry, just noticed the bottom half of your email....which correlates with So ignore that please. Glad to see that you are able to reproduce the On Wed, Aug 17, 2011 at 7:47 PM, ferd <
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Show comment Hide comment
ferd
Aug 17, 2011
Collaborator
I'm out of time to check this for now, but the source of the problem seems to be
Message0 = binary_to_list(jsx:term_to_json(list_to_binary(Message), [{strict, false}]))
on line 298 in socketio_transport_polling.erl (send_message_1/5). The problem is that if the previous data was already JSON, then it ends up being escaped just a bit too much.
I don't have much time left today, but if this is critical for you, this quick patch ought to do the job until we find a real solution to the problem:
diff --git a/src/socketio_transport_polling.erl b/src/socketio_transport_polling.erl
index e52e0ce..0b76d03 100644
--- a/src/socketio_transport_polling.erl
+++ b/src/socketio_transport_polling.erl
@@ -296,7 +296,7 @@ send_message(Message, Req, Index, ServerModule, Sup) ->
send_message_1(Headers, Message, Req, Index, ServerModule) ->
Headers0 = [{"Content-Type", "text/javascript; charset=UTF-8"}|Headers],
- Message0 = binary_to_list(jsx:term_to_json(list_to_binary(Message), [{strict, false}])),
+ Message0 = escape(binary_to_list(jsx:term_to_json(list_to_binary(Message), [{strict, false}]))),
Message1 = "io.JSONP["++Index++"]._(" ++ Message0 ++ ");",
apply(ServerModule, respond, [Req, 200, Headers0, Message1]).
@@ -325,3 +325,7 @@ reset_duration({TimerRef, Time}) ->
erlang:cancel_timer(TimerRef),
NewRef = erlang:start_timer(Time, self(), polling),
{NewRef, Time}.
+
+escape([]) -> [];
+escape("\\\\\\"++Rest) -> "\\"++escape(Rest);
+escape([C|Rest]) -> [C|escape(Rest)].
It will only 'unescape' some characters, so I'd consider it somewhat risky for user-submitted data. Risks of XSS are to be evaluated. I hope this helps in the meantime. As I said, we'll need to find a good solution, not just this toy thing. I've quickly tested the other transports and they seemed to work fine, in any case.
I'm out of time to check this for now, but the source of the problem seems to be
on line 298 in socketio_transport_polling.erl (send_message_1/5). The problem is that if the previous data was already JSON, then it ends up being escaped just a bit too much. I don't have much time left today, but if this is critical for you, this quick patch ought to do the job until we find a real solution to the problem:
It will only 'unescape' some characters, so I'd consider it somewhat risky for user-submitted data. Risks of XSS are to be evaluated. I hope this helps in the meantime. As I said, we'll need to find a good solution, not just this toy thing. I've quickly tested the other transports and they seemed to work fine, in any case. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Show comment Hide comment
andymck
Aug 17, 2011
Thanks. The patch works for our usage right now, so thats a big help.
Cheers
On Wed, Aug 17, 2011 at 8:29 PM, ferd <
reply@reply.github.com>wrote:
I'm out of time to check this for now, but the source of the problem seems
to beMessage0 = binary_to_list(jsx:term_to_json(list_to_binary(Message),
[{strict, false}]))on line 298 in socketio_transport_polling.erl (send_message_1/5). The
problem is that if the previous data was already JSON, then it ends up being
escaped just a bit too much.I don't have much time left today, but if this is critical for you, this
quick patch ought to do the job until we find a real solution to the
problem:diff --git a/src/socketio_transport_polling.erl
b/src/socketio_transport_polling.erl
index e52e0ce..0b76d03 100644
--- a/src/socketio_transport_polling.erl
+++ b/src/socketio_transport_polling.erl
@@ -296,7 +296,7 @@ send_message(Message, Req, Index, ServerModule, Sup)
->send_message_1(Headers, Message, Req, Index, ServerModule) -> Headers0 = [{"Content-Type", "text/javascript;
charset=UTF-8"}|Headers],
- Message0 = binary_to_list(jsx:term_to_json(list_to_binary(Message),
[{strict, false}])),- Message0 =
escape(binary_to_list(jsx:term_to_json(list_to_binary(Message), [{strict,
false}]))),
Message1 = "io.JSONP["++Index++"]._(" ++ Message0 ++ ");",
apply(ServerModule, respond, [Req, 200, Headers0, Message1]).@@ -325,3 +325,7 @@ reset_duration({TimerRef, Time}) ->
erlang:cancel_timer(TimerRef),
NewRef = erlang:start_timer(Time, self(), polling),
{NewRef, Time}.
+
+escape([]) -> [];
+escape("\"++Rest) -> ""++escape(Rest);
+escape([C|Rest]) -> [C|escape(Rest)].It will only 'unescape' some characters, so I'd consider it somewhat risky
for user-submitted data. Risks of XSS are to be evaluated. I hope this helps
in the meantime. As I said, we'll need to find a good solution, not just
this toy thing. I've quickly tested the other transports and they seemed to
work fine, in any case.Reply to this email directly or view it on GitHub:
#57 (comment)
andymck
commented
Aug 17, 2011
Thanks. The patch works for our usage right now, so thats a big help. Cheers On Wed, Aug 17, 2011 at 8:29 PM, ferd <
|
Should be fixed and safer as of a146778 andymck, if you could update your version, you'd have something safer with regards to XSS, although likely a bit slower. Safety over speed. |
ferd
closed this
Aug 19, 2011
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Show comment Hide comment
andymck
Aug 19, 2011
ferd, thanks for that
But can i ask that you dont use the data i sent you as an example. That
data is from a live product and I dont know if my boss would be that happy
to see it out in the open
Appreciate the fix.
Thanks
Andrew
On Fri, Aug 19, 2011 at 4:56 PM, ferd <
reply@reply.github.com>wrote:
Should be fixed and safer as of
a146778andymck, if you could update your version, you'd have something safer with
regards to XSS, although likely a bit slower. Safety over speed.Reply to this email directly or view it on GitHub:
#57 (comment)
andymck
commented
Aug 19, 2011
ferd, thanks for that But can i ask that you dont use the data i sent you as an example. That Appreciate the fix. Thanks Andrew On Fri, Aug 19, 2011 at 4:56 PM, ferd <
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
Show comment Hide comment
ferd
Aug 19, 2011
Collaborator
That should be fixed in the merge. Let me know if this is enough or are looking to have all traces gone (which I am not sure how to do)
That should be fixed in the merge. Let me know if this is enough or are looking to have all traces gone (which I am not sure how to do) |
andymck commentedAug 12, 2011
I noticed when sending JSON that data is truncated when received in firefox.
I tracked this down to an issue in socketio_data:encode, after encoding the term via jsx:encode, the length of the returned Json is determined using:
Length = integer_to_list(length(JSON) + ?JSON_FRAME_LENGTH),
However length(JSON) does not count escape sequences, only escaped chars, so the following JSON string:
"[{"name":"andrew","address":"5 nowhere st"}]"
returns 44, where it should return 52.
This results in payloads sent to the browser being truncated to that of the value return from length(JSON) above....though it is only truncated on specific transport mechanisms. I have seen it using firefox which was using jsonp-polling.