New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems with pattern entity ids including backslashes #2142

Closed
fgalan opened this Issue May 12, 2016 · 11 comments

Comments

Projects
None yet
4 participants
@fgalan
Copy link
Member

fgalan commented May 12, 2016

It has been found in the NGSIv1 API that using pattern entity ids can be problematic, as the \ is not correctly interpreted.

Although the problem has been detected in NGSIv1, the NGSIv2 equivalent functionality should be also checked (and covered with tests).

@fgalan fgalan added bug P6 labels May 12, 2016

@dmoranj

This comment has been minimized.

Copy link
Contributor

dmoranj commented May 12, 2016

An example that has been proved to fail in verison 0.26 is the following: MOBA_[23]\d{2}

@jmcanterafonseca jmcanterafonseca changed the title Problems with pattern entity ids includeing backslashes Problems with pattern entity ids including backslashes May 24, 2016

@kzangeli

This comment has been minimized.

Copy link
Member

kzangeli commented Jun 15, 2016

The following payload gives a JSON parse error, also in jsonlint.com:

{
"id": "MOBA_[23]\d{2}",
"type": "T1"
}

If I remove the backslash, it works, both in the broker and in jsonlint.com ...

@fgalan fgalan added this to the 1.3.0 milestone Jun 15, 2016

@kzangeli

This comment has been minimized.

Copy link
Member

kzangeli commented Jun 15, 2016

I'm having some kind of deja-vu here ...
Sure we've discussed this before.

JSON accepts backslash only if followed by one of the following chars:

"\/bfnrtu

Any other char after \ provokes a parse error as it is incorrect JSON.

Now, if I change the 'd' for a 'b' in the example supplied by @dmoranj, the broker no longer complains about JSON parse error but about invalid chars in payload ...
Actually, just MOBA_\b gives invalid chars in payload ...

@iariasleon

This comment has been minimized.

Copy link
Contributor

iariasleon commented Jun 16, 2016

This behavior with a backslash is the same when we try to create a new subscription using NGSI v2. It is a restriction of json (http://www.json.org/ )

after of the backslash only are allowed: " \ / b f n r t u

returning an error "Errors found in incoming JSON buffer", if we use regexp chars as: \a \e \v \x \d \w \s \D \W \S \A \Z \B

@fgalan

This comment has been minimized.

Copy link
Member Author

fgalan commented Jun 23, 2016

Let's try to define the expected behaviour. Starting with a clean CB (i.e. empty database), the following entity is created:

curl -v 'localhost:1026/v2/entities?options=keyValues' --header 'Accept: application/json' --header 'Content-Type: application/json' -d @- <<EOF
{
  "id": "MOBA_255",
  "A": 42
}
EOF

Now, considering the following query:

(curl localhost:1026/v1/queryContext -s -S --header 'Content-Type: application/json' --header 'Accept: application/json' -d @- | python -mjson.tool) <<EOF
{
    "entities": [
        {
            "isPattern": "true",
            "id": "MOBA_[23]\\d{2}"
        }
    ]
} 
EOF
  1. Payload is legal JSON (checked with jsonlint.com)
  2. After un-scaping the slash (i.e. after the parsing layer) regex is: MOBA_[23]\d{2}. This match MOBA_255, so that entity should be included in the queryContext response.

As far as I have cheched (reference CB version: 1.2.0-next git version: 85aa85d) the result of the above queryContext is:

{
    "errorCode": {
        "code": "400",
        "details": "JSON Parse Error",
        "reasonPhrase": "Bad Request"
    }
}

So it seems we are now blocked at point 1. Once point 1 get solved, let's see what happens at point 2...

@fgalan

This comment has been minimized.

Copy link
Member Author

fgalan commented Jun 23, 2016

Addition fact, the following request:

curl localhost:1026/v1/queryContext -s -S --header 'Content-Type: application/json' --header 'Accept: application/json' -d @- | python -mjson.tool) <<EOF
{
    "entities": [
        {
            "isPattern": "true",
            "id": "MOBA_[23]\\\\d{2}"
        }
    ]
}
EOF

returns MOBA_255.

This is of course a consequence of how Orion is implemented right now. However, that doesn't means that it is the way we want to work in the future (neither the opposite ;)

@fgalan

This comment has been minimized.

Copy link
Member Author

fgalan commented Jun 23, 2016

More info: similar test done in NGSIv2:

(curl localhost:1026/v2/op/query -s -S --header 'Content-Type: application/json' --header 'Accept: application/json' -d @- | python -mjson.tool) <<EOF
{
    "entities": [
        {
            "idPattern": "MOBA_[23]\\d{2}"
        }
    ]
} 
EOF

The payload is valid (according with JSONLint) and CB returns "parse error".

(curl localhost:1026/v2/op/query -s -S --header 'Content-Type: application/json' --header 'Accept: application/json' -d @- | python -mjson.tool) <<EOF
{
    "entities": [
        {
            "idPattern": "MOBA_[23]\\\\d{2}"
        }
    ]
} 
EOF

returns MOBA_255.

Conclusion: similar behaviour to NGSIv1.

@fgalan

This comment has been minimized.

Copy link
Member Author

fgalan commented Jun 23, 2016

In my previous tests today I was missing an subtle but important detail: curl shell applies its own scaping so what you write at curl level is not the same you get "on the wire". Let's start by the beginning. We have the following:

curl localhost:1026/v1/queryContext -s -S --header 'Content-Type: application/json' --header 'Accept: application/json' -d @- | python -mjson.tool) <<EOF
{
    "entities": [
        {
            "isPattern": "true",
            "id": "MOBA_[23]\\\\d{2}"
        }
    ]
}
EOF

It can be checked that on the wire, what curl is sendinng is \\ and not \\\\. With nc:

fermin@neodeb:~$ nc -l -p 1026
POST /v2/op/query HTTP/1.1
User-Agent: curl/7.38.0
Host: localhost:1026
Content-Type: application/json
Accept: application/json
Content-Length: 85

{    "entities": [        {            "idPattern": "MOBA_[23]\\d{2}"        }    ]}

With Wireshark:

2016-06-23 15_58_30-debian8 2 - vmware player non-commercial use only

Thus, on the wire CB gets MOBA_[23]\\d{2} which the JSON parser translates to MOBA_[23]\d{2}, which is a valid expression. As has been tested (see above) the result if MOBA_255 (i.e. correct behaviour).

The case for "JSON parser error" makes also sense: at curl level we have MOBA_[23]\\d{2} with on the wire is MOBA_[23]\d{2} which is a wrong string in JSON, detected by the parser (as been already explained by @iariasleon and @kzangeli, is is not legal that a \ in JSON is followed by d).

@fgalan

This comment has been minimized.

Copy link
Member Author

fgalan commented Jun 23, 2016

Another test, being payload.json

{
    "entities": [
        {
            "idPattern": "MOBA_[23]\\d{2}"
        }
    ]
} 

With:

curl localhost:1026/v2/op/query -s -S --header 'Content-Type: application/json' --header 'Accept: application/json' -d @payload.json

sends (at nc):

POST /v2/op/query HTTP/1.1
User-Agent: curl/7.38.0
Host: localhost:1026
Content-Type: application/json
Accept: application/json
Content-Length: 85

{    "entities": [        {            "idPattern": "MOBA_[23]\\d{2}"        }    ]}

I.e. in the case of using a file to hold the payload, the "shell scaping effect" is avoided.

@fgalan

This comment has been minimized.

Copy link
Member Author

fgalan commented Jun 23, 2016

In sum, it seems that the CB behaviour is correct right now, but documentation should be improved to include some comment regarding this stuff.

@fgalan

This comment has been minimized.

Copy link
Member Author

fgalan commented Jul 7, 2016

Documentation clarifications add to doc/manuals/user/regex_in_payload.md file (PR #2333).

@fgalan fgalan closed this Jul 7, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment