URL percent-encoding #3682

Closed
bigbes opened this issue Sep 11, 2018 · 0 comments · Fixed by #7254
Labels
2.11 Target is 2.11 and all newer release/master branches app feature A new functionality lua

bigbes commented Sep 11, 2018

The cURL easy API provides the functions curl_easy_escape and curl_easy_unescape, which are analogues of url_encode and url_decode, respectively.

The old implementations of these functions in http.server are slow and inefficient.

RFC: https://www.notion.so/tarantool/http-add-percent-encoding-decoding-of-query-string-in-request-76a2425a4c4744a1a72643527a4fe7f7
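
For context, percent-encoding itself is small enough to sketch in plain Lua. The block below is only an illustration of the technique, assuming the RFC 3986 unreserved set (letters, digits, `-`, `.`, `_`, `~`); the names `percent_encode` and `percent_decode` are hypothetical and are not the http.server or cURL functions mentioned above.

```lua
-- Illustrative sketch only: not the http.server or libcurl implementation.

-- Encode every byte outside the RFC 3986 unreserved set as %XX.
local function percent_encode(s)
    return (s:gsub("[^A-Za-z0-9%-%._~]", function(c)
        return string.format("%%%02X", string.byte(c))
    end))
end

-- Decode each %XX sequence back into the byte it represents.
-- Sequences that are not valid hex (for example "%zz") are left untouched.
local function percent_decode(s)
    return (s:gsub("%%(%x%x)", function(hex)
        return string.char(tonumber(hex, 16))
    end))
end

print(percent_encode("a b&c"))
print(percent_decode("a%20b%26c"))
```

Both functions work on bytes, so a multi-byte UTF-8 character becomes several `%XX` groups.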

@kyukhin kyukhin added the feature A new functionality label Oct 2, 2018
@kyukhin kyukhin added this to the wishlist milestone Oct 2, 2018
olegrok added a commit to olegrok/tarantool that referenced this issue Apr 13, 2019
Lua FFI bindings of curl_easy_escape and curl_easy_unescape.

Closes tarantool#3682
olegrok added a commit to olegrok/tarantool that referenced this issue Apr 24, 2019
Lua FFI bindings of curl_easy_escape and curl_easy_unescape.

Closes tarantool#3682

@TarantoolBot document
Title: httpc: url_encode and url_decode functions
  * http_client.url_encode - URL encodes the given string
  * http_client.url_decode - URL decodes the given string

See:
  https://curl.haxx.se/libcurl/c/curl_easy_escape.html
  https://curl.haxx.se/libcurl/c/curl_easy_unescape.html
olegrok added a commit that referenced this issue Jan 7, 2020
cURL easy API contains functions curl_easy_escape and
curl_easy_unescape:

  - https://curl.haxx.se/libcurl/c/curl_easy_escape.html
  - https://curl.haxx.se/libcurl/c/curl_easy_unescape.html

These functions can be useful for our users,
especially when they work with the HTTP client and form
query arguments for an HTTP request.

This patch introduces two functions available from
our http client:
```lua
httpc.url_escape
httpc.url_unescape
```

Needed for #3682
olegrok added a commit that referenced this issue Jan 7, 2020
The previous patch introduced url_escape and url_unescape.
However, these functions are too low-level to be all that
we give our users. The main use case is forming and parsing
HTTP query parameters.

This patch introduces format_query and parse_query functions

Closes #3682

@TarantoolBot document
Title: New http.client functions

Four new functions are available in the http.client module:

  - url_escape - simple wrapper over curl_easy_escape
      (https://curl.haxx.se/libcurl/c/curl_easy_escape.html)

  - url_unescape - simple wrapper over curl_easy_unescape
      (https://curl.haxx.se/libcurl/c/curl_easy_unescape.html)

Examples:
```lua
-- According to RFC 3986

tarantool> httpc.url_escape('hello')
---
- hello
...

tarantool> httpc.url_escape('Привет')
---
- '%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82'
...

tarantool> httpc.url_escape('$&+,:;=?@-._~')
---
- '%24%26%2B%2C%3A%3B%3D%3F%40-._~'
...
```

  - format_query - formats query arguments for an HTTP request from
      key-value string pairs

  - parse_query - parses a query string into a table

Example:
```lua
-- The query string is composed of a series of field-value pairs
-- Within each pair, the field name and value are
--   separated by an equals sign, "="
-- The series of pairs is separated by the ampersand, "&"

tarantool> httpc.format_query({['hello'] = 'world', ['привет'] = 'мир'})
---
- '%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82=%D0%BC%D0%B8%D1%80&hello=world'
...

tarantool> httpc.parse_query('%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82=' ..
'%D0%BC%D0%B8%D1%80&hello=world')
---
- привет: мир
  hello: world
...

```
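
The behaviour described for format_query and parse_query can be sketched in plain Lua as follows. This is a sketch under stated assumptions, not the patch's code: the helper names are hypothetical, keys and values are assumed to be strings, and keys are sorted only to make the output deterministic (the output above shows that the real function does not sort).

```lua
-- Illustrative sketch only: not the http.client implementation.

local function escape(s)
    return (s:gsub("[^A-Za-z0-9%-%._~]", function(c)
        return string.format("%%%02X", string.byte(c))
    end))
end

local function unescape(s)
    return (s:gsub("%%(%x%x)", function(hex)
        return string.char(tonumber(hex, 16))
    end))
end

-- Build "k1=v1&k2=v2" from a table of string keys and values.
-- Assumption: keys are sorted because pairs() has no defined order.
local function format_query(params)
    local keys = {}
    for k in pairs(params) do
        keys[#keys + 1] = k
    end
    table.sort(keys)
    local parts = {}
    for _, k in ipairs(keys) do
        parts[#parts + 1] = escape(k) .. "=" .. escape(params[k])
    end
    return table.concat(parts, "&")
end

-- Split on "&", then on the first "=", and decode both halves.
-- A field without "=" gets an empty string as its value.
local function parse_query(query)
    local result = {}
    for pair in query:gmatch("[^&]+") do
        local k, v = pair:match("^([^=]*)=?(.*)$")
        result[unescape(k)] = unescape(v)
    end
    return result
end

print(format_query({hello = "world", a = "b c"}))
```

Escaping each key and value before joining is what keeps a literal `&` or `=` inside a value from being mistaken for a separator when the string is parsed again.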
ligurio added a commit to ligurio/tarantool that referenced this issue Jun 9, 2022
Lua FFI bindings of curl_easy_escape() [1] and curl_easy_unescape() [2].

1. https://curl.se/libcurl/c/curl_easy_escape.html
2. https://curl.se/libcurl/c/curl_easy_unescape.html

@TarantoolBot document
Title: Document new methods to encode and decode URLs

New methods "uri:encode()" and "uri:decode()" have been introduced. The first
escapes symbols in a string, and the second unescapes symbols in a string.
Escaping and unescaping are implemented using the cURL functions
curl_easy_escape() and curl_easy_unescape() and conform to RFC 3986.
The maximum string length is limited by CURL_MAX_INPUT_LENGTH (8 MB).

```
tarantool> require('uri').decode('/search/?text=%D1%82%D0%B0%D1%80%D0%B0%D0%BD%D1%82%D1%83%D0%BB&lr=213')
---
- /search/?text=тарантул&lr=213
...

tarantool>
```

NO_CHANGELOG=internal

Fixes tarantool#3682
ligurio added a commit to ligurio/tarantool that referenced this issue Jun 9, 2022
Lua FFI bindings of curl_easy_escape() [1] and curl_easy_unescape() [2].

1. https://curl.se/libcurl/c/curl_easy_escape.html
2. https://curl.se/libcurl/c/curl_easy_unescape.html

@TarantoolBot document
Title: Document new methods to encode and decode URLs

New methods "uri:encode()" and "uri:decode()" have been introduced.
The first escapes symbols in a string, and the second unescapes
symbols in a string. Escaping and unescaping are implemented using
the cURL functions curl_easy_escape() and curl_easy_unescape() and
conform to RFC 3986. The maximum string length is limited by
CURL_MAX_INPUT_LENGTH (8 MB).

```
tarantool> require('uri').encode('тарантул')
---
- '%D1%82%D0%B0%D1%80%D0%B0%D0%BD%D1%82%D1%83%D0%BB'
...

tarantool>
```

Fixes tarantool#3682
@ligurio ligurio self-assigned this Jun 10, 2022
@igormunkin igormunkin added the 2.11 Target is 2.11 and all newer release/master branches label Sep 16, 2022
ligurio added a commit to ligurio/tarantool that referenced this issue Sep 19, 2022
NO_CHANGELOG=internal
NO_DOC=internal
NO_TEST=TODO

Part of tarantool#3682
ligurio added a commit to ligurio/tarantool that referenced this issue Sep 19, 2022
@TarantoolBot document
Title: Document new functions to encode and decode URLs

New functions `uri.url_encode()` and `uri.url_decode()` have been
introduced. The first escapes symbols in a string, and the second
unescapes symbols in a string.

```
tarantool> require('uri').encode('тарантул')
---
- '%D1%82%D0%B0%D1%80%D0%B0%D0%BD%D1%82%D1%83%D0%BB'
...

tarantool>
```

Fixes tarantool#3682
ligurio added a commit to ligurio/tarantool that referenced this issue Dec 23, 2022
Patch replaces encoding and decoding functions written in Lua with
functions implemented in C.

Performance of Lua implementation (before the patch):

```
uri.escape 159859.451617 ops/sec
uri.unescape 322307.411753 ops/sec
```

Performance of C implementation (after the patch):

```
uri.escape 4573385.455594 ops/sec
uri.unescape 4456735.138300 ops/sec
```

Follows up tarantool#3682

NO_CHANGELOG=see previous commit
NO_DOC=see previous commit
ligurio added a commit to ligurio/tarantool that referenced this issue Dec 24, 2022
Added a simple benchmark for URI escape/unescape.

Part of tarantool#3682

NO_DOC=documentation is not required for performance test
NO_CHANGELOG=performance test
NO_TEST=performance test
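
The benchmark itself is not shown in this thread. As a rough idea of how throughput figures of this kind can be obtained, here is a minimal sketch that times repeated calls with `os.clock()`. The function `ops_per_sec`, the stand-in `escape`, and the iteration count are all assumptions made for illustration, not the benchmark added by the commit.

```lua
-- Illustrative sketch only: not the benchmark added by the commit.

-- Stand-in for the function under test (a pure-Lua percent-encoder).
local function escape(s)
    return (s:gsub("[^A-Za-z0-9%-%._~]", function(c)
        return string.format("%%%02X", string.byte(c))
    end))
end

-- Call fn(arg) `iterations` times and return calls per second of CPU time.
local function ops_per_sec(fn, arg, iterations)
    local start = os.clock()
    for _ = 1, iterations do
        fn(arg)
    end
    local elapsed = os.clock() - start
    -- The clock resolution may be too coarse for very short runs.
    if elapsed <= 0 then
        return math.huge
    end
    return iterations / elapsed
end

print(string.format("escape %.2f ops/sec",
                    ops_per_sec(escape, "hello world", 100000)))
```

The absolute numbers depend on the machine, the input, and the iteration count, so only a before/after comparison on the same setup is meaningful.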
ligurio added a commit to ligurio/tarantool that referenced this issue Dec 24, 2022
Patch replaces encoding and decoding functions written in Lua with
functions implemented in C.

Performance of Lua implementation (before the patch):

```
uri.escape   13.17  runs/sec
uri.unescape 23.43  runs/sec
```

Performance of C implementation (after the patch):

```
uri.escape   381.42  runs/sec
uri.unescape 317.81  runs/sec
```

Follows up tarantool#3682

NO_CHANGELOG=see previous commit
NO_DOC=see previous commit
ligurio added a commit to ligurio/tarantool that referenced this issue Dec 24, 2022
Patch replaces encoding and decoding functions written in Lua with
functions implemented in C.

Performance of Lua implementation (before the patch):

```
uri.escape   132.23  runs/sec
uri.unescape 208.02  runs/sec
```

Performance of C implementation (after the patch):

```
uri.escape   1002.67  runs/sec
uri.unescape 1014.79  runs/sec
```

Follows up tarantool#3682

NO_CHANGELOG=see previous commit
NO_DOC=see previous commit
ligurio added a commit to ligurio/tarantool that referenced this issue Dec 24, 2022
Patch replaces encoding and decoding functions written in Lua with
functions implemented in C.

Performance of Lua implementation (before the patch):

```
uri.escape   152.37  runs/sec
uri.unescape 263.44  runs/sec
```

Performance of C implementation (after the patch):

```
uri.escape   4983.03  runs/sec
uri.unescape 4197.19  runs/sec
```

Follows up tarantool#3682

NO_CHANGELOG=see previous commit
NO_DOC=see previous commit
ligurio added a commit to ligurio/tarantool that referenced this issue Dec 24, 2022
Closes tarantool#3682

@TarantoolBot document
Title: Document new functions to encode and decode parts of a URI

New functions `uri.escape()` and `uri.unescape()` have been introduced.
The first escapes symbols in a string, and the second unescapes
symbols in a string, according to RFC 3986 [1].

Examples:

```
tarantool> uri.escape("тарантул")
---
- '%D1%82%D0%B0%D1%80%D0%B0%D0%BD%D1%82%D1%83%D0%BB'
...

tarantool> uri.unescape("%D1%82%D0%B0%D1%80%D0%B0%D0%BD%D1%82%D1%83%D0%BB")
---
- тарантул
...

```

`uri.escape()` accepts a string to encode and, optionally, a table of
encoding options: a string of unreserved symbols that will *not* be
encoded, and a boolean option that enables or disables encoding of
space characters as '+'. By default, `uri.escape()` uses the set of
unreserved symbols defined in RFC 3986 ("2.3. Unreserved Characters"),
and encoding of space characters as '+' is disabled. The table with the
default encoding options is defined as `uri.RFC3986`.

`uri.unescape()` accepts a string to decode and, optionally, a table of
decoding options: a string of unreserved symbols (which the decoding
function does not actually use) and a boolean option that enables or
disables decoding of '+' as a space character. The table with the
default decoding options is defined as `uri.RFC3986`.

See the detailed description in the RFC "http: add
percent-encoding/decoding of query string in request" [2].

NO_WRAP
1. Uniform Resource Identifier (URI): Generic Syntax
   https://datatracker.ietf.org/doc/html/rfc3986
2. https://www.notion.so/tarantool/http-add-percent-encoding-decoding-of-query-string-in-request-76a2425a4c4744a1a72643527a4fe7f7
NO_WRAP
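
To make the two encoding options concrete, here is a plain-Lua sketch of an escape function driven by an options table. It is not the `uri` module's code: the field names `unreserved` and `plus`, and the local table `RFC3986`, are assumptions chosen to mirror the description above.

```lua
-- Illustrative sketch only: not the Tarantool uri module.
-- Assumed option fields (hypothetical names):
--   unreserved: string of characters that are passed through unchanged
--   plus:       when true, a space is encoded as "+" instead of "%20"
local RFC3986 = {
    unreserved = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" ..
                 "abcdefghijklmnopqrstuvwxyz" ..
                 "0123456789-._~",
    plus = false,
}

local function escape(s, opts)
    opts = opts or RFC3986
    -- Turn the unreserved string into a lookup set.
    local keep = {}
    for i = 1, #opts.unreserved do
        keep[opts.unreserved:sub(i, i)] = true
    end
    return (s:gsub(".", function(c)
        if keep[c] then
            return c
        end
        if c == " " and opts.plus then
            return "+"
        end
        return string.format("%%%02X", string.byte(c))
    end))
end

print(escape("a b/c"))
print(escape("a b/c", {unreserved = RFC3986.unreserved .. "/", plus = true}))
```

With `plus` enabled, a literal `+` in the input is still percent-encoded as `%2B`, so it cannot be confused with an encoded space.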
ligurio added a commit to ligurio/tarantool that referenced this issue Dec 25, 2022
Patch replaces encoding and decoding functions written in Lua with
functions implemented in C.

Performance of Lua implementation (before the patch):

```
uri.escape   152.37  runs/sec
uri.unescape 263.44  runs/sec
```

Performance of C implementation (after the patch):

```
uri.escape   4983.03  runs/sec
uri.unescape 4197.19  runs/sec
```

Follows up tarantool#3682

NO_CHANGELOG=see previous commit
NO_DOC=see previous commit

Co-authored-by: Alexander Turenko <alexander.turenko@tarantool.org>
igormunkin pushed a commit that referenced this issue Dec 27, 2022
Closes #3682

@TarantoolBot document
Title: Document new functions to encode and decode parts of a URI

New functions `uri.escape()` and `uri.unescape()` have been introduced.
The first escapes symbols in a string, and the second unescapes
symbols in a string, according to RFC 3986 [1].

Examples:

```
tarantool> uri.escape("тарантул")
---
- '%D1%82%D0%B0%D1%80%D0%B0%D0%BD%D1%82%D1%83%D0%BB'
...

tarantool> uri.unescape("%D1%82%D0%B0%D1%80%D0%B0%D0%BD%D1%82%D1%83%D0%BB")
---
- тарантул
...

```

`uri.escape()` accepts a string to encode and, optionally, a table of
encoding options: a string of unreserved symbols that will *not* be
encoded, and a boolean option that enables or disables encoding of
space characters as '+'. By default, `uri.escape()` uses the set of
unreserved symbols defined in RFC 3986 ("2.3. Unreserved Characters"),
and encoding of space characters as '+' is disabled. The table with the
default encoding options is defined as `uri.RFC3986`.

`uri.unescape()` accepts a string to decode and, optionally, a table of
decoding options: a string of unreserved symbols (which the decoding
function does not actually use) and a boolean option that enables or
disables decoding of '+' as a space character. The table with the
default decoding options is defined as `uri.RFC3986`.

See the detailed description in the RFC "http: add
percent-encoding/decoding of query string in request" [2].

NO_WRAP
1. Uniform Resource Identifier (URI): Generic Syntax
   https://datatracker.ietf.org/doc/html/rfc3986
2. https://www.notion.so/tarantool/http-add-percent-encoding-decoding-of-query-string-in-request-76a2425a4c4744a1a72643527a4fe7f7
NO_WRAP
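
The decoding side, including the '+' option, can be sketched in plain Lua as well. Again, this is an illustration and not the merged C implementation; the option field name `plus` is an assumption based on the description above.

```lua
-- Illustrative sketch only: not the Tarantool uri module.
-- Assumed option field (hypothetical name):
--   plus: when true, "+" is decoded as a space
local function unescape(s, opts)
    local plus = opts ~= nil and opts.plus == true
    if plus then
        -- Replace "+" first, so that a "+" produced by decoding "%2B"
        -- in the next step is not turned into a space.
        s = s:gsub("%+", " ")
    end
    -- Decode %XX sequences; malformed ones (for example "%zz") stay as-is.
    return (s:gsub("%%(%x%x)", function(hex)
        return string.char(tonumber(hex, 16))
    end))
end

print(unescape("a+b%20c"))
print(unescape("a+b%20c", {plus = true}))
```

The order of the two steps matters: decoding `%2B` before replacing `+` would make a literal plus sign indistinguishable from an encoded space.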
igormunkin pushed a commit that referenced this issue Dec 27, 2022
Added a simple benchmark for URI escape/unescape.

Part of #3682

NO_DOC=documentation is not required for performance test
NO_CHANGELOG=performance test
NO_TEST=performance test
igormunkin pushed a commit that referenced this issue Dec 27, 2022
Patch replaces encoding and decoding functions written in Lua with
functions implemented in C.

Performance of Lua implementation (before the patch):

```
uri.escape   152.37  runs/sec
uri.unescape 263.44  runs/sec
```

Performance of C implementation (after the patch):

```
uri.escape   4983.03  runs/sec
uri.unescape 4197.19  runs/sec
```

Follows up #3682

NO_CHANGELOG=see previous commit
NO_DOC=see previous commit

Co-authored-by: Alexander Turenko <alexander.turenko@tarantool.org>