Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: modify the signature of TRIM() #3879

Closed
romanhabibov opened this issue Dec 12, 2018 · 4 comments
Closed

sql: modify the signature of TRIM() #3879

romanhabibov opened this issue Dec 12, 2018 · 4 comments
Assignees
Labels
bug Something isn't working sql
Milestone

Comments

@romanhabibov
Copy link
Contributor

Tarantool version: 2.1

OS version: any

In accordance with the standard, the trim function has a signature TRIM ([ specification FROM ] ). As you can see the signature does not have commas inside and may contain FROM and specification as keywords. In the SQLite or earlier versions of the SQL, there were TRIM, LTRIM, RTRIM with (, ) signature. Now the standard does not support LTRIM, RTRIM. The trimming mode is determined in TRIM using the specification parameter.
Standard: ISO/IEC CD 9075-2, 2013-02-05.

@kyukhin kyukhin added this to the 2.2.0 milestone Dec 14, 2018
@kyukhin kyukhin added the bug Something isn't working label Dec 14, 2018
@pgulutzan
Copy link
Contributor

Konstantin Osipov asked me about this on 2019-01-11 in dev thread
Re: Basic acceptance tests for SQL features 00 Part 2.
I decided to put my reply as a comment on this issue.
Since Roman Khabibov can read the standard document,
he obviously already understands this -- but perhaps
it is useful for checking that we understand the same details.

I have Ubuntu 18.04. I have Tarantool 2.1, pulled from source today.
I try to execute these statements:
SELECT TRIM(FROM 'a ');
SELECT TRIM(' ' FROM 'a ');
SELECT TRIM(TRAILING ' ' FROM 'a ');
They all result in syntax errors.
Tarantool supports TRIM(). However,
Standard syntax: TRIM ([LEADING|TRAILING|BOTH] [trim character] FROM string1)
Tarantool syntax: TRIM (string1 [, trim-string])

It is possible to support "TRIM (string1 [, trim-string])"
if SQLite compatibility is important. But that's not common.

MySQL + DB2 + Oracle are reasonably close to the standard:
https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_trim
https://www.ibm.com/support/knowledgecenter/en/SSGU8G_12.1.0/com.ibm.sqls.doc/ids_sqs_1556.htm
https://docs.oracle.com/database/121/SQLRF/functions235.htm#SQLRF06149

SQL Server is vaguely similar but does not have LEADING|TRAILING|BOTH:
https://docs.microsoft.com/en-us/sql/t-sql/functions/trim-transact-sql?view=sql-server-2017

If [LEADING|TRAILING|BOTH] is omitted, assume BOTH.

If trim_character is omitted:
if arguments are char|varchar, assume ' ' (0x20)
if arguments are blob etc., assume 0x00.
This is not happening now, Tarantool always assumes 0x20.

(Currently Tarantool cannot handle TRIM correctly with BLOBs
because it thinks strings end with 0X00, but that is a different issue.)
A related issue,
Issue#3543 sql: LTRIM(X'004100',X'00') does not trim X'00'
is fixed.

If [LEADING|TRAILING|BOTH] is omitted,
and trim_character is omitted:
the standard document doesn't allow for that.
However, SELECT TRIM(FROM 'a '); looks like an easy extension.

In standard SQL the arguments must be strings (char/varchar/blob),
but Tarantool allows TRIM(3.5). This seems harmless but useless.

The resulting data type should be the same as the data type of string1,
except of course that it may be shorter.

What I do not understand about this proposal is:
? is it saying that LTRIM and RTRIM should be removed?
? is it saying th at SQLite-style syntax dhould be removed?

@romanhabibov
Copy link
Contributor Author

romanhabibov commented Mar 28, 2019

What I do not understand about this proposal is:
? is it saying that LTRIM and RTRIM should be removed?
? is it saying th at SQLite-style syntax dhould be removed?

@pgulutzan, hello! In my opinion, both. I think to do it as follows:
If character specialized:
LTRIM(string, character) -> TRIM(LEADING character FROM string)
RTRIM(string, character) -> TRIM(TRAILING character FROM string)
TRIM(string, character) -> TRIM(BOTH character FROM string).
And similar for binary string.

If character is not specialized:
LTRIM(string) -> TRIM(LEADING FROM string)
RTRIM(string) -> TRIM(TRAILING FROM string)
TRIM(string) -> TRIM([FROM] string)
Spaces will be cut if it is a string and X'00' will be cut if it is a binary string.

In standard SQL the arguments must be strings (char/varchar/blob),
but Tarantool allows TRIM(3.5). This seems harmless but useless.

I think, we should throw error in this case.

The resulting data type should be the same as the data type of string1,
except of course that it may be shorter.

This also should be supported.

@pgulutzan
Copy link
Contributor

I would like to agree with everything that romanhabibov said above. However, we will have inconsistency if we disallow TRIM(non-string) and continue to allow QUOTE(non-string), SUBSTR(non-string,1,1), HEX(non-string), LOWER(non-string), etc. If we throw an error for one function where non-string makes no sense, then we should throw an error for all functions where non-string makes no sense. So perhaps that is not part of this issue.

romanhabibov added a commit that referenced this issue Apr 4, 2019
According to the ANSI standart, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming charcters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 4, 2019
According to the ANSI standart, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming charcters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 4, 2019
According to the ANSI standart, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming charcters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 11, 2019
According to the ANSI standart, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming charcters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 11, 2019
According to the ANSI standart, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming charcters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 11, 2019
According to the ANSI standart, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming charcters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 15, 2019
According to the ANSI standart, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming charcters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 15, 2019
According to the ANSI standard, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming characters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 15, 2019
According to the ANSI standard, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming characters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 15, 2019
According to the ANSI standard, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming characters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 15, 2019
According to the ANSI standard, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming characters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 16, 2019
According to the ANSI standard, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming characters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 16, 2019
According to the ANSI standard, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming characters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 18, 2019
According to the ANSI standard, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming characters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 18, 2019
According to the ANSI standard, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming characters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 18, 2019
According to the ANSI standard, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming characters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 18, 2019
According to the ANSI standard, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming characters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 19, 2019
According to the ANSI standard, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming characters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 20, 2019
According to the ANSI standard, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming characters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 22, 2019
According to the ANSI standard, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming characters) determined
in arguments of this function.

Closes #3879
romanhabibov added a commit that referenced this issue Apr 23, 2019
According to the ANSI standard, ltrim, rtrim and trim should
be merged into one unified TRIM() function. The specialization of
trimming (left, right or both and trimming characters) determined
in arguments of this function.

Closes #3879

@TarantoolBot document
Title: TRIM() function

Modify signature of SQL function TRIM(). This function removes
characters included in <trim character> (binary) string from
<trim source> (binary) string until encounter a character that doesn't
belong to <trim character>. Removal occurs on the side, specified by
<trim specification>. Now, syntax is following:
TRIM([ [ <trim specification> ] [ <trim character> ] FROM ] <trim source>).

<trim specification> can be one of the following keywords: LEADING,
TRAILING and BOTH.
<trim character> is the set of trimming characters.
<trim source> is the string, that will be trimmed.
If FROM is specified, then:
1) Either <trim specification> or <trim character> or both shall be
specified.
2) If <trim specification> is not specified, then BOTH is implicit.
3) If <trim character> is not specified, then ' ' is implicit.
@kyukhin kyukhin added the tmp label Aug 2, 2019
@Totktonada
Copy link
Member

It was fixed in 2.2.0-198-gf95a34da9 (and all 2.3 versions).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working sql
Projects
None yet
Development

No branches or pull requests

5 participants