Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON data with \u0000 errors in Postgres with "unsupported Unicode escape sequence" #118

Open
LalitLab opened this issue Nov 17, 2021 · 13 comments
Assignees
Labels
bug Something isn't working

Comments

@LalitLab
Copy link

LalitLab commented Nov 17, 2021

Describe the bug

Running this query

select jsonb_pretty(attributes) from ldap_user where object_sid = 'S-1-5-21-4049728555-799940961-2503284476-7632';

Seems to return some JSON data with \u0000, leading to the error:

select jsonb_pretty(attributes) from ldap_user where object_sid = 'S-1-5-21-4049728555-799940961-2503284476-7632';
Error: unsupported Unicode escape sequence
Error: syntax error at or near "Error"

Steampipe version (steampipe -v)
Example: v0.9.1

logs

2021-11-17 16:14:56.341 GMT [2646] ERROR:  unsupported Unicode escape sequence
2021-11-17 16:14:56.341 GMT [2646] DETAIL:  \u0000 cannot be converted to text.
2021-11-17 16:14:56.341 GMT [2646] CONTEXT:  JSON data, line 1: ...on","organizationalPerson","user"],"objectGUID":[...
2021-11-17 16:14:56.341 GMT [2646] STATEMENT:  select cn, object_sid from  ldap_user where object_sid = 'S-1-5-21-4049728555-799940961-2503284476-7632'

Can refer here Data which causing this error

Expected behavior

I'd expect the JSON column handler to either strip the null field \u0000 out to fix the parsing or to replace the JSON with error content.

Additional context

N/A

@cbruno10 cbruno10 added the bug Something isn't working label Nov 17, 2021
@LalitLab
Copy link
Author

LalitLab commented Nov 18, 2021

branch to reproduce issue https://github.com/turbot/steampipe-plugin-chaos/compare/json_issue?diff=unified

> select * from chaos_json_breaking
Error: unsupported Unicode escape sequence

@cbruno10
Copy link

@binaek Any updates on this? I believe we're hitting up against it still in the LDAP plugin, e.g., turbot/steampipe-plugin-ldap#3

@kaidaguerre
Copy link
Contributor

Hmm we'll need to dig into this - there was a fix which covered at least some cases but not all it seems

@cbruno10
Copy link

@kaidaguerre Any update on this? Thanks!

@cbruno10
Copy link

Accidentally closed, re-opening

@cbruno10 cbruno10 reopened this Apr 13, 2022
@kaidaguerre
Copy link
Contributor

kaidaguerre commented May 3, 2022

There is code in the FDW which attempts to strip out non-escaped unicode nulls, but to leave in place escaped unicode nulls

The original code was:


func jsonValueString(val interface{}, valueString string) (string, error) {
	bytes, err := json.Marshal(val)
	if err != nil {
		return "", err
	}
	// remove invalid unicode characters
	valueString = strings.Replace(string(bytes), `\u0000`, "", -1)
	return valueString, nil
}

However in response to this issue this code was changed to try to leave nulls which were escaped, as follows:

func jsonValueString(val interface{}) (string, error) {
	jsonBytes, err := json.Marshal(val)
	if err != nil {
		return "", err
	}
	valueString := string(jsonBytes)

	// remove unicode null char "\u0000", UNLESS escaped, i.e."\\u0000"
	if strings.Contains(valueString, `\u0000`) {
		re := regexp.MustCompile(`(?:(\\\\u0000)|([^\\]?)\\u0000)`)
		valueString = re.ReplaceAllString(valueString, "$1$2")
	}

	return valueString, nil
}

This error is caused by this JSON field:

 "objectGUID":[
    "��g��\\\u0000K����\u0001m:�"
  ],

Current version of code (trying to handle escaped null) modifies this to:

 "objectGUID":[
    "��g��\\\u0000K����\u0001m:�"
  ],

which fails.

The old code (replacing all unicode nulls) keeps this as:

"objectGUID":[
    "��g��\\K����\u0001m:�"
  ],

which works.

So it looks like the regex needs updating to handle this case

@cbruno10
Copy link

@kaidaguerre - A user, christianherweg0807, is reporting he's still seeing issues - turbot/steampipe-plugin-ldap#3 (comment). The LDAP plugin's latest version is built off of Steampipe SDK v3.1.0.

Is there something else we should look into from the plugin side, or is more information needed to debug this issue? Thanks!

@cbruno10 cbruno10 reopened this Jun 21, 2022
@christianherweg0807
Copy link

Hi, sorry for the delay...

$ steampipe -v
steampipe version 0.14.6

Here an example with latest ldap plugin running against an Microsoft AD LDAP:

> select mail from ldap_user where department like 'MY-DE-PA-RT-MENT';
**Error: unsupported Unicode escape sequence (SQLSTATE 22P05)**
> select mail from ldap_user where department = 'MY-DE-PA-RT-MENT';
+-----------------------------+
| mail                        |
+-----------------------------+
| someone.somewhere@mail.de    |

@robdew
Copy link

robdew commented Dec 9, 2022

Still seeing this. LDAP server is AD.

> select dn from ldap_user;

Error: unsupported Unicode escape sequence (SQLSTATE 22P05)

(about a dozen results are returned, omitted them)

CentOS 7.9

$ steampipe plugin list
+--------------------------------------------------+---------+-------------+
| Installed Plugin                                 | Version | Connections |
+--------------------------------------------------+---------+-------------+
| hub.steampipe.io/plugins/turbot/ldap@latest      | 0.2.0   | ldap        |
| hub.steampipe.io/plugins/turbot/steampipe@latest | 0.6.0   | steampipe   |
+--------------------------------------------------+---------+-------------+
$ steampipe -v
steampipe version 0.17.4

There is a link to an old slack thread in the #3 but it appears to be outside the retention window for messages.

@pdecat
Copy link
Contributor

pdecat commented Dec 19, 2022

I faced the same issue when using the LDAP plugin, and came up with the following changes to the FDW component:

diff --git a/helpers.go b/helpers.go
index 4287e5f..cfb8d32 100644
--- a/helpers.go
+++ b/helpers.go
@@ -13,7 +13,6 @@ import (
 	"fmt"
 	"log"
 	"os"
-	"regexp"
 	"strings"
 	"time"
 	"unsafe"
@@ -194,10 +193,9 @@ func jsonValueString(val interface{}) (string, error) {
 	valueString := string(jsonBytes)
 
 	// remove unicode null char "\u0000", UNLESS escaped, i.e."\\u0000"
-	if strings.Contains(valueString, `\u0000`) {
-		log.Printf("[TRACE] null unicode character detected in JSON value - removing if not escaped")
-		re := regexp.MustCompile(`((?:^|[^\\])(?:\\\\)*)(?:\\u0000)+`)
-		valueString = re.ReplaceAllString(valueString, "$1")
+	if strings.Contains(valueString, "\u0000") {
+		log.Printf("[TRACE] non escaped null unicode character detected in JSON value - removing")
+		valueString = strings.ReplaceAll(valueString, "\u0000", "$1")
 	}
 
 	return valueString, nil

Edit: we probably want to keep the regex replace, I only gave attention to the comment that said removing, and did not realize it was intended to escape the nulls. An alternative may be to base64 encode the whole string if it contains nulls as does ldapsearch by default.

There's no need to check whether the null unicode character is escaped or not as the following:

package main

import (
	"fmt"
	"strings"
)

func main() {
	fmt.Println(strings.Contains("Hello \u0000", "\u0000"))
	fmt.Println(strings.Contains("Hello \u0000", `\u0000`))
	fmt.Println(strings.Contains("Hello \\u0000", "\u0000"))
	fmt.Println(strings.Contains("Hello \\u0000", `\u0000`))
}

returns:

true
false
false
true

https://go.dev/play/p/8RjKYbSUNOC

And I also applied these changes to the LDAP plugin to drop values for attributes like jpegPhoto, thumbnailPhoto, etc. that have low interest from steampipe point of view IMO and frequently contain null unicode character:

diff --git a/ldap/utils.go b/ldap/utils.go
index 2689f4a..18c3d3b 100644
--- a/ldap/utils.go
+++ b/ldap/utils.go
@@ -5,6 +5,7 @@ import (
 	"crypto/tls"
 	"errors"
 	"fmt"
+	"log"
 	"strings"
 	"time"
 
@@ -274,7 +275,19 @@ func convertToTimestamp(ctx context.Context, str string) *time.Time {
 func transformAttributes(ctx context.Context, attributes []*ldap.EntryAttribute) map[string][]string {
 	var data = make(map[string][]string)
 	for _, attribute := range attributes {
-		data[attribute.Name] = attribute.Values
+		found_null := false
+		for _, value := range attribute.Values {
+			if strings.Contains(value, "\u0000") {
+				found_null = true
+				break
+			}
+		}
+		if found_null {
+			log.Printf("[TRACE] dropping %q attribute value as it contains unicode nulls", attribute.Name)
+			data[attribute.Name] = []string{"dropped"}
+		} else {
+			data[attribute.Name] = attribute.Values
+		}
 	}
 	return data
 }

@kaidaguerre would you accept PRs proposing such changes?

@zhujian0805
Copy link

any updates on this, I met the same issue?

@kaidaguerre
Copy link
Contributor

Sorry for the radio silence on this one

@pdecat @kaidaguerre would you accept PRs proposing such changes?
Yes, certainly open to pull requests!

@matteoredaelli
Copy link

hello, any news?

I still have the error (I have just udated steampipe and ldap plugin to teh latest releases)

select a.mail 
from ldap_user a 
where dn in (
  select 
    jsonb_array_elements_text(attributes -> 'manager')
  from adgroup.ldap_user where sam_account_name='matteo');

Error: unsupported Unicode escape sequence (SQLSTATE 22P05)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

8 participants