Invalid UTF-8 String Causes BSONError When Submitting Data to MongoDB #1745

Open
calycekr opened this issue Mar 5, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@calycekr
Contributor

calycekr commented Mar 5, 2025

I am encountering an issue where submitting data to MongoDB results in a BSONError: Invalid UTF-8 string in BSON document. This error occurs when the chat UI input contains an invalid UTF-8 string.

Steps to Reproduce

  1. Open the chat UI.
  2. Enter a string that contains invalid UTF-8 characters.
  3. Submit the data to the backend.
  4. The backend attempts to save the data to MongoDB, resulting in the BSONError.
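
For reproducing step 2 without the chat UI, here is a minimal sketch of what "invalid UTF-8" can look like at the byte level. The byte values are an assumption: they are the EUC-KR/CP949 encoding of the Korean word "파일", which appears to be the kind of text behind the sample posted later in this thread. `isValidUtf8` is a hypothetical helper, not part of chat-ui or the MongoDB driver.

```typescript
// Assumed sample: "파일" encoded as EUC-KR/CP949, not as UTF-8.
const cp949Bytes = new Uint8Array([0xc6, 0xc4, 0xc0, 0xcf]);

// Hypothetical helper: returns true when the bytes decode as strict UTF-8.
function isValidUtf8(bytes: Uint8Array): boolean {
  try {
    // With fatal: true, decode() throws instead of inserting U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

console.log(isValidUtf8(cp949Bytes)); // false
console.log(isValidUtf8(new TextEncoder().encode("파일"))); // true
```

Whether these exact bytes trigger the BSONError in the trace below has not been confirmed in this thread.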

Error Stack Trace

BSONError: Invalid UTF-8 string in BSON document
at getValidatedString (/app/node_modules/bson/lib/bson.cjs:3052:27)
at deserializeObject (/app/node_modules/bson/lib/bson.cjs:2677:21)
at deserializeObject (/app/node_modules/bson/lib/bson.cjs:2736:25)
at deserializeObject (/app/node_modules/bson/lib/bson.cjs:2754:21)
at deserializeObject (/app/node_modules/bson/lib/bson.cjs:2736:25)
at deserializeObject (/app/node_modules/bson/lib/bson.cjs:2754:21)
at deserializeObject (/app/node_modules/bson/lib/bson.cjs:2736:25)
at internalDeserialize (/app/node_modules/bson/lib/bson.cjs:2587:12)
at Object.deserialize (/app/node_modules/bson/lib/bson.cjs:4063:12)
at BinMsg.parse (/app/node_modules/mongodb/lib/cmap/commands.js:461:54)

Expected Behavior

The system should either:

  1. Validate the input and reject invalid UTF-8 strings with a user-friendly error message.
  2. Clean or normalize the input before saving it to the database.

Actual Behavior

  1. The system throws a BSONError and fails to save the data to MongoDB.

Possible Solutions

  • Input Validation: Implement input validation to ensure only valid UTF-8 strings are accepted.
  • User Feedback: Provide clear error messages to users when invalid input is detected.
  • Data Cleaning: Clean or normalize the input to remove or replace invalid characters before saving to the database.
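
The validation and cleaning ideas above could be sketched as follows. This assumes the invalid input reaches the backend as a JavaScript string containing unpaired UTF-16 surrogates, which is one way a string can have no valid UTF-8 encoding. All function and type names here are hypothetical, and it is not confirmed that this is the cause of the error in the stack trace.

```typescript
// Matches a valid surrogate pair first, then any leftover (lone) surrogate.
const SURROGATES = /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g;

// Data cleaning: replace each lone surrogate with U+FFFD, keep valid pairs.
function sanitizeText(input: string): string {
  return input.replace(SURROGATES, (match) =>
    match.length === 2 ? match : "\uFFFD"
  );
}

// Input validation: true when cleaning would not change the string.
function isEncodableAsUtf8(input: string): boolean {
  return sanitizeText(input) === input;
}

type ValidationResult =
  | { ok: true; value: string }
  | { ok: false; error: string };

// User feedback: reject with a readable message instead of failing later.
function validateMessage(input: string): ValidationResult {
  if (!isEncodableAsUtf8(input)) {
    return {
      ok: false,
      error: "Your message contains characters that cannot be stored.",
    };
  }
  return { ok: true, value: input };
}

console.log(validateMessage("hello").ok); // true
console.log(validateMessage("broken \uD83D text").ok); // false
console.log(sanitizeText("a\uD83Db") === "a\uFFFDb"); // true
```

Rejecting and cleaning are alternatives: a backend would typically pick one, depending on whether silently altering user input is acceptable.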

@calycekr added the bug label on Mar 5, 2025
@nsarrazin
Collaborator

Hi! Just for reproduction, could you share the input that triggered this?

@calycekr
Contributor Author

calycekr commented Mar 6, 2025

@nsarrazin Invalid UTF-8 strings seem to be automatically converted when stored somewhere or posted on a web page. It's difficult to actually post a string with issues to GitHub.
The format is as follows. It probably got converted when posted here.

If this can reproduce the issue, that would be great.

1 ÆÄÀÏÀ» ó¸®ÇßÀ¸¸ç 0 ÆÄÀÏÀº ó¸®ÇÏÁö ¸øÇß½À´Ï´Ù.

Removing inherited ACLs on ( C:\Program Files\PostgreSQL\15\data)...
ó¸®µÈ ÆÄÀÏ: C:\Program Files\PostgreSQL\15\data
1 ÆÄÀÏÀ» ó¸®ÇßÀ¸¸ç 0 ÆÄÀÏÀº ó¸®ÇÏÁö ¸øÇß½À´Ï´Ù.

Parent of Data Directory: C:\Program Files\PostgreSQL\15
Logged in user: TEST\db

Called AclCheck(C:\Program Files\PostgreSQL\15\data)
Executing icacls to ensure the TEST\db account can read the path C:\Program Files\PostgreSQL\15\data
Executing batch file 'zok2ge4a.bat'...
ó¸®µÈ ÆÄÀÏ: C:\Program Files\PostgreSQL\15\data
1 ÆÄÀÏÀ» ó¸®ÇßÀ¸¸ç 0 ÆÄÀÏÀº ó¸®ÇÏÁö ¸øÇß½À´Ï´Ù.

Ensuring we can write to the data directory (using icacls) for TEST\db:
Executing batch file 'zok2ge4a.bat'...
ó¸®µÈ ÆÄÀÏ: C:\Program Files\PostgreSQL\15\data
1 ÆÄÀÏÀ» ó¸®ÇßÀ¸¸ç 0 ÆÄÀÏÀº ó¸®ÇÏÁö ¸øÇß½À´Ï´Ù.

Granting full access to NT AUTHORITY\NetworkService on C:\Program Files\PostgreSQL\15\data
Executing batch file 'zok2ge4a.bat'...
ó¸®µÈ ÆÄÀÏ: C:\Program Files\PostgreSQL\15\data
1 ÆÄÀÏÀ» ó¸®ÇßÀ¸¸ç 0 ÆÄÀÏÀº ó¸®ÇÏÁö ¸øÇß½À´Ï´Ù.

Granting full access to CREATOR OWNER on C:\Program Files\PostgreSQL\15\data
Executing batch file 'zok2ge4a.bat'...
ó¸®µÈ ÆÄÀÏ: C:\Program Files\PostgreSQL\15\data
1 ÆÄÀÏÀ» ó¸®ÇßÀ¸¸ç 0 ÆÄÀÏÀº ó¸®ÇÏÁö ¸øÇß½À´Ï´Ù.

Granting full access to SYSTEM on C:\Program Files\PostgreSQL\15\data
Executing batch file 'zok2ge4a.bat'...
ó¸®µÈ ÆÄÀÏ: C:\Program Files\PostgreSQL\15\data
1 ÆÄÀÏÀ» ó¸®ÇßÀ¸¸ç 0 ÆÄÀÏÀº ó¸®ÇÏÁö ¸øÇß½À´Ï´Ù.

Granting full access to Administrators on C:\Program Files\PostgreSQL\15\data
Executing batch file 'zok2ge4a.bat'...
ó¸®µÈ ÆÄÀÏ: C:\Program Files\PostgreSQL\15\data
1 ÆÄÀÏÀ» ó¸®ÇßÀ¸¸ç 0 ÆÄÀÏÀº ó¸®ÇÏÁö ¸øÇß½À´Ï´Ù.

Initializing PostgreSQL database cluster...
Executing batch file 'zok2ge4a.bat'...

Called Die(Failed to initialise the database cluster with initdb)...
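
Since pasted text gets re-encoded on the way to a web page, one way to share the exact input is as a hex dump of its raw bytes. A minimal sketch, where `toHex` and `fromHex` are hypothetical helpers and the four sample bytes are an assumed EUC-KR/CP949 fragment rather than the real captured input:

```typescript
// Render bytes as space-separated, two-digit lowercase hex.
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

// Parse a hex dump produced by toHex back into the original bytes.
function fromHex(hex: string): Uint8Array {
  const parts = hex.trim().split(/\s+/).filter((p) => p.length > 0);
  return Uint8Array.from(parts, (p) => parseInt(p, 16));
}

// Assumed sample bytes; the real input would come from the failing request.
const sample = new Uint8Array([0xc6, 0xc4, 0xc0, 0xcf]);
const dump = toHex(sample);
console.log(dump); // c6 c4 c0 cf
console.log(toHex(fromHex(dump)) === dump); // true
```

A dump like this is plain ASCII, so it survives being posted in a comment unchanged.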

@nsarrazin
Collaborator

Yeah, this doesn't seem to give an error on my end. I'll see if I can reproduce it somehow!


2 participants