[Bug]: init script UTF8 conversion breaks encoding used in database when not in UTF8 #8776

Open
dcdh opened this issue Jun 13, 2024 · 0 comments

dcdh commented Jun 13, 2024

Module

Core

Testcontainers version

1.19.8

Using the latest Testcontainers version?

Yes

Host OS

Windows

Host Arch

x86

Docker version

Client:
 Cloud integration: v1.0.35+desktop.13
 Version:           26.0.0
 API version:       1.45
 Go version:        go1.21.8
 Git commit:        2ae903e
 Built:             Wed Mar 20 15:18:56 2024
 OS/Arch:           windows/amd64
 Context:           default

Server: Docker Desktop 4.29.0 (145265)
 Engine:
  Version:          26.0.0
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.8
  Git commit:       8b79278
  Built:            Wed Mar 20 15:18:01 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

What happened?

I am using the init script feature to create my tables and insert data after my container has started and before my application starts.
It is a very convenient way to initialize my Oracle database for testing purposes.

My database is very old and uses the CP1252 encoding to store data.

To respect this encoding requirement, my init script is encoded in windows-1252.

In practice it fails to start because the encoding of my accented characters is not respected, and they take more space than the column definition allows.
In CP1252 an accented character is stored in one byte, but when I run the script I suspect it occupies more than one byte.
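To make the size difference concrete, here is a minimal sketch that encodes the same accented character with both charsets and compares the byte counts. The class and method names (`AccentByteLength`, `byteLength`) are made up for illustration and are not part of Testcontainers. It only shows the encoded size in each charset, not what a given database actually stores.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Testcontainers.
final class AccentByteLength {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Returns how many bytes the text occupies once encoded with the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String accent = "\u00e9"; // the character é

        // windows-1252 encodes é as the single byte 0xE9.
        System.out.println("windows-1252: " + byteLength(accent, CP1252));
        // UTF-8 encodes é as the two bytes 0xC3 0xA9.
        System.out.println("UTF-8: " + byteLength(accent, StandardCharsets.UTF_8));
    }
}
```

A column sized in bytes for single-byte characters would therefore overflow as soon as the same text is stored in a multi-byte encoding.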

I guess it occurs in this part:

    public static void runInitScript(DatabaseDelegate databaseDelegate, String initScriptPath) {
        try {
            URL resource = Thread.currentThread().getContextClassLoader().getResource(initScriptPath);
            if (resource == null) {
                resource = ScriptUtils.class.getClassLoader().getResource(initScriptPath);
                if (resource == null) {
                    LOGGER.warn("Could not load classpath init script: {}", initScriptPath);
                    throw new ScriptLoadException(
                        "Could not load classpath init script: " + initScriptPath + ". Resource not found."
                    );
                }
            }
            String scripts = IOUtils.toString(resource, StandardCharsets.UTF_8);
            executeDatabaseScript(databaseDelegate, initScriptPath, scripts);
        } catch (IOException e) {
            LOGGER.warn("Could not load classpath init script: {}", initScriptPath);
            throw new ScriptLoadException("Could not load classpath init script: " + initScriptPath, e);
        } catch (ScriptException e) {
            LOGGER.error("Error while executing init script: {}", initScriptPath, e);
            throw new UncategorizedScriptException("Error while executing init script: " + initScriptPath, e);
        }
    }

and specifically here:

    String scripts = IOUtils.toString(resource, StandardCharsets.UTF_8);

The script content is decoded using the UTF_8 charset, so my accented characters end up stored using 2 bytes when my insert SQL command runs.
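One way to check what the decode step does to a windows-1252 file is to decode the same bytes with both charsets and compare the results. This is a sketch using only the JDK. `DecodeMismatchDemo` and `decode` are hypothetical names, and the sample text is invented. What the database finally stores also depends on the driver and the database character set, which this sketch does not cover.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo for illustration only; not part of Testcontainers.
final class DecodeMismatchDemo {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decodes raw script bytes with the given charset, as a script loader would.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "café" as it would sit in a windows-1252 file: é is the single byte 0xE9.
        byte[] raw = "caf\u00e9".getBytes(CP1252);

        String asCp1252 = decode(raw, CP1252);
        String asUtf8 = decode(raw, StandardCharsets.UTF_8);

        // Decoding with the charset the file was written in round-trips the text.
        System.out.println("cp1252 decode matches: " + asCp1252.equals("caf\u00e9"));
        // 0xE9 alone is not valid UTF-8, so the UTF-8 decode cannot reproduce é.
        System.out.println("utf-8 decode matches:  " + asUtf8.equals("caf\u00e9"));
        // Size of the mis-decoded text once re-encoded as UTF-8.
        System.out.println("utf-8 bytes after mis-decode: "
            + asUtf8.getBytes(StandardCharsets.UTF_8).length);
    }
}
```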

Is it possible to change the code to avoid enforcing the charset?
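As a rough illustration of the requested change, the charset could be supplied by the caller instead of being fixed. The sketch below uses only the JDK. `CharsetAwareScriptReader` and `readScript` are hypothetical names and do not exist in Testcontainers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper for illustration only; not part of Testcontainers.
final class CharsetAwareScriptReader {

    // Reads the whole stream as text, decoding it with the caller-supplied charset.
    static String readScript(InputStream in, Charset charset) {
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder out = new StringBuilder();
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                out.append(buffer, 0, read);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // Stand-in for the bytes of a windows-1252 encoded init script.
        byte[] raw = "INSERT INTO t VALUES ('caf\u00e9')".getBytes(cp1252);

        String script = readScript(new ByteArrayInputStream(raw), cp1252);
        System.out.println(script.equals("INSERT INTO t VALUES ('caf\u00e9')"));
    }
}
```

With a classpath resource, `resource.openStream()` would provide the input stream. The resulting string could then be handed to the `executeDatabaseScript(databaseDelegate, initScriptPath, scripts)` call that appears in the snippet above. Whether that call is usable from test code in a given Testcontainers version is an assumption to verify.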

Regards,

Damien

Relevant log output

No response

Additional Information

No response

@dcdh dcdh added the type/bug label Jun 13, 2024
@dcdh dcdh changed the title from "[Bug]:" to "[Bug]: init script UTF8 conversion breaks encoding used in database when not in UTF8" Jun 13, 2024