
[Bug]: Can't copy large files to container #1180

Open
Raniz85 opened this issue May 17, 2024 · 4 comments
Labels
bug Something isn't working

Comments

Raniz85 commented May 17, 2024

Testcontainers version

3.8.0

Using the latest Testcontainers version?

Yes

Host OS

Windows

Host arch

x86

.NET version

8.0.3

Docker version

Client:       Podman Engine
Version:      4.9.2
API Version:  4.9.2
Go Version:   go1.21.6
Git Commit:   f9a48ebcfa9a39144be0f86f4ba842752835f945
Built:        Sat Feb  3 00:29:04 2024
OS/Arch:      windows/amd64

Server:       Podman Engine
Version:      4.7.0
API Version:  4.7.0
Go Version:   go1.20.8
Built:        Wed Sep 27 20:24:38 2023
OS/Arch:      linux/amd64

Docker info

host:
  arch: amd64
  buildahVersion: 1.32.0
  cgroupControllers: []
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.7-2.fc38.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 99.76
    systemPercent: 0.07
    userPercent: 0.17
  cpus: 16
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: container
    version: "38"
  eventLogger: journald
  freeLocks: 2036
  hostname: mlse2068
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
  kernel: 5.15.146.1-microsoft-standard-WSL2
  linkmode: dynamic
  logDriver: journald
  memFree: 30362726400
  memTotal: 33512914944
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.8.0-1.fc38.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.8.0
    package: netavark-1.8.0-2.fc38.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.8.0
  ociRuntime:
    name: crun
    package: crun-1.9.2-1.fc38.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.9.2
      commit: 35274d346d2e9ffeacb22cc11590b0266a23d634
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20231004.gf851084-1.fc38.x86_64
    version: |
      pasta 0^20231004.gf851084-1.fc38.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.1-1.fc38.x86_64
    version: |-
      slirp4netns version 1.2.1
      commit: 09e31e92fa3d2a1d3ca261adaeb012c8d75a8194
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 8589934592
  swapTotal: 8589934592
  uptime: 25h 7m 1.00s (Approximately 1.04 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/user/.local/share/containers/storage
  graphRootAllocated: 1081101176832
  graphRootUsed: 31850049536
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 54
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 4.7.0
  Built: 1695839078
  BuiltTime: Wed Sep 27 20:24:38 2023
  GitCommit: ""
  GoVersion: go1.20.8
  Os: linux
  OsArch: linux/amd64
  Version: 4.7.0

What happened?

Trying to use a large file with .WithResourceMapping results in an IOException stating that the stream is too long. The file I'm copying is a few gigabytes in size.
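For reference, a minimal sketch of the kind of setup that triggers this (the image, paths, and file name are illustrative placeholders, not taken from the report):

using System.IO;
using DotNet.Testcontainers.Builders;

// Map a multi-gigabyte file into the container. Testcontainers packs the
// resource into an in-memory tarball before uploading it, which is where
// the failure occurs.
var container = new ContainerBuilder()
    .WithImage("alpine:3.19")
    .WithResourceMapping(new FileInfo(@"C:\data\large-file.bin"), "/data/")
    .Build();

// Throws System.IO.IOException: Stream was too long.
await container.StartAsync();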

Relevant log output

System.IO.IOException: Stream was too long.
   at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
   at System.IO.MemoryStream.WriteAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken)
--- End of stack trace from previous location ---
   at ICSharpCode.SharpZipLib.Tar.TarBuffer.WriteRecordAsync(CancellationToken ct, Boolean isAsync)
   at ICSharpCode.SharpZipLib.Tar.TarBuffer.WriteBlockAsync(Byte[] buffer, Int32 offset, CancellationToken ct, Boolean isAsync)
   at ICSharpCode.SharpZipLib.Tar.TarOutputStream.WriteAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken, Boolean isAsync)
   at System.IO.Stream.<CopyToAsync>g__Core|27_0(Stream source, Stream destination, Int32 bufferSize, CancellationToken cancellationToken)
   at System.IO.Strategies.BufferedFileStreamStrategy.CopyToAsyncCore(Stream destination, Int32 bufferSize, CancellationToken cancellationToken)
   at DotNet.Testcontainers.Containers.TarOutputMemoryStream.AddAsync(DirectoryInfo directory, FileInfo file, UnixFileModes fileMode, CancellationToken ct) in /_/src/Testcontainers/Containers/TarOutputMemoryStream.cs:line 145

Additional information

No response

Raniz85 added the bug label May 17, 2024
HofmeisterAn (Collaborator)

Thanks for creating the issue. It looks like it fails when copying the file stream content to the underlying SharpZipLib stream:

await stream.CopyToAsync(this, 81920, ct)
    .ConfigureAwait(false);

How large are a few gigabytes? It should not be difficult to reproduce, I guess 😬. I can try to reproduce it later today.

HofmeisterAn (Collaborator)

I did not remember the actual implementation, but after spending a few minutes looking at it, the exception you are seeing makes sense. The maximum size of a MemoryStream (which is backed by a byte array) is approximately 2 GB (2,147,483,591 bytes, the largest allocatable byte array).

using System.IO;
using System.Threading;
using System.Threading.Tasks;
using DotNet.Testcontainers.Configurations;
using DotNet.Testcontainers.Containers;
using Microsoft.Extensions.Logging.Abstractions;
using Xunit;

public sealed class GitHub : IResourceMapping
{
    public MountType Type => MountType.Tmpfs;

    public AccessMode AccessMode => AccessMode.ReadOnly;

    public string Source => "foo";

    public string Target => "foo";

    UnixFileModes IResourceMapping.FileMode => Unix.FileMode755;

    [Fact]
    public async Task Issue1180()
    {
        // Writing a max-size payload through TarOutputMemoryStream into a
        // MemoryStream reproduces "Stream was too long".
        using var memoryStream = new MemoryStream();
        // using var fileStream = new FileStream(Target, FileMode.CreateNew, FileAccess.Write, FileShare.Read);
        using var tarOutputMemoryStream = new TarOutputMemoryStream(memoryStream, NullLogger.Instance);
        await tarOutputMemoryStream.AddAsync(this);
    }

    Task IFutureResource.CreateAsync(CancellationToken ct)
    {
        return Task.CompletedTask;
    }

    Task IFutureResource.DeleteAsync(CancellationToken ct)
    {
        return Task.CompletedTask;
    }

    Task<byte[]> IResourceMapping.GetAllBytesAsync(CancellationToken ct)
    {
        // The largest allocatable byte array: https://learn.microsoft.com/en-us/dotnet/api/system.array
        const int maxArrayDimension = 2147483591;
        return Task.FromResult(new byte[maxArrayDimension]);
    }
}

This example demonstrates it very well. If you change the MemoryStream to a FileStream (you will need to adjust the TarOutputMemoryStream ctor), the issue no longer occurs. Storing that amount of data in memory does not make a lot of sense anyway. Streaming it would be more efficient, but at this point I have no idea how to forward the data internally without taking a closer look at it. Right now, I assume supporting files larger than 2 GB (total tarball size) will require more work than I initially thought.
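As a rough sketch of the file-backed approach (this uses SharpZipLib's TarOutputStream directly over a temporary FileStream; the helper name and temp-file handling are illustrative, not the actual Testcontainers implementation):

using System.IO;
using System.Text;
using ICSharpCode.SharpZipLib.Tar;

public static class LargeFilePackager
{
    // Packs a single file into a tarball on disk, streaming it in 80 KB
    // chunks so nothing close to 2 GB is ever held in memory.
    public static string PackToTempTarball(FileInfo source, string entryName)
    {
        var tarballPath = Path.GetTempFileName();

        using var fileStream = new FileStream(tarballPath, FileMode.Create, FileAccess.Write, FileShare.Read);
        using var tarStream = new TarOutputStream(fileStream, Encoding.UTF8);

        var entry = TarEntry.CreateTarEntry(entryName);
        entry.Size = source.Length;
        tarStream.PutNextEntry(entry);

        using var sourceStream = source.OpenRead();
        sourceStream.CopyTo(tarStream, 81920);
        tarStream.CloseEntry();

        return tarballPath;
    }
}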

Raniz85 (Author) commented May 17, 2024

I've only glanced at the implementation while debugging the issue, but would it be possible to determine the size of the files up front and then choose either a file-backed or a memory-backed stream based on that?

HofmeisterAn (Collaborator)

> is it possible to determine the size of the files and then choose either file or memory based on that?

I do not think that is an appropriate fix. Writing it to a file and then reading it again won't be very performant. I think it is better to properly support and forward a stream.
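For what it's worth, the underlying Docker API already accepts an arbitrary Stream, so the forwarding idea could look roughly like this (a sketch using Docker.DotNet's ExtractArchiveToContainerAsync; the helper name and parameters are illustrative):

using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Docker.DotNet;
using Docker.DotNet.Models;

public static class ArchiveUploader
{
    // Forwards a file-backed tarball straight to the daemon; because the
    // payload stays a Stream end to end, the 2 GB array limit never applies.
    public static async Task UploadAsync(IDockerClient client, string containerId, string tarballPath, CancellationToken ct)
    {
        var parameters = new ContainerPathStatParameters { Path = "/" };

        using var tarStream = new FileStream(tarballPath, FileMode.Open, FileAccess.Read, FileShare.Read);

        await client.Containers.ExtractArchiveToContainerAsync(containerId, parameters, tarStream, ct)
            .ConfigureAwait(false);
    }
}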
