Memory leak in DirectoryListing / PHP Iterator? #1856

Open
@Yinci

Bug Report

| Q | A |
|---|---|
| Flysystem Version | 3.29.1 |
| Adapter Name | flysystem-aws-s3-v3 |
| Adapter version | 3.29.0 |
| AWS SDK | 3.339.10 |
| Laravel Framework | 10.48.28 |
| Spatie Laravel Media Library | 10.15.0 |
| PHP | 8.1 |

Summary

In short, we have a long-running Laravel project with a lot of media (60k+ rows and counting). The media undergo conversions via the Media Library package from Spatie, and all media is stored in S3. We use the package's clean command to remove deprecated conversions. To do this, the underlying code retrieves the stored paths, which ends up calling the `files` method of the `FilesystemAdapter` provided by Laravel. The code is probably familiar:

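```php
// Illuminate\Filesystem\FilesystemAdapter::files() (Laravel 10)
public function files($directory = null, $recursive = false)
{
    return $this->driver->listContents($directory ?? '', $recursive)
        // keep files only, dropping directory entries
        ->filter(function (StorageAttributes $attributes) {
            return $attributes->isFile();
        })
        // sorting needs the complete listing
        ->sortByPath()
        // reduce each entry to its path string
        ->map(function (StorageAttributes $attributes) {
            return $attributes->path();
        })
        ->toArray();
}
```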

`listContents` returns a `DirectoryListing` instance, which is filtered, sorted and mapped, and finally converted into an array of path strings by `toArray`. Every time `toArray` is called (that is, every time the iterator's contents are materialized into an array), memory usage goes up and is not released afterwards. The increase per call is small (a few kilobytes), but with this amount of data it quickly adds up to a large amount of memory.
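
To check whether the filter → sort → map → `toArray` chain by itself holds on to memory, independent of S3, the AWS SDK and Laravel, one option is to measure it in isolation. The sketch below is a simplified, hypothetical stand-in: `FakeListing` only roughly imitates the shape of the `DirectoryListing` calls used by `files()` (with `filter()`/`map()` assumed lazy, and a `sortByPath()` that has to collect every entry); it is not Flysystem's actual implementation. `listFilePaths()` and `measureGrowth()` are likewise illustrative helpers.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in that imitates the shape of the DirectoryListing calls
 * used by files(). It is NOT Flysystem's implementation.
 */
final class FakeListing implements IteratorAggregate
{
    /** @param iterable<array{path: string, file: bool}|string> $items */
    public function __construct(private iterable $items)
    {
    }

    /** Lazily keep only the items accepted by $keep. */
    public function filter(callable $keep): self
    {
        return new self((static function (iterable $items) use ($keep): Generator {
            foreach ($items as $item) {
                if ($keep($item)) {
                    yield $item;
                }
            }
        })($this->items));
    }

    /** Lazily transform every item with $fn. */
    public function map(callable $fn): self
    {
        return new self((static function (iterable $items) use ($fn): Generator {
            foreach ($items as $item) {
                yield $fn($item);
            }
        })($this->items));
    }

    /** Sorting needs every entry, so the listing is collected into an array here. */
    public function sortByPath(): self
    {
        $all = $this->toArray();
        usort($all, static fn (array $a, array $b): int => $a['path'] <=> $b['path']);

        return new self($all);
    }

    public function getIterator(): Traversable
    {
        return is_array($this->items) ? new ArrayIterator($this->items) : $this->items;
    }

    public function toArray(): array
    {
        return is_array($this->items) ? $this->items : iterator_to_array($this->items, false);
    }
}

/** Mirrors files(): keep files, sort by path, return the path strings. */
function listFilePaths(array $raw): array
{
    return (new FakeListing($raw))
        ->filter(static fn (array $entry): bool => $entry['file'])
        ->sortByPath()
        ->map(static fn (array $entry): string => $entry['path'])
        ->toArray();
}

/** Illustrative helper: average bytes still allocated per call after one warm-up call. */
function measureGrowth(callable $fn, int $runs): float
{
    $fn(); // warm-up, comparable to the first files() call in the reproduction
    gc_collect_cycles();
    $start = memory_get_usage();

    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    gc_collect_cycles();

    return (memory_get_usage() - $start) / (float) $runs;
}

// Fake listing: 1,000 entries in reverse order; every tenth one is a directory.
$raw = [];
for ($i = 0; $i < 1000; $i++) {
    $raw[] = ['path' => sprintf('1/conversions/%04d.jpg', 999 - $i), 'file' => $i % 10 !== 0];
}

printf("%.1f bytes retained per call\n", measureGrowth(static fn () => listFilePaths($raw), 100));
```

If this number stays near zero while the real `files()` call keeps growing, that would suggest looking at the adapter or SDK layers rather than the iterator pipeline; it is a way to narrow things down, not a diagnosis.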

As you might expect, the command eventually runs out of memory.

I've tried to find a fix, but I haven't been able to work out why the memory allocated during the `toArray` call is never released, so I can't say whether this is really a Flysystem issue or perhaps a native PHP one. For now I've implemented a workaround: the work is split into chunks, the process exits after each chunk (which frees its memory), and a new process is started that continues where the previous one left off. This is not an ideal solution, though. Any help would be appreciated.
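
The chunking workaround can be sketched roughly as follows, assuming the work can be expressed as an ordered list of items (for example media IDs). `runChunk()`, the cursor file and the chunk size are hypothetical illustrations, not part of Laravel, Flysystem or Spatie's package.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Spatie's clean command): handles the next
 * $chunkSize items and stores the new offset in $cursorFile, so that a fresh
 * PHP process can resume where the previous one stopped.
 * Returns true while items remain.
 */
function runChunk(array $items, string $cursorFile, int $chunkSize, callable $handle): bool
{
    // Resume from the saved offset, or start at 0 on the first run.
    $offset = is_file($cursorFile) ? (int) file_get_contents($cursorFile) : 0;

    foreach (array_slice($items, $offset, $chunkSize) as $item) {
        $handle($item);
    }

    // Persist progress before the process exits and releases its memory.
    $next = min($offset + $chunkSize, count($items));
    file_put_contents($cursorFile, (string) $next);

    return $next < count($items);
}

// Single-process demo; in production each runChunk() call would be a separate process.
$cursor = sys_get_temp_dir() . '/chunk-cursor-demo.txt';
if (is_file($cursor)) {
    unlink($cursor); // start from scratch
}

$seen = [];
$runs = 0;
do {
    $runs++;
    $more = runChunk(range(1, 25), $cursor, 10, function (int $id) use (&$seen): void {
        $seen[] = $id; // stand-in for cleaning one media item
    });
} while ($more);
```

In real use, a wrapper such as a shell loop or a scheduler would start a new PHP process for each `runChunk()` call. Whatever memory a chunk accumulated is returned when its process exits, which is why this sidesteps the growth rather than fixing it.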

How to reproduce

(Laravel-based snippet)

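```php
use Illuminate\Support\Facades\Storage;

$before = memory_get_usage();

// First call (may include one-time setup of the "media" disk)
Storage::disk("media")->files("1/conversions");

$mid = memory_get_usage();

// Repeat the same listing 100 times
for ($i = 0; $i < 100; $i++) {
    Storage::disk("media")->files("1/conversions");
}

// Dump all three readings (in bytes) and stop
dd($before, $mid, memory_get_usage());
```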

Output (`$before`, `$mid` and the final reading, in bytes):

```
17822728
19260872
19915000
```
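
Working through these three readings (in bytes) gives the growth per call:

```latex
\begin{aligned}
\text{first call:}\quad & 19\,260\,872 - 17\,822\,728 = 1\,438\,144 \\
\text{next 100 calls:}\quad & 19\,915\,000 - 19\,260\,872 = 654\,128 \\
\text{per repeated call:}\quad & 654\,128 / 100 \approx 6\,541
\end{aligned}
```

The first call probably also pays one-time costs such as resolving the disk and creating the S3 client, so the loop figure is the better estimate of the growth per listing. If that growth stays linear, 10,000 further calls would add roughly 65 MB.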
