Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Piping streams which come from third party servers #74

Open
fkocherovsky opened this issue Sep 11, 2022 · 3 comments
Open

Piping streams which come from third party servers #74

fkocherovsky opened this issue Sep 11, 2022 · 3 comments

Comments

@fkocherovsky
Copy link

fkocherovsky commented Sep 11, 2022

I use addReadStream API to create a ZIP file based on multiple streams. Works as expected when streams are locally created streams. But there is an issue when streams come from third party server and yazl even doesn't detect the issue. Current implementation takes 'first not done entry', pipes it, and on "end" event takes the next not done entry and pipes it ...
When there are many entries to pipe, might pass significant time interval when some next entry/stream is being started handled. Significant means in this case that the third party server decides that some stream is to long idle, and as a result the stream is just closed by the server. And as a result Zip isn't created. that such stream is already aborted. It's what happens when we try to create Zip by yazl based on streams from our S3 server. The solution for such situation is new API which doesn't takes streams like addReadStream does, but takes a function which creates stream just before to start piping this specific stream/file. Something like the following:

ZipFile.prototype.addStreamCreator = function(creator, metadataPath, options) {
   var self = this;
   metadataPath = validateMetadataPath(metadataPath, false);
   if (options == null) options = {};
   var entry = new Entry(metadataPath, false, options);
   self.entries.push(entry);
   entry.setFileDataPumpFunction(async function() {
      creator(metadataPath).then((stream) => {
         entry.state = Entry.FILE_DATA_IN_PROGRESS;
         console.log(`Starting to pump ${metadataPath}`);
         pumpFileDataReadStream(self, entry, stream);
         //pumpEntries(self);
      });
   });
 };

BTW, this is already working and tested function.
Thanks

fkocherovsky added a commit to fkocherovsky/yazl that referenced this issue Sep 11, 2022
Like addReadStream, but instead of getting stream as a parameter, the API gets a function which returns Promise. The Promise returns stream. The need of the API is described in: thejoshwolfe#74
@overlookmotel
Copy link

overlookmotel commented Feb 5, 2023

I can see 2 ways to solve this with yazl's existing interface by implementing some form of "laziness":

1. Lazily add entries

Only call zipfile.addReadStream for each entry once the previous entry is consumed (listen for close event on the previous read stream to know when this has happened).

2. Stream wrapper

Create your own ReadStream wrapper. The wrapper would lazily create the real 3rd party read stream only when the wrapper is first read from, and then pipe the data through. You'd pass the wrapper to yazl.

Hope this helps.

@fkocherovsky
Copy link
Author

We already use yazl with our own code customization in our Prod servers. Actually it's new exported function. The modifications we did are described in the following pull request:
#75
Do you see this code can be officially added to yazl?

@overlookmotel
Copy link

As far as I can see, this package is no longer maintained. Last commit was Dec 2018. Which is a shame, because it's a brilliant piece of work! But hey... obviously we all understand the pressures on open source maintainers.

But I can't imagine there's much chance your PR will ever be merged, hence why I suggested some other ways to solve the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants