Process input in streaming mode #5

Open
krackers opened this issue Aug 22, 2023 · 4 comments

Comments

krackers commented Aug 22, 2023

Currently pyp blocks until it has read the entire input before it begins processing. This means it cannot be used in a streaming fashion, for example with yes | pyp "p". As long as the command uses only p and not pp, we should process the input as a stream instead. It might also be a good idea to use generators for pp, so that operations like getting the last element don't require materializing the entire list in memory.
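A minimal sketch of the streaming idea, not pyp's actual code. The helper names `stream_lines`, `apply_per_line`, and `endless_yes` are hypothetical. Each line is handled as soon as it arrives, so a per-line expression can run on unbounded input such as the output of `yes`.

```python
import itertools
from typing import Callable, Iterable, Iterator


def stream_lines(source: Iterable[str]) -> Iterator[str]:
    """Yield lines one at a time instead of reading the whole input first.

    `source` can be sys.stdin or any other iterable of text lines.
    """
    for raw in source:
        yield raw.rstrip("\n")


def apply_per_line(lines: Iterable[str], expr: Callable[[str], str]) -> Iterator[str]:
    """Lazily apply a per-line expression (the role `p` plays in pyp)."""
    for p in lines:
        yield expr(p)


def endless_yes() -> Iterator[str]:
    """Stand-in for the output of `yes`: an unbounded stream of "y" lines."""
    return itertools.repeat("y\n")


if __name__ == "__main__":
    # Only three lines are ever pulled from the infinite source.
    results = apply_per_line(stream_lines(endless_yes()), str.upper)
    for line in itertools.islice(results, 3):
        print(line)
```

Because every stage is a generator, nothing is buffered beyond the current line. Passing `sys.stdin` as `source` would give the same behaviour on a real pipe.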

It seems this can be done by moving process_inputs inside the process_master_switch, which already has a branch for the two modes.

Edit: I also found that by default pyp saves the input to a temp file, which is used for the rerun feature. Besides being incompatible with streaming mode, this could leak sensitive data. It would be better to add an explicit option to store the input on disk (maybe -w for --write?), or otherwise to enable this feature only when pyp is invoked with a tty stdin, along with a message letting the user know that the input was saved to a temp file.
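One possible reading of this proposal, as a rough sketch rather than pyp's real implementation. The names `should_save_input`, `open_rerun_file`, and `tee_to_file` are hypothetical, as is the `pyp_rerun_` file prefix. Here the two suggested conditions are combined with "or", and lines are copied to disk as they stream past, so saving does not force the whole input into memory.

```python
import os
import sys
import tempfile
from typing import Iterable, Iterator, TextIO, Tuple


def should_save_input(write_flag: bool, stdin_is_tty: bool) -> bool:
    """Save only on explicit request (-w/--write) or when stdin is a terminal."""
    return write_flag or stdin_is_tty


def open_rerun_file(notify: TextIO) -> Tuple[str, TextIO]:
    """Create a temp file for the rerun feature and tell the user where it is."""
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(prefix="pyp_rerun_", suffix=".txt", text=True)
    print(f"pyp: input saved to {path}", file=notify)
    return path, os.fdopen(fd, "w")


def tee_to_file(lines: Iterable[str], sink: TextIO) -> Iterator[str]:
    """Pass lines through unchanged while writing a copy of each to `sink`."""
    for line in lines:
        sink.write(line)
        yield line


if __name__ == "__main__":
    lines: Iterable[str] = sys.stdin
    # `write_flag` would come from argument parsing; hard-coded for the sketch.
    if should_save_input(write_flag=False, stdin_is_tty=sys.stdin.isatty()):
        _, sink = open_rerun_file(sys.stderr)
        lines = tee_to_file(lines, sink)
    for line in lines:
        sys.stdout.write(line)
```

With piped input and no flag, nothing touches the disk; with `-w` the copy is written incrementally, which stays compatible with streaming.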

Author

krackers commented Aug 26, 2023

@thepyedpiper I rewrote pyp to use generators everywhere. https://gist.github.com/krackers/f73486bf2f625b9f39f33298d33b8932

From my hasty testing, everything seems to work. The only part I disabled is fpp and spp, since those apparently require knowing the size of the input a priori (it should still be possible to handle them; I just don't use them myself).

It seems to work quite nicely, and it can do some very cool things like

seq 1 100000 | pyp "pp[-1]"

without blowing up your memory usage. I'm actually surprised it works so well, with relatively few changes.
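To illustrate why `pp[-1]` can work on a generator, here is a small sketch; it is not the code from the gist. The class `LazyLines` is hypothetical. For a negative index it keeps only the last few items in a bounded `collections.deque`, so memory stays constant regardless of input length.

```python
from collections import deque
from itertools import islice
from typing import Iterable, Iterator


class LazyLines:
    """One-shot, generator-backed stand-in for `pp` (hypothetical)."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._it: Iterator[str] = iter(lines)

    def __iter__(self) -> Iterator[str]:
        return self._it

    def __getitem__(self, index: int) -> str:
        if index >= 0:
            # Skip `index` items, then return the next one if it exists.
            for item in islice(self._it, index, index + 1):
                return item
            raise IndexError("index out of range")
        # Negative index: remember only the last `-index` items seen.
        tail = deque(self._it, maxlen=-index)
        if len(tail) < -index:
            raise IndexError("index out of range")
        return tail[0]


if __name__ == "__main__":
    pp = LazyLines(str(n) for n in range(1, 100001))
    print(pp[-1])  # prints 100000
```

The trade-off is that indexing consumes the stream, so each `LazyLines` object can be indexed only once. Slices are not handled in this sketch.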

Owner

thepyedpiper commented Aug 29, 2023 via email

Author

krackers commented Aug 30, 2023

Sure, here's some quantitative data:

Command run: seq 1 1000000 | pyp "pp[-1]"

My version: < 20 MB memory used

   usr time    3.28 secs    0.15 millis    3.28 secs
   sys time    0.04 secs    2.19 millis    0.04 secs

Original version: 1 GB memory used

   usr time    6.85 secs    0.13 millis    6.85 secs
   sys time    0.28 secs    2.10 millis    0.28 secs
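For anyone who wants to see the same effect in isolation, the sketch below compares peak Python allocations for the two strategies using `tracemalloc`. It does not run pyp and will not reproduce the figures above. All function names are hypothetical.

```python
import tracemalloc
from collections import deque
from typing import Callable, Iterator


def numbered_lines(count: int) -> Iterator[str]:
    """Generate the same lines as `seq 1 count`, lazily."""
    return (str(n) for n in range(1, count + 1))


def last_via_list(count: int) -> str:
    """Materialize every line, then index (the eager approach)."""
    return list(numbered_lines(count))[-1]


def last_via_stream(count: int) -> str:
    """Keep only the most recent line while consuming the stream."""
    return deque(numbered_lines(count), maxlen=1)[0]


def peak_bytes(func: Callable[[int], str], count: int) -> int:
    """Return the peak traced allocation, in bytes, while `func` runs."""
    tracemalloc.start()
    try:
        func(count)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    n = 1_000_000
    print("list  :", peak_bytes(last_via_list, n), "bytes")
    print("stream:", peak_bytes(last_via_stream, n), "bytes")
```

`tracemalloc` counts only allocations made by the Python interpreter, so the absolute numbers differ from process-level memory usage; the gap between the two strategies is the point.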

> I wonder if it's easy to implement an alternative solution when those are employed.

It should be possible. In the worst case, the input can be treated as an array instead of a generator when those are used. I just don't use them myself, so I haven't implemented it, nor would I know what to test for.
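A sketch of that worst-case fallback, under the assumption that the caller can tell up front whether the command needs the input size. The function `prepare_input` and its `needs_length` parameter are hypothetical and are not taken from pyp.

```python
from typing import Iterable, Iterator, List, Union


def prepare_input(
    lines: Iterable[str], needs_length: bool
) -> Union[List[str], Iterator[str]]:
    """Return a list only when the size must be known; otherwise stay lazy."""
    if needs_length:
        # Worst case: read everything so len() and repeated passes work.
        return list(lines)
    # Common case: hand back a one-shot iterator and keep memory flat.
    return iter(lines)


if __name__ == "__main__":
    eager = prepare_input((str(n) for n in range(3)), needs_length=True)
    lazy = prepare_input((str(n) for n in range(3)), needs_length=False)
    print(len(eager))  # the list knows its size
    print(next(lazy))  # the iterator yields items on demand
```

This keeps the streaming path as the default and pays the memory cost only for commands that cannot work without the full input.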

Owner

thepyedpiper commented Aug 30, 2023 via email
