
Celery TypeError "cannot serialize '_io.BufferedReader' object" #148

Closed
arolling opened this issue Jul 28, 2016 · 35 comments
@arolling

I just updated my Wooey install to today's new version and immediately hit a problem. Any script I try to run returns an error (with live updating, which is wonderful!) and in the Celery worker window I get the following message:

[2016-07-28 09:16:07,984: INFO/MainProcess] Received task: wooey.tasks.submit_script[add3df58-d675-44f0-8380-6ac85089df08]
[2016-07-28 09:16:08,121: ERROR/MainProcess] Task wooey.tasks.submit_script[add3df58-d675-44f0-8380-6ac85089df08] raised unexpected: TypeError("cannot serialize '_io.BufferedReader' object",)
Traceback (most recent call last):
  File "C:\Users\rollinga\AppData\Local\Programs\Python\Python35-32\lib\site-packages\celery\app\trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "C:\Users\rollinga\AppData\Local\Programs\Python\Python35-32\lib\site-packages\celery\app\trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "C:\Users\rollinga\AppData\Local\Programs\Python\Python35-32\lib\site-packages\wooey\tasks.py", line 109, in submit_script
    qout, pout = output_monitor_queue(proc.stdout)
  File "C:\Users\rollinga\AppData\Local\Programs\Python\Python35-32\lib\site-packages\wooey\tasks.py", line 41, in output_monitor_queue
    p.start()
  File "C:\Users\rollinga\AppData\Local\Programs\Python\Python35-32\lib\site-packages\billiard\process.py", line 137, in start
    self._popen = Popen(self)
  File "C:\Users\rollinga\AppData\Local\Programs\Python\Python35-32\lib\site-packages\billiard\forking.py", line 263, in __init__
    dump(process_obj, to_child, HIGHEST_PROTOCOL)
  File "C:\Users\rollinga\AppData\Local\Programs\Python\Python35-32\lib\site-packages\billiard\py3\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot serialize '_io.BufferedReader' object
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\rollinga\AppData\Local\Programs\Python\Python35-32\lib\site-packages\billiard\forking.py", line 459, in main
    self = load(from_parent)
EOFError: Ran out of input

I tried downgrading my Celery install from 3.1.23, first to 3.1.15 then to 3.1.12 (which is what I had before the Wooey upgrade) to no avail. Do you have any suggestions?

@Chris7
Member

Chris7 commented Jul 28, 2016

Could you try upgrading billiard and kombu? Otherwise can I get the output
of pip freeze and the script you are running?


@arolling
Author

This happens for every single script I run, my own as well as the example scripts, including the "sum" script in the "Adding and Managing Scripts" part of the documentation. I just tried upgrading and I have the latest versions. Pip freeze:

amqp==1.4.9
anyjson==0.3.3
billiard==3.3.0.23
celery==3.1.23
clinto==0.1.3
colorama==0.3.7
cycler==0.10.0
Django==1.9.8
django-autoslug==1.9.3
django-celery==3.1.17
honcho==0.7.1
Jinja2==2.8
kombu==3.0.35
MarkupSafe==0.23
matplotlib==1.5.1
numpy==1.11.0
pyparsing==2.1.5
pyserial==3.1.1
python-dateutil==2.5.3
pytz==2016.6.1
requests==2.10.0
six==1.10.0
virtualenv==15.0.2
wooey==0.9.3

@Chris7
Member

Chris7 commented Jul 28, 2016

Thank you, I'll see if I can replicate this with your setup.

@arolling
Author

Thanks! In case it's relevant, in order to get it working on my Windows 10 machine my Procfile reads

web: python manage.py runserver
timing: python manage.py celery beat
worker: python manage.py celery worker -c 1 -l info --pool=solo

@Chris7
Member

Chris7 commented Jul 28, 2016

I am unable to reproduce it on a linux machine. I'll check out python on a windows VM later today. What python distribution are you running?

@arolling
Author

python 3.5.1

@Chris7
Member

Chris7 commented Jul 28, 2016

Just got the same crash on windows 10 python 3.5. I'll see if I can find a workaround.

@Chris7
Member

Chris7 commented Jul 28, 2016

The issue is python 3.5 on windows cannot pickle subprocess.PIPE for capturing stdout/stderr. I don't know an immediate workaround for this.
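The failure can be reproduced without Celery or Wooey at all. On Windows there is no `fork`, so billiard's `ForkingPickler` must pickle the process object to send it to the child, and `pickle` refuses any `_io.BufferedReader`, which is exactly what a subprocess pipe is. A minimal sketch (the child command here is just an illustration):

```python
import pickle
import subprocess
import sys

# Any buffered binary reader -- a file object or a subprocess pipe -- is unpicklable.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE,
)
try:
    pickle.dumps(proc.stdout)  # the same serialization billiard attempts on Windows
    error = None
except TypeError as exc:
    error = exc
finally:
    proc.communicate()  # drain the pipe and reap the child

print(error)  # e.g. cannot pickle '_io.BufferedReader' object (wording varies by version)
```

On Linux this never comes up because `fork` gives the child the parent's open file descriptors directly, with no pickling step.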

@arolling
Author

That is extremely disappointing. I was trying to Google to figure out what this means, and I came across this StackOverflow question that seems to be addressing the same problem, but I doubt the solution transfers very well.

@Chris7
Member

Chris7 commented Jul 31, 2016

Are you able to test this PR:

#149

It works on my windows test, but I'd like to verify it works for you first if possible

@arolling
Author

arolling commented Aug 1, 2016

Well, good news: no crashes, no errors, and dynamic updating. Bad news: jobs never end, they just execute into infinity and don't complete. Looking at the folder where results are saved, it seems to not be reaching the step where it saves things to zip and tar packages. I did try making the thread a daemon, hoping it would close on its own, but that didn't seem to work.

I don't know if this is related - I've been assuming it's a Celery bug, and they no longer officially support Windows - but after clicking "Stop" on the job page I am forced to close and re-start my Celery worker before it will take any additional jobs.

Edit:
Sorry, I didn't look hard enough for errors. In the window that shows the results of runserver, this error keeps recurring:

[01/Aug/2016 10:12:44] "GET /jobs/165/jsonhtml HTTP/1.1" 500 17661
Internal Server Error: /jobs/165/jsonhtml
Traceback (most recent call last):
  File "C:\Users\rollinga\AppData\Local\Programs\Python\Python35-32\lib\site-packages\django\core\handlers\base.py", line 149, in get_response
    response = self.process_exception_by_middleware(e, request)
  File "C:\Users\rollinga\AppData\Local\Programs\Python\Python35-32\lib\site-packages\django\core\handlers\base.py", line 147, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "C:\Users\rollinga\AppData\Local\Programs\Python\Python35-32\lib\site-packages\django\views\generic\base.py", line 68, in view
    return self.dispatch(request, *args, **kwargs)
  File "C:\Users\rollinga\AppData\Local\Programs\Python\Python35-32\lib\site-packages\django\views\generic\base.py", line 88, in dispatch
    return handler(request, *args, **kwargs)
  File "C:\Users\rollinga\AppData\Local\Programs\Python\Python35-32\lib\site-packages\django\views\generic\detail.py", line 119, in get
    return self.render_to_response(context)
  File "C:\Users\rollinga\AppData\Local\Programs\Python\Python35-32\lib\site-packages\wooey\views\wooey_celery.py", line 222, in render_to_response
    for output_group, output_files in context['job_info']['file_groups'].items():
KeyError: 'job_info'
NoneType

@Chris7
Member

Chris7 commented Aug 3, 2016

It's actually the line ending that is causing the problem in this bit of code:
for line in iter(out.readline, b''):

I'm looking into a cross-platform solution now and fixing our appveyor windows CI to reproduce this error (since oddly it was passing tests)
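To make the `iter(out.readline, b'')` pattern concrete: the loop keeps calling `readline()` until it returns the empty-bytes sentinel at EOF, and on Windows each line arrives with a `\r\n` ending. A small sketch, using `io.BytesIO` as a stand-in for the subprocess pipe (an assumption for illustration only):

```python
import io

# io.BytesIO stands in for proc.stdout; a real pipe's readline() behaves the same way.
out = io.BytesIO(b"step 1\r\nstep 2\r\n")

# iter(callable, sentinel) keeps calling out.readline() until it returns b'' at EOF.
lines = list(iter(out.readline, b""))
print(lines)  # [b'step 1\r\n', b'step 2\r\n']

# Stripping both \r and \n before decoding avoids stray carriage returns on Windows.
decoded = [line.rstrip(b"\r\n").decode() for line in lines]
print(decoded)  # ['step 1', 'step 2']
```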

@Chris7 Chris7 added this to the 0.9.4 milestone Aug 3, 2016
@Chris7
Member

Chris7 commented Aug 3, 2016

@arolling can you give it another test please? I think the latest changes in #149 will fix it.

@arolling
Author

arolling commented Aug 3, 2016

I've tested this and I get no more errors! One odd quirk is that my scripts have several status updates that they print to stdout when they hit various points in their 3-10 minute run time and I'm finding that only the first two or three ever show up, and when they do it is at quite a lag (2-3 minutes). When the job is finished, only the messages that showed up on the console are displayed on the "Results" page. Is this expected behavior?

@Chris7
Member

Chris7 commented Aug 3, 2016

That's likely due to buffering. How continuous are the messages? Could you elaborate on this statement as well: When the job is finished, only the messages that showed up on the console are displayed on the "Results" page?

@arolling
Author

arolling commented Aug 3, 2016

So the messages should come in bunches - two together at once, then a third a second later, then three or four seconds before the fourth, then 60 seconds until the next, then 60 seconds, then 2-5 minutes, then several at once, then the job completes. I'm only getting the first two, and those only show up when the job is almost done. The rest of the messages never show up; they aren't stored anywhere as part of the job information, so if they don't appear on the console as it is running, they get dropped and not saved.

@Chris7
Member

Chris7 commented Aug 4, 2016

Ok, definitely shouldn't be like that. Any way I can see the script you are running? I tested with the 8 queens problem script and the output from that was as intended.

@arolling
Author

arolling commented Aug 4, 2016

I can't send you the script because it's driving a piece of hardware that you don't have. However, I put together a quick math-with-pauses script and had the same (partial) result.

import argparse
import sys
import time


parser = argparse.ArgumentParser(description="Perform some math on a chosen number.")
parser.add_argument('--number', help='Choose a number.', type=int, default=1000)


def prime_factors(n):
    primfac = []
    d = 2
    while d*d <= n:
        while (n % d) == 0:
            primfac.append(d)  # supposing you want multiple factors repeated
            n //= d
        d += 1
    if n > 1:
        primfac.append(n)
    return primfac


def primes_sieve1(limit):
    limitn = limit+1
    primes = dict()
    for i in range(2, limitn): primes[i] = True

    for i in primes:
        factors = range(i,limitn, i)
        for f in factors[1:]:
            primes[f] = False
    return [i for i in primes if primes[i]==True]


def main():
    args = parser.parse_args()
    s = sum((i for i in range(args.number)))
    print("Sum of all numbers below {}: {}".format(args.number, s))
    time.sleep(15)
    print("Prime factors of {}: {}".format(args.number,
                                           prime_factors(args.number)))
    time.sleep(15)
    print("Sieve of Eratosthenes for {}: {}".format(args.number, primes_sieve1(args.number)))
    return 0

if __name__ == "__main__":
    sys.exit(main())

[Screenshot attached: math_screenshot]

@arolling
Author

arolling commented Aug 4, 2016

Also, sometimes it displays the command-line parameters in white (like in the screenshot) and sometimes that line doesn't show up while running but is present if you navigate to the "Results" page for that job.

@Chris7
Member

Chris7 commented Aug 8, 2016

Just a heads up I'm trying to find time to resolve this, just busy atm with work.

@Chris7
Member

Chris7 commented Aug 16, 2016

Can you give it another try? It should work, as in it'll eventually give you the output, but realtime console output isn't really possible with windows. To get it, you need to unbuffer the output via setting PYTHONUNBUFFERED=1 in your environment. In the future, I'm going to implement custom parameters for the python executable so you may pass '-u' to it if desired on windows.
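The two unbuffering approaches mentioned above are equivalent, and a script can also opt out of buffering itself with `print(..., flush=True)`. A sketch comparing the `-u` flag and the environment variable (the inline child program is just a stand-in for a real Wooey script):

```python
import os
import subprocess
import sys

# A stand-in child script; in Wooey this would be the user's submitted script.
child = "print('tick'); print('tock')"

# Option 1: pass -u so the child interpreter runs with unbuffered stdout/stderr.
out_u = subprocess.run(
    [sys.executable, "-u", "-c", child],
    stdout=subprocess.PIPE,
).stdout

# Option 2: set PYTHONUNBUFFERED in the child's environment instead.
env = dict(os.environ, PYTHONUNBUFFERED="1")
out_env = subprocess.run(
    [sys.executable, "-c", child],
    stdout=subprocess.PIPE, env=env,
).stdout

print(out_u == out_env)  # True; both children emit each line as it is printed
```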

@arolling
Author

Thank you so much for working on all these Windows-caused problems. I really appreciate it, especially since it seems I'm your only user with Windows.

This is very very close! The screen updates properly - at a lag, as you mention, which is not a problem for me - with two small caveats. The final zip and tarball packages and any other file-based results never show up in the "All Files" section after the script is finished running; you have to navigate to the completed job under the Results tab to find them. And on that final static results page, the information that was displayed in the console has vanished and isn't saved with the job. This doesn't seem to happen with the smaller scripts though, like the math based one above.

@Chris7
Member

Chris7 commented Aug 25, 2016

Thanks. It seems like the hard part is done. I'll try to wrap this up soon!

@Chris7
Member

Chris7 commented Aug 27, 2016

I put in a fix for the file issue and took care of an embarrassing use of x is False in the update that may have been causing your output to be deleted. Can you give me an idea of the latency of the other script that is causing your output to be deleted?

@arolling
Author

arolling commented Aug 29, 2016

If by latency you mean total time frame, the scripts take 3-10 minutes to run. After updating with your new fixes, I just tried one that prints every 10-30 seconds over 3-4 minutes and it wrote to the console with very minimal lag (maybe 2 seconds), which is wonderful. However, when it finished and I checked the saved job, the console messages had vanished again. After a bit of testing with the mandelbrot script, I discovered that the distinguishing factor might be whether there is a saved file involved - no saved file, console readout persists; saved file, no console readout. EDIT: After more testing I don't think this is 100% true.

@Chris7
Member

Chris7 commented Aug 31, 2016

I test with a script that outputs files, so it isn't that. Are you using sqlite? Maybe there is an issue with concurrency?

@Chris7
Member

Chris7 commented Aug 31, 2016

Would it be possible to screencast a session where this happens?

@arolling
Author

arolling commented Sep 1, 2016

I believe I am using sqlite, but only one celery worker due to Windows issues (--pool=solo, as mentioned above). I'm sorry, I can't screencast due to confidentiality. But I did a series of tests and I can isolate it: Which print statements output to the console matters. If I eliminate a specific print statement that records the contents of a specific list, the problem goes away. I cannot seem to create a test case that replicates this, however, so I don't know if it's an interaction with other factors. This amount of data does not get saved:

Initializing 
xxxxxSystem
xxxxxSystem
xxxxxSystem
ble is connected at Thu Sep  1 10:44:20 2016
ble is connected at Thu Sep  1 10:44:20 2016
ble is connected at Thu Sep  1 10:44:20 2016
Format xxxxxx xxxxx
Registering and starting xxxxx
Stopping xxxxx
Refreshing xxxxx xxxx
com13 Time in Queue: [0.005002, 0.0, 0.001001, 0.002002, 0.003001, 0.029019, 0.002001, 0.035024, 0.005004, 0.029019, 0.001001, 0.034023, 0.002001, 0.025015, 0.001002, 0.032022, 0.003001, 0.036023, 0.003002, 0.026017, 0.004002, 0.034022, 0.005003, 0.037024]
com5 Time in Queue: [0.006005, 0.002002, 0.004003, 0.001001, 0.003002, 0.006005, 0.004003, 0.0, 0.006004, 0.029019, 0.002001, 0.036024, 0.001001, 0.028019, 0.003002, 0.035023, 0.002001, 0.026017, 0.0, 0.033023, 0.003001, 0.036023, 0.004002, 0.029019, 0.004002, 0.035023, 0.004002, 0.026017, 0.005003]
COM12 Time in Queue: [0.005002, 0.001001, 0.003002]



Downloading xxxxxxxxxxxx at Thu Sep  1 10:45:34 2016
Download complete at Thu Sep  1 10:45:49 2016
{ "timescale" : 60000, "duration" : 1001, "xxxxxxxxxx" : 742 }
10
10
10
10
Started at Thu Sep  1 10:44:20 2016
Duration: 1.4815114666666667 minutes
Finished at Thu Sep  1 10:45:49 2016

Removing those "Time in Queue" print statements leads to the preservation of the console log.

@Chris7
Member

Chris7 commented Sep 1, 2016

That is interesting. Can you paste what print statement is being used? Is it print xxx or are you using a redirection version of print? What happens if you use sys.stdout.write(...)?

@arolling
Author

arolling commented Sep 1, 2016

This is so strange! Every single output statement I'm using is print xxx. The relevant line is print("{} Time in Queue: {}\n".format(threading.current_thread().name, self.time_in_queue)) and changing print to sys.stdout.write works. Changing it back to print makes it fail again. As is clear from the details, it is a print from within a thread, and I believe it was the only print statement from a thread being triggered on these particular tests. I tried adding a print statement from a thread to test, but that did not make it fail.
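One plausible mechanism for the difference between `print` and `sys.stdout.write` here: `print()` issues separate `write()` calls for the text and the trailing newline, so two threads can interleave mid-line, whereas a single `sys.stdout.write()` of a pre-joined string is one call. A sketch with an illustrative recording stream (`RecordingStream` is a made-up helper, not Wooey code):

```python
import io

class RecordingStream(io.StringIO):
    """Records each individual write() call so we can see print()'s granularity."""
    def __init__(self):
        super().__init__()
        self.calls = []

    def write(self, s):
        self.calls.append(s)
        return super().write(s)

stream = RecordingStream()
print("Time in Queue: [0.005]", file=stream)  # text and '\n' arrive as two writes

single = RecordingStream()
single.write("Time in Queue: [0.005]\n")      # one write, no seam to interleave at

print(stream.calls)  # ['Time in Queue: [0.005]', '\n']
print(single.calls)  # ['Time in Queue: [0.005]\n']
```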

@Chris7
Member

Chris7 commented Sep 2, 2016

Awesome! I just replicated it with this:
https://gist.github.com/Chris7/9cf05ac2cca190cc02722a757e0e4b14

@arolling
Author

arolling commented Sep 2, 2016

Yes, that's it exactly! Is this an easy fix, or should I just change all my threaded prints to sys.stdout.write?

@Chris7
Member

Chris7 commented Sep 2, 2016

Fixed! It should be good to go now. Thank you so much for your effort!

@arolling
Author

arolling commented Sep 2, 2016

Oh great, thanks so much for all your work!

@Chris7
Member

Chris7 commented Sep 2, 2016

Fixed in #149

@Chris7 Chris7 closed this as completed Sep 2, 2016