Email error when log is too big to send. #76

Closed
surbas opened this issue May 13, 2014 · 9 comments

surbas commented May 13, 2014

I log a lot in my processes and got the following in stdout on completion of a job.

Traceback (most recent call last):
  File "c:\dagobah\dagobah\core\components.py", line 31, in emit
    method.__call__(*args, **kwargs)
  File "c:\dagobah\dagobah\daemon\daemon.py", line 151, in job_complete_email
    email_handler.send_job_completed(kwargs['event_params'])
  File "c:\dagobah\dagobah\email\basic.py", line 24, in send_job_completed
    self._construct_and_send('Job Completed: %s' % data.get('name', None))
  File "c:\dagobah\dagobah\email\common.py", line 39, in _construct_and_send
    self._send_message()
  File "c:\dagobah\dagobah\email\common.py", line 72, in _send_message
    self.message.as_string())
  File "C:\Python27\Lib\smtplib.py", line 739, in sendmail
    raise SMTPDataError(code, resp)
SMTPDataError: (552, 'message line is too long')

So should we truncate logs if they are too big? This limit is actually controlled by the SMTP server, so how big were my logs, and what counts as too big?

Should we offer an option not to send the log with the email? (I personally don't need to see my logs in email.)

thieman commented May 13, 2014

I think there is currently some very crude truncation going on in an attempt to keep the logs for each individual Task pretty small. These could still pile up if you have a bunch of tasks.

I'm in favor of adding configuration around sending logs and, if this is a common problem, configuring a max size on emails. Both should be their own issues.
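
A minimal sketch of what that max-size option could look like (the name `MAX_LOG_BYTES`, its default, and the tail-keeping strategy are assumptions, not existing dagobah configuration):

```python
# Hypothetical cap on emailed logs; name and default are assumptions.
MAX_LOG_BYTES = 100 * 1024

def truncate_log(log_text, max_bytes=MAX_LOG_BYTES):
    """Keep the tail of the log (which usually holds the failure) under the cap."""
    if len(log_text) <= max_bytes:
        return log_text
    return '[log truncated to last %d bytes]\n%s' % (max_bytes, log_text[-max_bytes:])
```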

rclough commented May 13, 2014

Yeah this was another feature I was thinking of. I know our processes will probably have a lot of STDOUT, so sending it directly in the email, while convenient, can be a bit much.

With utkarsh's logs code, you could have an email template that links to the logs in the web UI instead of putting them directly in the email.

thieman commented May 13, 2014

> With utkarsh's logs code, you could have an email template that links to the logs in the web UI instead of putting them directly in the email.

I love this idea. Would reduce the need for both of those email config vars, I would think.
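
A minimal sketch of building such a link, assuming a logs endpoint that takes job and task names as query parameters (the `/api/logs` path and the helper name are illustrative):

```python
import urllib  # Python 2, to match the tracebacks in this thread

def log_link(base_url, job_name, task_name):
    # Hypothetical helper: point the email at the web UI logs instead of
    # inlining them. The /api/logs path is an assumption.
    query = urllib.urlencode({'job_name': job_name, 'task_name': task_name})
    return '%s/api/logs?%s' % (base_url.rstrip('/'), query)

# e.g. log_link('http://dagobah.example.com', 'my-job', 'my-task')
```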

surbas commented May 13, 2014

Having a link would be cool, but you still have the problem of people who want the log in the email.
To fix this issue, if we get an SMTPDataError, we could send the email again with the log replaced by the sentence "The log was too big to be sent by your email server."

If you are cool with that, I will work on it and submit a pull request.

Edit: English
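
A rough sketch of that fallback, assuming `message_without_log` is the same email with the log body swapped for the notice above (the function and argument names are illustrative, not dagobah's actual email internals):

```python
import smtplib

def send_with_fallback(smtp, from_addr, to_addrs, message, message_without_log):
    # Try the full email first; on a body rejection such as
    # 552 'message line is too long', resend without the log.
    try:
        smtp.sendmail(from_addr, to_addrs, message.as_string())
    except smtplib.SMTPDataError:
        smtp.sendmail(from_addr, to_addrs, message_without_log.as_string())
```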

rclough commented May 13, 2014

Yes, in either case there should be detection for the email being too large, so +1 @surbas

rclough commented Nov 14, 2014

I think we might be hitting this issue with one of our jobs, so this is on our radar. (We have a long-running job every morning that produces a lot of output, and it doesn't end up having an email sent.) Looking through the dagobah log, though, we aren't getting these errors; we're getting something like:

192.168.144.165 - - [14/Nov/2014 10:48:47] "GET /job/540e27e27eb4da4753d3bba4/Run%20wh%20job HTTP/1.1" 200 -
192.168.144.165 - - [14/Nov/2014 10:48:47] "GET /static/css/task_detail.css HTTP/1.1" 304 -
192.168.144.165 - - [14/Nov/2014 10:48:47] "GET /static/js/task_detail.js HTTP/1.1" 304 -
192.168.144.165 - - [14/Nov/2014 10:48:47] "GET /static/lib/Kickstrap1.3.2/Kickstrap/js/kickstrap.min.js HTTP/1.1" 404 -
192.168.144.165 - - [14/Nov/2014 10:48:48] "GET /api/logs?job_name=wh-log-transfer&task_name=Run+wh+job HTTP/1.1" 200 -
database error: Runner error: Overflow sort stage buffered data usage of 33758790 bytes exceeds internal limit of 33554432 bytes
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/dagobah/daemon/util.py", line 47, in wrapper
    result = fn(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/dagobah/daemon/api.py", line 122, in tail_task
    return task.tail(**call_args)
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 909, in tail
    self.name)
  File "/usr/lib/python2.6/site-packages/dagobah/backend/mongo.py", line 139, in get_latest_run_log
    for rec in cur:
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 1038, in next
    if len(self.__data) or self._refresh():
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 982, in _refresh
    self.__uuid_subtype))
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 925, in __send_message
    self.__compile_re)
  File "/usr/lib64/python2.6/site-packages/pymongo/helpers.py", line 109, in _unpack_response
    error_object)
OperationFailure: database error: Runner error: Overflow sort stage buffered data usage of 33758790 bytes exceeds internal limit of 33554432 bytes
Exception on /api/tail [GET]
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1687, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1360, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1358, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1344, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/lib/python2.6/site-packages/flask_login.py", line 663, in decorated_view
    return func(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/dagobah/daemon/util.py", line 54, in wrapper
    raise e
OperationFailure: database error: Runner error: Overflow sort stage buffered data usage of 33758790 bytes exceeds internal limit of 33554432 bytes

I see a few of these OperationFailure errors; here's another:

Exception in thread Thread-3769:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.6/threading.py", line 736, in run
    self.function(*self.args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 824, in check_complete
    complete_time=datetime.utcnow())
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 1027, in _task_complete
    self.parent_job._complete_task(self.name, **kwargs)
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 547, in _complete_task
    self._on_completion()
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 592, in _on_completion
    self._serialize(include_run_logs=True))
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 639, in _serialize
    for task in self.tasks.itervalues()]
  File "/usr/lib/python2.6/site-packages/dagobah/core/core.py", line 1044, in _serialize
    self.name)
  File "/usr/lib/python2.6/site-packages/dagobah/backend/mongo.py", line 139, in get_latest_run_log
    for rec in cur:
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 1038, in next
    if len(self.__data) or self._refresh():
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 982, in _refresh
    self.__uuid_subtype))
  File "/usr/lib64/python2.6/site-packages/pymongo/cursor.py", line 925, in __send_message
    self.__compile_re)
  File "/usr/lib64/python2.6/site-packages/pymongo/helpers.py", line 109, in _unpack_response
    error_object)
OperationFailure: database error: Runner error: Overflow sort stage buffered data usage of 33758790 bytes exceeds internal limit of 33554432 bytes

thieman commented Nov 14, 2014

The exact issue here is probably a Mongo server-side bug, but we do need a better way in general for handling giant logs.

https://www.google.com/search?q=mongo+overflow+sort+stage
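
(That error means the sort in `get_latest_run_log` exceeded Mongo's in-memory sort buffer, the 33554432 bytes in the log; an index on the sorted field lets the server walk the index instead of buffering documents. A pymongo sketch, with database, collection, and field names as assumptions rather than dagobah's actual schema:)

```python
from pymongo import MongoClient, DESCENDING

# Database, collection, and field names here are assumptions for illustration.
db = MongoClient('localhost', 27017)['dagobah']

# With an index on the field the log query sorts by, the server can walk
# the index instead of buffering ~33 MB of documents in memory.
db['dagobah_log'].create_index([('save_date', DESCENDING)])
```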

thieman commented Nov 14, 2014

@rclough Do you think it makes more sense to just drop logs over a certain size and warn the user, or to implement something that could actually handle giant logs? We could try using GridFS in Mongo, but as far as I know SQLite is going to be constrained by whatever maximum size we set for that column. No idea how large we can go, but it would probably be inefficient.
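
(For what it's worth, GridFS chunks a file across many small documents, so the 16 MB BSON document limit stops applying. A minimal pymongo sketch, with the database name and layout as illustration only:)

```python
import gridfs
from pymongo import MongoClient

db = MongoClient('localhost', 27017)['dagobah']  # assumed database name
fs = gridfs.GridFS(db)

# Store a huge run log as a GridFS file; only the returned id needs to
# live on the run record itself.
log_id = fs.put(b'...many megabytes of stdout...', filename='run.log')

# Stream it back later, e.g. for the web UI or an email link.
log_text = fs.get(log_id).read()
```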

rclough commented Nov 17, 2014

I feel like if you are going to be running jobs with huge logs, you are probably going to want a backend more robust than SQLite. That said, being able to get the full logs is pretty important, I think. A quick fix would be dropping large logs with a warning, but oftentimes, in my experience, there's not much option to fix the underlying verbosity (i.e., you would be sacrificing necessary info for job failures).

I don't know how GridFS works, but from a quick glance it seems like a cool idea. It might be handy to have emails send a dagobah link to the log if the log is too big.
