Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace all uses of twisted.python.log with twisted.logger. #29

Merged
merged 3 commits into from Jul 12, 2019
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
12 changes: 10 additions & 2 deletions ampoule/child.py
@@ -1,8 +1,13 @@
from twisted.python import log
from twisted import logger
from twisted.internet import error
from twisted.protocols import amp
from ampoule.commands import Echo, Shutdown, Ping



log = logger.Logger()


class AMPChild(amp.AMP):
def __init__(self):
super(AMPChild, self).__init__(self)
Expand Down Expand Up @@ -33,7 +38,10 @@ def shutdown(self):
This method is needed to shutdown the child gently without
generating an exception.
"""
log.msg("Shutdown message received, goodbye.")
log.info(
u'Shutdown message received, goodbye.',
log_system=self.__class__.__name__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you put the log object onto AMPChild, and then do self.log.info, this type of namespacing comes along for free. Is this a feature you knew about but didn't want to use, or just something that wasn't clearly documented enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iog_system itself is undocumented at https://twistedmatrix.com/documents/current/core/howto/logger.html; I found it while digging around in the code looking for the equivalent of twisted.python.log.msg's "system" kwarg. twisted.logger looks as though this kind of thing should be handled by the "namespace" kwarg passed to Logger(); in my experiments, setting the namespace explicitly, e.g., "log = logger.Logger(namespace=u'foo')" doesn't always work. I didn't get to the bottom of that, as I found the "log_system" kwarg and it did exactly what I wanted.

Ideally, I think a logger should be constructed with the namespace argument, as the preferred way to do instantiate it, as opposed to embedding it as a class member and relying on automagical namespace configuration, but that's likely to be out of scope for this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I tried the "self.log" approach, and it emits a more correct, but also more verbose namespace for messages, which I found very noisy.

self.log.info(u'Shutdown message received, goodbye.')
[ampoule.test.test_process.TempDirChild#info] Shutdown message received, goodbye.'

versus

log.info(
    u'Shutdown message received, goodbye.',
    log_system=self.__class__.__name__
)
[TempDirChild] Shutdown message received, goodbye.'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glyph any thoughts on this? I personally like the log_system approach, because it offers more control over what shows up in the log message's namespace, and also because it allows me to have exactly one instance of Logger for the module, whereas the self.log approach creates a new Logger per instance of the containing class. Maybe that's not a big deal, but if you are making a lot of instances, that's a lot of identical loggers logging exactly the same information and seems wasteful to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely don't like the log_system approach 😄. As you've observed, it's not documented! It's the way the log system implements the namespacing feature, not a parameter that's supposed to be passed in directly.

TempDirChild is not a qualified name, so there's no way for someone to look at that message and figure out what module or source file it's come from. The design of the log system is that you should rely on automagical namespace configuration, unless you have a really good reason not to. Why is it that you want to avoid it? I can see wanting to avoid too much Logger construction, since objects are not cheap as they should be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to avoid self.log because of the potentially excessive object creation. Also, I am a fan of "log" being a module-level object that can serve the needs of any class within that module, rather than each class having its own log attribute. The module-level usage of a logger fits fairly well with the idioms employed in other Python projects, and I don't see a lot of value in being different here.

OK. Module-global is definitely OK by me, and Python is definitely slower than I'd like about creating objects. And in practice (i.e.: module-level) this is how I've mostly used it; I was mainly suggesting instance-level just because you were deriving the namespace from the class rather than the module.

Regarding the notion that "you should rely on automagical namespace configuration", I'd rather have no namespace at all, than one that I can't control. People who create logging messages in their code know what they want to log, so letting them control the namespace is a win, imo. Using log_system or something equivalent, lets a single logger serve multiple needs instead of having a logger-per-namespace approach. I'd be happy with any mechanism other than an automagical one :)

For an application, you might be right in some cases. (However, at work we just completed a migration from manual namespace specification to automagical ) But ampoule isn't an application, and the purpose of a namespace is not for our reading pleasure but for people operating applications to filter on. So having a hierarchical identifier that separates it from the application is key.

I think I understand the desire to make logging something more structured, searchable, etc., but anecdotally, I can't find anybody who wants to deal with that.

Surprise! You found one. I run Ampoule in production and consume structured logs via Cloudwatch in AWS :). (In a previous job, I consumed structured logs in ElasticSearch. And I'm aware of at least 2 other Twisted deployments doing this with an ELK stack.)

People want to read the same kind of logfile output that Apache and Nginx have been putting out for years; they don't want structured stuff that gets dumped into ElasticSearch and exposed via Kibana. Given that other logging systems tend to adhere to this admittedly primitive format, I don't see the value in doing anything else.

  1. I agree! There's a huge audience for plaintext logging. This is why Twisted exposes a format string; to preserve this traditional log-grepping experience where you just get a string out and get mostly ad-hoc formatted text files you can scrape with regexes.
  2. I totally disagree! In addition to my personal extensive usage of structured logging on multiple occasions, I think that the reason that people don't "don't want structured stuff" is that the tooling is largely garbage and the setup and configuration (particularly for ELK stacks) is so complicated and intricate that it's not worth it to set it up at the beginning of an app's development cycle. And then later, it's not worth setting up because you don't have structured output, so there's not much point in investing heavily in a backend that can do structured queries. Breaking this cycle, I believe, starts with libraries and frameworks defaulting to emitting structured logs, so that when you do have the time to invest in the backend to manage it, the source data is already there.

Sorry for the wall of text, this discussion is way outside of the scope of this PR. Please let me know what you want me to do in this specific instance, and I'll implement it. The larger discussion about twisted.logger is probably better for the mailing list.

One good wall deserves another :). But I think for the purpose of this code review, just "let's use a module-level thing" is a fine start.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glyph thanks for the thoughtful response, I appreciate it. So, module-level logger instance for this PR, what about the use of the log_system kwarg? I ask because a module-level logger will not include the classname in its namespace.

Also, I decided to figure out how I got ahold of the log_system kwarg in the first place and it was in the docstring for formatEventAsClassicLogText:

      C{system} is the event's C{"log_system"} value, if set, otherwise,
      the C{"log_namespace"} and C{"log_level"}, joined by a C{u"#"}.  Each
      defaults to C{u"-"} is not set.

...which reads to me like it is something I could meaningfully expect to use.

Further, eventAsText documents its use as well, using a copy of the same docstring, but still it is an expected value.

If using log_system is acceptable, then a single, module-level Logger instance can still provide class-level logging, e.g.,

# assumes self._loggingName = log.namespace + u'.' + self.__class__.__name__
log.info(
    u'Shutdown message received, goodbye.',
    log_system=self._loggingName + u'#info'
)

2019-07-11 17:40:54-0500 [-] FROM 0: b'2019-07-11T17:40:54-0500 [ampoule.child.TempDirChild#info] Shutdown message received, goodbye.'

Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we really need to include the class name in the namespace at all, frankly, especially for this first cut; I'd just skip passing it entirely. I definitely wouldn't want to custom-construct one like this which mashes the log level in there as a string like that. So just

log.info(u'Shutdown message received, goodbye.')

Except maybe structured-format like a PID in there or something so we can identify who received it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glyph I've updated to PR to remove the log_system kwarg and go with plain module-level logging for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not find any clear path forward to adding a PID to that shutdown message, so I skipped that task.

)
self.shutdown = True
return {}
Shutdown.responder(shutdown)
Expand Down
25 changes: 19 additions & 6 deletions ampoule/main.py
Expand Up @@ -5,14 +5,20 @@

from zope.interface import implementer

from twisted import logger
from twisted.internet import reactor, protocol, defer, error
from twisted.python import log, reflect
from twisted.python import reflect
from twisted.protocols import amp
from twisted.python import runtime
from twisted.python.compat import set

from ampoule import iampoule



log = logger.Logger()


gen = itertools.count()

if runtime.platform.isWindows():
Expand Down Expand Up @@ -63,7 +69,7 @@ def signalProcess(self, signalID):
return self.transport.signalProcess(signalID)

def connectionMade(self):
log.msg("Subprocess %s started." % (self.name,))
log.info(u'Subprocess {n} started.', n=self.name)
self.amp.makeConnection(self)

# Transport
Expand Down Expand Up @@ -94,10 +100,10 @@ def childDataReceived(self, childFD, data):

def errReceived(self, data):
for line in data.strip().splitlines():
log.msg("FROM %s: %s" % (self.name, line))
log.error(u'FROM {n}: {l}', n=self.name, l=line)

def processEnded(self, status):
log.msg("Process: %s ended" % (self.name,))
log.info(u'Process: {n} ended', n=self.name)
self.amp.connectionLost(status)
if status.check(error.ProcessDone):
self.finished.callback('')
Expand All @@ -111,8 +117,15 @@ def main(reactor, ampChildPath):
from twisted.application import reactors
reactors.installReactor(reactor)

from twisted.python import log
log.startLogging(sys.stderr)
from twisted import logger
observer = logger.textFileLogObserver(sys.stderr)
logLevelPredicate = logger.LogLevelFilterPredicate(
defaultLogLevel=logger.LogLevel.info
)
filteringObserver = logger.FilteringLogObserver(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for changing the bootstrapping as well! One interesting question might be how to configure the filters on this predicate without having to put in a whole custom BOOTSTRAP. For a future change, perhaps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good thought. I kind of hate that all this code lives in a big-old wad of text. I can vaguely imagine some hand-wavey thing where we take command-line arguments, or an ampoule.cfg file, and generate the bootstrap code from a template. Definitely bigger than the scope of this work, but it would have several benefits, including configuration of log levels and log formatters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of hate that all this code lives in a big-old wad of text

same

an ampoule.cfg file, and generate the bootstrap code from a template

or you know, parse a datastructure passed in argv ?

observer, [logLevelPredicate]
)
logger.globalLogBeginner.beginLoggingTo([filteringObserver])

from twisted.internet import reactor, stdio
from twisted.python import reflect, runtime
Expand Down
34 changes: 20 additions & 14 deletions ampoule/pool.py
Expand Up @@ -9,17 +9,23 @@
count = functools.partial(next, itertools.count())
pop = heapq.heappop

from twisted import logger
from twisted.internet import defer, task, error
from twisted.python import log

from ampoule import commands, main



log = logger.Logger()


try:
DIE = signal.SIGKILL
except AttributeError:
# Windows doesn't have SIGKILL, let's just use SIGTERM then
DIE = signal.SIGTERM


class ProcessPool(object):
"""
This class generalizes the functionality of a pool of
Expand Down Expand Up @@ -135,17 +141,17 @@ def _addProcess(self, child, finished):
Adds the newly created child process to the pool.
"""
def fatal(reason, child):
log.msg("FATAL: Process exited %s" % (reason,))
log.error(u'FATAL: Process exited.')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a single log message as well.

log.error(u'\t{r}', r=reason.getErrorMessage())
self._pruneProcess(child)

def dieGently(data, child):
log.msg("STOPPING: '%s'" % (data,))
log.info(u'STOPPING: {s}', s=data)
self._pruneProcess(child)

self.processes.add(child)
self.ready.add(child)
finished.addCallback(dieGently, child
).addErrback(fatal, child)
finished.addCallback(dieGently, child).addErrback(fatal, child)
self._finishCallbacks[child] = finished
self._lastUsage[child] = now()
self._calls[child] = 0
Expand Down Expand Up @@ -377,15 +383,15 @@ def _cb(_):
return defer.DeferredList(l).addCallback(_cb)

def dumpStats(self):
log.msg("ProcessPool stats:")
log.msg('\tworkers: %s' % len(self.processes))
log.msg('\ttimeout: %s' % (self.timeout))
log.msg('\tparent: %r' % (self.ampParent,))
log.msg('\tchild: %r' % (self.ampChild,))
log.msg('\tmax idle: %r' % (self.maxIdle,))
log.msg('\trecycle after: %r' % (self.recycleAfter,))
log.msg('\tProcessStarter:')
log.msg('\t\t%r' % (self.starter,))
log.info(u'ProcessPool stats:')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind putting this together into one big log message, rather than spreading it out across a whole bunch of different ones? Conceptually, what's happening here is a dump of stats at a particular moment in time; they should all be associated with each other. The repeated logging calls were just there trying to get the message to format in a particular way, which shouldn't be the log emitter's concern.

log.info(u'\tworkers: {w}', w=len(self.processes))
log.info(u'\ttimeout: {t}', t=self.timeout)
log.info(u'\tparent: {p}', p=self.ampParent)
log.info(u'\tchild: {c}', c=self.ampChild)
log.info(u'\tmax idle: {i}', i=self.maxIdle)
log.info(u'\trecycle after: {r}', r=self.recycleAfter)
log.info(u'\tProcessStarter:')
log.info(u'\t\t{s}', s=self.starter)

pp = None

Expand Down