DO NOT EDIT THIS PAGE DIRECTLY

This file is managed in git - edit the markdown source at src/python/twitter/common/styleguide.md, and publish with ./pants goal confluence src/python/twitter/common:python-styleguide

Overview

The Python style at Twitter closely follows PEP-8 + PEP-257 with the following modifications:

  • 2-space indents instead of 4
  • 100-character lines are allowed

Automated checking

An automated checker for the Twitter Python style guide can be found in Science.

  $ ./pants src/python/twitter/checkstyle:check
  $ dist/check.pex path1 path2 ...

Paths may be files or directories. You may also check the diff of the current branch against another branch, e.g.

  $ dist/check.pex --diff=master

Basic naming tenets

  • Class names should be CamelCased with the first character always capitalized.
  • Top-level functions and class methods should be snake_cased.
  • Private class methods and variables should be prefixed with '_'.
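
As an illustration of the tenets above, here is a minimal sketch. The names RequestCounter and format_count are hypothetical and are not part of any Twitter library.

```python
class RequestCounter(object):  # CamelCased class name
  def __init__(self):
    self._count = 0  # private variable, prefixed with '_'

  def record_request(self):  # snake_cased method
    self._count += 1
    return self._count

  def _reset(self):  # private method, prefixed with '_'
    self._count = 0


def format_count(counter):  # snake_cased top-level function
  return 'requests=%d' % counter.record_request()
```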

Science-specific guidelines

Internal libraries

  • Do not reinvent the wheel. Take a look in twitter.common first.
  • If what you're doing seems generic enough, consider factoring it out, adding tests and adding it to twitter.common
  • Unless your application is trivial, you should probably leverage twitter.common.app
  • If you need argument parsing, you should definitely be using twitter.common.app
  • If your script is basic, print + sys.stdout/stderr is fine
  • If you ever anticipate running in production, consider using twitter.common.log

External libraries

  • No use of global site-packages; instead leverage python_requirements to manage external dependencies.
  • In most cases, python_requirements should be specified in 3rdparty/python/BUILD and tied to an explicit version number so as to reduce the possibility of version conflicts within the repository.

Testing

  • Our preferred testing harness is pytest, which is automatically added to the environment of all Python test targets
  • Our preferred mocking framework is mock
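
A minimal sketch of a pytest-style test that uses mock. The function under test (fetch_greeting) and its client argument are hypothetical. The fallback to unittest.mock is an assumption for environments where the standalone mock package is not installed.

```python
try:
  import mock  # the standalone mock package
except ImportError:
  from unittest import mock  # assumption: Python 3 standard-library fallback


def fetch_greeting(client, name):
  """Hypothetical function under test: asks a client for a greeting."""
  return client.greet(name).upper()


def test_fetch_greeting():
  # Replace the real client with a mock whose greet() returns a canned value.
  client = mock.Mock()
  client.greet.return_value = 'hello, world'

  assert fetch_greeting(client, 'world') == 'HELLO, WORLD'
  client.greet.assert_called_once_with('world')
```

pytest collects functions whose names start with test_, so no test class or runner boilerplate is needed here.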

Naming

  • For autogenerated Python Thrift code, if your project is in src/python/twitter/thermos/runner, namespace the Python thrift into gen.twitter.thermos.runner. Namespace packages slow down importing for everything, so isolating generated code in a single gen namespace is preferred.

Importing

Imports should be done in the following order:

  • standard library
  • science libraries
  • code generated libraries
  • third party libraries

Within each collection, imports should be lexically ordered. For example:

  import os
  import sys
  import time

  try:
    from twitter.common import log
  except ImportError:
    import logging as log
  from twitter.common.rpc import make_client
  from twitter.common.rpc.finagle import TFinagleProtocolWithClientId

  from thrift.Thrift import TApplicationException
  import zookeeper

If you are importing more than two or three symbols from a single package, consider wrapping the imports as well.
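
One way to wrap such an import, sketched with standard-library names chosen only for illustration: parenthesize the symbol list, indent the continuation lines by 4 spaces, and keep the symbols lexically ordered.

```python
from os.path import (
    basename,
    dirname,
    join,
    splitext)

# The wrapped names behave exactly like names imported on a single line.
path = join('src', 'python', 'styleguide.md')
parent = dirname(path)
name, extension = splitext(basename(path))
```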

Best practices

  1. Wrapped lines should be indented 4 spaces, rather than aligned with the opening delimiter:

    No:

         def my_function(some_args,
                         that_would_be,
                         too_long_for_one_line):
           do_something()
    

    Yes:

         def my_function(
             some_args,
             that_would_be,
             too_long_for_one_line):
    
           do_something()
    

    Note: For function definitions, the first arg is on its own line, and there's a blank line between the end of the args and the first line of the function.

  2. Prefer accessors/getters for all but the most trivial externally accessible variables, for which you may use properties.

  3. Try the least magical implementation first. Avoid metaclasses and overriding __getattr__ unless absolutely necessary.

  4. If you use print, use it as a function, preferably importing it from __future__. This way your code is compliant with both Python 2.6+ and 3.x.

    Bad:

       print 'Hello world'
       print >> sys.stderr, 'Get off my lawn!'
       print 'Processing...',
    

    Better:

       print('Hello world')
    

    Best:

       from __future__ import print_function
       print('Hello world')
       print('Get off my lawn!', file=sys.stderr)
       print('Processing...', end='')
    
  5. Only use new style classes, e.g.:

       class MyClass(object):
         ...
    

    instead of

       class MyClass:
         ...
    
  6. If your class has exceptional behavior, prefer to declare your exceptions inside your class, e.g.:

       class GarbageCollector(object):
         class CouldNotRecoverEnoughSpaceError(Exception): pass
         def __init__(self):
           ...
         def collect(self):
           ...
           raise GarbageCollector.CouldNotRecoverEnoughSpaceError("Path %s insufficient." % ...)
    

    This means that if the user wants to catch exceptions, they do not need tons of import statements.

    It is occasionally fine to put user-defined exceptions elsewhere, e.g. in __init__.py for libraries. Whatever you can do to avoid excessive imports.

  7. Do not rely upon __file__. Instead prefer pkgutil, pkg_resources and __name__. By using the latter, you may run inside a zip archive or in an exploded directory structure and it will work the same way.

    No:

       with open(os.path.join(os.path.dirname(__file__), 'resources', 'data.txt')) as fp:
         data = fp.read()
    

    Yes:

       from pkg_resources import resource_string
       # Resource names are always '/'-separated, regardless of platform.
       data = resource_string(__name__, 'resources/data.txt')
    
  8. Utilize context managers as much as possible, especially with files and locks:

    Yes:

       with open("my_file.txt") as fp:
         data = fp.read()
    
       lock = threading.Lock()
       with lock:
         print('Holding the lock!')
    

    No:

       data = open("my_file.txt").read()
    

    Nor:

       fp = open("my_file.txt")
       data = fp.read()
       fp.close()
    

    For things like zipfile.ZipFile, which does not have a context manager in Python 2.6, use contextlib.closing:

       from contextlib import closing
       import zipfile
       with closing(zipfile.ZipFile('/tmp/myfile.zip')) as zf:
         data = zf.read('manifest.txt')
    
  9. When catching exceptions, use new-style grammar:

    Yes:

       try:
         val = array[key]
       except KeyError as e:
         print('Could not access key: %s!' % e)
    

    No:

       try:
         val = array[key]
       except KeyError, e:
         print >> sys.stderr, 'Could not access key %s!' % e
    
  10. Use the os module for path manipulation as much as possible, e.g. os.path.join.

    Similarly, avoid repeated string concatenation where practical. When combining many strings, ''.join([a, b, c, d]) is usually faster than a + b + c + d, because each + builds a new intermediate string.

  11. Never put dashes in Python filenames because you will not be able to import code from them. Use underscores instead.

  12. Use str.format only when you need to interpolate strings with both keyword and positional arguments. Basic printf-style string interpolation is sufficient 99.9% of the time and does not require learning a new DSL.
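
To illustrate item 12, a short sketch comparing the two interpolation styles. The variable names and values are made up for the example.

```python
host, port, retries = 'localhost', 8080, 3

# printf-style interpolation covers the common cases.
positional = 'Connecting to %s:%d' % (host, port)
keyword = 'Retrying %(retries)d times' % {'retries': retries}

# str.format is reserved for mixing positional and keyword arguments.
# Explicit indexes ({0}, {1}) keep the template valid on Python 2.6.
mixed = 'Connecting to {0}:{1} (retries={retries})'.format(host, port, retries=retries)
```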

Tips for 2.x / 3.x interoperability:

  1. Remember that in 3.x all strings are unicode. basestring no longer exists, so if you're in Science, do the following to test for stringyness:

       from twitter.common.lang import Compatibility
       if isinstance(input, Compatibility.string):
         ...
    
  2. In most situations, you probably want to open files in binary mode ('rb' or 'wb') rather than the default text mode, so that reads and writes deal in bytes on both 2.x and 3.x.

  3. Use Compatibility from twitter.common.lang for:

    • Compatibility.integer
    • Compatibility.real
    • Compatibility.PY2 and Compatibility.PY3 booleans
    • Compatibility.exec_function since exec is no longer a statement in Python 3.x.
    • Compatibility.StringIO to avoid complex try/except ImportError chains.
  4. Avoid metaclasses because the syntax has changed between 2.x and 3.x. If you must use a metaclass, do not use the __metaclass__ = MyMetaclass syntax. Instead:

       MyMetaclassBase = MyMetaclass('MyMetaclassBase', (object,), {})
       class MyClassThatNeedsAMetaclass(MyMetaclassBase):
         ...
    
  5. Relative imports must be explicit (.-delimited). For example, from within application.py, which lives in foo/bar alongside baz.py:

    Yes:

       from foo.bar import baz
    

    or (for same-level imports):

       from .baz import poop
    

    No (won't work on Python 3.x):

       import baz
       from baz import poop
    
  6. Almost always use list/generator comprehensions instead of filter or map. The latter should only be used in the specific circumstance where a) the equivalent comprehension would be significantly longer; and b) the use case is clearly consume-once (for example, as input to for loops).

    Further, filters or maps should never be returned from functions, or as part of an API - this is particularly dangerous with the change in their behaviour between Python 2.x and 3.x (i.e. returning lists in the former and lazy iterators in the latter).

    Yes:

       odd_numbers = [i for i in range(1, 10) if i % 2]
    

    No (different behaviour between Python 2.x/3.x!):

       odd_numbers = filter(lambda i: i % 2, range(1, 10))
    

    Similarly, when dealing with dictionaries, prefer the use of their list-returning functions over their iterator counterparts, unless performance is demonstrably impacted:

    Yes:

       for k, v in my_dict.items():
         ...
    

    No (won't work on Python 3.x):

       for k, v in my_dict.iteritems():
         ...
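
The consume-once hazard behind the advice on filter and map in item 6 can be sketched as follows. The comments describe Python 3 behaviour. Under Python 2, filter returns a list, so the second pass would see the same elements as the first.

```python
numbers = range(1, 10)

# A list comprehension yields a real list on both Python 2.x and 3.x.
odd_numbers = [i for i in numbers if i % 2]
first_pass = list(odd_numbers)
second_pass = list(odd_numbers)  # same contents as first_pass

# On Python 3.x, filter returns a lazy iterator that is exhausted after one pass.
lazy_odds = filter(lambda i: i % 2, numbers)
lazy_first_pass = list(lazy_odds)
lazy_second_pass = list(lazy_odds)  # empty on Python 3.x, full on Python 2.x
```

A caller that receives lazy_odds from an API cannot tell whether it is safe to iterate twice, which is why the comprehension is preferred.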