# New formatting in Python 3.6

## A talk by Phil Robare

## Chipy meeting, March 8, 2018

There are a variety of ways to format a string for output in Python.  To motivate this discussion I will use the example of a phone number field that for some reason consists of just the digits to dial.  You want to print this out in the format we are used to, with three digits for the area code, a slash, three digits for the exchange and four digits for the line number.

By the way, I am currently looking for a job and you might notice during the presentation that I have subtly made a mention of that in some of the code examples.  But if you don't notice it that's fine.

So let's look at how Python can format output.

At its core, and existing in Python since the beginning, are the built-ins `str()` and `repr()`.  The difference is that `str()` gives you a nice-looking string and `repr()` gives you something that (possibly) can be ingested by the interpreter to give you a copy of the object.  With Unicode support in Python 3.0 we also got `ascii()`, which works like `repr()` but escapes any non-ASCII characters, so you can go from a Unicode string back to plain ASCII text.  It is sort of the opposite of 2.x's `unicode()` built-in.

Note that all of these work by calling dunder (double-underscore) methods defined in the `object` class that all objects descend from: `str()` calls `__str__()`, while `repr()` and `ascii()` call `__repr__()`.

In [144]:
n = 0
plea = "call Phil at 773/989-7978 and off\u0435r him a job."

print(str(plea))
print(repr(n))
print(ascii(plea))
print()
print(n.__str__())
print(plea.__repr__())


call Phil at 773/989-7978 and offеr him a job.
0
'call Phil at 773/989-7978 and off\u0435r him a job.'

0
'call Phil at 773/989-7978 and offеr him a job.'
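
To make the dunder connection concrete, here is a minimal sketch with a made-up `Contact` class (it is not part of the talk's examples) that defines both methods, so you can see which one each built-in picks up.

```python
class Contact:
    """Hypothetical class, used only to show which dunder each built-in calls."""
    def __init__(self, name):
        self.name = name

    def __str__(self):
        # used by str() and print()
        return "Contact " + self.name

    def __repr__(self):
        # used by repr() and, with non-ASCII characters escaped, by ascii()
        return "Contact(" + repr(self.name) + ")"

c = Contact("Zo\u00eb")
as_str = str(c)
as_repr = repr(c)
as_ascii = ascii(c)
print(as_str)
print(as_repr)
print(as_ascii)
```

Only the `ascii()` result is guaranteed to be plain ASCII; the other two keep the accented character as it is.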


But that isn't really formatting, where we want to get our output mixed with other text in a readable way.  The simplest way to do that is just to use string slicing and concatenation.

In [126]:
ph_number = '7739897978'
n = 1

print("Time "+str(n)+": call Phil at "+ph_number[:3]+"/"+ph_number[3:6]+
      "-"+ph_number[6:]+" and offer him a job.")


Time 1: call Phil at 773/989-7978 and offer him a job.


This works, but does not convey the intent of the line (since any string could be in the ph_number variable), is hard to write (you have to count character positions) and is hard to read.

To get around the hard-to-read problem we have the next method, % string formatting.  This also has the advantage that you can specify type conversions in the formatting rather than having to explicitly convert them in your code.

In [127]:
n = 2

print("Time %d: call Phil at %s/%s-%s and offer him a job." %
       (n, ph_number[:3], ph_number[3:6], ph_number[6:]))


Time 2: call Phil at 773/989-7978 and offer him a job.


A problem here is getting the arguments in the right order and making sure that there are the right number of arguments.  It is often hard, particularly with long format strings, to know where in the output the argument is going to end up.

Something that can be done with % formatting, but that I don't see being done enough, is to pass a dictionary after the % instead of a tuple.  You decorate the format items in the string with the key name (in parentheses) and now you just have to make sure that all the keys in the format string are in the dictionary.  Getting them in order is no longer required.

It could even look like Mad Libs when you create a dictionary of substitutions and pass it into the format.

In [128]:
print("Time %(n)03d: call %(name)s at %(area_code)s/"
      "%(exchange)s-%(line)s and offer him a %(offer)s" % 
      {"n":3,
       "area_code":ph_number[:3],
       "exchange":ph_number[3:6],
       "line":ph_number[6:],
       "name":"Phil",
       "offer":"job"})


Time 003: call Phil at 773/989-7978 and offer him a job



The third way we had of formatting output in Python was the built-in `format()` function and the matching `str.format()` method.  Both first appeared in Python 2.6 and, like the other built-ins, work by calling a dunder method, in this case `__format__()`.  This allowed more elaborate format options because `format()` takes two parameters, the object being formatted and a format specifier.  With `str.format()` you can write format strings that you can read and understand, and control the presentation for things like leading zeroes.  The format string can include position markers enclosed in curly braces, each optionally followed by a colon and a format specifier.

In [129]:
n = 4

print("Time {0:03d}: call Phil at {1}/{2}-{3} and offer him a job.".format(
       n, ph_number[:3], ph_number[3:6], ph_number[6:]))


Time 004: call Phil at 773/989-7978 and offer him a job.
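
The two-parameter built-in is easy to overlook next to the method, so here is a small sketch of both side by side. The field names `count` and `name` are just ones I picked for illustration.

```python
n = 4

# built-in format(): the object first, the format specifier second
padded = format(n, "03d")

# roughly what the built-in does for you
same = n.__format__("03d")

# str.format() also accepts keyword arguments, so fields can be named
line = "Time {count:03d}: call {name}".format(count=n, name="Phil")

print(padded, same, line)
```

Named fields give you the same order-independence as passing a dictionary to the % operator.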


And then in 2.7 the need to specify the position number was dropped and you could just put in curly braces in order.

In [130]:
n = 5

print("Time {:03d}: call Phil at {}/{}-{} and offer him a job.".format(
       n, ph_number[:3], ph_number[3:6], ph_number[6:]))


Time 005: call Phil at 773/989-7978 and offer him a job.


Having the `__format__()` method being called when an object was formatted made it possible to create custom formatters for classes. 

In [131]:
class phone_no:
    def __init__(self, num):
        self.num = num
    def __format__(self, fspec):
        return "{}/{}-{}".format(self.num[:3], self.num[3:6], self.num[6:])

my_phone = phone_no(ph_number)

n = 6
print("Time {:03d}: call Phil at {} and offer him a job.".format(n, my_phone))


Time 006: call Phil at 773/989-7978 and offer him a job.
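
The `phone_no` class above ignores the specifier it is handed. As a sketch of what using it could look like, here is a hypothetical variant; the class name `PhoneNumber` and the spec names `slash` and `dots` are my own inventions, not anything Python defines.

```python
class PhoneNumber:
    """Hypothetical variant of phone_no whose __format__ looks at the specifier."""
    def __init__(self, num):
        self.num = num

    def __format__(self, fspec):
        area, exchange, line = self.num[:3], self.num[3:6], self.num[6:]
        if fspec in ("", "slash"):
            # default presentation, same as phone_no
            return "{}/{}-{}".format(area, exchange, line)
        if fspec == "dots":
            return "{}.{}.{}".format(area, exchange, line)
        raise ValueError("unknown phone format: {!r}".format(fspec))

phone = PhoneNumber("7739897978")
default_text = "call {}".format(phone)
dotted_text = "call {:dots}".format(phone)
print(default_text)
print(dotted_text)
```

Whatever follows the colon in the replacement field arrives in `__format__` as a plain string, and the class decides what it means.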


Now we come to the innovations introduced in PEP 498 "Formatted String Literals" which has been implemented in the most recent stable release Python 3.6.

In the previous `format()` you had the format string and you had the variables to put into the format.  In the new version of format you can put the variables right into the string and the parser figures out which variables are being referenced and creates a string that captures their value.

To tell the parser that this string is to be parsed for format a new syntax element has been added, putting the letter "f" in front of the string.

In [132]:
n = 7
print(f"Time {n:03d}: call Phil at {my_phone} and offer him a job.")


Time 007: call Phil at 773/989-7978 and offer him a job.


But since the parser is interpreting the string anyway, you are allowed to put an expression inside the `{}`s instead of just a variable, and the parser will parse the expression.

In [133]:
n = 7
print(f"Time {n+1:03d}: call Phil at {my_phone} and offer him a job.")


Time 008: call Phil at 773/989-7978 and offer him a job.


The f-string (or "formatted string") is evaluated right where it appears, and the result is an object that can be passed around.  Once it is assigned it is just a string.

In [134]:
n = 8

msg = f"Time {n+1:03d}: call Phil at {my_phone} and offer him a job."
print(type(msg))

def print_plea(a_msg):
    print(a_msg)

print_plea(msg)


<class 'str'>
Time 009: call Phil at 773/989-7978 and offer him a job.


Now the only limitation on the expression between the curly braces is that it must be a valid Python expression - so no assignments allowed, no compounds of multiple statements, the usual limitations.  But we can get quite creative with lambdas, function calls, etc.  A limitation in Python 3.6 is that a backslash (such as in the newline escape `\n`) is not allowed inside the curly braces of an f-string, although it is fine in the literal text around them.  This limitation is easily avoided by putting the escape in a variable.

In [135]:
downtown_redline = [ "Washington", "Madison", "Jackson" ]
lf = '\n'
print(f'Downtown the Red Line subway stops at {len(downtown_redline)} stops:' '\n'
      f'{lf.join([" * " + stop for stop in downtown_redline])}')


Downtown the Red Line subway stops at 3 stops:
 * Washington
 * Madison
 * Jackson


A limitation on lambdas is that when used within f-strings they must be within parens.

In [136]:
f'Three squared is {(lambda x: x*x)(3)}'


'Three squared is 9'
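
The reason for the parentheses is that the first colon inside the braces is taken as the start of a format specifier, which leaves `lambda x` on its own as the expression. A quick way to see this without crashing the notebook is to compile the source text by hand; the names `bare` and `wrapped` are just for this sketch.

```python
# Source text of two f-strings, held in ordinary strings so we can compile them by hand.
bare = "f'Three squared is {lambda x: x*x}'"
wrapped = "f'Three squared is {(lambda x: x*x)(3)}'"

try:
    compile(bare, "<demo>", "eval")
    bare_is_valid = True
except SyntaxError:
    # the colon was read as the start of a format spec
    bare_is_valid = False

result = eval(compile(wrapped, "<demo>", "eval"))
print(bare_is_valid, result)
```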

It is my opinion that lambdas in Python are somewhat problematic today, now that Python has nested functions.  A function is not limited to a single expression and its name provides documentation that is missing when a lambda is used.

In [137]:
def get_sq(x):
    def sq(x): return x*x
    print( f'{x:,} squared is {sq(x):,}')
    
get_sq(3)
get_sq(45466356 + 684)


3 squared is 9
45,467,040 squared is 2,067,251,726,361,600


The format specifier is a string that is separated from the value expression by a colon.  Its meaning depends upon the type of the expression.  Format specifiers can themselves have formatted expressions inside them.

In [138]:
import decimal

width = 10
precision = 4
value = decimal.Decimal('12.34567')
print(f'result: {value:{width}.{precision}}')


result:      12.35


The f-string is evaluated when it is reached in the normal flow of the program, using the variables that are visible at the time the f-string is executed.  If you want to create an f-string that is passed around and instantiated multiple times you can keep its source text in an ordinary string and `exec` it.  Because `exec` cannot reliably rebind a function's local variables, to access the value after the `exec` the result must be assigned to a field in a mutable data structure.

In [197]:
static_f_string = "f'The value of A is {A}.'"

def do_something(A, f_string):
    s=[]  # a mutable data structure that exec will change
    s.append('init value')
    # Bad, bad code: use of variable A is totally hidden
    exec(f's[0] = {f_string}')
    print(s[0])
    
do_something(3, static_f_string)
do_something('a string this time', static_f_string)


The value of A is 3.
The value of A is a string this time.
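
If all you need is a template that gets filled in later, a plain string with `str.format()` does the same job without `exec`. This is only a sketch of that alternative; the helper name `render` is made up.

```python
TEMPLATE = "The value of A is {A}."   # an ordinary string, not an f-string

def render(template, **values):
    # str.format looks up each {name} in the keyword arguments it is given
    return template.format(**values)

first = render(TEMPLATE, A=3)
second = render(TEMPLATE, A="a string this time")
print(first)
print(second)
```

The trade-off is that `str.format()` fields can only name a value and do attribute or index lookups on it; arbitrary expressions such as `{A+1}` are f-string territory.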


Another way to get a format whose values are supplied later is to use classical % operator formatting, applying it to the result of an f-string that still contains `%(var)s` placeholders after the f-string has been evaluated.  Note that this works but is not readable code: NOT RECOMMENDED.

In [140]:
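# checking what happens when you combine f-strings with % formatting
def test(place2):
    place1 = 'hole in the ground'
    # the parentheses make % apply to the whole concatenated string;
    # without them % binds tighter than + and only sees the second literal
    print((f'We were evicted from *our* {place1}; ' +
           'we had to go and live in %(place2)s!') %
          {'place2':place2})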
    
test('a lake')

We were evicted from *our* hole in the ground; we had to go and live in a lake!


Let's take a look at the `format_spec` element.

The `format_spec` element comes after a colon within the field.  The element adds additional information about how the variable is to be displayed in the output string.  In formatting an integer you can specify whether commas or underscores are used in decimal representations and whether underscores should be added to hex displays.  In formatting a floating point number the `format_spec` can say how wide the field is to be, whether leading zeros are to be displayed, or how many digits are to be displayed.  For a string the field width can be specified.  But if you specify a width too small to hold the string the width is ignored.  If you specify a negative width for a string you get a `ValueError` exception.


In [200]:
my_int = id(None)
print(f'{my_int}      A number with no formatting')
print(f'{my_int:_}   With underscores every three digits (new in 3.6)')
print(f'{my_int:,}   Change the underscores to commas')
print(f'{my_int:015,} Commas in the number, field width 15, leading 0\'s')
print(f'{my_int:x}        Format the number as hex digits')
print(f'{my_int:_x}       With underscores every 4 hex digits (new in 3.6)')
print(f'{my_int:_x}'.replace('_',' ') + '       Change underscore to space as in hex dumps')

1823249552      A number with no formatting
1_823_249_552   With underscores every three digits (new in 3.6)
1,823,249,552   Change the underscores to commas
001,823,249,552 Commas in the number, field width 15, leading 0's
6cac9490        Format the number as hex digits
6cac_9490       With underscores every 4 hex digits (new in 3.6)
6cac 9490       Change underscore to space as in hex dumps


In [141]:
a_float = 17/3
print(f'{a_float} - a float without format specified')
print(f'{a_float:+07.3} - a float with format +07.3')
print()

a_string = 'A string'
print(f'{a_string} - a string without format specified')
print(f'{a_string:20} - a string with format "20" specified')
print(f'{a_string:5} - a string with format "5" specified')
print(f'{a_string:-20} - a string with format "-20" specified') # this raises an exception


5.666666666666667 - a float without format specified
+005.67 - a float with format +07.3

A string - a string without format specified
A string             - a string with format "20" specified
A string - a string with format "5" specified


ValueError: Sign not allowed in string format specifier
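
For reference, the built-in string specifier already covers alignment and truncation, just not with a minus sign. A short sketch of the standard options:

```python
a_string = 'A string'

right = f'{a_string:>20}'      # right-justify in a field of 20
centered = f'{a_string:^20}'   # center in a field of 20
starred = f'{a_string:*<12}'   # left-justify, padding with '*'
clipped = f'{a_string:.5}'     # a precision on a string truncates it

for text in (right, centered, starred, clipped):
    print(f'[{text}]')
```

So right-justifying and clipping are available out of the box; what the built-in `str` does not do is read a negative width as a request to right-justify.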

The interpretation of the format spec is up to the `__format__` method of the class that the variable is an instance of.  So if we want, we can extend a class by specifying a `__format__` that adds extra capabilities.  Here I am adding the ability to hide characters that exceed the specified width, and to right-justify a string with a negative width.

In [None]:
import re

class my_str(str):
    # str is immutable and is fully built by str.__new__, so no __init__ is needed
    def __format__(self, f_spec):
        text = str(self)  # a plain str, so we never recurse into this method
        def shorten(width, right_adj):
            length = len(text)
            if width < length:
                if right_adj:
                    # right adjust and shorter, show end of string
                    return text[length-width:]
                else:
                    # left adjust and shorter, show start of string
                    return text[:width]
            elif right_adj:
                # right adjust and longer, add (width - length) spaces to left
                return (width - length) * ' ' + text
            else:
                # left adjust and longer, pad with spaces on the right
                return text.ljust(width)
        if f_spec == '':
            # no format spec, return the string unchanged
            return text
        # fullmatch so that specs with trailing junch like "5x" are rejected below
        if re.fullmatch(r'[+]?\d+', f_spec):
            # left adjust, possibly shorten
            return shorten(int(f_spec), False)
        if re.fullmatch(r'-\d+', f_spec):
            # right adjust, possibly shorten
            return shorten(int(f_spec[1:]), True)
        # unrecognized spec
        raise ValueError( 'format specification must be a '
                         f'(optionally signed) integer, got "{f_spec}"')
a_string = my_str('A string')
print(f'{a_string} - my string without format specified')
# note the nested replacement field in the next line, which turns an int expression into a string f-spec
print(f'{a_string:{len(a_string)}} - my string with format length same as string length')
print(f'{a_string:5} - my string with format "5" specified')
print(f'{a_string:-5} - my string with format "-5" specified')
print(f'{a_string:20} - my string with format "20" specified')
print(f'{a_string:-20} - my string with format "-20" specified')
print(f'{a_string:?20} - my string with format "?20" specified') # this one throws an error


So the interpretation of format specification is totally up to the class being formatted.  This opens up all sorts of potential for misuse of the specification feature, allowing obfuscation and side effects. For instance I can pass a JSON string in as a format specification.

Consider the following code that applies a dictionary to a template:

In [None]:
# 'classic' formatting with a dictionary.  Relatively straight-forward and readable
data_dict ={"my_class": "my_header",
            "my_id":"my_header1",
            "text": "What is your favorite colour?"}
template = '<h1 class="%(my_class)s" id="%(my_id)s">%(text)s</h1>'
print(template % data_dict)


The application of the dictionary to the template could be moved inside the formatting for a class and the dictionary can be passed into the format as a JSON object since JSON is a string format and strings can be format specifications.

I would not consider the following to be pythonic.

In [None]:
# using a class with a __format__ to instantiate HTML templates
# This does NOT get past code review!
import json

class html_maker:
    def __init__(self, template):
        self.template = template
    def __format__(self, spec):
        if spec:
            return (f'{self.template}' % json.loads(spec))
        else:
            return f'{self.template}'

# a template that takes no arguments
br_maker = html_maker("<br>")
print(f'{br_maker}')

# template and data_dict as assigned in code cell above
header_maker = html_maker(template)
print(f'{header_maker:{json.dumps(data_dict)} }')


Since f-string formatting is applied to the template in the above example, the template does not have to be a string, just something that returns a string upon formatting.  We can create an object that adds angle brackets to the template, makes unique IDs and assigns classes.  Because we can - not because it makes good code!

(Please, use Jinja templates.  The code below reminds me of the worst aspects of writing XML.)

In [189]:
from collections import Counter
# uses html_maker class and br_maker object from previous code cell

tag_counter = Counter()

class template_maker:
    def __init__(self, html_tag, contents):
        global tag_counter
        tag_counter[html_tag] += 1
        self.id = tag_counter[html_tag]
        self.html_tag = html_tag
        self.contents = contents
    def _get_data_dict(self):
        return {"inside":self.contents}
    def __format__(self, json_spec):
        # a template for our generic tag
        self.template = html_maker(f'<{self.html_tag} class="my_{self.html_tag}" '
                         f'id="{self.html_tag}{self.id}">'+'%(inside)s'+f'</{self.html_tag}>')
        # if we got some JSON in use it as the dictionary for formatting
        if json_spec:
            form = f'{self.template}'
            d = json.loads(json_spec)
            return (form % d)
        # no JSON, use the contents passed in at object creation
        else:
            #return f'{self.template}'
            form = f'{self.template}'
            d = self._get_data_dict()
            return (form % d)

heading1 = template_maker('h1','Do you have any Stilton?')
para1 = template_maker('p', f'The {template_maker("i", "Cheese Shop")} skit ...')
div1 = template_maker('div', f'{heading1}{br_maker}{para1}')
print(f'{div1}')

<div class="my_div" id="div1"><h1 class="my_h1" id="h11">Do you have any Stilton?</h1><br><p class="my_p" id="p1">The <i class="my_i" id="i1">Cheese Shop</i> skit ...</p></div>


The full BNF definition of the f-string, as given in the Python documentation, is:

    f_string          ::=  (literal_char | "{{" | "}}" | replacement_field)*
    replacement_field ::=  "{" f_expression ["!" conversion] [":" format_spec] "}"
    f_expression      ::=  (conditional_expression | "*" or_expr)
                             ("," conditional_expression | "," "*" or_expr)* [","]
                           | yield_expression
    conversion        ::=  "s" | "r" | "a"
    format_spec       ::=  (literal_char | NULL | replacement_field)*
    literal_char      ::=  <any code point except "{", "}" or NULL>
    
Something that I have not yet successfully played with is where the grammar indicates that a `yield_expression` is allowed within the `f_expression`.  That has powerful implications for future code abuse.


There is no discussion of yield expressions in the official documentation of f-strings or in the PEP that defined them (other than in the BNF above). The simple example below does not work.

In [196]:
g = f'{yield "a"}'
for var in f'{g}':
    print(f'We get back "{var}"')

SyntaxError: 'yield' outside function (<ipython-input-196-550cd1ca71a4>, line 1)

Although I think the above should work like the below:

In [195]:
def g():
    yield "a"

for var in g():
    print(f'We get back "{var}"')

We get back "a"
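
One reading of the error message is that the usual rule applies: `yield` has to live inside a function, and an f-string does not create one. Under that assumption, here is a sketch that puts the f-string inside a function (the name `ask` is made up). I have parenthesized the yield expression to be on the safe side.

```python
def ask():
    # The yield expression inside the braces makes ask() a generator function.
    reply = f'We get back "{(yield "a")}"'
    yield reply

gen = ask()
first = next(gen)        # runs up to the yield inside the f-string
second = gen.send("b")   # "b" becomes the value of that yield expression
print(first)
print(second)
```

So the f-string is only finished once the caller sends a value back in, which is about as much future code abuse as anyone could ask for.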
