# Getting stuff into strings in Python

By [Allison Parrish](http://www.decontextualize.com/)

This is a little tutorial on how to interpolate values into strings in Python. As with many of my tutorials, we'll need a bit of randomness:

In [1]:
import random

## String concatenation

There are several ways to perform string interpolation in Python. The first is just to use the `+` operator to concatenate multiple strings together, like so:

In [2]:
val = "one hundred"
s = "We have " + val + " widgets in stock."
print(s)

We have one hundred widgets in stock.


This solution doesn't work so great, however, if the value that you want to interpolate isn't already a string type. For example:

In [3]:
val = 100
s = "We have " + val + " widgets in stock."
print(s)

TypeError: must be str, not int

You get a `TypeError`, which means (essentially) "hey, you told me to add two things together, but I don't know how to add a string to an integer; that doesn't even make sense." You can add strings to strings, and integers to integers, but not strings to integers. Those are just the rules.

To get around this, you can use the `str()` function to get Python to make its best guess about how to represent the provided value as a string:

In [4]:
str(val)

'100'

In [5]:
s = "We have " + str(val) + " widgets in stock."
print(s)

We have 100 widgets in stock.
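`str()` isn't just for numbers. As a quick sketch, here it is converting a few other kinds of values (`None`, a Boolean, and a list) so they can be concatenated:

```python
# str() produces a string for just about any value, not only numbers
s = "None: " + str(None) + "; bool: " + str(True) + "; list: " + str([1, 2, 3])
print(s)  # None: None; bool: True; list: [1, 2, 3]
```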


This also works for floating point numbers:

In [6]:
fval = 3.14159265

In [7]:
s = "We have " + str(fval) + " kilograms of widgets in stock."
print(s)

We have 3.14159265 kilograms of widgets in stock.


A drawback of using string concatenation is that it gets messy quick when you start using more than one value.
For example, imagine attempting to generate HTML using string concatenation. In the code below, I'm trying to build a `div` tag interpolating a random font, a random float, and a content string:

In [8]:
font = random.choice(["'Comic Sans MS'", "'Times New Roman'", "Helvetica"])
float_val = random.choice(["left", "right"])
content = "hello!"
s = '<div style="font-family: ' + font + "; float: " + float_val + ';">' + content + "</div>"
print(s)

<div style="font-family: 'Times New Roman'; float: right;">hello!</div>


The line with all of the concatenations is very difficult to read: on quick glance, it's not clear whether the quotes belong to the Python syntax or to the HTML syntax of the string I'm trying to create. The code is also brittle: you can imagine how difficult it would be to go in and make changes.

## The `.format()` function

One strategy for keeping code like this clean is finding ways to separate *form* and *content*. Ideally, we'd have a Python string that defines the structure of the output we want to create, and then a way to insert content into that structure. In other words, we want a *template* that can be "filled in" with arbitrary values. Python has a few ways of doing this. The handiest is the `.format()` method, which is available on all values of type `str`.

Here's a simple use of the `.format()` method:

In [9]:
s = "We have {count} widgets."
output = s.format(count=10)
print(output)

We have 10 widgets.


Here's how it works. In this example, `s` is the "template." The curly brackets in `s` are a special syntax that the `.format()` method understands, meaning "leave this spot open to fill in later with something called `count`." When you call the `.format()` method, you supply [keyword arguments](https://docs.python.org/3/tutorial/controlflow.html#keyword-arguments) that correspond with the named placeholders in the template string. The method then replaces the placeholder with the value you supplied.
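Two consequences of this are worth spelling out. The template is an ordinary string, so calling `.format()` on it returns a *new* string and leaves the template untouched, which means you can reuse it as many times as you like. And a placeholder name can appear more than once; every occurrence gets the same value. A small sketch (the names `Ada` and `Grace` are just example values):

```python
template = "{name} has {count} widgets. Nice work, {name}!"

# .format() returns a new string; the template itself is unchanged
filled = template.format(name="Ada", count=3)
print(filled)    # Ada has 3 widgets. Nice work, Ada!
print(template)  # {name} has {count} widgets. Nice work, {name}!

# ...so the same template can be filled in again with different values
print(template.format(name="Grace", count=12))  # Grace has 12 widgets. Nice work, Grace!
```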

You can have as many placeholder values as you'd like, and you can pick the names of your placeholders (there's nothing magical about the word `count`):

In [10]:
s = "On {date}, we'll have {count} widgets in stock."
s.format(date="Feb 17th", count=12345)

"On Feb 17th, we'll have 12345 widgets in stock."

... but if there are placeholders in the string that don't receive a value in the parameters you pass to `.format()`, you'll get an error:

In [11]:
s = "On {date}, we'll have {count} widgets in stock."
s.format(count=10)

KeyError: 'date'

There is no corresponding error, however, if you supply parameters that don't match any placeholder in the template. Extra keyword arguments are simply ignored:

In [12]:
s = "On {date}, we'll have {count} widgets in stock."
s.format(date="Feb 17th", count=12345, tastiness=17)

"On Feb 17th, we'll have 12345 widgets in stock."

Returning to our simple example of generating HTML, you can see that the solution with `.format()` is a little bit cleaner:

In [13]:
font = random.choice(["'Comic Sans MS'", "'Times New Roman'", "Helvetica"])
float_val = random.choice(["left", "right"])
content = "hello!"
s = '<div style="font-family: {font}; float: {float_val};">{content}</div>'
print(s.format(font=font, float_val=float_val, content=content))

<div style="font-family: 'Times New Roman'; float: right;">hello!</div>


In this code, the `s` string now functions as a template that I can plug values into using `.format()`. I've separated the format from the content a little bit. The code is a bit clearer now and easier to edit.

A common pattern you'll see in Python code is grouping the variables to be provided to the template in a dictionary, and then supplying that dictionary to the `.format()` method. Doing so requires the use of the unpacking operator (`**`):

In [14]:
vals = {
    'font': random.choice(["'Comic Sans MS'", "'Times New Roman'", "Helvetica"]),
    'float_val': random.choice(["left", "right"]),
    'content': "hello!"
}
s = '<div style="font-family: {font}; float: {float_val};">{content}</div>'
print(s.format(**vals))

<div style="font-family: 'Times New Roman'; float: left;">hello!</div>


[Read more about the unpacking operator](https://realpython.com/python-kwargs-and-args/).
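Unpacking a dictionary with `**` is equivalent to typing out each key as a keyword argument. Python strings also have a `.format_map()` method that takes the dictionary directly, without the `**`. The sketch below uses fixed values instead of random ones, so the output is predictable, and checks that all three spellings agree:

```python
vals = {"font": "Helvetica", "float_val": "left", "content": "hello!"}
s = '<div style="font-family: {font}; float: {float_val};">{content}</div>'

# 1. unpack the dictionary into keyword arguments
a = s.format(**vals)
# 2. spell out every keyword argument by hand
b = s.format(font=vals["font"], float_val=vals["float_val"], content=vals["content"])
# 3. hand the dictionary itself to str.format_map()
c = s.format_map(vals)

print(a == b == c)  # True
print(a)  # <div style="font-family: Helvetica; float: left;">hello!</div>
```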

To include a literal `{` or `}` in a formatted string, write `{{` or `}}`, respectively:

In [15]:
word = random.choice(["Wow", "Hey", "Whoa", "What?", "Nice"])
"{interjection}! This string has an actual {{ and an actual }}.".format(interjection=word)

'What?! This string has an actual { and an actual }.'

### `.format()` with positional parameters

I wanted to mention a use of `.format()` that you're likely to see when looking examples up on the Internet. If you pass regular (non-keyword) comma-separated parameters to `.format()`, the placeholders in the template string are identified by index, not by name. Here's an example:

In [16]:
s = "On {0}, we'll have {1} widgets in stock."
s.format("Feb 17th", 12345)

"On Feb 17th, we'll have 12345 widgets in stock."

You can use the same placeholder more than once, as this example that I like in [the official documentation](https://docs.python.org/3/library/string.html#formatspec) demonstrates:

In [17]:
'{0}{1}{0}'.format('abra', 'cad')

'abracadabra'

You can also leave the curly brackets empty, in which case `.format()` fills them in with the positional arguments in order. This is especially handy when there's only one argument:

In [18]:
'hello there {}, how are you?'.format("friend")

'hello there friend, how are you?'
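Empty brackets actually work with any number of positional arguments: they're filled in from left to right. Numbered placeholders let you reorder or repeat the arguments instead. The one thing you can't do is mix empty and numbered placeholders in the same template; Python raises a `ValueError` if you try. A quick sketch:

```python
# Empty placeholders are filled left to right from the positional arguments
both = "{} and {}".format("salt", "pepper")
print(both)      # salt and pepper

# Numbered placeholders let you reorder (or repeat) the arguments
swapped = "{1} and {0}".format("salt", "pepper")
print(swapped)   # pepper and salt

# Mixing empty and numbered placeholders in one template raises a ValueError
try:
    "{} and {0}".format("salt", "pepper")
    mixing_allowed = True
except ValueError:
    mixing_allowed = False
print(mixing_allowed)  # False
```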

### Format specifications

But there's more to turning data into strings than just including the most obvious string representation of a value in a template. The `.format()` method offers a few more tricks for getting your string output just right, most importantly *format specifiers*. A format specifier is a bit of code that you include inside template placeholders in order to modify the formatting of the placeholder's value in a more fine-grained fashion.

There's a lot more to format specifications than we can cover here, so check [the official documentation](https://docs.python.org/3/library/string.html#format-specification-mini-language) for more information and examples. Here are the basics.

To use format specifications, follow the name in the placeholder (between the curly brackets) with a colon. After the colon, you can put a series of characters that correspond to the following categories of functionality.

#### Alignment

You can pad and/or align a parameter with the alignment operators `<` (left), `^` (center) and `>` (right), followed by a number specifying the total width (in characters) that the padded result should occupy:

In [19]:
s = "Good morning, {name:<30}"
s.format(name="Allison")

'Good morning, Allison                       '

In [20]:
s = "Good morning, {name:>30}"
s.format(name="Allison")

'Good morning,                        Allison'

In [21]:
s = "Good morning, {name:^30}"
s.format(name="Allison")

'Good morning,            Allison            '
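Where does the extra space go when the padding can't be split evenly? `"Allison"` is 7 characters long, so centering it in 30 leaves 23 characters of padding. This sketch measures each side of the centered result, showing that `.format()` puts the odd leftover character on the right:

```python
centered = "{name:^30}".format(name="Allison")

# count the spaces on each side of the name
left_pad = len(centered) - len(centered.lstrip(" "))
right_pad = len(centered) - len(centered.rstrip(" "))

print(len(centered), left_pad, right_pad)  # 30 11 12
```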

If you put a character in front of `<`, `^`, or `>`, the `.format()` method will use this to pad the string instead of an empty space:

In [22]:
s = "Good morning, {name:🌞^30}"
s.format(name="Allison")

'Good morning, 🌞🌞🌞🌞🌞🌞🌞🌞🌞🌞🌞Allison🌞🌞🌞🌞🌞🌞🌞🌞🌞🌞🌞🌞'

This works with integers as well:

In [23]:
"Your serial number is {count:0>10}.".format(count=1745)

'Your serial number is 0000001745.'

Aaaaand it works with floating-point numbers, but there's a bit of weirdness:

In [24]:
"We have {amount:>10} kgs of widget in stock today.".format(amount=172.45)

'We have     172.45 kgs of widget in stock today.'

But consider:

In [25]:
"We have {amount:>10} kgs of widget in stock today.".format(amount=22/7)

'We have 3.142857142857143 kgs of widget in stock today.'

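In this case, the repeating decimal gets us in trouble: the width in a format specification is a *minimum*, not a maximum, so `.format()` won't cut the number short to make it fit, and the result overflows the specified width. We'll discuss workarounds for this below.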

#### Integer bases

The `b` and `x` presentation characters cause `.format()` to render the specified numbers as binary or hexadecimal numbers, respectively. For example, to render a number in binary:

In [26]:
"{val:b}".format(val=17)

'10001'

You can combine this with `0>` (pad with zeros, aligned right) to ensure that you're printing out an entire octet (i.e., byte):

In [27]:
"{val:0>8b}".format(val=17)

'00010001'
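For non-negative numbers, there's a shorthand for this kind of zero-padding: put a `0` directly in front of the width (`08b`). And if you ever need to go in the other direction, Python's built-in `int()` function accepts a base as its second argument, so it can turn the binary string back into a number:

```python
val = 17
padded = "{val:0>8b}".format(val=val)     # fill with "0", right-align, width 8
zero_flag = "{val:08b}".format(val=val)    # "0" before the width: zero-pad to 8
print(padded, zero_flag)  # 00010001 00010001

# int() with base 2 parses the binary string back into an integer
print(int(padded, 2))  # 17
```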

Displaying an integer as a hexadecimal number:

In [28]:
"{val:x}".format(val=42)

'2a'

This is helpful for producing HTML-style hexadecimal color triplets:

In [29]:
"#{r:x}{g:x}{b:x}".format(r=25, g=240, b=190)

'#19f0be'
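One caution: `x` uses only as many digits as the number needs, so any component below 16 comes out as a single digit and the color code ends up too short. Padding each component to two digits with `02x` avoids this. Here's a sketch; `hex_color` is just a helper name I made up for the example:

```python
def hex_color(r, g, b):
    """Format three integers from 0-255 as an HTML color like '#05f0be'."""
    # 02x: hexadecimal, zero-padded to at least two digits
    return "#{r:02x}{g:02x}{b:02x}".format(r=r, g=g, b=b)

# Without padding, r=5 turns into one digit and the color is only five digits long
print("#{r:x}{g:x}{b:x}".format(r=5, g=240, b=190))  # #5f0be
print(hex_color(5, 240, 190))                          # #05f0be
print(hex_color(25, 240, 190))                         # #19f0be
```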

#### Floating-point precision

One of the main reasons to use format specifications is to format floating point numbers with a particular precision. The problem becomes evident when you're working with a number with a repeating decimal, or any other number with a large fractional component:

In [30]:
pi_approx = 22 / 7
print(pi_approx)

3.142857142857143


In [32]:
two_revolutions = 4 * pi_approx
print(two_revolutions)

12.571428571428571


Printing numbers like these to a particular precision (i.e., with a limited number of places after the decimal point) would be tricky using only string operations: you'd have to take into account the position of the decimal point, rounding, etc. An easier way is to use the `.` character and a following number in a format specification, with a trailing `f`:

In [33]:
"how much pi? this much: {val:.4f}".format(val=pi_approx)

'how much pi? this much: 3.1429'

In [34]:
"how much pi? this much: {val:.4f}".format(val=two_revolutions)

'how much pi? this much: 12.5714'
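Note that `.4f` *rounds* to four places rather than chopping off the extra digits, and that it always prints exactly that many digits, adding zeros if necessary. That second part is something the built-in `round()` function won't do for you, since it returns a number rather than a string. A quick comparison:

```python
pi_approx = 22 / 7  # 3.142857142857143

four = "{val:.4f}".format(val=pi_approx)   # next digits are 57..., so this rounds up
two = "{val:.2f}".format(val=pi_approx)
padded = "{val:.4f}".format(val=2.5)       # zeros are added to fill four places

print(four, two, padded)  # 3.1429 3.14 2.5000
print(round(2.5, 4))      # 2.5 (round() returns a float, so no trailing zeros)
```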

A number preceding the `.` tells `.format()` to left-pad the result with spaces so that the entire representation takes up at least that many characters:

In [35]:
"how much pi? this much: {val:8.4f}".format(val=pi_approx)

'how much pi? this much:   3.1429'

In [36]:
"how much pi? this much: {val:8.4f}".format(val=two_revolutions)

'how much pi? this much:  12.5714'
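Combining a width with a precision is also the workaround for the misbehaving `22/7` example from the alignment section: once the number is limited to two decimal places, it fits in the width of 10 and lines up with the other value. A sketch:

```python
lines = []
for amount in [172.45, 22 / 7]:
    # right-align in a field 10 characters wide, with 2 digits after the decimal point
    lines.append("We have {amount:>10.2f} kgs of widget in stock today.".format(amount=amount))
print("\n".join(lines))
# We have     172.45 kgs of widget in stock today.
# We have       3.14 kgs of widget in stock today.
```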

You can use this to print out pretty tables:

In [37]:
vals = [random.random() * 200 for i in range(12)]
for item in vals:
    print("{val:10.5f}".format(val=item))

 115.73000
 105.02891
  87.79166
  20.45000
 127.02266
 189.96105
 172.28016
  41.04942
 104.92696
 159.49087
  94.17991
  14.64796
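Tables usually have more than one column. Here's a sketch of a two-column version, with left-aligned labels and right-aligned numbers; the labels and values are made-up sample data. It also uses a feature of format specifications worth knowing about: a placeholder can appear *inside* a format specification (`{w}` below), so the column width can be computed rather than hard-coded:

```python
rows = [("hallway", 115.73), ("equator", 87.79166), ("terrier", 20.45)]

# make the label column as wide as the longest label, plus a two-space gap
name_width = max(len(name) for name, _ in rows) + 2

lines = []
for name, amount in rows:
    # {name:<{w}} left-aligns the name in a field w characters wide
    lines.append("{name:<{w}}{amount:>8.2f}".format(name=name, w=name_width, amount=amount))
print("\n".join(lines))
# hallway    115.73
# equator     87.79
# terrier     20.45
```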


## F-strings

Since version 3.6, Python has supported [formatted string literals](https://docs.python.org/3.6/reference/lexical_analysis.html#f-strings), commonly called *f-strings*. These work a lot like the `.format()` function, except the placeholder names in the template string are automatically filled in with the value of the corresponding variable. This is super handy and can save you quite a bit of typing!

An f-string looks just like a regular string, except it has an `f` right before the initial quote. If there aren't any placeholders in an f-string, it behaves just like a regular string:

In [38]:
s = f"Mother said there'd be days like these."
print(s)

Mother said there'd be days like these.


But if you include placeholders, an f-string interpolates values into those placeholders from variables with the same name:

In [40]:
date = "Feb 17th"
count = 12345
s = f"On {date}, we'll have {count} widgets in stock."
print(s)

On Feb 17th, we'll have 12345 widgets in stock.
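One difference from `.format()` is worth keeping in mind: an f-string is filled in *immediately*, at the moment Python evaluates that line, using whatever the variables hold at that time. It isn't a template you can fill in again later. A small sketch:

```python
count = 10
template = "We have {count} widgets."  # plain string: a reusable template
fstr = f"We have {count} widgets."     # f-string: filled in right now

count = 99
print(template.format(count=count))  # We have 99 widgets.
print(fstr)                          # We have 10 widgets.
```

This is why the chapbook example at the end of this tutorial keeps its HTML template as a plain string and fills it in with `.format()`.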


This makes our original HTML-generation example much cleaner:

In [41]:
font = random.choice(["'Comic Sans MS'", "'Times New Roman'", "Helvetica"])
float_val = random.choice(["left", "right"])
content = "hello!"
s = f'<div style="font-family: {font}; float: {float_val};">{content}</div>'
print(s)

<div style="font-family: 'Comic Sans MS'; float: right;">hello!</div>


If the placeholder references a variable that doesn't exist, you get an error:

In [42]:
print(f'check out this rad {skateboard}')

NameError: name 'skateboard' is not defined

The placeholders in f-strings support the same format specifications as those supported by `.format()`. For example, you can easily format floating-point numbers as strings like so:

In [43]:
print(f'{pi_approx:0.4f}')
print(f'{two_revolutions:0.4f}')

3.1429
12.5714
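The other specifications from earlier (alignment, padding, integer bases) work the same way inside f-strings. A few of them, reusing values from earlier in this tutorial:

```python
name = "Allison"
val = 17
amount = 22 / 7

centered = f"[{name:^11}]"     # center in 11 characters
binary = f"{val:08b}"          # binary, zero-padded to 8 digits
money = f"{amount:>10.2f}"     # right-align in 10 characters, 2 decimal places

print(centered)  # [  Allison  ]
print(binary)    # 00010001
print(money)     #       3.14
```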


A handy feature of f-strings is that (unlike `.format()` string interpolation) you can put arbitrary Python expressions inside of the curly brackets. For example:

In [44]:
print(f'{17 * 32}')

544


In [45]:
message = "hello"
print(f'{message.upper()}')

HELLO


Or even:

In [46]:
words = ["this", "is", "a", "test"]
print(f'{" ".join([w.upper() for w in words])}')

THIS IS A TEST
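Expressions inside the brackets can also be conditional. Here's a sketch that uses a conditional expression to get the plural right. (Note that the inner strings use single quotes while the f-string itself uses double quotes; before Python 3.12, reusing the same kind of quote inside the brackets was a syntax error.)

```python
lines = []
for count in [0, 1, 2]:
    # add an "s" unless there's exactly one widget
    lines.append(f"We have {count} widget{'s' if count != 1 else ''} in stock.")
print("\n".join(lines))
# We have 0 widgets in stock.
# We have 1 widget in stock.
# We have 2 widgets in stock.
```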


## Example: Creating a chapbook of Mad Libs

A [Mad Lib](https://en.wikipedia.org/wiki/Mad_Libs) is a "phrasal template word game" in which players fill in a template with missing words. Each of the blanks in the template has a label (like "noun," "verb," "adjective," "place name," etc.). Only one player can see the template, and that player prompts the others for words to fill in the template, giving them only the label as a prompt. It's great fun and also one of the basic forms of computational composition. ([House of Dust](http://zachwhalen.net/pg/dust/) by Alison Knowles and James Tenney is essentially a Mad Lib where the computer fills in the blanks at random.)
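In code, a Mad Lib is just a `.format()` template whose placeholder names are the blank labels. Here's a minimal plain-text sketch of the idea; the template and word lists are made up for the example:

```python
import random

# each placeholder name is a blank label
template = "The {adjective} {noun} {verb} near the {place}."

# hypothetical word lists, one per label
words = {
    "adjective": ["perplexed", "codified", "disheveled"],
    "noun": ["terrier", "equator", "hallway"],
    "verb": ["swung", "dragged", "hastened"],
    "place": ["outdoors", "line-up"],
}

# pick one word for each label, then fill in the blanks
choices = {label: random.choice(options) for label, options in words.items()}
story = template.format(**choices)
print(story)
```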

In this example, I'm going to create a "chapbook" of short stories by filling in a template with randomly selected words. The template is going to generate HTML that you can open up in a browser.

I'm starting with a template that just has boilerplate HTML, along with a few CSS styles (note the use of doubled curly brackets so that we end up with actual curly brackets in the output):

In [47]:
html_tmpl = """
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>{title}: A Computer-generated Chapbook</title>
    <style>
    h2 {{
        page-break-before: always;
    }}
    p {{
        width: 14em;
        font-size: 14pt;
    }}
    </style>
</head>
<body>
<h1>{title}</h1>
{content}
</body>
</html>
"""

This string is intended to be used with the `.format()` method. It has two placeholders: `title` and `content`. You can fill it in like this:

In [48]:
html_src = html_tmpl.format(title="Testing", content="This is a test")

The following function makes it possible to preview the resulting HTML right in our notebook, which is nice!

In [49]:
from IPython.display import display, HTML
def show_html(src):
    return display(HTML(src), metadata=dict(isolated=True))

And here's what the resulting page looks like:

In [50]:
show_html(html_src)

The code in the following cell writes the HTML out to a file on your hard drive:

In [51]:
with open("test.html", "w") as fh:
    fh.write(html_src)

If you're using macOS, the following command will open the resulting file in your web browser. (If your system doesn't have the `open` command, you'll find the file in the same folder as this notebook on your computer; try opening it with your browser's "Open File..." menu option.)

In [52]:
!open test.html
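The `open` command is specific to macOS. As a cross-platform alternative, Python's standard-library `webbrowser` module can ask your default browser to open a URL. Below is a sketch; the helper names `file_uri` and `open_in_browser` are my own, not part of any library:

```python
import webbrowser
from pathlib import Path

def file_uri(path):
    """Turn a (possibly relative) file path into an absolute file:// URL."""
    return Path(path).resolve().as_uri()

def open_in_browser(path):
    """Ask the default web browser to open a local file."""
    return webbrowser.open(file_uri(path))

# open_in_browser("test.html")  # uncomment to open the file from this notebook
```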

Looks good so far! But it's not especially interesting. The code in the following cell uses a `for` loop and f-strings to build up a list of small stories, drawing on words selected at random from a fixed vocabulary, then shows the HTML fragment thereby generated.

In [53]:
# number of pages to generate
n_pages = 10

# vocabulary
nouns = ['hallway', 'intercession', 'equator', 'boasting', 'outdoors', 'terrier', 'line-up', 'defection']
adjectives = ['disheveled', 'long-held', 'aggravated', 'occurring', 'perplexed', 'professional', 'corporatist', 'codified']
verbs = ['corroded', 'deployed', 'tautened', 'dragged', 'swung', 'hastened', 'drilled', 'charred']
adverbs = ['nearby', 'meanwhile', 'also', 'nevertheless', 'later']
content = ""
for i in range(n_pages):
    # pick words for this story
    protagonist = random.choice(nouns)
    antagonist = random.choice(nouns)
    onlooker = random.choice(nouns)
    adj = random.choice(adjectives)
    adv = random.choice(adverbs)
    verb1 = random.choice(verbs)
    verb2 = random.choice(verbs)
    # concatenate together a series of f-strings...
    story = f"<h2>{i+1}. The {protagonist.capitalize()}</h2>\n"
    story += f"<p>The {protagonist} {verb1} the {antagonist}. "
    story += f"{adv.capitalize()}, the {adj} {onlooker} {verb2}.</p>"
    story += "\n"
    # and then add to the content
    content += story
print(content)

<h2>1. The Intercession</h2>
<p>The intercession hastened the terrier. Nearby, the aggravated terrier dragged.</p>
<h2>2. The Intercession</h2>
<p>The intercession swung the boasting. Later, the occurring boasting corroded.</p>
<h2>3. The Hallway</h2>
<p>The hallway swung the line-up. Later, the disheveled boasting dragged.</p>
<h2>4. The Defection</h2>
<p>The defection tautened the boasting. Later, the perplexed terrier dragged.</p>
<h2>5. The Defection</h2>
<p>The defection corroded the defection. Nearby, the occurring boasting hastened.</p>
<h2>6. The Boasting</h2>
<p>The boasting drilled the terrier. Later, the perplexed defection dragged.</p>
<h2>7. The Outdoors</h2>
<p>The outdoors drilled the boasting. Nevertheless, the long-held intercession drilled.</p>
<h2>8. The Hallway</h2>
<p>The hallway drilled the equator. Meanwhile, the aggravated line-up charred.</p>
<h2>9. The Line-up</h2>
<p>The line-up hastened the intercession. Later, the codified intercession deployed.</p>
<h2>10.

Tricky parts of this code:

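* The `+=` operator appends the string on the right to the string stored in the variable on the left. (Strictly speaking, it builds a new string and assigns it back to that variable, since Python strings can't be changed in place.)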
* Why `{i+1}`? I wanted to number the stories, but I didn't want the first one to be story #0.
* The `.capitalize()` method returns a copy of a string with its first character in uppercase and the rest in lowercase. (In this case, each string is a single word, so only its first letter gets capitalized.)
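To see the difference between `.capitalize()` and `.title()` (which capitalizes the first letter of *every* word), compare them on a few strings, including a hyphenated one from the vocabulary above:

```python
# .capitalize(): first character uppercased, the rest lowercased
print("line-up".capitalize())                # Line-up
print("the perplexed terrier".capitalize())  # The perplexed terrier
print("hELLO".capitalize())                  # Hello

# .title(): first letter of each word uppercased (hyphens count as word breaks)
print("line-up".title())                     # Line-Up
print("the perplexed terrier".title())       # The Perplexed Terrier
```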

As with the first example, we'll interpolate this content into the template, selecting a random noun as the title:

In [54]:
html_src = html_tmpl.format(title=random.choice(nouns).capitalize(), content=content)

And here's what it looks like:

In [55]:
show_html(html_src)

This cell writes the file out to `chapbook.html` and the following cell uses `open` to open it in your browser:

In [56]:
with open("chapbook.html", "w") as fh:
    fh.write(html_src)

In [57]:
!open chapbook.html

Try printing this (or at least looking at it through print preview)! Because of the `page-break-before: always;` style on the `h2` tag, each story will be printed on a separate page.

## Further resources

* [More on f-strings](https://cito.github.io/blog/f-strings/)
* [Official Python tutorial on string formatting](https://docs.python.org/3.8/tutorial/inputoutput.html#fancier-output-formatting)
* [Tracery](http://tracery.io/) is a more sophisticated way of constructing strings from rules and templates. I wrote a [Python port of Tracery](https://github.com/aparrish/pytracery) along with a tutorial on how to use Python and Tracery ([Tracery and Python](https://github.com/aparrish/rwet/blob/master/tracery-and-python.ipynb)) and a tutorial on advanced Tracery syntax ([Making a Propp-inspired story generator in Tracery](http://catn.decontextualize.com/public/notebooks/propp-inspired-tracery.html)).
* Need a refresher on HTML? I suggest [MDN's HTML tutorial](https://developer.mozilla.org/en-US/docs/Learn/HTML/Introduction_to_HTML).