#Mangle Data Like A Pro

##Output formatting in Python 2 and 3

By this point, we've seen that `print` is an extremely useful tool for displaying the values of our variables. We often print a variable along with a string by using string concatenation, as in this example:

In [1]:
import datetime
today = datetime.datetime.now().strftime("%A")

print("Today is now " + today)

Today is now Sunday


While string concatenation is useful in situations like this, Python has tools that give us a lot more flexibility in displaying output. Python allows us to specify precision for floating point variables, alignment for strings, and whether integers should be displayed in decimal, hexadecimal, or octal notation. 

There are two ways of doing this, referred to as the old style and the new style. Both styles are available in Python 2 and 3 (the new style was added in Python 2.6), but we encourage you to use the new style. We will teach both because you are likely to run into both in existing code.

##Old Style with %

The old style of string formatting uses the % symbol and the order of the parameters to determine where each value is placed within the new string.

Let's say we have a list of students and their grades:

| Name           | Email             | Midterm | Final | Grade |
|----------------|-------------------|---------|-------|-------------|
| Max Powers     | max@gmail.com     | 87.76   | 88.65 | B+          |
| Julie Thompson | julie@outlook.com | 93.43   | 90.45 | A-          |
| Amber Francis  | amber@gmail.com   | 85.23   | 97.54 | A-          |
| Andrew Smith   | andrew@yahoo.com  | 87.43   | 80.32 | B           |

We could store that information in a list of dictionaries:

In [2]:
students = []
students.append({"name":"Max Powers", "email":"max@gmail.com", "midterm": 87.76, "final":88.65, "grade":"B+"})
students.append({"name":"Julie Thompson", "email":"julie@outlook.com", "midterm": 93.43, "final":90.45, "grade":"A-"})
students.append({"name":"Amber Francis", "email":"amber@gmail.com", "midterm": 85.23, "final":97.54, "grade":"A-"})
students.append({"name":"Andrew Smith", "email":"andrew@yahoo.com", "midterm": 87.43, "final":80.32, "grade":"B"})

Using the old style, this is how we might print the data of this table to the screen:

In [3]:
for student in students:
    student_data = ("%-20s %-20s %-5.0f %-5.0f %-5s" % 
          (student["name"], 
           student["email"], 
           student["midterm"], 
           student["final"], 
           student["grade"])) 
    print(student_data)

Max Powers           max@gmail.com        88    89    B+   
Julie Thompson       julie@outlook.com    93    90    A-   
Amber Francis        amber@gmail.com      85    98    A-   
Andrew Smith         andrew@yahoo.com     87    80    B    


Let's break down this print statement one piece at a time.

First, we'll begin without all the formatting instructions.  Here's a simplified way to insert variable values into a string.

In [4]:
for student in students:
    student_data = ("%s %s %f %f %s" % 
          (student["name"], 
           student["email"], 
           student["midterm"], 
           student["final"], 
           student["grade"])) 
    print(student_data)

Max Powers max@gmail.com 87.760000 88.650000 B+
Julie Thompson julie@outlook.com 93.430000 90.450000 A-
Amber Francis amber@gmail.com 85.230000 97.540000 A-
Andrew Smith andrew@yahoo.com 87.430000 80.320000 B


Here we have a single string that contains five placeholders (`"%s %s %f %f %s"`). Each placeholder specifies the format used to display a Python object.

These formats are listed below:

- %s: string
- %d: decimal integer
- %x: hex integer
- %o: octal integer
- %f: decimal float
- %e: exponential float
- %g: decimal or exponential float
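As a quick illustration of the conversion types listed above, here is a minimal sketch. The variables `count` and `ratio` are hypothetical values chosen only for this example; they are not part of the student data.

```python
# Hypothetical values, used only to illustrate each conversion type.
count = 255
ratio = 1234.5678

as_decimal = "%d" % count   # '255'
as_hex = "%x" % count       # 'ff'
as_octal = "%o" % count     # '377'
as_float = "%f" % ratio     # '1234.567800' (six digits after the point by default)
as_exp = "%e" % ratio       # '1.234568e+03'
as_general = "%g" % ratio   # '1234.57' (six significant digits by default)

print(as_decimal, as_hex, as_octal, as_float, as_exp, as_general)
```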

After the string, we include the percent operator. This is followed by a tuple of objects that replace the placeholders in the string, in order. In our example, the following occurs:

- The first %s is replaced by `str(student["name"])`
- The second %s is replaced by `str(student["email"])`
- The first %f is replaced by `float(student["midterm"])`
- The second %f is replaced by `float(student["final"])`
- The third %s is replaced by `str(student["grade"])`
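Because the old style matches placeholders to values purely by position, the number of values has to agree with the number of placeholders. The sketch below, which uses standalone variables rather than the `students` list, shows a matching call and the `TypeError` Python raises when a value is missing.

```python
name = "Max Powers"
midterm = 87.76

# Two placeholders, two values: the values are consumed left to right.
line = "%s scored %f" % (name, midterm)
print(line)

# Two placeholders, one value: Python raises a TypeError.
try:
    "%s scored %f" % (name,)
    error = None
except TypeError as exc:
    error = exc
    print("Formatting failed:", exc)
```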

Now that we're successfully inserting values into our string, we can start to include more formatting options:

In [5]:
for student in students:
    student_data = ("%20s %20s %10f %10f %5s" % 
          (student["name"], 
           student["email"], 
           student["midterm"], 
           student["final"], 
           student["grade"]))
    print(student_data)

          Max Powers        max@gmail.com  87.760000  88.650000    B+
      Julie Thompson    julie@outlook.com  93.430000  90.450000    A-
       Amber Francis      amber@gmail.com  85.230000  97.540000    A-
        Andrew Smith     andrew@yahoo.com  87.430000  80.320000     B


In this example, we specify a minimum width for each of the columns: 20 for the name and email columns, 10 for the midterm and final, and 5 for the letter grade. By default, each column is right-justified. We can left-align the columns by adding a minus sign:

In [6]:
for student in students:
    student_data = ("%-20s %-20s %-10f %-10f %-5s" % 
          (student["name"], 
           student["email"], 
           student["midterm"], 
           student["final"], 
           student["grade"]))
    print(student_data)

Max Powers           max@gmail.com        87.760000  88.650000  B+   
Julie Thompson       julie@outlook.com    93.430000  90.450000  A-   
Amber Francis        amber@gmail.com      85.230000  97.540000  A-   
Andrew Smith         andrew@yahoo.com     87.430000  80.320000  B    


For floating-point numbers, we can specify the number of digits we want printed after the decimal point. In the case of our midterm and final columns, we might want zero such digits.

In [7]:
for student in students:
    student_data = ("%-20s %-20s %-5.0f %-5.0f %-5s" % 
          (student["name"], 
           student["email"], 
           student["midterm"], 
           student["final"], 
           student["grade"]))
    print(student_data)

Max Powers           max@gmail.com        88    89    B+   
Julie Thompson       julie@outlook.com    93    90    A-   
Amber Francis        amber@gmail.com      85    98    A-   
Andrew Smith         andrew@yahoo.com     87    80    B    


By adding ".0" to the midterm and final placeholders, we printed each float with no digits after the decimal point. Note that each number was rounded rather than truncated.
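To make the difference between rounding and truncating concrete, here is a small sketch using two of the scores from the table. Comparing against `%d` is our own addition for contrast: `%d` converts a float to an integer by dropping the fractional part.

```python
final = 88.65
midterm = 87.76

rounded = "%.0f" % final        # '89': precision 0 rounds to the nearest whole number
truncated = "%d" % final        # '88': %d drops the fractional part
one_decimal = "%.1f" % midterm  # '87.8': precision 1 keeps one digit after the point

print(rounded, truncated, one_decimal)
```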

##New Style with {} and format

The new style is available in Python 3 (and in Python 2.6 and later) and is the recommended way of formatting string output. Here is how we might use the new style to display our student data:

In [8]:
for student in students:
    student_data = "{name:<20s} {email:<20s} {midterm:<5.0f} {final:<5.0f} {grade:<5s}".format(
        name=student["name"], 
        email=student["email"], 
        midterm=student["midterm"], 
        final=student["final"], 
        grade=student["grade"])
    print(student_data)

Max Powers           max@gmail.com        88    89    B+   
Julie Thompson       julie@outlook.com    93    90    A-   
Amber Francis        amber@gmail.com      85    98    A-   
Andrew Smith         andrew@yahoo.com     87    80    B    


As you can see, the new syntax uses curly braces instead of percent signs. Once again, let's break this code down, starting with minimal formatting options.

To begin with, we don't even have to specify data types:

In [9]:
for student in students:
    student_data = "{} {} {} {} {}".format(
        student["name"], 
        student["email"], 
        student["midterm"], 
        student["final"], 
        student["grade"])
    print(student_data)

Max Powers max@gmail.com 87.76 88.65 B+
Julie Thompson julie@outlook.com 93.43 90.45 A-
Amber Francis amber@gmail.com 85.23 97.54 A-
Andrew Smith andrew@yahoo.com 87.43 80.32 B


With the new style, you can include a custom name inside each placeholder, so that you do not have to worry about the order of the parameters.

In [10]:
for student in students:
    student_data = "{name} {email} {midterm} {final} {grade}".format(
        grade=student["grade"],
        name=student["name"], 
        email=student["email"], 
        final=student["final"], 
        midterm=student["midterm"])
    print(student_data)

Max Powers max@gmail.com 87.76 88.65 B+
Julie Thompson julie@outlook.com 93.43 90.45 A-
Amber Francis amber@gmail.com 85.23 97.54 A-
Andrew Smith andrew@yahoo.com 87.43 80.32 B


Now each variable is clearly specified in the string using a name, like "grade" or "email". In many cases this can improve readability and make it easier to find which placeholder corresponds to which variable.
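When the placeholder names match the keys of a dictionary, as they do for our student records, one possible shortcut is to unpack the dictionary into keyword arguments with `**`. This is a sketch of that idea using a single record copied from the table above, not a replacement for the explicit version.

```python
# One record from the student table, repeated here so the example is self-contained.
student = {"name": "Max Powers", "email": "max@gmail.com",
           "midterm": 87.76, "final": 88.65, "grade": "B+"}

# **student passes each key/value pair as a keyword argument,
# so {name} is filled from student["name"], and so on.
line = "{name} {email} {midterm} {final} {grade}".format(**student)
print(line)
```

The trade-off is that the call no longer shows which values are used, so the explicit form can still be clearer when only a few fields are needed.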

We can also specify data types in the new style. The letter for the format follows a colon placed after the name of the placeholder:

In [11]:
for student in students:
    student_data = "{name:s} {email:s} {midterm:f} {final:f} {grade:s}".format(
        name=student["name"], 
        email=student["email"], 
        midterm=student["midterm"], 
        final=student["final"], 
        grade=student["grade"])
    print(student_data)

Max Powers max@gmail.com 87.760000 88.650000 B+
Julie Thompson julie@outlook.com 93.430000 90.450000 A-
Amber Francis amber@gmail.com 85.230000 97.540000 A-
Andrew Smith andrew@yahoo.com 87.430000 80.320000 B


As before, we can specify a minimum width, left alignment (using the "<" character), and a precision for floats:

In [12]:
for student in students:
    student_data = "{name:<20s} {email:<20s} {midterm:<5.0f} {final:<5.0f} {grade:<5s}".format(
        name=student["name"], 
        email=student["email"], 
        midterm=student["midterm"], 
        final=student["final"], 
        grade=student["grade"])
    print(student_data)

Max Powers           max@gmail.com        88    89    B+   
Julie Thompson       julie@outlook.com    93    90    A-   
Amber Francis        amber@gmail.com      85    98    A-   
Andrew Smith         andrew@yahoo.com     87    80    B    


There are a few other options that are worth exploring: check the [Python 3 docs](https://docs.python.org/3/library/string.html#format-string-syntax) for more information.
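As a starting point, here is a short sketch of a few of those options from the format specification: right and center alignment, a custom fill character, a thousands separator, and hexadecimal output. The values are hypothetical and chosen only for illustration.

```python
right = "{:>10s}".format("B+")     # '        B+'  right-aligned in 10 characters
centered = "{:^10s}".format("B+")  # '    B+    '  centered in 10 characters
filled = "{:*^10s}".format("B+")   # '****B+****'  centered, padded with '*'
grouped = "{:,d}".format(1234567)  # '1,234,567'   comma as thousands separator
as_hex = "{:x}".format(255)        # 'ff'          hexadecimal integer

for text in (right, centered, filled, grouped, as_hex):
    print(repr(text))
```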