#Mangle Data Like A Pro

##Output formatting in Python 2 and 3

We have printed a lot of values to the screen using the print function. We have also printed variables together with strings by using string concatenation, as in this example:

In [1]:
import datetime
today = datetime.datetime.now().strftime("%A")

print("Today is now " + today)

Today is now Sunday


However, Python's string formatting can do much more than concatenation. It can set the precision of floating point numbers, align strings within a fixed width, and display integers in decimal, hexadecimal, or octal on the fly.

There are two ways of doing this, referred to as the old style and the new style. The old style is available in both Python 2 and 3, but you are encouraged to move to the new style, which is available in Python 3 and also in Python 2.6 and later. We will teach both because you may run into both styles in the future.

##Old Style with %

The old style of string formatting uses the % symbol and the order of the values to determine where each value is placed within the new string.

Let's say we have a list of students and their grades:

| Name           | Email             | Midterm | Final | Grade |
|----------------|-------------------|---------|-------|-------------|
| Max Powers     | max@gmail.com     | 87.76   | 88.65 | B+          |
| Julie Thompson | julie@outlook.com | 93.43   | 90.45 | A-          |
| Amber Francis  | amber@gmail.com   | 85.23   | 97.54 | A-          |
| Andrew Smith   | andrew@yahoo.com  | 87.43   | 80.32 | B           |

We'll store that information in a list of dictionaries:

In [2]:
students = []
students.append({"name":"Max Powers", "email":"max@gmail.com", "midterm": 87.76, "final":88.65, "grade":"B+"})
students.append({"name":"Julie Thompson", "email":"julie@outlook.com", "midterm": 93.43, "final":90.45, "grade":"A-"})
students.append({"name":"Amber Francis", "email":"amber@gmail.com", "midterm": 85.23, "final":97.54, "grade":"A-"})
students.append({"name":"Andrew Smith", "email":"andrew@yahoo.com", "midterm": 87.43, "final":80.32, "grade":"B"})

Using the old style, this is how we would print the data of this table to the screen:

In [3]:
for student in students:
    student_data = ("%-20s %-20s %-5.0f %-5.0f %-5s" % 
          (student["name"], 
           student["email"], 
           student["midterm"], 
           student["final"], 
           student["grade"])) 
    print(student_data)

Max Powers           max@gmail.com        88    89    B+   
Julie Thompson       julie@outlook.com    93    90    A-   
Amber Francis        amber@gmail.com      85    98    A-   
Andrew Smith         andrew@yahoo.com     87    80    B    


Let's break down this print statement, starting from the very beginning.

First off, we could print the student data without any of the width or precision formatting by using the following code:

In [4]:
for student in students:
    student_data = ("%s %s %f %f %s" % 
          (student["name"], 
           student["email"], 
           student["midterm"], 
           student["final"], 
           student["grade"])) 
    print(student_data)

Max Powers max@gmail.com 87.760000 88.650000 B+
Julie Thompson julie@outlook.com 93.430000 90.450000 A-
Amber Francis amber@gmail.com 85.230000 97.540000 A-
Andrew Smith andrew@yahoo.com 87.430000 80.320000 B


What is happening here is that we have a single format string with five placeholders `("%s %s %f %f %s")`, each specifying the conversion type of the value that fills it.

The most common conversion types are listed below:

- %s: string
- %d: decimal integer
- %x: hex integer
- %o: octal integer
- %f: decimal float
- %e: exponential float
- %g: decimal or exponential float
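
As a minimal sketch of the conversion types listed above, the following applies each one to a sample value. The variables `count` and `ratio` are hypothetical and exist only for illustration.

```python
# Hypothetical sample values, chosen only to illustrate each conversion type.
count = 255
ratio = 1234.5678

as_decimal = "%d" % count    # decimal integer
as_hex = "%x" % count        # hexadecimal integer
as_octal = "%o" % count      # octal integer
as_float = "%f" % ratio      # decimal float, six decimal places by default
as_exp = "%e" % ratio        # exponential notation
as_general = "%g" % ratio    # chooses decimal or exponential form

print(as_decimal, as_hex, as_octal)   # 255 ff 377
print(as_float, as_exp, as_general)   # 1234.567800 1.234568e+03 1234.57
```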

The percent sign after the format string is the formatting operator. It is followed by a tuple of values that fill in the placeholders in order. So for the above statement the following occurs:

- The first %s is replaced by `str(student["name"])`
- The second %s is replaced by `str(student["email"])`
- The first %f is replaced by `float(student["midterm"])`
- The second %f is replaced by `float(student["final"])`
- The third %s is replaced by `str(student["grade"])`
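
The substitution order described above can be sketched with a single record. The `student` dictionary here is a cut-down, hypothetical copy of the first entry, and the `try` block is only an illustration of what happens when the number of values does not match the number of placeholders.

```python
# A single hypothetical record, mirroring the dictionaries used above.
student = {"name": "Max Powers", "midterm": 87.76}

# Values are consumed left to right: name fills %s, midterm fills %f.
line = "%s scored %f" % (student["name"], student["midterm"])
print(line)      # Max Powers scored 87.760000

# %s accepts any object because the value is converted with str() first.
as_text = "%s" % student["midterm"]
print(as_text)   # 87.76

# Supplying too few values for the placeholders raises a TypeError.
mismatch_error = None
try:
    "%s %s" % ("only one value",)
except TypeError as error:
    mismatch_error = error
print(type(mismatch_error).__name__)   # TypeError
```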

In addition to making it easier to print strings with variables, the placeholders also accept formatting options:

In [5]:
for student in students:
    student_data = ("%20s %20s %10f %10f %5s" % 
          (student["name"], 
           student["email"], 
           student["midterm"], 
           student["final"], 
           student["grade"]))
    print(student_data)

          Max Powers        max@gmail.com  87.760000  88.650000    B+
      Julie Thompson    julie@outlook.com  93.430000  90.450000    A-
       Amber Francis      amber@gmail.com  85.230000  97.540000    A-
        Andrew Smith     andrew@yahoo.com  87.430000  80.320000     B


We have now specified a minimum width for each of the columns: 20 for the name and email columns, 10 for the midterm and final, and 5 for the letter grade. The columns are right-justified by default. We can left-align the columns by adding a minus sign:

In [6]:
for student in students:
    student_data = ("%-20s %-20s %-10f %-10f %-5s" % 
          (student["name"], 
           student["email"], 
           student["midterm"], 
           student["final"], 
           student["grade"]))
    print(student_data)

Max Powers           max@gmail.com        87.760000  88.650000  B+   
Julie Thompson       julie@outlook.com    93.430000  90.450000  A-   
Amber Francis        amber@gmail.com      85.230000  97.540000  A-   
Andrew Smith         andrew@yahoo.com     87.430000  80.320000  B    


For floating point numbers we can also specify the precision of the number that is printed to the screen:

In [7]:
for student in students:
    student_data = ("%-20s %-20s %-5.0f %-5.0f %-5s" % 
          (student["name"], 
           student["email"], 
           student["midterm"], 
           student["final"], 
           student["grade"]))
    print(student_data)

Max Powers           max@gmail.com        88    89    B+   
Julie Thompson       julie@outlook.com    93    90    A-   
Amber Francis        amber@gmail.com      85    98    A-   
Andrew Smith         andrew@yahoo.com     87    80    B    


By adding the ".0" to the midterm and final columns we were able to print the floats with no decimal places. Note how each number was rounded instead of truncated.
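
To make the difference between rounding and truncating concrete, here is a small sketch using one score from the table. Using `int()` to truncate is an assumption made for comparison only; it is not part of the formatting examples above.

```python
score = 88.65  # Max Powers' final exam score from the table above

rounded = "%.0f" % score        # formatting rounds to the nearest whole number
truncated = "%d" % int(score)   # int() drops the fractional part instead
one_place = "%.1f" % 87.43      # precision of one decimal place

print(rounded)     # 89
print(truncated)   # 88
print(one_place)   # 87.4
```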

##New Style with {} and format

The new style is available in Python 3 (and in Python 2.6 and later), and you are encouraged to use it from now on. Here is the final grade list again, printed using the new style of string formatting:

In [8]:
for student in students:
    student_data = "{name:<20s} {email:<20s} {midterm:<5.0f} {final:<5.0f} {grade:<5s}".format(
        name=student["name"], 
        email=student["email"], 
        midterm=student["midterm"], 
        final=student["final"], 
        grade=student["grade"])
    print(student_data)

Max Powers           max@gmail.com        88    89    B+   
Julie Thompson       julie@outlook.com    93    90    A-   
Amber Francis        amber@gmail.com      85    98    A-   
Andrew Smith         andrew@yahoo.com     87    80    B    


The first major change is the new syntax, which uses curly braces instead of percent signs. If you only need the default string representation of each value, you don't even have to specify placeholder data types:

In [9]:
for student in students:
    student_data = "{} {} {} {} {}".format(
        student["name"], 
        student["email"], 
        student["midterm"], 
        student["final"], 
        student["grade"])
    print(student_data)

Max Powers max@gmail.com 87.76 88.65 B+
Julie Thompson julie@outlook.com 93.43 90.45 A-
Amber Francis amber@gmail.com 85.23 97.54 A-
Andrew Smith andrew@yahoo.com 87.43 80.32 B


With the new style you can name the placeholders in your string so that you do not have to depend on the order of the arguments for Python to know which value to print:

In [10]:
for student in students:
    student_data = "{name} {email} {midterm} {final} {grade}".format(
        grade=student["grade"],
        name=student["name"], 
        email=student["email"], 
        final=student["final"], 
        midterm=student["midterm"])
    print(student_data)

Max Powers max@gmail.com 87.76 88.65 B+
Julie Thompson julie@outlook.com 93.43 90.45 A-
Amber Francis amber@gmail.com 85.23 97.54 A-
Andrew Smith andrew@yahoo.com 87.43 80.32 B


Now each placeholder in the string is named for its semantic meaning, such as "grade" and "email". In many cases that can greatly help readability and make it easier to find which placeholder corresponds to which variable.
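
When the placeholder names match the dictionary keys, as they do here, one possible shortcut is to unpack the dictionary into keyword arguments with `**`. This is a sketch of that option rather than a replacement for the explicit form above, and the `student` dictionary is a copy of the first record.

```python
# Copy of the first student record from the list above.
student = {"name": "Max Powers", "email": "max@gmail.com",
           "midterm": 87.76, "final": 88.65, "grade": "B+"}

# ** unpacks the dictionary so each key becomes a keyword argument,
# which format() then matches to the placeholder of the same name.
line = "{name} {email} {midterm} {final} {grade}".format(**student)
print(line)   # Max Powers max@gmail.com 87.76 88.65 B+
```

The explicit keyword form remains useful when the names in the string differ from the keys in the data.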

We can specify data types, as in the old style, by adding a colon and a type code after the name of the placeholder:

In [11]:
for student in students:
    student_data = "{name:s} {email:s} {midterm:f} {final:f} {grade:s}".format(
        name=student["name"], 
        email=student["email"], 
        midterm=student["midterm"], 
        final=student["final"], 
        grade=student["grade"])
    print(student_data)

Max Powers max@gmail.com 87.760000 88.650000 B+
Julie Thompson julie@outlook.com 93.430000 90.450000 A-
Amber Francis amber@gmail.com 85.230000 97.540000 A-
Andrew Smith andrew@yahoo.com 87.430000 80.320000 B


We can also add the width, the left alignment (using the "<" character), and the precision to achieve the same end result as before:

In [12]:
for student in students:
    student_data = "{name:<20s} {email:<20s} {midterm:<5.0f} {final:<5.0f} {grade:<5s}".format(
        name=student["name"], 
        email=student["email"], 
        midterm=student["midterm"], 
        final=student["final"], 
        grade=student["grade"])
    print(student_data)

Max Powers           max@gmail.com        88    89    B+   
Julie Thompson       julie@outlook.com    93    90    A-   
Amber Francis        amber@gmail.com      85    98    A-   
Andrew Smith         andrew@yahoo.com     87    80    B    


There are a few other options that are worth exploring: check the [Python 3 docs](https://docs.python.org/3/library/string.html#format-string-syntax) for more information.
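
As a brief sketch of a few of those other options, the following shows right and center alignment, a custom fill character, a thousands separator, and the hexadecimal, octal, and percentage types. The sample values are hypothetical and chosen only for illustration.

```python
right = "{:>10s}".format("A-")        # right-align in a width of 10
centered = "{:^10s}".format("A-")     # center in a width of 10
filled = "{:*>10s}".format("A-")      # pad with "*" instead of spaces
grouped = "{:,d}".format(1234567)     # thousands separator
as_hex = "{:x}".format(255)           # hexadecimal integer
as_octal = "{:o}".format(255)         # octal integer
percent = "{:.1%}".format(0.875)      # multiply by 100 and append "%"

print(repr(right))      # '        A-'
print(repr(centered))   # '    A-    '
print(filled)           # ********A-
print(grouped)          # 1,234,567
print(as_hex, as_octal) # ff 377
print(percent)          # 87.5%
```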

Now that we know how to format strings in Python, let's move into how we can search and manipulate the strings we format using regular expressions.