In [1]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


## Mini-focus: Code layout and structure in Python

Python is an unusual language in that "whitespace" (spaces, tabs, newlines, and so forth) is significant in determining what is valid Python.

This was (is?) shocking to proponents of other languages.  For example, in Python, we would write a for loop like this:

```
for i in range(5):
    print(i)
```

where the `print` function call must be indented.  Although you have a choice of HOW MUCH to indent (as long as you are consistent within a block), you MUST indent.

Meanwhile, in other languages like C, one would write

```
for (i = 0; i < 5; i++) {
    printf("%d", i);
}
```

But also it could be written

```
for (i = 0; i < 5; i++) { printf("%d", i); }
```

```
for (i = 0; i < 5; i++) 
{
    printf("%d", i);
}
```

```
for(i=0;i<5;i++)printf("%d",i);
```

among a myriad of other ways.

An important implication of the spacing rules of Python is that they (somewhat) encourage you to structure your code to make it more human-readable.  We'll look a bit at some potentially confusing points and also some suggestions for good practice.
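As a small illustration of the indentation rule described above, the sketch below uses the built-in `compile` function to check whether a piece of source text is valid Python, without running it.  The helper name `is_valid_python` is made up for this example.

```python
def is_valid_python(source):
    """Return True if `source` parses as Python, False otherwise."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False

indented = "for i in range(5):\n    print(i)\n"
two_spaces = "for i in range(5):\n  print(i)\n"
not_indented = "for i in range(5):\nprint(i)\n"

print(is_valid_python(indented))      # True
print(is_valid_python(two_spaces))    # True: the amount of indentation is up to you
print(is_valid_python(not_indented))  # False: the loop body must be indented
```

The second case shows that the amount of indentation is a free choice, while the third shows that having no indentation at all is rejected.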



You may have noticed I tend to have certain conventions in how I have laid out the expressions I have written.

First, because spacing is important in Python, an expression ends at the end of a line.  So for example I cannot do

In [3]:
foo =
1

SyntaxError: invalid syntax (2770051154.py, line 1)

The end of the line ends the expression, which is incomplete because Python is expecting something to be on the right-hand side of that assignment.  Now, if we had a good reason to, there is a way to continue an expression across multiple lines, which is to end the line with the `\` (backslash) character:

In [4]:
foo = \
1

In [5]:
foo

1

There's really no good reason to write such a simple expression over multiple lines.  However, if you have long expressions, you might want to break them up for readability.  Most Python style guides suggest lines should not be too long - PEP 8, the standard Python style guide, recommends at most 79 characters, although there are a few guides which are more permissive.

In [8]:
def myexpression(x):
    return \
        3 * x**2 + \
        16 * x + \
        24
myexpression(15)

939

However, there is an exception to the end-of-line equals end-of-expression rule.  If you have an "open" delimiter like `(`, `[`, or `{`, then the expression **automatically** is assumed to continue to the next line, and you don't need to use the backslash character to continue the expression.

So I could write the function `myexpression` like this as well, by wrapping the expression inside parentheses.
These parentheses don't affect the meaning of the code, but by having the open parenthesis, the expression automatically extends to the next line.

Further, I use the flexibility of spacing to lay out the expression in a way that is visually appealing (I think!)

In [9]:
def myexpression(x):
    return (
        3 * x**2 +
        16 * x +
        24
    )
myexpression(15)

939
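The same automatic continuation applies to the other open delimiters mentioned above.  The following is a minimal sketch with made-up variable names, showing a list in square brackets, a `dict` in curly braces, and a function call in parentheses, each spread over several lines with no backslashes.

```python
# Square brackets: a list with one item per line
primes = [
    2,
    3,
    5,
    7,
]

# Curly braces: a dict with one entry per line
temperatures = {
    'Aberdeen': 0,
    'Norwich': 5,
}

# Parentheses of a function call: one argument per line
largest = max(
    temperatures.values(),
    default=None,
)

print(sum(primes))  # 17
print(largest)      # 5
```

A comma after the last item is allowed inside these delimiters, which means items can be added or reordered later without touching the neighbouring lines.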

This technique combines very well with working with `DataFrame`s and transforming data in `pandas`.

For example, when I have created ad-hoc `DataFrame`s, I usually use a layout like the one below.  This creates a `DataFrame` based on a list of `dict`s.  In this case, I am able to put one row of the `DataFrame` (one `dict`) on each line.  Further, because I have open delimiters - two in fact (a parenthesis and a square bracket), I don't have to worry about explicitly continuing a line.

In [11]:
import pandas as pd
df = pd.DataFrame([
    {'city': "Aberdeen", 'temperature': 0},
    {'city': "Norwich", 'temperature': 5}
])
df

| | city | temperature |
|---|---|---|
| 0 | Aberdeen | 0 |
| 1 | Norwich | 5 |


Compare this with the version below, which is exactly equivalent but more difficult to read, and in which it is more difficult to keep your delimiters straight - there is the sequence `([{` at the start and then the sequence `}])` at the end.

In [13]:
df = pd.DataFrame([{'city': "Aberdeen", 'temperature': 0}, {'city': "Norwich", 'temperature': 5}])
df

| | city | temperature |
|---|---|---|
| 0 | Aberdeen | 0 |
| 1 | Norwich | 5 |


## Mini-focus: Strings in Python

Python takes a flexible approach to how you indicate literal text strings.  In particular, you can use either single-quotes or double-quotes - as long as you use the same type of quote at both ends of a given string.  You can mix-and-match all you want between different strings.

In the above, I used both double-quotes (when giving the city names) and single-quotes (when giving the field names).  This is a convention I use myself - I use the different types of quoting to denote different types of information.  I usually use single-quotes for column names and double-quotes for data values.  However, you will also find plenty of examples where I don't follow this convention.  And, it's purely a personal convention - an example of using the language to let you try to express more information to the human reader.

I could just as well create the previous DataFrame like this.  It's exactly the same and exactly as correct.

In [14]:
df = pd.DataFrame([
    {'city': 'Aberdeen', 'temperature': 0},
    {'city': 'Norwich', 'temperature': 5}
])
df

| | city | temperature |
|---|---|---|
| 0 | Aberdeen | 0 |
| 1 | Norwich | 5 |


Allowing both the single-quote and double-quote makes it easier to deal with text strings which themselves have quotes in them.  If we have a string that has a single-quote in it, then we can use double-quotes to indicate the string:

In [24]:
"Dwayne 'The Rock' Johnson"

"Dwayne 'The Rock' Johnson"

Or, if we have a string that has a double-quote inside of it, we can use single-quotes to indicate the string:

In [25]:
'Dwayne "The Rock" Johnson'

'Dwayne "The Rock" Johnson'

Of course, the two strings are *not* the same, because although single-quotes and double-quotes both mean "this is a string" in Python, when comparing the text **inside** the string, a single-quote and a double-quote are different characters.

In [26]:
"Dwayne 'The Rock' Johnson" == 'Dwayne "The Rock" Johnson'

False
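If a string needs to contain both kinds of quote, one option (not otherwise used in these notes) is to put a backslash in front of the quote character that matches the delimiters.  A minimal sketch, with made-up variable names:

```python
# Single-quoted string: the apostrophe inside must be escaped
with_single = 'It\'s "The Rock"'

# Double-quoted string: the double-quotes inside must be escaped
with_double = "It's \"The Rock\""

print(with_single)                 # It's "The Rock"
print(with_single == with_double)  # True
```

The backslashes are part of how the string is written in the code, not part of the text itself, which is why the two strings compare as equal.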

As noted before, however, we don't like to have long lines in programs because they're difficult to read and difficult to maintain.  Python allows you to create multi-line strings using **three double-quotes** or **three single-quotes** in succession.
If you do this, everything between the quotes is included in the string.  Notice below that the newline characters (represented by `\n` in the output) are retained, so the formatting inside these strings is significant and is taken literally.

In [20]:
"""It is a period of civil war.
Rebel spaceships, striking
from a hidden base, have won
their first victory against
the evil Galactic Empire.

During the battle, Rebel
spies managed to steal secret
plans to the Empire's
ultimate weapon, the DEATH
STAR, an armored space
station with enough power to
destroy an entire planet.

Pursued by the Empire's
sinister agents, Princess
Leia races home aboard her
starship, custodian of the
stolen plans that can save
her people and restore
freedom to the galaxy....
"""

"It is a period of civil war.\nRebel spaceships, striking\nfrom a hidden base, have won\ntheir first victory against\nthe evil Galactic Empire.\n\nDuring the battle, Rebel\nspies managed to steal secret\nplans to the Empire's\nultimate weapon, the DEATH\nSTAR, an armored space\nstation with enough power to\ndestroy an entire planet.\n\nPursued by the Empire's\nsinister agents, Princess\nLeia races home aboard her\nstarship, custodian of the\nstolen plans that can save\nher people and restore\nfreedom to the galaxy....\n"
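One practical consequence of this literal treatment is that any indentation inside a triple-quoted string is kept as well.  The sketch below assumes you want to remove that indentation, and uses `textwrap.dedent` from the standard library to do so; the function name `opening_lines` is made up for this example.

```python
import textwrap

def opening_lines():
    # The text is indented to line up with the function body, so every
    # text line of the raw string starts with eight spaces.
    return """
        It is a period of civil war.
        Rebel spaceships, striking
        from a hidden base, have won
    """

raw = opening_lines()
# dedent removes the common leading whitespace; strip removes the
# blank lines at the start and end of the string.
clean = textwrap.dedent(raw).strip()

print(repr(raw.splitlines()[1]))    # leading spaces are still there
print(repr(clean.splitlines()[0]))  # leading spaces removed
```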

But what if you don't want to retain the newlines?  Python also automatically joins string literals which are adjacent (separated only by whitespace).  So we could write:

In [21]:
crawler = (
    "It is a period of civil war. "
    "Rebel spaceships, striking "
    "from a hidden base, have won "
    "their first victory against "
    "the evil Galactic Empire."
)
crawler

'It is a period of civil war. Rebel spaceships, striking from a hidden base, have won their first victory against the evil Galactic Empire.'

This feature is useful, but also leads to a type of error, which arises from a common kind of typo where you forget to put, for example, a comma between successive strings in a list.

For example, in the below I forgot the comma between `'city'` and `'temperature'`.  You might think this would be a syntax error and Python would tell you something is missing.  But actually, Python treats this as if you had written `'citytemperature'`, and as a result the error we get is from `pandas` telling us there is no such column.  This can be a tricky kind of bug to track down because if you search your code for `citytemperature`, you won't find it...!

In [23]:
df[['city' 'temperature']]

KeyError: "None of [Index(['citytemperature'], dtype='object')] are in the [columns]"
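The same effect can be seen without `pandas` at all.  In the sketch below (the variable names are made up for this example), the two lists look almost identical in the code, but the one with the missing comma has only one element.

```python
intended = ['city', 'temperature']
typo = ['city' 'temperature']  # missing comma: the two literals are joined

print(len(intended))  # 2
print(len(typo))      # 1
print(typo)           # ['citytemperature']
```

Printing the list, or checking its length, is a quick way to confirm whether this is what has happened.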