# Python Essentials I

Today, we will begin exploring some foundational Python concepts:

* Code Structure and Whitespace
* Object-Oriented Programming
* Print Statements
* Variable Assignment
* Importing Modules and Scripts
* Scalar Objects
* Control Flow

Each of these concepts (and those that follow) forms a building block for more complex work in Python. Our (initial) goal is to understand the syntax and purpose behind each individual concept, but we will build on them as we become more proficient.

Friendly reminders:

* DataCamp modules for Python Lists, Fundamental Data Types, and Dictionaries are due tonight by 11:59 p.m.
* Homework #1 released, due Feb. 12

## Code Structure and Whitespace

Python code is primarily structured around *whitespace* (i.e., spaces, tabs, and line breaks). This is different from many other programming languages (e.g., C/C++, Java) that require additional punctuation in the code structure (e.g., a semicolon at the end of each statement, braces around nested blocks). In Python, the indentation of a line determines which block it belongs to.

* Good for readability
* Generally, one statement per line, but you can combine multiple statements on a single line using the ';' separator
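Because indentation is part of the syntax, removing it is an error rather than a style issue. A minimal sketch (the variable names are illustrative) that uses the built-in compile function to check two versions of the same loop; the unindented one raises an IndentationError, whose exact message varies between Python versions:

```python
# The same for loop, with and without an indented body
good_source = "for x in range(3):\n    print(x)\n"
bad_source = "for x in range(3):\nprint(x)\n"

# The indented version compiles to a code object without error
good_code = compile(good_source, "<example>", "exec")

# The unindented version fails before any code runs
error_name = None
try:
    compile(bad_source, "<example>", "exec")
except IndentationError as err:
    error_name = type(err).__name__
    print(error_name, "-", err.msg)  # message wording depends on the Python version
```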

In [1]:
# Example of code structure and whitespace
# Applies to def, class, if, for, while statements
for x in range(5):
    print(x)

0
1
2
3
4


In [2]:
# Example of multiple statements on a single line
print(5); print(6); print(7)

5
6
7


### Comments

Any text following a '#' on a given line of code (unless the '#' is inside a string) is ignored by the Python interpreter.

Comments are an excellent way to document and communicate what is happening in your code. They can help you remember what each part of your code is trying to accomplish (which you are not likely to remember in detail at a later time), and also help others understand what your code does.

Comments are primarily used to explain your code, but they can also be used to disable ("comment out") code that you want to save for a later time. Alternatively, you can comment out a part of your code that does not work yet so that you can test another part.

Comments can also be used to help you plan and structure your code. Spell out the steps that you need to complete using comments, then work on each step.

In [3]:
# Typically, comments are listed before a statement (or a set of statements) 
# or at the beginning of a cell
print(5)

5


In [4]:
print(5) # Comments can be inline as well

5


## Object-Oriented Programming

**Everything** in Python is an object!

* Scalars, sequences, dictionaries, DataFrames, etc.
* Functions
* Modules
* Generators
* And more!

Each type of object has an associated set of:

* Attributes - Characteristics of the object
* Methods - Functions that can operate on the object and/or other objects

Attributes and methods are accessible by:

* Attributes: obj.attribute_name or getattr(obj, 'attribute_name')
* Methods: obj.method_name(*args*)
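The pandas cells below explore a module, but the same syntax applies to any object. A minimal sketch using a built-in complex number, chosen here only because it has a couple of simple attributes and methods:

```python
# A complex number is an object with attributes and methods
z = 3 + 4j

# Attributes: accessed without parentheses
print(z.real, z.imag)         # 3.0 4.0
print(getattr(z, 'imag'))     # 4.0

# Methods: called with parentheses
print(z.conjugate())          # (3-4j)

# dir() lists the names of an object's attributes and methods
print('conjugate' in dir(z))  # True
```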

In [5]:
# Demonstrate attributes and methods using an imported module
import pandas as pd

In [6]:
# Explore attributes and methods using tab completion
pd.array

<function pandas.core.arrays.array_.array(data, dtype=None, copy=True)>

In [7]:
# Extract module name
pd.__name__

'pandas'

In [8]:
# Extract module name
getattr(pd, '__name__')

'pandas'

In [9]:
# Extract a module function
pd.concat

<function pandas.core.reshape.concat.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=None, copy=True)>

### Mutable vs. Immutable Objects

Mutable objects can be modified in place via assignment or an appropriate function/method. Examples include lists, sets, dictionaries, NumPy arrays, and instances of most user-defined classes.

In [10]:
# Example of the mutability of a list
L = [1,2,3]
L[1] = 4
L

[1, 4, 3]

In [11]:
# Example of the mutability of a set
S = set([1,2,3])
S.add(4)
S

{1, 2, 3, 4}

Immutable objects cannot be modified. Examples include strings and tuples.

In [12]:
# Example of the immutability of a string
s = 'this is a string'
s[0] = 'T'

TypeError: 'str' object does not support item assignment
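Tuples behave the same way, and the usual workaround for an immutable object is to build a new object from the old one. A minimal sketch (variable names are illustrative):

```python
# Tuples are immutable as well: item assignment raises a TypeError
t = (1, 2, 3)
try:
    t[0] = 10
except TypeError as err:
    print('TypeError:', err)

# To "change" an immutable object, build a new object from the old one
phrase = 'this is a string'
new_phrase = 'T' + phrase[1:]
print(new_phrase)  # This is a string
print(phrase)      # this is a string (the original is unchanged)
```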

## Print Statements

In a notebook, any expression that returns a result is displayed automatically if it is the last line in the cell. Examples of expressions include computations; comparisons; and indexing, selecting, or filtering from a data structure (e.g., sequence, collection).

In [None]:
3 + 5

In [13]:
'Hello World' # Not the last line in the cell, so this value is not displayed
pass

In [14]:
print(range(10)[9]) # range(10) holds 0 through 9, so index 9 is the last element

9


Print statements are typically only needed if you have a more complex statement that you would like to display, or if you have multiple statements to print in a single cell.

In [15]:
# Combine string with numerical output
print('The sum of 3 and 5 is:', 3 + 5)

The sum of 3 and 5 is: 8


In [16]:
# Formatted output - Method 1
# %d int, %f float, %s string (note that %d truncates 5.333333 to 5)
print('The values are %d, %f, and %s' % (5.333333, 3.14157, 'Monday'))

The values are 5, 3.141570, and Monday


In [17]:
# Formatted output - Method 2
print('The values are {0:d}, {1:f}, and {2:s}'.format(5, 3.14157, "5"))

The values are 5, 3.141570, and 5
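Python 3.6 and later also support f-strings, which accept the same format specifiers inside the braces. A minimal sketch reproducing the Method 1 output (variable names are illustrative); int() is applied explicitly because the :d specifier raises a ValueError for a float:

```python
count = 5.333333
pi_approx = 3.14157
day = 'Monday'

# f-strings (Python 3.6+) evaluate the expressions inside the braces
message = f'The values are {int(count)}, {pi_approx:f}, and {day}'
print(message)  # The values are 5, 3.141570, and Monday

# Format specifiers work as with .format(), e.g., two decimal places
rounded = f'Pi is roughly {pi_approx:.2f}'
print(rounded)  # Pi is roughly 3.14
```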


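The print function also renders special string characters (escape sequences) such as '\t' (tab) and '\n' (newline), whereas displaying the string directly shows the escape codes.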

In [18]:
s = 'This is a \tstring with special \ncharacters'
s
# \t is a tab and \n is a newline

'This is a \tstring with special \ncharacters'

In [19]:
print(s)

This is a 	string with special 
characters


## Variable Assignment

Variable assignment is a critical task, especially if you need to use objects during multiple steps of your process. The Python operator for variable assignment is the equal sign (=). Any object can be assigned to a variable.

In [20]:
# Scalars
a = 5
b = 10
a + b

15

In [21]:
# Functions
import numpy as np
sum_func = np.sum
sum_func

<function numpy.sum(a, axis=None, dtype=None, out=None, keepdims=<no value>, initial=<no value>)>

In [22]:
sum_func([a,b])

15

In [23]:
# Modules
import pandas as pd
pd

<module 'pandas' from '/anaconda3/lib/python3.7/site-packages/pandas/__init__.py'>
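Two details of assignment come up constantly: several names can be assigned at once, and assignment binds a name to an existing object rather than copying it. A minimal sketch (variable names are illustrative):

```python
# Assign several names at once (tuple unpacking)
m, n = 5, 10
# Swap the two values in one statement
m, n = n, m
print(m, n)             # 10 5

# Assignment binds a name to an object; it does not copy the object
first = [1, 2, 3]
second = first          # both names now refer to the same list
second.append(4)
print(first)            # [1, 2, 3, 4]
print(first is second)  # True
```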

## Importing Modules and Scripts

Modules and scripts are both loaded using the **import** statement. Remember, both modules and scripts are written as Python (.py) files, so they are really the same thing. The main difference is semantic: modules are typically libraries from which we want to load functionality, whereas scripts are typically some type of automated process (which often leverages functionality from other modules).

Primary modules for this course (conventions for shorthand names in parentheses):

* numpy (np) - Data processing and analysis
* pandas (pd) - Data processing and analysis
* matplotlib.pyplot (plt) - Data visualization
* seaborn (sns) - Data visualization
* statsmodels (sm) - Statistical analysis
* sklearn - Machine learning
* scipy - Scientific computing
* nltk - Natural language processing

Standard Python library modules that will also be useful:

* os - Operating system
* re - Regular expression
* string - String processing
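* urllib (urllib.request) - Retrieving data from URLs, e.g., downloading HTML (this was urllib2 in Python 2)
* glob - Finding file paths that match a pattern (e.g., all .csv files in a directory)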
* csv - Data import/export for .csv files
* copy - Shallow and deep object copying
* datetime, time - Functionality for working with datetime and time objects

In [24]:
# Load entire module
import pandas
pandas

<module 'pandas' from '/anaconda3/lib/python3.7/site-packages/pandas/__init__.py'>

In [25]:
# Load module and assign to shorter variable name
import pandas as pd
pd

<module 'pandas' from '/anaconda3/lib/python3.7/site-packages/pandas/__init__.py'>

In [26]:
# Load specific functionality from module - Use * for all functions (not recommended)
from numpy import floor # import multiple functions via comma separated list
floor(4.6)
# import math
# math.ceil(4.6)
# round(4.6)

4.0
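The commented-out lines in the cell above point to standard-library alternatives. A minimal sketch that imports two functions at once with a comma-separated list; unlike numpy.floor, which returned the float 4.0 above, math.floor returns an int:

```python
import math
from math import ceil, trunc  # comma-separated list of names

print(math.floor(4.6), ceil(4.6), trunc(4.6), round(4.6))  # 4 5 4 5
print(type(math.floor(4.6)))  # <class 'int'>
```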

If you are working with a module or script that has been updated since you imported it, you can reload it using the **reload** function (within the importlib module). Be careful! The module object itself will be updated, but objects created from the old version (e.g., variables assigned in previous statements, or names brought in with from ... import) are not!

In [27]:
import importlib
np = importlib.reload(np)

## Scalar Objects

Scalar objects are single-valued data structures (i.e., they hold a single value). The most common scalar objects are:

* Numerical: int, float
* Boolean: bool
* String: str
* Dates and Times (later)

There is also the None object, which is technically a singleton object (not a scalar), but also important to know.
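None is Python's "no value" object. Because only one None object exists, the idiomatic test is the identity operator is. A minimal sketch (variable names are illustrative):

```python
result = None
print(type(result))      # <class 'NoneType'>
print(result is None)    # True

# print() displays its argument and then returns None
returned = print('hello')
print(returned is None)  # True
```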

### Numerical Scalars

Integer and floating point numbers are the most common numerical scalar types. We typically use integers to represent indivisible (discrete) quantities, whereas floating point numbers represent continuous quantities.

In [28]:
# Integer
a = 5
type(a)

int

In [29]:
# Float
b = 10.
type(b)

float

In [30]:
# Casting to an integer
x = int(10.6)
y = int(10.2)
print(x,y)

10 10
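As the output shows, int() does not round; it truncates toward zero. A minimal sketch comparing it with math.floor and the built-in round (note that round sends exact ties to the even integer):

```python
import math

print(int(10.6), int(-10.6))     # 10 -10  (int() truncates toward zero)
print(math.floor(-10.6))         # -11     (floor rounds down)
print(round(10.6), round(10.2))  # 11 10   (round to the nearest integer)
print(round(2.5), round(3.5))    # 2 4     (ties go to the even integer)
```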


In [31]:
# Casting to a floating point number
float(a)

5.0

In [32]:
# Scientific notation
1e6

1000000.0

In [33]:
# Basic arithmetic - +, -, /, *, ** (exponentiation), % (modulo), // (floor division)
op = '*'
expr = str(a) + op + str(b)
print('Expression:', expr)
eval(expr)

Expression: 5*10.0


50.0
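The cell above builds an expression as a string and evaluates it with eval; in everyday code the operators are written directly. A minimal sketch of each operator (variable names are illustrative; divmod is a built-in that returns the floor quotient and remainder together):

```python
num, den = 7, 2

print(num + den)         # 9
print(num - den)         # 5
print(num * den)         # 14
print(num / den)         # 3.5  (true division always returns a float)
print(num // den)        # 3    (floor division)
print(num % den)         # 1    (modulo: the remainder)
print(num ** den)        # 49   (exponentiation)
print(-num // den)       # -4   (floor division rounds toward negative infinity)
print(divmod(num, den))  # (3, 1)
```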

In [34]:
# Combine arithmetic with assignment
a += b
a

15.0

### Boolean

Boolean objects convey truthiness and can take only one of two possible values: **True** or **False**.

In [None]:
True

In [None]:
False

In [None]:
not False

Boolean objects are most often created via comparison operators or casting (using the **bool** function). Objects that are equivalent to zero or emptiness are cast as False, otherwise they are True.

In [35]:
# Comparison operators - <, >, ==, !=, <=, >=, is, is not
print(a, b)
print(a < b)
print(a != b)
print(a >= b)
print(a is None)
print(type(a) is not str)
print(True > False)

15.0 10.0
False
True
True
False
True
True


In [36]:
# Combine boolean objects
print(True & False) # and
print(True | False) # or
print(False ^ True) # xor

False
True
True
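The symbols &, |, and ^ are bitwise operators that happen to work on booleans; the keyword operators and, or, and not are the usual choice for logic. Two practical differences are short-circuiting and operator precedence, sketched below (the example values are arbitrary):

```python
# The keyword operators are the usual way to combine booleans
print(True and False)  # False
print(True or False)   # True
print(not True)        # False

# and/or short-circuit: the right side is skipped when the left side decides
print(False and 1 / 0)  # False (1 / 0 is never evaluated)

# & binds more tightly than comparisons, so parenthesize each comparison
with_parens = (4 > 2) & (1 > 0)   # True & True -> True
without_parens = 4 > 2 & 1 > 0    # parsed as 4 > (2 & 1) > 0 -> False
print(with_parens, without_parens)  # True False
```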


In [37]:
# Chaining comparisons
print(5 < 10 and 10 < 25)
print((5 < 10) & (10 < 25)) # parentheses needed: & binds more tightly than <
print(5 < 10 < 25)

True
True
True


In [38]:
# Casting
print(bool(0))
print(bool(1))
print(bool([]))
print(bool([0]))
print(bool(""))
print(bool({}))
print(bool(()))
print(bool(set()))
print(bool(None))

False
True
False
True
False
False
False
False
False
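Truthiness depends on emptiness, not on what the value "means", which can be surprising with strings. A few more casts to illustrate:

```python
print(bool('False'))  # True: any non-empty string is truthy
print(bool(' '))      # True: a space is still a character
print(bool(0.0))      # False: zero of any numeric type is falsy
print(bool([[]]))     # True: a list containing an empty list is not empty
```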


### Strings

Strings are essentially sequences of characters (which means that they have a length), which allow us to work with text. You can create strings by enclosing any text within single ('...'), double ("..."), or triple ('''...''' or """...""") quotes.
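Each quote style is convenient in a different situation. A minimal sketch (variable names are illustrative):

```python
single = 'It said "hello"'   # double quotes inside single quotes
double = "It's a string"     # apostrophe inside double quotes
triple = '''Triple-quoted strings
can span multiple lines'''

print(single, double)
print(triple.count('\n'))    # 1 (one line break inside the string)
```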

In [39]:
# Example string
s = 'This is a string'
len(s)

16

In [40]:
# Index substring (from beginning)
s[0:4] # Python is zero-indexed; a slice stops one position before the end index

'This'

In [41]:
# Index substring (from end)
s[-6:] # a negative index -k is equivalent to n - k, where n is the length of the sequence

'string'
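To make the negative-index rule concrete, a minimal sketch comparing negative and positive indices on the same string (the name text is illustrative), plus a slice with a step:

```python
text = 'This is a string'
n = len(text)  # 16

# A negative index -k refers to the same position as n - k
print(text[-1], text[n - 1])      # g g
print(text[-6:] == text[n - 6:])  # True

# A slice can also take a step; a step of -1 reverses the sequence
print(text[::-1])                 # gnirts a si sihT
```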

Similar to other scalars, there are implementations of addition (concatenation) and multiplication (replication), as well as boolean comparisons.

In [42]:
'This is ' + 'a string'

'This is a string'

In [43]:
'Beetlejuice ' * 3

'Beetlejuice Beetlejuice Beetlejuice '

In [44]:
'Apple' < 'Ba'

True
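String comparison is lexicographic: characters are compared one at a time by their Unicode code points (see the built-in ord), so all uppercase letters sort before all lowercase letters. A minimal sketch:

```python
# Strings compare character by character using Unicode code points
print('Apple' < 'Ba')      # True: 'A' (65) comes before 'B' (66)
print('apple' < 'Banana')  # False: 'a' (97) comes after 'B' (66)
print(ord('a'), ord('B'))  # 97 66

# Normalize case first for a case-insensitive comparison
print('apple'.lower() < 'Banana'.lower())  # True
```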

You can also cast numeric types to strings (primarily for output purposes) and vice versa (typically, when input data that is actually numerical has been read in as text).

In [45]:
str(4.6)*2

'4.64.6'

In [46]:
float('4.6') * 2

9.2
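One catch when casting text to numbers: int() accepts a string only if it looks like a whole number, so a decimal string has to pass through float() first. A minimal sketch:

```python
print(int('46'))  # 46

try:
    int('4.6')    # not a whole number, so this raises an error
except ValueError as err:
    print('ValueError:', err)

# Convert through float first, then truncate to an int
print(int(float('4.6')))  # 4
```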

There are many built-in methods for manipulating strings:

* Case: s.capitalize, s.lower, s.swapcase, s.title, s.upper 
* Conditions: s.isalnum, s.isalpha, s.isdigit, s.isupper, s.islower, s.istitle, s.isspace
* Basic search: s.startswith, s.endswith, s.find, s.index, s.rfind, s.rindex 
* Format: s.format  
* Split: s.split, s.rsplit, s.partition  
* Strip: s.strip, s.lstrip, s.rstrip  
* Replace: s.replace  
* Join: s.join(t) joins the strings in sequence t with s as a separator

As strings are immutable, methods that manipulate text (e.g., upper, replace, strip) return a new string object rather than modifying the original; assign the result to a variable if you want to keep it.

In [47]:
# Example of case
s.upper()

'THIS IS A STRING'

In [48]:
# Example of conditions
s.isupper()

False

In [49]:
s

'This is a string'

In [50]:
# Example of search
s.find('a')

8
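The search methods differ mainly in direction and in how they report a miss: find returns -1, while index raises a ValueError. A minimal sketch (the name text is illustrative):

```python
text = 'This is a string'

print(text.find('is'))   # 2: the first match starts at index 2
print(text.rfind('is'))  # 5: rfind searches from the right
print(text.find('z'))    # -1: find signals "not found" with -1

try:
    text.index('z')      # index raises an error instead
except ValueError as err:
    print('ValueError:', err)
```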

In [51]:
# Example of format
'Today, the temperature is {0:.1f} degrees Fahrenheit.'.format(35.8)

'Today, the temperature is 35.8 degrees Fahrenheit.'

In [52]:
# Example of split
s.split(' ')

['This', 'is', 'a', 'string']
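split behaves differently with and without an argument, and both split and partition can limit how much of the string is broken apart. A minimal sketch (the example strings are arbitrary):

```python
text = 'This is a string'

print(text.split(' ', 1))   # ['This', 'is a string'] (at most one split)
print('a  b'.split(' '))    # ['a', '', 'b'] (two spaces produce an empty string)
print('a  b'.split())       # ['a', 'b'] (no argument: split on runs of whitespace)
print('key=value=x'.partition('='))  # ('key', '=', 'value=x')
```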

In [53]:
# Example of strip (removes any of the characters T, h, i, s from both ends)
s.strip('This')

' is a string'

In [54]:
# Example of lstrip (removes the same characters from the left end only)
s.lstrip('This')

' is a string'
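The argument to strip and lstrip is a set of characters, not a prefix: characters are removed from the ends as long as they belong to that set. Adding 'g' to the set shows the difference. The removeprefix method used at the end assumes Python 3.9 or later:

```python
text = 'This is a string'

# Leading T, h, i, s and trailing g are removed; ' ' and 'n' stop the stripping
print(text.strip('Thisg'))        # ' is a strin'

# Python 3.9+: removeprefix removes one exact leading substring
print(text.removeprefix('This'))  # ' is a string'
```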

In [55]:
# Example of replace
s.replace('string', 'new string')

'This is a new string'

In [56]:
# Example of join
'@@'.join(['Apple', 'Banana', 'Orange'])

'Apple@@Banana@@Orange'

## Control Flow

Control flow describes the order in which code is executed, and allows for more complex tasks to be accomplished than a linear sequence of statements. This functionality allows you to write code that can be used for multiple purposes, under multiple conditions, or on different types of objects. We will focus mostly on two categories of control flow:

* Conditionals (if statements)
* Loops (for, while statements)

### Conditionals

**if** statements are the most common type of conditional. They allow you to execute different sets of statements under different conditions (i.e., depending on whether a particular boolean expression is True or False).

In [57]:
# if-else statement
today = 'Monday'
if today in ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']:
    print('Today is a weekday')
else:
    print('Today is a weekend day')

Today is a weekday


In [58]:
# if-elif-else statement
today = 'Someday'
if today in ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']:
    print('Today is a weekday')
elif today in ['Saturday', 'Sunday']:
    print('Today is a weekend day')
else:
    print('Invalid day given')

Invalid day given


In [59]:
# Nested if-else statement
today = 'Someday'
if today in ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']:
    print('Today is a weekday')
else:
    if today in ['Saturday', 'Sunday']:
        print('Today is a weekend day')
    else:
        print('Invalid day given')

Invalid day given


In [60]:
# pass statement
b = True
if b:
    pass
else:
    print(b)
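An if condition does not have to be a literal boolean; Python applies the same truthiness rules as the bool function, so an empty collection counts as False. A minimal sketch (the names cart and status are illustrative):

```python
cart = []  # an empty list is falsy

# `if cart:` behaves like `if bool(cart):`
if cart:
    status = 'Cart has ' + str(len(cart)) + ' item(s)'
else:
    status = 'Cart is empty'
print(status)  # Cart is empty
```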

**Ternary expressions** are a nice (Pythonic) construction when you want to return one value under one condition and a different value otherwise:

In [61]:
# if statement approach
b = True
if b:
    x = 1
else:
    x = 0
x

1

In [62]:
# Ternary expression
b = True
x = 1 if b else 0
x
# x is 1 if b is True, otherwise 0

1

### Loops

Loops are used to perform a series of steps repeatedly. Similar to conditionals, loops can be nested or combined with other control flow constructs. There are two primary constructs for loops:

* **for** loops are used to iterate through each element of a sequence
* **while** loops are used to repeat a series of steps as long as a particular condition remains True

**for** loops in Python are implemented in a more concise way, whereas **while** loops are quite similar to those in other languages. In general, **for** loops are much more commonly used in Python; **while** loops are needed less often, and you should not use a **while** loop when a **for** loop would do the job.

**Be wary about whether a loop is the most appropriate approach for a particular task!** Think about whether a particular loop will take a long time to execute or whether it will run forever. We will also discuss comprehensions and vectorization, which are often more efficient approaches than a loop for completing many common tasks. These two constructs play a major role in why Python can be an easier language for processing data.
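As a preview of comprehensions (covered in more detail later), here is a minimal sketch of the same task written as a loop and as a list comprehension (variable names are illustrative):

```python
# Loop approach: build a list of squares one element at a time
squares_loop = []
for x in range(10):
    squares_loop.append(x ** 2)

# List comprehension: the same result in a single expression
squares_comp = [x ** 2 for x in range(10)]

print(squares_loop == squares_comp)  # True
print(squares_comp)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```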

In [63]:
# Standard for loop - Print each element from a range
for x in range(10):
    print(x)

0
1
2
3
4
5
6
7
8
9


In [64]:
# for loop - Enumerate each element from a range
for i, x in enumerate(range(10)):
    print(i, x ** 2)

0 0
1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64
9 81
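In the cell above the index and the element happen to be equal, which hides what enumerate does. A minimal sketch with a list of strings, using the optional start argument to begin counting at 1 (the list contents are illustrative):

```python
days = ['Mon', 'Tue', 'Wed']

# enumerate yields (index, element) pairs; start sets the first index
for i, day in enumerate(days, start=1):
    print(i, day)
# 1 Mon
# 2 Tue
# 3 Wed

pairs = list(enumerate(days, start=1))  # [(1, 'Mon'), (2, 'Tue'), (3, 'Wed')]
```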


In [65]:
# Nested for loop
for x in range(5):
    for y in range(5):
        print(x * y)

0
0
0
0
0
0
1
2
3
4
0
2
4
6
8
0
3
6
9
12
0
4
8
12
16


In [66]:
# for loop over tuples - Leave elements packed
for pair in zip(range(5), range(5)):
    print(pair[0],pair[1],pair[0]*pair[1])

0 0 0
1 1 1
2 2 4
3 3 9
4 4 16


In [None]:
# for loop over tuples - Unpack each element
for x, y in zip(range(5), range(5)):
    print(x * y)

In [None]:
# break statement
for x in range(10):
    if x > 5:
        break
    else:
        print(x)
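A companion to break is continue, which skips the rest of the current iteration instead of leaving the loop. A minimal sketch that collects only odd numbers (the name odds is illustrative):

```python
odds = []
for x in range(10):
    if x % 2 == 0:
        continue  # skip even numbers and move on to the next x
    odds.append(x)
print(odds)  # [1, 3, 5, 7, 9]
```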

In [None]:
# while loop as a for loop
i = 0
L = 'abcde'
while i < len(L):
    print(L[i])
    i += 1
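For comparison, the same task written as a for loop iterates over the string directly, with no counter or len() call to manage (the list letters is added only to record the result):

```python
L = 'abcde'
letters = []
for ch in L:
    print(ch)
    letters.append(ch)
```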

In [None]:
# More appropriate while loop
x = 5
factorial = 1
while x > 0:
    factorial *= x
    x -= 1
factorial
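For a fixed number of steps like this, a for loop over range(1, 6) gives the same product, and the standard library's math.factorial offers a quick check. A minimal sketch (the name product is illustrative):

```python
import math

# Same computation with a for loop over 1, 2, 3, 4, 5
product = 1
for k in range(1, 6):
    product *= k

print(product)                       # 120
print(product == math.factorial(5))  # True
```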

## Next Time: Python Essentials II