# Introduction to Python
## Last Revised: Aug 27, 2022

Welcome to "Introduction to Python", which is part of the StatLab workshop series. I'm assuming that a lot of people come to the StatLab with experience in R or SPSS, but might be new to Python. I just wanted to say that you shouldn't be scared of learning Python: knowledge of R or SPSS already gives you a lot of the basic intuition for coding in Python, so you're ahead of the curve!

With that being said, let's begin!

### Strings 

Let's begin with the most classic coding example, which is to print out the `Hello World` statement.

In [1]:
print("Hello World")

# You could also use single quotation marks, i.e.,
# print('Hello World')

# Can you think of a situation where a single quotation mark might not be optimal? 
# Hint: Try printing the name of the American medical drama TV show starring Patrick Dempsey.

Hello World
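
As a hint for the question above, here is one possible answer (a small sketch, not the only way to do it). A string that contains an apostrophe can be wrapped in double quotes, or the apostrophe can be escaped with a backslash. The variable names `show_double` and `show_escaped` are made up for this illustration.

```python
# A string containing an apostrophe can be wrapped in double quotes...
show_double = "Grey's Anatomy"

# ...or the apostrophe can be escaped with a backslash inside single quotes
show_escaped = 'Grey\'s Anatomy'

print(show_double)
print(show_escaped)

# Both spellings produce exactly the same string
print(show_double == show_escaped)
```

Writing `'Grey's Anatomy'` without the backslash would fail, because Python would read the apostrophe as the end of the string.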


You could also assign a variable to `Hello World` and print it out.

In [2]:
message = "Hello World"
print(message)

Hello World


We can also think of the string as a *string of individual characters*. Now let's try to guess how many characters there are in the sentence `Hello World`, before applying a function that finds the answer for us. *Hint: White spaces are also counted as characters (and hence are called white space characters).*

In [3]:
# Use the 'len' function (which stands for length) to find the length of an object

print(len(message))

11


Since the message ```Hello World``` is a sequence of characters, we should be able to retrieve a specific character by its index. In other words, if we call `message[some_number]`, we get back the character stored at that position.

In [4]:
print(message[11]) # Raises an IndexError

# Woah, this doesn't work! 

IndexError: string index out of range

Whoops, an **IndexError** (i.e., your code is trying to access an index that does not exist). Didn't we say that there are 11 characters in the sentence `Hello World`? There are, but Python starts counting indexes at 0, so the valid indexes run from 0 to 10. In other words, to get the letter `d` we want:


In [5]:
print(message[10])

# To get the letter 'H', we reference the 0th index, i.e., print(message[0])

d
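
To tie the length and the index together, here is a minimal sketch. It assumes the same `message` as above; the names `first` and `last` are just for illustration.

```python
message = "Hello World"

# Index 0 is the first character
first = message[0]

# The last valid index is always the length minus 1 (here, 11 - 1 = 10)
last = message[len(message) - 1]

print(first)
print(last)
```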


One feature of Python is that each class (so far, we've only been working with the string class) comes with its own set of methods. For instance, we could turn our `Hello World` message to all upper-case.

In [6]:
print(message.upper())

HELLO WORLD


For now, remember that when you call a method, you need to include parentheses. This touches on object-oriented programming (which is a more advanced topic), but the logic is that you're passing the message `Hello World` to a function that returns an upper-case copy of it (i.e., `message.upper()` behaves like ```str.upper(message)```). <br><br>
In fact, to list all the methods that are associated with the string class, we can use:

In [7]:
print(dir(str))

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


This is not terribly important to remember now, but can prove useful later, especially when using packages that can run powerful models. <br>
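
To make that long list a little less abstract, here is a small sketch that tries out a few of the methods it contains (`lower`, `replace`, `count`, and `startswith`). The variable names are made up for this example.

```python
message = "Hello World"

lowered = message.lower()                     # All lower-case copy
replaced = message.replace("World", "Yale")   # Swap one piece of text for another
o_count = message.count("o")                  # How many times "o" appears
starts = message.startswith("Hello")          # True if the string begins with "Hello"

print(lowered)
print(replaced)
print(o_count)
print(starts)

# String methods return new strings, so the original message is unchanged
print(message)
```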

<br> Finally, we can also join several strings in one line of code with the `+` operator, which lets us mix and match fixed text and variables.

In [8]:
name = "Will"

print("Hello, my name is" + " " + name)

# Note that we need to include " " because spaces are characters too!

Hello, my name is Will


There are many ways to do this, mainly because Python is inspired by and built upon older programming languages. Method 1 is the `+` concatenation shown above. Method 2 is an homage to `printf` (print formatted) in C. I generally prefer method 4, known as an f-string, as it is the simplest.

In [9]:
name = "Will"

print("Method 2: Hello, my name is %s" % (name))
print("Method 3: Hello, my name is {}".format(name))
print(f"Method 4: Hello, my name is {name}")

Method 2: Hello, my name is Will
Method 3: Hello, my name is Will
Method 4: Hello, my name is Will
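
One reason f-strings (method 4) are convenient is that the braces can hold any expression, plus an optional format specifier. The sketch below assumes a made-up `score` variable purely for illustration.

```python
name = "Will"
score = 91.23456   # Hypothetical value, only for this example

# Expressions inside the braces are evaluated before being inserted
greeting = f"{name} has {len(name)} letters in his name"
print(greeting)

# A format specifier after the colon controls how the value is displayed;
# .2f means "show two digits after the decimal point"
formatted = f"Score: {score:.2f}"
print(formatted)
```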


### Integers and Floats

Before we dive more into coding complex tools, it is important to understand how to perform basic mathematical operations in Python. Let's begin by assigning a value to our variables and performing a simple addition. 

In [10]:
# Assign variables
a = 3
b = 2

# Addition 
print(a + b)

5


To perform multiplication, we use the ```*``` operator. And to perform exponentials, we use the ```**``` operator. 

In [11]:
# Multiplication:
print(f"Multiplication: {a * b}")

# Exponential:
print(f"Exponential: {a ** b}")

Multiplication: 6
Exponential: 9


To perform division, we use the ```/``` operator. To perform floor division, which rounds the result down to the nearest integer (i.e., for positive numbers the decimal part is dropped), we use the ```//``` operator.

In [12]:
# Division:
print(f"Division: {a / b}")

# Floor Division:
print(f"Floor Division: {a // b}")

Division: 1.5
Floor Division: 1
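
Floor division is worth a second look when negative numbers are involved, because "rounding down" and "dropping the decimals" are not the same thing. Here is a small sketch comparing the two; the variable names are just for illustration.

```python
import math

positive_floor = 7 // 2        # 3
negative_floor = -7 // 2       # -4, because floor division rounds down
truncated = int(-7 / 2)        # -3, because int() drops the decimal part instead

print(positive_floor)
print(negative_floor)
print(truncated)

# math.floor gives the same answer as the // operator
print(math.floor(-7 / 2))
```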


The modulus operator (```%```), which returns the remainder of a division (recall your long division), is also commonly used in programming. Most notably, it can tell us whether an integer is even or odd when you take the modulus of 2.

In [13]:
# Just to refresh your long division 
print(f"The modulus of 3 by 4 is : {3 % 4}")

# Is 3 an even or odd number? A remainder of 0 means even, 1 means odd.
print(f"The modulus of 3 by 2 is: {a % b}")

The modulus of 3 by 4 is : 3
The modulus of 3 by 2 is: 1


We can also perform operations on the variables themselves. For instance, one common use is to write a counter which increments by 1 whenever an operation is completed (this is mostly used in loops, where the same step is repeated over and over again, but more on that later).

In [14]:
# Define the counter
counter = 0

# A simple way to increment our counter by 1. Note that the right-hand side is evaluated first, and the result is then assigned back to counter
counter = counter + 1

print(counter)

1


Another (and more elegant) way to do this would be to use the ```+=``` construct.

In [15]:
# Reset the counter
counter = 0

counter += 1

print(counter)

# How would you multiply counter by 10?

counter *= 10
print(f"Multiplied by 10: {counter}")

1
Multiplied by 10: 10


Lastly, we need to be careful not to store numbers as strings. For instance:

In [16]:
# Assign variables
num_1 = "100"
num_2 = "5"

# Use the plus operator: 
print(num_1 + num_2)

1005


Really, this is similar to the output we would expect if we tried adding up the words ```Hello``` and ```World```: the two strings are simply joined together. To do this mathematical operation properly, we need to convert the strings to integers. We can use the `int()` function to convert `num_1` and `num_2` from strings to integers.

In [17]:
print(int(num_1) + int(num_2))

105
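
If you're ever unsure what kind of value a variable holds, the built-in `type` function will tell you. The sketch below also uses `float()` for numbers with a decimal point; `num_3` is a made-up variable for this example.

```python
num_1 = "100"
num_3 = "2.5"   # Hypothetical extra value, only for this example

print(type(num_1))          # <class 'str'>
print(type(int(num_1)))     # <class 'int'>

# int("2.5") would raise a ValueError, so we use float() for decimal numbers
total = int(num_1) + float(num_3)
print(total)
```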


### Lists, Tuples, and Dictionaries
If you're thinking about using Python for data analysis, lists are a good foundation for understanding dataframes. As such, understanding the ins and outs of a list will be crucial for learning more complicated libraries (such as Pandas). Let's begin by creating a list of our own:

In [17]:
ivies = ['Yale', 'Harvard', 'Brown', 'Dartmouth']
print(ivies)

['Yale', 'Harvard', 'Brown', 'Dartmouth']


Each item on the list can be accessed by calling its index. For instance, if we wanted to call the last item on the list (i.e., Dartmouth), we can call the last index of the list (remember that indexing starts at 0, so if our list is of length 4, we want to access index 3). However, if you work with millions and millions of rows of data, recalling the exact index can be cumbersome. Negative indexes count from the end of the list, so index -1 always refers to the last item.

In [1]:
print(f"Method 1: Last item on the list: {ivies[3]}")
print(f"Method 2 (Preferred): Last item on the list {ivies[-1]}")

Method 1: Last item on the list: Dartmouth
Method 2 (Preferred): Last item on the list Dartmouth

What happens if we only want to print the last two items on the list? 

In [19]:
print(f"Print last 2 items: {ivies[2:4]}")
print(f"Print last 2 items: {ivies[2:]}")

Print last 2 items: ['Brown', 'Dartmouth']
Print last 2 items: ['Brown', 'Dartmouth']


Note that when we call `ivies[2:4]`, we should interpret it as [2,4), i.e., inclusive of 2 and 3 but exclusive of 4. Another way would just be to write `ivies[2:]`, which means from index 2 to the end of the list.
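
Slices also accept negative indexes, which saves us from counting the length of the list. A minimal sketch, with variable names chosen just for this example:

```python
ivies = ['Yale', 'Harvard', 'Brown', 'Dartmouth']

# From the second-to-last item to the end of the list
last_two = ivies[-2:]

# From the start of the list up to, but not including, index 2
first_two = ivies[:2]

print(last_two)
print(first_two)
```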

Okay, say you wanted five schools on the ivies list, and gave this task to your summer intern. To add an item, we can use the `append` method. Note that `append` modifies the existing list in place, i.e., it does not return a new list (it returns `None`). Hence, assigning the result of an `append` call to a variable will not work (the technical reasons for this relate more to data structures and how to use memory efficiently, but we will not dig too deep into this rabbit hole today).

In [23]:
ivies.append("Stanford")
print(ivies)

# This will not work: append returns None, so ivies would become None
# ivies = ivies.append("Stanford")

['Yale', 'Harvard', 'Brown', 'Dartmouth', 'Stanford']


Another way of doing this is to use the `extend` method. However, the argument must itself be a list or another iterable (i.e., you add the items of one list to another list; when can this be useful?)

In [31]:
ivies = ['Yale', 'Harvard', 'Brown', 'Dartmouth']        # Resetting the ivies variable 
new_school = ['Stanford']

ivies.extend(new_school)
print(ivies)

['Yale', 'Harvard', 'Brown', 'Dartmouth', 'Stanford']
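
One way to see the difference between the two methods is to hand both of them the same list of several schools. This is just a sketch; `more_schools`, `extended`, and `appended` are made-up names.

```python
more_schools = ['Brown', 'Dartmouth']

# extend adds each item of more_schools individually
extended = ['Yale', 'Harvard']
extended.extend(more_schools)
print(extended)

# append adds more_schools as one single item, giving a list inside a list
appended = ['Yale', 'Harvard']
appended.append(more_schools)
print(appended)

print(len(extended), len(appended))
```

So `extend` is handy when you want to merge two lists into one flat list.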


The intern passes the code back to you. As their supervisor, you notice something is not quite right...

In [32]:
ivies.remove('Stanford')
print(ivies)

['Yale', 'Harvard', 'Brown', 'Dartmouth']


It so happens that Stanford is the last item on the list. As such, we can use the `pop` method as well.

In [52]:
ivies = ['Yale', 'Harvard', 'Brown', 'Dartmouth', 'Stanford']  # Resetting the ivies variable 
popped = ivies.pop()
print(f"Popped item: {popped}")
print(ivies)

Popped item: Stanford
['Yale', 'Harvard', 'Brown', 'Dartmouth']


Now let me introduce two more methods, `reverse` and `sort`. `reverse` flips the current order of the list in place (it does not sort anything):

In [54]:
ivies = ['Dartmouth', 'Brown', 'Harvard', 'Yale']        # Setting the order explicitly for this example
print(f"Original List: {ivies}")
ivies.reverse()
print(f"Reversed List: {ivies}")

Original List: ['Dartmouth', 'Brown', 'Harvard', 'Yale']
Reversed List: ['Yale', 'Harvard', 'Brown', 'Dartmouth']


Whereas the `sort` method and the `sorted` function order the list alphabetically. `sort` operates similarly to `append` in the sense that it changes the list in place and a new list is not generated.

In [58]:
print(f"Current List: {ivies}")
ivies.sort()
print(f"Alphabetically Sorted List: {ivies}")
ivies.sort(reverse = True)
print(f"Alphabetically Reversed List: {ivies}")

Current List: ['Yale', 'Harvard', 'Brown', 'Dartmouth']
Alphabetically Sorted List: ['Brown', 'Dartmouth', 'Harvard', 'Yale']
Alphabetically Reversed List: ['Yale', 'Harvard', 'Dartmouth', 'Brown']


To sort and generate a new list, you can use `sorted`.

In [57]:
ivies = ['Yale', 'Harvard', 'Brown', 'Dartmouth']        # Resetting the ivies variable 
ivies_sorted = sorted(ivies)
print(ivies_sorted)

['Brown', 'Dartmouth', 'Harvard', 'Yale']
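
Because `sorted` returns a new list, the original stays untouched. It also accepts the same `reverse` argument as `sort`, plus a `key` argument that decides what to sort by. Here is a small sketch; the variable names are just for illustration.

```python
ivies = ['Yale', 'Harvard', 'Brown', 'Dartmouth']

# Sort in reverse alphabetical order, without changing ivies
descending = sorted(ivies, reverse=True)

# Sort by the length of each name instead of alphabetically
by_length = sorted(ivies, key=len)

print(descending)
print(by_length)
print(ivies)   # The original list keeps its order
```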


Another useful tool when working with lists is to check whether an item is in the list or not. To do this, we can use the `in` operator:

In [59]:
ivies = ['Yale', 'Harvard', 'Brown', 'Dartmouth']        # Resetting the ivies variable 
print('Princeton' in ivies)

False


This is a good segue for me to give a soft introduction to for loops as well. As some people might already know, for loops are used to repeat a section of code a number of times.

In [75]:
for dow in ["Sun","Mon","Tue", "Wed", "Thu", "Fri", "Sat"]:     # dow = day of week, but you can change this name
    print(f"{dow}: Medication taken.")

Sun: Medication taken.
Mon: Medication taken.
Tue: Medication taken.
Wed: Medication taken.
Thu: Medication taken.
Fri: Medication taken.
Sat: Medication taken.


What's happening here is that the loop runs the block of **indented** code once for each item. The number of iterations is determined by the number of items in the list (e.g., if there are three items in the list, the block will run three times). One way to visualize this is to imagine a weekly pill organizer where you open the boxes sequentially based on the day of the week.

<img src="https://thumbor.forbes.com/thumbor/fit-in/x/https://www.forbes.com/health/wp-content/uploads/2021/08/pill_organizer_feature_getty_creative.jpeg" width=400 height=400 />


Now it's your turn to try implementing a for loop to print out all the Ivy League names sequentially.

In [76]:
ivies = ['Yale', 'Harvard', 'Brown', 'Dartmouth']       # Resetting the ivies variable 
for school_name in ivies:
    print(school_name)

Yale
Harvard
Brown
Dartmouth


In addition to the item, you might also want the index of the list (i.e., the order of the item in the list). This is accomplished using the `enumerate` function. 

In [77]:
for index, school_name in enumerate(ivies):
    print(index, school_name)

0 Yale
1 Harvard
2 Brown
3 Dartmouth
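
If you'd rather have the numbering start at 1 (for instance, when printing a ranked list for people rather than computers), `enumerate` takes an optional `start` argument. A minimal sketch, where `lines` is a made-up helper list that collects the text:

```python
ivies = ['Yale', 'Harvard', 'Brown', 'Dartmouth']
lines = []

# start=1 only changes the number that is reported, not the items themselves
for position, school_name in enumerate(ivies, start=1):
    line = f"{position}. {school_name}"
    lines.append(line)
    print(line)
```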


I'd like to briefly talk about tuples, which are essentially lists that cannot be modified (there are other uses of tuples, but we'll cover that in the future). In programming lingo, a tuple is *immutable*. Lists, as we've seen, are created using square brackets, but tuples are created using parentheses.

In [85]:
# Lists can be changed
ivies = ['Yale', 'Harvard', 'Brown', 'Dartmouth']  
ivies[0] = 'Stanford'
print(f"List: {ivies}")

# Tuples cannot be changed
ivies = ('Yale', 'Harvard', 'Brown', 'Dartmouth')
ivies[0] = 'Stanford'        # Raises a TypeError
print(f"Tuple: {ivies}")     # Never reached because of the error above

List: ['Stanford', 'Harvard', 'Brown', 'Dartmouth']


TypeError: 'tuple' object does not support item assignment

Finally, I'd like to introduce dictionaries. We all know that dictionaries give you a definition when you look up a word. In Python, dictionaries store what are called key-value pairs, which give you the ability to use a key to look up a value. I won't go through this in too much detail as you most likely won't be asked to create a dictionary on your own, but it's good to know the fundamentals behind it. Let's start by creating a dictionary of three Ivy League schools with their founding years:

In [93]:
ivies = {"Yale": 1701, "Harvard": 1636, "Brown": 1764}
print(ivies)

{'Yale': 1701, 'Harvard': 1636, 'Brown': 1764}


One item of interest might be to know which words, or keys, are stored in our dictionary. We can get these using the `keys` method:

In [90]:
print(ivies.keys())

dict_keys(['Yale', 'Harvard', 'Brown'])


To get the founding year of Yale, we can simply call the key from our dictionary.

In [91]:
print(ivies["Yale"])

1701
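
A few more dictionary basics, sketched under the same setup: adding a new key-value pair, looking up a key that may be missing with `get`, and looping over the pairs with `items`. The variable `missing` is just for illustration.

```python
ivies = {"Yale": 1701, "Harvard": 1636, "Brown": 1764}

# Assigning to a new key adds a new key-value pair
ivies["Dartmouth"] = 1769

# ivies["Stanford"] would raise a KeyError; get returns None instead
missing = ivies.get("Stanford")
print(missing)

# items gives us each key together with its value
for school, year in ivies.items():
    print(f"{school} was founded in {year}")
```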


### Conditionals
Conditional statements, often referred to as if-else statements, let you control which pieces of code run based on whether a condition is met (or not met). Let's work with a very simple math example to show how this works. Let's first define a variable and assign a value to it.

In [94]:
number = 4

Now, let's write an if-statement to check if the number is truly 4. Note that this is similar to the for-loop we introduced before, where we use indentation to indicate the chunk of code that belongs to the statement. Also note that `==` tests for equality, whereas a single `=` assigns a value.

In [96]:
if number == 4:
    print("Yes, the value is four")

Yes, the value is four


Let's try setting the number variable to 5 and see what happens.

In [98]:
number = 5

if number == 4:
    print("Yes, the value is four")

Woah, nothing happened. This is because the indented chunk of code will not run if the condition is not fulfilled. This can be confusing, as we don't get any message and hence cannot tell from the output whether the number is or is not 4. Adding an `else` branch gives us feedback in both cases:

In [99]:
number = 5

if number == 4:
    print("Yes, the value is four")
else:
    print("Number is not 4")

Number is not 4


We could also flip the logic with a negation, i.e., use "if number is **not** equal to 4" as our if statement.

In [102]:
number = 5

if not number == 4:
    print("Number is not 4")
else:
    print("Yes, the value is four")

Number is not 4
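
Python also has a dedicated "not equal" operator, `!=`, which many people find easier to read than `not ... ==`. A minimal sketch that behaves the same way as the cell above; `result` is a made-up variable used to hold the message.

```python
number = 5

# number != 4 means the same thing as not number == 4
if number != 4:
    result = "Number is not 4"
else:
    result = "Yes, the value is four"

print(result)
```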


To specify more than one condition, we could use the `elif` (short for else-if) keyword.

In [103]:
number = 5

if number == 4:
    print("Yes, the value is four")
elif number > 4:
    print("Number is greater than 4")
else:
    print("Number is less than four")

Number is greater than 4


### Loops and Iterations
The biggest advantage of a computer is its ability to do repetitive tasks. Why do the hard work when computers can do it for you? *All jokes aside, understanding for loops can help automate your code and run tasks that need to be done over and over again!* Let's refresh our muscles once again and try to print out all the values of a list.

In [2]:
numbers = [1, 2, 3, 4, 4, 4, 4, 4, 4]

for number in numbers:
    print(number)

1
2
3
4
4
4
4
4
4


Say we had some bug in our code and that if we keep printing the list it will print 4's ad infinitum. We want the loop to stop as soon as `number` hits 4, hence avoiding an unnecessarily long print statement. The `break` statement exits the loop immediately.

In [3]:
for number in numbers:
    if number == 4:
        print("We've hit four. Stop printing!")
        break
    print(number)

1
2
3
We've hit four. Stop printing!


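In some cases, however, we might *want* the crazy long output so we can tell the user that we're continuously hitting 4, and that this bug really needs to be fixed ASAP. The `continue` statement skips the rest of the current iteration and moves on to the next item, so the loop keeps going.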

In [7]:
for number in numbers:
    if number == 4:
        print("We've hit four. Fix the bug!")
        continue
    print(number)

1
2
3
We've hit four. Fix the bug!
We've hit four. Fix the bug!
We've hit four. Fix the bug!
We've hit four. Fix the bug!
We've hit four. Fix the bug!
We've hit four. Fix the bug!
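
To see the difference between `break` and `continue` side by side, here is a small sketch that collects the numbers each loop gets to keep. The list and the names `kept_with_break` and `kept_with_continue` are made up for this example.

```python
numbers = [1, 2, 4, 3, 4, 5]

# break: stop the whole loop the first time we see a 4
kept_with_break = []
for number in numbers:
    if number == 4:
        break
    kept_with_break.append(number)

# continue: skip every 4, but carry on with the rest of the list
kept_with_continue = []
for number in numbers:
    if number == 4:
        continue
    kept_with_continue.append(number)

print(kept_with_break)
print(kept_with_continue)
```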


We could also write something called a nested loop (just like a nesting doll), which is a loop inside a loop.

In [8]:
numbers = [1, 3, 5, 7]

# Outer loop
for number in numbers:
    # Inner loop
    for letter in 'abc':
        print(number, letter)

1 a
1 b
1 c
3 a
3 b
3 c
5 a
5 b
5 c
7 a
7 b
7 c


What's happening here is that for each item of the outer loop, the inner loop runs from start to finish. In the above example, the string `abc` is iterated over one character at a time (imagine putting a in one box, b in another box, and c in the third box). Once `abc` has been iterated over once, `number` moves to the second item of the list, and `abc` gets iterated over again. <br> <br>
There are also times when you want to run through a loop a certain number of times. To do this, we use the `range` function:

In [10]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


The `range` function generates a sequence of integers. To see what's happening behind the scenes, we can force the range to become a list:

In [13]:
print(list(range(10)))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


As we've seen over and over again, Python counts from 0. There is also a way to set the loop so it starts from 1. Note that the end value is still excluded, so `range(1, 10)` stops at 9.

In [11]:
for i in range(1, 10):
    print(i)

1
2
3
4
5
6
7
8
9
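
`range` also accepts an optional third argument, the step size, which controls how far it jumps each time. A short sketch with made-up variable names:

```python
# Start at 0, stop before 10, and move forward 2 at a time
evens = list(range(0, 10, 2))

# A negative step counts backwards: start at 5, stop before 0
countdown = list(range(5, 0, -1))

print(evens)
print(countdown)
```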


Since we've covered for loops as well as if-else statements, we should also introduce the while loop, which is basically a mashup of both. In other words, the while loop in Python repeats a block of code as long as the test expression (condition) is true. Let's now utilize the counter we built earlier in the workshop:

In [18]:
counter = 0       # Initialize counter as 0

while counter < 8:
    print(counter)       # Print the value of the counter
    counter += 1         # Increment the counter by 1 every time the loop runs through one cycle

0
1
2
3
4
5
6
7
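
A while loop is most useful when we don't know in advance how many times the loop needs to run. As a sketch (the numbers here are made up), let's count how many times we can halve 100 with floor division before reaching 1:

```python
value = 100
steps = 0

# Keep halving until value is no longer greater than 1
while value > 1:
    value = value // 2    # 100 -> 50 -> 25 -> 12 -> 6 -> 3 -> 1
    steps += 1

print(f"Reached {value} after {steps} steps")
```

If the condition never becomes false (say, we forgot to update `value`), the loop would run forever, so it is worth double-checking that something inside the loop changes the condition.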


To solidify our understanding of for loops, I want us to work with an applied example. But first, let me introduce the `split` method, which is very handy for parsing textual data.

In [29]:
text = "I hope you're well"
words = text.split()
print(words)

['I', 'hope', "you're", 'well']


As you can see, our sentence is broken up into words. By default, the `split` method splits a string on whitespace characters. <br> <br>

Lastly, I want to compute the number of characters per word and list it out. 

In [33]:
for word in words:
    print(word, len(word))

I 1
hope 4
you're 6
well 4
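
As a final sketch that pulls together several pieces from this workshop (`split`, a for loop, `len`, and a dictionary), we could store the word lengths instead of only printing them. The names `word_lengths` and `longest` are made up for this example.

```python
text = "I hope you're well"

# Build a dictionary that maps each word to its number of characters
word_lengths = {}
for word in text.split():
    word_lengths[word] = len(word)

print(word_lengths)

# Find the word with the largest length by comparing the stored values
longest = max(word_lengths, key=word_lengths.get)
print(f"Longest word: {longest}")
```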
