# Learning Objectives

In this notebook we are going to:

- Get started with the Python programming language.
- Get familiar with python notebooks.

If you feel comfortable working with Python, feel free to skip this tutorial and proceed to Exercise 0, where the task is to find the 25 most frequent words in a large piece of text using the basic text processing functions covered in this tutorial.

# 0. Jupyter Notebooks
This is a "jupyter notebook" (formerly "ipython notebook"). It is an interactive python programming tool which allows you to work with code along with prose and visualisations in the same file.

The notebook is divided into cells, each of which contains things like text (what you're reading) or python code. It is also possible for cells to contain code in other programming languages, such as R or bash scripts. Text cells use [Markdown](https://en.wikipedia.org/wiki/Markdown) for formatting. You can edit cells by clicking anywhere inside the cell (double clicking for text cells).

The notebook has one active cell (click on a cell to activate it). You can run the active cell either with the "Run" button to the left of the cell or by pressing `shift-enter` (or `ctrl-enter`, which runs the cell without moving on to the next one). A python "kernel" is maintained by the notebook server which contains the current program state: variables and their values. This is quite similar to matlab's interactive prompt. When executed, if the code in a cell prints any text, that text is displayed in the notebook after the cell. The value returned by the final command in the cell is also presented. In this way, you can, for example, use a cell as a calculator:

In [None]:
# click on this cell (it gets a thin border to highlight it) then press ctrl-enter to execute it
print("Using a cell as a calculator:")
1+2*(3+4)

Any line inside a code cell starting with a `#` is a comment and is ignored by the kernel.

After executing, the cell is preceded by `In [1]` and the value of the final statement by `Out [1]`. This tells you it's the first cell executed by the ipython kernel running behind the notebook. You can access the output values in subsequent cells by the shorthand `_1`, or `_6` for the output of the 6th cell executed. Try it out below. Note that this is an ipython feature, and is not available if you use the standard python command line.

Try it with the `Out[]` number you see in the calculator cell above.

In [None]:
# this gives the output of the first cell execution
_1

## 0.1 Command and Edit mode

The notebook interface has two modes of interaction: an edit mode, where you can edit the contents of a cell, and a command mode, where you can use a rich set of keyboard shortcuts to interact with the notebook. In edit mode, a small pencil icon appears in the top right of the notebook (next to the kernel name, perhaps "Python 3").

Some ways to move between command and edit mode:

 - If you click on the text of a code cell, you enter edit mode for that cell (double click for text cells).
 - If you are in edit mode, press the _esc_ key to switch to command mode. 
 - If you are in command mode, press the _enter_ key to switch to edit mode.

You can see a complete list of available keyboard shortcuts by pressing the __h__ key in command mode or choosing _Keyboard Shortcuts_ from the _Help_ menu.


## 0.2 The Ipython Kernel
The ipython kernel running behind the notebook (the thing that executes the python code you enter) stores any variables you create. To see this, run the first cell below once, then the following cell several times. You can see that the value of the variable `i` is incremented each time you run the second cell.

In [None]:
# run this once
i = 0

In [None]:
# run this multiple times. You see the value of variable 'i' is retained between runs.
i = i + 1
print(i)

## 0.3 Suggestions and Automatic Code Completion in Jupyter Notebooks
Jupyter notebooks give you code suggestions by pressing the tab key. This handy feature has several uses:

 - automatically completing names, such as variable and function names. In the cell below, put the cursor at the end of the word "sent" and press tab. The remainder of the variable name "sentence" should appear (if you have executed the cell in which the variable is defined).
 - suggesting possible methods and attributes of an object. This time, put the cursor after "sentence." and press the tab key. You should see a list of methods and attributes of sentence, which is a python string (type `str`). It includes `format`, `count` and `islower` as well as several other things. Choose one of them, by scrolling with the arrow keys and pressing return or by clicking on it.
 - suggestions for function parameters by pressing `shift-tab` when the cursor is within the parentheses of a function.

In [None]:
# to see the notebook editor help capabilities, first we need something for it to make suggestions about
sentence = "If this sentence is true, then nothing is true!"

In [None]:
# run the cell above (ctrl-enter), then try the auto-complete and suggestion features here

# put the cursor after the word "sent" below, then press tab
sent
# put the cursor after "sentence." below, then press tab. (press esc to get rid of the list that pops up)
sentence.
# put the cursor within the brackets below, then press shift-tab, then press it again
sentence.replace()

## 0.4 Notebook Reference Resources and Help
There are a number of useful links in the **Help** menu that contain tutorials and reference material. The notebook help, python and numpy links may be particularly useful.

A handy built in feature of ipython is to access documentation and source code of python objects by appending one or more question marks (two to see the python source code for an object, if it exists).
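
The `?` syntax only works inside ipython and notebooks. As a rough equivalent in plain Python, the built-in `help()` function and the `__doc__` attribute give access to the same documentation strings. The sketch below is only meant to illustrate this; the exact text printed depends on your Python version.

```python
# help() prints the documentation of an object (here the built-in len function)
help(len)

# the raw documentation string is stored in the __doc__ attribute
split_doc = str.split.__doc__
print(split_doc)
```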

In [None]:
# this gives some documentation on the print function
print?

---

# 1. Basic Data Types and Operators in Python
Python supports 4 primitive data types:
1. Integers
2. Floating-point Numbers
3. Boolean
4. Strings
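
As a quick illustration of the four types listed above, the built-in `type()` function reports the type of any value. The sample values are arbitrary and chosen only for this sketch.

```python
# type() returns the data type of a value
print(type(42))        # <class 'int'>
print(type(3.14))      # <class 'float'>
print(type(True))      # <class 'bool'>
print(type("hello"))   # <class 'str'>
```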

## 1.1 Mathematical Operators

In [None]:
# Addition
1 + 1   # => 2

In [None]:
# Subtraction
8 - 1   # => 7

In [None]:
# Multiplication
10 * 2  # => 20

In [None]:
# Division
35 / 5  # => 7.0

To get the quotient rounded down to a whole number (floor division), use the `//` operator.

In [None]:
# Integer Division
35 // 5
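
A few more cases may help to show how `//` behaves. The values below are arbitrary; the point of the sketch is that the result is rounded down (towards negative infinity), which matters for negative numbers, and that the result is a float if either operand is a float.

```python
print(35 // 5)    # 7
print(7 // 2)     # 3
print(-7 // 2)    # -4 (rounded down, not towards zero)
print(7.0 // 2)   # 3.0 (float operand gives a float result)
```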

In [None]:
# Modulo
7 % 3  # => 1

In [None]:
# Exponentiation
2**3  # => 8

## 1.2 Booleans
Boolean values are primitives (Note the capitalization).

In [None]:
True

In [None]:
False

The basic boolean operations `not`, `and` and `or` have their usual meanings.

In [None]:
not True # => False

In [None]:
True or False # => True

In [None]:
True and False # => False

## 1.3 Conditionals

Python has the usual set of boolean relation operators that can be applied between numbers and in some cases other types of objects.

> "__`< `__"  less than  
> "__`<=`__"  less than or equal to  
> "__`==`__"  equal to (note this is two "=" signs, not one)  
> "__`!=`__"  not equal to  
> "__`> `__"  greater than  
> "__`>=`__"  greater than or equal to  

In [None]:
5 == 5 # True
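
The remaining operators from the table work the same way. The short sketch below uses two arbitrary variables, `a` and `b`, and also shows two things the table only hints at: comparisons can be chained, and strings can be compared too (character by character, based on their Unicode code points).

```python
a = 3
b = 7

print(a < b)     # True
print(a >= b)    # False
print(a != b)    # True

# comparisons can be chained: this means (0 < a) and (a < b)
print(0 < a < b)  # True

# strings are compared character by character
print("apple" < "banana")  # True
```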

In [None]:
4 > 5 # False

## 1.4 Strings

In [None]:
# Strings are created with " or '
"This is a string."

In [None]:
'This is also a string.'

In [None]:
# Strings can be added too.
"Hello " + "world!"  # => "Hello world!"

In [None]:
# A string can be treated like a list of characters; Indexing starts at 0.
"Hello world!"[4]  # => 'o'

In [None]:
# You can find the length of a string
len("This is a string")  # => 16

There are many built-in string methods and operators which are useful for text processing.

> __`s.startswith(t)`__ - - - - - - - - - - test if s starts with t  
> __`s.endswith(t)`__ - - - - - - - - - - test if s ends with t  
> __`t in s`__ - - - - - - - - - - test if t is a substring of s  
> __`s.islower()`__ - - - - - - - - - - test if s contains cased characters and all are lowercase  
> __`s.isupper()`__ - - - - - - - - - - test if s contains cased characters and all are uppercase  
> __`s.lower()`__ - - - - - - - - - - convert a string to lowercase  
> __`s.upper()`__ - - - - - - - - - - convert a string to uppercase  
> __`s.isalpha()`__ - - - - - - - - - - test if s is non-empty and all characters in s are alphabetic  
> __`s.isalnum()`__ - - - - - - - - - - test if s is non-empty and all characters in s are alphanumeric  
> __`s.isdigit()`__ - - - - - - - - - - test if s is non-empty and all characters in s are digits  
> __`s.istitle()`__ - - - - - - - - - - test if s contains cased characters and is titlecased (i.e. all words in s have initial capitals)  
> __`s.strip()`__ - - - - - - - - - - remove leading and trailing spaces, newline and tab characters in a string


In [None]:
'Hello World!'.lower()

In [None]:
'     this string contains leading and trailing spaces   \n'

In [None]:
'     this string does not contain leading and trailing spaces   \n'.strip()
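
To see a few more of the string methods from the table in action, here is a small sketch. The sample strings are arbitrary and only chosen so that each test has an obvious answer.

```python
word = "Python"

print(word.startswith("Py"))   # True
print(word.endswith("on"))     # True
print("tho" in word)           # True
print(word.islower())          # False (the first letter is uppercase)
print(word.istitle())          # True
print(word.isalpha())          # True
print("2024".isdigit())        # True
print("abc123".isalnum())      # True
print(word.upper())            # PYTHON
```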

# 2. Variables
There are no declarations, only assignments. The convention is to use `lower_case_with_underscores` for variable names.

In [None]:
some_var = 42

In [None]:
some_var  # => 42

In [None]:
# Accessing a previously unassigned variable is an exception.
some_unknown_var  # Raises a NameError

# 3. Collections
Apart from the basic data types mentioned above, Python also supports various useful constructs like `list`, `tuple`, `set` and `dictionary`.

## 3.1 List
Lists store sequences and are similar to arrays in other programming languages.

In [None]:
# create an empty list
a = []

In [None]:
# or create a list with some pre-filled values
a = [1, 2, 3]
a

In [None]:
# add stuff to the end of a list with append
a = []         # a is an empty list
a.append(1)    # a is now [1]
a.append(2)    # a is now [1, 2]
a.append(4)    # a is now [1, 2, 4]
a.append(3)    # a is now [1, 2, 4, 3]
a

In [None]:
# remove from the end with pop
a.pop()        # => remove 3 and a is now [1, 2, 4]
a

In [None]:
# put it back with append
a.append(3)
a

In [None]:
# access a list with an index like you would any array
a[0]  # => returns the first element

In [None]:
# python also supports negative indexing to access elements at the end
a[-1] # => returns the last element

In [None]:
# looking out of bounds is an IndexError
a[4]  # Raises an IndexError

In [None]:
# ranges are supported with index slicing: a[start:end]
# start index is included but last index is not
a[1:3]   # Return list from index 1 to 3 => [2, 4]

In [None]:
a[2:]    # return list starting from index 2 => [4, 3]

In [None]:
a[:3]    # return list from beginning until index 3  => [1, 2, 4]

In [None]:
a[1:-1]  # return list starting at the second element and ending at the second-to-last element => [2, 4]

In [None]:
a[::-1]  # return list in reverse order => [3, 4, 2, 1]

In [None]:
# concatenate lists with the addition operator
x = [5, 6]
y = [7, 8]
x + y   # this doesn't modify either list

In [None]:
# count the number of elements in a list with the len function
len(a)

In [None]:
# Check for existence in a list with "in"
1 in a  # => True

The `split()` method of strings is useful for breaking a string into a list of substrings based on a separator.

In [None]:
# break off a string into words separated by forward slashes
s = 'alice/bob/charlie'
s.split('/')

In [None]:
# if you don't specify the separator, Python splits on any run of whitespace (spaces, tabs, newlines)
s = 'The quick brown fox jumps over the lazy dog.'
s.split()

The `join()` method of strings does the reverse job of `split()`, i.e., it creates a single string from a list of strings, placing the separator string between the items. Note that it is called on the separator, not on the list.

In [None]:
# join the words into a string separated by forward slashes
a = ['alice', 'bob', 'charlie']
'/'.join(a)

In [None]:
# join() has no default separator; here we join the words with a single space
a = ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
' '.join(a)

## 3.2 Tuple
Tuples are like lists but are immutable.

In [None]:
# tuples are created with (parentheses)
tup = (1, 2, 3)
tup[0]      # => 1

In [None]:
tup[0] = 3  # Raises a TypeError

## 3.3 Dictionary

Dictionaries in Python are used to store key-value pairs.

In [None]:
# Dictionaries store mappings from keys to values
empty_dict = {}

In [None]:
# Here is a prefilled dictionary
filled_dict = {"one": 1, "two": 2, "three": 3}
filled_dict

The keys for dictionaries have to be immutable types. This is to ensure that the key can be converted to a constant hash value for quick look-ups. Immutable types include ints, floats, strings, tuples.

In [None]:
invalid_dict = {[1,2,3]: "123"}  # => Raises a TypeError: unhashable type: 'list'

In [None]:
valid_dict = {(1,2,3):[1,2,3]}   # Values can be of any type, only keys have to be immutable

In [None]:
# values can be looked up with []
filled_dict["one"]  # => 1

In [None]:
# Get all keys as an iterable with "keys()", and wrap the call in list()
# to turn it into a list. We'll talk about iterables later
list(filled_dict.keys())  # => ["one", "two", "three"]

In [None]:
# Similarly all the values can be obtained as an iterable with "values()".
# Once again we need to wrap it in list() to get it out of the iterable.
list(filled_dict.values())  # => [1, 2, 3]

In [None]:
# To get both keys and values in a list, use the "items()" method.
# This returns a list of tuples, where each tuple contains two elements, 
# the first is the key and the second is the corresponding value.
list(filled_dict.items())  # => [('one', 1), ('two', 2), ('three', 3)]

In [None]:
# Check for existence of keys in a dictionary with "in"
"one" in filled_dict  # => True

In [None]:
1 in filled_dict      # => False

In [None]:
# Add a new item to the dictionary using []
filled_dict["four"] = 4
filled_dict

## 3.4 Set

A set is an unordered collection of distinct objects.

In [None]:
empty_set = set()

In [None]:
# Set initialization looks similar to dictionaries.
# However, unlike a list only distinct entries are stored inside a set.
some_set = {1, 1, 2, 2, 3, 4}  # some_set is now {1, 2, 3, 4}
some_set

In [None]:
# Similar to keys of a dictionary, elements of a set have to be immutable.
invalid_set = {[1], 1}  # => Raises a TypeError: unhashable type: 'list'

In [None]:
# a set can be used to count unique elements in a list
a = [1, 2, 3, 1, 3, 4, 1, 2]  # a is a list
len(a)  # a contains 8 elements

In [None]:
# create set from a list
s = set(a)  # s will contain only the distinct elements in a
len(s)  # s contains 4 elements

# 4. Control Flow and Iterables

For looping and branching in a Python program, we need a way to group commands together. In Java, C and C++, this is done with curly braces "{}". In python it's done with indenting. Consecutive lines starting with the same number of spaces or tabs are considered to be in the same block of code. To create a block within a block, you increase the indentation. One important point is that code always starts with no indentation: if you indent the first line, python produces an error. The line before an indented code block always ends with a colon ":".
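
The following sketch shows how indentation defines nested blocks. It uses a `for` loop and an `if` statement, both of which are explained in the sections below; the variable names and numbers are arbitrary.

```python
total = 0
for n in [1, 2, 3, 4]:         # the colon opens the block of the for loop
    if n % 2 == 0:             # a nested block starts here
        total = total + n      # indented twice: runs only for even numbers
    print("checked", n)        # indented once: runs for every number
print("sum of even numbers:", total)  # no indentation: runs once, after the loop
```

Here `total` ends up as 6, because only 2 and 4 reach the innermost block.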


## 4.1 If Else

An **if** statement is written with the `if` keyword. To test other conditions, you can use the `elif` and `else` keywords, however, both of these are optional.

In [None]:
# Let's just make a variable
some_var = 5

# Here is an if else statement. Indentation is significant in Python!
# Convention is to use four spaces, not tabs.
# This prints "some_var is smaller than 10"
if some_var > 10:
    print("some_var is totally bigger than 10.")
elif some_var < 10:    # This elif clause is optional.
    print("some_var is smaller than 10.")
else:                  # This is optional too.
    print("some_var is indeed 10.")

## 4.2 For Loop

In [None]:
"""
For loops iterate over lists
prints:
    dog is a mammal
    cat is a mammal
    mouse is a mammal
"""
animals = ["dog", "cat", "mouse"]
for animal in animals:
    # You can use format() to interpolate formatted strings
    print("{} is a mammal".format(animal))


In [None]:
"""
"range(number)" returns an iterable of numbers
from zero up to (but not including) the given number
prints:
    0
    1
    2
    3
"""
for i in range(4):
    print(i)

You can use `range` to access list elements too.

In [None]:
for i in range(0, len(animals)):
    print("{} is a mammal".format(animals[i]))
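
The `for` loop is not limited to lists: it works with any iterable, including the dictionaries and strings introduced earlier. The sketch below uses a small made-up dictionary called `counts` to illustrate this.

```python
counts = {"one": 1, "two": 2, "three": 3}

# looping over a dictionary gives its keys
keys = []
for key in counts:
    keys.append(key)
print(keys)

# items() gives (key, value) tuples, which can be unpacked in the loop header
pairs = []
for key, value in counts.items():
    pairs.append((key, value))
    print("{} -> {}".format(key, value))

# looping over a string gives its characters one at a time
for ch in "abc":
    print(ch)
```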


## 4.3 While Loop

In [None]:
"""
While loops go until a condition is no longer met.
prints:
    0
    1
    2
    3
"""
x = 0
while x < 4:
    print(x)
    x += 1  # Shorthand for x = x + 1

## 4.4 List Comprehension

Python allows you to write elegant code which reads like pseudocode. You will often see examples of this in code which uses a single line containing an `if` condition and a `for` loop to create a list or iterate over something. However, it can be a bit tricky to understand in the beginning.

In [None]:
# For example, the following code creates a list of squares from 1 to 9.
squares = []

for i in range(1, 10):
    squares.append(i**2)

print(squares)

In [None]:
# However, the same list can be created with just a single line
squares = [i**2 for i in range(1, 10)]
print(squares)

In [None]:
# You can also test for conditions inside list comprehension
# For example, the following collects the words containing the letter 'e'
words = ['apple', 'banana', 'orange', 'milk', 'eggs']
words_with_e = []

for word in words:
    if 'e' in word:
        words_with_e.append(word)
words_with_e

In [None]:
# with list comprehension this can be done as follows
[word for word in words if 'e' in word ]

The general syntax for list comprehension is

`[expression for item in iterable]`

where an `iterable` is a `list`, `set`, generator or any other object which returns one value at a time.
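
To illustrate that the iterable does not have to be a list, the sketch below builds lists from a string, a `range`, a tuple and a set. All variable names and values are made up for this example. Because a set is unordered, the last result is sorted to make it predictable.

```python
# a string is an iterable of characters
letters = [ch.upper() for ch in "abc"]
print(letters)   # ['A', 'B', 'C']

# range() produces numbers one at a time
doubled = [2 * n for n in range(5)]
print(doubled)   # [0, 2, 4, 6, 8]

# a tuple works too
lengths = [len(w) for w in ("tea", "coffee")]
print(lengths)   # [3, 6]

# a set has no fixed order, so we sort the resulting list
squares_from_set = sorted([n**2 for n in {-2, 2, 3}])
print(squares_from_set)   # [4, 4, 9]
```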


In [None]:
# Another example: label each number in the range from 10 to 19 as "Odd" or "Even".
["Even" if i % 2 == 0 else "Odd" for i in range(10, 20)]

Note that when using a conditional with `if` and `else` the syntax looks like:

`[f(x) if condition else g(x) for x in sequence]`

i.e., the condition comes before the `for` loop.

However, when working with just the `if` statement, the condition comes after the `for` loop.

`[f(x) for x in sequence if condition]`
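
The two forms do different jobs: the `if ... else` before the `for` chooses what value to produce for every item, while the `if` after the `for` filters which items are used at all. The sketch below shows each form and then both combined, using an arbitrary range of numbers.

```python
numbers = range(1, 11)

# if/else before the for: one label per number, nothing is left out
labels = ["even" if n % 2 == 0 else "odd" for n in numbers]
print(labels)

# if after the for: keep only the even numbers
evens = [n for n in numbers if n % 2 == 0]
print(evens)      # [2, 4, 6, 8, 10]

# both together: keep only multiples of 3, then label each of them
combined = ["even" if n % 2 == 0 else "odd" for n in numbers if n % 3 == 0]
print(combined)   # ['odd', 'even', 'odd']
```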

Check out this post here for more details: [List Comprehension - Python](https://realpython.com/list-comprehension-python/)

# 5. Functions

In [None]:
# Use "def" to create new functions
def add(x, y):
    print("x is {} and y is {}".format(x, y))
    return x + y  # Return values with a return statement

# note that this is just the definition and the function is not called here

In [None]:
# Calling functions with parameters
add(5, 6)  # => prints out "x is 5 and y is 6" and returns 11

In [None]:
# Another way to call functions is with keyword arguments
add(y=6, x=5)  # Keyword arguments can arrive in any order.

In [None]:
# Python supports returning multiple values (with tuple assignments)
def swap(x, y):
    return y, x  # Return multiple values as a tuple without the parentheses.
                 # (Note: parentheses have been excluded but can be included)

x = 1
y = 2
print('Before: x =', x, 'y =', y)
x, y = swap(x, y)     # => x = 2, y = 1
print('After: x =', x, 'y =', y)

In [None]:
# Python has first-class functions, i.e., functions can be passed as arguments to,
# and returned from, other functions; here create_adder returns a new function
def create_adder(x):
    def adder(y):
        return x + y
    return adder

add_10 = create_adder(10)
add_10(3)   # => 13
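
The example above returns a function. For completeness, here is a minimal sketch of the other direction, passing a function as an argument. The names `apply_twice` and `add_three` are made up for this illustration.

```python
def apply_twice(func, value):
    # call the function that was passed in, then call it again on the result
    return func(func(value))

def add_three(n):
    return n + 3

result = apply_twice(add_three, 10)
print(result)   # 16
```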

In [None]:
# There are also anonymous functions that can be created with the lambda keyword
(lambda x: x > 2)(3)                  # => True

In [None]:
(lambda x, y: x ** 2 + y ** 2)(2, 1)  # => 5

There are some useful built-in functions such as `sum`, `min`, `max`, and `sorted`  covered below.

In [None]:
a = [30, 10, 40, 20]   # create a new list

In [None]:
# Find the sum of all the values in the list
sum(a)   # => 100

In [None]:
# Find the minimum and maximum values
min(a), max(a)  # => (10, 40)

In [None]:
# sort the list
sorted(a)   # => [10, 20, 30, 40]

In [None]:
# note that the sorted function creates a new list and doesn't sort in place
a  # => [30, 10, 40, 20]

In [None]:
# pass the reverse=True argument to sort the list in descending order
sorted(a, reverse=True) # => [40, 30, 20, 10]

In [None]:
# you can create your custom sort function with the key argument;
# for example, say you have a list of tuples containing counts of different objects
fruits = [('orange', 8), ('apple', 4), ('banana', 1)]

# the first item in each tuple is the name of the fruit and the second item is the count;
# to sort the fruits by the counts in increasing order use the `key` argument in the function call
sorted(fruits, key=lambda x: x[1])   # x[1] means sort by the second item of the tuple

In [None]:
# to sort by names, use x[0]
sorted(fruits, key=lambda x: x[0])  # x[0] sorts by the first item of the tuple

# 6. Modules

A module is a file containing Python code. It can contain executable statements as well as function definitions. Python comes with a library of standard modules such as `os`, `sys`, `re`, `math`, etc.

You include a module in your code with the `import` keyword.

In [None]:
# import the math module to access mathematical functions
import math
math.sqrt(25)

In [None]:
# you can get specific functions from a module
from math import ceil, floor
print(ceil(3.7))   # => 4
print(floor(3.7))  # => 3

In [None]:
# or import all the functions inside a module with *
# although this is usually not recommended
from math import *
sqrt(100)   # no need for the `math.` prefix now

In [None]:
# You can shorten module names
import math as m
m.sqrt(0.25)

Later in the course we will be working with other modules like `nltk`, `gensim`, `numpy`, `pandas`, `sklearn` etc.

# 7. Working with Files

`open()` returns a file object, and is most commonly used with two arguments: `open(filename, mode)`.

In [None]:
textfile = open('workfile.txt', mode='w')  # mode='w' opens a file for writing
textfile.write('This is a test\n')
textfile.write('Second line in the file\n')
textfile.close()

To read a file’s contents, call `f.read(size)` (where `f` stands for a file object such as `textfile`), which reads some quantity of data and returns it as a string. `size` is an optional numeric argument. When `size` is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory. Otherwise, at most `size` characters are read and returned (for a file opened in text mode, as here). If the end of the file has been reached, `f.read()` will return an empty string (`""`).
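
The effect of the `size` argument can be sketched as follows. The file name `example.txt` and its contents are made up for this illustration, and the file is written to a temporary directory so that nothing in your working folder is overwritten.

```python
import os
import tempfile

# create a small file to read from (hypothetical name and contents)
path = os.path.join(tempfile.mkdtemp(), "example.txt")
f = open(path, "w")
f.write("Hello, world!")
f.close()

f = open(path, "r")
first = f.read(5)    # reads at most 5 characters
rest = f.read()      # no size given: reads everything that is left
at_end = f.read()    # nothing left to read
f.close()

print(repr(first))   # 'Hello'
print(repr(rest))    # ', world!'
print(repr(at_end))  # ''
```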

In [None]:
textfile = open('workfile.txt', mode='r') # mode='r' opens a file for reading
textfile.read()

`f.readline()` reads a single line from the file; a newline character (`\n`) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline.

In [None]:
textfile = open('workfile.txt', 'r')
textfile.readline()

To read multiple lines from a file into a list, use the `readlines()` function.

In [None]:
textfile = open('workfile.txt', 'r')
lines = textfile.readlines()
lines

It is good practice to use the `with` keyword when dealing with file objects. This has the advantage that the file is properly closed after its suite finishes, even if an exception is raised on the way. 
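
You can check that the file really is closed by looking at the file object's `closed` attribute after the `with` block. The sketch below does this twice, once for a block that finishes normally and once for a block that raises an exception. The file name `with_demo.txt` is made up, and a temporary directory is used to avoid touching your own files.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

# normal case: the file is closed as soon as the block ends
with open(path, "w") as out:
    out.write("first line\n")
print(out.closed)      # True

# error case: the file is still closed, even though an exception was raised
try:
    with open(path, "r") as infile:
        raise ValueError("something went wrong")
except ValueError:
    pass
print(infile.closed)   # True
```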

In [None]:
with open('workfile.txt', 'r') as textfile:
    read_data = textfile.read()
    print(read_data)

# 8. Advanced Topics

The following topics are not covered in this tutorial, however they are good to know when programming in Python.

1. [Classes](https://docs.python.org/3/tutorial/classes.html): Python supports many object-oriented programming principles such as classes, inheritance etc.
2. [Functional Programming](https://docs.python.org/3/howto/functional.html): Python has built-in features to support functional programming concepts using iterators and generators along with library functions such as `map`, `filter`, `reduce`, etc.
3. [Regular Expressions](https://docs.python.org/3/library/re.html): Regular Expressions are a powerful tool used for pattern matching and text analysis. Python offers full support for regular expressions using the `re` module.


# Free Online Resources for Python

- [Python 3 Documentation](https://docs.python.org/3/)
- [PEP 8 -- Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/)
- [Learn Python in Y minutes](https://learnxinyminutes.com/docs/python/) - The tutorial above is largely adapted from here.
- [Learn Python - Full Course for Beginners by freeCodeCamp on YouTube](https://www.youtube.com/watch?v=rfscVS0vtbw)
- [Automate the Boring Stuff with Python](https://automatetheboringstuff.com)