# Introduction to Python

---

by: Truong Le

*5 June 2023*

---

This is a three-hour introductory course on Python. Please come back to this set of Jupyter Notebooks whenever you need them.

Before beginning, it is important to understand that Python is an interpreted, object-oriented, high-level programming language. Code run from a terminal (e.g. a `.py` script) does not need to be compiled to machine code beforehand, unlike, for example, C++. A major advantage of Python is its accessibility and ease of use. A major disadvantage (which you may never come across) is its execution speed. However, there are built-in and third-party packages, such as NumPy (covered below), that you can use to minimize this limitation.

The class today will cover the following items:

**Data Structures**

1.  Variables
2.  Lists, dictionaries, sets, tuples

**Data manipulation**
    
3.  Operators
4.  Logic

**Python programming** 

5.  Functions
6.  Classes 

**Essential modules**

7. Numpy
8. Pandas
9. Scipy (Very briefly covered)
10. Matplotlib

## Data Structures


### Variables



| **Category** | **Python type(s)**             |
|--------------|--------------------------------|
| Text         | str                            |
| Numeric      | int, float, complex            |
| Sequence     | list, tuple                    |
| Mapping      | dict                           |
| Set          | set                            |
| Boolean      | bool                           |
| Binary       | bytes, bytearray, memoryview   |

In [6]:
# Comments are written with a '#'
word     = 'string'
numeric  = 1
numeric2 = 1.1
numeric3 = -1.2

print(type(word))
print(type(numeric))
print(type(numeric2))

<class 'str'>
<class 'int'>
<class 'float'>
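The cell above only checks three of the types listed in the table. A minimal sketch covering most of the remaining built-in types (the `values` list is just an illustrative name; `bytearray` and `memoryview` are left out for brevity):

```python
# One example literal for each remaining row of the type table
values = [
    3 + 4j,            # complex
    True,              # bool
    b"raw bytes",      # bytes
    [1, 2],            # list
    (1, 2),            # tuple
    {"key": "value"},  # dict
    {1, 2},            # set
]

for v in values:
    print(type(v).__name__, v)

# bool is a subclass of int, so True behaves like 1 in arithmetic
print(True + True)  # 2
```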


### Lists



Note the following: 
* Lists are given in square brackets [...].
* The first item in a list has index [0], second is [1], third is [2] etc. 
* When slicing, the start index is **included**, the stop index is **not included**.
* Appending an item puts it at the end of the list by default, but `insert()` lets you put an item at a specific position.
* You can have nested lists.
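Nested lists are mentioned above but not shown. A small sketch (the names `matrix` and `row` are illustrative): index the outer list first, then the inner one. It also shows `insert()` placing an item before a given index.

```python
# A list of lists: each inner list is one row
matrix = [[1, 2, 3],
          [4, 5, 6]]

print(matrix[1])     # the whole second row: [4, 5, 6]
print(matrix[1][2])  # third item of the second row: 6

# row refers to the same inner list object, so changing row changes matrix too
row = matrix[0]
row.insert(1, 99)    # insert 99 before index 1
print(matrix)        # [[1, 99, 2, 3], [4, 5, 6]]
```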

In [7]:
# Lists can contain any (including mixed) data types
listvariable = [1,34,word,numeric,numeric2,numeric3,'a','m']

print(listvariable[0:2])
print(listvariable[2:4])
print(listvariable[-4:-2])

[1, 34]
['string', 1]
[1.1, -1.2]


In [8]:
listvariable.append('End')
print(listvariable)

[1, 34, 'string', 1, 1.1, -1.2, 'a', 'm', 'End']


In [9]:
listvariable.insert(-1, 'penultimate')
print(listvariable)

[1, 34, 'string', 1, 1.1, -1.2, 'a', 'm', 'penultimate', 'End']


In [10]:
listvariable.remove('End')
print(listvariable)

[1, 34, 'string', 1, 1.1, -1.2, 'a', 'm', 'penultimate']


In [11]:
listvariable.pop(1) 
# Remember that index 1 refers to the second item
print(listvariable)

[1, 'string', 1, 1.1, -1.2, 'a', 'm', 'penultimate']


In [12]:
# clear() removes every item; the (now empty) list still exists
listvariable.clear()
print(listvariable)

[]


In [14]:
# New list variable
listvariable=[1,2,4,6,7,4,2,1,4,6,7,8,9,10]

# reverse list variable
listvariable.reverse()
print(listvariable)

# sorted list variable 
# in reverse order
listvariable.sort(reverse=True)
print(listvariable)

# reverse list variable
listvariable.reverse()
print(listvariable)

[10, 9, 8, 7, 6, 4, 1, 2, 4, 7, 6, 4, 2, 1]
[10, 9, 8, 7, 7, 6, 6, 4, 4, 4, 2, 2, 1, 1]
[1, 1, 2, 2, 4, 4, 4, 6, 6, 7, 7, 8, 9, 10]


In [21]:
# other useful features

# Length of list
print(len(listvariable))

#number of 1's
print(listvariable.count(1))

# position of first number 7.
print(listvariable.index(7))

# Item at index 7 (i.e. the 8th item)
print(listvariable[7])

14
2
9
6


### Sets



Sets are unordered, unindexed collections. **A set cannot contain duplicate members**.

* Sets are written with curly brackets {...}.

In [None]:
setvariable = set(listvariable)
print(setvariable)

aset={1,2,4,65,1,1,1,1,1,1,1,1,1,1}
# Duplicates are removed, and a set has no order, so you cannot access items by index.
print(aset)
print(len(aset))

# Adding and removing items works much like it does for lists
aset.add(12)
aset.remove(4)
asecondset = {3,5,6}
aset.update(asecondset)
print(aset)

asecondset = {1,2,'three'}
athirdset=aset.union(asecondset)
print(athirdset)

print(athirdset.intersection(asecondset))
print(athirdset.difference(asecondset))
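The `union()`, `intersection()` and `difference()` methods above also have operator forms. A short sketch with two illustrative sets `a` and `b`:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(a | b)   # union: {1, 2, 3, 4, 5}
print(a & b)   # intersection: {3, 4}
print(a - b)   # difference (in a but not in b): {1, 2}
print(a ^ b)   # symmetric difference (in exactly one of them): {1, 2, 5}

# Fast membership tests are one of the most common uses of a set
print(3 in a)  # True
```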

### Tuples



Tuples are ordered collections that are **unchangeable (immutable)**.

* Tuples are written with round brackets (...).
* Tuples can be nested.
* Tuples support packing and unpacking, so several variables can be assigned in one statement (e.g. `a, b = b, a`). Tuples also use slightly less memory than the equivalent list.

In [None]:
atuple = (1,2,'a','two',[1,2,3,4,5])
print(atuple)


print(atuple[4])

# The following line will cause an error
# atuple[2]='ERROR'
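A short sketch of tuple unpacking (the "simultaneous assignment" mentioned above), of the `TypeError` raised by the commented-out line, and of the fact that a mutable item stored inside a tuple can still change:

```python
atuple = (1, 2, 'a', 'two', [1, 2, 3, 4, 5])

# Unpacking: the starred name collects whatever is left over into a list
first, second, *rest = atuple
print(first, second, rest)  # 1 2 ['a', 'two', [1, 2, 3, 4, 5]]

# Swapping two variables without a temporary variable
x, y = 1, 2
x, y = y, x
print(x, y)                 # 2 1

# Item assignment is not allowed on a tuple
try:
    atuple[2] = 'ERROR'
except TypeError as err:
    print("TypeError:", err)

# ...but the list stored inside the tuple is itself mutable
atuple[4].append(6)
print(atuple[4])            # [1, 2, 3, 4, 5, 6]
```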

### Dictionaries



Dictionaries store data values as key:value pairs. This concept (not necessarily with dictionaries, as we will see later) is a very powerful tool for interacting with data.

* Dictionaries are also written with curly brackets {...}, with each entry written as `key: value`.
* Since Python 3.7, dictionaries preserve insertion order, but items are looked up by key rather than by position.
* Keys must be immutable (hashable), e.g. strings, numbers or tuples; values can be of any type.
* Dictionaries can be nested.

In [None]:
thisdict1 = {
    "brand": "Ford",
    "model": "Mustang",
    "year": 1964
}

print(thisdict1)

In [None]:
thisdict2 = {
    "brand": "Toyota",
    "model": "Corolla",
    "year": 1998
}

# Adding to the dictionary
thisdict2['owner'] = 'Jane Doe'

print(thisdict2)
len(thisdict2)

In [None]:
print(thisdict1.get("model"))
print(thisdict2.get("model"))
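`get()` becomes more useful when a key might be missing. A short sketch using the same `thisdict1` data (the `"colour"` key is a made-up example of an absent key), contrasting `get()` with square brackets and showing how to loop over key/value pairs:

```python
thisdict1 = {"brand": "Ford", "model": "Mustang", "year": 1964}

# get() returns None (or a default you supply) instead of raising KeyError
print(thisdict1.get("colour"))             # None
print(thisdict1.get("colour", "unknown"))  # unknown

# Square brackets raise KeyError for a missing key
try:
    thisdict1["colour"]
except KeyError:
    print("No 'colour' key")

# Loop over key/value pairs (insertion order is preserved)
for key, value in thisdict1.items():
    print(key, "->", value)

# Membership tests look at the keys, not the values
print("model" in thisdict1)  # True
```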

In [None]:
myfamily = {
  "child1" : {
    "name" : "Emil",
    "year" : 2004
  },
  "child2" : {
    "name" : "Tobias",
    "year" : 2007
  },
  "child3" : {
    "name" : "Linus",
    "year" : 2011
  }
} 

# This is very clunky (!)
print(myfamily)
print(myfamily["child2"]["name"]) 

## Data manipulation



### Operators



Native *Python* Arithmetic Operators

| **Operator** | **Name**           |
|--------------|--------------------|
| `+`          | Addition           |
| `-`          | Subtraction        |
| `*`          | Multiplication     |
| `/`          | Division           |
| `%`          | Modulus            |
| `**`         | Exponentiation     |
| `//`         | Floor division     |


A very useful shortcut is to use Python's assignment operators:

| **Operator** | **Example** | **Same as** |
|--------------|-------------|-------------|
| `=`          | `x=5`       | `x=5`       |
| `+=`         | `x+=3`      | `x=x+3`     |
| `-=`         | `x-=3`      | `x=x-3`     |
| `*=`         | `x*=3`      | `x=x*3`     |
| `/=`         | `x/=3`      | `x=x/3`     |
| `%=`         | `x%=3`      | `x=x%3`     |
| `//=`        | `x//=3`     | `x=x//3`    |
| `**=`        | `x**=3`     | `x=x**3`    |

Python comparison operators

| **Operator** | **Name**                     |
|----------|--------------------------|
| ==       | Equal                    |
| !=       | Not equal                |
| >        | Greater than             |
| <        | Less than                |
| >=       | Greater than or equal to |
| <=       | Less than or equal to    |

Python logical operators

| **Operator** | **Description**                                              | **Example**           |
|--------------|--------------------------------------------------------------|-----------------------|
| and          | Returns True if both statements are true                     | `x<5 and x<10`        |
| or           | Returns True if either statement is true                     | `x<5 or x>10`         |
| not          | Reverses the result: returns False if the statement is true  | `not(x<5 and x<10)`   |

Python identity operators

| **Operator** | **Description**                                        | **Example**    |
|----------|----------------------------------------------------------|------------|
| is       | Returns True if both variables are the same *object*     | `x is y`     |
| is not   | Returns True if both variables are not the same *object* | `x is not y` |

Python membership operators

| **Operator** | **Description**                                                                    | **Example** |
|--------------|------------------------------------------------------------------------------------|-------------|
| in           | Returns True if the specified value is present in the *object*                     | `x in y`      |
| not in       | Returns True if the specified value is not present in the *object*                 | `x not in y`  |


In [None]:
x = 6
y = 10
z = x+y
print(z)

z += y
print(z)

print(z > 10)
print(z < 10)

print(z == 26)
print(z != 26)
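The cell above exercises arithmetic, assignment and comparison operators. A short sketch of the remaining rows of the tables (floor division, modulus, exponentiation, identity and membership), using illustrative lists `a`, `b` and `c`:

```python
print(17 // 5)   # floor division: 3
print(17 % 5)    # modulus (remainder): 2
print(2 ** 10)   # exponentiation: 1024
print(-17 // 5)  # floor division rounds towards minus infinity: -4

a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)    # True: same contents
print(a is b)    # False: two different list objects
print(a is c)    # True: c is another name for the same object

print(2 in a)      # True
print(5 not in a)  # True
```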

### Logic



Python supports the usual logical conditions from mathematics:

*    Equals: `a == b`
*    Not Equals: `a != b`
*    Less than: `a < b`
*    Less than or equal to: `a <= b`
*    Greater than: `a > b`
*    Greater than or equal to: `a >= b`


#### If conditional



These conditions can be used in several ways, most commonly in "if statements" and loops.

An "if statement" is written by using the `if` keyword.

In [None]:
a = 33
b = 200
if b > a:
  print("b is greater than a")

**Specific to Python:** 

Python relies on indentation (whitespace at the start of a line) to define blocks of code, such as the body of an `if` statement.

Note that there is no `end if` or `}`

In [None]:
a = 33
b = 33
if b > a:
  print("b is greater than a")
elif a == b:
  print("a and b are equal")

In [None]:
a = 200
b = 33
if b > a:
  print("b is greater than a")
elif a == b:
  print("a and b are equal")
else:
  print("a is greater than b")

There is also a one-line shorthand. Use it sparingly, as it can make code harder to read.

Short-hand `if` and `if ... else`

In [None]:
a = 200
b = 33
if a > b: print("a is greater than b")

a = 2
b = 330
print("A" if a > b else "B")

We can also combine the if condition with the operators described earlier:

In [None]:
a = 200
b = 33
c = 500
if a > b and c > a:
  print("Both conditions are True")

In [None]:
a = 200
b = 33
c = 500
if a > b or a > c:
  print("At least one of the conditions is True")

In [None]:
x = 41

# Nested if statements
if x > 10:
  print("Above ten,")
  if x > 20:
    print("and also above 20!")
  else:
    print("but not above 20.") 

**The pass Statement**

`if` statements cannot be empty. If for some reason you need an `if` statement with no content, put in the `pass` statement to avoid getting an error.

In [None]:
a = 33
b = 200

if b > a:
  pass
else:
  print('Not True') 

In [None]:
a = 201
b = 200

if b > a:
  pass
else:
  print('Not True')

#### Python Loops



Python has two primitive loop commands:

*    `while` loops
*    `for` loops

*The While loop*

With the `while` loop we can execute a set of statements as long as a condition is true.


In [None]:
i = 1
while i < 6:
  print(i)
  i += 1

With the `break` statement we can stop the loop even if the while condition is true:

In [None]:
i = 1
while i < 6:
  print(i)
  if i == 3:
    break
  i += 1 

With the `continue` statement we can stop the current iteration, and continue with the next:

Notice in the following code we will skip the printing '3'.

In [None]:
i = 0
while i < 6:
  i += 1
  if i == 3:
    continue
  print(i)

With the `else` statement we can run a block of code once when the condition is no longer true (it is skipped if the loop exits via `break`):

In [None]:
i = 1
while i < 6:
  print(i)
  i += 1
else:
  print("i is no longer less than 6")

*The For loop*

A `for` loop is used for iterating over a sequence (that is either a list, a tuple, a dictionary, a set, or a string).

This is less like the `for` keyword in other programming languages, and works more like an iterator method as found in other object-oriented programming languages.

With the `for` loop we can execute a set of statements, once for each item in a list, tuple, set etc.

Starting with *lists* (the most common data type you will interact with)


In [None]:
fruits = ["apple", "banana", "cherry"]
for x in fruits:
  print(x)

In [None]:
# We can loop through a string

for x in "banana":
  print(x)

In [None]:
# Again we see the application of the break statement

fruits = ["apple", "banana", "cherry"]
for x in fruits:
  print(x)
  if x == "banana":
    break

In [None]:
# Again we see the application of the continue statement

fruits = ["apple", "banana", "cherry"]
for x in fruits:
  if x == "banana":
    continue
  print(x)

**The most important application of the for loop:**

The `range()` function. 

To loop through a set of code a specified number of times, we can use the `range()` function.

The `range()` function returns a sequence of numbers, starting from 0 by default and incrementing by 1 by default, and stops *before* a specified number: `range(6)` gives 0, 1, 2, 3, 4, 5.

In [None]:
# This prints 6 numbers, counting from 0 up to 5.
# The counting may not be immediately intuitive, but for a `for` loop what matters most is the number of times the body repeats.

for x in range(6):
  print(x)

In [None]:
for x in range(2, 6):
  print(x)

In [None]:
for x in range(2, 30, 3):
  print(x)

In [None]:
for x in range(6):
  print(x)
else:
  print("Finally finished!") 

In [None]:
for x in range(6):
  if x == 3: 
    break
  print(x)
else:
  print("Finally finished!") 
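The two cells above show that a loop's `else` block runs only if the loop finishes without hitting `break`. This makes it a natural fit for searching. A minimal sketch (the `numbers` and `found` names are illustrative):

```python
numbers = [1, 3, 5, 8, 9]
found = None

for n in numbers:
    if n % 2 == 0:
        found = n
        print("Found an even number:", n)
        break
else:
    # Only reached if the loop ran to completion without a break
    print("No even number found")
```

With `numbers = [1, 3, 5]` instead, the `break` never runs and the `else` message is printed.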

**Nested Loops**

A nested loop is a loop inside a loop.

The "inner loop" will be executed one time for each iteration of the "outer loop":

Use nested loops carefully. You should rarely need more than 3 levels of nesting; if you do, consider restructuring your code.

In [None]:
adj = ["red", "big", "tasty"]
fruits = ["apple", "banana", "cherry"]

for x in adj:
  for y in fruits:
    print(x, y) 

## Python programming


### Functions



A function is a block of code which only runs when it is called.

You can pass data, known as parameters, into a function.

A function can return data as a result.

**Creating a Function**

In Python a function is defined using the `def` keyword:

In [None]:
def my_function():
  print("Hello from a function") 

my_function()

**Arguments**

Information can be passed into functions as arguments.

Arguments are specified after the function name, inside the parentheses. You can add as many arguments as you want, just separate them with a comma.

The following example has a function with one argument (fname). When the function is called, we pass along a first name, which is used inside the function to print the full name: 

In [None]:
def my_function(fname):
  print(fname + " Refsnes")

my_function("Emil")
my_function("Tobias")
my_function("Linus") 

**Number of Arguments**

By default, a function must be called with the correct number of arguments. If your function expects 2 arguments, you have to call it with exactly 2 arguments, no more and no fewer.

In [None]:
def my_function(fname, lname):
  print(fname + " " + lname)

my_function("Emil", "Refsnes") 

**Arbitrary Arguments, `*args`**

If you do not know how many arguments will be passed into your function, add a `*` before the parameter name in the function definition.

This way the function will receive a tuple of arguments, and can access the items accordingly:

In [None]:
def my_function(*args):
  print("The youngest child is " + args[2])

my_function("Emil", "Tobias", "Linus") 

**Keyword Arguments**

You can also send arguments with the key = value syntax.

This way the order of the arguments does not matter.

In [None]:
def my_function(child3, child2, child1):
  print("The youngest child is " + child3)

my_function(child1 = "Emil", child2 = "Tobias", child3 = "Linus") 

**Arbitrary Keyword Arguments, `**kwargs`**

If you do not know how many keyword arguments will be passed into your function, add two asterisks (`**`) before the parameter name in the function definition.

This way the function will receive a dictionary of arguments, and can access the items accordingly:

In [None]:
def my_function(**kwargs):
  print("His last name is " + kwargs["lname"])

my_function(fname = "Tobias", lname = "Refsnes") 

**Default Parameter Value**

The following example shows how to use a default parameter value.

If we call the function without an argument, it uses the default value:

In [None]:
def my_function(country = "Norway"):
  print("I am from " + country)

my_function("Sweden")
my_function("India")
my_function()
my_function("Brazil") 
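One caution about default values, which is standard Python behaviour: a default is evaluated once, when the function is defined. A mutable default such as a list is therefore shared between calls. A sketch with two hypothetical functions, `add_item_buggy()` and `add_item()`:

```python
def add_item_buggy(item, basket=[]):
    # The same list object is reused on every call that omits basket
    basket.append(item)
    return basket

print(add_item_buggy("apple"))   # ['apple']
print(add_item_buggy("banana"))  # ['apple', 'banana']  (surprise!)

def add_item(item, basket=None):
    # Use None as the default and create a fresh list inside the function
    if basket is None:
        basket = []
    basket.append(item)
    return basket

print(add_item("apple"))   # ['apple']
print(add_item("banana"))  # ['banana']
```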

**Passing a List as an Argument**

You can pass any data type as an argument to a function (string, number, list, dictionary etc.), and it will be treated as the same data type inside the function.

E.g. if you send a List as an argument, it will still be a List when it reaches the function:

In [None]:
def my_function(food):
  for x in food:
    print(x)

fruits = ["apple", "banana", "cherry"]

my_function(fruits)

**An important use of functions**

**Return Values**

To let a function return a value, use the return statement:

In [None]:
def my_function(x):
  return 5 * x

print(my_function(3))
print(my_function(5))
print(my_function(9)) 
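A function can also return several values at once. Python packs them into a single tuple, which the caller can unpack. A sketch with a hypothetical `min_max()` function:

```python
def min_max(values):
    # Returning two values actually returns one tuple
    return min(values), max(values)

result = min_max([4, 1, 9, 3])
print(result)      # (1, 9)

low, high = min_max([4, 1, 9, 3])  # unpack straight into two names
print(low, high)   # 1 9
```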

Similar to before, we can also use the `pass` statement.

Function definitions cannot be empty, but if for some reason you have a function definition with no content, put in the `pass` statement to avoid getting an error.

In [None]:
def myfunction():
  pass

A **more advanced** application of functions is **recursion**.

Python supports function recursion, which means a defined function can call itself. Recursion is a common mathematical and programming concept, and it lets you work through data step by step to reach a result without writing an explicit loop.

The developer should be very careful with recursion as it can be quite easy to slip into writing a function which never terminates, or one that uses excess amounts of memory or processor power. However, when written correctly recursion can be a very efficient and mathematically-elegant approach to programming.

In this example, tri_recursion() is a function that we have defined to call itself ("recurse"). We use the k variable as the data, which decrements (-1) every time we recurse. The recursion ends when the condition is not greater than 0 (i.e. when it is 0).

To a new developer it can take some time to work out exactly how this works; the best way to find out is by testing and modifying it.

In [None]:
def tri_recursion(k):
  if(k > 0):
    result = k + tri_recursion(k - 1)
    print(result)
  else:
    result = 0
  return result

print("Recursion Example Results")
tri_recursion(6)

An **extremely** elegant implementation of the Fibonacci sequence using recursion:

In [None]:
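# Using recursion to define the Fibonacci sequence (here fib(0) = fib(1) = 1):
def fib(n):
    # Error checks: raise the exception rather than returning it
    if not isinstance(n, int):
        raise TypeError("n must be an integer")
    if n < 0:
        raise ValueError("n must be non-negative")

    if n == 0 or n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

[fib(ii) for ii in range(6)]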
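The recursive `fib` above recomputes the same values many times, so its run time grows exponentially with `n`. A common remedy is memoisation, i.e. caching results that have already been computed. A sketch using the standard-library decorator `functools.lru_cache`, keeping the same convention `fib(0) = fib(1) = 1` (the name `fib_cached` is illustrative):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_cached(n):
    # Same recursion as fib, but each fib_cached(k) is computed only once
    if n == 0 or n == 1:
        return 1
    return fib_cached(n - 1) + fib_cached(n - 2)

print([fib_cached(ii) for ii in range(6)])  # [1, 1, 2, 3, 5, 8]

# Returns almost instantly; the uncached version would take impractically long
print(fib_cached(80))
```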
    

**Python Try Except**

The `try` block lets you test a block of code for errors.

The `except` block lets you handle the error.

The `else` block lets you execute code when there is no error.

The `finally` block lets you execute code, regardless of the result of the try- and except blocks.



In [None]:
try:
  print(x_notdefined)
except:
  print("An exception occurred") 


In [None]:
try:
  print(x_notdefined)
except NameError:
  print("Variable x_notdefined is not defined")
except:
  print("Something else went wrong") 

In [None]:
try:
  print("Hello")
except:
  print("Something went wrong")
else:
  print("Successfully run try") 

In [None]:
try:
  print(x_notdefined)
except:
  print("Something went wrong")
finally:
  print("The 'try except' is finished") 

This can be useful to close objects and clean up resources:

In [None]:
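try:
  # open() defaults to read-only mode ("r"), so the write() below fails.
  # If demofile.txt does not exist, open() itself fails and the outer except runs.
  f = open("demofile.txt")
  try:
    f.write("Lorem Ipsum")
  except:
    print("Something went wrong when writing to the file")
  finally:
    # Runs whether or not write() succeeded, so the file is always closed
    f.close()
except:
  print("Something went wrong when opening the file") 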
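In practice, the usual way to guarantee that a file is closed is the `with` statement, which closes it automatically even if an exception is raised inside the block. A sketch that writes to and reads back a file in the system's temporary directory (the file name `demofile_example.txt` is made up for this example):

```python
import os
import tempfile

# Build a path in the system's temporary directory
path = os.path.join(tempfile.gettempdir(), "demofile_example.txt")

# Mode "w" opens the file for writing (creating or truncating it)
with open(path, "w") as f:
    f.write("Lorem Ipsum")
# f is closed here, whether or not an error occurred

with open(path) as f:  # default mode "r" is read-only
    contents = f.read()

print(contents)   # Lorem Ipsum
print(f.closed)   # True
os.remove(path)   # tidy up
```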

**Raise an exception**

As a Python developer you can choose to throw an exception if a condition occurs.

To throw (or raise) an exception, use the raise keyword.

In [None]:
x = -1

if x < 0:
  raise Exception("Sorry, no numbers below zero") 

In [None]:
x = "hello"

if not isinstance(x, int):
  raise TypeError("Only integers are allowed") 

### Python Classes/Objects



Python is an object oriented programming language.

Almost everything in Python is an object, with its properties and methods.

A Class is like an object constructor, or a "blueprint" for creating objects.

You can think of classes as a more powerful dictionary.

**Create a Class**

To create a class, use the keyword `class`:


In [None]:
class MyClass:
  x = 5

**Create Object**

Now we can use the class named MyClass to create objects:

In [None]:
p1 = MyClass()
print(p1.x) 

# Notice that p1 can access x, a class attribute shared by every MyClass object.

**The `__init__()` Function**

The examples above are classes and objects in their simplest form, and are not really useful in real life applications.

To understand the meaning of classes we have to understand the built-in `__init__()` function.

All classes have a function called `__init__()`, which is executed automatically whenever a new object of the class is created.

Use the `__init__()` function to assign values to object properties, or other operations that are necessary to do when the object is being created:

In [None]:
class Person:
  def __init__(self, name, age):
    self.name = name
    self.age = age

p1 = Person("John", 36)

print(p1.name)
print(p1.age) 

**The `__str__()` Function**

The `__str__()` function controls what should be returned when the class object is represented as a string.

If `__str__()` is not defined, printing the object shows a default representation containing the class name and a memory address. Here we define it:

In [None]:
class Person:
  def __init__(self, name, age):
    self.name = name
    self.age = age

  def __str__(self):
    return f"name: {self.name} \n age: {self.age}"

p1 = Person("John", 36)

print(p1) 

**Object Methods**

Objects can also contain methods. Methods in objects are functions that belong to the object.

Let us create a method in the Person class:

In [None]:
class Person:
  def __init__(self, name, age):
    self.name = name
    self.age = age

  def myfunc(self):
    print("Hello my name is " + self.name)

p1 = Person("John", 36)
p1.myfunc() 

Here is a more detailed example:

In [None]:
class FruitBasket:
    def __init__(self, contains="Apples", count = 10):
        self.contains = contains
        self.count = count
    
    def length(self):
        return self.count
    
    def print_me(self):
        print(f"Basket of {self.length()} {self.contains}")

    def doubled(self, n):
        return n+n

fb_a = FruitBasket()

fb_b = FruitBasket("Bananas", 14)

fb_c = FruitBasket("Oranges", 100)

print(fb_a.length())
fb_a.print_me()
print(f"Our stocks have doubled! We now have {fb_a.doubled(fb_a.count)} {fb_a.contains}.")

print(fb_b.length())
fb_b.print_me()
print(f"Our stocks have doubled! We now have {fb_b.doubled(fb_b.count)} {fb_b.contains}.")

print(fb_c.length())
fb_c.print_me()
print(f"Our stocks have doubled! We now have {fb_c.doubled(fb_c.count)} {fb_c.contains}.")
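The `doubled()` method above returns a new number but leaves `self.count` unchanged, so the basket itself never actually grows. A method can also update the object's own attributes through `self`. A sketch with a hypothetical, trimmed-down `Basket` class and a `restock()` method:

```python
class Basket:
    def __init__(self, contains="Apples", count=10):
        self.contains = contains
        self.count = count

    def restock(self, extra):
        # Update the object's own attribute through self
        self.count += extra

    def __str__(self):
        return f"Basket of {self.count} {self.contains}"

basket = Basket("Bananas", 14)
basket.restock(basket.count)  # double the stock in place
print(basket)                 # Basket of 28 Bananas
print(basket.count)           # 28
```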



## Essential modules


### Numpy



NumPy (short for "Numerical Python") is a Python library used for working with arrays.

It is possibly the most important library in Python! For numerical work on large arrays it can be up to ~50x faster than equivalent code using Python lists and loops.

We use NumPy by first importing it:


In [None]:
import numpy as np

Indexing, slicing and looping work in much the same way as previously described for lists.

**Most of what we've covered still applies.**

The key differences are that an array holds elements of a single data type and that arithmetic on arrays is applied element by element, which is what makes NumPy so efficient.

It also has functions for working in the domains of linear algebra, Fourier transforms and matrices.

NumPy is a Python library and is written partially in Python, but most of the parts that require fast computation are written in C or C++.
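Although indexing and slicing look the same, an operator can mean something quite different for a list and for an array. A minimal sketch (`py_list` and `arr` are illustrative names):

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array([1, 2, 3])

print(py_list * 2)        # list repetition: [1, 2, 3, 1, 2, 3]
print(arr * 2)            # element-wise: [2 4 6]

print(py_list + py_list)  # list concatenation: [1, 2, 3, 1, 2, 3]
print(arr + arr)          # element-wise addition: [2 4 6]

# Arrays hold a single data type; mixed input is converted to a common type
print(np.array([1, 2.5]).dtype)  # float64
```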

In [None]:
# 1D array.
arr = np.array([1, 2, 3, 4, 5])

print(arr)

In [None]:
# 2D array.
arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr) 

In [None]:
# 3D array.
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(arr) 

**Indexing** (accessing data)

In [None]:
arr = np.array([1, 2, 3, 4])

print(arr[0]) 

In [None]:
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])

print('2nd element on 1st row: ', arr[0, 1]) 

**Slicing arrays**

Slicing in python means taking elements from one given index to another given index.

We pass a slice instead of an index, like this: `[start:end]`.

We can also define the step, like this: `[start:end:step]`.

If we don't pass start, it defaults to 0.

If we don't pass end, it defaults to the length of the array in that dimension.

If we don't pass step, it defaults to 1.

In [None]:
arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[1:5]) 

print(arr[4:]) 

print(arr[:4]) 

print(arr[-3:-1]) 

print(arr[1:5:2]) 

print(arr[::2]) 
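The slices above are all on a 1D array. For a 2D array, as with indexing, you give one slice (or index) per dimension, separated by a comma. A short sketch reusing the 2D array from the indexing example:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# [rows, columns]
print(arr[1, 1:4])  # second row, columns at indices 1 to 3: [7 8 9]
print(arr[0:2, 2])  # both rows, third column: [3 8]
print(arr[:, ::2])  # every row, every other column
```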

**Shape of an Array**

The shape of an array is the number of elements in each dimension.

In [None]:
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(arr.shape) 

**Reshaping arrays**

Reshaping means changing the shape of an array.

By reshaping we can add or remove dimensions or change number of elements in each dimension.

In [None]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(4, 3)

print(newarr) 

In [None]:
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Take a 2D array to 1D.
newarr = arr.reshape(-1)

print(newarr) 
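When reshaping, one dimension can be given as `-1` (as in the 2D-to-1D example above), and NumPy infers its size from the others. The total number of elements must stay the same. A sketch using `np.arange()` (not otherwise used in this course) to build the data:

```python
import numpy as np

arr = np.arange(1, 13)  # [ 1  2 ... 12], 12 elements

# -1 asks NumPy to work out that dimension from the array's size
print(arr.reshape(2, -1).shape)  # (2, 6)
print(arr.reshape(-1, 4).shape)  # (3, 4)

# 12 elements cannot fill a 5 x 3 array
try:
    arr.reshape(5, 3)
except ValueError as err:
    print("ValueError:", err)
```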

**Iterating Arrays**

Iterating means going through elements one by one.

We can iterate over NumPy arrays, including multi-dimensional ones, with Python's basic `for` loop.

If we iterate on a 1-D array it will go through each element one by one.

In [None]:
arr = np.array([1, 2, 3])

for x in arr:
  print(x) 

print()

# This is the same as looping over the indices:
for x in range(len(arr)):
    print(arr[x])  

In [None]:
# Iterating through a 2D array.
arr = np.array([[1, 2, 3], [4, 5, 6]])

for x in arr:
  print(x) 

**Joining NumPy Arrays**

Joining means putting contents of two or more arrays in a single array.

We pass a sequence of arrays that we want to join to the concatenate() function, along with the axis. If axis is not explicitly passed, it is taken as 0.

In [None]:
arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

# Horizontal concatenate
arr = np.concatenate((arr1, arr2))
print(arr) 

# Vertical stacking
arr = np.vstack((arr1, arr2))
print(arr) 

In [None]:
arr1 = np.array([[1, 2], [3, 4]])

arr2 = np.array([[5, 6], [7, 8]])

arr = np.concatenate((arr1, arr2), axis=1)

print(arr) 

In [None]:

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.stack((arr1, arr2), axis=1)

print(arr) 

arr = np.hstack((arr1, arr2))

print(arr) 

**Splitting NumPy Arrays**

In [None]:
arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 3)

print(newarr) 

print(newarr[0])
print(newarr[1])
print(newarr[2]) 
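`np.array_split()` splits an array into the requested number of pieces. NumPy also has `np.split()`; the main difference is that `np.split()` raises an error when the array cannot be divided into equal-sized pieces. A minimal sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# array_split() tolerates pieces of unequal size...
print(np.array_split(arr, 4))  # pieces of length 2, 2, 1, 1

# ...whereas split() requires the array to divide evenly
try:
    np.split(arr, 4)
except ValueError as err:
    print("ValueError:", err)
```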

**Searching Arrays**

You can search an array for a certain value, and return the indexes that get a match.

To search an array, use the `np.where()` function.

In [None]:
arr = np.array([1, 2, 3, 4, 5, 4, 4])

x = np.where(arr == 4)

print(x) 

In [None]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

x = np.where(arr%2 == 0)

print(x) 

In [None]:
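# searchsorted() assumes the array is sorted in ascending order. It returns
# the index where the value would be inserted to keep the array sorted.
arr = np.array([8, 9, 10, 12, 14])

x = np.searchsorted(arr, 7)

print(x)  # 0

arr = np.array([6, 8, 9, 10, 12])

x = np.searchsorted(arr, 7)

print(x)  # 1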


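**Sorting Arrays**

`np.sort()` returns a sorted copy of an array. For a 2D array it sorts each row separately.

In [None]: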
arr = np.array([3, 2, 0, 1])

print(np.sort(arr)) 

In [None]:
arr = np.array([[3, 2, 4], [5, 0, 1]])

print(np.sort(arr)) 

**Filtering Arrays**

Getting some elements out of an existing array and creating a new array out of them is called filtering.

In NumPy, you filter an array using a boolean index list.

In [None]:
arr = np.array([41, 42, 43, 44])

x = [True, False, True, False]

newarr = arr[x]

print(newarr) 

In [None]:
arr = np.array([41, 42, 43, 44])

filter_arr = arr > 42

newarr = arr[filter_arr]

print(filter_arr)
print(newarr) 

In [None]:
arr = np.array([1, 2, 3, 4, 5, 6, 7])

filter_arr = arr % 2 == 0

newarr = arr[filter_arr]

print(filter_arr)
print(newarr) 

In [None]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 10, 12])

filter_array1 = arr % 3 == 0
filter_array2 = arr % 4 == 0

# Element-wise OR: True where the value is divisible by 3 or by 4
filter_array_combined = filter_array1 | filter_array2

newarr = arr[filter_array_combined]

print(filter_array_combined)
print(newarr) 

filter_arr = arr % 2 == 0 
newarr = arr[filter_arr]

print(filter_arr)
print(newarr) 

**What are ufuncs?**

ufuncs stands for "Universal Functions" and they are NumPy functions that operate on the `ndarray` object.

**Why use ufuncs?**

ufuncs are used to implement vectorization in NumPy which is way faster than iterating over elements.

They also provide broadcasting and additional methods like reduce, accumulate etc. that are very helpful for computation.


The key advantage over using Python's native functions is that ufuncs are much faster and work natively on arrays.

**What is Vectorization?**

Converting iterative statements into a vector based operation is called vectorization.

It is faster as modern CPUs are optimized for such operations.

In [None]:
x = [1, 2, 3, 4]
y = [4, 5, 6, 7]

z = np.add(x, y)
print(z) 

z = np.multiply(x, y)
print(z) 

z = np.divide(x, y)
print(z) 

z = np.power(x, y)
print(z) 

In [None]:
arr = np.array([1,4,9])

print(np.sqrt(arr))
print(np.exp(arr))
print(np.log(arr))
print(np.sin(arr))
print(np.tan(arr))
print(np.tanh(arr))
# For a 1D array .T has no effect, so this is the dot product 1 + 16 + 81 = 98
print(np.dot(arr, arr.T))

In [None]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 2, 3])

newarr = np.add(arr1, arr2)

print(newarr) 

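# With no axis, sum() adds up every element of both arrays
newarr = np.sum([arr1, arr2])

print(newarr) 

# axis=1 sums along each row, giving one total per array
newarr = np.sum([arr1, arr2], axis=1)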

print(newarr) 

newarr = np.cumsum(arr1)

print(newarr) 

**Trigonometry**

In [None]:
x = np.sin(np.pi/2)

print(x) 


arr = np.array([90, 180, 270, 360])

x = np.deg2rad(arr)

print(x) 

**Other set operations**

In [None]:
arr = np.array([1, 1, 1, 2, 3, 4, 5, 5, 6, 7])

x = np.unique(arr)

print(x) 

Notice that this is similar to converting a list to a Python `set`, except that `np.unique()` returns a sorted array.

In [None]:
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([3, 4, 5, 6])

newarr = np.union1d(arr1, arr2)

print(newarr) 
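NumPy has further set routines alongside `unique()` and `union1d()`, mirroring the Python set methods shown earlier. A short sketch using the same two arrays:

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([3, 4, 5, 6])

print(np.intersect1d(arr1, arr2))  # in both: [3 4]
print(np.setdiff1d(arr1, arr2))    # in arr1 only: [1 2]
print(np.setxor1d(arr1, arr2))     # in exactly one: [1 2 5 6]
```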

### Pandas

**What is Pandas?**

Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" has a reference to both "Panel Data", and "Python Data Analysis".

**Why Use Pandas?**

Pandas allows us to analyze large data sets and draw conclusions based on statistics.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

**What Can Pandas Do?**

Pandas gives you answers about the data. Like:

*    Is there a correlation between two or more columns?
*    What is the average value?
*    Max value?
*    Min value?


The two main Pandas objects are the `Series` (a single column of data) and the `DataFrame` (a table made up of columns). A DataFrame can be built from a dictionary whose keys become the column names.

In [None]:
import pandas as pd


In [None]:
mydataset = {
  'cars': ["BMW", "Volvo", "Ford"],
  'passings': [3, 7, 2]
}

myvar = pd.DataFrame(mydataset)

print(myvar)

**What is a Series?**

A Pandas Series is like a column in a table.

It is a one-dimensional array holding data of any type.

In [None]:
a = [1, 7, 2]

myvar = pd.Series(a)

print(myvar)

In [None]:
a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar)

In [None]:
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)

print(df) 

**Location, location, location**

There are two principal ways to access rows within a DataFrame: `.loc[]`, which selects by index label, and `.iloc[]`, which selects by integer position.

In [None]:
#refer to the row index:
print(df.loc[0])

In [None]:
#use a list of indexes:
print(df.loc[[0, 1]])

In [None]:
import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df) 

In [None]:
#refer to the named index:
print(df.loc["day2"])

In [None]:
# Despite the change of index, we can still use iloc to locate the first row.
print(df.iloc[0])

In [None]:
df = pd.read_csv('data.csv')

df

In [None]:
df = pd.read_csv('data.csv')

df.head()

In [None]:
df.head(10) 

In [None]:
df.tail()

Information on the imported DataFrame

In [None]:
# info() prints its summary itself and returns None, so it does not need print()
df.info()

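`dropna()` returns a new DataFrame with every row that contains a missing value removed; the original `df` is left unchanged.

In [None]: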
new_df = df.dropna()

new_df
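`dropna()` removes every row that contains at least one missing value (`NaN`) and returns a new DataFrame, leaving the original untouched. The tiny made-up frame below makes the effect easy to see; the `subset` argument restricts the check to particular columns.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in each of the last two rows
small = pd.DataFrame({
    "Duration": [60, 45, np.nan],
    "Calories": [409.1, np.nan, 282.4],
})

cleaned = small.dropna()                           # any NaN in a row drops it
only_duration = small.dropna(subset=["Duration"])  # only look at Duration

print(len(small), len(cleaned), len(only_duration))   # 3 1 2
```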

We can query the various named columns for values we are interested in.

In [None]:
x = df["Calories"].mean()
print(x)

x = df["Calories"].max()
print(x)

x = df["Calories"].min()
print(x)

x = df["Calories"].std()
print(x)
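Rather than calling each statistic separately, `describe()` reports the count, mean, standard deviation, minimum, quartiles and maximum in one go. A sketch on a small made-up column:

```python
import pandas as pd

# Illustrative values only
df_demo = pd.DataFrame({"Calories": [409.1, 479.0, 340.0, 282.4]})

summary = df_demo["Calories"].describe()
print(summary)
print(summary["mean"])   # same as df_demo["Calories"].mean()
```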

We can also fix individual values where required, using `.loc[row, column]` to target a single cell.

In [None]:
print(df.head(10))
df.loc[7, 'Duration'] = 60
print(df.head(10))
# Changing it back
df.loc[7, 'Duration'] = 45

You can also iterate through an entire column and act on each value. Let's go through the `Duration` column and drop every row with a duration of less than 60 minutes.

In [None]:
df.head(10) 

In [None]:
print(df.head(10))
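print(df.size)  # size counts cells (rows * columns), not rows

# Drop every row whose Duration is under 60 minutes, modifying df in place
for x in df.index:
    if df.loc[x, "Duration"]<60:
        df.drop(x, inplace = True)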

print(df.head(10))
print(df.size)
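Looping over rows works, but pandas is usually faster and clearer with a boolean mask: comparing a column with a value gives a True/False Series, and indexing the DataFrame with it keeps only the True rows. A sketch of the same filter on made-up data:

```python
import pandas as pd

# Illustrative data
df_demo = pd.DataFrame({
    "Duration": [60, 45, 60, 30, 90],
    "Calories": [409.1, 282.4, 479.0, 195.1, 600.0],
})

# One True/False value per row
mask = df_demo["Duration"] >= 60
print(mask.tolist())   # [True, False, True, False, True]

# Keep only rows where the mask is True (df_demo itself is unchanged)
long_sessions = df_demo[mask]
print(long_sessions)
```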

Finding and removing duplicates (the file is re-read first, because the previous cell dropped rows in place):

In [None]:
df = pd.read_csv('data.csv')

print(df.duplicated())

df.drop_duplicates(inplace = True) 

df
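`duplicated()` marks a row as True when an identical row has already appeared earlier; `drop_duplicates()` then keeps only the first occurrence. On a small made-up frame:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Duration": [60, 45, 60],
    "Pulse": [110, 117, 110],
})

print(df_demo.duplicated().tolist())   # [False, False, True]

deduped = df_demo.drop_duplicates()
print(deduped.index.tolist())          # [0, 1]
```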

Pandas can also do some basic statistical analysis natively. For example, `corr()` calculates the correlation between every pair of columns.

In [None]:
df.corr() 
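By default `corr()` computes the Pearson correlation: values near 1 mean two columns rise together, values near -1 mean one falls as the other rises, and values near 0 mean there is no linear relationship. A quick sanity check on made-up columns with exact linear relationships:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Duration": [30, 45, 60, 90],
    "Calories": [150, 225, 300, 450],  # exactly 5 * Duration
    "Rest": [90, 75, 60, 30],          # exactly 120 - Duration
})

corr = df_demo.corr()
print(corr.round(3))
```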

### SciPy

**What is SciPy?**

SciPy is a scientific computation library that uses NumPy underneath.

SciPy stands for Scientific Python.

It provides additional utility functions for optimization, statistics and signal processing.

Like NumPy, SciPy is open source so we can use it freely.

SciPy was co-founded by Travis Oliphant, who also created NumPy.

**Why Use SciPy?**

If SciPy uses NumPy underneath, why can we not just use NumPy?

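SciPy adds optimized routines that are frequently needed in scientific computing and data science but are not part of NumPy, such as optimizers, statistical functions and signal-processing tools.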

**Which Language is SciPy Written in?**

SciPy is predominantly written in Python, but its performance-critical parts are written in lower-level languages such as C, C++ and Fortran.

In [None]:
import scipy 
from scipy import constants


print(constants.pi)
print(constants.golden)


NumPy is capable of finding roots for polynomials and linear equations, but it cannot find roots for non-linear equations such as this one:

`x + cos(x) = 0`

For this we will use SciPy's `optimize.root` function.

In [None]:
from scipy.optimize import root
import numpy as np

def eqn(x):
  # x arrives as a NumPy array, so use np.cos rather than math.cos
  return x + np.cos(x)

myroot = root(eqn, 0)

print(myroot.x)
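To make the contrast concrete: for a polynomial, NumPy's `np.roots` takes the coefficients (highest power first) and returns all roots, whereas for `x + cos(x) = 0` we rely on SciPy's `root`. The result object's `success` flag and the residual at the solution are simple ways to check the answer.

```python
import numpy as np
from scipy.optimize import root

# Polynomial case: x**2 - 3x + 2 = 0 has roots 1 and 2
poly_roots = np.sort(np.roots([1, -3, 2]))
print(poly_roots)          # [1. 2.]

# Non-linear case: solve x + cos(x) = 0 starting from x = 0
sol = root(lambda x: x + np.cos(x), 0)
print(sol.success)         # True if the solver reports convergence
print(sol.x)               # roughly [-0.739]

# The residual at the solution should be essentially zero
print(sol.x[0] + np.cos(sol.x[0]))
```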

In [None]:
from scipy.optimize import minimize

def eqn(x):
  return x**2 + x + 2

mymin = minimize(eqn, 0, method='BFGS')

print(mymin)
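`minimize` returns an `OptimizeResult`; its `x` attribute holds the location of the minimum and `fun` the function value there. For f(x) = x**2 + x + 2 the vertex is at x = -1/2 with f = 1.75, so the sketch below checks the numerical answer against that.

```python
from scipy.optimize import minimize

def eqn(x):
    # x arrives as a 1-element array; index it so the objective is a scalar
    return x[0]**2 + x[0] + 2

mymin = minimize(eqn, 0, method='BFGS')
print(mymin.x)     # close to [-0.5]
print(mymin.fun)   # close to 1.75
```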

**Working With MATLAB Arrays**

NumPy provides methods (such as `np.save` and `np.load`) for persisting data in formats that Python can read. SciPy additionally provides interoperability with MATLAB.

SciPy provides the module `scipy.io`, which has functions for reading and writing MATLAB arrays.

In [None]:
from scipy import io
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Export:
io.savemat('arr.mat', {"vec": arr})

# Import:
mydata = io.loadmat('arr.mat')

print(mydata) 

print(mydata['vec']) 
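MATLAB has no one-dimensional arrays, so a saved vector comes back from `loadmat` as a 1 x 10 matrix. Passing `squeeze_me=True` removes the singleton dimension again. This sketch writes into a fresh temporary directory rather than the working directory:

```python
import os
import tempfile

import numpy as np
from scipy import io

arr = np.arange(10)

# Write into a new temporary directory instead of the working directory
path = os.path.join(tempfile.mkdtemp(), 'arr.mat')
io.savemat(path, {"vec": arr})

# The vector is stored as a row matrix, so it loads back as 2-D
as_matrix = io.loadmat(path)['vec']
print(as_matrix.shape)   # (1, 10)

# squeeze_me=True drops the singleton dimension
as_vector = io.loadmat(path, squeeze_me=True)['vec']
print(as_vector.shape)   # (10,)
```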

### Matplotlib

**What is Matplotlib?**

Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.

Matplotlib is mostly written in Python; a few segments are written in C, C++, Objective-C and JavaScript for platform compatibility.

A common alternative is **Seaborn**, which is built on top of Matplotlib.

Below are some basic plots with examples of their functionality.

In [None]:
import matplotlib.pyplot as plt

xpoints = np.array([0, 6])
ypoints = np.array([0, 250])

plt.plot(xpoints, ypoints)
plt.show()

In [None]:
xpoints = np.array([1, 8])
ypoints = np.array([3, 10])

plt.plot(xpoints, ypoints, 'o')
plt.show()

In [None]:
xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])

plt.plot(xpoints, ypoints)
plt.show()

In [None]:
plt.plot(xpoints, ypoints, 'o:r', ms = 20)
plt.show()
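The format string `'o:r'` is shorthand for marker, line style and color: circles (`o`), a dotted line (`:`) and red (`r`). The sketch below spells out the same plot with keyword arguments, which can be easier to read; `ms` is short for `markersize`.

```python
import numpy as np
import matplotlib.pyplot as plt

xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])

# Equivalent to plt.plot(xpoints, ypoints, 'o:r', ms=20), written out explicitly
lines = plt.plot(xpoints, ypoints, marker='o', linestyle=':', color='r', markersize=20)
plt.show()

# The returned Line2D object records the resolved properties
line = lines[0]
print(line.get_marker(), line.get_linestyle(), line.get_color(), line.get_markersize())
```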

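**Subplots**

`plt.subplot(nrows, ncols, index)` splits the figure into a grid of `nrows` by `ncols` plots and draws on the `index`-th one, counting from 1 along the rows.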

In [None]:
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(1, 2, 1)
plt.plot(x,y)

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(1, 2, 2)
plt.plot(x,y)

plt.show()

In [None]:
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(2, 1, 1)
plt.plot(x,y)

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(2, 1, 2)
plt.plot(x,y)

plt.show()
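Later cells use the object-oriented form `fig, ax = plt.subplots()`. With arguments, `plt.subplots(nrows, ncols)` creates the whole grid at once and returns an array of Axes objects, so the side-by-side layout above can also be sketched as follows (the titles and figure size are illustrative additions):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([0, 1, 2, 3])

# One row, two columns: axs is a 1-D array holding two Axes objects
fig, axs = plt.subplots(1, 2, figsize=(8, 3))

axs[0].plot(x, np.array([3, 8, 1, 10]))
axs[0].set_title("plot 1")

axs[1].plot(x, np.array([10, 20, 30, 40]))
axs[1].set_title("plot 2")

fig.tight_layout()   # stop titles and tick labels from overlapping
plt.show()
```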

In [None]:
#day one, the age and speed of 13 cars:
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)

#day two, the age and speed of 15 cars:
x = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])
plt.scatter(x, y)

plt.show() 

In [None]:
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100])

plt.scatter(x, y, c=colors, cmap='viridis')
plt.colorbar()
plt.show() 

In [None]:
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
sizes = np.array([20,50,100,200,500,1000,60,90,10,300,600,800,75])

plt.scatter(x, y, s=sizes)

plt.show() 

In [None]:
x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])

plt.bar(x,y)
plt.show()

In [None]:
x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])

plt.barh(x, y)
plt.show()

In [None]:
# 250 samples from a normal distribution with mean 170 and standard deviation 10
x = np.random.normal(170, 10, 250)

print(x) 

In [None]:
# Draw 250 values (mean 170, standard deviation 10) and plot their histogram
x = np.random.normal(170, 10, 250)

plt.hist(x)
plt.show() 

In [None]:
y = np.array([35, 25, 25, 15])

plt.pie(y)
plt.show() 

In [None]:

y = np.array([35, 25, 25, 15])
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]

plt.pie(y, labels = mylabels)
plt.legend(title = "Four Fruits:")
plt.show() 

In [None]:
plt.style.use('_mpl-gallery')

# make data
np.random.seed(1)
x = np.linspace(0, 8, 16)
y1 = 3 + 4*x/8 + np.random.uniform(0.0, 0.5, len(x))
y2 = 1 + 2*x/8 + np.random.uniform(0.0, 0.5, len(x))

# plot
fig, ax = plt.subplots()

ax.fill_between(x, y1, y2, alpha=.5, linewidth=0)
ax.plot(x, (y1 + y2)/2, linewidth=2)

ax.set(xlim=(0, 8), xticks=np.arange(1, 8),
       ylim=(0, 8), yticks=np.arange(1, 8))

plt.show()

In [None]:
plt.style.use('_mpl-gallery-nogrid')

# make data
X, Y = np.meshgrid(np.linspace(-3, 3, 16), np.linspace(-3, 3, 16))
Z = (1 - X/2 + X**5 + Y**3) * np.exp(-X**2 - Y**2)

# plot
fig, ax = plt.subplots()

ax.imshow(Z)

plt.show()

In [None]:
plt.style.use('_mpl-gallery-nogrid')

# make data
X, Y = np.meshgrid(np.linspace(-3, 3, 256), np.linspace(-3, 3, 256))
Z = (1 - X/2 + X**5 + Y**3) * np.exp(-X**2 - Y**2)
levels = np.linspace(np.min(Z), np.max(Z), 7)

# plot
fig, ax = plt.subplots()

ax.contour(X, Y, Z, levels=levels)

plt.show()

In [None]:
# make data
x = np.linspace(-4, 4, 6)
y = np.linspace(-4, 4, 6)
X, Y = np.meshgrid(x, y)
U = X + Y
V = Y - X

# plot
fig, ax = plt.subplots()

ax.quiver(X, Y, U, V, color="C0", angles='xy',
          scale_units='xy', scale=5, width=.015)

ax.set(xlim=(-5, 5), ylim=(-5, 5))

plt.show()

In [None]:
plt.style.use('_mpl-gallery')

# make data:
np.random.seed(10)
D = np.random.normal((3, 5, 4), (1.25, 1.00, 1.25), (100, 3))

# plot
fig, ax = plt.subplots()
VP = ax.boxplot(D, positions=[2, 4, 6], widths=1.5, patch_artist=True,
                showmeans=False, showfliers=False,
                medianprops={"color": "white", "linewidth": 0.5},
                boxprops={"facecolor": "C0", "edgecolor": "white",
                          "linewidth": 0.5},
                whiskerprops={"color": "C0", "linewidth": 1.5},
                capprops={"color": "C0", "linewidth": 1.5})

ax.set(xlim=(0, 8), xticks=np.arange(1, 8),
       ylim=(0, 8), yticks=np.arange(1, 8))

plt.show()

In [None]:
plt.style.use('_mpl-gallery')

# make data:
np.random.seed(10)
D = np.random.normal((3, 5, 4), (0.75, 1.00, 0.75), (200, 3))

# plot:
fig, ax = plt.subplots()

vp = ax.violinplot(D, [2, 4, 6], widths=2,
                   showmeans=False, showmedians=False, showextrema=False)
# styling:
for body in vp['bodies']:
    body.set_alpha(0.9)
ax.set(xlim=(0, 8), xticks=np.arange(1, 8),
       ylim=(0, 8), yticks=np.arange(1, 8))

plt.show()

In [None]:
plt.style.use('_mpl-gallery-nogrid')

# make data: correlated + noise
np.random.seed(1)
x = np.random.randn(5000)
y = 1.2 * x + np.random.randn(5000) / 3

# plot:
fig, ax = plt.subplots()

ax.hist2d(x, y, bins=(np.arange(-3, 3, 0.1), np.arange(-3, 3, 0.1)))

ax.set(xlim=(-2, 2), ylim=(-3, 3))

plt.show()

In [None]:
# Make data
n = 100
# Seed the Generator directly: np.random.seed() has no effect on default_rng()
rng = np.random.default_rng(19680801)
xs = rng.uniform(23, 32, n)
ys = rng.uniform(0, 100, n)
zs = rng.uniform(-50, -25, n)

# Plot
fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
ax.scatter(xs, ys, zs)

ax.set(xticklabels=[],
       yticklabels=[],
       zticklabels=[])

plt.show()

In [None]:
from matplotlib import cm

plt.style.use('_mpl-gallery')

# Make data
X = np.arange(-5, 5, 0.25)
Y = np.arange(-5, 5, 0.25)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)

# Plot the surface
fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
ax.plot_surface(X, Y, Z, vmin=Z.min() * 2, cmap=cm.Blues)

ax.set(xticklabels=[],
       yticklabels=[],
       zticklabels=[])

plt.show()

In [None]:
from mpl_toolkits.mplot3d import axes3d

plt.style.use('_mpl-gallery')

# Make data
X, Y, Z = axes3d.get_test_data(0.05)

# Plot
fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
ax.plot_wireframe(X, Y, Z, rstride=10, cstride=10)

ax.set(xticklabels=[],
       yticklabels=[],
       zticklabels=[])

plt.show()

In [None]:
from matplotlib.patches import Ellipse
import matplotlib.transforms as transforms


def confidence_ellipse(x, y, ax, n_std=3.0, facecolor='none', **kwargs):
    """
    Create a plot of the covariance confidence ellipse of *x* and *y*.

    Parameters
    ----------
    x, y : array-like, shape (n, )
        Input data.

    ax : matplotlib.axes.Axes
        The axes object to draw the ellipse into.

    n_std : float
        The number of standard deviations to determine the ellipse's radii.

    **kwargs
        Forwarded to `~matplotlib.patches.Ellipse`

    Returns
    -------
    matplotlib.patches.Ellipse
    """
    if x.size != y.size:
        raise ValueError("x and y must be the same size")

    cov = np.cov(x, y)
    pearson = cov[0, 1]/np.sqrt(cov[0, 0] * cov[1, 1])
    # Using a special case to obtain the eigenvalues of this
    # two-dimensional dataset.
    ell_radius_x = np.sqrt(1 + pearson)
    ell_radius_y = np.sqrt(1 - pearson)
    ellipse = Ellipse((0, 0), width=ell_radius_x * 2, height=ell_radius_y * 2,
                      facecolor=facecolor, **kwargs)

    # Calculating the standard deviation of x from
    # the square root of the variance and multiplying
    # with the given number of standard deviations.
    scale_x = np.sqrt(cov[0, 0]) * n_std
    mean_x = np.mean(x)

    # calculating the standard deviation of y ...
    scale_y = np.sqrt(cov[1, 1]) * n_std
    mean_y = np.mean(y)

    transf = transforms.Affine2D() \
        .rotate_deg(45) \
        .scale(scale_x, scale_y) \
        .translate(mean_x, mean_y)

    ellipse.set_transform(transf + ax.transData)
    return ax.add_patch(ellipse)

def get_correlated_dataset(n, dependency, mu, scale):
    latent = np.random.randn(n, 2)
    dependent = latent.dot(dependency)
    scaled = dependent * scale
    scaled_with_offset = scaled + mu
    # return x and y of the new, correlated dataset
    return scaled_with_offset[:, 0], scaled_with_offset[:, 1]

np.random.seed(0)

PARAMETERS = {
    'Positive correlation': [[0.85, 0.35],
                             [0.15, -0.65]],
    'Negative correlation': [[0.9, -0.4],
                             [0.1, -0.6]],
    'Weak correlation': [[1, 0],
                         [0, 1]],
}

mu = 2, 4
scale = 3, 5

fig, axs = plt.subplots(1, 3, figsize=(9, 3))
for ax, (title, dependency) in zip(axs, PARAMETERS.items()):
    x, y = get_correlated_dataset(800, dependency, mu, scale)
    ax.scatter(x, y, s=0.5)

    ax.axvline(c='grey', lw=1)
    ax.axhline(c='grey', lw=1)

    confidence_ellipse(x, y, ax, edgecolor='red')

    ax.scatter(mu[0], mu[1], c='red', s=3)
    ax.set_title(title)

plt.show()

In [None]:
fig, ax_nstd = plt.subplots(figsize=(6, 6))

dependency_nstd = [[0.8, 0.75],
                   [-0.2, 0.35]]
mu = 0, 0
scale = 8, 5

ax_nstd.axvline(c='grey', lw=1)
ax_nstd.axhline(c='grey', lw=1)

x, y = get_correlated_dataset(500, dependency_nstd, mu, scale)
ax_nstd.scatter(x, y, s=0.5)

confidence_ellipse(x, y, ax_nstd, n_std=1,
                   label=r'$1\sigma$', edgecolor='firebrick')
confidence_ellipse(x, y, ax_nstd, n_std=2,
                   label=r'$2\sigma$', edgecolor='fuchsia', linestyle='--')
confidence_ellipse(x, y, ax_nstd, n_std=3,
                   label=r'$3\sigma$', edgecolor='blue', linestyle=':')

ax_nstd.scatter(mu[0], mu[1], c='red', s=3)
ax_nstd.set_title('Different standard deviations')
ax_nstd.legend()
plt.show()