# DSC4828 Tutorial 3

## Python concepts covered in this tutorial:
* Tuples in Python
* Dictionaries in Python
* Numpy

## Tuples in Python
A tuple stores multiple items in a single variable. Tuples are written in Python using round brackets, as the code below illustrates.

In [1]:
car1 = ('Ford', 'Fiesta', 'FX145RC GP', 2016)
print(type(car1))
print(len(car1))
car2 = ('Volkswagen', 'Tiguan', 'NDT39HY GP', 2018)
print(car2[0])

for i in car1:
    print(i)

<class 'tuple'>
4
Volkswagen
Ford
Fiesta
FX145RC GP
2016
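
Because a tuple has a fixed number of items, its values can also be unpacked into separate variables in a single statement. The sketch below reuses the `car1` tuple; the variable names `make`, `model`, `plate` and `year` are chosen here purely for illustration.

```python
car1 = ('Ford', 'Fiesta', 'FX145RC GP', 2016)

# Unpack the four items into four variables (the counts must match)
make, model, plate, year = car1
print(make, model, year)

# Negative indices count from the end of the tuple
print(car1[-1])

# Slicing a tuple returns a new, shorter tuple
print(car1[:2])
```

Unpacking is often easier to read than indexing with `car1[0]`, `car1[1]` and so on, since each value gets a meaningful name.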


Tuples in Python are **immutable**. This means that once they are created, the values cannot change. Run the code below to see what happens.

In [2]:
car2[3] = 2020

TypeError: 'tuple' object does not support item assignment
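
Since a tuple cannot be changed in place, one common workaround is to build a new tuple from the old one. The following is a minimal sketch of that idea; the name `car2_updated` is introduced here for illustration only.

```python
car2 = ('Volkswagen', 'Tiguan', 'NDT39HY GP', 2018)

try:
    car2[3] = 2020              # not allowed: tuples are immutable
except TypeError as err:
    print("Error:", err)

# Build a new tuple from the first three items plus the new year.
# Note the trailing comma: (2020,) is a tuple with one item.
car2_updated = car2[:3] + (2020,)
print(car2_updated)
print(car2)                     # the original tuple is unchanged
```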

## Dictionaries in Python
In Python, a **dictionary** is a collection that can be used for storing **key-value pairs**. You can think of a dictionary as a collection whose elements are accessed using a key rather than a position: the key plays the same role that the index plays when accessing an element in a list. (Since Python 3.7 a dictionary does remember the order in which its elements were inserted, but elements are still looked up by key.) Every element of a dictionary has a key (used for indexing) and a value (the actual data stored in the dictionary).
One of the advantages of using dictionaries is that looking up a value by its key is very fast.
Dictionaries are written using curly braces, with a colon between each key and its value. The following code will illustrate:

In [None]:
# In the example below, a dictionary is created for storing names using a cell phone number as the key
names = {'0836784331': 'Gert Mkhize', '0845671238': 'Ayanda Botha', '0836164269': 'Anna Marota', '0612356649': 'Pete Mills'}
print(type(names))
print(len(names))
print(names['0836164269'])
cell = '0612356649'
if cell in names:
    print(cell, "is in the dictionary and has the name", names[cell])
else:
    print(cell, "is not in the dictionary")
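
Accessing a key that is not in the dictionary with square brackets raises a `KeyError`. As an alternative to testing with `in` first, the dictionary method `get()` returns a default value when the key is missing. A minimal sketch, using a shortened copy of the `names` dictionary; the number `'0000000000'` is made up for illustration.

```python
names = {'0836784331': 'Gert Mkhize', '0845671238': 'Ayanda Botha'}

# get() returns None (or a default that you supply) instead of raising KeyError
print(names.get('0836784331'))
print(names.get('0000000000'))
print(names.get('0000000000', 'unknown'))

# items() yields each key together with its value
for cell, name in names.items():
    print(cell, '->', name)
```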

Adding a new element to a dictionary is simple:

In [None]:
names['0734562741'] = 'Wonder Bingwa'
print(len(names))

Elements of a dictionary are accessed using the key and not the position. In the following code, `digits[0]` returns the value stored under the key 0, not the first element that was added:

In [3]:
digits = {9: 'nine', 8:'eight', 1: 'one', 2: 'two', 3:'three', 5:'five', 6:'six', 7:'seven', 4:'four',0:'zero'}
print(digits[0])

zero
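
To make the difference between a key and a position concrete, the sketch below compares a small dictionary with a list holding the same words. The list `words` is introduced here only for the comparison.

```python
digits = {9: 'nine', 8: 'eight', 1: 'one', 0: 'zero'}
words = ['nine', 'eight', 'one', 'zero']

print(digits[0])    # looks up the KEY 0 and gives 'zero'
print(words[0])     # looks up POSITION 0 and gives 'nine'

# Looping over a dictionary visits the keys in the order
# in which they were inserted (guaranteed since Python 3.7)
print(list(digits))
```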


The values stored in a dictionary can be of any type, such as a list of values or a tuple of data as seen below:

In [None]:
names_cars = {'0836784331': ('Gert Mkhize', car1), '0845671238': ('Ayanda Botha', None), 
              '0836164269': ('Anna Marota', car2), '0612356649': ('Pete Mills', None)}
for cell, person in names_cars.items():  # cell refers to the key and person to the value
    if person[1] is None:  # use "is" (not ==) when comparing with None
        print(person[0], 'does not have a car')
    else:
        print(person[0], 'owns car:', person[1])

**Exercise**: Find two ways to delete an element from a dictionary. Add code below that illustrates how they are used. 

## Numpy
Numpy is the primary package for scientific computing in Python. There are three main reasons why numpy is important for data science:
* speed
* functionality
* many data science packages are built on top of numpy.

The main class in numpy is called an **ndarray** (N-dimensional array). An ndarray is a container with elements of the same type and includes methods that can be executed on whole ndarrays without using loops.

An array is like a list, except that all the elements are of the same type. The number of dimensions of an array is sometimes called its **rank** (numpy reports it in the attribute `ndim`), so a vector is a rank 1 array, while a matrix is a rank 2 array. The **shape** of an ndarray is a tuple of integers indicating the size of the array in each dimension.

In [2]:
pip install numpy

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.

Collecting numpy
  Downloading numpy-1.21.6-cp37-cp37m-win_amd64.whl (14.0 MB)
Installing collected packages: numpy
Successfully installed numpy-1.21.6


You should consider upgrading via the 'c:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\python.exe -m pip install --upgrade pip' command.


In [1]:
import numpy as np    # before using numpy, you must import the package
numsList = [10, 20, 30, 40, 50]    # a list of 5 numbers
vector = np.array(numsList)  # a numpy array of 5 numbers
matrix = np.array([[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]])

print(type(numsList), "does not have a shape")
print(type(vector), "with shape =", vector.shape, "; dimension =", vector.ndim, "; size =", vector.size)
print(type(matrix), "with shape:", matrix.shape, "; dimension =", matrix.ndim, "; size =", matrix.size)

print("Here is the whole vector:", vector)
print("Here is the whole matrix:\n", matrix)

print("First element of vector:", vector[0])
print("First element of matrix is a vector:", matrix[0])
print("First element of matrix is accessed using two indices:", matrix[0,0])
print("Here is the first column of the matrix:", matrix[:,0])


<class 'list'> does not have a shape
<class 'numpy.ndarray'> with shape = (5,) ; dimension = 1 ; size = 5
<class 'numpy.ndarray'> with shape: (3, 4) ; dimension = 2 ; size = 12
Here is the whole vector: [10 20 30 40 50]
Here is the whole matrix:
 [[10 11 12 13]
 [20 21 22 23]
 [30 31 32 33]]
First element of vector: 10
First element of matrix is a vector: [10 11 12 13]
First element of matrix is accessed using two indices: 10
Here is the first column of the matrix: [10 20 30]
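
The statement that all elements of an ndarray have the same type can be checked with the `dtype` attribute. The sketch below shows what numpy does when a list mixes integers and floats; the array names are chosen for illustration.

```python
import numpy as np

mixed = np.array([1, 2.5, 3])       # integers and a float in one list
print(mixed, mixed.dtype)           # every element is stored as a float

whole = np.array([1, 2, 3])
as_float = whole.astype(float)      # astype() returns a converted copy
print(as_float, as_float.dtype)
print(whole)                        # the original array is not changed
```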


There are many useful functions for initialising ndarrays:

In [None]:
example1 = np.zeros((5,10))
print(example1)
example2 = np.ones((2,3,4))
print(example2)
example3 = np.random.random((3,5))
print(example3)
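
A few other initialising functions are sketched below as a starting point. The seed value 42 is arbitrary; a seeded generator is used only so that the "random" values are the same on every run.

```python
import numpy as np

steps = np.arange(0, 10, 2)         # start, stop (excluded), step
print(steps)                        # [0 2 4 6 8]

points = np.linspace(0, 1, 5)       # 5 evenly spaced values, both ends included
print(points)

sevens = np.full((2, 3), 7)         # 2 x 3 array filled with the value 7
print(sevens)

rng = np.random.default_rng(42)     # random number generator with a fixed seed
sample = rng.random((3, 5))         # 3 x 5 array of values in [0, 1)
print(sample)
```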

Arithmetic operators can be used on ndarrays:

In [None]:
percentages = example3 * 100
print(percentages)
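
Arithmetic on ndarrays is applied element by element, which is why no loop is needed. The sketch below uses small made-up arrays (`prices` and `quantities`) and contrasts the behaviour with that of an ordinary list.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])

totals = prices * quantities        # multiplies matching elements
print(totals)                       # [10. 40. 90.]
print(prices + 5)                   # [15. 25. 35.]
print(prices > 15)                  # [False  True  True]

# With a plain list, * means repetition, not multiplication
repeated = [10, 20, 30] * 2
print(repeated)                     # [10, 20, 30, 10, 20, 30]
```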

There are also many useful methods defined for ndarrays:

In [None]:
example3.sort()    # sorts the array in place, along the last axis (each row separately)
print(example3)

colAvg = example3.mean(axis=0)
print("Column averages:", colAvg)
rowAvg = example3.mean(axis=1)
print("Row averages:", rowAvg)
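
The `axis` argument is easier to follow on a small array with known values. In the sketch below, the array `marks` is made up for illustration: `axis=0` works down the rows and gives one result per column, while `axis=1` works across the columns and gives one result per row.

```python
import numpy as np

marks = np.array([[1, 2, 3],
                  [4, 5, 6]])       # 2 rows, 3 columns

print(marks.mean())                 # 3.5 (all six values)
print(marks.mean(axis=0))           # [2.5 3.5 4.5] (one value per column)
print(marks.mean(axis=1))           # [2. 5.] (one value per row)
print(marks.sum(axis=0))            # [5 7 9]
print(marks.max(axis=1))            # [3 6]
```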

Numpy includes a function **info()** that can be used to quickly see the documentation on a function, class or module. Run the following code to see how this works:

In [None]:
np.info(np.mean)    # info() prints the documentation itself, so print() is not needed

**Slicing** is a useful way of extracting data from a numpy array. Work through the following online guide to see how this is done: https://www.pythoninformer.com/python-libraries/numpy/index-and-slice/
Add code cells here to illustrate what you have learnt.
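
As a starting point, here is a minimal sketch of slicing on the 3 x 4 `matrix` used earlier. It is not a substitute for the guide; the name `top_left` is introduced here for illustration.

```python
import numpy as np

matrix = np.array([[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]])

print(matrix[1, :])         # second row: [20 21 22 23]
print(matrix[:, -1])        # last column: [13 23 33]
print(matrix[:2, 1:3])      # rows 0 and 1, columns 1 and 2

# A slice is a view onto the same data, not a copy:
# changing the slice also changes the original array
top_left = matrix[:2, :2]
top_left[0, 0] = 99
print(matrix[0, 0])         # 99
```

If an independent copy is needed, call `copy()` on the slice, for example `matrix[:2, :2].copy()`.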

## Exercises
1. Write a function called first_duplicate that takes a list as an argument and returns the first element that is duplicated in the list. If no duplicates are found, the function should return None. **Hint:** One way of doing this is to use the count method to see how many instances of each element there are. If there is more than 1, it means that that element is duplicated.
2. Write code to test the function first_duplicate with a list of numbers and a list of tuples. If your function does not work with both types, then change it until it does work with these two types.
3. Write code to do the following: 
    3.1 Declare a dictionary that includes entries with duplicated keys but with different values. 
    3.2 Print out the length of the dictionary and the dictionary itself. What do you notice?
4. Python has a built-in function called **sorted()** that takes a collection object (such as a list) as an argument and returns a list of the objects in sorted order. Write code to test the sorted() function on a list of numbers, a list of tuples and a single dictionary object. Write down any observations that you make on how the function works with these types.
5. Write a program that asks the user to enter numbers until they enter the character 's' to stop. Store the numbers in a numpy array. Print out the mean and standard deviation of the values entered. **Hints:** (1) Use a while loop to input values until the letter s is entered. (2) Append the values to a list in the while loop and create a numpy array from the list after all the values have been entered. (3) The method **std()** returns the standard deviation of a numpy array.
6. Write code to create a 5 x 6 matrix of random values between 0 and 1, rounded off to one decimal place. Print out the matrix. Use slicing to extract the matrix consisting of the values not on the outside of the matrix (i.e. the 3 x 4 matrix in the centre). Print out the inner matrix.