### CDS502: Big Data Storage and Management

# Lab 01: Introduction to Python

This notebook is adapted for self-study purposes.

Last updated: Sun, 25 October 2020

## Outline

- **[Section 1: Markdown Cells](#section1)**
  - [Bold and italics](#bold_and_italics)
  - [Paragraphs](#paragraphs)
  - [Bulleted Lists](#bulleted_lists)
  - [Numbered Lists](#numbered_lists)
  - [Colored Text](#colored_text)
- **[Section 2: Code Cells](#section2)**
- **[Section 3: Python Basics](#section3)**
  - [For Loops](#for_loops)
    - [**Program: 01**](#program1)
    - [**Program: 02**](#program2)
  - [Print Statement: Tabs and new lines](#tab_newline)
    - [**Program: 03**](#program3)
  - [Import Modules and Packages](#import)
    - [Mathematics](#mathematics)
    - [Dates and Times](#date_time)
  - [Data Structures](#data_structure)
- [**Challenge: The Guessing Game**](#challenge)

<a name="section1"></a>

# Section 1: Markdown Cells

<a name="bold_and_italics"></a>

## Bold and italics

There are two ways to specify bold and italic text:

#### 1. Use asterisks

Syntax:
```
It's **very** easy to do **bold** and *italics*

```
Output:
> It's **very** easy to do **bold** and *italics*

#### 2. Use underscores

Syntax:
```
It's __very__ easy to do __bold__ and _italics_
```
Output:
> It's __very__ easy to do __bold__ and _italics_

<a name="paragraphs"></a>

## Paragraphs

In Markdown cells, a new paragraph starts only after two consecutive line breaks (i.e., a blank line):

Syntax:
```
This is line 1

This is line 2
```
Output:
> This is line 1
>
> This is line 2

The following example shows text that is rendered as a single line in Markdown, even though it is written across multiple lines.

Syntax:
```
You can write your paragraph on one long line,
or you can
wrap the lines yourself
if you prefer.
```
Output:
> You can write your paragraph on one long line,
> or you can
> wrap the lines yourself
> if you prefer.

**Bonus hint**: You can insert the HTML tag `<br />` for a single line break.

Syntax:
```
This is line 1.<br />This is line 2.
```

Output:
> This is line 1.<br />This is line 2.

<a name="bulleted_lists"></a>

## Bulleted Lists

Start each line with a hyphen (`-`) or an asterisk (`*`), followed by a space. List items can be nested by indenting them (notice the extra spaces before the nested items).

Syntax:
```
* Bullet 1
* Bullet 2
  * Bullet 2a
  * Bullet 2b
* Bullet 3
  * Bullet 3a
  * Bullet 3b
    * Bullet 3b(a)
    * Bullet 3b(b)
```

Output:
> * Bullet 1
> * Bullet 2
>   * Bullet 2a
>   * Bullet 2b
> * Bullet 3
>   * Bullet 3a
>   * Bullet 3b
>     * Bullet 3b(a)
>     * Bullet 3b(b)

<a name="numbered_lists"></a>

## Numbered Lists

Start each line with a number and a period (`.`), then a space.

Syntax:
```
1. Baked potato
2. Baked beans
3. Pepper
```

Output:
> 1. Baked potato
> 2. Baked beans
> 3. Pepper

<a name="colored_text"></a>

## Colored Text

By manipulating HTML tags, the text color in the Markdown cells can be changed.

Syntax:
```
This <span style="color:red">word</span> is not black. This is <span style="color:red">red</span>.

This <span style="color:green">word</span> is not black. This is <span style="color:green">green</span>.
```

Output:
> This <span style="color:red">word</span> is not black. This is <span style="color:red">red</span>.
> 
> This <span style="color:green">word</span> is not black. This is <span style="color:green">green</span>.

**Optional**: Check this [GitHub link](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) for a Markdown cheatsheet.

*Note: The Markdown cheatsheet is not comprehensive, but it covers enough to write basic Markdown.*

<a name="section2"></a>

# Section 2: Code Cells

Try typing `1 + 2` into the code cell, then press **Shift + Enter** on the keyboard. When you press **Shift + Enter**, the code in the cell is evaluated, the cursor moves to a new cell, and you get the following:

In [36]:
1 + 2
# 2 + 3

3

One very useful feature of the notebook is that you can go back to a cell, change it, and reevaluate it, thus updating its output. Try this by going back to the previous cell, changing `1 + 2` to `2 + 3`, and reevaluating the cell by pressing **Shift + Enter**. You will notice that the result is updated to `5` as soon as you evaluate the cell. This can be very powerful when you want to explore data or test an equation with different parameters without having to reevaluate your whole script. You can, however, reevaluate the whole notebook at once by going to **Cell $\rightarrow$ Run All**.

More examples are demonstrated below:

In [37]:
# round(): Python built-in function to round a number to the nearest integer.
round(2.3222)

2

In [38]:
# %: Modulus operator
# Calculate the remainder when a is divided by b (a % b)
5 % 2

1

In [39]:
# Combining round() and % operator
round((10 / 3) % 2 + 10)

11
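
Two details of `round()` and `%` are easy to miss. The short sketch below (the values are illustrative and not part of the original lab) suggests that `round()` accepts an optional second argument for the number of decimal places, that exact halves are rounded to the nearest even integer, and that the built-in `divmod()` returns the quotient and the remainder together.

```python
# round() with a second argument keeps that many decimal places.
two_places = round(2.3222, 2)        # 2.32

# Exact halves are rounded to the nearest even integer.
halves = [round(0.5), round(1.5), round(2.5), round(3.5)]   # [0, 2, 2, 4]

# divmod(a, b) returns (a // b, a % b) in a single call.
quotient, remainder = divmod(17, 5)  # (3, 2)

print(two_places, halves, quotient, remainder)
```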

<a name='section3'></a>

# Section 3: Python Basics

<a name='for_loops'></a>

## For Loops

The basics of a `for` loop: In Python (and other programming languages), a `for` loop enables us to easily evaluate the same block of code as many times as we like. The simplest `for` loop in Python would be defined as follows:

In [40]:
# Run for loop for 5 times.
for i in range(5):
    print(i)

0
1
2
3
4


Note that `i` starts from `0` and is incremented by `1` on each iteration. We can think of `range(5)` as the sequence of values `0, 1, 2, 3, 4` (its data type is **range**: verify it by calling `type(range(5))`). During the first iteration, `i` is assigned the first element, `0`; during the second iteration, it is assigned the second element, `1`; and so on until the last element is reached.
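
To make the behavior of `range` concrete, here is a minimal sketch (the variable name `r` and the extra `range` arguments are chosen for illustration). It checks the type, converts the range to a list to reveal its values, and shows the optional start and step arguments.

```python
r = range(5)

print(type(r))                  # <class 'range'>
print(list(r))                  # [0, 1, 2, 3, 4]

# range(start, stop, step): the stop value is always excluded.
print(list(range(2, 10, 3)))    # [2, 5, 8]
print(list(range(4, -1, -1)))   # [4, 3, 2, 1, 0]
```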

Variable assignment in Python is easy. As in most programming languages, we write the variable name on the left and the value on the right. The equal sign `=` is called the **assignment operator**. The line `a = 3` simply means "assign the value `3` to the variable `a`". Furthermore, we can perform operations on variables (**<span style="color:red">only variables that we have defined earlier</span>**) and assign the result to another variable, e.g., `c = a + b`.

In [41]:
a = 3
b = 4
e = 5
c = a + b
c

7

In the next cell, we redefine the variable `b` by assigning a new value to it. The variable `e` was defined in the previous cell, so that cell must be run first. We can then perform operations on the two variables `b` (with its new value) and `e`, as demonstrated below.

In [42]:
b = 6
d = b + e
d

11

In a `for` loop, using the loop variable `i` as we did above is optional and depends on the task. If the value of `i` is not needed in the code block that we want to run `n` times, we can replace `i` with an underscore: `_`. The syntax then becomes `for _ in range(n)`.

In [43]:
# Print "Hello" 10 times.
for _ in range(10):
    print('Hello')

Hello
Hello
Hello
Hello
Hello
Hello
Hello
Hello
Hello
Hello


The program below asks the user for a number and prints its square, then asks for another number and prints its square, etc. It does this three times and then prints that the loop is done.

In [44]:
for i in range(3):
    num = int(input('Enter a number: '))
    print('The square of your number is', num * num) # num squared
    
print('The loop is now done.')

Enter a number: 5
The square of your number is 25
Enter a number: 4
The square of your number is 16
Enter a number: 3
The square of your number is 9
The loop is now done.


<a name='program1'></a>

### Program: 01

**The program below will print A, then B, then it will alternate C’s and D’s five times and then finish with the letter E once.**

In [45]:
# print('A')
# print('B')
# for i in range(5):
#     print('C')
#     print('D')
# print('E')   # not indented, so it runs once after the loop

# Alternative: use `\n` for "new line"
print('A\nB')
for _ in range(5): print('C\nD')
print('E')

A
B
C
D
C
D
C
D
C
D
C
D
E


### Try this: 01

**<span style="color:blue">We wanted the above program to print five C’s followed by five D’s, instead of alternating C’s and D’s.</span>**

Expected output:

```
A 
B 
C 
C 
C 
C 
C 
D 
D 
D 
D 
D
```

#### Solution

In [46]:
# Solution
# print('A')
# print('B')
# for i in range(5):
#     print('C')
# for i in range(5):
#     print('D')
    
# Alternative: Lambda function
λ = lambda char, num = 1: [print(char) for _ in range(num)]
_ = λ('A') + λ('B') + λ('C', 5) + λ('D', 5)

A
B
C
C
C
C
C
D
D
D
D
D


**Optional**: Check the [documentation](https://docs.python.org/3/tutorial/controlflow.html?highlight=lambda#lambda-expressions) for more about **lambda** expressions. Don't worry if you don't understand them for now.
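
As a rough illustration of what a lambda expression is, the sketch below defines the same small function twice: once with `def` and once as a lambda. The names `square_def`, `square_lambda`, and `words` are hypothetical and used only for this example.

```python
# A regular function definition...
def square_def(x):
    return x * x

# ...and the equivalent lambda expression bound to a name.
square_lambda = lambda x: x * x

print(square_def(4), square_lambda(4))        # 16 16

# Lambdas are often passed directly as arguments, e.g. as a sort key.
words = ['pear', 'fig', 'banana']
print(sorted(words, key=lambda w: len(w)))    # ['fig', 'pear', 'banana']
```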

**Bonus trick**:
You can quickly type any Greek symbol in a code cell with this simple trick. Say you want the symbol lambda (`λ`):
1. Type the escape character: `\`
2. Type the first few letters of "lambda": `\lam`
3. Press the **Tab** key on the keyboard; `\lam` should be autocompleted to `\lambda`
4. Press the **Tab** key again, and `\lambda` will become `λ`.

Have fun!

<a name='program2'></a>

### Program: 02

The program below prints a rectangle of stars that is 4 rows tall and 6 columns wide.

In [47]:
for i in range(4):
    print('*' * 6)

******
******
******
******


If we want to make a triangle instead:

In [48]:
for i in range(4):
    print('*' * (i + 1))

*
**
***
****


### Try This: 02

**<span style="color:blue">Use for loops to print a diamond like the one below. Allow the user to specify how high the diamond should be.</span>**

Expected output:

```
Please input side length of diamond: 5
    *
   ***
  *****
 *******
*********
 *******
  *****
   ***
    *
```

#### Solution 1

In [49]:
# Solution 1

# Prompt user input
side = int(input("Please input side length of diamond: "))


# **************************************************

# - Method 1: More comprehensive look
char = "*"

upper_diamond = list(range(side))                # [0, 1, 2, 3, 4]
lower_diamond = list(reversed(range(side - 1)))  # [3, 2, 1, 0]
diamond = upper_diamond + lower_diamond          # [0, 1, 2, 3, 4, 3, 2, 1, 0]

for x in diamond:
    left_padding = ' ' * (side - (x + 1))        # [4, 3, 2, 1, 0, 1, 2, 3, 4]
    body = char * ((x * 2) + 1)                  # [1, 3, 5, 7, 9, 7, 5, 3, 1]
    print('{}{}'.format(left_padding, body))
#     print(f'{left_padding}{body}') # <-- You can run this (f string) on Python v3.6+


# *************************************************

# # - Method 2: Simplified look
# for x in list(range(side)) + list(reversed(range(side - 1))):
#     print('{: <{w1}}{:*<{w2}}'.format('', '', w1 = side - x - 1, w2 = x * 2 + 1))


Please input side length of diamond: 6
     *
    ***
   *****
  *******
 *********
***********
 *********
  *******
   *****
    ***
     *


#### Solution 2

In [50]:
# Solution 2

# Prompt user input
n = int(input("Please input side length of diamond: "))


# **************************************************

# - Method 1: lambda function + list comprehension
λ = lambda r: [print((n - (x + 1)) * ' ' + (2 * x + 1) * '*') for x in r]
_ = λ(range(n - 1)) + λ(range(n - 1, -1, -1))


# **************************************************

# # - Method 2:
# # Upper diamond
# for idx in range(n - 1):
#     print((n - (idx + 1)) * ' ' + (2 * idx + 1) * '*')
# #     print((n - idx) * ' ' + (2 * idx + 1) * '*')
    
# # Lower diamond
# for idx in range(n - 1, -1, -1): # range(start, stop, step)
#     print((n - (idx + 1)) * ' ' + (2 * idx + 1) * '*')
# #     print((n - idx) * ' ' + (2 * idx + 1) * '*')

Please input side length of diamond: 5
    *
   ***
  *****
 *******
*********
 *******
  *****
   ***
    *


<a name='tab_newline'></a>

## Tab & New Line

`\t` -- Tab

`\n` -- New line

`\'` -- Single quote (`'`) inside a single-quoted string

<a name='program3'></a>

### Program: 03

In [51]:
print('This is a new paragraph. \n\t Hello world! \nThis is a new line.')

This is a new paragraph. 
	 Hello world! 
This is a new line.
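
Program 03 does not use the `\'` escape, so here is a small sketch of it (the strings and variable names are made up for illustration). It also shows that switching the outer quotes avoids the escape, and that `\t` and `\n` each count as a single character.

```python
# \' lets a single quote appear inside a single-quoted string.
single = 'It\'s done.'

# Alternatively, use double quotes outside and skip the escape.
double = "It's done."

print(single == double)    # True

# \t and \n are one character each: a, tab, b, newline, c.
print(len('a\tb\nc'))      # 5
```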


### Try This: 03

**<span style="color:blue">Write a Python program to print the following string in a specific format.</span>**

Expected output:

```
A: "What are you doing now?"
    B: "I'm watching TV, what about you?"
A: "I'm doing my homework, but I really need to take a break."
    B: "You want to do something?"
A: "Yes. But I shouldn't. I got to finish my assignment now."
    B: "Alright. Call me later then."
A: "OK. Bye."
```

#### Solution

In [52]:
# Solution
# Adjacent string literals are joined automatically, so no stray spaces are added.
print("A: \"What are you doing now?\"\n"
      "\tB: \"I'm watching TV, what about you?\"\n"
      "A: \"I'm doing my homework, but I really need to take a break.\"\n"
      "\tB: \"You want to do something?\"\n"
      "A: \"Yes. But I shouldn't. I got to finish my assignment now.\"\n"
      "\tB: \"Alright. Call me later then.\"\n"
      "A: \"OK. Bye.\"")

A: "What are you doing now?"
	B: "I'm watching TV, what about you?"
A: "I'm doing my homework, but I really need to take a break."
	B: "You want to do something?"
A: "Yes. But I shouldn't. I got to finish my assignment now."
	B: "Alright. Call me later then."
A: "OK. Bye."


<a name=import></a>

## Import Modules and Packages

The following are some commonly used Python modules/packages that provide convenient functions for certain tasks. Before calling these functions, we need to tell Python which modules/packages we will be using. The syntax is: `import {package/module_name}`.

```
import datetime
import math
import numpy
import pandas
import platform
```
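
Besides the plain `import` form, Python offers two common variations. The sketch below uses only standard-library modules; the alias `stats` is an arbitrary name chosen for this example.

```python
import math                   # access names as math.<name>
from math import pi           # import one name and use it directly
import statistics as stats    # import a module under a shorter alias

print(math.sqrt(16))                  # 4.0
print(round(pi, 2))                   # 3.14
print(stats.mean([1.0, 2.0, 3.0]))    # 2.0
```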

In [53]:
# To check Version of Python that is running
import platform
print(platform.python_version())

3.8.3


Python also has some built-in functions which do not require the `import` call. One of them is the `type()` function.

In [54]:
# To check variable type
s = 0.314
type(s)

float
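
As a quick illustration, `type()` can be applied to any value. The list `values` below is a hypothetical sample put together for this sketch.

```python
values = [42, 0.314, 'hello', True, [1, 2], None]

# type(v).__name__ gives the type's name as a plain string.
type_names = [type(v).__name__ for v in values]

print(type_names)   # ['int', 'float', 'str', 'bool', 'list', 'NoneType']
```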

<a name=mathematics></a>

### Mathematics

The following are some common Python modules/packages that provide mathematical functions:
- `math`: gives access to the underlying C library functions for floating point math.
- `random`: provides tools for making random selections.
- `statistics`: calculates basic statistical properties (the mean, median, variance, etc.) of numeric data.

#### `math` module 

The lines of code in the cell below can be expressed mathematically as follows:

- `math.cos(math.pi / 4)`: $\displaystyle{\cos{\left(\frac{\pi}{4}\right)}}$
- `math.log(1024, 2)`: $\displaystyle{\log_2{1024}}$

Check [documentation](https://docs.python.org/3/library/math.html?highlight=math#module-math) for more on `math` module.

**Bonus hint**: The mathematical formula written in $\LaTeX$ can be rendered in the Markdown cell. 

In [55]:
import math

print(math.cos(math.pi / 4))
print(math.log(1024, 2))

0.7071067811865476
10.0


#### `random` module

Try running the code cell below several times (shortcut key: **Ctrl + Enter**); you will notice that the output usually changes from run to run.

Check [documentation](https://docs.python.org/3/library/random.html?highlight=random#module-random) for more on `random` module.

In [56]:
import random

print(random.choice(['apple', 'pear', 'banana']))

# sampling without replacement
print(random.sample(range(100), 10))   

# random float
print(random.random())    

# random integer chosen from range(6)
print(random.randrange(6))   

apple
[85, 71, 80, 48, 72, 28, 7, 63, 24, 27]
0.9404463282494391
5
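
When repeatable results are needed (for example, while debugging), the generator can be seeded with `random.seed()`. This is a minimal sketch; the seed value `502` is an arbitrary choice for illustration.

```python
import random

random.seed(502)   # arbitrary seed chosen for this sketch
first_run = [random.randrange(6) for _ in range(5)]

random.seed(502)   # resetting the same seed restarts the sequence
second_run = [random.randrange(6) for _ in range(5)]

print(first_run == second_run)   # True
```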


#### `statistics` module 

Check [documentation](https://docs.python.org/3/library/statistics.html?highlight=statistics#module-statistics) for more on `statistics` module.

In [57]:
import statistics
data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
print('mean =', statistics.mean(data))
print('median =', statistics.median(data))
print('variance =', statistics.variance(data))

mean = 1.6071428571428572
median = 1.25
variance = 1.3720238095238095
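
To see what `statistics.variance()` computes, the sketch below recomputes the mean and the sample variance by hand and compares them with the module's results. The sample variance divides the sum of squared deviations by `n - 1`; the names `n`, `mean`, and `variance` are local to this example.

```python
import math
import statistics

data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]

n = len(data)
mean = sum(data) / n
# Sample variance: sum of squared deviations divided by n - 1 (not n).
variance = sum((x - mean) ** 2 for x in data) / (n - 1)

print(math.isclose(mean, statistics.mean(data)))            # True
print(math.isclose(variance, statistics.variance(data)))    # True
```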


<a name='date_time'></a>

### Dates and Times

The `datetime` module supplies classes for manipulating dates and times in both simple and complex ways. While date and time arithmetic is supported, the focus of the implementation is on efficient member extraction for output formatting and manipulation. The module also supports objects that are timezone aware.

Check the [documentation](https://docs.python.org/3/library/datetime.html) for more on the `datetime` module.

In [58]:
import datetime
from datetime import date
now = date.today()
print('Now:', now)

print('General datetime format:', datetime.date(2003, 12, 2))

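birthday = date(1960, 8, 31)
age = now - birthday  # subtracting two dates gives a timedelta
# Approximate age in years: dividing by 365 ignores leap days.
print('Age:', age.days / 365)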

Now: 2020-10-25
General datetime format: 2003-12-02
Age: 60.19178082191781
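
Dividing the number of days by 365 only approximates the age. One possible way to get the number of completed years is sketched below; `age_in_years` is a hypothetical helper written for this example, not a function from the `datetime` module.

```python
from datetime import date

def age_in_years(birthday, today):
    """Return the number of completed years between two dates."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
    return today.year - birthday.year - (0 if had_birthday else 1)

print(age_in_years(date(1960, 8, 31), date(2020, 10, 25)))   # 60
print(age_in_years(date(1960, 8, 31), date(2020, 8, 30)))    # 59
```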


<a name='data_structure'></a>

## Data Structures

The list is one of the most fundamental data structures in most programming languages, including Python. You can think of a list as an object that consists of two parts: **property** and **method**. The elements in the list are considered the property of the list, whereas the methods are functions attached to the list that inspect or manipulate those elements (e.g., `count()` is one of them).

Don't worry if you don't understand the explanation above for now. The concept will become clearer with practice.

In [59]:
# Define list of fruit names
fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']

# Count the frequency of 'apple' in the list
fruits.count('apple')

2

In [60]:
# Count the frequency of 'tangerine' in the list.
fruits.count('tangerine')

0

So, we know how to define a list and get the frequency of an element within the list. Next, we would like to know the position of an element in a list. We can call the `index()` method, passing the name of the element, and this method will return a value associated with the position of the element within the list.

**Note that the element must exist in the list; otherwise, this method will raise an <span style="color:red">error</span> (a `ValueError`).**
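
Calling `index()` with a missing element raises a `ValueError`. The sketch below shows two common ways to guard against it; the variable name `position` and the choice of `None` as the "not found" value are assumptions made for this example.

```python
fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']

# Option 1: check membership with `in` before calling index().
if 'tangerine' in fruits:
    position = fruits.index('tangerine')
else:
    position = None

# Option 2: attempt the lookup and catch the error.
try:
    position = fruits.index('tangerine')
except ValueError:
    position = None

print(position)   # None
```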

In [61]:
# Get the index of `banana` from the list.
fruits.index('banana')

3

At this point, you may notice that calling `index()` method only returns one value, even though there are 2 "banana"s in the list. The return value is `3`, even though you may see that "banana" appears as the 4th and 7th elements in the list. Why?

In Python (as well as most programming languages), the count of the **index** (i.e. the position) of an element within a list starts from `0`, instead of `1`. For example, the index of "orange" in the list is `0`, instead of `1`. Therefore, it makes sense that the return value `3` corresponds to the first "banana" from the list.

What if we also want to know the position of the second "banana" in the list? We can pass a second integer argument to `index()` that specifies the position from which the search starts, as demonstrated below.

In [62]:
# Find next banana starting from position 4
fruits.index('banana', 4)  

6
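
If every position of an element is needed at once, one option is to combine the built-in `enumerate()` with a list comprehension. This is a sketch; the name `banana_positions` is made up for the example.

```python
fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']

# enumerate() yields (index, element) pairs; keep the indices that match.
banana_positions = [i for i, fruit in enumerate(fruits) if fruit == 'banana']

print(banana_positions)   # [3, 6]
```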

Some other operations (methods) available to manipulate a list are shown below:

In [63]:
# Reverse the order of the elements within the list, in-place, i.e., the effect is permanent. 
fruits.reverse()
fruits

['banana', 'apple', 'kiwi', 'banana', 'pear', 'apple', 'orange']

In [64]:
# Add new element into the list.
fruits.append('grape')
fruits

['banana', 'apple', 'kiwi', 'banana', 'pear', 'apple', 'orange', 'grape']

In [65]:
# Sort the elements within the list (default: ascending order).
fruits.sort()
fruits

['apple', 'apple', 'banana', 'banana', 'grape', 'kiwi', 'orange', 'pear']

In [66]:
# Remove the third element (at index 2) from the list. 
fruits.pop(2)
fruits

['apple', 'apple', 'banana', 'grape', 'kiwi', 'orange', 'pear']
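
The methods above modify the list in place. When the original order should be preserved, `sorted()` and slicing return new lists instead. The sketch below uses a small hypothetical list and also shows that `pop()` returns the element it removes.

```python
fruits = ['pear', 'apple', 'kiwi']

ordered = sorted(fruits)     # new sorted list
backwards = fruits[::-1]     # new reversed list (slice with step -1)

print(ordered)      # ['apple', 'kiwi', 'pear']
print(backwards)    # ['kiwi', 'apple', 'pear']
print(fruits)       # ['pear', 'apple', 'kiwi'] (unchanged)

# pop() removes an element in place and returns it.
removed = fruits.pop(0)
print(removed, fruits)   # pear ['apple', 'kiwi']
```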

**Optional**: Check [documentation](https://docs.python.org/3/tutorial/datastructures.html?highlight=list) for more on data structures (lists).

<a name='challenge'></a>

# Challenge: The Guessing Game

A step-by-step approach to developing a program.

The guessing game program will do the following:
- The player only gets five turns.
- The program tells the player after each guess if the number is higher or lower.
- The program prints appropriate messages for when the player wins and loses.

***

Below is what we want the program to look like:

```
Enter your guess (1-100): 50
LOWER. 4 guesses left.
Enter your guess (1-100): 25
LOWER. 3 guesses left.
Enter your guess (1-100): 12
LOWER. 2 guesses left.
Enter your guess (1-100): 6
HIGHER. 1 guesses left.
Enter your guess (1-100): 9
LOWER. 0 guesses left.
You lose. The correct number is 8
```

***

First, think about what we will need in the program:

- We need random numbers, so there will be an import statement at the beginning of the program and a call to the `randint` function somewhere after it.
- To allow the user to keep guessing until they either guess right or run out of turns, one solution is to use a `while` loop with a condition that covers both possibilities.
- There will be an input statement to get the user’s guess. As this is done repeatedly, it will go inside the loop.
- There will be an `if` statement to handle the higher/lower hint. As this comparison is done repeatedly and depends on the user’s guesses, it will go in the loop after the input statement.
- There will be a counting variable to keep track of how many turns the player has taken. Each time the user makes a guess, the count will go up by one, so this statement will also go inside the loop.

#### Solution 1

Using `while` loop.

In [67]:
# Solution 1
from random import randint

secret_num = randint(1,100)
num_guesses = 0
guess = 0

while guess != secret_num and num_guesses <= 4:
    guess = int(input('Enter your guess (1-100): '))
    num_guesses = num_guesses + 1
    if guess < secret_num:
        print('HIGHER.', 5-num_guesses, 'guesses left.\n')
    elif guess > secret_num:
        print('LOWER.', 5-num_guesses, 'guesses left.\n')
    else:
        print('You got it!')

if num_guesses==5 and guess != secret_num:
    print('You lose. The correct number is', secret_num)

Enter your guess (1-100): 50
HIGHER. 4 guesses left.

Enter your guess (1-100): 75
LOWER. 3 guesses left.

Enter your guess (1-100): 62
LOWER. 2 guesses left.

Enter your guess (1-100): 56
You got it!


#### Solution 2

Using `for` loop.

In [68]:
# Solution 2
from random import randint

secret_num = randint(1,100)
attempts = 5

for num_guesses in range(attempts):
    guess = int(input('Enter your guess (1-100): '))
    if guess < secret_num:
        print('HIGHER.', attempts - (num_guesses + 1), 'guesses left.\n') # 1.
    elif guess > secret_num:
        print('LOWER.', attempts - (num_guesses + 1), 'guesses left.\n')
    else:
        print('You got it!')
        break # 2.

else: # 3.
    print('You lose. The correct number is', secret_num)
    
# 1. num_guesses + 1: the value of num_guesses starts from 0.
# 2. break: When the guessed number is correct, end the game.
# 3. else: The else block of a for loop runs only if the loop ends without break.

Enter your guess (1-100): 50
LOWER. 4 guesses left.

Enter your guess (1-100): 25
HIGHER. 3 guesses left.

Enter your guess (1-100): 38
HIGHER. 2 guesses left.

Enter your guess (1-100): 44
You got it!
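
Solution 2 relies on the `else` clause of a `for` loop, which runs only when the loop finishes without reaching `break`. The sketch below isolates that behavior in a smaller setting; `find_first_even` is a hypothetical helper written for this illustration and is not part of the game.

```python
def find_first_even(numbers):
    """Return a message describing the first even number, if any."""
    for n in numbers:
        if n % 2 == 0:
            message = 'Found ' + str(n)
            break   # leaving the loop early skips the else clause
    else:
        # Runs only when the loop finished without hitting break.
        message = 'No even number'
    return message

print(find_first_even([3, 8, 5]))   # Found 8
print(find_first_even([3, 7, 5]))   # No even number
```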
