(**You can also open this notebook in Google Colab**)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/xiangshiyin/data-programming-with-python/blob/main/2023-fall/2023-08-29/notebook/code_demo.ipynb)

# General principle of program execution

## Example 1 - Simple addition

In [1]:
x = 1
y = 2
x + y

3

## Example 2 - Order of Operations

Compute `2 * 3 + 1`

In [3]:
x = 2
x = x * 3
x = x + 1
print(x)

7


If we swap the order of the last two statements, the result changes:

In [5]:
x = 2
x = x + 1
x = x * 3
print(x)

9
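
The two results differ because each statement runs in sequence and reuses the updated value of `x`. As a small sketch (the names `multiply_first` and `add_first` are illustrative and not part of the original notebook), the same two computations can be written as single expressions, where parentheses control the order:

```python
# Multiplication binds more tightly than addition, so this matches the first cell.
multiply_first = 2 * 3 + 1

# Parentheses force the addition to run first, matching the swapped cell.
add_first = (2 + 1) * 3

print(multiply_first)  # 7
print(add_first)       # 9
```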


# Primitive Data Types

## Numbers
* To define a variable of a numeric data type, follow the syntax `VarName = value`
* Use the function `int()` to convert a value to an integer (a float is truncated toward zero)
* Use the function `float()` to convert a value to a floating-point number
* Use `type()` to check the data type of an object

In [2]:
x = 123
x + 1

124

In [3]:
x = 123.1
x + 1

124.1

In [10]:
int(123.6)

123

In [11]:
float(123)

123.0

In [12]:
type(float(123))

float

**In fact, you can use the `type()` function to check the data type of any object in Python**

**Common arithmetic operators**
| Operation | Result                           |
| --------- | -------------------------------- |
| x + y     | sum of x and y                   |
| x - y     | difference of x and y            |
| x * y     | product of x and y               |
| x / y     | quotient of x and y              |
| x // y    | floored quotient of x and y      |
| x % y     | remainder of x / y               |
| -x        | x negated                        |
| +x        | x unchanged                      |
| abs(x)    | absolute value or magnitude of x |
| pow(x, y) | x to the power y                 |
| x ** y    | x to the power y                 |

In [4]:
7 / 3

2.3333333333333335

In [5]:
7 // 3

2

In [6]:
7 % 3

1

In [7]:
(7 // 3) * 3 + (7 % 3)

7

In [8]:
2 ** 3

8

In [9]:
2 ** 0.5

1.4142135623730951
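
The table describes `//` as the floored quotient, which matters once negative numbers are involved. The sketch below is an illustrative addition (the built-in `divmod()` and the names `quotient` and `remainder` do not appear in the original notebook) showing how `//` and `%` behave with a negative operand:

```python
# Floor division rounds toward negative infinity, not toward zero.
print(7 // 3)    # 2
print(-7 // 3)   # -3

# The remainder takes the sign of the divisor.
print(-7 % 3)    # 2

# divmod() returns the floored quotient and the remainder together.
quotient, remainder = divmod(-7, 3)
print(quotient, remainder)       # -3 2

# The identity from the earlier cell still holds.
print(quotient * 3 + remainder)  # -7
```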

## Strings

### Characters and strings
* Character: a single letter, number, or symbol
  * Example: `'a', '1', '\n'`
* String: a sequence of characters
  * Example: `"I like programming"`
* Can be expressed in a variety of ways:
  * Single quotes: `'single quotes'`
  * Double quotes: `"double quotes"`
  * Triple quoted: `'''Three single quotes'''` or `"""Three double quotes"""`
* You can use `str()` to cast the value type to string
* Special characters
  * `\n` - new line 
  * `\t` - tab (displayed as a jump to the next tab stop, often up to 8 spaces wide)

In [13]:
x = 'a'
x

'a'

In [14]:
y = "abc"
type(y)

str

In [15]:
z = """
the first line
the second line
the third line
"""
z

'\nthe first line\nthe second line\nthe third line\n'

In [16]:
print(z)


the first line
the second line
the third line



In [17]:
xx = 123
type(xx)

int

In [18]:
str(xx)

'123'

In [19]:
type(str(xx))

str

In [21]:
## special characters
yy = 'a\nb\nc'
print(yy)

a
b
c


In [22]:
## special characters
zz = 'a\tb\tc'
print(zz)

a	b	c


### String properties
With `string` data, you can
* Check the length of a string with `len()`
* Concatenate strings with `+` and repeat them with `*`
* Change the case with `str.lower()` and `str.upper()`
* Replace part of the string with `str.replace()`
* Check whether a substring is part of a given string with the `in` operator
* Index and slice the string
  * Each character is assigned an index number representing its position in the string, and index numbers start from 0
  * General slicing format - `StringValue[<lower_index>:<upper_index>]`
    * `<lower_index>` is inclusive
    * `<upper_index>` is exclusive
  * Negative indexes count from the end of the string, so `-1` is the last character

In [28]:
x = 'adj;gja[gdjg;ajg;g]'
len(x)

19

In [29]:
y = 'x\ty'
len(y)

3

In [31]:
z = 'a'
z * 5

'aaaaa'

In [32]:
x.upper()

'ADJ;GJA[GDJG;AJG;G]'

In [33]:
x.upper().lower()

'adj;gja[gdjg;ajg;g]'

In [51]:
x = 'abababababab'
y = x.replace('a', '0')
print(y)

0b0b0b0b0b0b


In [58]:
'b' in x

True

In [34]:
xx = 'abcdefghijklmnopqrstuvwxyz'
len(xx)

26

In [35]:
print(xx[0])
print(xx[1])
print(xx[25])

a
b
z


In [36]:
xx[26]

IndexError: string index out of range

In [39]:
xx[1:5]

'bcde'

In [40]:
# negative indexing
xx[-1]

'z'
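
The inclusive lower bound, exclusive upper bound, and negative indexes can be combined. The following is a minimal sketch reusing the `xx` string from the cells above; omitting a bound and slicing past the end are extra cases not shown in the original notebook:

```python
xx = 'abcdefghijklmnopqrstuvwxyz'

print(xx[:3])      # abc     (lower index omitted: start from 0)
print(xx[-3:])     # xyz     (upper index omitted: run to the end)
print(xx[-3:-1])   # xy      (the upper index is still exclusive)
print(xx[20:100])  # uvwxyz  (slices clip to the string length, no IndexError)
```

Unlike `xx[26]`, which raises an `IndexError`, a slice that runs past the end simply returns what is available.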

### String formatting [[Official Documentation](https://docs.python.org/3.8/library/string.html#formatstrings)]

#### "Old C style" string formatting
The `%` operator formats a set of variables enclosed in a `tuple` (a fixed-size sequence, covered later in this class) using a format string, which contains normal text together with `argument specifiers`.

Common `argument specifiers` include:
* `%s` - String (or any object with a string representation, like numbers)
* `%d` - Integers
* `%f` - Floating point numbers (by default, it keeps 6 decimal digits)
* `%.<number of digits>f` - Floating point numbers with a fixed amount of digits to the right of the dot.


In [41]:
## format string with 1 placeholder
name = 'John'
'My name is %s' % name

'My name is John'

In [42]:
age = 21
'His age is %d' % age

'His age is 21'

In [43]:
## format string with 2 placeholder
name = 'Xiangshi'
balance = 123.4
'Hello %s. Your current bank account balance is $%.2f' % (name, balance)

'Hello Xiangshi. Your current bank account balance is $123.40'

#### String formatting via the format() method
* In Python 3, you can also format strings by calling the `.format()` method on a string object. `{}` is used as a replacement field for the values you'd like to plug in, and it can also hold `format specifications`.
* A general convention is that <ins>an empty format specification produces the same result as if you had called the function `str()` on the value</ins>. A non-empty format specification typically modifies the result.
* The common pattern of a replacement field is like `{field_name:format_spec}`
* Check the [official documentation](https://docs.python.org/3.8/library/string.html#formatstrings) for more details on `format specifications` pattern

* Example 1: no format modification
```
"{} {}".format(a,b), "{0} {1}".format(a,b), or "{A} {B}".format(A=a,B=b)
```

In [44]:
a = 1
b = 2
c = a + b
# '{} plus {} is {}'.format(b,a,c) 
'{B} plus {A} is {C}'.format(B=b,A=a,C=c) 

'2 plus 1 is 3'


* Example 2: floating point number

In [45]:
a = 1
b = 2
c = a + b
'{A:f} plus {B:f} is {C:f}'.format(A=a,B=b,C=c) 
# '{A:d} plus {B:d} is {C:d}'.format(A=a,B=b,C=c) 
# you can also use index as the field_name
# '{0:f} plus {1:f} is {2:f}'.format(a,b,c) 
# '{:f} plus {:f} is {:f}'.format(a,b,c) 

'1.000000 plus 2.000000 is 3.000000'

In [46]:
## control the precision
'{A:.2f} plus {B:.2f} is {C:.2f}'.format(A=a,B=b,C=c) 

'1.00 plus 2.00 is 3.00'

In [47]:
## align the number
print('{:>6.0f}'.format(1))
print('{:>6.1f}'.format(1))
print('{:>6.2f}'.format(1))
print('{:>6.3f}'.format(1))
print('{:>6.4f}'.format(1))

     1
   1.0
  1.00
 1.000
1.0000


In [48]:
print('{:<6.0f}'.format(1))
print('{:<6.1f}'.format(1))

1     
1.0   


In [49]:
print('{:^6.0f}'.format(1))
print('{:^6.1f}'.format(1))

  1   
 1.0  


### `f-string` [[Real Python tutorial](https://realpython.com/python-f-strings/)]

* `f-string` formatting has been available since Python 3.6
* It generally follows the same style as the `.format()` method and is even more concise


In [50]:
a = 3
b = 2
c = a + b
f'{b} plus {a} is {c}'

'2 plus 3 is 5'
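
Format specifications carry over to f-strings unchanged: they go after a colon inside the braces. The sketch below reuses `name` and `balance` from the earlier `%` example; the square brackets in the last two lines are only there to make the padding visible.

```python
name = 'Xiangshi'
balance = 123.4

# Same format specification as in str.format(), placed after the colon.
message = f'Hello {name}. Your current bank account balance is ${balance:.2f}'
print(message)  # Hello Xiangshi. Your current bank account balance is $123.40

# Expressions are evaluated inside the braces.
a = 3
b = 2
print(f'{b} plus {a} is {a + b}')   # 2 plus 3 is 5

# Alignment and width work the same way as in the earlier cells.
print(f'[{name:>10}]')              # [  Xiangshi]
print(f'[{balance:^10.1f}]')        # [  123.4   ]
```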

### Challenge
Align a string to the right within a predefined window width

In [52]:
"{:>10}".format("Test")

'      Test'

In [53]:
"{:^10}".format("Test")

'   Test   '

In [54]:
"{:<10}".format("Test")

'Test      '

In [55]:
"{:>10}".format("This is our first class, I enjoy meeting everyone here")

'This is our first class, I enjoy meeting everyone here'

In [57]:
# A different way!!
"Test".ljust(10, '*')

'Test******'
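
Two details from the cells above are worth spelling out: the width is only a minimum, and `ljust()` pads on the right (left-aligning the text). The following sketch is an illustrative extension; the names `text`, `padded`, and `truncated` are hypothetical, and the string precision (`.10`) and `str.rjust()` are not used in the original notebook.

```python
text = "This is our first class, I enjoy meeting everyone here"

# Width is a minimum: a longer string is returned unchanged.
padded = "{:>10}".format(text)
print(padded == text)           # True

# A precision after the dot truncates a string to at most that many characters.
truncated = "{:>10.10}".format(text)
print(truncated)                # This is ou

# str.rjust() is the right-aligning counterpart of str.ljust().
print("Test".rjust(10, '*'))    # ******Test
```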

## Boolean
The built-in Boolean type has two values, `True` and `False`. Because `bool` is a subclass of `int`, `True` and `False` behave like the integers 1 and 0 in comparisons and arithmetic. Booleans are useful in conditional and comparison expressions.

In [59]:
x = 1
x == 1
# print(x==1)
# print(not x==2)
print(x!=2)

True


In [60]:
(100>10) and (100<200)

True

In [61]:
(100>10) or (100>200)

True

In [62]:
((100>10) & (100<200))==True
# ((100>10) & (100<200))==False

True

In [63]:
((100>10) & (100<200))==1

True
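
The last cell works because Booleans behave like the integers 1 and 0. As a short sketch (the list name `checks` is illustrative), the same property allows arithmetic on Boolean values:

```python
# bool is a subclass of int, so True and False act like 1 and 0.
print(isinstance(True, int))    # True
print(True == 1, False == 0)    # True True
print(True + True)              # 2

# Summing Booleans counts how many of them are True.
checks = [100 > 10, 100 > 200, 100 < 200]
print(sum(checks))              # 2

# The type is still bool, not int.
print(type(True))               # <class 'bool'>
```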

**Logical operators**

| Operator                                                              | Description                                                                        |
|-----------------------------------------------------------------------|------------------------------------------------------------------------------------|
| or                                                                    | Boolean OR                                                                         |
| and                                                                   | Boolean AND                                                                        |
| not x                                                                 | Boolean NOT                                                                        |
| in, not in, is, is not, <, <=, >, >=, !=, ==                          | Comparisons, including membership tests and identity tests                         |
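
The earlier cells wrap each comparison in parentheses before applying `&`. The sketch below, an illustrative addition with hypothetical variable names, shows why: `&` is a bitwise operator that binds more tightly than comparisons, whereas `and` binds more loosely. It also contrasts `==` with the identity test `is` from the table.

```python
# `&` binds more tightly than comparisons, so parentheses matter.
with_parentheses = (1 == 1) & (2 == 2)
without_parentheses = 1 == 1 & 2 == 2   # parsed as 1 == (1 & 2) == 2
print(with_parentheses)      # True
print(without_parentheses)   # False

# `and` has lower precedence than comparisons, so no parentheses are needed.
print(1 == 1 and 2 == 2)     # True

# `==` compares values; `is` checks whether two names refer to one object.
a = [1, 2]
b = [1, 2]
print(a == b)   # True
print(a is b)   # False
```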

## Revisit the example from last class

In [23]:
x = 1
y = 1
print(id(x))
print(id(y))

140484353753328
140484353753328


In [24]:
x = 'abc'
y = 'abc'
print(id(x))
print(id(y))

140484355187568
140484355187568


In Python, there are two types of data types: `immutable` and `mutable`. Immutable data types cannot be changed once they are created, while mutable data types can be changed.
* `Immutable` data types include:
    * Numbers
    * Strings
    * Tuples
* `Mutable` data types include:
  * Lists
  * Sets
  * Dictionaries

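For some `immutable` values, such as small integers and short strings, the CPython interpreter saves memory by reusing the same object, which is why `x` and `y` report the same `id()` above. This is an implementation detail rather than a guarantee, so code should not rely on it.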

In [25]:
x = 1
y = x
print(id(x))
print(id(y))

140484353753328
140484353753328


In [26]:
x = 2
print(y)

1


In [27]:
print(id(x))
print(id(y))

140484353753360
140484353753328
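
The same behavior can be seen with strings. This is a minimal sketch (the variable names `s` and `t` are illustrative) showing that an immutable object cannot be modified in place, and that "changing" it really creates a new object and rebinds the name:

```python
s = 'abc'
try:
    s[0] = 'x'           # strings do not support item assignment
except TypeError as err:
    print(f'TypeError: {err}')

# "Changing" a string builds a new object and rebinds the name.
t = s
s = s + 'd'
print(s)        # abcd
print(t)        # abc  (the original object is untouched)
print(s is t)   # False
```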


# Non-primitive Data Types

## List
[[official documentation](https://docs.python.org/3/tutorial/datastructures.html)]
* A list is a mutable sequence, typically used to store a collection of separate values. It is written as comma-separated values (items) between square brackets.
* It is normally used to store homogeneous items. However, items in a list don't necessarily need to be of the same data type.
* <span style="color:blue">Each item of a list is assigned an index number representing its position, and the index starts from 0</span> (Does this sound similar?)

**That's because both `string` and `list` are `Sequence` types!!**

In [64]:
# Create an empty list
x = []

In [137]:
x = list()
type(x)

list

In [65]:
# Create a list of multiple elements
x = [1,2,3,4,5]

In [66]:
# Create a list of mixed data types
y = ['a',1,'b',2,'c']

### Common properties between `string` and `list`

#### Measure size with `len()`

In [73]:
x = [1,2,3]
len(x)

3

#### Indexing

In [70]:
x = [1,2,3,4,5]
x[1:3]

[2, 3]

In [71]:
x[1:]

[2, 3, 4, 5]

In [72]:
x[-1]

5

#### Expand

In [67]:
x = [1,2,3]
x = x + [4]
x

[1, 2, 3, 4]

In [68]:
x = [1,2,3]
x * 5

[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]

#### Check if an element exists

In [69]:
4 in [1,2,3]

False

### "NEW" property for `list` (because it is mutable ...)

#### Expand ... in place

In [78]:
x = [1,2,3]
id(x)

140484089021568

In [79]:
x.append(4)
print(x)
print(id(x))

[1, 2, 3, 4]
140484089021568


In [80]:
x.extend([4,5,6])
print(x)
print(id(x))

[1, 2, 3, 4, 4, 5, 6]
140484089021568


In [81]:
# insert to a specific position
x.insert(1,9)
print(x)
print(id(x))

[1, 9, 2, 3, 4, 4, 5, 6]
140484089021568


#### Element/value changes

In [82]:
x = [1,2,3]
print(f'Before change, x = {x}')
x[1] = 5
print(f'After change, x = {x}')

Before change, x = [1, 2, 3]
After change, x = [1, 5, 3]


In [86]:
x = [1,2,3,4]
print(f'Before pop, x = {x}')
x.pop()
print(f'After pop, x = {x}')

Before pop, x = [1, 2, 3, 4]
After pop, x = [1, 2, 3]


In [87]:
x = [1,2,3,4]
print(f'Before pop, x = {x}')
x.pop(1)
print(f'After pop, x = {x}')

Before pop, x = [1, 2, 3, 4]
After pop, x = [1, 3, 4]
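
`pop()` also returns the item it removes, which the cells above discard. The sketch below captures those return values; `list.remove()` is an extra method not shown in the original notebook, included only to contrast removal by value with removal by position.

```python
x = [1, 2, 3, 4]

last = x.pop()        # removes and returns the last item
second = x.pop(1)     # removes and returns the item at index 1
print(last, second)   # 4 2
print(x)              # [1, 3]

# list.remove() deletes the first matching value instead of a position.
x.remove(3)
print(x)              # [1]
```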


#### Sort (because there is a sequence)

In [95]:
## list.sort()
x = [4,5,2,1]
print(f'Before sort, x = {x}, id = {id(x)}')
x.sort() # x.sort() does in-place sorting
print(f'After sort, x = {x}, id = {id(x)}')

Before sort, x = [4, 5, 2, 1], id = 140484089055872
After sort, x = [1, 2, 4, 5], id = 140484089055872


In [96]:
x = [4,5,2,1]
print(f'Before sort, x = {x}, id = {id(x)}')
x.sort(reverse=True) # x.sort() does in-place sorting
print(f'After sort, x = {x}, id = {id(x)}')

Before sort, x = [4, 5, 2, 1], id = 140484089293440
After sort, x = [5, 4, 2, 1], id = 140484089293440


In [97]:
## sorted()
x = [4,5,2,1]
print(f'Before sort, x = {x}, id = {id(x)}')
print(f'Sort result: {sorted(x)}') # sorted(x) returns a new list holding the sorted values
print(f'After sort, x = {x}, id = {id(x)}')

Before sort, x = [4, 5, 2, 1], id = 140484089348736
Sort result: [1, 2, 4, 5]
After sort, x = [4, 5, 2, 1], id = 140484089348736
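
A common pitfall follows from the difference between the two: because `list.sort()` works in place, it returns `None`. The sketch below illustrates this and also shows the `key` argument, which is an addition not covered in the original cells; the names `result` and `words` are illustrative.

```python
x = [4, 5, 2, 1]

# list.sort() sorts in place and returns None, so do not assign its result.
result = x.sort()
print(result)   # None
print(x)        # [1, 2, 4, 5]

# sorted() accepts the same reverse= and key= arguments and returns a new list.
words = ['pear', 'fig', 'banana']
print(sorted(words, key=len))                 # ['fig', 'pear', 'banana']
print(sorted(words, key=len, reverse=True))   # ['banana', 'pear', 'fig']
```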


#### The `del` statement

In [98]:
del x
x

NameError: name 'x' is not defined
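
Besides removing a whole variable, `del` can remove items from a list by index or by slice. A brief sketch with an illustrative list:

```python
x = [10, 20, 30, 40, 50]

del x[0]      # remove the first item
print(x)      # [20, 30, 40, 50]

del x[1:3]    # remove the items at index 1 and 2
print(x)      # [20, 50]
```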

### Revisit the "address" problem

In [99]:
x = [1,2,3]
y = x
print(f"""
Before the element change in x: 
    x = {x}
    y = {y}
    id of x: {id(x)}
    id of y: {id(y)}
""")
x[1] = 5
print(f"""
After the element change in x: 
    x = {x}
    y = {y}
    id of x: {id(x)}
    id of y: {id(y)}
""")



Before the element change in x: 
    x = [1, 2, 3]
    y = [1, 2, 3]
    id of x: 140484089491776
    id of y: 140484089491776


After the element change in x: 
    x = [1, 5, 3]
    y = [1, 5, 3]
    id of x: 140484089491776
    id of y: 140484089491776



**The value of variable `y` changes along with variable `x` because both names refer to the same list object in memory!!**

In [100]:
# Any way to prevent this from happening??
x = [1,2,3]
y = x.copy()
print(f"""
Before the element change in x: 
    x = {x}
    y = {y}
    id of x: {id(x)}
    id of y: {id(y)}
""")
x[1] = 5
print(f"""
After the element change in x: 
    x = {x}
    y = {y}
    id of x: {id(x)}
    id of y: {id(y)}
""")



Before the element change in x: 
    x = [1, 2, 3]
    y = [1, 2, 3]
    id of x: 140484089050368
    id of y: 140484089487424


After the element change in x: 
    x = [1, 5, 3]
    y = [1, 2, 3]
    id of x: 140484089050368
    id of y: 140484089487424



In [101]:
# Any way to prevent this from happening??
import copy

x = [1,2,3]
y = copy.copy(x)
print(f"""
Before the element change in x: 
    x = {x}
    y = {y}
    id of x: {id(x)}
    id of y: {id(y)}
""")
x[1] = 5
print(f"""
After the element change in x: 
    x = {x}
    y = {y}
    id of x: {id(x)}
    id of y: {id(y)}
""")



Before the element change in x: 
    x = [1, 2, 3]
    y = [1, 2, 3]
    id of x: 140484089549376
    id of y: 140484089492352


After the element change in x: 
    x = [1, 5, 3]
    y = [1, 2, 3]
    id of x: 140484089549376
    id of y: 140484089492352
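
Both `x.copy()` and `copy.copy(x)` make a shallow copy: the outer list is new, but any nested objects are still shared. The sketch below is an illustrative extension using `copy.deepcopy()` from the standard library, which the original notebook does not cover; the names `nested`, `shallow`, and `deep` are hypothetical.

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)     # new outer list, shared inner lists
deep = copy.deepcopy(nested)    # inner lists are copied as well

nested[0][0] = 99               # mutate an inner list through the original

print(shallow[0][0])            # 99  (the inner list is shared)
print(deep[0][0])               # 1   (the deep copy is independent)
print(shallow[0] is nested[0])  # True
print(deep[0] is nested[0])     # False
```

For a flat list of numbers or strings, as in the cells above, a shallow copy is sufficient.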



## Tuple
* Tuples are immutable sequences, typically used to store heterogeneous items.
* A tuple is normally written as comma-separated values (items) surrounded by parentheses

*Summary*:
* Lists, strings, and tuples are all `Sequence` types

*Major differences from `list`:*
* `()` instead of `[]`
* `immutable` vs. `mutable`

In [102]:
## Create a tuple
x = (1,2)
x

(1, 2)

In [103]:
y = (1,2)
print(f'id of x = {id(x)}')
print(f'id of y = {id(y)}')

id of x = 140484625912384
id of y = 140484625915008


In [104]:
# is it really immutable??
x[1] = 3

TypeError: 'tuple' object does not support item assignment

In [105]:
# is it really immutable?
x.sort()

AttributeError: 'tuple' object has no attribute 'sort'

In [106]:
# how about the other way to sort??
y = sorted(x) # sorted() returns a new list and leaves the tuple unchanged

In [107]:
print(f'id of x = {id(x)}')
print(f'id of y = {id(y)}')

id of x = 140484625912384
id of y = 140484089065088


In [None]:
## Tuples can be constructed with or without parentheses
x = 1,2
x

(1, 2)

In [None]:
## Indexing and slicing
x = (1,2,3)
x[1]

In [None]:
## Value in tuple, value not in tuple
x = (1,2,3)
1 in x

True

In [None]:
## Unpacking tuples
x,y = (1,2)
print(x,y)

1 2


In [None]:
z = (1,2)
x = z[0]
y = z[1]
print(x, y)

1 2
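
Packing and unpacking also give a compact way to swap two variables. The sketch below shows that, along with the one-item tuple, a case the original cells do not cover; the names `single` and `not_a_tuple` are illustrative.

```python
# Swap two variables without a temporary one.
x, y = 1, 2
x, y = y, x
print(x, y)   # 2 1

# A one-item tuple needs a trailing comma; parentheses alone are not enough.
single = (1,)
not_a_tuple = (1)
print(type(single))        # <class 'tuple'>
print(type(not_a_tuple))   # <class 'int'>
```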


## Set
* A set is an unordered collection with no duplicate elements, the same as the mathematical concept of a `set`
* Set objects support mathematical operations like union, intersection, difference, and symmetric difference
* A good tutorial on Set operations: https://www.geeksforgeeks.org/python-set-operations-union-intersection-difference-symmetric-difference/

### Create a set
Use `{...}` with at least one element, or `set()`. Note that an empty `{}` creates a dictionary, not a set.

In [108]:
x = {1,2,3,4}
print(x)
print(type(x))

{1, 2, 3, 4}
<class 'set'>


In [109]:
x = set([1,2,3,4])
print(x)
print(type(x))

{1, 2, 3, 4}
<class 'set'>


In [122]:
x = set()
print(type(x))
print(f'Length of the set x: {len(x)}')

<class 'set'>
Length of the set x: 0


In [123]:
x.add('a')
print(f'Length of the set x: {len(x)}')

Length of the set x: 1


### Unordered

In [110]:
x[0]

TypeError: 'set' object is not subscriptable

### De-dup

In [111]:
set([1,2,3,3,4,4,5])

{1, 2, 3, 4, 5}
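
Because a set is unordered, de-duplicating through `set()` does not promise to keep the original order of the items. The sketch below shows two hedged alternatives when order matters; `dict.fromkeys()` relies on dictionaries preserving insertion order (Python 3.7+), and the names `items` and `unique` are illustrative.

```python
items = ['b', 'a', 'b', 'c', 'a']

# A set removes duplicates but does not promise any particular order.
unique = set(items)
print(len(unique))                   # 3

# sorted() gives a predictable order when one is needed.
print(sorted(unique))                # ['a', 'b', 'c']

# dict.fromkeys() keeps the first occurrence of each item in its original order.
print(list(dict.fromkeys(items)))    # ['b', 'a', 'c']
```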

### If a value exists
Use `in`

In [112]:
x = {1,2,3,4}
1 in x

True

### Set operations
![](../pics/set_operations.png)

| Operation            | Python Code                            |
|----------------------|----------------------------------------|
| union                | `A \| B` or `A.union(B)`               |
| intersection         | `A & B` or `A.intersection(B)`         |
| difference           | `A - B` or `A.difference(B)`           |
| symmetric difference | `A ^ B` or `A.symmetric_difference(B)` |

In [115]:
x = {1,2,3}
y = {2,3,4,5}
print(x | y)
print(x.union(y))

{1, 2, 3, 4, 5}
{1, 2, 3, 4, 5}


In [114]:
x = {1,2,3}
y = {2,3,4,5}
print(x & y)
print(x.intersection(y))

{2, 3}
{2, 3}


In [118]:
x = {1,2,3}
y = {2,3,4,5}
print(x - y)
print(x.difference(y))

{1}
{1}


In [119]:
x = {1,2,3}
y = {2,3,4,5}
print(x ^ y)
print(x.symmetric_difference(y))

{1, 4, 5}
{1, 4, 5}
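
The four operations are related to each other, and the difference depends on the order of its operands. The sketch below checks this with the same `x` and `y`; the subset and disjointness tests (`<=`, `issubset()`, `isdisjoint()`) are additions that do not appear in the table above.

```python
x = {1, 2, 3}
y = {2, 3, 4, 5}

# Symmetric difference equals the union minus the intersection.
print((x ^ y) == (x | y) - (x & y))   # True

# Difference is not symmetric.
print(sorted(x - y))   # [1]
print(sorted(y - x))   # [4, 5]

# Subset and disjointness tests.
print({2, 3} <= y)             # True
print(x.issubset(y))           # False
print(x.isdisjoint({7, 8}))    # True
```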


## Dictionary
* A dictionary is the most commonly used data structure for storing key-value pairs
* Keys are unique within one dictionary, and lookup by key takes [constant time](https://en.wikipedia.org/wiki/Time_complexity) on average
* The general format of a dictionary: `{key1:value1, key2:value2}`
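
Fast lookup by key works because keys are hashed, so keys must be hashable, which in practice means immutable values such as numbers, strings, and tuples. The following sketch illustrates this and the uniqueness of keys; the names `grid`, `bad`, and `d` are hypothetical.

```python
# Immutable values such as strings, numbers, and tuples can be keys.
grid = {(0, 0): 'origin', (1, 0): 'east'}
print(grid[(1, 0)])   # east

# Mutable values such as lists cannot.
try:
    bad = {[0, 0]: 'origin'}
except TypeError as err:
    print(f'TypeError: {err}')

# Keys are unique: a repeated key keeps only the last value.
d = {'a': 1, 'a': 2}
print(d)        # {'a': 2}
print(len(d))   # 1
```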

### Create an empty dictionary

In [125]:
x = {}
print(type(x))
print(f'Length of the dictionary x: {len(x)}')

<class 'dict'>
Length of the dictionary x: 0


In [126]:
x = dict()
print(type(x))
print(f'Length of the dictionary x: {len(x)}')

<class 'dict'>
Length of the dictionary x: 0


### Create a non-empty dictionary

In [127]:
x = {'a': 1, 'b': 2}
print(type(x))
print(f'Length of the dictionary x: {len(x)}')

<class 'dict'>
Length of the dictionary x: 2


### Key-value lookup

In [128]:
x['a']

1

In [129]:
x['c']

KeyError: 'c'

In [130]:
x.get('a')

1

In [131]:
x.get('c', -1)

-1

### If a key exists
Use the `in` operator

In [132]:
'a' in x

True

In [133]:
'c' in x

False

### Update a dictionary
- Change the value of a given key
- Introduce new key-value pairs

In [134]:
# update the value associated with a key
x = {'a':1, 'b':2}
x['a'] = 4
x

{'a': 4, 'b': 2}

In [135]:
# update the dictionary
x = {'a':1, 'b':2}
x.update({'c':3, 'd':4})
x

{'a': 1, 'b': 2, 'c': 3, 'd': 4}

In [136]:
x = {'a':1, 'b':2}
x.update({'b':3, 'd':4})
x

{'a': 1, 'b': 3, 'd': 4}
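
The lookup and update operations above combine naturally. As a hedged sketch, the cell below uses `get()` with a default to count, then inspects and removes entries; `keys()`, `values()`, `items()`, and `pop()` are standard dictionary methods that the original notebook does not demonstrate, and the name `removed` is illustrative.

```python
x = {'a': 1, 'b': 2}

# Count with get(): start from 0 when the key is missing.
x['c'] = x.get('c', 0) + 1
x['a'] = x.get('a', 0) + 1
print(x)                  # {'a': 2, 'b': 2, 'c': 1}

# Views over the keys, the values, and the key-value pairs.
print(list(x.keys()))     # ['a', 'b', 'c']
print(list(x.values()))   # [2, 2, 1]
print(list(x.items()))    # [('a', 2), ('b', 2), ('c', 1)]

# Remove a key: pop() returns its value, del discards it.
removed = x.pop('b')
del x['c']
print(removed, x)         # 2 {'a': 2}
```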