<a href="https://colab.research.google.com/github/sanglee/python_datascience/blob/master/lecture02.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Python for Data Science

@author: Sangkyun Lee  (sangkyun@korea.ac.kr)

___Python Basics: Control Logic, Functions___

<hr>

### Control


#### Comparison operators

- Greater than (or equal to): `>` (`>=`)
- Less than (or equal to): `<` (`<=`)
- Equal to: `==` (__Notice the double equal sign; a single `=` is assignment__)
- Not equal to: `!=`

#### Logical operators
- and
- or
- not

In [0]:
3 == 5

False

In [0]:
72 >= 2

True

In [0]:
store_name = 'ABC'
store_name == 'abc'

False

In [0]:
# Floats are stored in base 2, so some base-10 fractions (such as 2.2 and 6.6) cannot be represented exactly
print(2.2 * 3.0)
2.2 * 3.0 == 6.6  

6.6000000000000005


False
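
Because of this rounding error, floats are usually compared within a tolerance rather than with `==`. A minimal sketch using `math.isclose()` from the standard library (the variable names are illustrative only):

```python
import math

product = 2.2 * 3.0

# Exact comparison fails because of the rounding error in base 2
exact_match = (product == 6.6)

# math.isclose() compares within a relative tolerance (default rel_tol=1e-09)
close_match = math.isclose(product, 6.6)

print(exact_match, close_match)
```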

In [0]:
a = 3
(a > 0) and (a ** 2 < 10) # square of a

True
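
The other two logical operators, `or` and `not`, work the same way. A small sketch (the variable names are chosen here for illustration):

```python
a = 3

is_small = (a > 0) and (a ** 2 < 10)   # both conditions must hold
is_extreme = (a < -100) or (a > 100)   # at least one condition must hold
is_not_small = not is_small            # flips True <-> False

print(is_small, is_extreme, is_not_small)
```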

#### Control Statement (1):  `if-elif-else`

#### `if`

In [0]:
val = -1

if val > -2:
    print('val is above -2')

val is above -2


`if-else`

In [0]:
if val > -2:
    print('val is > -2')
else:
    print('val is <= -2')

val is > -2


`if-elif-else`

In [0]:
if val == 0:
    print("val is 0")
elif val < 0:
    print("val(%f) is neg" % val)
else:
    print("val(%f) is pos" % val)

val(-1.000000) is neg


Ex. substring match with `in`

In [0]:
store_name = 'Walmart'
if 'almar' in store_name:
    print("The store is likely to be Walmart.")
else:
    print("The store is NOT Walmart")

The store is likely to be Walmart.


Ex. exact string match with `==`

In [0]:
store_name = 'Walmart'
if 'Wal' == store_name:
    print("The store is Wal")
else:
    print("The store is NOT Wal")

The store is NOT Wal
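
Between a substring match and an exact match there are a few useful variants. The sketch below shows a prefix match with `str.startswith()` and a case-insensitive comparison with `str.lower()`; the variable names are illustrative only.

```python
store_name = 'Walmart'

# Prefix match: does the name start with 'Wal'?
starts_with_wal = store_name.startswith('Wal')

# Case-insensitive exact match: convert both sides to lower case first
same_ignoring_case = store_name.lower() == 'WALMART'.lower()

print(starts_with_wal, same_ignoring_case)
```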


---

#### Control Statement (2):  `for` loop

Run the loop body once for each element of a sequence

In [0]:
seq = [1,2,3,4,5]

for i in seq:
    print(i)

1
2
3
4
5


In [0]:
store_name = 'Walmart'

for c in store_name:
    print(c)

W
a
l
m
a
r
t


Useful function: `range()` creates an integer sequence:
- `range(0, 1000)` and `range(1000)` both yield 0, 1, ..., 999

In [0]:
store_name = 'Walmart'
for i in range(len(store_name)):
    print(store_name[i])

W
a
l
m
a
r
t
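
When both the index and the element are needed, the built-in `enumerate()` is a common alternative to `range(len(...))`. A minimal sketch (the `pairs` list exists only to make the result easy to inspect):

```python
store_name = 'Walmart'

pairs = []
for i, c in enumerate(store_name):   # i is the index, c is the character
    pairs.append((i, c))
    print(i, c)
```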


In [0]:
x = range(1,6)
y = [1,2,3,4,5]

print("range == list ? " + str(x == y))

print("type of range: " + str(type(x)))
print("type of list: " + str(type(y)))

print("populate range to list: list(x) =", str(list(x)))
print("list(x) == y ? " + str(list(x) == y))

range == list ? False
type of range: <class 'range'>
type of list: <class 'list'>
populate range to list: list(x) = [1, 2, 3, 4, 5]
list(x) == y ? True
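
A `range` object does not have to be converted to a list to be useful: it supports `len()`, indexing, and membership tests directly. A short sketch, also showing the optional third (step) argument:

```python
r = range(1000)                 # same as range(0, 1000)

print(len(r))                   # number of elements
print(r[0], r[-1])              # first and last element
print(500 in r)                 # membership test, no list needed

evens = list(range(0, 10, 2))   # optional third argument: step size
print(evens)
```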


---
#### Control Statement (3):  `while` loop

Iterate while the given condition is true

In [0]:
x = 1

while x < 5:
    print(x)
    x += 1  # same as x = x+1

1
2
3
4
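
A `while` loop is most natural when the number of iterations is not known in advance. As an illustration (the example itself is an assumption, not part of the lecture), the sketch below counts how many times a number can be halved with integer division before it reaches 1:

```python
n = 100
steps = 0

while n > 1:
    n = n // 2    # integer division: 100 -> 50 -> 25 -> 12 -> 6 -> 3 -> 1
    steps += 1

print(steps)
```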


---
#### Loop control: `break` and `continue`

In [0]:
seq = 'ATCGGAAATT'

for i in range(len(seq)):
    # print(seq[i:i+2])  # uncomment to see each 2-character window
    if seq[i:i+2] == "GG":
        print("Found GG starting at index %d" % i)
        break

Found GG starting at index 3


In [0]:
seq = 'ATCGGAAATT'

for c in seq:
    if c == 'G' or c == 'A':
        continue  # skip G and A, move on to the next character
    print(c)

T
C
T
T
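
`break` and `continue` can also be combined in one loop. The sketch below searches for the first occurrence of a motif; the motif `'AA'` and the convention of using `-1` for "not found" are assumptions made for this example.

```python
seq = 'ATCGGAAATT'
motif = 'AA'      # hypothetical motif chosen for illustration

first = -1        # index of the first match; -1 means "not found"
for i in range(len(seq) - len(motif) + 1):
    if seq[i:i + len(motif)] != motif:
        continue  # no match at this position: try the next index
    first = i
    break         # first match found: leave the loop

print(first)
```

The built-in string method `seq.find(motif)` returns the same index and is the idiomatic choice in practice.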


### Functions

A function is a block of code that takes input arguments (and, optionally, returns values) to perform a specific task.

User-defined function:
```python
def func_name(arg1, arg2, arg3):
    #####################
    # Do something here #
    #####################
    return val   # optional
```

Usage:
```python
output = func_name(arg1, arg2, arg3)
```
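
Since the `return` statement is optional, a function without one still returns something: the special value `None`. A minimal sketch with two hypothetical functions, `greet` and `square`:

```python
def greet(name):
    # No return statement: the function implicitly returns None
    print('Hello, ' + name)

def square(x):
    # Explicit return: the caller receives the computed value
    return x * x

result_greet = greet('Python')
result_square = square(4)

print(result_greet, result_square)
```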

In [0]:
def count_base(seq, base):
    i = 0 # counter
    for c in seq:
        if c == base:
            i += 1
    return i

dna_seq = 'ATGCGGACCTAT'
base = 'C'
n = count_base(dna_seq, base)

# printf-style formatting
print('%s appears %d times in %s' % (base, n, dna_seq))

# or the newer str.format() syntax
print('{BASE} appears {N} times in {DNA}'.format(BASE=base, N=n, DNA=dna_seq))

C appears 3 times in ATGCGGACCTAT
C appears 3 times in ATGCGGACCTAT
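
A third option, available in Python 3.6 and later, is the formatted string literal (f-string). The sketch below is self-contained, so it uses the built-in `str.count()` in place of `count_base()`:

```python
dna_seq = 'ATGCGGACCTAT'
base = 'C'
n = dna_seq.count(base)   # built-in equivalent of count_base(dna_seq, base)

# f-string: expressions inside {} are evaluated and inserted into the string
msg = f'{base} appears {n} times in {dna_seq}'
print(msg)
```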


---
#### Python built-in functions

In [0]:
abs(-3.5)

3.5

In [0]:
list(range(5, 0, -1))

[5, 4, 3, 2, 1]

In [0]:
x = [3, 5, -1, 2]
# IPython/Jupyter only: a trailing ? shows the documentation for sorted()
sorted?

In [0]:
sorted(x, reverse=True)

[5, 3, 2, -1]
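
`sorted()` returns a new list and leaves the original untouched, and its `key` argument controls what the elements are compared by. A short sketch (the variable names are illustrative):

```python
x = [3, 5, -1, 2]

by_abs = sorted(x, key=abs)   # compare elements by absolute value
print(by_abs)
print(x)                      # the original list is unchanged

y = list(x)                   # copy, so that x stays intact
y.sort(reverse=True)          # list.sort() sorts in place and returns None
print(y)
```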

---
#### Library functions

In [0]:
exp(-3)  # fails: exp is not a built-in function

NameError: name 'exp' is not defined

In [0]:
import math

math.exp(-3)

0.049787068367863944

In [0]:
from math import exp

exp(-3)

0.049787068367863944

In [0]:
from math import *

log10(10)

1.0
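
One caveat with `from math import *` is that it can silently replace names that already exist. For example, `math` defines its own `pow`, which differs from the built-in `pow`. The sketch below keeps the module prefix so that both versions stay distinguishable:

```python
import math

builtin_result = pow(2, 3)     # built-in pow: integers in, integer out
math_result = math.pow(2, 3)   # math.pow: always returns a float

print(builtin_result, math_result)
print(type(builtin_result).__name__, type(math_result).__name__)
```

After a star import, a plain `pow(2, 3)` would call `math.pow` instead of the built-in, which is one reason why explicit imports are often preferred.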

Ex. generate a random DNA string

In [0]:
import random

alphabet = list('ATGC')
N = 100

dna = [random.choice(alphabet) for i in range(N)]   # list comprehension
print(''.join(dna))

AGTTTCGCATATGGGTCGCTTAATGGATTACGTATCAGAGATATTCGTTTGATCGAATTATACGCCGGTGCTCTATTCTCCGACAAGTGGCTTAAAGTTA
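
Each run of the cell above produces a different string. When a reproducible result is needed, the generator can be seeded with `random.seed()`. A minimal sketch, where the seed value `0` is an arbitrary choice:

```python
import random

alphabet = list('ATGC')
N = 100

random.seed(0)   # fix the seed so the sequence can be reproduced
dna1 = ''.join(random.choice(alphabet) for _ in range(N))

random.seed(0)   # same seed again: the same choices are made
dna2 = ''.join(random.choice(alphabet) for _ in range(N))

print(dna1 == dna2)
```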


---
#### External libraries

Many useful third-party packages are available: __numpy__, __scipy__, __matplotlib__, __pandas__, __tensorflow__, etc.

In [0]:
import numpy as np # use a short name instead

In [0]:
x = np.zeros([1,4])
print(x)

[[0. 0. 0. 0.]]


In [0]:
print(x.shape)

(1, 4)


Ex. count the occurrences of each DNA base

In [0]:
alphabet = list('ATGC')
N = 100
dna_str = [random.choice(alphabet) for i in range(N)]

In [0]:
dna_str

['G', 'A', 'C', 'T', 'T', 'T', 'A', 'T', 'T', 'C', 'T', 'C', 'A', 'A', 'T']

In [0]:
alpha_cnt = np.zeros([1, len(alphabet)])
print(alpha_cnt.shape)

(1, 4)


In [0]:
for i in range(len(alphabet)):
    alpha_cnt[0,i] = count_base(dna_str, alphabet[i])

In [0]:
print(alphabet)
print(alpha_cnt)

['A', 'T', 'G', 'C']
[[24. 30. 27. 19.]]
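
The same counts can be obtained without an explicit loop over the alphabet. As an alternative sketch (not the approach used above), `collections.Counter` from the standard library tallies all elements in one pass:

```python
import random
from collections import Counter

alphabet = list('ATGC')
N = 100
dna_str = [random.choice(alphabet) for _ in range(N)]

counts = Counter(dna_str)   # maps each base to its number of occurrences

for base in alphabet:
    print(base, counts[base])
```

The counts differ from run to run because the sequence is random, but they always add up to `N`.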


---
## Summary

- Operators
    - Comparison ops: >, >=, <, <=, ==, !=
    - Logical ops: and, or, not
- Control Statements
    - if-elif-else, for, while
    - break, continue
- Functions
    - User-defined
    - Built-in functions
    - Standard library functions (e.g. `math`, `random`)
    - External Library functions