# Environment Setup

## Notebook Setup

In [1]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:75% !important; margin-left:350px; }</style>"))

## Most Common Libraries

In [5]:
import numpy as np
import pandas as pd
import datetime as dt

import matplotlib
import matplotlib.pyplot as plt

from plydata import define, query, select, group_by, summarize, arrange, head, rename
import plotnine
from plotnine import *

### numpy
- Large multi-dimensional arrays and matrices  
- High-level mathematical functions to operate on them 
- Efficient array computation, modeled after MATLAB  
- Supports vectorized array math functions (implemented in C, hence faster than a Python for loop over a list)  


### scipy
- Collection of mathematical algorithms and convenience functions built on the numpy extension  
- Built upon **numpy**

### Pandas
- Data manipulation and analysis 
- Offer data structures and operations for manipulating numerical tables and time series  
- Good for analyzing tabular data  
- Use for exploratory data analysis, data pre-processing, statistics and visualization
- Built upon **numpy**

### scikit-learn
- Machine learning functions  
- Built on top of scipy

### matplotlib
- Data Visualization

## Magic Functions

- IPython has a set of predefined ‘magic functions’ that you can call with a command-line style syntax  
- There are two types of magics:  
    - **Line Magic**  
    Prefixed with %, they work much like OS command-line calls: they take the rest of the line as an argument, with arguments passed without parentheses or quotes. Line magics can return results and can be used on the right-hand side of an assignment  
    - **Cell Magic**  
    Prefixed with %%, they take as arguments not only the rest of the line but also the lines below it, passed as a separate argument.

### %matplotlib
Renders plots inline in the frontend (Jupyter Notebook), so the output is stored in the notebook document

In [6]:
%matplotlib inline

## Package Management
### Conda

#### Conda Environment

In [7]:
!conda info


     active environment : base
    active env location : C:\ProgramData\Anaconda3
            shell level : 1
       user config file : C:\Users\keh-soon.yong\.condarc
 populated config files : C:\Users\keh-soon.yong\.condarc
          conda version : 4.5.4
    conda-build version : 3.0.27
         python version : 3.6.5.final.0
       base environment : C:\ProgramData\Anaconda3  (read only)
           channel URLs : https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/win-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/win-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/pro/win-64
                          https://repo.anaconda.com/pkgs/pro/noarch
                          https://repo.anaconda.com/pkgs/msy

#### Package Version

In [8]:
!conda list

# packages in environment at C:\ProgramData\Anaconda3:
#
# Name                    Version                   Build  Channel
_ipyw_jlab_nb_ext_conf    0.1.0            py36he6757f0_0  
alabaster                 0.7.10           py36hcd07829_0  
anaconda                  custom           py36h363777c_0  
anaconda-client           1.6.14                   py36_0  
anaconda-navigator        1.8.7                    py36_0  
anaconda-project          0.8.0            py36h8b3bf89_0  
asn1crypto                0.22.0           py36h8e79faa_1  
astroid                   1.5.3            py36h9d85297_0  
astropy                   2.0.2            py36h06391c4_4  
babel                     2.5.0            py36h35444c1_0  
backports                 1.0              py36h81696a8_1  
backports.shutil_get_terminal_size 1.0.0            py36h79ab834_2  
beautifulsoup4            4.6.0            py36hd4cc5e8_1  
bitarray                  0.8.1            py36h6af124b_0  
bkcharts                  0

#### Package Installation
Anaconda (with its conda package manager) is the recommended distribution.  

To install from **official** conda channel:
```
conda install <package_name>  # always install latest
conda install <package_name=version_number>
```
To install from **conda-forge community** channel:
```
conda install -c conda-forge <package_name>
conda install -c conda-forge <package_name=version_number>
```

```
# conda official channel
conda install numpy
conda install scipy
conda install pandas
conda install matplotlib
conda install scikit-learn
conda install seaborn
conda install pip

# conda-forge community channel
conda install -c conda-forge plotnine
```

### PIP
Use **pip** if the package is not available in conda.

#### Package Version

In [9]:
!pip list

Package                            Version            
---------------------------------- -------------------
alabaster                          0.7.10             
anaconda-client                    1.6.14             
anaconda-navigator                 1.8.7              
anaconda-project                   0.8.0              
asn1crypto                         0.22.0             
astroid                            1.5.3              
astropy                            2.0.2              
babel                              2.5.0              
backports.shutil-get-terminal-size 1.0.0              
beautifulsoup4                     4.6.0              
bitarray                           0.8.1              
bkcharts                           0.2                
blaze                              0.11.3             
bleach                             2.0.0              
bokeh                              0.12.16            
boto                               2.48.0             
Bottleneck

#### Package Installation
```
pip install <package_name>

pip install plydata
```

# Python Fundamental

## Variable and Values
- Every variable in Python refers to an **object**  
- Every variable assignment is **reference based**, that is, each variable holds a reference to the memory block that stores the object's data 

In [10]:
# a,b refer to the same memory location
a = 123
b = a  
print ('Data of a =',a,'\nData of b =',b)
print ('ID of a = ', id(a))
print ('ID of b = ',id(b))

Data of a = 123 
Data of b = 123
ID of a =  1881045888
ID of b =  1881045888


Changing the value through assignment actually rebinds the variable to a different object (the reference changes)

In [11]:
a = 123
b = a
a = 456  # reassignment changes a's memory reference
         # b's memory reference is not changed
print ('Data of a =',a,'\nData of b =',b)
print ('ID of a = ', id(a))
print ('ID of b = ',id(b))

Data of a = 456 
Data of b = 123
ID of a =  2208390210320
ID of b =  1881045888


## Assignment

### Multiple Assignment

In [12]:
x = y = 3
print (x,y)

3 3


### Augmented Assignment

In [13]:
x = 1
y = x + 1
y += 1
print (y)

3


### Unpacking Assignment

In [14]:
x,y = 1,3
print (x,y)

1 3


# Built-in Data Types

## Numbers

### Integer

In [15]:
n = 123
type (n)

int

### Float

In [16]:
f = 123.4
type (f)

float

### Number Operators

**Division** always returns float

In [17]:
print(4/2)  # returns float
type(4/2)

2.0


float

**Floor Division** (`//`) rounds down to the nearest whole number and returns int or float

In [18]:
print (8//3)    # return int
print (8//3.2)  # return float

2
2.0
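
Floor division and truncation only differ for negative operands. The sketch below is an illustration (the variable names are hypothetical) of how `//` rounds toward negative infinity while `int()` drops the fractional part, and how `%` stays consistent with `//`.

```python
# Floor division rounds toward negative infinity; int() truncates toward zero.
floor_result = -8 // 3        # floor of -2.67 is -3
trunc_result = int(-8 / 3)    # truncation of -2.67 is -2

# The remainder is defined so that (a // b) * b + (a % b) == a
remainder = -8 % 3

print(floor_result, trunc_result, remainder)
```

For positive operands the two approaches give the same result, which is why the difference is easy to miss.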


**Remainder** returns either float or int

In [19]:
print (8%3)    # return int
print (8%3.2)  # return float

2
1.5999999999999996


**Power** returns int or float

In [20]:
print (2**3)    # return int
print (2.1**3)  # return float
print (2**3.1)  # return float

8
9.261000000000001
8.574187700290345


## String
A string is an **ordered sequence of characters**  

### Immutable
- String is **immutable**. Changing its content in place results in an **error** 
- Assigning a new string to the variable changes the reference completely

In [21]:
s = 'abcde'
print ('s : ', id(s))
s = 'efgh'
print ('s : ', id(s))

s :  2208390536864
s :  2208390535744


In [22]:
# s[1] = 'z'   # raises TypeError: 'str' object does not support item assignment

### Slicing
```
string[start:end:step]  # default step:1, start:0, end:last
```
If step is negative, the slice runs backwards, so the start index must be higher than the end index

In [23]:
s = 'abcdefghijk'
print (s[0])      # first letter
print (s[:3])     # first 3 letters
print (s[2:8:2])  # stepping
print (s[-1])     # last letter
print (s[-3:])    # last three letters
print (s[::-1])   # reverse everything
print (s[8:2:-1]) # index 8 down to 3, reverse order

a
abc
ceg
k
ijk
kjihgfedcba
ihgfed


### Searching
```
string.find(sub)  # returns the index of the first occurrence, or -1 if not found
```

In [24]:
s='I love karaoke, I know you love it oo'
print (s.find('lov'))
print (s.find('kemuning'))

2
-1


### Concatenating Strings

In [25]:
'this is ' + 'awesome'

'this is awesome'

### Splitting Strings
The delimiter is specified as an argument. Observe that the whitespace is preserved in the resulting list

In [26]:
animals = 'a1,a2 ,a3, a4'
animals.split(',')

['a1', 'a2 ', 'a3', ' a4']

### Stripping Leading and Trailing Whitespace

In [27]:
filename = '  myexce l.   xls   '
filename.strip()

'myexce l.   xls'

### Convert to Upper/Lower Case

In [28]:
filename = 'myEXEel.xls'
filename.upper()

'MYEXEEL.XLS'

In [29]:
filename.lower()

'myexeel.xls'

## Boolean

In [30]:
b = False

if (b):
    print ('It is true')
else:
    print ('It is fake')
    

It is fake


### What is Considered False ?
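All values below are falsy. Other zero numbers (e.g. `0.0`) and other empty containers (e.g. `{}`, `set()`) are falsy too; most other objects are truthy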

In [31]:
print (bool(0))      # zero
print (bool(None))   # none
print (bool(''))     # empty string
print (bool([]))     # empty list
print (bool(()))     # empty tuple
print (bool(False))  # False
print (bool(2-2))    # expression that evaluates to any value above

False
False
False
False
False
False
False


### 'and' operator
- **and** can return different data types  
- If the evaluated result is **True**, the last **truthy value** is returned (because Python needs to evaluate up to the last value)  
- If the evaluated result is **False**, the first **falsy value** is returned (because Python returns immediately when it detects a falsy value)

In [32]:
print (123 and 2 and 1)
print (123 and () and 2)

1
()


### 'not' operator

In [33]:
not (True or False)

False

### 'or' operator
- **or** can return different data types  
- If the evaluated result is True, the first **truthy value** is returned (the right-hand side values **need not be evaluated**)  
- If the evaluated result is False, the last **falsy value** is returned (all items must be evaluated before concluding False)

In [34]:
print (1 or 2)
print (0 or 1 or 1)
print (0 or () or [])

1
1
[]
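
Because `and` and `or` return one of their operands and stop evaluating early, they are sometimes used for defaults and guards. The following is a small sketch of that idea; the function names `name_or_default` and `first_item_or_empty` are hypothetical and chosen only for illustration.

```python
def name_or_default(user_input):
    # `or` returns the first truthy operand, so an empty string falls back to the default
    return user_input or 'anonymous'

def first_item_or_empty(items):
    # `and` stops at the first falsy operand, so items[0] is never evaluated for an empty list
    return items and items[0]

print(name_or_default(''))
print(name_or_default('Yong'))
print(first_item_or_empty([]))
print(first_item_or_empty(['a', 'b']))
```

Note that the `or` default also replaces legitimate falsy inputs such as `0`, so an explicit `is None` check is safer when those values are meaningful.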


## None
- None is a Python **object**  
- Most operations on None (for example arithmetic) result in an **error**  
- A NumPy array that contains None falls back to the generic object dtype. Each element must then be checked individually to determine whether it is None, which is computationally heavy 

In [35]:
t = np.array([1,2,3,4,5])
t.dtype  #  its an integer

dtype('int32')

In [36]:
t = np.array([1, 2, 3, None, 4, 5])
t.dtype  # it's an object

dtype('O')
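
The behaviour of None in plain Python can be sketched as follows. This is a minimal illustration with made-up data: arithmetic on None raises `TypeError`, and the conventional way to filter it out is an identity test with `is not None`, one element at a time.

```python
values = [1, 2, None, 4]

# Arithmetic on None raises TypeError
try:
    None + 1
    error_raised = False
except TypeError:
    error_raised = True

# Check each element individually with an identity test
cleaned = [v for v in values if v is not None]
total = sum(cleaned)

print(error_raised, cleaned, total)
```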

# Built-In Data Structure

## Tuple
A tuple is an **immutable sequence**, similar to a list that cannot be modified. Any attempt to change or update a tuple raises an error.

Benefits of a tuple over a list:
- **Faster** than a list
- **Protects** your data against accidental change
- Can be used as a key in dictionaries (when all its items are hashable); a list can't
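
The dictionary-key benefit can be illustrated with a short sketch. The `grid` dictionary below is hypothetical example data; the point is that a tuple key works while a list key raises `TypeError` because lists are not hashable.

```python
# Tuples of hashable items can be used as dictionary keys
grid = {(0, 0): 'start', (2, 3): 'goal'}
label = grid[(2, 3)]

# A list cannot be used as a key
try:
    bad = {[0, 0]: 'start'}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(label, list_key_allowed)
```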

### Assignment

#### (item1, item2, item3)
This is the formal syntax for defining a tuple: items inside (  )

In [37]:
t = (1,2,3,'o','apple')
t

(1, 2, 3, 'o', 'apple')

In [38]:
type(t)

tuple

#### item1, item2, item3
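- Without (  ) notation, comma-separated items are also treated as a tuple  
- However, parentheses are needed wherever the commas would be ambiguous, for example when passing a tuple as a single function argument 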

In [39]:
1,2,3,'o','apple'

(1, 2, 3, 'o', 'apple')

### Accessing

In [40]:
print (t[1])
print (type(t[1]))

2
<class 'int'>


In [41]:
print (t[1:3])
type (t[1:3])    # slicing a tuple returns a tuple

(2, 3)


tuple

## List
- List is a collection of **ordered** items, where the items **can be different data types**  
- You can pack list of items by placing them into []  
- List is mutable

### Creating List
**Creating Empty List**

In [42]:
empty = []      # literal assignment method
empty = list()  # constructor method
print (empty)
type(empty)

[]


list

**Creating List using Literal Assignment Method**  
- **Multiple data types** are allowed in a list

In [43]:
mylist = [123,'abc',456]

In [44]:
food = ['bread','noodle','rice','biscuit']

**Creating List using Constructor Method**  
- Note that **list(string)** will split the string into letters

In [45]:
list('hello')

['h', 'e', 'l', 'l', 'o']

**Creating List using split() method**  
- Splits on whitespace (by default) to create the list items

In [46]:
'a bunch of words'.split()

['a', 'bunch', 'of', 'words']

- Split can also break a string into items based on a specified delimiter

In [47]:
'a1,a2,a3, a4'.split(',')

['a1', 'a2', 'a3', ' a4']

### Accessing Items

**Access specific index number**

In [48]:
food = list(['bread', 'noodle', 'rice', 'biscuit','jelly','cake'])
print (food[2])  # 3rd item
print (food[-1]) # last item

rice
cake


**Access range of indexes**

In [49]:
print (food[:4])   # first 4 items
print (food[-3:])  # last 3 items
print (food[1:5])  # index 1 to 4
print (food[5:2:-1]) # index 5 down to 3, reverse order
print (food[::-1]) # reverse order

['bread', 'noodle', 'rice', 'biscuit']
['biscuit', 'jelly', 'cake']
['noodle', 'rice', 'biscuit', 'jelly']
['cake', 'jelly', 'biscuit']
['cake', 'jelly', 'biscuit', 'rice', 'noodle', 'bread']


### Properties

**Total Number of Items**

In [50]:
len(food)

6

### Remove Item(s)
**Search and remove first occurrence** of an item

In [51]:
food = list(['bread', 'noodle', 'rice', 'biscuit','jelly','cake','noodle'])
food.remove('noodle')
print (food)

['bread', 'rice', 'biscuit', 'jelly', 'cake', 'noodle']


**Remove last item**

In [52]:
food.pop()
print (food)

['bread', 'rice', 'biscuit', 'jelly', 'cake']


**Remove item at specific position**

In [53]:
food.pop(1)  # index starts from 0
print(food)

['bread', 'biscuit', 'jelly', 'cake']


### Appending Item(s)

**Append One Item**

In [54]:
food.append('jelly')
print (food)

['bread', 'biscuit', 'jelly', 'cake', 'jelly']


**Append Multiple Items**  
**extend()** expands the list/tuple argument and appends its elements as multiple items

In [55]:
food.extend(['nand','puff'])
print (food)

['bread', 'biscuit', 'jelly', 'cake', 'jelly', 'nand', 'puff']


### Concatenating Multiple Lists

**Concatenating Lists**  
The '+' operator concatenates lists; the '-' operator is not supported

In [56]:
['dog','cat','horse'] + ['elephant','tiger'] + ['sheep']

['dog', 'cat', 'horse', 'elephant', 'tiger', 'sheep']
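
Since '-' is not defined for lists, removing the items of one list from another needs a different approach. One common option, sketched here with made-up data, is a list comprehension that keeps the original order.

```python
animals = ['dog', 'cat', 'horse', 'elephant', 'tiger']
to_remove = ['cat', 'tiger']

# list - list raises TypeError
try:
    animals - to_remove
    minus_supported = True
except TypeError:
    minus_supported = False

# Keep only the items that are not in to_remove, preserving order
remaining = [a for a in animals if a not in to_remove]

print(minus_supported, remaining)
```

If order and duplicates do not matter, converting both lists to sets and using the set difference is an alternative.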

### Other Methods ###

**Reversing the order of the items**

In [57]:
food.reverse()
food

['puff', 'nand', 'jelly', 'cake', 'jelly', 'biscuit', 'bread']

**Locating the Index Number of An Item**

In [58]:
food.index('biscuit')

5

**Sorting The Order of Items**

In [59]:
food.sort()
print (food)

['biscuit', 'bread', 'cake', 'jelly', 'jelly', 'nand', 'puff']


### List is Mutable
The list variable keeps the same reference (id) after items are added or removed in place

In [60]:
food = ['cake','jelly','roti','noodle']
print ('food : ',id(food))
food += ['salad','chicken']
print ('food : ',id(food))

food :  2208390642696
food :  2208390642696


In [61]:
x = [1,2,3]
y = [x,'abc']
print (y)
x[2] = 'k'
print (y)

[[1, 2, 3], 'abc']
[[1, 2, 'k'], 'abc']


Default argument values are evaluated only once, when the function is defined. The default list (some_list) is mutable, so every call that omits the argument appends to the same list object

In [62]:
def spam (elem, some_list=[]):
    some_list.append(elem)
    return some_list

print (spam(1))
print (spam(2))
print (spam(3))

[1]
[1, 2]
[1, 2, 3]
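
The accumulating output above comes from the default list being created once and then reused across calls. A common way to avoid sharing it is to default to `None` and create the list inside the function. The sketch below uses a hypothetical name, `spam_fixed`, to show the idea.

```python
def spam_fixed(elem, some_list=None):
    # Create a fresh list on each call when the caller does not supply one
    if some_list is None:
        some_list = []
    some_list.append(elem)
    return some_list

print(spam_fixed(1))
print(spam_fixed(2))
print(spam_fixed(3, [1, 2]))
```

Each call without a list now starts from an empty list, while callers can still pass their own list to extend.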


## Dictionaries
A dictionary is a collection of key-value pairs.

### Creating dict
**Creating dict with literals**

Simple Dictionary

In [63]:
animal_counts = { 'cats' : 2, 'dogs' : 5, 'horses':4}
print (animal_counts)

{'cats': 2, 'dogs': 5, 'horses': 4}


Dictionary with list

In [64]:
animal_names = {'cats':   ['Walter','Ra'],
                'dogs':   ['Jim','Roy','John','Lucky','Row'],
                'horses': ['Sax','Jack','Ann','Jeep']
               }
print (animal_names)

{'cats': ['Walter', 'Ra'], 'dogs': ['Jim', 'Roy', 'John', 'Lucky', 'Row'], 'horses': ['Sax', 'Jack', 'Ann', 'Jeep']}


**Creating dict with variables**

In [65]:
cat_names   = ['Walter','Ra','Jim']
dog_names   = ['Jim','Roy','John','Lucky','Row']
horse_names = ['Sax','Jack','Ann','Jeep']
animal_names = {'cats': cat_names,'dogs': dog_names, 'horses': horse_names}
animal_names

### Accessing dict
Find out the list of keys using **keys()**  

In [66]:
print (animal_counts.keys())
print (sorted(animal_counts.keys()))

dict_keys(['cats', 'dogs', 'horses'])
['cats', 'dogs', 'horses']


Find out the list of values using **values()**

In [67]:
print (animal_counts.values())
print (sorted(animal_counts.values()))

dict_values([2, 5, 4])
[2, 4, 5]


**Refer to a dictionary item using its key**

In [68]:
animal_counts['dogs']

5

**Accessing a non-existent key with [ ] raises KeyError**

In [69]:
# animal_counts['cow']   # raises KeyError: 'cow'

**Accessing a non-existent key** with **get()** returns None

In [70]:
print (animal_counts.get('cow'))

None
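
`get()` also accepts a second argument that is returned instead of None when the key is missing. The sketch below re-creates the `animal_counts` dictionary from above so that it runs on its own, and also shows the `in` membership test on keys.

```python
animal_counts = {'cats': 2, 'dogs': 5, 'horses': 4}

cows = animal_counts.get('cow', 0)     # fall back to 0 when the key is missing
has_cow = 'cow' in animal_counts       # membership test on keys
total = sum(animal_counts.values())

print(cows, has_cow, total)
```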


## Sets
A set is an unordered collection of unique items

In [71]:
myset = {'a','b','c','d','a','b','e','f','g'}
print (myset) # notice no repetition values

{'f', 'a', 'e', 'g', 'c', 'b', 'd'}


### Membership Test

In [72]:
print ('a' in myset)      # is member ?
print ('f' not in myset)  # is not member ?

True
False


### Subset Test
Subset Test : <=  
Proper Subset Test : <

In [73]:
mysubset = {'d','g'}
mysubset <= myset

True

A proper subset test checks that the superset **contains at least one element** that is not in the subset. Two equal sets are subsets of each other, but neither is a proper subset of the other

In [74]:
mysubset = {'b','a','d','c','e','f','g'}
print ('Is Subset : ', mysubset <= myset)
print ('Is Proper Subset : ', mysubset < myset)

Is Subset :  True
Is Proper Subset :  False


### Union using '|'

In [75]:
{'a','b','c'} | {'e','f'}

{'a', 'b', 'c', 'e', 'f'}

### Intersection using '&'
Any elements that exist in both the left and right sets

In [76]:
{'a','b','c','d'} & {'c','d','e','f'}

{'c', 'd'}

### Difference using '-'
Anything in **left** that is **not in right** 

In [77]:
{'a','b','c','d'} - {'c','d','e','f'}

{'a', 'b'}

## range
**range(X)** generates a sequence of integers, represented by a lazy range object
```
range (lower_bound, upper_bound, step_size)  
# lower bound is optional, default = 0
# upper bound is not included in result
# step is optional, default = 1
```

**Use list() to convert in order to view actual sequence of data**

In [78]:
r = range(10)     # default lower bound =0, step =1
print (type (r))
print (r)
print (list(r))

<class 'range'>
range(0, 10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


**More Examples**

In [79]:
print (list(range(2,8)))    # step not specified, default 1
print ('Odd Numbers : ' , list(range(1,10,2))) # generate odd numbers

[2, 3, 4, 5, 6, 7]
Odd Numbers :  [1, 3, 5, 7, 9]


# Control and Loops

## If Statement

In [80]:
price = 102
if price <100:
    print ('buy')
elif price < 110:
    print ('hold')
elif price < 120:
    print ('think about it')
else:
    print ('sell')
print('end of programming')

hold
end of programming


## For Loops

### Loop through 'range'

In [81]:
for i in range (1,10,2):
    print ('Odd Number : ',i) 

Odd Number :  1
Odd Number :  3
Odd Number :  5
Odd Number :  7
Odd Number :  9


### Loop through 'list'

In [82]:
letters = ['a','b','c','d']
for e in letters:
    print ('Letter : ',e)

Letter :  a
Letter :  b
Letter :  c
Letter :  d


### Loop Through 'Dictionary'

In [83]:
d = {"x": 1, "y": 2}
for key in d:
    print (key, d[key])

x 1
y 2
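
Looping over a dictionary directly yields its keys. As an alternative sketch, `items()` yields key-value pairs, which avoids the extra lookup `d[key]` inside the loop; the `pairs` list is only there to collect the results for inspection.

```python
d = {"x": 1, "y": 2}

pairs = []
for key, value in d.items():   # each item is a (key, value) tuple
    pairs.append((key, value))
    print(key, value)
```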


## Generators

- A generator is lazy: it produces items only when asked for, hence it is more memory efficient
- A generator is a **function** that uses 'yield' instead of 'return'  
- A generator contains one or more yield statements  
- When called, it returns an object (iterator) but **does not start execution** immediately  
- Methods like `__iter__()` and `__next__()` are implemented automatically, so we can iterate through the items using **next()**  
- Once the function yields, the **function is paused** and control is transferred to the caller  
- Local variables and their states are **remembered** between successive calls  
- Finally, when the function **terminates**, **StopIteration** is raised automatically on further calls

### Basic Generator Function
The example below shows how a generator works step by step

In [84]:
def my_gen():
    n = 1
    print('This is printed first')
    # Generator function contains yield statements
    yield n

    n += 1
    print('This is printed second')
    yield n

    n += 1
    print('This is printed at last')
    yield n

In [85]:
a = my_gen()
type(a)

generator

In [86]:
next(a)

This is printed first


1

In [87]:
next(a)

This is printed second


2

In [88]:
next(a)

This is printed at last


3

In [107]:
next(a)

StopIteration: 

### Useful Generator Function
Generators are most useful in combination with a **for-loop**:
- for-loop within generator
- for-loop to iterate through a generator

In [89]:
def rev_str(my_str):
    length = len(my_str)
    for i in range(length - 1,-1,-1):
        yield my_str[i]

In [90]:
for c in rev_str("hello"):
     print(c)

o
l
l
e
h


### Generator Expression
Use () to create an anonymous generator

In [91]:
my_list = [1, 3, 6, 10]
a = (x**2 for x in my_list)

In [92]:
next(a)

1

In [93]:
next(a)

9

In [94]:
sum(a) # sum of the remaining squares: 36 + 100

136

### Compare to Iterator Class

In [95]:
class PowTwo:
    def __init__(self, max = 0):
        self.max = max

    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        if self.n > self.max:
            raise StopIteration

        result = 2 ** self.n
        self.n += 1
        return result

**A generator is more concise and cleaner**

In [None]:
def PowTwoGen(max = 0):
    n = 0
    while n <= max:   # inclusive bound, so it yields the same values as PowTwo
        yield 2 ** n
        n += 1
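
To compare the two approaches side by side, the sketch below redefines both under hypothetical names (`PowTwoIter`, `pow_two_gen`) and collects their output into lists. The loop bound matters here: the class stops only once `n` exceeds the maximum, so the generator in this sketch uses `<=` to include the final exponent as well.

```python
class PowTwoIter:
    """Iterator class yielding 2**0 .. 2**max_exp (inclusive)."""
    def __init__(self, max_exp=0):
        self.max_exp = max_exp

    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        if self.n > self.max_exp:
            raise StopIteration
        result = 2 ** self.n
        self.n += 1
        return result

def pow_two_gen(max_exp=0):
    """Generator yielding the same values as PowTwoIter."""
    n = 0
    while n <= max_exp:
        yield 2 ** n
        n += 1

from_class = list(PowTwoIter(3))
from_gen = list(pow_two_gen(3))
print(from_class, from_gen)
```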

# Library and Functions
A library is a group of functions

## Package Source

### Conda
- Package manager for any language  
- Installs prebuilt binaries

### PIP
- Package manager for Python only  
- Installs prebuilt wheels when available, otherwise builds from source  
- Stands for Pip Installs Packages  
- Python's officially-sanctioned package manager, and is most commonly used to install packages published on the **Python Package Index (PyPI)**  
- Both pip and PyPI are governed and supported by the Python Packaging Authority (PyPA).

## Importing Library

There are two ways to import library functions:  

**Standalone Namespace**
```
import <libName>                        # access function through: libName.functionName
import <libName> as <shortName>         # access function through: shortName.functionName
```
**Global Namespace**
```
from   <libName> import *               # all functions available at global namespace
from   <libName> import <functionName>  # access function through: functionName    
from   <libName> import <functionName> as <shortFunctionName>  # access function through shortFunctionName
```

### Import Entire Library

#### Import Into Standalone Namespace

In [None]:
import math
math.sqrt(9)

Use **as** to alias the library name. This is useful for shortening long names or avoiding conflicting library names

In [None]:
import math as m
m.sqrt(9)

#### Import Into Global Name Space
All functions in the library are accessible through the global namespace
```
from <libName> import *
```
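
One risk of importing names into the global namespace is that they can silently replace existing names. The sketch below illustrates this with `pow`: the built-in `pow` returns an int for int arguments, whereas `math.pow` always returns a float. The variable names are hypothetical.

```python
import math

builtin_result = pow(2, 3)        # built-in pow: 2**3 as an int

from math import pow              # shadows the built-in pow from here on
shadowed_result = pow(2, 3)       # math.pow always returns a float

namespaced_result = math.sqrt(9)  # a standalone namespace avoids the clash

print(type(builtin_result), type(shadowed_result), namespaced_result)
```

This is one reason the standalone namespace forms are often preferred outside of short interactive sessions.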

### Import Specific Function

In [None]:
from math import sqrt
print (sqrt(9))

Use **as** for aliasing function name

In [None]:
from math import sqrt as sq
print (sq(9))

![alt text](img/ml_libraries.jpg)

## Define Function

### Function Arguments
By default, arguments are assigned to parameters by position, left to right

In [None]:
def myfun(x,y):
    print ('x:',x)
    print ('y:',y)
    
myfun(5,8)

However, you can also pass arguments by name (keyword arguments) in the function call

In [None]:
myfun (y=8,x=5)

A function can have **default argument values**

In [None]:
def myfun(x=1,y=1):  # default argument value is 1
    print ('x:',x)
    print ('y:',y)
    
myfun(5)  # pass only one argument

### List Within Function

Default argument values are evaluated only once, when the function is defined. The default list (some_list) is mutable, so every call that omits the argument appends to the same list object

In [None]:
def spam (elem, some_list=[]):
    some_list.append(elem)
    return some_list

print (spam(1))
print (spam(2))
print (spam(3))

### Return Statement

In [None]:
def bigger(x,y):
    if (x>y):
        return x
    else:
        return y
    
print (bigger(5,8))

### No Return Statement
If there is no **return** statement, Python returns **None**

In [None]:
def dummy():
    print ('This is a dummy function, return no value')

print (dummy())

### Return Multiple Value
Multiple values are returned as a **tuple**. Use unpacking assignment (`a,b = ...`) to bind each value to its own variable

In [None]:
def minmax(x,y,z):
    return min(x,y,z), max(x,y,z)

a,b = minmax(7,8,9)     # multiple assignment
c   = minmax(7,8,9)     # tuple

print (a,b)
print (c)    

### Passing Function as Argument 
You can pass a function as an argument to another function

In [None]:
def myfun(x,y,f):
    f(x,y)

myfun('hello',54,print)
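
Passing functions becomes more useful when the receiving function uses the result. The sketch below introduces two hypothetical helpers, `apply_twice` and `add_three`, and also shows the built-in `max` accepting a function through its `key` parameter.

```python
def apply_twice(f, x):
    # Call the passed-in function on x, then again on the result
    return f(f(x))

def add_three(n):
    return n + 3

result = apply_twice(add_three, 10)

# Built-ins accept functions too: key=len compares items by their length
longest = max(['bread', 'noodle', 'rice'], key=len)

print(result, longest)
```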

# Object Oriented Programming

## Defining Class

- Every instance method of a class takes the instance as its first parameter, conventionally named **self**
- Use `__init__` as the constructor method. Defining `__init__` is optional

In [17]:
class Person:
    wallet = 0  # class attribute, used as a default until an instance sets its own value
    def __init__(self, myname,money=0):   # constructor
        self.name = myname
        self.wallet=money
    def say_hi(self):
        print('Hello, my name is : ', self.name)
    def say_bye(self):
        print('Goodbye', self.name)
    def take(self,amount):
        self.wallet+=amount
    def balance(self):
        print('Wallet Balance:',self.wallet)

## Object Class Assignment

In [18]:
#p = Person() ## this will fail, as the constructor expect a parameter
p1 = Person('Yong')  
p2 = Person('Gan',200)

## Calling Method

In [19]:
p1.say_hi()
p1.balance()

Hello, my name is :  Yong
Wallet Balance: 0


In [20]:
p2.say_hi()
p2.balance()

Hello, my name is :  Gan
Wallet Balance: 200


## Getting Property

In [24]:
p1.wallet

0

In [23]:
p2.wallet

200

## Setting Property

In [25]:
p1.wallet = 900
p1.wallet

900
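
Setting a property on one object does not affect other objects or the class itself. The sketch below uses a hypothetical `Account` class (not the `Person` class above) to show how a class attribute acts as a shared default, while assignment through an instance creates an attribute on that instance only.

```python
class Account:
    wallet = 0                      # class attribute: shared default

    def __init__(self, name):
        self.name = name            # instance attribute: one per object

a = Account('A')
b = Account('B')

a.wallet = 900                      # creates an instance attribute on a only

print(a.wallet, b.wallet, Account.wallet)
print('wallet' in a.__dict__, 'wallet' in b.__dict__)
```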

# datetime Library

## Data Types
The datetime library provides **four data types**:  
- **date** (year,month,day)  
- **time** (hour,minute,second)  
- **datetime** (year,month,day,hour,minute,second)  
- **timedelta**: duration between two datetime or date object

### date object

In [None]:
dt.date(2000,1,1)

In [None]:
dt.date(year=2000,month=1,day=1)

### datetime object

In [None]:
dt.datetime(2000,1,1,0,0,0)

In [None]:
dt.datetime(year=2000,month=1,day=1,hour=23,minute=15,second=55)

### time object
There is no dedicated function that returns only the current time. Call **time()** on a **datetime** object instead (for example `dt.datetime.now().time()`). A **time** object can also be constructed directly:

In [None]:
dt.time(2)   # a single positional argument is the hour

In [None]:
dt.time(2,15) #default two arguments, hour, minute

In [None]:
dt.time(hour=2,minute=15,second=30)

### timedelta
- The **years** argument is **not supported**  
- Apply timedelta to a **datetime** (or **date**) object  
- timedelta **cannot** be applied to a **time object**, because a timedelta can potentially go beyond a single day (24H)

In [None]:
delt = dt.timedelta(days=5,minutes=33,seconds=15)
d = dt.datetime.now()

print ('delt+d : ', delt + d)
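
A timedelta also results from subtracting one datetime from another. The sketch below uses fixed, made-up dates so the result is reproducible, and shows one possible workaround for shifting a time object: combine it with an arbitrary date, add the timedelta, then take the time part again.

```python
import datetime as dt

start = dt.datetime(2018, 3, 1, 9, 0, 0)
end = dt.datetime(2018, 3, 6, 9, 33, 15)
elapsed = end - start                     # subtracting datetimes gives a timedelta

# Adding a timedelta to a time object raises TypeError
t = dt.time(23, 30)
try:
    t + dt.timedelta(hours=1)
    time_supported = True
except TypeError:
    time_supported = False

# Workaround: attach a date, shift, then extract the time (wraps past midnight)
shifted = (dt.datetime.combine(dt.date(2018, 3, 1), t) + dt.timedelta(hours=1)).time()

print(elapsed, time_supported, shifted)
```

The wrap past midnight silently discards the day change, which is the reason the time type does not support this arithmetic directly.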

### Useful Functions

#### .now (datetime)

In [None]:
now = dt.datetime.now()
now

#### .today(datetime)

In [None]:
hari_ini = dt.datetime.today()
hari_ini

#### date()

In [None]:
hari_ini.date()

#### time()

In [None]:
now.time()

#### Combine Date and Time into DateTime (.combine)

In [None]:
dt.datetime.combine(hari_ini.date(), now.time())

## Datetime Parsing and Formatting

### String to DateTime, strptime()
```
%I : 12-hour
%H : 24-hour
%M : Minute
%p : AM/PM
%y : 18
%Y : 2018
%b : Mar
%m : month (1 to 12)
%d : day
```

In [None]:
dt.datetime.strptime('9-01-18','%d-%m-%y')

In [None]:
dt.datetime.strptime('09-Mar-2018','%d-%b-%Y')

In [None]:
dt.datetime.strptime('2/5/2018 4:49 PM', '%m/%d/%Y %I:%M %p')

### Date/DateTime To String

#### str(), standard string format

In [None]:
str(dt.datetime.now())

#### strftime(), custom string format

In [None]:
d    = dt.datetime.now()
dt.datetime.strftime(d, '%d-%b-%Y')
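
Parsing and formatting are inverse operations, which the sketch below illustrates as a round trip. It uses only numeric format codes so the result does not depend on locale settings; the input string is made-up example data. It also calls `strftime` as a method on the datetime object, which is equivalent to the call above.

```python
import datetime as dt

# Parse a string into a datetime using numeric format codes
parsed = dt.datetime.strptime('2/5/2018 16:49', '%m/%d/%Y %H:%M')

# Format the same datetime in a different layout
formatted = parsed.strftime('%d-%m-%y %H:%M')

# Parsing the formatted text with the matching pattern recovers the same value
round_trip = dt.datetime.strptime(formatted, '%d-%m-%y %H:%M')

print(parsed, formatted, round_trip == parsed)
```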

# Getting External Data

## Web Scraping using requests & BeautifulSoup4
Use webscraping technique only if API is not available

### Library

In [None]:
import requests
from bs4 import BeautifulSoup

In [None]:
url = "https://www.epicurious.com/search/tofu%20chill"
res = requests.get(url)
if (res.status_code == 200):
    soup = BeautifulSoup(res.content,'lxml')
    print (soup.prettify())
else:
    print('Failure')

# Plydata (dplyr for Python)

## Sample Data

In [None]:
n = 200
comp = ['C' + i for i in np.random.randint( 1,4, size  = n).astype(str)] # 3x Company
dept = ['D' + i for i in np.random.randint( 1,6, size  = n).astype(str)] # 5x Department
grp =  ['G' + i for i in np.random.randint( 1,3, size  = n).astype(str)] # 2x Groups
value1 = np.random.normal( loc=50 , scale=5 , size = n)
value2 = np.random.normal( loc=20 , scale=3 , size = n)
#value3 = np.random.normal( loc=5 , scale=30 , size = n)

mydf = pd.DataFrame({
    'comp':comp, 
    'dept':dept, 
    'grp': grp,
    'value1':value1, 
    'value2':value2
    #'value3':value3 
})
mydf.head()

## Column Manipulation

### Copy Column

In [None]:
mydf >> define(newcol = 'value1')                 # simple method for one column

In [None]:
mydf >> define (('newcol1', 'value1'), newcol2='value2')  # method for multiple new columns

### New Column from existing Column

**Without specifying the new column name**, it will be derived from the expression

In [None]:
mydf >> define ('value1*2')

**Specify the new column name**

In [None]:
mydf >> define(value3 = 'value1*2')

Define **multiple** new columns in one go. Observe there are three ways to specify the new columns

In [None]:
mydf >> define('value1*2',('newcol2','value2*2'),newcol3='value2*3')

### Select Column(s)

In [None]:
mydf2 = mydf >> define(newcol1='value1',newcol2='value2')
mydf2.info()

#### By Column Names
**Exact Column Name**

In [None]:
mydf2 >> select ('comp','dept','value1')

**Column Name Starts With** ...

In [None]:
mydf2 >> select ('comp', startswith='val')

**Column Name Ends With ...**

In [None]:
mydf2 >> select ('comp',endswith=('1','2','3'))

**Column Name Contains ...**

In [None]:
mydf2 >> select('comp', contains=('col','val'))

#### Specify Column Range

In [None]:
mydf2 >> select ('comp', slice('value1','newcol2'))

### Drop Column(s)

In [None]:
mydf2 >> select('newcol1','newcol2',drop=True)

## Rename Column

In [None]:
mydf.head(80)

**Assignment Method**  
Use when the column name does not contain special characters

In [None]:
mydf >> rename( val1='value1', val2='value2' )

**Dictionary Method**  
Use when the column name contains special characters

In [None]:
mydf >> rename( {'val.1' : 'value1',
                 'val.2' : 'value2' })

**Combined Method**  
Combine both assignment and dictionary method

In [None]:
mydf >> rename( {'val.1' : 'value1',
                 'val.2' : 'value2'
              }, group = 'grp' )

## Sorting (arrange)
Use **'-colName'** for descending order

In [None]:
mydf >> arrange('comp', '-value1')

## Grouping

In [None]:
mydf.info()

In [None]:
gdf = mydf >> group_by('comp','dept')
type(gdf)

## Summarization

### Simple Method
**Passing Multiple Expressions**

In [None]:
gdf >> summarize('n()','sum(value1)','mean(value2)')

### Specify Summarized Column Name

**Assignment Method**  
- Pass colName='expression'  
- Column name cannot contain special characters

In [None]:
gdf >> summarize(count='n()',v1sum='sum(value1)',v2_mean='mean(value2)')

**Tuple Method ('colName','expression')**  
Use when the column name contains special characters

In [None]:
gdf >> summarize(('count','n()'),('v1.sum','sum(value1)'),('s2.sum','sum(value2)'),v2mean='np.mean(value2)')

### Number of Rows in Group
- n()        : number of rows in the group  
- n_unique() : number of distinct values in the given column

In [None]:
gdf >> summarize(count='n()', val1_unique='n_unique(value1)')