# Environment Setup

## Notebook Setup

In [111]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:75% !important; margin-left:350px; }</style>"))

## Most Common Libraries

In [2]:
import numpy as np
import pandas as pd
import datetime as dt

import matplotlib
import matplotlib.pyplot as plt

#from plydata import define, query, select, group_by, summarize, arrange, head, rename
#import plotnine
#from plotnine import *

### numpy
- Large multi-dimensional arrays and matrices  
- High-level mathematical functions to operate on them  
- Efficient array computation, modeled after MATLAB  
- Supports vectorized array math functions (implemented in C, hence faster than a Python for-loop over a list)  


### scipy
- Collection of mathematical algorithms and convenience functions built on the numpy extension  
- Built upon **numpy**

### Pandas
- Data manipulation and analysis 
- Offer data structures and operations for manipulating numerical tables and time series  
- Good for analyzing tabular data  
- Use for exploratory data analysis, data pre-processing, statistics and visualization
- Built upon **numpy**

### scikit-learn
- Machine learning functions  
- Built on top of scipy

### matplotlib
- Data Visualization

## Magic Functions

- IPython has a set of predefined ‘magic functions’ that you can call with a command-line style syntax  
- There are two types of magics:  
    - **Line Magic**: prefixed with %  
    Works much like an OS command-line call: it receives the rest of the line as its argument, passed without parentheses or quotes. Line magics can return results and can be used on the right-hand side of an assignment  
    - **Cell Magic**: prefixed with %%  
    Receives not only the rest of the line but also the lines below it, as a separate argument

### List of Magic 

In [82]:
%lsmagic

Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cd  %clear  %cls  %colors  %config  %connect_info  %copy  %ddir  %debug  %dhist  %dirs  %doctest_mode  %echo  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %macro  %magic  %matplotlib  %mkdir  %more  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %ren  %rep  %rerun  %reset  %reset_selective  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%cmd  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%python  %%python2  %%py

### Line Magic
A line magic applies only to the line it is written on

#### %timeit
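- Repeats the measurement 7 times by default (use -r to specify)
- Each **run** executes the statement in a loop; the number of loops is chosen automatically so that a run lasts long enough to time accurately (use -n to fix it)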

In [83]:
%timeit -r 2 -n 100 3+1000/0.25*100

36.4 ns ± 0 ns per loop (mean ± std. dev. of 2 runs, 100 loops each)
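
Outside IPython, a similar measurement can be sketched with the standard-library `timeit` module. In this sketch, `repeat` and `number` play the roles of `-r` and `-n`; the variable names are only illustrative.

```python
import timeit

# repeat=2 and number=100 correspond to "-r 2 -n 100" in the magic above
run_times = timeit.repeat("3 + 1000 / 0.25 * 100", repeat=2, number=100)

# Each entry is the total time of one run; divide by number for the per-loop time
per_loop = [t / 100 for t in run_times]
print(per_loop)
```

The reported figures depend on the machine, so no fixed output is shown here.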


#### %matplotlib
Renders plots inline in the frontend (Jupyter Notebook), directly below the cell that produced them. The output is therefore stored in the notebook document

In [84]:
%matplotlib inline

#### %who
- Analyse variables of global scope  
- Specify optional type to filter the variables

In [85]:
a = 1
type(a)

int

In [86]:
%who int

a	 


In [87]:
%who

HTML	 NamespaceMagics	 a	 aes	 annotate	 arrange	 arrow	 as_labeller	 coord_cartesian	 
coord_equal	 coord_fixed	 coord_flip	 coord_trans	 define	 display	 dt	 element_blank	 element_line	 
element_rect	 element_text	 expand_limits	 facet_grid	 facet_null	 facet_wrap	 geom_abline	 geom_area	 geom_bar	 
geom_bin2d	 geom_blank	 geom_boxplot	 geom_col	 geom_count	 geom_crossbar	 geom_density	 geom_dotplot	 geom_errorbar	 
geom_errorbarh	 geom_freqpoly	 geom_histogram	 geom_hline	 geom_jitter	 geom_label	 geom_line	 geom_linerange	 geom_path	 
geom_point	 geom_pointrange	 geom_polygon	 geom_qq	 geom_quantile	 geom_rect	 geom_ribbon	 geom_rug	 geom_segment	 
geom_smooth	 geom_spoke	 geom_step	 geom_text	 geom_tile	 geom_violin	 geom_vline	 get_ipython	 getsizeof	 
ggplot	 ggsave	 ggtitle	 group_by	 guide_colorbar	 guide_colourbar	 guide_legend	 guides	 head	 
json	 label_both	 label_context	 label_value	 labeller	 labs	 lims	 matplotlib	 np	 
pd	 plotnine	 plt	 position_dodge	 position_fill

### Cell Magic
A cell magic applies to the entire cell

#### %%timeit
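- Repeats the measurement of the whole cell 7 times by default (use -r to specify)
- Each **run** executes the cell in a loop; the number of loops is chosen automatically so that a run lasts long enough to time accurately (use -n to fix it)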

In [88]:
%%timeit -r 1 -n 10
import time
for _ in range(100):
    time.sleep(0.01)# sleep for 0.01 seconds

1.25 s ± 0 ns per loop (mean ± std. dev. of 1 run, 10 loops each)


## Package Management
### Conda

#### Conda Environment

In [89]:
!conda info


     active environment : base
    active env location : C:\ProgramData\Anaconda3
            shell level : 1
       user config file : C:\Users\YKS-NIC\.condarc
 populated config files : C:\Users\YKS-NIC\.condarc
          conda version : 4.5.4
    conda-build version : 3.10.7
         python version : 3.6.5.final.0
       base environment : C:\ProgramData\Anaconda3  (read only)
           channel URLs : https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/win-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/win-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/pro/win-64
                          https://repo.anaconda.com/pkgs/pro/noarch
                          https://repo.anaconda.com/pkgs/msys2/win-64
  

#### Package Version

In [90]:
!conda list

# packages in environment at C:\ProgramData\Anaconda3:
#
# Name                    Version                   Build  Channel
_ipyw_jlab_nb_ext_conf    0.1.0            py36he6757f0_0  
alabaster                 0.7.10           py36hcd07829_0  
anaconda                  custom           py36h363777c_0  
anaconda-client           1.6.14                   py36_0  
anaconda-navigator        1.8.7                    py36_0  
anaconda-project          0.8.2            py36hfad2e28_0  
asn1crypto                0.22.0           py36h8e79faa_1  
astroid                   1.5.3            py36h9d85297_0  
astropy                   2.0.2            py36h06391c4_4  
babel                     2.5.0            py36h35444c1_0  
backcall                  0.1.0                    py36_0  
backports                 1.0              py36h81696a8_1  
backports.shutil_get_terminal_size 1.0.0            py36h79ab834_2  
beautifulsoup4            4.6.0            py36hd4cc5e8_1  
bitarray                  0

#### Package Installation
Conda is the recommended way to install packages in this setup.  

To install from **official** conda channel:
```
conda install <package_name>  # always install latest
conda install <package_name=version_number>
```
To install from **conda-forge community** channel:
```
conda install -c conda-forge <package_name>
conda install -c conda-forge <package_name=version_number>
```

```
# conda official channel
conda install numpy
conda install scipy
conda install pandas
conda install matplotlib
conda install scikit-learn
conda install seaborn
conda install pip

# conda-forge community channel
conda install -c conda-forge plotnine
```

### PIP
Use **pip** if the package is not available in conda.

#### Package Version

In [91]:
!pip list

alabaster (0.7.10)
anaconda-client (1.6.14)
anaconda-navigator (1.8.7)
anaconda-project (0.8.2)
asn1crypto (0.22.0)
astroid (1.5.3)
astropy (2.0.2)
babel (2.5.0)
backcall (0.1.0)
backports.shutil-get-terminal-size (1.0.0)
beautifulsoup4 (4.6.0)
bitarray (0.8.1)
bkcharts (0.2)
blaze (0.11.3)
bleach (2.0.0)
bokeh (0.12.16)
boto (2.48.0)
Bottleneck (1.2.1)
CacheControl (0.12.3)
certifi (2018.4.16)
cffi (1.10.0)
chardet (3.0.4)
click (6.7)
cloudpickle (0.4.0)
clyent (1.2.2)
colorama (0.3.9)
comtypes (1.1.2)
conda (4.5.4)
conda-build (3.10.7)
conda-verify (2.0.0)
contextlib2 (0.5.5)
cryptography (2.0.3)
cycler (0.10.0)
Cython (0.26.1)
cytoolz (0.8.2)
dask (0.15.3)
datashape (0.5.4)
decorator (4.1.2)
distlib (0.2.5)
distributed (1.19.1)
docutils (0.14)
entrypoints (0.2.3)
et-xmlfile (1.0.1)
fastcache (1.0.2)
filelock (2.0.12)
Flask (0.12.2)
Flask-Cors (3.0.3)
gevent (1.2.2)
glob2 (0.5)
greenlet (0.4.12)
h5py (2.7.0)
heapdict (1.0.0)
html5lib (0.999999999)
idna (2.6)
imageio (2.2.0)
imagesize

You are using pip version 9.0.1, however version 10.0.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.


#### Package Installation
```
pip install <package_name>

pip install plydata
```

# Python Fundamental

## Variable and Values
- Every variable in Python refers to an **object**  
- Every variable assignment is **reference based**: the variable stores a reference to the object in memory, not a copy of its data 

In [92]:
# a,b refer to the same memory location
a = 123
b = a  
print ('Data of a =',a,'\nData of b =',b)
print ('ID of a = ', id(a))
print ('ID of b = ',id(b))

Data of a = 123 
Data of b = 123
ID of a =  1990753152
ID of b =  1990753152


Assigning a new value to a variable does not modify the old object; it makes the variable **refer to a different object**

In [93]:
a = 123
b = a
a = 456  # reassignment makes a refer to a new object
         # the reference held by b is unchanged
print ('Data of a =',a,'\nData of b =',b)
print ('ID of a = ', id(a))
print ('ID of b = ',id(b))

Data of a = 456 
Data of b = 123
ID of a =  1761565045776
ID of b =  1990753152
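
The integer example above only shows rebinding. A minimal sketch with a mutable object makes the shared reference more visible: here two names refer to one list, so an in-place change through either name is seen through both.

```python
a = [1, 2, 3]
b = a            # b is another name for the same list object
b.append(4)      # in-place change, visible through both names

print(a)         # [1, 2, 3, 4]
print(a is b)    # True

c = list(a)      # shallow copy: equal contents at first, but a different object
c.append(5)
print(a == c, a is c)   # False False
```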


## Assignment

### Multiple Assignment

In [94]:
x = y = 3
print (x,y)

3 3


### Augmented Assignment

In [95]:
x = 1
y = x + 1
y += 1
print (y)

3


### Unpacking Assignment

In [96]:
x,y = 1,3
print (x,y)

1 3


# Built-in Data Types

## Numbers

### Integer

In [97]:
n = 123
type (n)

int

### Float

In [98]:
f = 123.4
type (f)

float

### Number Operators

**Division** always returns a float

In [99]:
print(4/2)  # return float, even when the result is a whole number
type(4/2)

2.0


float

**Integer Division** (//) rounds the quotient down to a whole number; the result is an int for int operands and a float if either operand is a float

In [100]:
print (8//3)    # return int
print (8//3.2)  # return float

2
2.0
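
The difference between rounding down and truncating only shows up with negative numbers. A short sketch, using only built-ins and the standard `math` module:

```python
import math

print(7 // 2)            # 3
print(-7 // 2)           # -4, rounded down rather than toward zero
print(int(-7 / 2))       # -3, int() truncates toward zero
print(math.floor(-3.5))  # -4

# divmod returns the floor quotient and the matching remainder together
quotient, remainder = divmod(-7, 2)
print(quotient, remainder)   # -4 1
```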


**Remainder** returns either a float or an integer

In [101]:
print (8%3)    # return int
print (8%3.2)  # return float

2
1.5999999999999996
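
The float result above is slightly off from 1.6 because of binary floating-point rounding. One possible way to deal with this, sketched here with the standard `math` and `decimal` modules, is to compare with a tolerance or to use exact decimal arithmetic.

```python
import math
from decimal import Decimal

r = 8 % 3.2
print(r == 1.6)              # False, the float result is not exactly 1.6
print(math.isclose(r, 1.6))  # True, equal within the default tolerance

# Decimal works in base 10, so this remainder is exact
exact = Decimal("8") % Decimal("3.2")
print(exact)                 # 1.6
```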


**Power** returns an int or a float

In [102]:
print (2**3)    # return int
print (2.1**3)  # return float
print (2**3.1)  # return float

8
9.261000000000001
8.574187700290345


## String
### A List of Characters
A string is an object of class 'str'. It is an **ordered sequence of characters**; indexing it returns a one-character string, which is also of type **str**

In [12]:
s='abcde'
print( type(s) )
print( s[0], s[1], s[2] )
print( len(s) )
print(type(s[1]))

<class 'str'>
a b c
5
<class 'str'>


### Immutable
- String is **immutable**. Changing its content in place will result in an **error** 
- Assigning a new string to the variable changes the reference completely

In [1]:
s = 'abcde'
print ('s : ', id(s))
s = 'efgh'
print ('s : ', id(s))

s :  1786613484712
s :  1786613484544


In [104]:
## s[1] = 'z' # error
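
A small sketch of what happens when the commented-out line is run, and of the usual alternative: building a new string from slices of the old one.

```python
s = 'abcde'

try:
    s[1] = 'z'          # item assignment is not supported by str
except TypeError as err:
    print('TypeError:', err)

# Build a new string instead of modifying the old one
t = s[:1] + 'z' + s[2:]
print(t)   # azcde
print(s)   # abcde, the original is untouched
```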

### Slicing
```
string[start:end:step]  # default step:1, start:0, end:last
```
If step is negative, the slice runs backwards, so the end value must be lower than the start value

In [105]:
s = 'abcdefghijk'
print (s[0])      # first letter
print (s[:3])     # first 3 letters
print (s[2:8:2])  # stepping
print (s[-1])     # last letter
print (s[-3:])    # last three letters
print (s[::-1])   # reverse everything
print (s[8:2:-1])

a
abc
ceg
k
ijk
kjihgfedcba
ihgfed


### Searching
```
string.find() returns the position of the first occurrence, or -1 if not found
```

In [106]:
s='I love karaoke, I know you love it oo'
print (s.find('lov'))
print (s.find('kemuning'))

2
-1
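
Since `find()` only reports the first occurrence, locating every occurrence needs a loop that restarts the search after each hit. The helper `find_all` below is a hypothetical name used for illustration, not a built-in string method.

```python
def find_all(text, sub):
    """Return the start index of every non-overlapping occurrence of sub."""
    if not sub:
        raise ValueError('sub must not be empty')
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # the optional second argument of find() is where the search resumes
        start = text.find(sub, start + len(sub))
    return positions

s = 'I love karaoke, I know you love it oo'
print(find_all(s, 'love'))      # [2, 27]
print(find_all(s, 'kemuning'))  # []
```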


### Pattern Matching
#### Containing (in)

For single string, **partial match**

In [29]:
print( 'abc' in '123abcdefg' )

True


For a list of strings, **in** tests for an **exact match** against each item. One workaround for a partial match is to **convert the list to a single string**

In [31]:
print( 'abc' in ['123','abcdefg'] )
print( 'abc' in ['abcdefg','123'] )
print( 'abc' in ['123','abc','def'] )
print( 'abc' in str(['123','abcdefg']) )

False
False
True
True
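
An alternative to the `str()` workaround is to test each item separately with `any()`. The sketch below also shows one reason to prefer it: the string form of a list contains quotes, commas and brackets that can produce accidental matches.

```python
items = ['123', 'abcdefg']

# True as soon as one item contains the substring
has_partial = any('abc' in item for item in items)
print(has_partial)   # True

# str(['a', 'b']) is "['a', 'b']", so list punctuation can match by accident
false_positive = "', '" in str(['a', 'b'])
per_item = any("', '" in item for item in ['a', 'b'])
print(false_positive, per_item)   # True False
```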


#### Extraction
Extract all matching items within list

In [32]:
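s = ['aaabc', 'abcd', 'abcddd', 'abcdee']  # list of strings to search
newlist = []
for x in s:
    if 'abc' in x:
        newlist.append(x)
newlist

['aaabc', 'abcd', 'abcddd', 'abcdee']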

Use the **list comprehension** shorthand to achieve the same result

In [29]:
[x for x in s if 'abc' in x]

['aaabc', 'abcd', 'abcddd', 'abcdee']

### Concatenating Strings

In [107]:
'this is ' + 'awesome'

'this is awesome'

### Splitting Strings
The delimiter is specified as the argument. Observe that the surrounding spaces are preserved in the resulting list

In [108]:
animals = 'a1,a2 ,a3, a4'
animals.split(',')

['a1', 'a2 ', 'a3', ' a4']

### Stripping Leading and Trailing Whitespace

In [109]:
filename = '  myexce l.   xls   '
filename.strip()

'myexce l.   xls'
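
Splitting and stripping are often combined to clean up delimited text. A minimal sketch, reusing the sample strings from the cells above:

```python
animals = 'a1,a2 ,a3, a4'

# split on the delimiter, then strip the whitespace around each piece
cleaned = [part.strip() for part in animals.split(',')]
print(cleaned)   # ['a1', 'a2', 'a3', 'a4']

filename = '  myexce l.   xls   '
left_only = filename.lstrip()    # removes leading whitespace only
right_only = filename.rstrip()   # removes trailing whitespace only
print(repr(left_only))    # 'myexce l.   xls   '
print(repr(right_only))   # '  myexce l.   xls'
```

Whitespace inside the string is never touched by `strip()`, `lstrip()` or `rstrip()`.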

### Convert to Upper/Lower Case

In [34]:
'myEXEel.xls'.upper()

'MYEXEEL.XLS'

In [35]:
'myEXEel.xls'.lower()

'myexeel.xls'

### Comparison

In [39]:
a='abc'
b='abc'
print(a==b)
print(a!=b)

True
False


## Boolean

In [36]:
b = False

if (b):
    print ('It is true')
else:
    print ('It is fake')
    

It is fake


### What is Considered False ?
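All of the values below are considered false. The same applies to other zero values and empty containers (such as 0.0, {} and set()); **anything else is true**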

In [113]:
print (bool(0))      # zero
print (bool(None))   # none
print (bool(''))     # empty string
print (bool([]))     # empty list
print (bool(()))     # empty tuple
print (bool(False))  # False
print (bool(2-2))    # expression that evaluates to any value above

False
False
False
False
False
False
False


### ```and``` operator
- **and** can return different data types  
- If all operands are true, the **last value** is returned (because Python needs to evaluate up to the last value)  
- If any operand is false, the **first false value** is returned (because Python stops as soon as it detects a false value)

In [114]:
print (123 and 2 and 1)
print (123 and () and 2)

1
()


### ```not``` operator

In [115]:
not (True or False)

False

### ```or``` operator
- **or** can return different data types  
- If any operand is true, the **first true value** is returned (the right-hand side **need not be evaluated**)  
- If all operands are false, the **last false value** is returned (all items must be evaluated before concluding false)

In [116]:
print (1 or 2)
print (0 or 1 or 1)
print (0 or () or [])

1
1
[]
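
The short-circuit behaviour of `and` and `or` can be observed by recording which operands actually get evaluated. The helper `check` below is a hypothetical function written only for this sketch.

```python
calls = []

def check(label, value):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return value

result_or = check('a', 1) or check('b', 2)     # 'b' is never evaluated
result_and = check('c', 0) and check('d', 3)   # 'd' is never evaluated
print(result_or, result_and, calls)   # 1 0 ['a', 'c']

# A common idiom built on this: fall back to a default when a value is empty
name = '' or 'anonymous'
print(name)   # anonymous
```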


## None
### None is Object
- None is a Python **object** of type **NoneType**  
- Most operations on None (arithmetic, indexing, calling) raise an **error**  
- An array that contains None falls back to the generic object dtype, so each element has to be checked individually for None during iteration. This is computationally heavy 

In [96]:
type(None)

NoneType

In [117]:
t = np.array([1,2,3,4,5])
t.dtype  #  its an integer

dtype('int32')

In [118]:
t = np.array([1, 2, 3, None, 4, 5])
t.dtype  # it's an object

dtype('O')

### Comparing None
**Not Preferred Method**

In [101]:
null_variable = None
print( null_variable == None )

True


**Preferred**

In [106]:
print( null_variable is None )
print( null_variable is not None )

True
False
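
One reason `is None` is preferred: `==` can be redefined by a class, while `is` always compares object identity. The class `AlwaysEqual` below is a deliberately contrived, hypothetical example.

```python
class AlwaysEqual:
    """Hypothetical class whose == answers True for everything."""
    def __eq__(self, other):
        return True

obj = AlwaysEqual()
equality_says = (obj == None)   # misleading: the custom __eq__ decides
identity_says = (obj is None)   # reliable: obj is clearly not the None object
print(equality_says, identity_says)   # True False
```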


# Built-In Data Structure

## Tuple
A tuple is an **immutable list**. Any attempt to change or update a tuple raises an error. It can contain **different types** of objects.

Benefits of a tuple over a list:
- **Faster** than list
- **Protects** your data against accidental change
- Can be used as key in dictionaries, list can't
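
The dictionary-key point can be sketched as follows. The `grid` dictionary and its values are made up for illustration. Note that a tuple is only usable as a key when everything inside it is hashable as well.

```python
grid = {(0, 0): 'origin', (1, 2): 'point A'}
print(grid[(1, 2)])   # point A

try:
    grid[[3, 4]] = 'point B'      # a list is mutable, hence unhashable
except TypeError as err:
    print('TypeError:', err)

try:
    hash((1, [2, 3]))             # a tuple holding a list is not hashable either
except TypeError as err:
    print('TypeError:', err)
```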

### Assignment

#### (item1, item2, item3)
This is the formal syntax for defining a tuple: items inside the (  ) notation

In [119]:
t = (1,2,3,'o','apple')
t

(1, 2, 3, 'o', 'apple')

In [120]:
type(t)

tuple

#### item1, item2, item3
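- Without the (  ) notation, comma-separated items are also treated as a tuple  
- However, the parentheses are required where commas already have another meaning, for example inside a function call, where bare items are read as separate arguments 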

In [121]:
1,2,3,'o','apple'

(1, 2, 3, 'o', 'apple')

### Accessing

In [122]:
print (t[1])
print (type(t[1]))

2
<class 'int'>


In [123]:
print (t[1:3])
type (t[1:3])   # slicing a tuple returns a tuple

(2, 3)


tuple

## List
- List is a collection of **ordered** items, where the items **can be different data types**  
- You can pack list of items by placing them into []  
- List is mutable

### Creating List
#### Empty List

In [124]:
empty = []      # literal assignment method
empty = list()  # constructor method
print (empty)
type(empty)

[]


list

#### Literal Assignment Method
- **Multiple data types** is allowed in a list

In [125]:
mylist = [123,'abc',456]

**Creating List using Constructor Method**  
- Note that **list(string)** will split the string into letters

In [127]:
list('hello')

['h', 'e', 'l', 'l', 'o']

**Creating List using split() method**  
- Splits on whitespace (by default) to create the list items

In [128]:
'a bunch of words'.split()

['a', 'bunch', 'of', 'words']

- Split can also break into items based on a specified delimiter

In [129]:
'a1,a2,a3, a4'.split(',')

['a1', 'a2', 'a3', ' a4']

### Accessing Items

**Access specific index number**

In [45]:
food = ['bread', 'noodle', 'rice', 'biscuit','jelly','cake']
print (food[2])  # 3rd item
print (food[-1]) # last item

rice
cake


**Access range of indexes**

In [46]:
print (food[:4])     # first 4 items
print (food[-3:])    # last 3 items
print (food[1:5])    # index 1 to 4
print (food[5:2:-1]) # index 5 down to 3, reverse order
print (food[::-1])   # reverse order

['bread', 'noodle', 'rice', 'biscuit']
['biscuit', 'jelly', 'cake']
['noodle', 'rice', 'biscuit', 'jelly']
['cake', 'jelly', 'biscuit']
['cake', 'jelly', 'biscuit', 'rice', 'noodle', 'bread']


### Properties

**Total Number of Items**

In [47]:
len(food)

6

### Remove Item(s)
Removing a non-existent item will result in an error

**Search and remove first occurrence** of an item

In [116]:
food = list(['bread', 'noodle', 'rice', 'biscuit','jelly','cake','noodle'])
food.remove('noodle')
print (food)

['bread', 'rice', 'biscuit', 'jelly', 'cake', 'noodle']


**Remove last item**

In [117]:
food.pop()
print (food)

['bread', 'rice', 'biscuit', 'jelly', 'cake']


**Remove item at specific position**

In [118]:
food.pop(1)  # index starts from 0
print(food)

['bread', 'biscuit', 'jelly', 'cake']


In [119]:
food.remove('jelly')
print(food)

['bread', 'biscuit', 'cake']


### Appending Item (s)

**Append One Item**

In [136]:
food.append('jelly')
print (food)

['bread', 'biscuit', 'jelly', 'cake', 'jelly']


**Append Multiple Items**  
**```extend()```** takes a list/tuple argument and appends each of its elements as a separate item

In [137]:
food.extend(['nand','puff'])
print (food)

['bread', 'biscuit', 'jelly', 'cake', 'jelly', 'nand', 'puff']


### Concatenating Multiple Lists

**Concatenating Lists**  
Lists can be joined with the '+' operator; the '-' operator is not supported

In [138]:
['dog','cat','horse'] + ['elephant','tiger'] + ['sheep']

['dog', 'cat', 'horse', 'elephant', 'tiger', 'sheep']

### Other Methods ###

**Reversing the order of the items**

In [139]:
food.reverse()
food

['puff', 'nand', 'jelly', 'cake', 'jelly', 'biscuit', 'bread']

**Locating the Index Number of An Item**

In [140]:
food.index('biscuit')

5

**Count occurrences**

In [120]:
test = ['a','a','a','b','c']
test.count('a')

3

**Sorting The Order of Items**

In [141]:
food.sort()
print (food)

['biscuit', 'bread', 'cake', 'jelly', 'jelly', 'nand', 'puff']


### List is Mutable
A list variable keeps the same reference (object id) after items are added or removed

In [142]:
food = ['cake','jelly','roti','noodle']
print ('food : ',id(food))
food += ['salad','chicken']
print ('food : ',id(food))

food :  1761565257672
food :  1761565257672


In [143]:
x = [1,2,3]
y = [x,'abc']
print (y)
x[2] = 'k'
print (y)

[[1, 2, 3], 'abc']
[[1, 2, 'k'], 'abc']
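
When this sharing is not wanted, the standard `copy` module offers two levels of copying. A minimal sketch, reusing `x` and `y` from the cell above:

```python
import copy

x = [1, 2, 3]
y = [x, 'abc']

shallow = copy.copy(y)      # new outer list, inner list still shared with x
deep = copy.deepcopy(y)     # inner list is copied as well

x[2] = 'k'
print(y)        # [[1, 2, 'k'], 'abc']
print(shallow)  # [[1, 2, 'k'], 'abc']
print(deep)     # [[1, 2, 3], 'abc']
```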


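A **default argument value** is evaluated only once, when the function is defined. Because a list is **mutable**, a default list is shared by every call that omits the argument, so changes made in one call are remembered in the next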

In [57]:
def spam (elem, some_list=['a','b']):
    some_list.append(elem)
    return some_list

print (spam(1,['x']))
print (spam(2)) ## second parameter is not passed
print (spam(3)) ##  notice the default was remembered

['x', 1]
['a', 'b', 2]
['a', 'b', 2, 3]


### List Is Iterable

#### For Loop

In [50]:
s = ['abc','abcd','bcde','bcdee','cdefg']
for x in s:
    if 'abc' in x:
        print (x)

abc
abcd


In [59]:
new_list = []
old_list = ['abc','abcd','bcde','bcdee','cdefg']
for x in old_list:
    if 'abc' in x:
        new_list.append(x)
        
print( new_list )

['abc', 'abcd']


In [62]:
new_list = [x for x in old_list if 'abc' in x]
print( new_list)

['abc', 'abcd']


### Built-In Functions Applicable To List
**Number of Elements**

In [122]:
len(food)

3

**Max Value**

In [128]:
test = [1,2,3,5,5,3,2,1]
m = max(test)
test.index(m)  ## only the first occurrence is found

3

## Dictionaries
A dictionary is a collection of key-value pairs.

### Creating dict
**Creating dict with literals**

Simple Dictionary

In [76]:
animal_counts = { 'cats' : 2, 'dogs' : 5, 'horses':4}
print (animal_counts)
print( type(animal_counts) )

{'cats': 2, 'dogs': 5, 'horses': 4}
<class 'dict'>


Dictionary with list

In [69]:
animal_names = {'cats':   ['Walter','Ra'],
                'dogs':   ['Jim','Roy','John','Lucky','Row'],
                'horses': ['Sax','Jack','Ann','Jeep']
               }
animal_names

{'cats': ['Walter', 'Ra'],
 'dogs': ['Jim', 'Roy', 'John', 'Lucky', 'Row'],
 'horses': ['Sax', 'Jack', 'Ann', 'Jeep']}

**Creating dict with variables**

In [83]:
cat_names = ['Walter','Ra','Jim']
dog_names = ['Jim','Roy','John','Lucky','Row']
horse_names= ['Sax','Jack','Ann','Jeep']
animal_names = {'cats': cat_names, 'dogs': dog_names, 'horses': horse_names}
animal_names

{'cats': ['Walter', 'Ra', 'Jim'],
 'dogs': ['Jim', 'Roy', 'John', 'Lucky', 'Row'],
 'horses': ['Sax', 'Jack', 'Ann', 'Jeep']}

### Accessing dict
Find out the list of keys using **keys()**  

In [74]:
print (animal_names.keys())
print (sorted(animal_names.keys()))

dict_keys(['cats', 'dogs', 'horses'])
['cats', 'dogs', 'horses']


Find out the list of values using **values()**

In [77]:
print (animal_names.values())
print (sorted(animal_names.values()))

dict_values([['Walter', 'Ra'], ['Jim', 'Roy', 'John', 'Lucky', 'Row'], ['Sax', 'Jack', 'Ann', 'Jeep']])
[['Jim', 'Roy', 'John', 'Lucky', 'Row'], ['Sax', 'Jack', 'Ann', 'Jeep'], ['Walter', 'Ra']]


**Refer to a dictionary item using its key**

In [78]:
animal_names['dogs']

['Jim', 'Roy', 'John', 'Lucky', 'Row']

**Accessing a non-existent key with [key] raises KeyError**

In [151]:
## animal_counts['cow']  # KeyError

**Accessing a non-existent key** with **get()** returns None

In [152]:
print (animal_counts.get('cow'))

None
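
Beyond returning None, `get()` accepts a fallback value, and `setdefault()` inserts a value only when the key is missing. A short sketch using the `animal_counts` data from above (the values chosen for 'cow' are arbitrary):

```python
animal_counts = {'cats': 2, 'dogs': 5, 'horses': 4}

cow_count = animal_counts.get('cow', 0)   # second argument is the fallback
print(cow_count)                 # 0
print('cow' in animal_counts)    # False, get() did not add the key

try:
    animal_counts['cow']
except KeyError as err:
    print('KeyError:', err)

animal_counts.setdefault('cow', 1)     # inserted because 'cow' is missing
animal_counts.setdefault('cats', 99)   # existing key is left unchanged
print(animal_counts)   # {'cats': 2, 'dogs': 5, 'horses': 4, 'cow': 1}
```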


### Dict are Mutable
Use **[key]** notation to update the value of a key. If the key does not exist yet, the assignment adds a new key-value pair instead of raising an error.

In [90]:
animal_names['dogs'] = ['Ali','Abu','Bakar']
animal_names

{'cats': ['Walter', 'Ra', 'Jim'],
 'dogs': ['Ali', 'Abu', 'Bakar'],
 'horses': ['Sax', 'Jack', 'Ann', 'Jeep']}

Use **```clear()```** to erase all elements

In [None]:
animal_names.clear()

## Sets
A set is an unordered collection of unique items

In [153]:
myset = {'a','b','c','d','a','b','e','f','g'}
print (myset) # notice no repetition values

{'g', 'd', 'f', 'c', 'a', 'b', 'e'}


### Membership Test

In [154]:
print ('a' in myset)      # is member ?
print ('f' not in myset)  # is not member ?

True
False


### Subset Test
Subset Test : <=  
Proper Subset Test : <

In [155]:
mysubset = {'d','g'}
mysubset <= myset

True

A proper subset test additionally requires the larger set to **contain at least one element** that is not in the subset

In [156]:
mysubset = {'b','a','d','c','e','f','g'}
print ('Is Subset : ', mysubset <= myset)
print ('Is Proper Subset : ', mysubset < myset)

Is Subset :  True
Is Proper Subset :  False


### Union using '|'

In [157]:
{'a','b','c'} | {'e','f'}

{'a', 'b', 'c', 'e', 'f'}

### Intersection using '&'
Any elements that exist in both the left and the right set

In [158]:
{'a','b','c','d'} & {'c','d','e','f'}

{'c', 'd'}

### Difference using '-'
Anything in **left** that is **not in right** 

In [159]:
{'a','b','c','d'} - {'c','d','e','f'}

{'a', 'b'}
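
Two related operations are worth a quick sketch: the symmetric difference operator `^` (elements in exactly one of the two sets), and using `set()` to remove duplicates from a list. `sorted()` is applied only to get a predictable display order.

```python
left = {'a', 'b', 'c', 'd'}
right = {'c', 'd', 'e', 'f'}

sym_diff = sorted(left ^ right)
print(sym_diff)   # ['a', 'b', 'e', 'f']

# The method forms accept any iterable, the operators need sets on both sides
same = left.union(['x', 'y']) == (left | {'x', 'y'})
print(same)       # True

unique = sorted(set(['b', 'a', 'b', 'c', 'a']))
print(unique)     # ['a', 'b', 'c']
```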

## range
**range(X)** generates a sequence of integers
```
range (lower_bound, upper_bound, step_size)  
# lower bound is optional, default = 0
# upper bound is not included in result
# step is optional, default = 1
```

**Use list() to convert in order to view actual sequence of data**

In [160]:
r = range(10)     # default lower bound =0, step =1
print (type (r))
print (r)
print (list(r))

<class 'range'>
range(0, 10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


**More Examples**

In [161]:
print (list(range(2,8)))    # step not specified, default 1
print ('Odds Number : ' , list(range(1,10,2))) # generate odds number

[2, 3, 4, 5, 6, 7]
Odds Number :  [1, 3, 5, 7, 9]
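
A range does not store its values; it computes them on demand, which is why it has to be converted with list() to be displayed. The sketch below shows a negative step and a few operations that work directly on the range object.

```python
r = range(10, 0, -2)      # counting down, upper bound 0 is excluded
print(list(r))   # [10, 8, 6, 4, 2]
print(len(r))    # 5
print(r[1])      # 8
print(6 in r)    # True
print(7 in r)    # False

# A very long range is cheap to create because no list is built
big = range(10**9)
print(len(big))  # 1000000000
```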


# Control and Loops

## If Statement
### Multiline If.. Statements

In [162]:
price = 102
if price <100:
    print ('buy')
elif price < 110:
    print ('hold')
elif price < 120:
    print ('think about it')
else:
    print ('sell')
print('end of programming')

hold
end of programming


### Single Line If .. Statement

In [4]:
price = 70
if price<80: print('buy')

buy


In [5]:
price = 85
'buy' if (price<80) else 'dont buy'

'dont buy'

## For Loops

### Loop through 'range'

In [163]:
for i in range (1,10,2):
    print ('Odds Number : ',i) 

Odds Number :  1
Odds Number :  3
Odds Number :  5
Odds Number :  7
Odds Number :  9


### Loop through 'list'
#### Standard For Loop

In [164]:
letters = ['a','b','c','d']
for e in letters:
    print ('Letter : ',e)

Letter :  a
Letter :  b
Letter :  c
Letter :  d


#### List Comprehension

Iterate through existing list, and **build new list** based on condition  
```new_list = [expression(i) for i in old_list]```

In [47]:
s = ['abc','abcd','bcde','bcdee','cdefg']
[x.upper() for x in s]

['ABC', 'ABCD', 'BCDE', 'BCDEE', 'CDEFG']

A list comprehension can be extended with an **```if```** condition that filters the items  
```new_list = [expression(i) for i in old_list if filter(i)]```

In [53]:
old_list    = ['abc','abcd','bcde','bcdee','cdefg']
matching = [ x.upper() for x in old_list if 'bcd' in x ]
print( matching )

['ABCD', 'BCDE', 'BCDEE']


### Loop Through 'Dictionary'
Looping through a dict picks up the **keys**

In [165]:
d = {"x": 1, "y": 2}
for key in d:
    print (key, d[key])

x 1
y 2
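
Two built-in helpers avoid the extra lookup and manual counters: `items()` yields key-value pairs, and `enumerate()` yields position-item pairs. A brief sketch (the `letters` list is redefined here so the block is self-contained):

```python
d = {"x": 1, "y": 2}

for key, value in d.items():        # key and value together, no d[key] needed
    print(key, value)

letters = ['a', 'b', 'c']
for position, letter in enumerate(letters):   # index and item together
    print(position, letter)

pairs = list(d.items())
indexed = list(enumerate(letters))
print(pairs)     # [('x', 1), ('y', 2)]
print(indexed)   # [(0, 'a'), (1, 'b'), (2, 'c')]
```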


## Generators

- A generator is lazy: it produces items only when asked for, hence it is more memory efficient
- A generator is a **function** that uses 'yield' instead of 'return'  
- A generator contains one or more yield statements  
- When called, it returns an object (iterator) but **does not start execution** immediately  
- Methods like `__iter__()` and `__next__()` are implemented automatically, so we can iterate through the items using **next()**  
- Once the function yields, the **function is paused** and control is transferred to the caller  
- Local variables and their states are **remembered** between successive calls  
- Finally, when the function **terminates**, **StopIteration** is raised automatically on further calls

### Basic Generator Function
The example below shows step by step how a generator works

In [166]:
def my_gen():
    n = 1
    print('This is printed first')
    # Generator function contains yield statements
    yield n

    n += 1
    print('This is printed second')
    yield n

    n += 1
    print('This is printed at last')
    yield n

In [167]:
a = my_gen()
type(a)

generator

In [168]:
next(a)

This is printed first


1

In [169]:
next(a)

This is printed second


2

In [170]:
next(a)

This is printed at last


3

In [171]:
next(a)

StopIteration: 
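
In practice StopIteration is rarely handled by hand. A for-loop (or a comprehension) catches it internally, and `next()` accepts a default that is returned instead of raising. The generator `count_to_three` is a hypothetical stand-in used for this sketch.

```python
def count_to_three():
    """Small illustrative generator yielding 1, 2, 3."""
    for n in (1, 2, 3):
        yield n

gen = count_to_three()
collected = [value for value in gen]   # iteration stops cleanly at StopIteration
print(collected)                       # [1, 2, 3]

after_end = next(gen, 'exhausted')     # default returned instead of an exception
print(after_end)                       # exhausted

leftover = list(gen)                   # an exhausted generator yields nothing more
print(leftover)                        # []
```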

### Useful Generator Function
Generators are most useful in combination with a **for-loop**:
- a for-loop inside the generator function produces the items
- a for-loop outside iterates through the generator

In [None]:
def rev_str(my_str):
    length = len(my_str)
    for i in range(length - 1,-1,-1):
        yield my_str[i]

In [None]:
for c in rev_str("hello"):
     print(c)

### Generator Expression
Use () to create an anonymous generator

In [None]:
my_list = [1, 3, 6, 10]
a = (x**2 for x in my_list)

In [None]:
next(a)

In [None]:
next(a)

In [None]:
sum(a) # the first two squares were consumed by next(); sums the rest: 6**2 + 10**2 = 136

### Compare to Iterator Class

In [None]:
class PowTwo:
    def __init__(self, max = 0):
        self.max = max

    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        if self.n > self.max:
            raise StopIteration

        result = 2 ** self.n
        self.n += 1
        return result

**Obviously, Generator is more concise and cleaner**

In [None]:
def PowTwoGen(max = 0):
    n = 0
    while n <= max:   # inclusive bound, so the output matches the PowTwo class
        yield 2 ** n
        n += 1
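
To check that the two styles really behave the same, both can be run side by side. The sketch below restates the class and writes a generator variant with an inclusive upper bound; the names `limit` and `pow_two_gen` are assumptions chosen here (also to avoid shadowing the built-in `max`).

```python
class PowTwo:
    """Iterator class yielding 2**0 .. 2**limit (inclusive)."""
    def __init__(self, limit=0):
        self.limit = limit

    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        if self.n > self.limit:
            raise StopIteration
        result = 2 ** self.n
        self.n += 1
        return result

def pow_two_gen(limit=0):
    """Generator variant with the same inclusive bound."""
    n = 0
    while n <= limit:
        yield 2 ** n
        n += 1

from_class = list(PowTwo(3))
from_generator = list(pow_two_gen(3))
print(from_class)       # [1, 2, 4, 8]
print(from_generator)   # [1, 2, 4, 8]
```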

# Library and Functions
A library is a group of functions

## Package Source

### Conda
- Package manager for any language  
- Install binaries

### PIP
- Package manager for Python only  
- Installs prebuilt wheels when available, otherwise builds the package from source  
- Stands for Pip Installs Packages  
- Python's officially-sanctioned package manager, and is most commonly used to install packages published on the **Python Package Index (PyPI)**  
- Both pip and PyPI are governed and supported by the Python Packaging Authority (PyPA).

## Importing Library

There are two ways to import library functions:  

**Standalone Namespace**
```
import <libName>                        # access function through: libName.functionName
import <libName> as <shortName>         # access function through: shortName.functionName
```
**Global Namespace**
```
from   <libName> import *               # all functions available at global namespace
from   <libName> import <functionName>  # access function through: functionName    
from   <libName> import <functionName> as <shortFunctionName>  # access function through shortFunctionName
```

### Import Entire Library

#### Import Into Standalone Namespace

In [None]:
import math
math.sqrt(9)

Use **as** to alias the library name. This is useful when library names conflict

In [None]:
import math as m
m.sqrt(9)

#### Import Into Global Name Space
All functions in the library accessible through global namespace
```
from <libName> import *
```

### Import Specific Function

In [None]:
from math import sqrt
print (sqrt(9))

Use **as** for aliasing function name

In [None]:
from math import sqrt as sq
print (sq(9))

### Machine Learning Packages

![alt text](img/ml_libraries.jpg)

## Define Function

### Function Arguments
By default, arguments are assigned to parameters by position, from left to right

In [215]:
def myfun(x,y):
    print ('x:',x)
    print ('y:',y)
    
myfun(5,8)

x: 5
y: 8


However, you can also assign arguments by name (keyword arguments) in the function call

In [216]:
myfun (y=8,x=5)

x: 5
y: 8


A function can have **default argument values**

In [110]:
def myfun(x=1,y=1):  # default argument value is 1
    print ('x:',x)
    print ('y:',y)
    
myfun(5)  # pass only one argument

x: 5
y: 1


### List Within Function

The default value of some_list is created only once, when the function is defined, and a list is **mutable**. Every call that omits the argument therefore appends to the same list, so elements from earlier calls are remembered

In [None]:
def spam (elem, some_list=[]):
    some_list.append(elem)
    return some_list

print (spam(1))
print (spam(2))
print (spam(3))
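
When a fresh list per call is wanted, a widely used pattern is to default to None and create the list inside the function. A minimal sketch of that pattern, reusing the name `spam`:

```python
def spam(elem, some_list=None):
    # None is used as a sentinel; a new list is created on every call that omits it
    if some_list is None:
        some_list = []
    some_list.append(elem)
    return some_list

first = spam(1)
second = spam(2)
third = spam(3, ['x'])
print(first)    # [1]
print(second)   # [2]
print(third)    # ['x', 3]
```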

### Return Statement

In [None]:
def bigger(x,y):
    if (x>y):
        return x
    else:
        return y
    
print (bigger(5,8))

### No Return Statement
If there is no **return** statement, Python returns **None**

In [120]:
def dummy():
    print ('This is a dummy function, return no value')

dummy()

This is a dummy function, return no value


### Return Multiple Value
Multiple values are returned as a **tuple**. Use unpacking assignment to assign them to multiple variables

In [None]:
def minmax(x,y,z):
    return min(x,y,z), max(x,y,z)

a,b = minmax(7,8,9)     # multiple assignment
c   = minmax(7,8,9)     # tuple

print (a,b)
print (c)    

### Passing Function as Argument 
You can pass a function name as an argument to a function

In [121]:
def myfun(x,y,f):
    f(x,y)

myfun('hello',54,print)

hello 54
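
The same idea works with your own functions, built-ins and lambdas. The names apply_fn and add below are hypothetical helpers used only for this sketch.

```python
def apply_fn(x, y, f):
    # f can be any callable that accepts two arguments
    return f(x, y)

def add(a, b):
    return a + b

print(apply_fn(2, 3, add))                  # 5
print(apply_fn(2, 3, max))                  # 3
print(apply_fn(2, 3, lambda a, b: a * b))   # 6
```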


### Arguments

args is a **tuple**

#### Example 1
Error example: the function accepts two positional arguments but is called with three, so the call fails before the function body runs

In [197]:
def myfun(x,y):
    print (x)
    print (y)
    print (z)
    
myfun(1,2,3)

TypeError: myfun() takes 2 positional arguments but 3 were given

#### Example 2
The first argument goes to x, the remaining arguments go to args as a tuple

In [109]:
def myfun(x,*args):
    print (x)
    print (args)     #tuple
    
myfun(1,2,3,4,5,'abc')

1
(2, 3, 4, 5, 'abc')


#### Example 3
The first argument goes to x, the second argument goes to y, the remaining arguments go to args

In [185]:
def myfun(x,y,*args):
    print (x)
    print (y)
    print (args)     #tuple
    
myfun(1,2,3)

1
2
(3,)


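#### Example 4
A parameter declared after ```*args``` is keyword-only: y keeps its default value unless it is passed by name (```y=...```)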

In [198]:
def myfun(x,*args, y=9):
    print (x)
    print (y)
    print (args)     #tuple
    
myfun(1,2,3,4,5)

1
9
(2, 3, 4, 5)


#### Example 5
All arguments go to args

In [188]:
def myfun(*args):
    print (args)     #tuple
    
myfun(1,2,3,4,5)

(1, 2, 3, 4, 5)


#### Example 6 Empty args

In [200]:
def myfun(x,y,*args):
    print (x)
    print (y)
    print (args)
    
myfun(1,2)

1
2
()


### Keyword Arguments
kwargs is a **dictionary**

#### Example 1

In [211]:
def foo(**kwargs):
    print(kwargs)
    
foo(a=1,b=2,c=3)

{'a': 1, 'b': 2, 'c': 3}


#### Example 2

In [202]:
def foo(x,**kwargs):
    print(x)
    print(kwargs)
    
foo(9,a=1,b=2,c=3)

9
{'a': 1, 'b': 2, 'c': 3}


In [204]:
foo(9) #empty dictionary

9
{}


#### Example 3

In [217]:
def foo(a,b,c,d=1):
    print(a)
    print(b)
    print(c)
    print(d)
    
foo(**{"a":2,"b":3,"c":4})

2
3
4
1


### Mixing *args, **kwargs

Always put ```*args``` **before** ```**kwargs``` in the parameter list

#### Example 1

In [210]:
def foo(x,y=1,**kwargs):
    print (x)
    print (y)
    print (kwargs)
    
foo(1,2,c=3,d=4)

1
2
{'c': 3, 'd': 4}


#### Example 2

In [212]:
def foo(x,y=2,*args,**kwargs):
    print (x)
    print (y)
    print (args)
    print (kwargs)
    
foo(1,2,3,4,5,c=6,d=7)

1
2
(3, 4, 5)
{'c': 6, 'd': 7}
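
A typical use of this combination is forwarding whatever a wrapper receives to another function. The sketch below uses hypothetical names (area, logged) that are not part of the original notebook.

```python
def area(width, height, scale=1):
    return width * height * scale

def logged(func, *args, **kwargs):
    # args is a tuple of positional arguments, kwargs a dict of keyword arguments
    print('calling', func.__name__, 'with', args, kwargs)
    return func(*args, **kwargs)   # unpack both again at the call site

result = logged(area, 2, 3, scale=10)
print(result)   # 60
```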


# Object Oriented Programming

## Defining Class

- Every instance method within a class **must have** at least one parameter, conventionally named **self**, which receives the object itself
- Use ```__init__``` as the constructor function. ```__init__``` is optional

In [None]:
class Person:
    wallet = 0  # class attribute, shadowed by the instance attribute set in __init__
    def __init__(self, myname,money=0):   # constructor
        self.name = myname
        self.wallet=money
    def say_hi(self):
        print('Hello, my name is : ', self.name)
    def say_bye(self):
        print('Goodbye', self.name)   # Person.ID was never defined and would raise AttributeError
    def take(self,amount):
        self.wallet+=amount
    def balance(self):
        print('Wallet Balance:',self.wallet)

## Object Class Assignment

In [None]:
#p = Person() ## this will fail, as the constructor expects a name argument
p1 = Person('Yong')  
p2 = Person('Gan',200)

## Calling Method

In [None]:
p1.say_hi()
p1.balance()

In [None]:
p2.say_hi()
p2.balance()

## Getting Property

In [None]:
p1.wallet

In [None]:
p2.wallet

## Setting Property

In [None]:
p1.wallet = 900
p1.wallet
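
Setting a property on an object creates or updates an instance attribute; it does not touch the class attribute of the same name. The sketch below illustrates the difference with a hypothetical Counter class, which is not part of the original notebook.

```python
class Counter:
    total = 0                 # class attribute, shared by all instances

    def __init__(self, name):
        self.name = name      # instance attribute, one per object

    def bump(self):
        Counter.total += 1    # update the shared class attribute

a = Counter('a')
b = Counter('b')
a.bump()
b.bump()
print(Counter.total, a.total, b.total)   # 2 2 2

a.total = 100                 # creates an instance attribute that shadows the class one
print(Counter.total, a.total, b.total)   # 2 100 2
```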

# Decorator

## Definition
- A **decorator** is a function that accepts a callable as its **only argument**
- The main purpose of a decorator is to **enhance** the behavior of the decorated function
- It returns a **callable**
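
As a minimal sketch of the "enhance" idea, the decorator below wraps a function and post-processes its return value. The names shout and greet are hypothetical; ```functools.wraps``` comes from the standard library and keeps the metadata of the decorated function.

```python
import functools

def shout(func):
    @functools.wraps(func)            # keep the name and docstring of func
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        return str(result).upper()    # the added behavior
    return wrapper

@shout
def greet(name):
    return 'hello ' + name

print(greet('yong'))     # HELLO YONG
print(greet.__name__)    # greet
```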

## Examples

### Example 1 - Plain decorator function
- It is often useful to register a function elsewhere, for example registering a task in a task runner, or a function with a signal handler
- **register** is a decorator; it accepts **decorated** as its only argument
- foo() and bar() are the **decorated functions** of **register**

In [8]:
registry = []

def register(decorated):
    registry.append(decorated)
    return decorated

@register
def foo():
    return 3

@register
def bar():
    return 5

In [4]:
registry

[<function __main__.foo>, <function __main__.bar>]

In [11]:
registry[0]()

3

In [12]:
registry[1]()

5

### Example 2 - Decorator with Class
- Extending the use case above
- register is the **decorator**; besides self, it takes only one argument

In [103]:
class Registry(object):
    def __init__(self):
        self._functions = []
    def register(self,decorated):
        self._functions.append(decorated)
        return decorated
    def run_all(self,*args,**kwargs):
        return_values = []
        for func in self._functions:
            return_values.append(func(*args,**kwargs))
        return return_values

The decorators below register functions on two separate objects, **a** and **b**; bax is registered on both

In [104]:
a = Registry()
b = Registry()

@a.register
def foo(x=3):
    return x

@b.register
def bar(x=5):
    return x

@a.register
@b.register
def bax(x=7):
    return x

Observe the result

In [108]:
print (a._functions)
print (b._functions)

[<function foo at 0x000002AB7B6C6BF8>, <function bax at 0x000002AB7B6C80D0>]
[<function bar at 0x000002AB7B6C6C80>, <function bax at 0x000002AB7B6C80D0>]


In [105]:
print (a.run_all())
print (b.run_all())

[3, 7]
[5, 7]


In [106]:
print ( a.run_all(x=9) )
print ( b.run_all(x=9) )

[9, 9]
[9, 9]


# datetime Standard Library
This is part of the Python standard library. There is no need to install it.

## ISO8601

https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators

### Date Time
```
UTC:    "2007-04-05T14:30Z"        #notice Z
GMT+8:  "2007-04-05T12:30+08:00"   #notice +08:00
GMT+8:  "2007-04-05T12:30+0800"    #notice +0800
GMT+8:  "2007-04-05T12:30+08"      #notice +08
```

### Date
2019-02-04 #notice no timezone available
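
Strings in this format can be produced with the standard library by attaching a timezone to a datetime. This is a minimal sketch; the variable names are illustrative, and note that ```isoformat()``` writes UTC as +00:00 rather than Z.

```python
from datetime import datetime, timedelta, timezone

utc_dt = datetime(2007, 4, 5, 14, 30, tzinfo=timezone.utc)
gmt8 = timezone(timedelta(hours=8))
local_dt = utc_dt.astimezone(gmt8)      # same instant, expressed in GMT+8

print(utc_dt.isoformat())     # 2007-04-05T14:30:00+00:00
print(local_dt.isoformat())   # 2007-04-05T22:30:00+08:00
print(utc_dt == local_dt)     # True
```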

## Module Import

In [2]:
from datetime import date      # class for date objects
from datetime import time      # class for time objects
from datetime import datetime  # class for datetime objects
from datetime import timedelta # class for durations

## Class
The datetime library contains **four classes of objects**:  
- **date** (year,month,day)  
- **time** (hour,minute,second)  
- **datetime** (year,month,day,hour,minute,second)  
- **timedelta**: duration between two datetime or date objects

## date

### Constructor

In [26]:
date(2000,1,1)
date(year=2000,month=1,day=1)

datetime.date(2000, 1, 1)

### Class Method
A class method is called on the **class** itself

#### ```today```
This is **local date** (not UTC)

In [39]:
date.today()

datetime.date(2019, 2, 5)

In [40]:
print( date.today() )

2019-02-05


#### Convert From ISO ```fromisoformat```
```strptime``` is **not available for date** conversion. It is only for datetime conversion

In [13]:
date.fromisoformat('2011-11-11')

datetime.date(2011, 11, 11)

### Instance Method
An instance method **applies to an instance (object)**

#### ```replace()```
- Replace year/month/day with the specified parameters; parameters that are not specified remain unchanged.  
- The example below changes only the month. You can change year, month and day in any combination

In [9]:
print( date.today() )
print( date.today().replace(month=8) )

2019-02-05
2019-08-05


#### ```weekday()```
For ```weekday()```, Monday is 0 and Sunday is 6   
For ```isoweekday()```, Monday is 1 and Sunday is 7

In [106]:
print( date.today().weekday() )
print( date.today().isoweekday() )

1
2
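
Using fixed dates instead of today makes the two numbering schemes easy to compare. This is a small sketch with illustrative variable names; 2019-02-05 is the Tuesday used throughout this notebook.

```python
from datetime import date

d = date(2019, 2, 5)          # a Tuesday
print(d.weekday())            # 1  (Monday is 0)
print(d.isoweekday())         # 2  (Monday is 1)

sunday = date(2019, 2, 10)
print(sunday.weekday())       # 6
print(sunday.isoweekday())    # 7
```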


In [109]:
date.today().isocalendar() # return tuple

(2019, 6, 2)

In [61]:
weekdays = ['Mon','Tue','Wed','Thu','Fri','Sat','Sun']
wd = date.today().weekday()
print( date.today(), "is day", wd ,"which is", weekdays[wd] )

2019-02-05 is day 1 which is Tue


#### Formatting with ```isoformat()```
```isoformat()``` returns an **ISO 8601 string (YYYY-MM-DD)**

In [12]:
date.today().isoformat() # return string

'2019-02-05'

#### Formatting with ```strftime```
For the complete list of directives, see below:  
https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior

In [17]:
date.today().strftime("%m/%d")

'02/05'

#### ```isocalendar()```
```isocalendar``` returns a 3-tuple, **(ISO year, ISO week number, ISO weekday)**.

In [10]:
date.today().isocalendar() ## return tuple 

(2019, 6, 2)

### Attributes

In [105]:
print( date.today().year )
print( date.today().month )
print( date.today().day )

2019
2
5


## datetime
### Constructor

In [36]:
datetime(2000,1,1,0,0,0)
datetime(year=2000,month=1,day=1,hour=23,minute=15,second=55)

datetime.datetime(2000, 1, 1, 23, 15, 55)

### Class Method
#### ```now``` and ```today```
Both ```now()``` and ```today()``` return the current **local** datetime

In [65]:
datetime.now()

datetime.datetime(2019, 2, 5, 0, 58, 6, 627911)

In [66]:
datetime.today()

datetime.datetime(2019, 2, 5, 0, 58, 9, 154960)

In [38]:
print( datetime.now() )

2019-02-05 00:46:37.208730


#### ```utcnow```

In [112]:
datetime.utcnow()

datetime.datetime(2019, 2, 4, 17, 55, 59, 797875)
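
Note that ```utcnow()``` returns a naive datetime (no tzinfo attached), and it is deprecated as of Python 3.12. A timezone-aware alternative, sketched below with an illustrative variable name, is to pass ```timezone.utc``` to ```now()```.

```python
from datetime import datetime, timezone

aware = datetime.now(timezone.utc)   # current UTC time with tzinfo attached
print(aware.tzinfo)                  # UTC
print(aware.isoformat())             # ends with +00:00
```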

#### ```combine()``` date and time
Apply the ```datetime.combine()``` class method to a **date** and a **time** object to get a **datetime**

In [74]:
now = datetime.now()   # now must exist before it is split into date and time
datetime.combine(now.date(), now.time())

datetime.datetime(2019, 2, 5, 1, 3, 23, 911670)

#### Convert from String ```strptime()```
Use **```strptime```** to convert a string into a **datetime** object
```
%I : hour, 12-hour clock
%H : hour, 24-hour clock
%M : minute
%p : AM/PM
%y : 2-digit year (18)
%Y : 4-digit year (2018)
%b : abbreviated month name (Mar)
%m : month (01 to 12)
%d : day
```

In [117]:
datetime.strptime('2011-02-25','%Y-%m-%d')

datetime.datetime(2011, 2, 25, 0, 0)

In [81]:
datetime.strptime('9-01-18','%d-%m-%y')

datetime.datetime(2018, 1, 9, 0, 0)

In [78]:
datetime.strptime('09-Mar-2018','%d-%b-%Y')

datetime.datetime(2018, 3, 9, 0, 0)

In [79]:
datetime.strptime('2/5/2018 4:49 PM', '%m/%d/%Y %I:%M %p')

datetime.datetime(2018, 2, 5, 16, 49)

#### Convert from ISO ```fromisoformat```
- ```fromisoformat()``` is intended to be the reverse of ```isoformat()```  
- It is **not fully ISO 8601 compliant** before Python 3.11: when Z or +8 is included at the end of the string, an error occurs

In [9]:
s = datetime.now().isoformat()
datetime.fromisoformat("2019-02-05T10:22:33")

datetime.datetime(2019, 2, 5, 10, 22, 33)
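
The round trip with ```isoformat()``` also works for timezone-aware values, because a full +HH:MM offset is accepted. The sketch below also shows one common workaround for a trailing Z on older Python versions; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

gmt8 = timezone(timedelta(hours=8))
original = datetime(2019, 2, 5, 10, 22, 33, tzinfo=gmt8)

text = original.isoformat()              # '2019-02-05T10:22:33+08:00'
restored = datetime.fromisoformat(text)  # full +HH:MM offsets are accepted
print(restored == original)              # True

# Workaround for a trailing Z: rewrite it as an explicit UTC offset
z_text = '2019-02-05T02:22:33Z'
from_z = datetime.fromisoformat(z_text.replace('Z', '+00:00'))
print(from_z == original)                # True, same instant expressed in UTC
```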

### Instance Method
#### ```weekday```

In [24]:
datetime.now().weekday()

1

#### ```replace```

In [25]:
datetime.now().replace(year=1999)

datetime.datetime(1999, 2, 5, 9, 42, 34, 38400)

#### Convert to ```.time()```

In [34]:
datetime.now().time()

datetime.time(9, 52, 47, 719398)

#### Convert to ```.date()```

In [44]:
datetime.now().date()

datetime.date(2019, 2, 5)

#### Convert to String
**```str```**

In [17]:
str( datetime.now() )

'2019-02-05 15:10:11.093005'

**Use ```strftime()```**

In [22]:
datetime.now().strftime('%d-%b-%Y')

'05-Feb-2019'

In [24]:
datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S.%fZ')  ## ISO 8601 UTC

'2019-02-05T07:13:54.691364Z'

**Use ```isoformat()```**

In [53]:
datetime.utcnow().isoformat()

'2019-02-05T02:29:11.607484'

### Attributes

In [22]:
print( datetime.now().year )
print( datetime.now().month )
print( datetime.now().day )
print( datetime.now().hour )
print( datetime.now().minute )

2019
2
5
9
41


## time

### Constructor

In [124]:
print( time(2) )    #single positional argument: hour
print( time(2,15) ) #two positional arguments: hour, minute
print( time(hour=2,minute=15,second=30) )

02:00:00
02:15:00
02:15:30


### Class Method
#### ```now()```
Unfortunately, there is no single function that returns the current time. Use the **time()** method of a **datetime** object

In [33]:
datetime.now().time()

datetime.time(9, 52, 3, 374617)

### Attributes

In [36]:
print( datetime.now().time().hour )
print( datetime.now().time().minute )
print( datetime.now().time().second )

9
54
16


## timedelta
- The **years** argument is **not supported**  
- Apply timedelta to a **datetime** or **date** object  
- timedelta **cannot** be applied to a **time object**, because the result could go beyond a single day (24H)

In [95]:
delt = timedelta(days=365,minutes=33,seconds=15)

In [96]:
now = datetime.now()
print ('delt+now : ', delt + now)

delt+now :  2020-02-05 01:54:00.895460
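
Subtracting two datetime objects also produces a timedelta, and a time object can be shifted by first combining it with a date. This is a minimal sketch using fixed values and illustrative variable names.

```python
from datetime import date, datetime, time, timedelta

start = datetime(2019, 2, 5, 9, 0, 0)
end = datetime(2019, 2, 6, 10, 30, 0)

diff = end - start                    # subtracting datetimes gives a timedelta
print(diff)                           # 1 day, 1:30:00
print(diff.total_seconds())           # 91800.0

# A time object has no date part, so attach one before adding a timedelta
t = time(23, 30)
shifted = datetime.combine(date(2019, 2, 5), t) + timedelta(hours=1)
print(shifted)                        # 2019-02-06 00:30:00
print(shifted.time())                 # 00:30:00
```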


# Getting External Data

## Web Scraping Using requests & BeautifulSoup4
Use web scraping only if an API is not available

### Library

In [None]:
import requests
from bs4 import BeautifulSoup

In [None]:
url = "https://www.epicurious.com/search/tofu%20chill"
res = requests.get(url)
if (res.status_code == 200):
    soup = BeautifulSoup(res.content,'lxml')
    print (soup.prettify())
else:
    print('Failure')

# Plydata (dplyr for Python)

## Sample Data

In [None]:
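from plydata import define, select, rename, arrange, group_by, summarize  # plydata verbs used in this chapter
n = 200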
comp = ['C' + i for i in np.random.randint( 1,4, size  = n).astype(str)] # 3x Company
dept = ['D' + i for i in np.random.randint( 1,6, size  = n).astype(str)] # 5x Department
grp =  ['G' + i for i in np.random.randint( 1,3, size  = n).astype(str)] # 2x Groups
value1 = np.random.normal( loc=50 , scale=5 , size = n)
value2 = np.random.normal( loc=20 , scale=3 , size = n)
#value3 = np.random.normal( loc=5 , scale=30 , size = n)

mydf = pd.DataFrame({
    'comp':comp, 
    'dept':dept, 
    'grp': grp,
    'value1':value1, 
    'value2':value2
    #'value3':value3 
})
mydf.head()

## Column Manipulation

### Copy Column

In [None]:
mydf >> define(newcol = 'value1')                 # simple method for one column

In [None]:
mydf >> define (('newcol1', 'value1'), newcol2='value2')  # method for multiple new columns

### New Column from existing Column

**Without specifying the new column name**, it will be derived from the expression

In [None]:
mydf >> define ('value1*2')

**Specify the new column name**

In [None]:
mydf >> define(value3 = 'value1*2')

Define **multiple** new columns in one go. Observe that there are three ways to specify a new column

In [None]:
mydf >> define('value1*2',('newcol2','value2*2'),newcol3='value2*3')

### Select Column(s)

In [None]:
mydf2 = mydf >> define(newcol1='value1',newcol2='value2')
mydf2.info()

#### By Column Names
**Exact Column Name**

In [None]:
mydf2 >> select ('comp','dept','value1')

**Column Name Starts With** ...

In [None]:
mydf2 >> select ('comp', startswith='val')

**Column Name Ends With ...**

In [None]:
mydf2 >> select ('comp',endswith=('1','2','3'))

**Column Name Contains ...**

In [None]:
mydf2 >> select('comp', contains=('col','val'))

#### Specify Column Range

In [None]:
mydf2 >> select ('comp', slice('value1','newcol2'))

### Drop Column(s)

In [None]:
mydf2 >> select('newcol1','newcol2',drop=True)

## Rename Column

In [None]:
mydf.head(80)

**Assignment Method**  
Use when the column name does not contain special characters

In [None]:
mydf >> rename( val1='value1', val2='value2' )

**Dictionary Method**  
Use when the column name contains special characters

In [None]:
mydf >> rename( {'val.1' : 'value1',
                 'val.2' : 'value2' })

**Combined Method**  
Combine both the assignment and dictionary methods

In [None]:
mydf >> rename( {'val.1' : 'value1',
                 'val.2' : 'value2'
              }, group = 'grp' )

## Sorting (arrange)
Use **'-colName'** for descending order

In [None]:
mydf >> arrange('comp', '-value1')

## Grouping

In [None]:
mydf.info()

In [None]:
gdf = mydf >> group_by('comp','dept')
type(gdf)

## Summarization

### Simple Method
**Passing Multiple Expressions**

In [None]:
gdf >> summarize('n()','sum(value1)','mean(value2)')

### Specify Summarized Column Name

**Assignment Method**  
- Pass colName='expression'  
- The column name cannot contain special characters

In [None]:
gdf >> summarize(count='n()',v1sum='sum(value1)',v2_mean='mean(value2)')

**Tuple Method ('colName','expression')**  
Use when the column name contains special characters

In [None]:
gdf >> summarize(('count','n()'),('v1.sum','sum(value1)'),('s2.sum','sum(value2)'),v2mean='mean(value2)')  # expressions are passed as strings

### Number of Rows in Group
- n()        : total number of rows in the group  
- n_unique() : number of unique values in the group

In [None]:
gdf >> summarize(count='n()', val1_unique='n_unique(value1)')