# Content

* What is Python?
    * The origin
    * The eco-system
* Python basics
    * Data structure
    * Pandas dataframe
    * Function and Class definition
    * Package manager and environments
* File Input/Output
* Database connection
* Web data crawling
* Data visualization
* Model fitting

## 0. What is Python?

### 0.1. The origin

* High-level programming language for general-purpose programming
* Supports multiple programming paradigms
    * Object-oriented
    * Functional
    * Procedural
    * Imperative
* Interfaces easily with other languages, such as C++ and Java
* A large and comprehensive standard library
* Generally slower than compiled languages such as C++ for CPU-bound code

![title](../pics/history.png)

### 0.2. The eco-system

![title](../pics/ecosystem.png)

### 0.3. Python 2 vs. Python 3

In [7]:
from IPython.display import IFrame
IFrame('https://pythonclock.org/', width=700, height=200)

### 0.4. IDE
(picture source: https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook)

* Jupyter Notebook
![title](../pics/jupyternotebook.gif)

* Visual Studio Code
(picture source: https://code.visualstudio.com/docs/python/editing)
![title](../pics/vscode.gif)
* PyCharm
* Spyder
* Atom
* ...

### 0.5. Prerequisites

To run the code below:
* Anaconda (recommended) (https://www.anaconda.com/distribution/)
* The libraries listed in the included requirements.txt

To run the notebook in presentation mode:
* RISE extension for Jupyter Notebook (https://github.com/damianavila/RISE)

## 1. Python Basics

### 1.1. Data structures

* Variable definition

In [2]:
a = 123
print(a)

123


* Iterables:

In [8]:
# list
a = [1, 'a', 3]
a

[1, 'a', 3]

In [25]:
a[1]

'a'

In [9]:
a.append(4)  # append a value to the end of the list
a

[1, 'a', 3, 4]

In [27]:
# Set
a = {1,2,3}
b = {2,3,4}
print(a)
print(b)

{1, 2, 3}
{2, 3, 4}


In [33]:
# a[0]  # left commented out: sets are unordered, so indexing raises a TypeError
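
The commented-out `a[0]` above fails because sets have no positions. A minimal sketch of what happens, and of two common alternatives (the variable names `s`, `message`, `has_two`, and `as_list` are illustrative and not part of the original notebook):

```python
# Sets are unordered collections, so positional indexing is not supported.
s = {1, 2, 3}

try:
    s[0]
except TypeError as err:
    # Capture the error message instead of letting the cell fail.
    message = str(err)
    print("Indexing failed:", message)

# Membership tests and conversion to a sorted list work instead.
has_two = 2 in s
as_list = sorted(s)
print(has_two, as_list)
```
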

In [29]:
a.update([5,6,7])  # add several values to the set
a

{1, 2, 3, 5, 6, 7}

In [30]:
a - b

{1, 5, 6, 7}

In [31]:
b - a

{4}

In [32]:
a&b ## intersection

{2, 3}

In [9]:
a|b ## union

{1, 2, 3, 4, 5, 6, 7}

In [34]:
# Dictionary

x = {'a':1,'b':[2,3,4],'c':{'d':[1,2,3]}}

In [35]:
x['a']

1

In [36]:
x['c']['d']

[1, 2, 3]

In [37]:
x['e'] = 5 ## add new entry
x

{'a': 1, 'b': [2, 3, 4], 'c': {'d': [1, 2, 3]}, 'e': 5}

* Pandas dataframe

In [68]:
import pandas as pd ## import the pandas library

In [69]:
df = pd.DataFrame({'name':['ABC','DEF','GHI','JKL'],'age':[20,30,40,50]}) # create from a dictionary
df

| | name | age |
|---|---|---|
| 0 | ABC | 20 |
| 1 | DEF | 30 |
| 2 | GHI | 40 |
| 3 | JKL | 50 |


In [70]:
names = ['ABC','DEF','GHI','JKL']
ages = [20,30,40,50]
df = pd.DataFrame(zip(names,ages), columns=['name','ages'])
df

| | name | ages |
|---|---|---|
| 0 | ABC | 20 |
| 1 | DEF | 30 |
| 2 | GHI | 40 |
| 3 | JKL | 50 |


In [71]:
df.loc[:,'name'] # slice by column name

0    ABC
1    DEF
2    GHI
3    JKL
Name: name, dtype: object

In [72]:
df.iloc[:,0] # slice by column index

0    ABC
1    DEF
2    GHI
3    JKL
Name: name, dtype: object

In [73]:
df.loc[df.name=='ABC'] # slice by condition

| | name | ages |
|---|---|---|
| 0 | ABC | 20 |


In [65]:
df2 = pd.DataFrame({'name':['DEF','GHI','JKL'], 'hometown':['Atlanta, GA', 'Atlanta, GA', 'Knoxville, TN']})
df2

| | name | hometown |
|---|---|---|
| 0 | DEF | Atlanta, GA |
| 1 | GHI | Atlanta, GA |
| 2 | JKL | Knoxville, TN |


In [77]:
df_combo = pd.concat([df,df2],axis=0,sort=False) # stack two dataframes
df_combo

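| | name | ages | hometown |
|---|---|---|---|
| 0 | ABC | 20.0 | NaN |
| 1 | DEF | 30.0 | NaN |
| 2 | GHI | 40.0 | NaN |
| 3 | JKL | 50.0 | NaN |
| 0 | DEF | NaN | Atlanta, GA |
| 1 | GHI | NaN | Atlanta, GA |
| 2 | JKL | NaN | Knoxville, TN |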


In [79]:
df_combo = pd.merge( # join dataframes
    df,
    df2,
    on='name',
    how='left'
)  # other tools, such as pandasql (https://pypi.org/project/pandasql/), offer SQL-like operations on dataframes
df_combo

| | name | ages | hometown |
|---|---|---|---|
| 0 | ABC | 20 | NaN |
| 1 | DEF | 30 | Atlanta, GA |
| 2 | GHI | 40 | Atlanta, GA |
| 3 | JKL | 50 | Knoxville, TN |



In [81]:
df_combo.groupby('hometown').size().reset_index() # count rows per hometown

| | hometown | 0 |
|---|---|---|
| 0 | Atlanta, GA | 2 |
| 1 | Knoxville, TN | 1 |


In [83]:
df_combo.groupby(['name','hometown']).size().reset_index().rename(columns={0:'frequency'})

| | name | hometown | frequency |
|---|---|---|---|
| 0 | DEF | Atlanta, GA | 1 |
| 1 | GHI | Atlanta, GA | 1 |
| 2 | JKL | Knoxville, TN | 1 |


In [85]:
df_combo['num_pets'] = [1,2,2,0] # add a column to aggregate
df_combo.pivot_table( # create a pivot table
    index='name',
    columns='hometown',
    values='num_pets',
    aggfunc='sum'
).fillna(0)

| name / hometown | Atlanta, GA | Knoxville, TN |
|---|---|---|
| DEF | 2.0 | 0.0 |
| GHI | 2.0 | 0.0 |
| JKL | 0.0 | 0.0 |


* Function definition
    * Regular function
    * Lambda function
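
A minimal sketch of the two kinds of function listed above. The names `greet`, `square`, `people`, and `by_age_desc` are made up for illustration and do not come from the original notebook:

```python
# Regular function: a named, reusable block with a docstring and a default argument.
def greet(name, greeting="Hello"):
    """Return a greeting for the given name."""
    return f"{greeting}, {name}!"

# Lambda function: a small anonymous function limited to a single expression.
square = lambda x: x ** 2

print(greet("ABC"))
print(greet("DEF", greeting="Hi"))
print(square(4))

# Lambdas are often passed inline as short key functions.
people = [("ABC", 20), ("JKL", 50), ("DEF", 30)]
by_age_desc = sorted(people, key=lambda person: person[1], reverse=True)
print(by_age_desc)
```

A lambda is usually best kept for short throwaway expressions such as the sort key here; anything longer tends to read better as a regular `def`.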

* Class definition
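
One possible illustration of a class definition, assuming a hypothetical `Person` class that mirrors the name and age columns used in the dataframe cells earlier:

```python
class Person:
    """A minimal class holding a name and an age (illustrative only)."""

    def __init__(self, name, age):
        # __init__ runs when a new instance is created.
        self.name = name
        self.age = age

    def birthday(self):
        """Increase the age by one year and return the new age."""
        self.age += 1
        return self.age

    def __repr__(self):
        # Controls how the object is shown when printed or displayed in a cell.
        return f"Person(name={self.name!r}, age={self.age})"


p = Person("ABC", 20)
p.birthday()
print(p)
```
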

* Value assignment
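
Assignment in Python binds a name to an object rather than copying the object. The sketch below contrasts an alias, a shallow copy, and a deep copy. The variable names are illustrative, and this is only one way to read the "value assignment" topic:

```python
import copy

original = [1, [2, 3]]
alias = original                # same object, no copy is made
shallow = list(original)        # new outer list, inner list still shared
deep = copy.deepcopy(original)  # fully independent copy

original.append(4)        # changes the outer list
original[1].append(99)    # changes the shared inner list

print(alias)    # follows every change made to original
print(shallow)  # misses the append, but sees the change to the inner list
print(deep)     # unaffected by both changes
```
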

### 1.2. Control statements

* Loop (for, while, comprehension, etc.)
* Conditional statement
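
A short sketch covering the items above: a `for` loop with a conditional, a `while` loop, and a list comprehension. The data and variable names are made up for illustration:

```python
numbers = [1, 2, 3, 4, 5, 6]

# for loop combined with a conditional statement (if / elif / else)
labels = []
for n in numbers:
    if n % 2 == 0:
        labels.append("even")
    elif n % 3 == 0:
        labels.append("odd multiple of 3")
    else:
        labels.append("odd")

# while loop: keep adding numbers until the running total reaches 10
total, i = 0, 0
while i < len(numbers) and total < 10:
    total += numbers[i]
    i += 1

# list comprehension: build a new list in a single expression
even_squares = [n ** 2 for n in numbers if n % 2 == 0]

print(labels)
print(total, i)
print(even_squares)
```
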

### 1.3. Package manager and environment

* conda
* virtualenv

## 2. File Input/Output

* Read file
* Write file
* Pandas dataframe example
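
A minimal sketch of writing and reading a file, using only the standard library so that it runs without extra packages. The file name `people.csv` and the sample rows are hypothetical, and the file is created in a temporary directory that is removed afterwards:

```python
import csv
import tempfile
from pathlib import Path

rows = [{"name": "ABC", "age": "20"}, {"name": "DEF", "age": "30"}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "people.csv"  # hypothetical file name

    # Write file: one header line followed by one line per row
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "age"])
        writer.writeheader()
        writer.writerows(rows)

    # Read file: the raw text
    text = path.read_text(encoding="utf-8")

    # Read file: parsed into one dictionary per row (all values are strings)
    with path.open(newline="", encoding="utf-8") as f:
        loaded = list(csv.DictReader(f))

print(text)
print(loaded)
```

With pandas, the same round trip is usually written as `df.to_csv(path, index=False)` followed by `pd.read_csv(path)`.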

## 3. Database Connection

* Common tools
* MySQL example
* AWS example
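
The outline above names MySQL and AWS, both of which need a running server and credentials. As a stand-in, the sketch below uses SQLite through the standard-library `sqlite3` module with an in-memory database. The table and its rows are made up. MySQL drivers generally follow the same connect / cursor / execute / fetch pattern, though the connection arguments and the placeholder syntax may differ by driver:

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
try:
    cur = conn.cursor()
    cur.execute("CREATE TABLE people (name TEXT, age INTEGER)")

    # Parameterized inserts avoid building SQL strings by hand.
    cur.executemany(
        "INSERT INTO people (name, age) VALUES (?, ?)",
        [("ABC", 20), ("DEF", 30), ("GHI", 40), ("JKL", 50)],
    )
    conn.commit()

    # Query with a parameter and fetch all matching rows as tuples.
    cur.execute("SELECT name, age FROM people WHERE age > ? ORDER BY age", (25,))
    result = cur.fetchall()
finally:
    conn.close()

print(result)
```
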

## 4. Webpage Crawling

* Common tools
* Regex match
* Example (www.advantage.com, grab all car rental addresses)


In [19]:
IFrame("https://www.advantage.com/us-location/", width=1400, height=400)
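
A rough sketch of the regex-matching step. The HTML fragment below is invented for illustration and does not reflect the actual markup of the site shown above; the tag names, class names, and addresses are all assumptions. No network request is made:

```python
import re

# Made-up HTML fragment for illustration; not copied from any real page.
html = """
<div class="location"><h3>Atlanta Airport</h3><p class="address">123 Example Rd, Atlanta, GA 30301</p></div>
<div class="location"><h3>Knoxville Downtown</h3><p class="address">45 Sample St, Knoxville, TN 37902</p></div>
"""

# Capture the text between the (assumed) address tags, non-greedily.
address_pattern = re.compile(r'<p class="address">(.*?)</p>')
addresses = address_pattern.findall(html)

# From each address, pull the two-letter state and five-digit ZIP at the end.
state_zip_pattern = re.compile(r",\s*([A-Z]{2})\s+(\d{5})$")
parsed = [state_zip_pattern.search(addr).groups() for addr in addresses]

print(addresses)
print(parsed)
```

On real pages the markup is rarely this regular, so an HTML parser is often a more robust first step, with regular expressions applied to the extracted text.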

## 5. Data Visualization

* 5.1. X-Y plot

* 5.2. Bar chart

* 5.3. Histogram

* 5.4. Heatmap

* 5.5. Visualization on geolocation map

## 6. Model Fitting Examples

* 6.1. Linear Regression (randomly generated sample data)
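
A minimal sketch of a linear fit on randomly generated sample data, using the closed-form ordinary least squares solution in plain Python. The true slope and intercept, the noise level, and the seed are arbitrary choices for illustration. The notebook itself may rely on a library such as scikit-learn or NumPy for this step:

```python
import random

random.seed(0)  # fixed seed so the sample data is reproducible

# Generate noisy samples around the line y = 2x + 1 (assumed values).
true_slope, true_intercept = 2.0, 1.0
x = [i / 10 for i in range(100)]
y = [true_slope * xi + true_intercept + random.gauss(0, 0.5) for xi in x]

# Ordinary least squares for one feature:
#   slope = cov(x, y) / var(x),  intercept = mean(y) - slope * mean(x)
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)

slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"estimated slope: {slope:.3f}, estimated intercept: {intercept:.3f}")
```
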

* 6.2. k-Means (Iris dataset)

* 6.3. PCA on an image