# Basics of Python - Data Types of (core) Python (plus NumPy)

- We write our code in Jupyter Notebook (formerly called IPython Notebook).
- We are NOT using your local computer's resources. Instead, you connect to a Kaggle server (Kaggle is owned by Google) and use Kaggle's computing resources (CPU, RAM, HDD, etc.).

# Data Types : Numbers

- integers
- real numbers : in Python, real numbers are stored as **float** (floating-point) numbers.
- complex numbers : complex = (real) + i(imaginary). In Python, we write j instead of i.
- booleans : True (1) or False (0)

In [2]:
3+5

8

In [3]:
3.5 + 4.7

8.2

In [5]:
3+4j

(3+4j)

In [6]:
(3+4j)*(6+2j)

(10+30j)

In [7]:
True

True

In [9]:
False

False
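As a quick check of the four number types listed above, the built-in `type()` function reports how Python classifies a value. This is a minimal sketch; the variable names (`n_int`, `n_float`, and so on) are chosen only for illustration.

```python
# Ask Python which type each kind of number has.
n_int = 3 + 5
n_float = 3.5 + 4.7
n_complex = (3 + 4j) * (6 + 2j)
n_bool = True

print(type(n_int))      # <class 'int'>
print(type(n_float))    # <class 'float'>
print(type(n_complex))  # <class 'complex'>
print(type(n_bool))     # <class 'bool'>

# Booleans behave like the integers 1 and 0 in arithmetic.
print(True + True)      # 2
print(False * 10)       # 0
```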

# Variable

- A variable is a named place to store your data.
- How to define : (your own variable name) = (your data)
- "=" does not mean "equal to" as in ordinary math. It is the assignment operation: the value on the right is stored under the name on the left.

In [10]:
a = 3+5

In [11]:
a

8

In [12]:
a = 3+4j

In [13]:
a

(3+4j)
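To make the difference between assignment and equality concrete, the short sketch below reuses a variable and then compares it. The name `count` is only an example.

```python
# "=" stores the value on the right under the name on the left.
count = 8
count = count + 1   # not an equation: take the old value, add 1, store it back
print(count)        # 9

# "==" is the comparison that asks "are these equal?" and gives a boolean.
print(count == 9)   # True
print(count == 8)   # False
```

Note that assigning a new value to an existing name, as in the cells above where `a` changes from 8 to (3+4j), simply replaces what the name refers to.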

# Container (Compound) Data Types 

- Data = Collections of elements that we are interested in. 
- We discuss how to save data and process it.

- List : !
- Tuple
- String
- Dictionary
- Set
- NumPy.array : !

## 1. List

- A list is an ordered collection of any types of data (numbers, strings, other lists, etc.)
- How to define a list? : Use square brackets [].

In [14]:
a = [1,2,3,4,5]

In [15]:
a

[1, 2, 3, 4, 5]

In [16]:
a = [1, 5, 6, 2, 10, 12, 24, 45, 37, 25, 46]
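The following sketch illustrates that a list can hold mixed types, and shows the basic ways of reading and changing its elements. The list `mixed` is a made-up example, not data from this course.

```python
# A list can mix numbers, strings, and even other lists.
mixed = [1, 2.5, 3 + 4j, "text", [10, 20]]

print(len(mixed))    # 5
print(mixed[0])      # first element (indexing starts at 0): 1
print(mixed[-1])     # last element: [10, 20]
print(mixed[4][1])   # second element of the inner list: 20

# Lists are mutable: elements can be added or replaced.
mixed.append(True)
mixed[0] = 100
print(mixed)
```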

## 2. NumPy Array

- What is NumPy? : "a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays."
- What is a NumPy array? : Simply speaking, it is a vector (1-D) or a matrix (2-D) from linear algebra. In general, it is an n-dimensional grid of values of the same type.
- How can we use NumPy? : Since NumPy is not part of Python's standard library, we need to **import** it.
- Why do we need NumPy arrays instead of lists?

### How to import external libraries such as NumPy

In [17]:
import numpy

In [18]:
numpy.sqrt(2)

1.4142135623730951

In [19]:
import numpy as np

In [20]:
np.sqrt(2)

1.4142135623730951

### How to define NumPy arrays

- Variables a and b hold the same collection of numbers.
- But a and b have different types; a is a list, while b is a NumPy array.

In [21]:
b = np.array([1, 5, 6, 2, 10, 12, 24, 45, 37, 25, 46])

In [22]:
a

[1, 5, 6, 2, 10, 12, 24, 45, 37, 25, 46]

In [23]:
b

array([ 1,  5,  6,  2, 10, 12, 24, 45, 37, 25, 46])
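The array `b` above is one-dimensional. As a small sketch of the matrix case, a nested list passed to `np.array` gives a 2-D array. The names `v` and `m` are illustrative only.

```python
import numpy as np

# A 1-D array behaves like a vector.
v = np.array([1, 2, 3])

# A nested list gives a 2-D array, which plays the role of a matrix.
m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(v.shape)   # (3,)
print(m.shape)   # (2, 3): 2 rows, 3 columns
print(m.ndim)    # 2
print(m @ v)     # matrix-vector product: [14 32]
```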

### Q. Why do we need to use NumPy array?

- For numerical computation, a NumPy array is more efficient and faster than a list.
- With a very small amount of data, you will not notice a difference in speed between a NumPy array and a list.
- But if you need to handle **big data**, NumPy arrays speed up calculations considerably.
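One way to see the speed difference for yourself is to time the same calculation on a list and on an array. The sketch below uses the standard-library `timeit` module on a sum of squares. The size `n` and the function names are arbitrary choices, and the measured times depend on your machine, so no specific numbers are claimed here.

```python
import timeit
import numpy as np

n = 100_000
data_list = list(range(n))
data_array = np.arange(n, dtype=np.int64)

def sum_squares_list():
    # Pure Python: loop over every element one by one.
    return sum(x * x for x in data_list)

def sum_squares_array():
    # NumPy: the loop runs inside compiled code.
    return int(np.sum(data_array * data_array))

# Both approaches give the same answer.
print(sum_squares_list() == sum_squares_array())

# Time 20 repetitions of each; the numbers depend on your machine.
t_list = timeit.timeit(sum_squares_list, number=20)
t_array = timeit.timeit(sum_squares_array, number=20)
print(f"list : {t_list:.4f} s")
print(f"array: {t_array:.4f} s")
```

Try changing `n` to a small value such as 10 and then to a few million to see how the gap changes with data size.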

## 3. Calculating Statistical quantities

- Practice with descriptive statistics : mean, variance, standard deviation, skewness, kurtosis, etc.
- We need statistical libraries for this.
- NumPy itself provides some statistical analysis tools.
- We can also use another library called SciPy.
- In this course, we will use NumPy and SciPy for statistical analysis.


- https://numpy.org/doc/stable/reference/routines.statistics.html
- https://docs.scipy.org/doc/scipy/reference/stats.html#module-scipy.stats

In [24]:
import scipy as sp

In [25]:
a

[1, 5, 6, 2, 10, 12, 24, 45, 37, 25, 46]

In [26]:
np.sort(a)

array([ 1,  2,  5,  6, 10, 12, 24, 25, 37, 45, 46])

In [27]:
np.median(a)

12.0

In [28]:
np.mean(a)

19.363636363636363

### Variance Calculation

- If your data is the whole population, use the population variance.
- If your data is a sample drawn from a population, use the sample variance.
- Difference between population and sample variances? : the prefactor, $\frac{1}{N}$ vs $\frac{1}{n-1}$, where $N$ is the population size and $n$ is the sample size.
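The two prefactors can be checked by hand. The sketch below computes both variances directly from the definition for the same list `a` used in this notebook and compares them with `np.var`. NumPy's `ddof` argument ("delta degrees of freedom") is the number subtracted from the data size in the denominator, so `ddof=0` (the default) gives the population variance and `ddof=1` gives the sample variance.

```python
import numpy as np

a = [1, 5, 6, 2, 10, 12, 24, 45, 37, 25, 46]
n = len(a)
mean = sum(a) / n

# Sum of squared deviations from the mean.
ss = sum((x - mean) ** 2 for x in a)

pop_var = ss / n           # population variance, prefactor 1/N
sample_var = ss / (n - 1)  # sample variance, prefactor 1/(n-1)

print(pop_var, sample_var)

# NumPy gives the same values.
print(np.isclose(pop_var, np.var(a)))             # True
print(np.isclose(sample_var, np.var(a, ddof=1)))  # True
```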

#### CASE 1: Your data is population data. 

In [29]:
np.var(a)

263.3223140495868

In [30]:
np.std(a)

16.22720906531948

In [31]:
np.sqrt(np.var(a))

16.22720906531948

#### CASE 2: Your data is sampled from the population

In [32]:
np.var(a, ddof=1)

289.6545454545454

In [33]:
np.std(a, ddof=1)

17.019240448813967

### Skewness

In [35]:
import scipy.stats as stats

In [36]:
stats.skew(a)

0.5002591114597914
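By default, `stats.skew` returns the population (biased) skewness, which is the third central moment divided by the second central moment raised to the power 3/2. The sketch below reproduces that value by hand and also shows the `bias=False` option, which applies a sample-size correction. The names `dev`, `m2`, `m3`, and `g1` are illustrative only.

```python
import numpy as np
import scipy.stats as stats

a = np.array([1, 5, 6, 2, 10, 12, 24, 45, 37, 25, 46])

dev = a - a.mean()        # deviations from the mean
m2 = np.mean(dev ** 2)    # second central moment (population variance)
m3 = np.mean(dev ** 3)    # third central moment

g1 = m3 / m2 ** 1.5       # population skewness
print(g1)
print(np.isclose(g1, stats.skew(a)))  # True

# Sample skewness with the bias correction applied.
print(stats.skew(a, bias=False))
```

A positive value, as here, means the data has a longer tail on the right side of the mean.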

# Next Week

- Visualizing data: histograms, line plots, etc.
- Calculating skewness, kurtosis, etc.
- Matplotlib (visualization), Seaborn (visualization), Pandas (data processing and analysis)