# Introduction to pandas 
In this notebook we are going to learn the basics of pandas. 
Pandas is a Python library for handling datasets through standardized data structures.

## 1D data
The most basic data structure in pandas is called a "Series", which is used to handle one-dimensional (1D) data.

In [2]:
# Import libraries
# General purpose
import os
import sys
# Tensor manipulation
import numpy as np
# Data manipulation
import pandas as pd

## Working with lists
The most basic data structure for one-dimensional data in Python is the list.
Here is the documentation:
https://docs.python.org/3/tutorial/datastructures.html

In [21]:
# Example 1D data
array = list(range(6))
print("Example 1d array (list data type): {}".format(array))

Example 1d array (list data type): [0, 1, 2, 3, 4, 5]


In [22]:
# Some useful built-in methods for lists
print("This is how you find the index of a value: {}".format(array.index(3)))
print("\nThis is how you slice a list:\n    up to (not including) position 2: {}\n    from position 2 to the end: {}".format(array[:2], array[2:]))

This is how you find the index of a value: 3

This is how you slice a list:
    up to (not including) position 2: [0, 1]
    from position 2 to the end: [2, 3, 4, 5]
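
`list.index` scans the list from the start until it finds a match, so it takes time proportional to the list's length, and it raises `ValueError` when the value is missing. The sketch below (the names `values` and `missing` are purely illustrative) guards the lookup with `in` and also shows negative indices and slicing with a step:

```python
# Illustrative list with the same contents as `array`
values = list(range(6))

# index() performs a linear scan and returns the first matching position
position = values.index(3)
print(position)  # 3

# index() raises ValueError for absent values, so check membership first
missing = 10
if missing in values:
    print(values.index(missing))
else:
    print("{} is not in the list".format(missing))

# Negative indices count from the end; the third slice field is the step
print(values[-1])   # 5
print(values[::2])  # [0, 2, 4]
```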


In [14]:
array.remove(3)
print("\nThis is how you remove the first occurrence of a value: {}".format(array))


This is how you remove the first occurrence of a value: [0, 1, 2, 4, 5]
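
`remove` deletes by value, not by position. A small sketch contrasting it with `pop` and `del`, which delete by position (the list `values` is a hypothetical example that contains `3` twice):

```python
values = [0, 1, 2, 3, 4, 5, 3]

# remove() deletes only the FIRST occurrence of the given value
values.remove(3)
print(values)          # [0, 1, 2, 4, 5, 3]

# pop() deletes by position and returns the removed element
last = values.pop()    # no argument: removes the last element
print(last, values)    # 3 [0, 1, 2, 4, 5]
first = values.pop(0)  # removes the element at position 0
print(first, values)   # 0 [1, 2, 4, 5]

# del also deletes by position (or slice) but returns nothing
del values[1]
print(values)          # [1, 4, 5]
```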


## Working with numpy arrays
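For numerical 1D data (and tensors in general), NumPy arrays are usually a better choice than lists: they store elements of a single type in one contiguous block of memory and support vectorized operations, which makes them much faster on large arrays.
NumPy documentation: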
https://docs.scipy.org/doc/
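
A short sketch of what "vectorized" means in practice: arithmetic on a NumPy array applies to every element without a Python loop, while the same operator on a list behaves differently. The names `numbers` and `as_list` are illustrative:

```python
import numpy as np

numbers = np.arange(6)           # array([0, 1, 2, 3, 4, 5])

# Arithmetic is applied element-wise to the whole array
print(numbers * 2)               # [ 0  2  4  6  8 10]
print(numbers + numbers)         # [ 0  2  4  6  8 10]

# With a plain list you need an explicit comprehension;
# multiplying a list by 2 repeats it instead
as_list = list(range(6))
print([x * 2 for x in as_list])  # [0, 2, 4, 6, 8, 10]
print(as_list * 2)               # [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]

# Aggregations are available as array methods
print(numbers.sum(), numbers.mean())  # 15 2.5
```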

In [27]:
# This is how you declare a 1d array in numpy
numpyArray = np.arange(6)
print("Array type: {}\nExample 1d numpy array: {}".format(type(numpyArray), numpyArray))

Array type: <class 'numpy.ndarray'>
Example 1d numpy array: [0 1 2 3 4 5]


In [26]:
# Some useful methods
print("This is how you find the position of an element: {}".format(np.where(numpyArray == 2)))

This is how you find the position of an element: (array([2], dtype=int64),)


In [28]:
print("This is how you find the positions of all elements that satisfy a condition: {}".format(np.where(numpyArray > 3)))

This is how you find the positions of all elements that satisfy a condition: (array([4, 5], dtype=int64),)
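
With a single argument, `np.where` returns positions (a tuple with one index array per dimension), whereas indexing with the boolean mask returns the matching values. The hypothetical array below is chosen so that positions and values differ:

```python
import numpy as np

numbers = np.array([5, 1, 9, 4, 7])
mask = numbers > 4
print(mask.tolist())             # [True, False, True, False, True]

# One-argument np.where: tuple of index arrays, one per dimension
(positions,) = np.where(mask)
print(positions.tolist())        # [0, 2, 4]

# Boolean-mask indexing: the matching values themselves
values = numbers[mask]
print(values.tolist())           # [5, 9, 7]

# Three-argument np.where: choose element-wise between two options
replaced = np.where(mask, numbers, 0)
print(replaced.tolist())         # [5, 0, 9, 0, 7]
```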


## Working with pandas series
The easiest way to work with one-dimensional data is with pandas, which has many built-in methods for complex operations that are useful for data exploration and visualization.

In [51]:
series = pd.Series([6, 6, 7, 3, 9], index = ["a", "b", "c", "d", "e"])

In [52]:
print("This is a pandas Series:\n{}".format(series))

This is a pandas Series:
a    6
b    6
c    7
d    3
e    9
dtype: int64
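
A Series pairs an array of values with an index of labels. As a sketch, the same Series can also be built from a dict (keys become labels), and omitting the index gives default integer positions:

```python
import pandas as pd

# Same data as above; dict keys become the index labels
series = pd.Series({"a": 6, "b": 6, "c": 7, "d": 3, "e": 9})
print(series.index.tolist())   # ['a', 'b', 'c', 'd', 'e']
print(series.to_numpy())       # [6 6 7 3 9]
print(series.dtype)            # int64

# Without an explicit index, pandas uses positions 0..n-1 (a RangeIndex)
default = pd.Series([6, 6, 7, 3, 9])
print(default.index.tolist())  # [0, 1, 2, 3, 4]
```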


In [53]:
print("We usually want a quick summary that describes the variability and central tendency of our data: \n{}"\
      .format(series.describe()))

We usually want a quick summary that describes the variability and central tendency of our data: 
count    5.000000
mean     6.200000
std      2.167948
min      3.000000
25%      6.000000
50%      6.000000
75%      7.000000
max      9.000000
dtype: float64
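
Each row of `describe()` is also available as a separate method. Note that `std()` defaults to the sample standard deviation (`ddof=1`), which is what `describe()` reports; `ddof=0` gives the population value. A sketch using the same data:

```python
import pandas as pd

series = pd.Series([6, 6, 7, 3, 9], index=["a", "b", "c", "d", "e"])

print(series.count())         # 5
print(series.mean())          # 6.2
print(series.median())        # 6.0
print(series.quantile(0.75))  # 7.0

# Sample (ddof=1, the default) vs population (ddof=0) standard deviation
print(round(series.std(), 6))        # 2.167948
print(round(series.std(ddof=0), 6))  # 1.939072
```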


In [54]:
print("In pandas there are two ways to look up data.\nThe first is by index label, with .loc: {}"
      .format(series.loc["b"]))

In pandas there are two ways to look up data.
The first is by index label, with .loc: 6


In [56]:
print("The second is by integer position, with .iloc: {}".format(series.iloc[0]))

The second is by integer position, with .iloc: 6
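
The difference between `.loc` and `.iloc` matters most when the index itself holds integers that do not match positions. A sketch with a hypothetical Series `scores`, plus slicing with both accessors on the labeled data from above:

```python
import pandas as pd

# Integer labels that differ from positions
scores = pd.Series([10, 20, 30], index=[2, 0, 1])
print(scores.loc[0])   # 20  (the row whose LABEL is 0)
print(scores.iloc[0])  # 10  (the row at POSITION 0)

# .loc label slices include the end label;
# .iloc position slices exclude the end, like Python lists
letters = pd.Series([6, 6, 7, 3, 9], index=["a", "b", "c", "d", "e"])
print(letters.loc["b":"d"].tolist())  # [6, 7, 3]
print(letters.iloc[1:3].tolist())     # [6, 7]
```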


In [58]:
print("Finally, comparing a series with a value returns a boolean mask, which is the basis for filtering:\n{}".format(series > 3))

Finally, comparing a series with a value returns a boolean mask, which is the basis for filtering:
a     True
b     True
c     True
d    False
e     True
dtype: bool
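
To actually filter, index the Series with the boolean mask. A sketch showing single and combined conditions (`&`, `|` and `~` combine masks; each comparison needs its own parentheses):

```python
import pandas as pd

series = pd.Series([6, 6, 7, 3, 9], index=["a", "b", "c", "d", "e"])

# The mask is a boolean Series aligned on the same index
mask = series > 3

# Indexing with the mask keeps only the rows where it is True
filtered = series[mask]
print(filtered.index.tolist())  # ['a', 'b', 'c', 'e']
print(filtered.tolist())        # [6, 6, 7, 9]

# Combine conditions with & (and), | (or), ~ (not)
both = series[(series > 3) & (series < 9)]
print(both.tolist())            # [6, 6, 7]

# True counts as 1, so summing a mask counts the matches
print(mask.sum())               # 4
```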


In [61]:
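print("Numpy functions also work on series; np.where returns the integer positions where the condition is True:\n{}".format(np.where(series > 3)))

Numpy functions also work on series; np.where returns the integer positions where the condition is True:
(array([0, 1, 2, 4], dtype=int64),)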
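
Because `np.where` returns positions rather than labels, you may need to map them back to the index. A sketch of that, along with the pandas-native `Series.where`, which keeps the index and replaces non-matching values (with `NaN` by default, or with `other`):

```python
import numpy as np
import pandas as pd

series = pd.Series([6, 6, 7, 3, 9], index=["a", "b", "c", "d", "e"])

# np.where gives integer positions (one array per dimension)
(positions,) = np.where(series > 3)
print(positions.tolist())  # [0, 1, 2, 4]

# Map positions back to index labels
labels = series.index[positions]
print(labels.tolist())     # ['a', 'b', 'c', 'e']

# Series.where keeps every label and replaces values where the condition is False
replaced = series.where(series > 3, other=0)
print(replaced.tolist())   # [6, 6, 7, 0, 9]
```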
