# Lab 3.1. Data Structures in Pandas

## The previous section introduced the Pandas DataFrame and Series objects. These data structures resemble Python's built-in containers (lists and dictionaries) in how they are indexed and labeled, but they add features that make working with data easier.


## This session will cover:
1. Loading in manual data
2. The Series object
3. Basic operations on Series objects
4. The DataFrame object
5. Conditional subsetting and fancy slicing and indexing
6. Saving out data



# 0. Let's load and install some libraries

In [None]:
import pandas as pd

In [None]:
import numpy as np

In [None]:
import sys

In [None]:
# we need this library to save legacy Excel (.xls) files (only used by pandas versions before 2.0)
import sys
!{sys.executable} -m pip install xlwt

In [None]:
# we need this library to save Excel (.xlsx) files
import sys
!{sys.executable} -m pip install openpyxl

# 1. Let's create our own data

## 1.1. Creating a Series

### The Pandas Series is a one-dimensional container, similar to the built-in Python list, except that all of its elements share a single dtype.


In [None]:
mySeries=pd.Series(['banana', 'apple', 'coconut', 'orange'])

In [None]:
mySeries

In [None]:
otherSeries=pd.Series([4.5, 3.5, 2.3, 0.5, 6.7])

In [None]:
otherSeries
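
Besides a single dtype, a Series also carries an index of labels. The sketch below illustrates both points with made-up values; `fruitPrices`, `asText`, and `asNumbers` are illustrative names and are not part of the lab data.

```python
import pandas as pd

# Hypothetical example data: prices labelled by fruit name
fruitPrices = pd.Series([0.5, 1.2, 3.0], index=['banana', 'apple', 'coconut'])

# All elements share one dtype
print(fruitPrices.dtype)        # float64

# Elements can be selected by label or by position
print(fruitPrices['apple'])     # 1.2
print(fruitPrices.iloc[0])      # 0.5

# Numbers stored as strings are not numeric;
# pd.to_numeric converts them so that arithmetic works
asText = pd.Series(['4.5', '3.5'])
asNumbers = pd.to_numeric(asText)
print(asNumbers.sum())          # 8.0
```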

## 1.2. Creating a DataFrame from a dictionary

### The DataFrame is the most common Pandas object. It can be thought of as Python’s way of storing spreadsheet-like data.

### A DataFrame can be thought of as a dictionary of Series objects. This is why dictionaries are the most common way of creating a DataFrame by hand.
### The key represents the column name, and the values are the contents of the column.



In [None]:
dictionaryOfScientists={'Name': ['Rosalind Franklin', 'William Gosset','Alexander Flemming','Carl F. Gauss'],
                        'Occupation': ['Chemist', 'Statistician','Physician','Mathematician'],
                        'Born': ['1920-07-25', '1876-06-13','1881-08-06','1777-04-10'],
                        'Died': ['1958-04-16', '1937-10-16','1954-03-11','1855-02-23'],
                        'Age': [37, 61,73,77]}



In [None]:
dictionaryOfScientists

In [None]:
DataFrameOfScientists = pd.DataFrame(dictionaryOfScientists)

In [None]:
DataFrameOfScientists
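
The "dictionary of Series" view can be checked directly. This is a minimal sketch with two hypothetical Series (`names`, `born`) that are not part of the lab data: each dictionary key becomes a column, and selecting a column returns a Series again.

```python
import pandas as pd

# Hypothetical two-column example
names = pd.Series(['Ada Lovelace', 'Alan Turing'])
born = pd.Series([1815, 1912])

# Each dictionary key becomes a column name
people = pd.DataFrame({'Name': names, 'Born': born})

# Selecting one column returns a Series
print(type(people['Born']))
print(people.shape)            # (2, 2)
print(list(people.columns))    # ['Name', 'Born']
```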

## 1.3. Creating a DataFrame from records

### We can also create a DataFrame from a list of existing records, where each record (tuple) becomes one row.

In [None]:
stocksRecords = [(3000, 'Google'), (213, 'Microsoft'), (1250, 'Apple'), (4500, 'Facebook'),(100, 'Tesla')]

In [None]:
stocksRecords

In [None]:
stocksDataFrame=pd.DataFrame.from_records(stocksRecords, columns=['Price', 'Company'])
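
To see how records map onto rows and columns, here is a small sketch with made-up tuples (`records` and `frame` are illustrative names). Each tuple becomes one row, and the `columns` argument names the positions inside the tuple.

```python
import pandas as pd

# Hypothetical records: each tuple is one row
records = [(10, 'A'), (20, 'B'), (30, 'C')]
frame = pd.DataFrame.from_records(records, columns=['Price', 'Company'])

print(frame.shape)              # (3, 2)
print(frame['Price'].sum())     # 60
print(frame['Company'].tolist())  # ['A', 'B', 'C']
```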

## 2. Let's select some data

### 2.1. Selection based on column name

In [None]:
DataFrameOfScientists['Name']

In [None]:
DataFrameOfScientists[['Name','Occupation']]

### 2.2. Selection based on filter

In [None]:
# filter to select chemists
chemistFilter=DataFrameOfScientists['Occupation']=='Chemist'

In [None]:
DataFrameOfScientists[chemistFilter]

In [None]:
# filter to select those named Alexander Flemming
alexanderFlemmingFilter=DataFrameOfScientists['Name']=='Alexander Flemming'

In [None]:
DataFrameOfScientists[alexanderFlemmingFilter]

In [None]:
# filter to select those named William 
# we use the function str.contains()
williamFilter=DataFrameOfScientists['Name'].str.contains('William')

In [None]:
DataFrameOfScientists[williamFilter]

In [None]:
# filter to select by age
ageFilter=DataFrameOfScientists['Age']>=60

In [None]:
DataFrameOfScientists[ageFilter]

In [None]:
# filter to select by age range: 40 <= Age < 70
ageFilter2=(DataFrameOfScientists['Age']>=40)&(DataFrameOfScientists['Age']<70)

In [None]:
DataFrameOfScientists[ageFilter2]
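
Filters are boolean Series, so they can be combined element-wise. The sketch below uses a made-up `ages` table (not the lab data) to show `&`, `|`, and `Series.between`. It assumes pandas 1.3 or later, where `between` accepts `inclusive='left'`.

```python
import pandas as pd

# Hypothetical table with the same ages as the lab data
ages = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'], 'Age': [37, 61, 73, 77]})

# & is element-wise AND; each comparison needs its own parentheses
inRange = (ages['Age'] >= 40) & (ages['Age'] < 70)
print(ages[inRange]['Name'].tolist())      # ['B']

# | is element-wise OR
outside = (ages['Age'] < 40) | (ages['Age'] >= 70)
print(ages[outside]['Name'].tolist())      # ['A', 'C', 'D']

# between() expresses the same range; inclusive='left' means 40 <= Age < 70
sameRange = ages['Age'].between(40, 70, inclusive='left')
print(sameRange.equals(inRange))           # True
```

The parentheses matter because `&` and `|` bind more tightly than comparison operators in Python.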

### 2.3. Selection based on inverse filter 

In [None]:
DataFrameOfScientists[~williamFilter]

### 2.4. Slicing

In [None]:
DataFrameOfScientists.iloc[1:3]

In [None]:
DataFrameOfScientists.iloc[1:4]

In [None]:
DataFrameOfScientists.iloc[1:4,0:2]

In [None]:
DataFrameOfScientists.iloc[1:4,0:3]

In [None]:
DataFrameOfScientists.iloc[:,0:2]
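
`iloc` slices by integer position and, like Python lists, excludes the end position. For comparison, `loc` slices by label and includes the end label. A minimal sketch on a made-up table (`letters` is an illustrative name):

```python
import pandas as pd

# Hypothetical table with the default integer index 0..3
letters = pd.DataFrame({'x': [10, 20, 30, 40], 'y': ['a', 'b', 'c', 'd']})

# iloc slices by position and excludes the end position
print(letters.iloc[1:3]['x'].tolist())    # [20, 30]

# loc slices by label and includes the end label
print(letters.loc[1:3, 'x'].tolist())     # [20, 30, 40]

# rows at positions 1 and 2, first column only
print(letters.iloc[1:3, 0:1].shape)       # (2, 1)
```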

## 3. Let's perform some basic operations 

In [None]:
# we multiply the Price column by 2 and save the result as a new column
stocksDataFrame['PriceDoubled']=stocksDataFrame['Price']*2

In [None]:
stocksDataFrame

In [None]:
# we compute the natural log of the Price column
stocksDataFrame['PriceLog']=np.log(stocksDataFrame['Price'])

In [None]:
stocksDataFrame
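
Arithmetic operators and NumPy functions act on a whole column at once, element by element, and return a new Series. A short sketch with made-up numbers (`prices` is an illustrative name); note that `np.log` is the natural logarithm.

```python
import numpy as np
import pandas as pd

# Hypothetical prices
prices = pd.Series([1.0, 10.0, 100.0])

# Arithmetic is applied to every element
doubled = prices * 2
print(doubled.tolist())                     # [2.0, 20.0, 200.0]

# NumPy functions also work element-wise and return a Series
logs = np.log(prices)
print(isinstance(logs, pd.Series))          # True

# Exponentiating the logs recovers the original values (up to rounding)
print(np.allclose(np.exp(logs), prices))    # True
```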

## 4. Let's perform some basic operations on time variables

In [None]:
# we transform the columns Born and Died to datetime64 format
DataFrameOfScientists['BornDateTimeFormat']=pd.to_datetime(DataFrameOfScientists['Born'])

In [None]:
DataFrameOfScientists['DiedDateTimeFormat']=pd.to_datetime(DataFrameOfScientists['Died'])

In [None]:
DataFrameOfScientists

In [None]:
DataFrameOfScientists['daysLived']=DataFrameOfScientists['DiedDateTimeFormat']-DataFrameOfScientists['BornDateTimeFormat']

In [None]:
DataFrameOfScientists

In [None]:
# daysLived is a timedelta; we convert it to whole years, assuming 365.25 days per year
DataFrameOfScientists['yearsLived']=DataFrameOfScientists['daysLived'].dt.days//365.25

In [None]:
DataFrameOfScientists
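
Datetime and timedelta columns expose their components through the `.dt` accessor. The sketch below reuses two of the date pairs from the table above in standalone Series (`born`, `died` are illustrative names). The conversion to years assumes 365.25 days per year, so it is an approximation rather than an exact calendar calculation.

```python
import pandas as pd

born = pd.to_datetime(pd.Series(['1920-07-25', '1876-06-13']))
died = pd.to_datetime(pd.Series(['1958-04-16', '1937-10-16']))

# .dt gives access to parts of each date
print(born.dt.year.tolist())     # [1920, 1876]

# Subtracting two datetime Series gives a timedelta Series
lived = died - born
days = lived.dt.days             # whole days as integers

# Approximate completed years, assuming 365.25 days per year
years = (days // 365.25).astype(int)
print(years.tolist())            # [37, 61]
```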

## 5. Let's export our data

### 5.1. Export as CSV

In [None]:
# index=False keeps the row labels out of the file
DataFrameOfScientists.to_csv('./DataFrameOfScientists.csv', index=False)

### 5.2. Export as Excel

In [None]:
# the .xlsx format is written with the openpyxl library installed above
DataFrameOfScientists.to_excel('./DataFrameOfScientists.xlsx')

### 5.3. Import a CSV

In [None]:
DataFrameOfScientists2=pd.read_csv('./DataFrameOfScientists.csv')

In [None]:
DataFrameOfScientists2
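
A CSV file stores only text, so dates come back as strings unless `read_csv` is asked to parse them. Also, if the index was written to the file, it is read back as an extra column named `Unnamed: 0`. The sketch below shows a round trip that avoids both. It uses a made-up table and a temporary folder, and the file name `example.csv` is hypothetical.

```python
import os
import tempfile
import pandas as pd

# Hypothetical table with one datetime column
frame = pd.DataFrame({'Name': ['A', 'B'],
                      'Born': pd.to_datetime(['1920-07-25', '1876-06-13'])})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'example.csv')
    # index=False leaves the row labels out of the file
    frame.to_csv(path, index=False)
    # parse_dates converts the listed columns back to datetime
    restored = pd.read_csv(path, parse_dates=['Born'])

print(list(restored.columns))               # ['Name', 'Born']
print(restored['Born'].dt.year.tolist())    # [1920, 1876]
```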