<center><img src="http://i.imgur.com/sSaOozN.png" width="500"></center>

## Course: Computational Thinking for Governance Analytics

### Prof. José Manuel Magallanes, PhD 
* Visiting Professor of Computational Policy at Evans School of Public Policy and Governance, and eScience Institute Senior Data Science Fellow, University of Washington.
* Professor of Government and Political Methodology, Pontificia Universidad Católica del Perú. 


# Session 0: Introduction to Python Data Structures

_____
<a id='home'></a>

# 1.  Data Structures

Python has basic native structures, like lists, tuples and dictionaries.

## A.  **LISTS** 

Lists are the most flexible structure to save or contain data elements.

In [1]:
names=["Qing", "Françoise", "Raúl", "Bjork","Marie"]
ages=[32,33,28,30,29]
country=["China", "Senegal", "España", "Norway","Korea"]
education=["Bach", "Bach", "Master", "PhD","PhD"]

Above we have created some lists. Lists can contain any values. Lists support different operations:

* **Accessing**:

Keep in mind that positions in Python start at **0**.

In [2]:
# one element
ages[0]

32

In [3]:
# several, using slices:
ages[1:-1] #second to before last

[33, 28, 30]

In [4]:
# several, using slices:
ages[:-2] #all but two last ones

[32, 33, 28]

In [5]:
# non consecutive
from operator import itemgetter
list(itemgetter(0,2,3)(ages))

[32, 28, 30]

In [6]:
# difficult to understand?
ages[0:4:2] + [ages[3]]

[32, 28, 30]
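Slices follow the pattern `start:stop:step`, and the `stop` position is never included. The following is a minimal sketch of that rule; it re-creates the `ages` list so that it runs on its own, and the variable names on the left are only illustrative:

```python
# re-create the list so this cell runs on its own
ages = [32, 33, 28, 30, 29]

# start:stop:step, where the 'stop' position is NOT included
first_three = ages[0:3]      # positions 0, 1 and 2
every_other = ages[::2]      # positions 0, 2 and 4
last_one = ages[-1]          # negative positions count from the end
reversed_ages = ages[::-1]   # a negative step walks backwards

print(first_three, every_other, last_one, reversed_ages)
```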

* **Modifying**:

In [None]:
# by position
country[2]="Spain"

# list changed:
country

In [None]:
# by value
country=["PR China" if x == "China" else x for x in country]

# list changed:
country

* **Deleting**

In [None]:
# by position
del country[-1] #last value

# list changed:
country

In [None]:
# by position
names.pop() #last value by default

# list changed:
names

In [None]:
# only 'del' works for several positions

lista=[1,2,3,4,5,6]
del lista[1:3]

#now:
lista

In [None]:
# by value
ages.remove(29) 

# list changed:
ages # just the first occurrence of the value!!

In [None]:
# by value
education.remove('PhD') 

# list changed:
education # just the first occurrence!!

In [None]:
# deleting every occurrence of a value:

lista=[1,'a',45,'b','a']
lista=[x for x in lista if x!='a']

# you get:
lista

* **Inserting values**

In [None]:
# at the end
lista.append("abc")
lista

In [None]:
# PART ONE:
# first delete a position
education.pop(2)
education

In [None]:
# PART TWO:
# now insert in that position
education.insert(2,"Master")
education

## B.  **TUPLES**

Tuples are immutable structures in Python. They look like lists but do not share much of their functionality:

In [None]:
# new tuple:
weekend=("Friday", "Saturday", "Sunday")

You can access:

In [None]:
weekend[0]

But you cannot change a tuple once it is created: assigning, appending or deleting elements is not allowed.
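A minimal sketch of what immutability means in practice, re-creating the `weekend` tuple so the cell runs on its own (`outcome` and `long_weekend` are illustrative names, not part of the session):

```python
weekend = ("Friday", "Saturday", "Sunday")

# reading operations are fine
print(weekend[0], len(weekend), weekend.index("Sunday"))

# modifying is not: item assignment raises a TypeError
try:
    weekend[0] = "Thursday"
    outcome = "modified"
except TypeError as error:
    outcome = "TypeError"
    print("Not allowed:", error)

# "changing" a tuple really means building a new one
long_weekend = weekend + ("Monday",)
print(long_weekend)
```

The original tuple is untouched after both steps; only the new name points to the longer tuple.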

Python itself uses tuples as output of some important functions:

In [None]:
zip(names,ages)

The **zip** function creates tuples by combining the elements of each list in parallel. You can see it if you turn the result into a list:

In [None]:
list(zip(names,ages))  # a list of tuples
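Two details of **zip** are worth keeping in mind: it stops at the shortest input, and its pairs can feed a dictionary directly. A small sketch, using shortened lists made up for this illustration:

```python
names = ["Qing", "Françoise", "Raúl", "Bjork"]
ages = [32, 33, 28]  # one value short on purpose

# zip stops when the shortest input runs out
pairs = list(zip(names, ages))

# each (name, age) tuple becomes a key:value entry
lookup = dict(zip(names, ages))

print(pairs)
print(lookup)
```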

## C. **DICTIONARIES**  

*Dicts* work in a more sophisticated way, as they have a **'key'**:**'value'** structure:

In [None]:
classroom={'student':names,'age':ages,'edu':education}
# see it:

classroom

Dicts do not use indexes to access values:

In [None]:
#classroom[0]  # uncomment to see the KeyError
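If you want to see the failure without stopping the notebook, a sketch like the following catches the error. It uses a small stand-in dictionary, and `outcome`, `has_age` and `missing` are illustrative names:

```python
# a small stand-in for the classroom dict
classroom = {'student': ["Qing", "Raúl"], 'age': [32, 28]}

# 0 is not a key, so this lookup raises a KeyError
try:
    classroom[0]
    outcome = "found"
except KeyError:
    outcome = "KeyError"

# safer ways to look things up
has_age = 'age' in classroom                       # membership test on keys
missing = classroom.get('country', 'no such key')  # default instead of an error

print(outcome, has_age, missing)
```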

Dicts use keys:

In [None]:
classroom['student']

Notice I created a dictionary where the value is not ONE but a LIST of values.

Once you access a value, you can modify it. You can also use _pop_ or _del_ with the **keys**. But a dict has no _append_ method; to add an element you can use **update**:

In [None]:
classroom.update({'country':country})
# now:
classroom
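Besides **update**, plain assignment to a new key also adds an element. The sketch below walks through adding, modifying and deleting on a small stand-in dictionary (the values are made up for the example):

```python
classroom = {'student': ["Qing", "Raúl"], 'age': [32, 28]}

# adding: assigning to a new key works like update for one element
classroom['country'] = ["China", "Spain"]

# modifying: access the value by key, then change it
classroom['age'][0] = 33

# deleting: pop returns the value it removes, del just removes
removed = classroom.pop('age')
del classroom['country']

print(removed)
print(classroom)
```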

## D. DATA FRAMES

**Data frames**  are more complex containers of values. The most common analogy is a spreadsheet. To create a data frame, we need to call **pandas**:

In [None]:
import pandas

We can prepare a data frame from a dictionary immediately, but ONLY if you have the same number of elements in each list representing a column.

In [None]:
# our data frame:
students=pandas.DataFrame(classroom)
## see it:
students

But let me update the dictionary with the five original names:

In [None]:
names=["Qing", "Françoise", "Raúl", "Bjork","Marie"]
#
classroom.update({'student':names})
#
classroom

We now have five students, but data for only four of them, so this does not work:

In [None]:
#pandas.DataFrame(classroom)
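To see the failure without breaking the notebook, you can catch the error. This is a sketch on a smaller stand-in dictionary with lists of different lengths; the exact wording of the message may vary across pandas versions:

```python
import pandas

# stand-in dict: three students, only two ages
classroom = {'student': ["Qing", "Françoise", "Raúl"], 'age': [32, 33]}

try:
    pandas.DataFrame(classroom)
    outcome = "created"
except ValueError as error:
    # pandas refuses columns of unequal length
    outcome = "ValueError"
    print(error)
```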

In that case, you need this:

In [None]:
#then
students=pandas.DataFrame({key:pandas.Series(value) for key, value in classroom.items()})

# seeing it:
students
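Why does this work? Each list becomes a **Series** with its own index, and pandas aligns the Series by index, filling the gaps with missing values (NaN). A sketch on a small stand-in dictionary:

```python
import pandas

# stand-in dict: three students, only two ages
classroom = {'student': ["Qing", "Françoise", "Raúl"], 'age': [32, 33]}

students = pandas.DataFrame({key: pandas.Series(value)
                             for key, value in classroom.items()})

# the shorter column was padded with a missing value
missing_ages = students['age'].isna().sum()

# a column with NaN cannot stay integer, so it becomes float
age_type = str(students['age'].dtype)

print(students)
print(missing_ages, age_type)
```

Notice the side effect: the ages are now decimals, because NaN is a floating point value.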

Python users often import pandas under a shorter alias:

In [None]:
import pandas as pd # renaming the library

students=pd.DataFrame({key:pd.Series(value) for key, value in classroom.items()})
students

### Data frame basic operations

In [None]:
# type of structure: list? tuple? data frame?
type(students)

In [None]:
# type of data in each data frame column
students.dtypes

In [None]:
# details of data frame
students.info()

In [None]:
# number of rows and columns
students.shape 

In [None]:
# number of rows:
len(students) 

In [None]:
# first rows
students.head(2) # compare with: students.tail(2)

In [None]:
# name of columns
students.columns

If you needed the column names as a list:

In [None]:
students.columns.tolist() # or simply: list(students)

If you needed a column's values as a list:

In [None]:
students.age.tolist() # or: list(students.age)

### Accessing elements in DF:

The data frames in pandas behave much like in R:

In [None]:
#one particular column
students.student

In [None]:
# or
students['student'] 

In [None]:
# it is not the same as: 
students[['student']] # a data frame, not a column (or series)
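The difference between single and double brackets is easy to verify by checking the type and shape of each result. A sketch on a small stand-in data frame (`as_series` and `as_frame` are illustrative names):

```python
import pandas

students = pandas.DataFrame({'student': ["Qing", "Raúl"], 'age': [32, 28]})

as_series = students['student']    # single brackets: a Series (one dimension)
as_frame = students[['student']]   # double brackets: a DataFrame (two dimensions)

print(type(as_series).__name__, as_series.shape)  # Series (2,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (2, 1)
```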

In [None]:
# this is also a DF
students[['country','student']]

In [None]:
# and this, using loc:
columnNames=['country','student']
students.loc[:,columnNames]

In [None]:
## Using positions is very common:
columnPositions=[1,3,0]
students.iloc[:,columnPositions] 

### Changing values

If you have a position, you can update values:

In [None]:
students.iloc[4,1]=23 # change is immediate! (no warning)
students
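Positions are fragile if the rows or columns get reordered. As an alternative, you can update by label or by condition with **loc** and **at**. A sketch on a small stand-in data frame whose values are made up for the example:

```python
import pandas

students = pandas.DataFrame({'student': ["Qing", "Raúl", "Marie"],
                             'age': [32.0, 28.0, None]})

# by condition on the rows and column name
students.loc[students['student'] == "Marie", 'age'] = 23

# one single cell, by row label and column name
students.at[0, 'age'] = 33

print(students)
```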

### Deleting columns

You can modify any values in a data frame, but let me create a **deep** copy of this data frame to play with:

In [None]:
studentsCopy=students.copy()
studentsCopy
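The copy matters because a plain assignment does not duplicate the data; it only gives a second name to the same object. A sketch of the difference, on a small stand-in data frame (`alias` is an illustrative name):

```python
import pandas

students = pandas.DataFrame({'student': ["Qing", "Raúl"], 'age': [32, 28]})

alias = students                # same object, two names
studentsCopy = students.copy()  # independent data (deep=True is the default)

studentsCopy.iloc[0, 1] = 99    # does NOT touch the original
alias.iloc[1, 1] = 50           # DOES change the original

print(students)
```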

In [None]:
# This is what you want to get rid of:
byeColumns=['edu'] # you can delete more than one

#this is the result
studentsCopy.drop(columns=byeColumns)

Notice that the previous result was not saved:

In [None]:
studentsCopy

In [None]:
#NOW we do
studentsCopy.drop(columns=byeColumns,inplace=True)

In [None]:
#then:
studentsCopy
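An alternative to `inplace=True` is to assign the result of **drop** back to the same name, which gives the same final data frame. A sketch on a small stand-in data frame:

```python
import pandas

studentsCopy = pandas.DataFrame({'student': ["Qing", "Raúl"],
                                 'age': [32, 28],
                                 'edu': ["Bach", "Master"]})

# drop returns a new data frame; keep it by reassigning
studentsCopy = studentsCopy.drop(columns=['edu'])

print(studentsCopy.columns.tolist())
```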

### Deleting a row

Let me delete a row:

In [None]:
# 'index=' deletes by row label (same as using axis 0)
studentsCopy.drop(index=2,inplace=True) 
studentsCopy

As you see, the index label of that row disappeared and the numbering now has a gap. Then, you should reset the index:

In [None]:
studentsCopy.reset_index(drop=True,inplace=True)
studentsCopy
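The `drop=True` argument tells **reset_index** to discard the old labels; without it, they are kept as a new column named `index`. A sketch of both behaviors on a small stand-in data frame (`old_labels`, `kept` and `dropped` are illustrative names):

```python
import pandas

studentsCopy = pandas.DataFrame({'student': ["Qing", "Françoise", "Raúl"]})
studentsCopy = studentsCopy.drop(index=1)

# the row labels now have a gap
old_labels = studentsCopy.index.tolist()

# default: old labels are saved in a column called 'index'
kept = studentsCopy.reset_index()

# drop=True: old labels are discarded
dropped = studentsCopy.reset_index(drop=True)

print(old_labels)
print(kept.columns.tolist())
print(dropped.index.tolist())
```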