# Introduction to Python and Pandas


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/unstructured-data/IESE_presession_23/blob/main/notebooks/intro_python.ipynb)


## Why Python?

- Great implementations of machine learning methods
  - [scikit-learn](https://scikit-learn.org/stable/): logistic regression, decision trees, clustering algorithms, and much more.
  - [PyTorch](https://pytorch.org/), [Keras](https://keras.io), [JAX](https://jax.readthedocs.io/en/latest/notebooks/quickstart.html): building blocks for neural networks
  - [Hugging Face](https://huggingface.co/): high-level abstractions to work with modern language models and computer vision models

- Versatility
- Support
  - Large community of active users
  - Modern generative language models can create useful fragments of code

## Variables and data types

- Variables are created with the assignment operator ```=```
- Basic data types in Python:
  - Strings: ```"Hi!"```
  - Integers: ```2```
  - Floats: ```4.4```
  - Booleans: ```True```

In [2]:
# defining variables in Python and exploring different data types
my_variable = 2

In [4]:
my_string = "Hi everyone!"
my_float = 2.4
my_boolean = True

In [14]:
# operations with numbers
2 + 2

4

In [15]:
2*4

8

In [16]:
2/2

1.0

In [17]:
2**8

256

In [18]:
my_math_operation = (2+2) * 8

In [19]:
my_math_operation

32

In [20]:
# operations with strings
"Hi" + " " + "everyone" + "!"

'Hi everyone!'

In [21]:
"Hi"/"everyone"

TypeError: unsupported operand type(s) for /: 'str' and 'str'

In [26]:
False + False + True

1

In [27]:
True*False

0
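
The cells above rely on a few rules that are easy to miss: every value has a type, booleans behave like the integers 1 and 0 in arithmetic, and `/` always returns a float. The following is a minimal sketch that makes these rules visible using the built-in `type()` and `isinstance()` functions; the variable names are reused from the cells above.

```python
# type() reports the data type of a value or variable
my_variable = 2
my_string = "Hi everyone!"
my_float = 2.4
my_boolean = True

print(type(my_variable))   # <class 'int'>
print(type(my_string))     # <class 'str'>
print(type(my_float))      # <class 'float'>
print(type(my_boolean))    # <class 'bool'>

# bool is a subclass of int: True behaves like 1 and False like 0,
# which is why False + False + True evaluates to 1
print(isinstance(True, int))   # True
print(True + True)             # 2

# division with / always returns a float; // performs floor division
print(2 / 2)    # 1.0
print(7 // 2)   # 3
```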

## Lists

- Basic object in Python to store multiple values together
- Represented by square brackets ```[ ]```


In [28]:
# create an empty list
empty_list = []

In [29]:
empty_list

[]

In [30]:
# create lists
integer_list = [2,3,4,5,6]

In [31]:
integer_list

[2, 3, 4, 5, 6]

In [32]:
string_list = ["john", "pedro", "juan", "nicolas"]

In [33]:
string_list

['john', 'pedro', 'juan', 'nicolas']

In [35]:
mixed_list = [2, 4.0, True, "Hi"]

In [39]:
# get an element by position (indexing starts at 0, so index 3 is the fourth element)
mixed_list[3]

'Hi'

In [40]:
mixed_list[-1]

'Hi'

In [41]:
# change an element within a list
mixed_list[0] = 10

In [43]:
mixed_list[-1] = "Hello"

In [44]:
mixed_list

[10, 4.0, True, 'Hello']

In [49]:
# add elements to a list (+ returns a new list; mixed_list itself is unchanged)
mixed_list + [4,6]

[10, 4.0, True, 'Hello', 4, 6]

In [50]:
my_new_list = mixed_list + [4,6]
my_new_list

[10, 4.0, True, 'Hello', 4, 6]

In [53]:
# list of lists
my_list_of_lists = [[1,2,3], ["a", "b", "c"], [True, False]]

In [54]:
my_list_of_lists

[[1, 2, 3], ['a', 'b', 'c'], [True, False]]
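
A few more list operations come up constantly in practice. The sketch below shows, under the same variable names as the cells above, how `+` differs from the in-place `append()` method, how `len()` counts elements, and how slicing and double indexing work. The value `"new"` is only an illustrative placeholder.

```python
mixed_list = [10, 4.0, True, "Hello"]

# + builds a new list and leaves the original unchanged
longer_list = mixed_list + [4, 6]
print(len(mixed_list))    # 4
print(len(longer_list))   # 6

# append() modifies the list in place, adding one element at the end
mixed_list.append("new")
print(mixed_list)         # [10, 4.0, True, 'Hello', 'new']

# slicing returns a sub-list: start index included, stop index excluded
print(longer_list[1:3])   # [4.0, True]

# index twice to reach an element inside a list of lists
my_list_of_lists = [[1, 2, 3], ["a", "b", "c"], [True, False]]
print(my_list_of_lists[1][0])   # a
```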

## Dictionaries

- Represented by curly brackets ```{}```
- Consists of a collection of key-value pairs. Each key-value pair maps the key to its associated value


In [55]:
# empty dictionary
my_dictionary = {}

In [56]:
my_dictionary

{}

In [57]:
# create a non-empty dictionary (groceries list)
my_groceries = {"bananas": 10, "apples": 1, "olive oil": 1}

In [58]:
my_groceries

{'bananas': 10, 'apples': 1, 'olive oil': 1}

In [61]:
# access elements of dictionaries
my_groceries["bananas"]

10

In [64]:
my_groceries["apples"]

1

In [72]:
# create a dictionary with integers as keys
my_numbers = {0: "zero", 1: "one", 2: "two"}

In [73]:
my_numbers

{0: 'zero', 1: 'one', 2: 'two'}

In [74]:
my_numbers[2]

'two'

In [75]:
# add new key-value pairs to dictionary
my_numbers[10] = "ten"

In [76]:
my_numbers

{0: 'zero', 1: 'one', 2: 'two', 10: 'ten'}

In [77]:
my_numbers[20] = "twenty"
my_numbers

{0: 'zero', 1: 'one', 2: 'two', 10: 'ten', 20: 'twenty'}

In [81]:
# access all keys
my_numbers.keys()

dict_keys([0, 1, 2, 10, 20])

In [82]:
# access all values
my_numbers.values()

dict_values(['zero', 'one', 'two', 'ten', 'twenty'])
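
Beyond `keys()` and `values()`, a handful of other dictionary operations are worth knowing. The following sketch reuses the dictionaries from the cells above to show iteration over key-value pairs with `items()`, membership tests with `in`, safe lookups with `get()`, and overwriting an existing value. The default string `"unknown"` is an arbitrary choice for illustration.

```python
my_numbers = {0: "zero", 1: "one", 2: "two", 10: "ten", 20: "twenty"}

# items() yields (key, value) pairs
for number, name in my_numbers.items():
    print(number, name)

# check whether a key exists
print(10 in my_numbers)   # True
print(3 in my_numbers)    # False

# get() returns a default value instead of raising a KeyError
print(my_numbers.get(3, "unknown"))   # unknown

# assigning to an existing key overwrites its value
my_groceries = {"bananas": 10, "apples": 1, "olive oil": 1}
my_groceries["apples"] = 6
print(my_groceries)   # {'bananas': 10, 'apples': 6, 'olive oil': 1}

# convert the keys view into a regular list
print(list(my_numbers.keys()))   # [0, 1, 2, 10, 20]
```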

## For loops and list comprehensions

In [None]:
# loop over elements of a list

In [4]:
# loop over a range

In [None]:
# loop over a string

In [None]:
# for loop in one line

In [None]:
# list comprehension
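
The cells in this section are left blank to be filled in live. One possible way to complete them is sketched below; it reuses `string_list` from the Lists section, and the remaining names (`squares`, `long_names`) are chosen only for illustration.

```python
string_list = ["john", "pedro", "juan", "nicolas"]

# loop over elements of a list
for name in string_list:
    print(name)

# loop over a range: range(5) yields 0, 1, 2, 3, 4
for i in range(5):
    print(i)

# loop over a string, one character at a time
for letter in "Hi!":
    print(letter)

# for loop in one line (only readable for very short bodies)
for i in range(3): print(i * 2)

# a regular loop that builds a list step by step...
squares = []
for i in range(5):
    squares.append(i ** 2)

# ...and the equivalent list comprehension
squares_comprehension = [i ** 2 for i in range(5)]
print(squares_comprehension)   # [0, 1, 4, 9, 16]

# a comprehension can also filter elements with a condition
long_names = [name.capitalize() for name in string_list if len(name) > 4]
print(long_names)   # ['Pedro', 'Nicolas']
```

A list comprehension is usually preferred over the explicit loop when the goal is simply to build a new list from an existing sequence.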

## Conditional statements

- We can use the ```if``` statement in Python to check if a logical condition is true
- Python supports basic logical operations such as:
    - Equal: ``` == ```
    - Not equal: ``` != ``` 
    - Greater than: ``` > ```
    - Less than: ``` < ```
- After an ```if``` statement we can also use ```elif``` (i.e. else + if) and ```else``` to specify what happens when the first logical condition is not true

In [None]:
# basic if statement

In [5]:
# if + else statement

In [None]:
# if + elif + else statement
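
The three blank cells above can be filled in along the following lines. This is only a sketch: the variable `age` and the thresholds 18 and 65 are hypothetical values picked to exercise each branch.

```python
age = 25   # hypothetical value

# logical conditions evaluate to booleans
print(age == 25)   # True
print(age != 25)   # False

# basic if statement: the indented block runs only when the condition is true
if age > 18:
    print("older than 18")

# if + else statement
if age < 18:
    message = "minor"
else:
    message = "not a minor"
print(message)   # not a minor

# if + elif + else statement: conditions are checked in order
# and only the first true branch runs
if age < 18:
    group = "minor"
elif age < 65:
    group = "adult"
else:
    group = "senior"
print(group)   # adult
```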

## Functions

- A function is a re-usable block of code that performs one or multiple operations. Functions usually take inputs and return outputs.
- Python has a set of built-in functions. [Here](https://www.w3schools.com/python/python_ref_functions.asp) is a full list, but some of the most used are:
  - ```print( )```
  - ```len( )```
  - ```abs( )```, ```max( )```, ```min( )```
  - ```range( )```

- Additional functions can be brought to Python by importing packages
- Functions can also be defined by the user using the ``` def my_function():``` syntax 

In [None]:
# example of pre-built functions

In [None]:
# methods: functions that belong to an object and are called with a dot

In [None]:
# define my own function
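
As a sketch of how the three blank cells above might be completed, the example below calls some of the built-in functions listed earlier, uses two string methods, imports a function from the standard-library `math` package, and defines a small function. The function name `average` is a hypothetical choice for illustration.

```python
import math   # standard-library package, imported to gain extra functions

integer_list = [2, 3, 4, 5, 6]

# example of built-in functions
print(len(integer_list))                      # 5
print(max(integer_list), min(integer_list))   # 6 2
print(abs(-4.4))                              # 4.4
print(list(range(3)))                         # [0, 1, 2]

# methods are called on an object with a dot
my_string = "Hi everyone!"
print(my_string.upper())      # HI EVERYONE!
print(my_string.split(" "))   # ['Hi', 'everyone!']

# a function brought in by importing a package
print(math.sqrt(16))   # 4.0


# define my own function: inputs go in the parentheses,
# and return hands the output back to the caller
def average(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


print(average(integer_list))   # 4.0
```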

# Pandas

In [None]:
# extend Python's capabilities with packages
import pandas as pd

In [None]:
# what are we importing?

In [None]:
# create a dataframe from a dictionary that maps column names to lists
df = pd.DataFrame({"name": ["John", "Mary"], "age": [30, 25]})
df

|   | name | age |
|---|------|-----|
| 0 | John | 30  |
| 1 | Mary | 25  |


In [None]:
# read data from a csv file
#df = pd.read_csv("data/iris.csv")

In [None]:
# read data from file with a different separator
#df = pd.read_csv("data/iris.csv", sep="\t")

In [None]:
# reading from other file types
#df = pd.read_excel("data/iris.xlsx")
#df = pd.read_json("data/iris.json")
#df = pd.read_stata("data/iris.dta")

In [None]:
# select a column (assumes df holds the iris data, i.e. one of the read cells above has been run)
df["sepal_length"]

In [None]:
# select a row
df.iloc[0]

In [None]:
# select a cell: first row of the sepal_length column (assumes df holds the iris data)
df["sepal_length"].iloc[0]

In [None]:
# filter data
#df[df["species"] == "setosa"]

In [None]:
# apply a function to all elements of a column
#df["sepal_length"].apply(lambda x: x * 2)

In [None]:
# sum all elements of a column
#df["sepal_length"].sum()

In [None]:
# create a new column (assuming sepal_length is in cm, as in the standard iris dataset)
#df["sepal_length_in_mm"] = df["sepal_length"] * 10

In [None]:
# group data
#df.groupby("species").mean()
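
Because most of the cells above are commented out and depend on a local `data/iris.csv` file, here is a self-contained sketch of the same operations. It assumes a tiny hand-made dataframe that only mimics two columns of the iris data; the four rows are illustrative values, not the real dataset.

```python
import pandas as pd

# small stand-in for the iris data (illustrative values only)
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4],
    "species": ["setosa", "setosa", "versicolor", "versicolor"],
})

# select a column (returns a pandas Series)
column = df["sepal_length"]

# select a row by position
first_row = df.iloc[0]

# select a cell: first row of the sepal_length column
first_cell = df["sepal_length"].iloc[0]

# filter data with a boolean condition
setosa = df[df["species"] == "setosa"]

# apply a function to all elements of a column
doubled = df["sepal_length"].apply(lambda x: x * 2)

# sum all elements of a column
total = df["sepal_length"].sum()

# create a new column from an existing one
df["sepal_length_doubled"] = df["sepal_length"] * 2

# group data and average one numeric column per group
means = df.groupby("species")["sepal_length"].mean()
print(means)
```

Selecting the numeric column before calling `mean()` keeps the group-by from tripping over text columns, which matters once a dataframe contains more than one non-numeric column.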

## Loading data from other sources

In [None]:
# download data from a Google Drive URL
#url = "https://drive.google.com/uc?export=download&id=1QZqQ5Z&authuser=0&export=download"
#df = pd.read_csv(url)

In [None]:
# mount Google Drive
#from google.colab import drive
#drive.mount('/content/drive')
