# Introduction to Python and Pandas


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/unstructured-data/IESE_presession_23/blob/main/notebooks/intro_python.ipynb)


## Why Python?

- Great implementations of machine learning methods
  - [scikit-learn](https://scikit-learn.org/stable/): logistic regression, decision trees, clustering algorithms, and much more.
  - [PyTorch](https://pytorch.org/), [Keras](https://keras.io), [JAX](https://jax.readthedocs.io/en/latest/notebooks/quickstart.html): building blocks for neural networks
  - [HuggingFace](https://huggingface.co/): high-level abstractions for working with modern language and computer vision models

- Versatility
- Support
  - Large community of active users
  - Modern generative language models can create useful fragments of code

## Variables and data types

- Variables are created with the assignment operator ```=```: the name goes on the left, the value on the right
- Basic data types in Python:
  - Strings: ```"Hi!"```
  - Integers: ```2```
  - Floats: ```4.4```
  - Booleans: ```True```

In [1]:
# defining variables in Python and exploring different data types

In [2]:
# operations with numbers

In [3]:
# operations with strings
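
The three cells above can be filled in many ways. The following is a minimal sketch, where every variable name (`greeting`, `count`, `price`, `is_ready`, and so on) is a hypothetical choice made for illustration:

```python
# Variables are created with "=": name on the left, value on the right.
greeting = "Hi!"      # string
count = 2             # integer
price = 4.4           # float
is_ready = True       # boolean

# type() reports the data type of a value.
print(type(greeting), type(count), type(price), type(is_ready))

# Operations with numbers.
total = count + 3          # addition
quotient = 7 / 2           # true division always returns a float
floor_quotient = 7 // 2    # floor division drops the fractional part
power = count ** 3         # exponentiation

# Operations with strings.
full_greeting = greeting + " " + "Welcome"   # concatenation
shout = greeting.upper()                     # upper-case copy
repeated = greeting * 2                      # repetition
length = len(greeting)                       # number of characters
```

Note that `/` returns a float even when both operands are integers, while `//` returns an integer here.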

## Lists

- Basic object in Python to store multiple values together
- Represented by square brackets ```[ ]```


In [4]:
# create an empty list

In [5]:
# create lists

In [6]:
# get the first element of a list

In [7]:
# change an element within a list

In [8]:
# add elements to a list

In [9]:
# list of lists
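
As one possible way to fill in the list cells above, the sketch below walks through each operation in order. The list contents and names (`fruits`, `grid`) are made up for illustration:

```python
# Create an empty list.
empty = []

# Create a list with some values.
fruits = ["apple", "banana", "cherry"]

# Get the first element (Python counts from 0).
first = fruits[0]

# Change an element within the list.
fruits[1] = "blueberry"

# Add elements: append() adds one item, extend() adds several.
fruits.append("date")
fruits.extend(["elderberry", "fig"])

# A list of lists: index twice to reach an inner value.
grid = [[1, 2, 3], [4, 5, 6]]
cell = grid[1][0]   # second inner list, first element

print(fruits, cell)
```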

## Dictionaries

- Represented by curly brackets ```{}```
- Consists of a collection of key-value pairs. Each key-value pair maps the key to its associated value


In [None]:
# empty dictionary

In [10]:
# create a non-empty dictionary (groceries list)

In [None]:
# access elements of dictionaries

In [None]:
# create a dictionary with integers as keys

In [None]:
# access all keys

In [None]:
# access all values
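
The dictionary cells above could look roughly like this. The groceries and quantities are invented for the example, and `squares` is a hypothetical dictionary with integer keys:

```python
# Empty dictionary.
empty = {}

# Non-empty dictionary: item name -> quantity (a groceries list).
groceries = {"apples": 3, "milk": 1, "bread": 2}

# Access an element by its key.
n_apples = groceries["apples"]

# .get() returns a default instead of raising KeyError for a missing key.
n_eggs = groceries.get("eggs", 0)

# Add a new key-value pair.
groceries["eggs"] = 12

# Keys do not have to be strings; integers work too.
squares = {1: 1, 2: 4, 3: 9}

# Access all keys and all values (wrapped in list() for easy printing).
keys = list(groceries.keys())
values = list(groceries.values())

print(keys, values)
```

Dictionaries keep their insertion order in Python 3.7 and later, so `keys` and `values` line up with the order in which the pairs were added.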

## For loops and list comprehensions

In [None]:
# loop over elements of a list

In [None]:
# loop over a range

In [None]:
# loop over a string

In [12]:
# for loop in one line

In [None]:
# list comprehension
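
A short sketch of the loop patterns named in the cells above. The names (`numbers`, `doubled`, `squares`, and so on) are illustrative only:

```python
numbers = [1, 2, 3]

# Loop over the elements of a list.
doubled = []
for n in numbers:
    doubled.append(n * 2)

# Loop over a range: range(3) yields 0, 1, 2.
indices = []
for i in range(3):
    indices.append(i)

# Loop over a string: one character per iteration.
letters = []
for ch in "abc":
    letters.append(ch)

# A for loop written on one line (valid, but harder to read).
for n in numbers: print(n)

# List comprehension: build a new list in a single expression.
squares = [n ** 2 for n in numbers]

# A comprehension can also filter with a trailing "if".
even_squares = [n ** 2 for n in range(6) if n % 2 == 0]
```

A list comprehension is usually preferred over the append-in-a-loop pattern when the goal is simply to build a new list from an existing sequence.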

## Functions

- A function is a reusable block of code that performs one or more operations. Functions usually take inputs and return outputs.
- Python ships with a set of built-in functions. The [full list](https://www.w3schools.com/python/python_ref_functions.asp) is long; some of the most used ones are:
  - ```print( )```
  - ```len( )```
  - ```abs( )```, ```max( )```, ```min( )```
  - ```range( )```

- Additional functions can be brought to Python by importing packages
- Functions can also be defined by the user using the ```def my_function():``` syntax

In [1]:
# example of pre-built functions

In [2]:
# define my own function
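
To make the two cells above concrete, here is a minimal sketch. The list `values` and the function `rescale` are hypothetical examples, not part of any library:

```python
values = [3, -7, 5]

# Built-in functions listed above.
print(len(values))               # number of elements
n_values = len(values)
largest = max(values)
smallest = min(values)
distance = abs(smallest)         # absolute value
first_three = list(range(3))     # range() is lazy, so wrap it in list()


# A user-defined function: inputs in parentheses, output after "return".
def rescale(x, factor=2):
    """Return x multiplied by factor (factor defaults to 2)."""
    return x * factor


default_result = rescale(10)             # uses the default factor
custom_result = rescale(10, factor=0.5)  # overrides the default
```

Giving `factor` a default value lets callers omit it in the common case while still allowing it to be overridden.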

## Classes
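
A class bundles data (attributes) with the functions that act on that data (methods). The sketch below is one minimal illustration, using a hypothetical `Student` class that is not taken from any package:

```python
class Student:
    """A hypothetical class that stores a name and an age."""

    def __init__(self, name, age):
        # __init__ runs when a new object is created.
        self.name = name
        self.age = age

    def greet(self):
        # Methods receive the object itself as their first argument, "self".
        return f"Hi, I am {self.name}!"


# Create an object (an instance of the class) and call its method.
student = Student("Mary", 25)
message = student.greet()
print(message, student.age)
```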

# Pandas

In [None]:
# expand python's capacity with packages
import pandas as pd

In [None]:
# what are we importing?

In [None]:
# create a dataframe from two lists
df = pd.DataFrame({"name": ["John", "Mary"], "age": [30, 25]})
df

| | name | age |
|---|---|---|
| 0 | John | 30 |
| 1 | Mary | 25 |


In [None]:
# read data from a csv file
#df = pd.read_csv("data/iris.csv")

In [None]:
# read data from file with a different separator
#df = pd.read_csv("data/iris.csv", sep="\t")

In [None]:
# reading from other file types
#df = pd.read_excel("data/iris.xlsx")
#df = pd.read_json("data/iris.json")
#df = pd.read_stata("data/iris.dta")

In [None]:
# select a column (requires the iris data, so load it with one of the cells above first)
#df["sepal_length"]

In [None]:
# select a row
df.iloc[0]

In [None]:
# select a cell (requires the iris data)
#df["sepal_length"].iloc[0]

In [None]:
# filter data
#df[df["species"] == "setosa"]

In [None]:
# apply a function to all elements of a column
#df["sepal_length"].apply(lambda x: x * 2)

In [None]:
# sum all elements of a column
#df["sepal_length"].sum()

In [None]:
# create a new column (the standard iris measurements are in cm, so multiply by 10 to get mm)
#df["sepal_length_in_mm"] = df["sepal_length"] * 10

In [None]:
# group data
#df.groupby("species").mean()
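
The cells above assume the iris file is available under `data/`. As a self-contained sketch that runs without it, the example below applies the same operations to a tiny stand-in DataFrame. Its four rows are made-up toy values, not real iris measurements, and the column name `sepal_length_doubled` is a hypothetical choice:

```python
import pandas as pd

# Toy stand-in for the iris data (values invented for illustration).
df = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica"],
    "sepal_length": [5.0, 4.0, 6.0, 7.0],
})

column = df["sepal_length"]            # select a column (a Series)
row = df.iloc[0]                       # select a row by position
cell = df["sepal_length"].iloc[0]      # select a single cell

# Filter rows with a boolean condition.
setosa = df[df["species"] == "setosa"]

# Apply a function to every element of a column.
doubled = df["sepal_length"].apply(lambda x: x * 2)

# Sum all elements of a column.
total = df["sepal_length"].sum()

# Create a new column from an existing one.
df["sepal_length_doubled"] = df["sepal_length"] * 2

# Group rows by species and average one numeric column per group.
means = df.groupby("species")["sepal_length"].mean()
print(means)
```

Selecting the numeric column before calling `.mean()` keeps the group-by from tripping over non-numeric columns in larger tables.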

## Loading data from other sources

In [None]:
# download data from a Google Drive URL
#url = "https://drive.google.com/uc?export=download&id=1QZqQ5Z&authuser=0"
#df = pd.read_csv(url)

In [None]:
# mount Google Drive
#from google.colab import drive
#drive.mount('/content/drive')
