# CSS 201 / 202 - CSS Bootcamp

## Week 07 - Lecture 05

### Umberto Mignozzetti

# Introduction to Julia

## Intro to Julia

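Julia is one of the strongest tools available for data science.

It covers most of what R and Python can do, and it is often much faster.

The main reason for the speed is that Julia compiles your code to native machine code right before running it (just-in-time compilation), instead of interpreting it line by line. It also has built-in support for multithreading and parallel computing, although you have to enable it (for example, by starting Julia with `julia --threads auto`).

It can also call code from other major languages (C natively; Java, R, and Python through packages).

It has been used at CERN to analyze data from LHC experiments and at BlackRock to run financial asset simulations.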

## Intro to Julia

Today, we will talk about:

1. Arithmetic
1. Install and load packages
1. Loading data
1. Data types
1. Data wrangling
1. Simple regression
1. Programming

## Intro to Julia

To install Julia, you need to download it from the internet: https://julialang.org

Then, you can choose to work with Julia in:

1. [Visual Studio Code](https://code.visualstudio.com/docs/languages/julia) (a great graphical code editor)
2. [Jupyter Notebooks / Jupyter Lab](https://www.geeksforgeeks.org/how-to-work-with-julia-on-jupyter-notebook/) 

We will use Jupyter Notebooks, since it is our editor of choice. If you are working locally, the link above has instructions on how to set it up.

## Print and Comment

With Julia, the print command is `println`. 

`#` for commenting.

`#=...=#` for multiline commenting.

In [None]:
# This is 2
println(2)

In [None]:
#=
This is 2 + 3
Equals to 5
=#
println(2 + 3)

## Arithmetic in Julia

In [None]:
# Subtraction
println(2 - 3.2)

In [None]:
# Multiplication
println(2 * 3)

In [None]:
# Division
println(2 / 3)

In [None]:
# Power (note the difference from Python, which uses **)
println(2 ^ 3)
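
Beyond the operators above, Julia separates floating-point division from integer division. A minimal sketch using only built-in operators (the numbers are just illustrative):

```julia
# `/` always returns a floating-point number, even for two integers
println(7 / 2)          # 3.5
println(typeof(6 / 3))  # Float64

# `÷` (typed as \div<TAB>) or div() gives the integer quotient
println(7 ÷ 2)          # 3
println(div(7, 2))      # 3

# `%` or rem() gives the remainder
println(7 % 2)          # 1
```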

## Intro to Julia

### Install packages

To get started with Julia, we need to install packages. We will install the following packages:

- [`StatsKit`](https://juliastats.org)
- `CSV`
- `DataFrames`
- `Plots`
- `PythonCall`
- `RCall`
- `BenchmarkTools`
- `GLM`

For Julia packages see [JuliaHub](https://juliahub.com/ui/Packages)

In [None]:
# Installing packages (takes a bit to do)
using Pkg
Pkg.add("StatsKit")
Pkg.add("CSV")
Pkg.add("DataFrames") # Needed for `using DataFrames` below
Pkg.add("Plots")
Pkg.add("PythonCall")
Pkg.add("RCall")
Pkg.add("BenchmarkTools")
Pkg.add("GLM")

## Loading Data in Julia

Let us load the education dataset.

We use the function `CSV.read` to parse the file, and pass `DataFrame` as the second argument so that the result has the right object type:

In [None]:
# Loading packages to read the data
using CSV
using DataFrames

## Datasets
educ = CSV.read(download("https://raw.githubusercontent.com/umbertomig/POLI175public/main/data/educexp.csv"), DataFrame)
voting = CSV.read(download("https://raw.githubusercontent.com/umbertomig/POLI30Dpublic/main/datasets/voting.csv"), DataFrame)
chilesurv = CSV.read(download("https://raw.githubusercontent.com/umbertomig/POLI30Dpublic/main/datasets/survchile.csv"), DataFrame)
println("Done") # Trick to avoid displaying it

In [None]:
# The head of the dataset
println(first(educ, 3))

## Data Types in Julia

Julia has roughly the same data types as Python and R, with a few differences in how to use them.

In [None]:
# Numeric variable
x = 30

# Result
println(x)

In [None]:
# Julia is case-sensitive
x # different than X

## Updating rules in Julia

In [None]:
# Updating +
x = 3
println(x += 1)

In [None]:
# Updating -
println(x -= 2)

In [None]:
# Updating *
println(x *= 3)

In [None]:
# Updating /
println(x /= 2)
println(x)
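
Each updating operator is shorthand for `x = x <op> value`, so the variable is rebound and its type can change along the way. A small sketch of what happens with `/=`:

```julia
x = 6
println(typeof(x))   # an integer type (Int64 on 64-bit systems)

x /= 2               # same as x = x / 2
println(x)           # 3.0
println(typeof(x))   # Float64, because `/` returns a float
```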

## Data Types in Julia

Variable names can contain Unicode characters, but they cannot start with a digit or contain spaces.

We can check the type of a variable with the `typeof` function.

In [None]:
x2 = 30 # Valid name. `2x` is not a valid name (Julia reads it as 2 * x)

In [None]:
# Data type
println(typeof(x))

In [None]:
# Data type
println(typeof(x / 11))

## Data Types in Julia

**Booleans**: `true` or `false`. Note the lowercase spelling, unlike `TRUE`/`FALSE` in R and `True`/`False` in Python.

**Strings**: `"string"` and not `'string'`. Single quotes are reserved for a single character: `'c'` is a `Char`, not a `String`.

In [None]:
# True maps to 1
println(true + false)

In [None]:
# My string
println("my string is awesome")
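
The difference between single and double quotes is a difference in type. A minimal sketch (the variable names `c` and `s` are only for illustration):

```julia
c = 'c'      # single quotes: one character
s = "c"      # double quotes: a string of length one

println(typeof(c))    # Char
println(typeof(s))    # String
println(c == s[1])    # true: indexing a string returns a Char

# Booleans behave like 0 and 1 in arithmetic
println(true + true)  # 2
```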

## Data Types in Julia

We can slice strings as we do in Python, to get either characters or substrings.

Note that Julia indexing starts at `1`, and that ranges include both endpoints.

In [None]:
# My seat in the Padres game
s = "Row 23 Seat 10"
println(s[1:3] * ": " * s[5:6])
println(s[end-6:end-3] * ": " * s[end-1:end])

## Intro to Julia

### Data Types in Julia

**Multiline string literals**: Triple double quotation marks.

Concatenation can be done with `*`. We can also use the `$` sign to interpolate values into a string.

In [None]:
lit = """
My literal
is awesome!"""
println(lit)

In [None]:
name = "John"
coffee = "Coffee for "
println(coffee * name)

In [None]:
x = 30
println("x times two is $(x * 2)")

## Data Types in Julia

Data types in Julia, in a nutshell (see [this tutorial](https://syl1.gitbook.io/julia-language-a-concise-tutorial/language-core/data-types)):

- Scalars: `Int64`, `Float64`, `Char`, `String` and `Bool`
- `Arrays`, `Tuples` (immutable), `NamedTuples`, `Dict`ionaries
- `Set`s

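When converting between types (for example with `convert` or `parse`), make sure the value can actually be represented in the target type. Otherwise, Julia throws an error.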
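
A short sketch of type conversion with built-in functions, including one conversion that fails (the values are illustrative):

```julia
println(convert(Float64, 3))      # 3.0
println(Int64(3.0))               # 3
println(parse(Int64, "42"))       # 42
println(string(42))               # 42, now as a String

# tryparse returns `nothing` instead of throwing an error
println(tryparse(Int64, "abc"))   # nothing

# 3.5 has no exact integer representation, so this throws
try
    Int64(3.5)
catch err
    println("Conversion failed: ", typeof(err))   # InexactError
end
```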

## Data Types in Julia

Arrays: one-dimensional arrays are called vectors. They are similar to Python lists.

In [None]:
# Numeric vector
x = [30, 21, 11]
println(x)

In [None]:
println(typeof(x))

In [None]:
println(eltype(x))

## Data Types in Julia

Like Python (and different from R), arrays do not need to hold the same data types. When the elements are mixed, the element type of the array becomes `Any`:

In [None]:
x = ["John", "coffee", 2, true]
println(x)

In [None]:
println(typeof(x))
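
The element type matters in practice. A sketch contrasting a mixed vector with a typed one (the names `mixed` and `ints` are only for illustration):

```julia
mixed = ["John", "coffee", 2, true]
println(eltype(mixed))     # Any

ints = [30, 21, 11]
println(eltype(ints))      # an integer type (Int64 on 64-bit systems)

# A typed vector refuses elements it cannot convert to its element type
try
    push!(ints, "thirty")
catch err
    println("push! failed: ", typeof(err))   # MethodError
end
println(ints)              # [30, 21, 11]
```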

## Data Types in Julia

We can index and slice arrays much as we do in R or Python.

In [None]:
x[1]

In [None]:
x[1] * "'s " * x[2]

In [None]:
println(x[end-1:end])

## Data Types in Julia

The equivalent of Python's `.append` in Julia is `push!`, which adds a single element.

The `append!` command adds all the elements of another collection.

In [None]:
x = [30, 21, 11]
push!(x, 31)
println(x)

In [None]:
x = [30, 21, 11]
append!(x, [31, 45, 3])
println(x)

## Data Types in Julia

We also have `pop!` and `sort`. Julia's `sort` returns a sorted copy (like Python's `sorted`), while `sort!` sorts in place (like Python's `list.sort()`).

In [None]:
x = [30, 21, 11]
println("Popped element $(pop!(x))")
println(x)

In [None]:
x = [30, 21, 11]
x = sort(x)
println(x)

In [None]:
println(length(x))
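
By convention, functions whose names end in `!` modify their argument. A minimal sketch of the difference between `sort` and `sort!`:

```julia
x = [30, 21, 11]

y = sort(x)              # returns a sorted copy
println(x)               # [30, 21, 11], unchanged
println(y)               # [11, 21, 30]

sort!(x)                 # sorts x in place
println(x)               # [11, 21, 30]

println(sort(x, rev = true))   # [30, 21, 11]
```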

## Data Types in Julia

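Like in R, we do not need a special package for doing vectorized operations. Putting a dot before an operator applies it element by element (this is called broadcasting).

Operations between a vector and a scalar: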

In [None]:
x = [30, 21, 11]
println(x .+ 5)

In [None]:
println(x .- 5)

In [None]:
println(x .* 1.2)

In [None]:
println(x ./ 2.1)

In [None]:
println(x .^ 2)

## Data Types in Julia

Element-wise operations between two vectors:

In [None]:
x = [30, 21, 11]
y = [2, 3, 1]
println(x .+ y)

In [None]:
println(x .- y)

In [None]:
println(x .* y)

In [None]:
println(x ./ y)

In [None]:
println(x .^ y)
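
The dot syntax is not limited to operators: any function can be applied element by element by writing `f.(...)`. A sketch, where `weighted` is a hypothetical helper defined only for this example:

```julia
x = [30, 21, 11]
y = [2, 3, 1]

# Hypothetical helper, written for scalars only
weighted(a, b) = 2a + b

println(weighted.(x, y))    # [62, 45, 23]

# The @. macro adds the dots to every operation in the expression
z = @. 2x + y
println(z)                  # [62, 45, 23]

println(sqrt.([4, 9, 16]))  # [2.0, 3.0, 4.0]
```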

## Arrays in Julia

We can also create empty arrays:

In [None]:
# Empty integers array
arr = Int64[]
println(arr)

In [None]:
append!(arr, [1, 2, 3])
println(arr)

In [None]:
arr[end] = 32
println(arr)

In [None]:
arr[1:end] = [-1, -2, -3]
println(arr)
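
Instead of starting empty and appending, arrays can also be built in one step. A short sketch with comprehensions and `zeros` (the names are only for illustration):

```julia
# Comprehension: similar to a Python list comprehension
squares = [i^2 for i in 1:5]
println(squares)             # [1, 4, 9, 16, 25]

# Comprehension with a condition
evens = [i for i in 1:10 if i % 2 == 0]
println(evens)               # [2, 4, 6, 8, 10]

# Pre-filled array of a given type and length
println(zeros(Int64, 3))     # [0, 0, 0]
```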

## Dictionaries in Julia

In [None]:
# Dictionaries (note the structure: key => value)
game = Dict("team" => "Padres", "score" => 5)
println(game["team"])

In [None]:
println(keys(game))

In [None]:
println(values(game))

## Dictionaries in Julia

In [None]:
# get values: (dict, key, default_val)
println(get(game, "team", "does not have this key."))

In [None]:
# get values: (dict, key, default_val)
println(get(game, "otherteam", "does not have this key."))
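
Dictionaries are mutable, so keys can be added, updated, and removed. A minimal sketch (the `"opponent"` entry is a made-up value for illustration):

```julia
game = Dict("team" => "Padres", "score" => 5)

game["opponent"] = "Dodgers"   # hypothetical value, adds a new key
game["score"] += 1             # updates an existing value

println(haskey(game, "opponent"))  # true
println(game["score"])             # 6

# Iterate over key-value pairs (the order is not guaranteed)
for (key, value) in game
    println("$key => $value")
end

delete!(game, "opponent")
println(length(game))              # 2
```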

## DataFrames in Julia

Now let us work with `DataFrames`.

In [None]:
# Equivalent of .head() in pandas
println(first(educ, 3))

In [None]:
# Equivalent of .tail() in pandas
println(last(educ, 3))

## DataFrames in Julia

In [None]:
# Data Frames
println(typeof(educ))

And `describe` tells us about the DataFrame:

In [None]:
println(describe(educ))

## DataFrames in Julia

To access a variable (column) in a dataset, use `.` followed by the name of the variable.

In [None]:
println(educ.income)

Names of the variables:

In [None]:
println(names(educ))

## DataFrames in Julia

Stats of a single variable:

In [None]:
using Statistics

In [None]:
println(mean(educ.income))

In [None]:
println(median(educ.income))

In [None]:
println(std(educ.income))

## DataFrames in Julia

Stats of a single variable:

In [None]:
println(var(educ.income))

In [None]:
println(sum(educ.income))

In [None]:
println(minimum(educ.income))

In [None]:
println(maximum(educ.income))

## DataFrames in Julia

`size` for the dimension of the dataset:

In [None]:
println(size(educ))

Finding a given value: `dat[rows, cols]`

In [None]:
# First row (Maine)
println(educ[1, :])

## DataFrames in Julia

In [None]:
# education expenditure
println(educ[:, 1])

In [None]:
# Income for the first three rows
println(educ[1:3, "income"])

In [None]:
# Third state in the dataset
println(educ.states[3])

In [None]:
# education expenditure of the first case (Maine)
println(educ[1, 1])
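
The same indexing rules can be tried on a small data frame. The sketch below uses made-up toy data (not the real `educ` dataset) and also shows the difference between `:` and `!` as row selectors:

```julia
using DataFrames

# Toy data, not the real `educ` dataset
toy = DataFrame(state = ["A", "B", "C"], income = [10, 20, 30])

println(toy[2, :income])        # 20
println(toy[1:2, "income"])     # [10, 20]

col_copy = toy[:, :income]      # `:` returns a copy of the column
col_same = toy[!, :income]      # `!` returns the stored column itself

col_same[1] = 99                # this changes the data frame
println(toy.income)             # [99, 20, 30]
println(col_copy)               # [10, 20, 30]
```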

## Importing Python Functions

In [None]:
using PythonCall
pytime = pyimport("time")

In [None]:
println(pytime.ctime())

## Importing R Functions

In [None]:
using RCall
@rimport base as r_base

In [None]:
r_base.sum([1, 2, 3, 4, 5])

# Data Wrangling in Julia

## Renaming a column in Julia

In [None]:
aux = copy(chilesurv) # Copy, so the original data frame is not modified
rename!(aux, Dict(:voteYES => :votebinary))
println(first(aux, 2))

## Filtering Rows in Julia

Filtering rows:

| **Operator** | **Meaning**                          |
|----------|------------------------------------------|
| `<` or `<=` | Less than, or less than or equal to   |
| `>` or `>=` | Greater than, or greater than or equal to |
| `==`       | Equal                                  |
| `!=`       | Not equal                              |
| `!`        | Negation (turns a `true` into a `false`)   |
| `\|\|`        | Or                                     |
| `&&`        | And                                    |
In [None]:
aux = filter(row -> row.birth >= 1975, voting)
println(first(aux, 3))

## Filtering Rows in Julia

In [None]:
aux = filter(row -> row.birth >= 1975 && row.message == "yes", voting)
println(first(aux, 3))

In [None]:
println(size(voting))

In [None]:
println(size(aux))
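
The `&&` and `||` operators work on single `true`/`false` values, which is why they fit inside the row-by-row function above. When comparing whole columns, the dotted versions are needed instead. A sketch on made-up toy data (not the real `voting` dataset):

```julia
using DataFrames

# Toy data, not the real `voting` dataset
toy = DataFrame(birth = [1970, 1980, 1990], message = ["yes", "no", "yes"])

# Row-by-row version: `&&` combines two single Boolean values
a = filter(row -> row.birth >= 1975 && row.message == "yes", toy)

# Column-wise version: dotted comparisons build a Boolean mask, combined with `.&`
mask = (toy.birth .>= 1975) .& (toy.message .== "yes")
b = toy[mask, :]

println(a == b)   # true
println(b)
```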

## Sorting datasets in Julia

In [None]:
aux = sort(educ, "education")
println(first(aux, 3))

In [None]:
aux = sort(educ, "education", rev = true)
println(first(aux, 3))

## Drop missing data in Julia

In [None]:
aux = copy(chilesurv) # Copy, so the original data frame is not modified
dropmissing!(aux, :statusquo)
println(nrow(aux))
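
Dropping rows is not the only option. Missing values propagate through computations, and `skipmissing` ignores them without changing the data. A minimal sketch on a made-up vector:

```julia
using Statistics

v = [1.0, missing, 3.0]

println(mean(v))                   # missing
println(mean(skipmissing(v)))      # 2.0
println(count(ismissing, v))       # 1
println(collect(skipmissing(v)))   # [1.0, 3.0]
```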

## Mutating Variables in Julia

Mutating the dataset (creates new variables using old ones):

In [None]:
educ.income_per_urban = educ.income ./ educ.urban
println(first(educ, 6))

## Linear Regression in Julia

A simple regression model in Julia

In [None]:
using GLM
ols = lm(@formula(voted ~ message), voting)
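
Once a model is fitted, its pieces can be extracted with accessor functions. The sketch below fits a model on made-up toy data (roughly y = 1 + 2x), not on the `voting` dataset:

```julia
using DataFrames
using GLM

# Toy data, roughly y = 1 + 2x
toy = DataFrame(x = [1, 2, 3, 4, 5], y = [3.1, 4.9, 7.2, 8.8, 11.0])

model = lm(@formula(y ~ x), toy)

println(coef(model))       # intercept and slope
println(stderror(model))   # standard errors, in the same order
println(r2(model))         # R-squared
```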

## Julia Programming

Julia is a full general-purpose programming language, just like Python.

We can write conditionals, functions, and loops, and build complete programs with it.

## Conditional Execution

In [None]:
monday = true

# Conditional fork
if monday
    # Code for condition
    println("Yay, first day of the week! Hope Hurricane Hilary was not that destructive.")
end

In [None]:
monday = false

# Conditional fork
if monday
    # Code for condition
    println("Yay, first day of the week! Hope Hurricane Hilary was not that destructive.")
end

## Conditional Execution

In [None]:
monday = true

# Conditional fork
if monday
    # Code for condition true
    println("Yay, first day of the week! Hope Hurricane Hilary was not that destructive.")
else
    # Code for condition false
    println("Well, not that cool...")
end

In [None]:
monday = false

# Conditional fork
if monday
    # Code for condition true
    println("Yay, first day of the week! Hope Hurricane Hilary was not that destructive.")
else
    # Code for condition false
    println("Well, not that cool...")
end

## Conditional Execution

In [None]:
day = "Wed"

# Conditional fork
if day == "Mon"
    # Code for condition true
    println("Yay, first day of the week! Hope Hurricane Hilary was not that destructive.")
elseif day == "Wed"
    # Code for wed = true
    println("Padres day! Let's go!!")
else
    # Code for condition false
    println("Well, not that cool...")
end
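
For short conditionals, Julia also offers the ternary operator and short-circuit evaluation. A minimal sketch (the messages are only illustrative):

```julia
day = "Wed"

# condition ? value_if_true : value_if_false
msg = day == "Wed" ? "Padres day!" : "Regular day"
println(msg)                      # Padres day!

# `&&` runs the right-hand side only when the left-hand side is true
day == "Wed" && println("Let's go!!")
```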

## Function Creation

Named functions:

In [None]:
# My BMI depression function
function bmi(weight, height, imperial = false)
    return (weight)/(height^2)
end

# BMI
println(bmi(115, 1.78))

**Check-in**: Adapt the code to perform an imperial-to-metric conversion when `imperial` is true.

In [None]:
# Your code here

## Function Creation

Type declaration:

In [None]:
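# Type annotations restrict which arguments this method accepts.
# `Real` covers integers and floats (`Float64` alone would not match 115).
function bmi(weight::Real, height::Real, imperial::Bool = false)
    return (weight)/(height^2)
end

# BMI
println(bmi(115, 1.78))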

In [None]:
# Passing a String raises a MethodError
println(bmi("115", 1.78))
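
Type annotations are also how Julia chooses between several versions (methods) of the same function, a feature called multiple dispatch. A sketch with a hypothetical function `describe_value`, which is not part of any package:

```julia
# Hypothetical function with one method per argument type
describe_value(x::Number) = "a number: $x"
describe_value(x::AbstractString) = "a string: $x"
describe_value(x) = "something else"    # fallback for any other type

println(describe_value(115))      # a number: 115
println(describe_value("115"))    # a string: 115
println(describe_value([1, 2]))   # something else

# One function, three methods
println(length(methods(describe_value)))  # 3
```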

## Function Creation

Variable number of arguments (`...`) and keyword arguments (after the `;`):

In [None]:
function people(names...)
    println(names)
end

# Using it
people("John", "Julia", "Jane")

In [None]:
function people(name, schools... ; address)
    # `schools` collects the extra positional arguments into a tuple
    return name, schools, address
end

# Using it
print(people("John", "BA PoliSci", "MS CSS", address = "La Jolla"))

## Function Creation

Anonymous functions:

In [None]:
# BMI as an anonymous function
mybmi = (w, h) -> w / (h^2)
mybmi(115, 1.78)

In [None]:
# With mapping
mybmi = (w, h) -> w / (h^2)
println(map(mybmi, [100, 115, 70], [1.78, 1.86, 1.67]))

## Function Creation

Timing functions:

In [None]:
# Timing bmi
@time bmi(115, 1.78)

In [None]:
# Benchmarking bmi
using BenchmarkTools
@benchmark bmi(115, 1.78)
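
A single `@time` measurement can be misleading, because the first call of a function also includes its compilation. A sketch using the built-in `@elapsed` macro, with a hypothetical function `square_sum` (the actual timings depend on your machine):

```julia
# Hypothetical function to time
square_sum(n) = sum(i^2 for i in 1:n)

t_first = @elapsed square_sum(1_000)    # includes compilation time
t_second = @elapsed square_sum(1_000)   # already compiled

println("First call:  $t_first seconds")
println("Second call: $t_second seconds")
```

This is why `@benchmark` runs the call many times and reports a distribution instead of a single number.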

## For loops

Very similar to Python:

In [None]:
nums = [1, 2, 3, 4, 5]
for x in nums
    xsq = x^2
    println("$x squared is equal to $xsq")
end

## For loops

With `enumerate`:

In [None]:
nums = [5, 4, 3, 2, 1]
for (index, x) in enumerate(nums)
    xsq = x^2
    println("In position $index: $x squared is equal to $xsq")
end

## For loops

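With ranges (note that the syntax differs from Python and R: the step goes in the middle, and the end point is included when the steps land on it):

In [None]:
nums = 10:-2:1 # start:step:stop, gives 10, 8, 6, 4, 2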
for x in nums
    xsq = x^2
    println("$x squared is equal to $xsq")
end
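
Ranges are lazy: they store only the start, step, and stop. To see the values, `collect` turns a range into a vector. A short sketch:

```julia
println(collect(1:5))                      # [1, 2, 3, 4, 5]
println(collect(10:-2:1))                  # [10, 8, 6, 4, 2]
println(length(10:-2:1))                   # 5

# range() with a number of points instead of a step
println(collect(range(0, 1, length = 5)))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```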

## While loops

In [None]:
nums = [5, 4, 3, 2, 1]
while length(nums) > 0
    x = pop!(nums)
    xsq = x^2
    println("$x squared is equal to $xsq")
end

**Check-in**: Explain this code in words.

Little hint: as in Python, if you get stuck in an infinite loop, press `CTRL + C` in the terminal (or interrupt the kernel in Jupyter).
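
Loops can also be controlled from the inside with `continue` and `break`. A minimal sketch, where `seen` is a helper vector introduced only for this example:

```julia
nums = [5, 4, 3, 2, 1]
seen = Int64[]

while true
    x = pop!(nums)        # removes and returns the last element
    if x == 2
        continue          # skip the rest of this iteration
    end
    push!(seen, x)
    if x == 4
        break             # leave the loop
    end
end

println(seen)   # [1, 3, 4]
println(nums)   # [5]
```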

## ANES Dataset

Exercise:

1. Load the `anes` 2020 dataset: https://raw.githubusercontent.com/umbertomig/POLI175public/main/data/anes2020.csv

1. Explore the dataset.

1. What is the mean of `feel_biden`?

1. What is the standard deviation of `feel_rural`?

1. What is the median of `feel_gay`?

1. Standardize the variables (the operation: $z = \dfrac{x - \text{mean}(x)}{\text{sd}(x)}$):
    - `feel_biden`
    - `feel_fauci`
    - `feel_nra`
    - `feel_unions`
    - `feel_fbi`

1. Run a regression trying to predict `feel_biden` based on `feel_fauci`, `feel_nra`, `feel_unions`, and `feel_fbi`. Which variable is the most important? (Hint: try using the standardized variables in the regression!)

In [None]:
## Your answers here

# Great work! See you in the next class.