# CSS 201 / 202 - CSS Bootcamp

## Week 07 - Lecture 05

### Umberto Mignozzetti

# Introduction to Julia

## Intro to Julia

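Julia is one of the strongest languages available for data science.

It covers much of what R and Python offer, and well-written Julia code is often considerably faster.

The speed comes mainly from just-in-time (JIT) compilation: Julia compiles functions to native machine code, specialized on the argument types. It does not use all the cores in your computer by default; multithreading is available, but you have to enable it (for example, by starting Julia with `--threads auto`).

It can also call code from other major languages (Java, C, R, and Python included).

It has been used at CERN to analyze data from LHC experiments and at BlackRock to run financial asset simulations.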

## Intro to Julia

Today, we will talk about:

1. Arithmetic
1. Installing and loading packages
1. Loading data
1. Data types
1. Data wrangling
1. Simple regression
1. Programming

## Intro to Julia

To install Julia, you need to download it from the internet: https://julialang.org

Then, you can choose to work with Julia in:

1. [Visual Studio Code](https://code.visualstudio.com/docs/languages/julia) (a great graphical editor with a Julia extension)
2. [Jupyter Notebooks / Jupyter Lab](https://www.geeksforgeeks.org/how-to-work-with-julia-on-jupyter-notebook/)

We will use Jupyter Notebooks, since it is our editor of choice. If you are working locally, the link above has instructions on how to set it up.

## Print and Comment

With Julia, the print command is `println`. 

`#` for commenting.

`#=...=#` for multiline commenting.

In [1]:
# This is 2
println(2)

2


In [2]:
#=
This is 2 + 3
Equals to 5
=#
println(2 + 3)

5


## Arithmetic in Julia

In [4]:
# Subtraction
println(2 - 3.2)

-1.2000000000000002


In [5]:
# Multiplication
println(2 * 3)

6


In [6]:
# Division
println(2 / 3)

0.6666666666666666


In [7]:
# Power (note: different from Python, which uses **)
println(2 ^ 3)

8
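
Two details of the cells above may deserve a quick sketch: `/` always returns a floating-point number, and results such as `-1.2000000000000002` come from binary floating-point rounding rather than from Julia itself. The snippet below uses only built-in operators; `÷`, `%`, and `≈` are not used elsewhere in this lecture and are shown here as an aside.

```julia
# Division with `/` always yields a Float64, even for integers
println(6 / 3)             # 2.0

# Integer division (truncated) and remainder
println(7 ÷ 2)             # 3   (type \div then TAB; same as div(7, 2))
println(7 % 2)             # 1   (same as rem(7, 2))

# Floating-point rounding: compare with ≈ (isapprox) instead of ==
println(2 - 3.2 == -1.2)   # false
println(2 - 3.2 ≈ -1.2)    # true
```

When comparing computed floating-point values, `≈` is usually the safer choice.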


## Intro to Julia

### Install packages

To get started with Julia, we need to install packages. We will install the following packages:

- [`StatsKit`](https://juliastats.org)
- `Statistics`
- `CSV`
- `Plots`
- `PythonCall`
- `RCall`
- `DataFrames`
- `BenchmarkTools`
- `GLM`

To browse Julia packages, see [JuliaHub](https://juliahub.com/ui/Packages).

In [9]:
# Installing packages (this takes a while the first time)
using Pkg
Pkg.add("StatsKit")
Pkg.add("Statistics")
Pkg.add("CSV")
Pkg.add("Plots")
Pkg.add("PythonCall")
Pkg.add("RCall")
Pkg.add("DataFrames")
Pkg.add("BenchmarkTools")
Pkg.add("GLM")

   Resolving package versions...
  No Changes to `~/.julia/environments/v1.9/Project.toml`
  No Changes to `~/.julia/environments/v1.9/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.9/Project.toml`
  No Changes to `~/.julia/environments/v1.9/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.9/Project.toml`
  No Changes to `~/.julia/environments/v1.9/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.9/Project.toml`
  No Changes to `~/.julia/environments/v1.9/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.9/Project.toml`
  No Changes to `~/.juli

## Loading Data in Julia

Let us load the education dataset.

We use the function `CSV.read` and `DataFrame` to convert it to the right object type:

In [10]:
# Loading packages to read the data
using CSV
using DataFrames

## Datasets
educ = CSV.read(download("https://raw.githubusercontent.com/umbertomig/POLI175public/main/data/educexp.csv"), DataFrame)
voting = CSV.read(download("https://raw.githubusercontent.com/umbertomig/POLI30Dpublic/main/datasets/voting.csv"), DataFrame)
chilesurv = CSV.read(download("https://raw.githubusercontent.com/umbertomig/POLI30Dpublic/main/datasets/survchile.csv"), DataFrame)
println("Done") # Trick to avoid displaying it

Done


In [13]:
# The head of the dataset
println(first(educ, 3))

3×5 DataFrame
 Row │ education  income  young    urban  states
     │ Int64      Int64   Float64  Int64  String3
─────┼────────────────────────────────────────────
   1 │       189    2824    350.7    508  ME
   2 │       169    3259    345.9    564  NH
   3 │       230    3072    348.5    322  VT


## Data Types in Julia

Julia has roughly the same data types as Python and R, with a few differences in how they are used.

In [14]:
# Numeric variable
x = 30

# Result
println(x)

30


In [17]:
# Julia is case-sensitive
x # different than X

30

## Updating rules in Julia

In [18]:
# Updating +
x = 3
println(x += 1) # x = x + 1

4


In [19]:
# Updating -
println(x -= 2)

2


In [20]:
# Updating *
println(x *= 3)

6


In [21]:
# Updating /
println(x /= 2)

3.0


## Data Types in Julia

Variable names can contain Unicode characters, but they cannot start with a digit or contain spaces.

We can check the type of a variable with the `typeof` function.

In [24]:
x2 = 30 # A name cannot start with a digit: `2x` is read as 2 * x
println(x2)

30


In [27]:
# Data type
x = 3
println(typeof(x))

Int64


In [28]:
# Data type
println(typeof(x / 11))

Float64


## Data Types in Julia

**Booleans**: `true` or `false`, in lowercase. Note: this differs from R (`TRUE`/`FALSE`) and Python (`True`/`False`).

**Strings**: `"string"` and not `'string'`. Single quotes are reserved for a single character (a `Char`), such as `'c'`.

In [29]:
# True maps to 1
println(true + false)

1


In [36]:
# My string
println("my string is awesome")

my string is awesome


## Data Types in Julia

We can slice strings as we do in Python, to get either characters or substrings.

Note that Julia indexing starts at `1`, and ranges such as `1:3` include both endpoints.

In [40]:
# My seat in the Padres game
s = "Row 23 Seat 10"
println(s[1:3] * ": " * s[5:6])
println(s[end-6:end-3] * ": " * s[end-1:end])

Row: 23
Seat: 10


## Intro to Julia

### Data Types in Julia

**Multiline string literals**: Triple double quotation marks.

Concatenation can be done with `*`. We can also use the `$` sign to interpolate values into a string.

In [41]:
lit = """
My literal
is awesome!"""
println(lit)

My literal
is awesome!


In [42]:
name = "John"
coffee = "Coffee for "
println(coffee * name)

Coffee for John


In [46]:
x = 30
println("$x times two is $(x * 2)")

30 times two is 60


## Data Types in Julia

Data types in Julia, in a nutshell (see [this tutorial](https://syl1.gitbook.io/julia-language-a-concise-tutorial/language-core/data-types)):

- Scalars: `Int64`, `Float64`, `Char`, `String` and `Bool`
- `Arrays`, `Tuples` (immutable), `NamedTuples`, `Dict`ionaries
- `Set`s

When converting (coercing) between types, make sure the value can actually be represented in the target type.
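
As a minimal sketch of what type conversion looks like in practice, the snippet below uses only built-in functions. The values are made up for illustration.

```julia
# Constructors convert between numeric types when the value fits exactly
println(Float64(3))          # 3.0
println(Int64(3.0))          # 3
# Int64(3.7) would throw an InexactError, so round explicitly instead
println(round(Int64, 3.7))   # 4

# Strings are parsed, not converted
println(parse(Int64, "42"))      # 42
println(parse(Float64, "1.78"))  # 1.78

# tryparse returns `nothing` instead of throwing when parsing fails
println(tryparse(Int64, "forty-two") === nothing)  # true

# Most values can be turned into a string
println(string(42) * "!")        # 42!
```

`tryparse` is handy when a column may contain values that are not numbers.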

## Data Types in Julia

Arrays: Vectors. Similar to Python lists

In [48]:
# Numeric vector
x = [30, 21, 11]
println(x)
x

[30, 21, 11]


3-element Vector{Int64}:
 30
 21
 11

In [49]:
println(typeof(x))

Vector{Int64}


In [50]:
println(eltype(x))

Int64


## Data Types in Julia

Like Python (and different from R), arrays do not need to hold the same data types:

In [51]:
x = ["John", "coffee", 2, true]
println(x)

Any["John", "coffee", 2, true]


In [52]:
println(typeof(x))

Vector{Any}


## Data Types in Julia

We can slice the data in a similar way we do in R or Python.

In [54]:
x[1]

"John"

In [55]:
x[1] * "'s " * x[2]

"John's coffee"

In [56]:
println(x[end-1:end])

Any[2, true]


## Data Types in Julia

The equivalent of Python's `.append` in Julia is `push!`.

The `append!` command adds more than one element at once.

In [60]:
x = [30, 21, 11]
push!(x, 31)
println(x)

[30, 21, 11, 31]


In [61]:
x = [30, 21, 11]
append!(x, [31, 45, 3])
println(x)

[30, 21, 11, 31, 45, 3]


## Data Types in Julia

We also have `pop!` and `sort`. Note that `sort` returns a sorted copy (like Python's `sorted`), while `sort!` sorts the array in place.

In [62]:
x = [30, 21, 11]
println("Popped element $(pop!(x))")
println(x)

Popped element 11
[30, 21]


In [65]:
x = [30, 21, 11]
x = sort(x)
println(x)

[11, 21, 30]


In [66]:
println(length(x))

3
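
The difference between a function that returns a new array and one that modifies its argument can be sketched as follows. By convention, Julia functions whose names end in `!` modify their arguments; the vector here is made up for illustration.

```julia
x = [21, 30, 11]

y = sort(x)              # returns a new sorted vector
println(x)               # [21, 30, 11]  (unchanged)
println(y)               # [11, 21, 30]

sort!(x, rev = true)     # sorts x itself, here in descending order
println(x)               # [30, 21, 11]
```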


## Data Types in Julia

As in R, we do not need a special package for vectorized operations. In Julia, we add a dot (`.`) before the operator to apply it element by element.

Operations between a vector and a scalar:

In [67]:
x = [30, 21, 11]
println(x .+ 5)

[35, 26, 16]


In [69]:
println(x .- 5)

[25, 16, 6]


In [70]:
println(x .* 1.2)

[36.0, 25.2, 13.2]


In [71]:
println(x ./ 2.1)

[14.285714285714285, 10.0, 5.238095238095238]


In [72]:
println(x .^ 2)

[900, 441, 121]


## Data Types in Julia

Vectorized operations:

In [74]:
x = [30, 21, 11]
y = [2, 3, 1]
println(x .+ y)

[32, 24, 12]


In [75]:
println(x .- y)

[28, 18, 10]


In [76]:
println(x .* y)

[60, 63, 11]


In [77]:
println(x ./ y)

[15.0, 7.0, 11.0]


In [78]:
println(x .^ y)

[900, 9261, 11]
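
The dot syntax is not limited to arithmetic operators. As a small sketch, the snippet below broadcasts a built-in function, a user-defined function (`ratio` is a hypothetical name), and a whole expression with the `@.` macro.

```julia
x = [30, 21, 11]
y = [2, 3, 1]

# Any function can be broadcast by adding a dot before the parenthesis
println(sqrt.([4, 9, 16]))        # [2.0, 3.0, 4.0]

# A user-defined function works the same way
ratio(a, b) = a / b
println(ratio.(x, y))             # [15.0, 7.0, 11.0]

# The @. macro adds the dots to every operation in the expression
z = @. (x - y) * 2
println(z)                        # [56, 36, 20]
```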


## Arrays in Julia

We can also create empty arrays:

In [80]:
# Empty integers array
arr = Int64[]
println(arr)

Int64[]


In [81]:
append!(arr, [1, 2, 3])
println(arr)

[1, 2, 3]


In [82]:
arr[end] = 32
println(arr)

[1, 2, 32]


In [83]:
arr[1:end] = [-1, -2, -3]
println(arr)

[-1, -2, -3]


## Dictionaries in Julia

In [84]:
# Dictionaries (note the structure: key => value)
game = Dict("team" => "Padres", "score" => 5)
println(game["team"])

Padres


In [85]:
println(keys(game))

["score", "team"]


In [86]:
println(values(game))

Any[5, "Padres"]


## Dictionaries in Julia

In [87]:
# get values: (dict, key, default_val)
println(get(game, "team", "does not have this key."))

Padres


In [88]:
# get values: (dict, key, default_val)
println(get(game, "otherteam", "does not have this key."))

does not have this key.
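
Dictionaries can also be changed after they are created. The sketch below shows a few common operations from Base; the `"opponent"` entry is a made-up value for illustration only.

```julia
game = Dict("team" => "Padres", "score" => 5)

# Add or overwrite an entry
game["opponent"] = "Dodgers"   # hypothetical value, for illustration only
game["score"] = 6

# Check for a key before using it
println(haskey(game, "opponent"))   # true
println(haskey(game, "inning"))     # false

# Loop over key-value pairs (Dict order is not guaranteed)
for (k, v) in game
    println("$k => $v")
end

# Remove an entry; delete! modifies the dictionary in place
delete!(game, "opponent")
println(length(game))               # 2
```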


## DataFrames in Julia

Now let us work with `DataFrames`.

In [91]:
# .head()
println(first(educ, 3))

3×5 DataFrame
 Row │ education  income  young    urban  states
     │ Int64      Int64   Float64  Int64  String3
─────┼────────────────────────────────────────────
   1 │       189    2824    350.7    508  ME
   2 │       169    3259    345.9    564  NH
   3 │       230    3072    348.5    322  VT


In [92]:
# .tail()
println(last(educ, 3))

3×5 DataFrame
 Row │ education  income  young    urban  states
     │ Int64      Int64   Float64  Int64  String3
─────┼────────────────────────────────────────────
   1 │       273    3968    348.4    909  CA
   2 │       372    4146    439.7    484  AK
   3 │       212    3513    382.9    831  HI


## DataFrames in Julia

In [93]:
# Data Frames
println(typeof(educ))

DataFrame


And `describe` tells us about the DataFrame:

In [94]:
println(describe(educ))

5×7 DataFrame
 Row │ variable   mean     min    median  max    nmissing  eltype
     │ Symbol     Union…   Any    Union…  Any    Int64     DataType
─────┼──────────────────────────────────────────────────────────────
   1 │ education  196.314  112    192.0   372           0  Int64
   2 │ income     3225.29  2081   3257.0  4425          0  Int64
   3 │ young      358.886  326.2  354.1   439.7         0  Float64
   4 │ urban      664.51   322    664.0   1000          0  Int64
   5 │ states              AK             WY            0  String3


## DataFrames in Julia

To access a variable in a dataset, use the `.` and the name of the variable.

In [95]:
println(educ.income)

[2824, 3259, 3072, 3835, 3549, 4256, 4151, 3954, 3419, 3509, 3412, 3981, 3675, 3363, 3341, 3265, 3257, 2730, 2876, 3239, 3303, 3795, 3742, 4425, 3068, 2470, 2664, 2380, 2781, 3191, 2645, 2579, 2337, 2081, 2322, 2634, 2880, 3029, 2942, 2668, 3190, 3340, 2651, 3027, 2790, 3957, 3688, 3317, 3968, 4146, 3513]


Names of the variables:

In [96]:
println(names(educ))

["education", "income", "young", "urban", "states"]


## DataFrames in Julia

Stats of a single variable:

In [97]:
using Statistics

In [99]:
println(mean(educ.income))

3225.294117647059


In [100]:
println(median(educ.income))

3257.0


In [101]:
println(std(educ.income))

560.0259741875424


## DataFrames in Julia

Stats of a single variable:

In [102]:
println(var(educ.income))

313629.0917647059


In [103]:
println(sum(educ.income))

164490


In [104]:
println(minimum(educ.income))

2081


In [105]:
println(maximum(educ.income))

4425
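
The same functions work on any numeric vector, not only on DataFrame columns. As a small sketch with made-up numbers (note that `std` and `var` use the sample formulas, dividing by n - 1):

```julia
using Statistics

v = [2, 4, 4, 4, 5, 5, 7, 9]    # made-up numbers

println(mean(v))                 # 5.0
println(median(v))               # 4.5
println(var(v) ≈ std(v)^2)       # true: the variance is the squared std
println(quantile(v, 0.5))        # 4.5 (same as the median)
println(extrema(v))              # (2, 9)
```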


## DataFrames in Julia

`size` for the dimension of the dataset:

In [106]:
println(size(educ))

(51, 5)


Finding a given value: `dat[rows, cols]`

In [107]:
# Maine
println(educ[1, :])

DataFrameRow
 Row │ education  income  young    urban  states
     │ Int64      Int64   Float64  Int64  String3
─────┼────────────────────────────────────────────
   1 │       189    2824    350.7    508  ME


## DataFrames in Julia

In [108]:
# education expenditure
println(educ[:, 1])

[189, 169, 230, 168, 180, 193, 261, 214, 201, 172, 194, 189, 233, 209, 262, 234, 177, 177, 187, 148, 196, 248, 247, 246, 180, 149, 155, 149, 156, 191, 140, 137, 112, 130, 134, 162, 135, 155, 238, 170, 238, 192, 227, 207, 201, 225, 215, 233, 273, 372, 212]


In [109]:
# Income of the first three rows (not the three highest incomes)
println(educ[1:3, "income"])

[2824, 3259, 3072]


In [110]:
# Third state in the dataset
println(educ.states[3])

VT


In [111]:
# education expenditure of the first case (Maine)
println(educ[1, 1])

189


## Importing Python Functions

In [112]:
using PythonCall
pytime = pyimport("time")

Python: <module 'time' (built-in)>

In [113]:
println(pytime.ctime())

Fri Aug 18 14:41:29 2023


## Importing R Functions

In [114]:
using RCall
@rimport base as r_base

In [115]:
r_base.sum([1, 2, 3, 4, 5])

RObject{IntSxp}
[1] 15


# Data Wrangling in Julia

## Renaming a column in Julia

In [118]:
aux = chilesurv
rename!(aux, Dict(:voteYES => :votebinary))
println(first(aux, 2))

LoadError: ArgumentError: Tried renaming :voteYES to :votebinary, when :voteYES does not exist in the data frame.
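
One possible cause of the error above, assuming the cell was run more than once, is that `aux = chilesurv` does not copy the data: both names refer to the same DataFrame, so `rename!` also changes `chilesurv`, and a second run no longer finds the old column name. The sketch below illustrates the same behavior with a plain vector (the names `a`, `b`, and `c` are made up for this example); for a DataFrame, `copy(chilesurv)` would play the role of `copy(a)`.

```julia
a = [1, 2, 3]

b = a            # b is another name for the same vector
push!(b, 4)      # mutating b also changes a
println(a)       # [1, 2, 3, 4]
println(a === b) # true: same object

c = copy(a)      # c is an independent copy
push!(c, 5)
println(a)       # [1, 2, 3, 4]  (unchanged)
println(c)       # [1, 2, 3, 4, 5]
```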

## Filtering Rows in Julia

Filtering rows:

| **Operator** | **Meaning**                          |
|----------|------------------------------------------|
| `<` or `<=` | Less than; less than or equal to |
| `>` or `>=` | Greater than; greater than or equal to |
| `==`       | Equal to                               |
| `!=`       | Not equal to                           |
| `!`        | Negation (turns a `true` into a `false`, and vice versa) |
| `\|\|`        | Or                                     |
| `&&`        | And                                    |

In [122]:
aux = filter(row -> row.birth >= 1980, voting)
println(first(aux, 3))

3×3 DataFrame
 Row │ birth  message  voted
     │ Int64  String3  Int64
─────┼───────────────────────
   1 │  1981  no           0
   2 │  1983  no           0
   3 │  1985  yes          1


## Filtering Rows in Julia

In [123]:
aux = filter(row -> row.birth >= 1975 && row.message == "yes", voting)
println(first(aux, 3))

3×3 DataFrame
 Row │ birth  message  voted
     │ Int64  String3  Int64
─────┼───────────────────────
   1 │  1985  yes          1
   2 │  1984  yes          0
   3 │  1979  yes          0


In [124]:
println(size(voting))

(229444, 3)


In [125]:
println(size(aux))

(4472, 3)
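
The same filtering logic can be tried on plain vectors, without a DataFrame. This is a minimal sketch with made-up data; it also shows the broadcast versions of the operators (`.>=`, `.==`, `.&`), which compare element by element and are an alternative to the row-wise `filter` used above.

```julia
births = [1972, 1981, 1979, 1985, 1990]      # made-up birth years
message = ["yes", "no", "yes", "yes", "no"]  # made-up treatment indicator

# filter keeps the elements for which the function returns true
println(filter(b -> b >= 1980, births))      # [1981, 1985, 1990]

# Combining conditions on two vectors with broadcasting
keep = (births .>= 1975) .& (message .== "yes")
println(births[keep])                        # [1979, 1985]
```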


## Sorting datasets in Julia

In [126]:
aux = sort(educ, "education")
println(first(aux, 3))

3×5 DataFrame
 Row │ education  income  young    urban  states
     │ Int64      Int64   Float64  Int64  String3
─────┼────────────────────────────────────────────
   1 │       112    2337    362.2    584  AL
   2 │       130    2081    385.2    445  MS
   3 │       134    2322    351.9    500  AR


In [128]:
aux = sort(educ, "education", rev = true)
println(first(aux, 3))

3×5 DataFrame
 Row │ education  income  young    urban  states
     │ Int64      Int64   Float64  Int64  String3
─────┼────────────────────────────────────────────
   1 │       372    4146    439.7    484  AK
   2 │       273    3968    348.4    909  CA
   3 │       262    3341    365.4    664  MN


## Drop missing data in Julia

In [129]:
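aux = chilesurv # aux and chilesurv refer to the same DataFrame
dropmissing!(aux, :statusquo) # drops, in place, the rows where statusquo is missing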
println(nrow(aux))

1703


## Mutating Variables in Julia

Mutating the dataset (creates new variables using old ones):

In [135]:
educ.income_per_urban = educ.income ./ educ.urban
println(first(educ, 6))

6×6 DataFrame
 Row │ education  income  young    urban  states   income_per_urban
     │ Int64      Int64   Float64  Int64  String3  Float64
─────┼──────────────────────────────────────────────────────────────
   1 │       189    2824    350.7    508  ME                5.55906
   2 │       169    3259    345.9    564  NH                5.77837
   3 │       230    3072    348.5    322  VT                9.54037
   4 │       168    3835    335.3    846  MA                4.5331
   5 │       180    3549    327.1    871  RI                4.07463
   6 │       193    4256    341.0    774  CT                5.49871


## Linear Regression in Julia

A simple regression model in Julia

In [136]:
using GLM
ols = lm(@formula(voted ~ message), voting)

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

voted ~ 1 + message

Coefficients:
───────────────────────────────────────────────────────────────────────────
                  Coef.  Std. Error       t  Pr(>|t|)  Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────────────────
(Intercept)   0.296638   0.00105548  281.05    <1e-99    0.29457  0.298707
message: yes  0.0813099  0.00258672   31.43    <1e-99    0.07624  0.0863798
───────────────────────────────────────────────────────────────────────────
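
To make the coefficients less of a black box, here is a minimal sketch of ordinary least squares computed by hand on made-up data. It is not how `GLM` works internally (the output above mentions a pivoted Cholesky factorization); it only illustrates what is being estimated. The names `X`, `β`, and `slope` are chosen for this example.

```julia
using Statistics

# Made-up data that follows y = 1 + 2x exactly
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = 1 .+ 2 .* x

# Design matrix with a column of ones for the intercept
X = [ones(length(x)) x]

# Least-squares solution: `\` solves min ||Xβ - y|| for a tall matrix
β = X \ y
println(round.(β, digits = 6))   # intercept and slope, close to 1 and 2

# For a single predictor, the slope is also cov(x, y) / var(x)
slope = cov(x, y) / var(x)
println(slope)
```

With real data, `lm` additionally reports standard errors, t statistics, and confidence intervals, as in the table above.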

## Julia Programming

Julia is a full general-purpose programming language, just as Python is.

We can build great code in Julia.

## Conditional Execution

In [137]:
monday = true

# Conditional fork
if monday
    # Code for condition
    println("Yay, first day of the week! Hope Hurricane Hilary was not that destructive.")
end

Yay, first day of the week! Hope Hurricane Hilary was not that destructive.


In [138]:
monday = false

# Conditional fork
if monday
    # Code for condition
    println("Yay, first day of the week! Hope Hurricane Hilary was not that destructive.")
end

## Conditional Execution

In [139]:
monday = true

# Conditional fork
if monday
    # Code for condition true
    println("Yay, first day of the week! Hope Hurricane Hilary was not that destructive.")
else
    # Code for condition false
    println("Well, not that cool...")
end

Yay, first day of the week! Hope Hurricane Hilary was not that destructive.


In [140]:
monday = false

# Conditional fork
if monday
    # Code for condition true
    println("Yay, first day of the week! Hope Hurricane Hilary was not that destructive.")
else
    # Code for condition false
    println("Well, not that cool...")
end

Well, not that cool...


## Conditional Execution

In [142]:
day = "Wed"

# Conditional fork
if day == "Mon"
    # Code for condition true
    println("Yay, first day of the week! Hope Hurricane Hilary was not that destructive.")
elseif day == "Wed"
    # Code for wed = true
    println("Padres day! Let's go!!")
else
    # Code for condition false
    println("Well, not that cool...")
end

Padres day! Let's go!!
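
For short conditions, Julia also offers two compact forms that are not covered in the cells above: the ternary operator and short-circuit evaluation. A minimal sketch (the messages are made up):

```julia
day = "Wed"

# Ternary operator: condition ? value_if_true : value_if_false
msg = day == "Wed" ? "Padres day!" : "No game today"
println(msg)                               # Padres day!

# Short-circuit evaluation: the right side runs only when needed
day == "Wed" && println("Let's go!!")      # prints, since the condition is true
day == "Mon" || println("Not Monday")      # prints, since the condition is false
```

These forms are best kept for one-liners; a full `if` block is usually easier to read when there is more than one statement.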


## Function Creation

Public functions:

In [143]:
# My BMI depression function
function bmi(weight, height, imperial = false)
    return (weight)/(height^2)
end

# BMI
println(bmi(115, 1.78))

36.295922232041406


**Check-in**: Adapt the code to perform an imperial-to-metric conversion when `imperial` is true.

In [155]:
# Your code here
function bmi(weight, height ; imperial = false)
    if imperial
        weight *= 0.453592
        height *= 0.0254
    end
    return (weight)/(height^2)
end
println(bmi(250, 70, imperial = true))

35.87086766010266
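
Note the semicolon in the answer above: arguments listed before `;` are positional, while arguments listed after it are keyword arguments and must be passed by name. A minimal sketch with a hypothetical function `rescale`:

```julia
# `scale` is positional with a default; `digits` is keyword-only (after the `;`)
function rescale(x, scale = 1.0; digits = 2)
    return round(x * scale, digits = digits)
end

println(rescale(3.14159))                   # 3.14
println(rescale(3.14159, 2.0))              # 6.28
println(rescale(3.14159, 2.0, digits = 1))  # 6.3
# rescale(3.14159, 2.0, 1) would raise a MethodError: digits must be named
```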


## Function Creation

Type declaration:

In [183]:
function bmi(weight::Float64, height::Float64, imperial::Bool = false)
    return (weight)/(height^2)
end

# Strings cannot be converted with Float64(); parse them instead
function bmi(weight::String, height::Float64, imperial::Bool = false)
    return (parse(Float64, weight))/(height^2)
end

# BMI (Float64 arguments select the first method)
println(bmi(115.0, 1.78))

36.295922232041406


In [184]:
# BMI (a String weight selects the second method)
println(bmi("115", 1.78))

36.295922232041406


## Function Creation

A variable number of arguments (varargs):

In [154]:
function people(names...)
    println(names)
end

people("John", "Julia", "Jane")

("John", "Julia", "Jane")


In [157]:
function people(name, schools... ; address)
    # `schools` collects the extra positional arguments into a tuple
    return name, schools, address
end

# Using it
print(people("John", "BA PoliSci", "MS CSS", address = "La Jolla"))

("John", ("BA PoliSci", "MS CSS"), "La Jolla")

## Function Creation

Anonymous functions:

In [158]:
# My BMI as an anonymous function
mybmi = (w, h) -> w / (h^2)
mybmi(115, 1.78)

36.295922232041406

In [159]:
# With mapping
mybmi = (w, h) -> w / (h^2)
println(map(mybmi, [100, 115, 70], [1.78, 1.86, 1.67]))

[31.561671506122963, 33.240837090993175, 25.099501595611173]


## Function Creation

Timing functions:

In [160]:
# Timing bmi
@time bmi(115, 1.78)

  0.000000 seconds


36.295922232041406

In [161]:
# Benchmarking bmi
# Note: `@benchmark bmi` evaluates the name `bmi` without calling the function.
# To time an actual call, pass arguments, e.g. `@benchmark bmi(115, 1.78)`.
using BenchmarkTools
@benchmark bmi

BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  0.791 ns … 13.667 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     0.875 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   0.898 ns ±  0.398 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

## For loops

Very similar to Python:

In [162]:
nums = [1, 2, 3, 4, 5]
for x in nums
    xsq = x^2
    println("$x squared is equal to $xsq")
end

1 squared is equal to 1
2 squared is equal to 4
3 squared is equal to 9
4 squared is equal to 16
5 squared is equal to 25


## For loops

With `enumerate`:

In [163]:
nums = [5, 4, 3, 2, 1]
for (index, x) in enumerate(nums)
    xsq = x^2
    println("In position $index: $x squared is equal to $xsq")
end

In position 1: 5 squared is equal to 25
In position 2: 4 squared is equal to 16
In position 3: 3 squared is equal to 9
In position 4: 2 squared is equal to 4
In position 5: 1 squared is equal to 1


## For loops

With ranges (note the differences from Python and R ranges: the step goes in the middle, and the stop value is included when the step lands on it):

In [164]:
nums = 10:-2:1 # start:step:stop
for x in nums
    xsq = x^2
    println("$x squared is equal to $xsq")
end

10 squared is equal to 100
8 squared is equal to 64
6 squared is equal to 36
4 squared is equal to 16
2 squared is equal to 4


## While loops

In [166]:
nums = [5, 4, 3, 2, 1]
while length(nums) > 0
    x = pop!(nums)
    xsq = x^2
    println("$x squared is equal to $xsq")
end

1 squared is equal to 1
2 squared is equal to 4
3 squared is equal to 9
4 squared is equal to 16
5 squared is equal to 25


**Check-in**: Explain this code in words.

Hint: as in Python, if you end up in an infinite loop, press `CTRL + C` (in Jupyter, interrupt the kernel).

## ANES Dataset

Exercise:

1. Load the `anes` 2020 dataset: https://raw.githubusercontent.com/umbertomig/POLI175public/main/data/anes2020.csv

1. Explore the dataset.

1. What is the mean of `feel_biden`?

1. What is the standard deviation of `feel_rural`?

1. What is the median of `feel_gay`?

1. Standardize the variables (the operation: $z = \dfrac{x - \text{mean}(x)}{\text{sd}(x)}$):
    - `feel_biden`
    - `feel_fauci`
    - `feel_nra`
    - `feel_unions`
    - `feel_fbi`

1. Run a regression trying to predict `feel_biden` based on `feel_fauci`, `feel_nra`, `feel_unions`, and `feel_fbi`. Which variable is the most important? (Hint: try using the standardized variables in the regression!)

In [169]:
## Your answers here
⚅ = CSV.read(download("https://raw.githubusercontent.com/umbertomig/POLI175public/main/data/anes2020.csv"), DataFrame)

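| Row | votebiden | feel_biden | feel_trump | feel_fauci | feel_feminists | feel_unions | feel_bigbusiness | feel_gay | feel_muslim | feel_christian | feel_jewish | feel_police | feel_scientists | feel_nra | feel_fbi | feel_rural | feel_cdc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 |
| 1 | 1 | 85 | 0 | 100 | 75 | 75 | 0 | 100 | 100 | 30 | 100 | 40 | 100 | 0 | 50 | 75 | 80 |
| 2 | 0 | 0 | 90 | 0 | 30 | 50 | 0 | 50 | 10 | 50 | 90 | 70 | 60 | 70 | 20 | 90 | 10 |
| 3 | 1 | 100 | 0 | 85 | 60 | 50 | 40 | 70 | 50 | 85 | 100 | 70 | 85 | 15 | 70 | 85 | 70 |
| 4 | 0 | 0 | 70 | 0 | 60 | 50 | 50 | 60 | 60 | 60 | 60 | 60 | 85 | 85 | 79 | 70 | 85 |
| 5 | 1 | 70 | 30 | 50 | 50 | 50 | 15 | 50 | 50 | 50 | 50 | 70 | 50 | 50 | 50 | 50 | 70 |
| 6 | 1 | 60 | 0 | 80 | 50 | 40 | 40 | 50 | 50 | 50 | 50 | 70 | 80 | 30 | 70 | 70 | 50 |
| 7 | 1 | 85 | 15 | 100 | 60 | 40 | 50 | 70 | 50 | 70 | 60 | 70 | 85 | 0 | 85 | 40 | 70 |
| 8 | 1 | 85 | 0 | 90 | 85 | 70 | 60 | 85 | 90 | 100 | 100 | 85 | 100 | 15 | 50 | 85 | 95 |
| 9 | 1 | 70 | 0 | 85 | 50 | 75 | 15 | 50 | 55 | 50 | 50 | 15 | 80 | 0 | 25 | 50 | 80 |
| 10 | 1 | 85 | 0 | 75 | 80 | 85 | 45 | 60 | 65 | 65 | 60 | 65 | 75 | 45 | 50 | 60 | 60 |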


In [171]:
# Head of the dataset
👤 = first(⚅, 3)
👤

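| Row | votebiden | feel_biden | feel_trump | feel_fauci | feel_feminists | feel_unions | feel_bigbusiness | feel_gay | feel_muslim | feel_christian | feel_jewish | feel_police | feel_scientists | feel_nra | feel_fbi | feel_rural | feel_cdc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 |
| 1 | 1 | 85 | 0 | 100 | 75 | 75 | 0 | 100 | 100 | 30 | 100 | 40 | 100 | 0 | 50 | 75 | 80 |
| 2 | 0 | 0 | 90 | 0 | 30 | 50 | 0 | 50 | 10 | 50 | 90 | 70 | 60 | 70 | 20 | 90 | 10 |
| 3 | 1 | 100 | 0 | 85 | 60 | 50 | 40 | 70 | 50 | 85 | 100 | 70 | 85 | 15 | 70 | 85 | 70 |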


In [172]:
# Tail of the dataset
🐊 = last(⚅, 3)
🐊

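| Row | votebiden | feel_biden | feel_trump | feel_fauci | feel_feminists | feel_unions | feel_bigbusiness | feel_gay | feel_muslim | feel_christian | feel_jewish | feel_police | feel_scientists | feel_nra | feel_fbi | feel_rural | feel_cdc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 |
| 1 | 0 | 40 | 70 | 85 | 100 | 50 | 50 | 60 | 85 | 100 | 60 | 100 | 70 | 50 | 50 | 100 | 85 |
| 2 | 1 | 60 | 30 | 70 | 40 | 40 | 50 | 50 | 40 | 40 | 50 | 60 | 70 | 30 | 70 | 40 | 70 |
| 3 | 1 | 70 | 0 | 85 | 85 | 85 | 30 | 85 | 70 | 50 | 50 | 50 | 85 | 0 | 60 | 60 | 70 |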


In [173]:
# Description of the dataset
🔎 = describe(⚅)
🔎

| Row | variable | mean | min | median | max | nmissing | eltype |
|---|---|---|---|---|---|---|---|
|  | Symbol | Float64 | Int64 | Float64 | Int64 | Int64 | DataType |
| 1 | votebiden | 0.5744 | 0 | 1.0 | 1 | 0 | Int64 |
| 2 | feel_biden | 54.3075 | 0 | 68.0 | 100 | 0 | Int64 |
| 3 | feel_trump | 37.8767 | 0 | 15.0 | 100 | 0 | Int64 |
| 4 | feel_fauci | 70.0888 | 0 | 85.0 | 100 | 0 | Int64 |
| 5 | feel_feminists | 59.7539 | 0 | 60.0 | 100 | 0 | Int64 |
| 6 | feel_unions | 57.7181 | 0 | 60.0 | 100 | 0 | Int64 |
| 7 | feel_bigbusiness | 47.164 | 0 | 50.0 | 100 | 0 | Int64 |
| 8 | feel_gay | 66.7663 | 0 | 70.0 | 100 | 0 | Int64 |
| 9 | feel_muslim | 59.5309 | 0 | 50.0 | 100 | 0 | Int64 |
| 10 | feel_christian | 71.9796 | 0 | 75.0 | 100 | 0 | Int64 |


In [175]:
# What is the mean of feel_biden?
ȳ = mean(⚅.feel_biden)
ȳ

54.30746140651801

In [176]:
#What is the standard deviation of feel_rural?
σ = std(⚅.feel_rural)
σ

22.608122809331604

In [177]:
#What is the median of feel_gay?
😐 = median(⚅.feel_gay)
😐

70.0

In [178]:
😐 = x -> (x .- mean(x)) ./ std(x)
#Standardize the variables
⚅.feel_biden_😐 = 😐(⚅.feel_biden)
#feel_fauci
⚅.feel_fauci_😐 = 😐(⚅.feel_fauci)
#feel_nra
⚅.feel_nra_😐 = 😐(⚅.feel_nra)
#feel_unions
⚅.feel_unions_😐 = 😐(⚅.feel_unions)
#feel_fbi
⚅.feel_fbi_😐 = 😐(⚅.feel_fbi)

4664-element Vector{Float64}:
 -0.5785274980027265
 -1.886411675611165
  0.293395287069566
  0.6857605403520977
 -0.5785274980027265
  0.293395287069566
  0.9473373758737853
 -0.5785274980027265
 -1.668430979343092
 -0.5785274980027265
  1.6012794646780046
 -0.5785274980027265
  1.1653180721418586
  ⋮
 -1.0144888905388727
  0.293395287069566
  0.9473373758737853
  1.3832987684099316
  0.9473373758737853
  0.9473373758737853
  0.5113759833376391
 -1.0144888905388727
 -0.5785274980027265
 -0.5785274980027265
  0.293395287069566
 -0.14256610546658022
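
A quick sanity check for any standardized variable is that its mean is (numerically) zero and its standard deviation is one. The sketch below applies the same formula to the first six `feel_biden` values shown earlier; `zscore` is a hypothetical helper name, not a function from the packages loaded above.

```julia
using Statistics

# Same standardization as above, written as a named function
zscore(x) = (x .- mean(x)) ./ std(x)    # hypothetical helper name

scores = [85, 0, 100, 0, 70, 60]        # first six feel_biden values shown above
z = zscore(scores)

println(isapprox(mean(z), 0, atol = 1e-12))  # true
println(std(z) ≈ 1)                          # true
```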

In [179]:
first(⚅, 3)

| Row | votebiden | feel_biden | feel_trump | feel_fauci | feel_feminists | feel_unions | feel_bigbusiness | feel_gay | feel_muslim | feel_christian | feel_jewish | feel_police | feel_scientists | feel_nra | feel_fbi | feel_rural | feel_cdc | feel_biden_😐 | feel_fauci_😐 | feel_nra_😐 | feel_unions_😐 | feel_fbi_😐 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 1 | 85 | 0 | 100 | 75 | 75 | 0 | 100 | 100 | 30 | 100 | 40 | 100 | 0 | 50 | 75 | 80 | 0.836176 | 0.975665 | -1.30136 | 0.71371 | -0.578527 |
| 2 | 0 | 0 | 90 | 0 | 30 | 50 | 0 | 50 | 10 | 50 | 90 | 70 | 60 | 70 | 20 | 90 | 10 | -1.47953 | -2.2862 | 0.683801 | -0.31874 | -1.88641 |
| 3 | 1 | 100 | 0 | 85 | 60 | 50 | 40 | 70 | 50 | 85 | 100 | 70 | 85 | 15 | 70 | 85 | 70 | 1.24483 | 0.486385 | -0.875969 | -0.31874 | 0.293395 |


In [180]:
#Run a regression trying to predict feel_biden based on feel_fauci, feel_nra, feel_unions, and feel_fbi. Which variable is the most important? (Hint: try using the standardized variables in the regression!)
ols = lm(@formula(feel_biden_😐 ~ feel_fauci_😐 + feel_nra_😐+ feel_unions_😐+ feel_fbi_😐), ⚅)

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

feel_biden_😐 ~ 1 + feel_fauci_😐 + feel_nra_😐 + feel_unions_😐 + feel_fbi_😐

Coefficients:
────────────────────────────────────────────────────────────────────────────────
                     Coef.  Std. Error       t  Pr(>|t|)   Lower 95%   Upper 95%
────────────────────────────────────────────────────────────────────────────────
(Intercept)      1.706e-16  0.00827119    0.00    1.0000  -0.0162154   0.0162154
feel_fauci_😐    0.40902    0.0117634    34.77    <1e-99   0.385958    0.432082
feel_nra_😐     -0.412146   0.0106579   -38.67    <1e-99  -0.43304    -0.391251
feel_unions_😐   0.10904    0.00923968   11.80    <1e-30   0.0909255   0.127154
feel_fbi_😐      0.0981019  0.00932513   10.52    <1e-24   0.0798202   0.116384
──────────────────────────────────────────────────────────────────────────────

# Great work! See you in the next class.