# CSS 201 / 202 - CSS Bootcamp

## Week 07 - Lecture 04

### Umberto Mignozzetti

# Introduction to Julia

## Intro to Julia

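Julia is a high-level, high-performance language that is well suited to data science.

It covers much of what R and Python offer, and well-written Julia code often runs considerably faster.

The speed comes mainly from just-in-time (JIT) compilation: Julia compiles each function to native machine code for the argument types it receives. Julia also supports multithreading, but extra threads must be requested explicitly (for example, with the `--threads` flag).

It can also call code written in other major languages (Java, C, R, and Python included). C is supported natively, and others are reached through packages such as `RCall` and `PythonCall`.

It has been used at CERN to analyze data from LHC experiments and at BlackRock to run financial asset simulations.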

## Intro to Julia

Today, we will talk about:

1. Arithmetic
1. Install and load packages
1. Loading data
1. Data types

Tomorrow, we will finish with:

1. Plotting
1. Data wrangling
1. Merging data
1. Simple regression

## Intro to Julia

To install Julia, you need to download it from the internet: https://julialang.org

Then, you can choose to work with Julia in:

1. [Visual Studio Code](https://code.visualstudio.com/docs/languages/julia) (a great graphical editor)
2. [Jupyter Notebooks / Jupyter Lab](https://www.geeksforgeeks.org/how-to-work-with-julia-on-jupyter-notebook/) 

We will use Jupyter Notebooks, since it is our editor of choice. If you are working locally, the link above has instructions on how to set it up.

## Print and Comment

In Julia, the print command is `println`, which prints its arguments followed by a newline.

`#` starts a single-line comment.

`#= ... =#` encloses a multiline comment.

In [1]:
# This is 2
println(2)

2


In [2]:
#=
This is 2 + 3
Equals to 5
=#
println(2 + 3)

5


## Arithmetic in Julia

In [3]:
# Subtraction
println(2 - 3.2)

-1.2000000000000002


In [4]:
# Multiplication
println(2 * 3)

6


In [5]:
# Division
println(2 / 3)

0.6666666666666666


In [6]:
# Power (note: Julia uses ^, whereas Python uses **)
println(2 ^ 3)

8
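
Beyond the operators above, Julia distinguishes floating-point division from integer division. The following is a small sketch with arbitrary values, using `div` and `%` from base Julia:

```julia
# Floating-point division always returns a Float64
println(7 / 2)        # 3.5

# Integer division truncates toward zero; div(7, 2) can also be written 7 ÷ 2
q = div(7, 2)
println(q)            # 3

# Remainder after integer division
r = 7 % 2
println(r)            # 1
```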


## Intro to Julia

### Install packages

To get started with Julia, we need to install packages. We will install packages called:

- `Statistics`
- `DataFrames`
- `CSV`
- `Plots`
- `PythonCall`
- `RCall`

For more Julia packages, see [JuliaHub](https://juliahub.com/ui/Packages).

In [7]:
# Installing packages (takes a bit to do)
using Pkg
Pkg.add("Statistics")
Pkg.add("CSV")
Pkg.add("DataFrames")
Pkg.add("Plots")
Pkg.add("PythonCall")
Pkg.add("RCall")

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.9/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.juli

## Loading Data in Julia

Let us load the education dataset.

We use the function `CSV.read`, passing `DataFrame` as the second argument so that the file is loaded as a data frame:

In [8]:
# Loading packages to read the data
using CSV
using DataFrames

## Education Expenditure Dataset
educ = CSV.read(download("https://raw.githubusercontent.com/umbertomig/POLI175public/main/data/educexp.csv"), DataFrame)
println("Done") # Trick to avoid displaying it
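
The same `CSV.read` call also accepts an in-memory buffer, which can be handy for trying things out without a network connection. The sketch below assumes the `CSV` and `DataFrames` packages are installed; the text is just the first three rows of the education dataset typed by hand, and the names `csv_text` and `small` are made up for this example.

```julia
using CSV
using DataFrames

# A tiny CSV file held in a string (first three rows of the dataset)
csv_text = """
education,income,young,urban,states
189,2824,350.7,508,ME
169,3259,345.9,564,NH
230,3072,348.5,322,VT
"""

# IOBuffer wraps the string so CSV.read can treat it like a file
small = CSV.read(IOBuffer(csv_text), DataFrame)

println(size(small))    # (3, 5)
println(names(small))   # column names as strings
```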

In [10]:
# The head of the dataset
println(first(educ, 3))

3×5 DataFrame
 Row │ education  income  young    urban  states
     │ Int64      Int64   Float64  Int64  String3
─────┼────────────────────────────────────────────
   1 │       189    2824    350.7    508  ME
   2 │       169    3259    345.9    564  NH
   3 │       230    3072    348.5    322  VT


## Data Types in Julia

Julia has roughly the same data types as Python and R, with a few differences in how they are used.

In [11]:
# Numeric variable
x = 30

# Result
println(x)

30


In [12]:
# Julia is case-sensitive
x # not the same variable as X

30

## Data Types in Julia

Variable names may contain Unicode characters, but they cannot start with a digit or contain spaces.

We can check the type of a variable with the `typeof` function.

In [13]:
x2 = 30 # Valid name; 2x is not a valid name (Julia reads it as 2 * x)

30
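
To make the naming rules concrete, here is a short sketch with made-up variable names. It also shows why a name cannot start with a digit: Julia reads a number written directly before a name as multiplication.

```julia
# Unicode letters, underscores, and subscripts are allowed in names
α = 0.05
total_income = 2824 + 3259
x₂ = 30

println(α, " ", total_income, " ", x₂)   # 0.05 6083 30

# A leading number is treated as a coefficient: 2α means 2 * α
doubled = 2α
println(doubled)                          # 0.1
```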

In [14]:
# Data type
println(typeof(x))

Int64


In [15]:
# Data type
println(typeof(x / 11))

Float64


## Data Types in Julia

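**Booleans**: `true` or `false`, written in lowercase. This differs from R (`TRUE`/`FALSE`) and Python (`True`/`False`).

**Strings**: `"string"` and not `'string'`. Single quotes are reserved for a single character, such as `'c'`, which has type `Char` rather than `String`.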

In [16]:
# true maps to 1 and false maps to 0
println(true + false)

1


In [17]:
# My string
println("my string is awesome")

my string is awesome
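
The difference between single and double quotes can be checked directly with `typeof`. A minimal sketch:

```julia
# Single quotes create a Char, double quotes create a String
println(typeof('c'))          # Char
println(typeof("c"))          # String

# A Char and a one-character String are not equal
println('c' == "c")           # false

# string() converts a Char into a String
println(string('c') == "c")   # true
```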


## Data Types in Julia

We can slice strings as we do in Python, to get either characters or substrings.

Note that indexing in Julia starts at `1`, and a range such as `1:3` includes both endpoints.

In [18]:
# My seat in the Padres game
s = "Row 23 Seat 10"
println(s[1:3] * ": " * s[5:6])
println(s[end-6:end-3] * ": " * s[end-1:end])

Row: 23
Seat: 10
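
Slicing by position works when the layout of the string is fixed. As an alternative sketch (not part of the original example), `split` breaks the string on whitespace and `parse` turns the pieces into integers; the names `parts`, `row`, and `seat` are only illustrative.

```julia
s = "Row 23 Seat 10"

# Split on whitespace: "Row", "23", "Seat", "10"
parts = split(s)

# Convert the numeric pieces from text to integers
row = parse(Int, parts[2])
seat = parse(Int, parts[4])

println("Row $row, seat $seat")   # Row 23, seat 10
```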


## Intro to Julia

### Data Types in Julia

**Multiline string literals**: delimited by triple double quotation marks.

Concatenation can be done with `*`. We can also use the `$` sign to interpolate values into a string.

In [19]:
lit = """
My literal
is awesome!"""
println(lit)

My literal
is awesome!


In [20]:
name = "John"
coffee = "Coffee for "
println(coffee * name)

Coffee for John


In [21]:
x = 30
println("x times two is $(x * 2)")

x times two is 60
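
A few other base functions are useful when building strings. The sketch below reuses the `name` value from the cell above; the remaining values are arbitrary.

```julia
name = "John"

# string() concatenates any mix of values, converting non-strings
label = string("Coffee for ", name, " x", 2)
println(label)                   # Coffee for John x2

# ^ repeats a string
println("ab" ^ 3)                # ababab

# join() glues a collection together with a separator
println(join(["a", "b", "c"], ", "))   # a, b, c
```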


## Data Types in Julia

Data types in Julia, in a nutshell (see [this tutorial](https://syl1.gitbook.io/julia-language-a-concise-tutorial/language-core/data-types)):

- Scalars: `Int64`, `Float64`, `Char`, `String`, and `Bool`
- `Array`s, `Tuple`s (immutable), `NamedTuple`s, `Dict`ionaries
- `Set`s

When converting a value from one type to another, make sure the value can be represented in the target type.
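
The following sketch shows a few common conversions and one that fails. The values are arbitrary; all functions used are part of base Julia.

```julia
# Conversions that work
println(Float64(3))           # 3.0
println(Int64(3.0))           # 3
println(parse(Int64, "42"))   # 42
println(string(42))           # 42
println(round(Int64, 3.7))    # 4

# 3.7 has no exact integer representation, so Int64(3.7) throws an InexactError
failed = try
    Int64(3.7)
    false
catch err
    err isa InexactError
end
println(failed)               # true
```

When a fractional value needs to become an integer, an explicit `round`, `floor`, or `ceil` states how the fraction should be handled.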

## Data Types in Julia

Arrays: a one-dimensional array is a `Vector`, similar to a Python list.

In [22]:
# Numeric vector
x = [30, 21, 11]
println(x)

[30, 21, 11]


In [23]:
println(typeof(x))

Vector{Int64}


In [24]:
println(eltype(x))

Int64
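
Vectors can also be created without typing every element. A brief sketch using base functions (the variable names are only illustrative):

```julia
# A vector of three zeros (Float64 by default)
z = zeros(3)
println(z)           # [0.0, 0.0, 0.0]

# Turn a range into a vector
r = collect(1:5)
println(r)           # [1, 2, 3, 4, 5]

# An empty vector that will only accept Float64 values
e = Float64[]
println(length(e))   # 0
```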


## Data Types in Julia

Like Python lists (and unlike R vectors), arrays do not need to hold values of a single type. When the types differ, the element type becomes `Any`:

In [25]:
x = ["John", "coffee", 2, true]
println(x)

Any["John", "coffee", 2, true]


In [26]:
println(typeof(x))

Vector{Any}
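
Not every mix of types ends up as `Any`. As a sketch of the difference: when all the elements are numbers, Julia promotes them to a common numeric type instead.

```julia
# Integers and floats are promoted to Float64
y = [1, 2.5]
println(typeof(y))    # Vector{Float64}
println(y)            # [1.0, 2.5]

# Broadcasting typeof with a dot shows the type of each element
x = ["John", "coffee", 2, true]
types = typeof.(x)
println(types)
```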


## Data Types in Julia

We can index and slice arrays much as we do in R or Python.

In [27]:
x[1]

"John"

In [28]:
x[1] * "'s " * x[2]

"John's coffee"

In [29]:
println(x[end-1:end])

Any[2, true]
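
One detail worth knowing is that a slice such as `x[2:3]` makes a copy, so changing the slice does not change the original array. The sketch below contrasts this with `@view`, which refers to the original data; the values are arbitrary.

```julia
x = [30, 21, 11, 31]

# A slice is a copy: modifying it leaves x untouched
s = x[2:3]
s[1] = 0
println(x)    # [30, 21, 11, 31]

# A view shares memory with x: modifying it changes x
v = @view x[2:3]
v[1] = 0
println(x)    # [30, 0, 11, 31]
```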


## Data Types in Julia

The Julia equivalent of Python's `.append` is `push!`.

The `append!` command adds more than one element at a time.

In [30]:
x = [30, 21, 11]
push!(x, 31)
println(x)

[30, 21, 11, 31]


In [31]:
x = [30, 21, 11]
append!(x, [31, 45, 3])
println(x)

[30, 21, 11, 31, 45, 3]
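
By convention, functions whose names end in `!` modify their argument in place. A short sketch of three related base functions, using arbitrary values:

```julia
x = [30, 21, 11]

# Add an element at the front
pushfirst!(x, 5)
println(x)    # [5, 30, 21, 11]

# Insert the value 7 at position 2
insert!(x, 2, 7)
println(x)    # [5, 7, 30, 21, 11]

# Delete the element at position 3
deleteat!(x, 3)
println(x)    # [5, 7, 21, 11]
```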


## Data Types in Julia

We also have `pop!` and `sort`. Note that `sort` returns a sorted copy, like `sorted` in Python; the in-place version is `sort!`.

In [32]:
x = [30, 21, 11]
println("Popped element $(pop!(x))")
println(x)

Popped element 11
[30, 21]


In [33]:
x = [30, 21, 11]
x = sort(x)
println(x)

[11, 21, 30]
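
The sketch below contrasts `sort`, which returns a new vector, with `sort!`, which reorders the vector in place. It also uses the `rev` keyword for descending order. The values are arbitrary.

```julia
x = [21, 30, 11]

# sort returns a sorted copy; x itself is unchanged
y = sort(x)
println(y)    # [11, 21, 30]
println(x)    # [21, 30, 11]

# rev=true sorts in descending order
z = sort(x, rev=true)
println(z)    # [30, 21, 11]

# sort! modifies x in place
sort!(x)
println(x)    # [11, 21, 30]
```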


## Data Types in Julia

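Like in R, we do not need a special package for vectorized operations. Prefixing an operator with a dot (for example, `.+`) applies it element by element.

Operations between a vector and a scalar: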

In [34]:
x = [30, 21, 11]
println(x .+ 5)

[35, 26, 16]


In [35]:
println(x .- 5)

[25, 16, 6]


In [36]:
println(x .* 1.2)

[36.0, 25.2, 13.2]


In [37]:
println(x ./ 2.1)

[14.285714285714285, 10.0, 5.238095238095238]


In [38]:
println(x .^ 2)

[900, 441, 121]
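
The dot syntax is not limited to operators. As a sketch, any function can be applied element by element by adding a dot after its name, and the `@.` macro adds the dots to a whole expression. The function `f` below is a made-up example.

```julia
x = [30, 21, 11]

# Apply a built-in function to each element
println(sqrt.([4, 9, 16]))   # [2.0, 3.0, 4.0]

# Apply a user-defined function to each element
f(v) = v^2 + 1
fx = f.(x)
println(fx)                  # [901, 442, 122]

# @. puts a dot on every operation in the expression
w = @. x * 2 + 1
println(w)                   # [61, 43, 23]
```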


## Data Types in Julia

Operations between two vectors, element by element:

In [39]:
x = [30, 21, 11]
y = [2, 3, 1]
println(x .+ y)

[32, 24, 12]


In [40]:
println(x .- y)

[28, 18, 10]


In [41]:
println(x .* y)

[60, 63, 11]


In [42]:
println(x ./ y)

[15.0, 7.0, 11.0]


In [43]:
println(x .^ y)

[900, 9261, 11]
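
Element-wise results are often reduced to a single number afterwards. A minimal sketch using the same `x` and `y`: `sum` is part of base Julia, and `dot` comes from the `LinearAlgebra` standard library.

```julia
using LinearAlgebra

x = [30, 21, 11]
y = [2, 3, 1]

# Sum of the element-wise products
total = sum(x .* y)
println(total)       # 134

# The dot product gives the same number
println(dot(x, y))   # 134
```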


## Data Types in Julia

In [44]:
# .head()
println(first(educ, 3))

3×5 DataFrame
 Row │ education  income  young    urban  states
     │ Int64      Int64   Float64  Int64  String3
─────┼────────────────────────────────────────────
   1 │       189    2824    350.7    508  ME
   2 │       169    3259    345.9    564  NH
   3 │       230    3072    348.5    322  VT


In [45]:
# .tail()
println(last(educ, 3))

3×5 DataFrame
 Row │ education  income  young    urban  states
     │ Int64      Int64   Float64  Int64  String3
─────┼────────────────────────────────────────────
   1 │       273    3968    348.4    909  CA
   2 │       372    4146    439.7    484  AK
   3 │       212    3513    382.9    831  HI
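
Besides `first` and `last`, columns and rows of a DataFrame can be selected directly. The sketch below assumes the `DataFrames` package is installed and builds a small stand-in table by hand from the first three rows shown above, so it runs without downloading the file. The name `small` is only illustrative.

```julia
using DataFrames

# A hand-made stand-in for the first three rows of educ
small = DataFrame(
    education = [189, 169, 230],
    income = [2824, 3259, 3072],
    states = ["ME", "NH", "VT"],
)

# Number of rows and columns
println(size(small))        # (3, 3)

# A single column is an ordinary vector
println(small.income)       # [2824, 3259, 3072]

# Filter rows with a condition and pick one column
rich = small[small.income .> 3000, :states]
println(rich)               # ["NH", "VT"]
```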


## Data Types in Julia

In [46]:
# Data Frames
println(typeof(educ))

DataFrame


And `describe` tells us about the DataFrame:

In [47]:
println(describe(educ))

5×7 DataFrame
 Row │ variable   mean     min    median  max    nmissing  eltype
     │ Symbol     Union…   Any    Union…  Any    Int64     DataType
─────┼──────────────────────────────────────────────────────────────
   1 │ education  196.314  112    192.0   372           0  Int64
   2 │ income     3225.29  2081   3257.0  4425          0  Int64
   3 │ young      358.886  326.2  354.1   439.7         0  Float64
   4 │ urban      664.51   322    664.0   1000          0  Int64
   5 │ states              AK             WY            0  String3


# Great work! See you in the next class.