# CSS 201 / 202 - CSS Bootcamp

## Week 07 - Lecture 04

### Umberto Mignozzetti

# Introduction to Julia

## Intro to Julia

Julia is a high-performance programming language designed for numerical and scientific computing, which makes it a strong choice for data science.

It covers most of what R and Python offer, and it is often much faster.

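The main reason for the speed is that Julia compiles code to native machine code just in time (through LLVM) and specializes functions on the types of their arguments. Multithreading is available, but it is opt-in rather than automatic.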

It can also call code written in other major languages: C is supported out of the box, while Java, R, and Python are reachable through packages.

It has been used at CERN to analyze data from LHC experiments and at BlackRock to run financial asset simulations.

## Intro to Julia

Today, we will talk about:

1. Arithmetic
1. Install and load packages
1. Loading data
1. Data types

Tomorrow, we will finish with:

1. Plotting
1. Data wrangling
1. Merging data
1. Simple regression

## Intro to Julia

To install Julia, download it from the official website: https://julialang.org

Then, you can choose to work with Julia in:

1. [Visual Studio Code](https://code.visualstudio.com/docs/languages/julia) (a great editor with a graphical user interface)
2. [Jupyter Notebooks / Jupyter Lab](https://www.geeksforgeeks.org/how-to-work-with-julia-on-jupyter-notebook/) 

We will use Jupyter Notebooks, since it is our editor of choice. If you are working locally, the link above has instructions on how to set it up.

## Print and Comment

With Julia, the print command is `println`. 

Use `#` for single-line comments.

Use `#= ... =#` for multiline comments.

In [None]:
# This is 2
println(2)

In [None]:
#=
This is 2 + 3
Equals to 5
=#
println(2 + 3)
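
As a small aside, `print` is the counterpart of `println` that does not append a newline, and both accept several arguments. A minimal sketch, with purely illustrative values:

```julia
# print does not add a newline; println does
print("2 + 3 = ")
println(2 + 3)

# Several arguments are printed one after another
println("The answer is ", 5, "!")

# string(...) builds the same text without printing it
msg = string("The answer is ", 5, "!")
println(msg)
```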

## Arithmetic in Julia

In [None]:
# Subtraction
println(2 - 3.2)

In [None]:
# Multiplication
println(2 * 3)

In [None]:
# Division
println(2 / 3)

In [None]:
# Power (note: Julia uses ^, unlike Python, which uses **)
println(2 ^ 3)
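
The cells above skip addition and integer division, so here is a short sketch of the remaining basic operators. The variable names are only illustrative.

```julia
# Addition
total = 2 + 3               # 5

# / always returns a floating-point number, even for two integers
quotient = 6 / 3            # 2.0

# Integer division (type \div then Tab, or call div) and remainder
int_div = 7 ÷ 2             # 3
same_div = div(7, 2)        # 3
remainder = 7 % 2           # 1

println(total, " ", quotient, " ", int_div, " ", remainder)
```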

## Intro to Julia

### Install packages

To get started with Julia, we need to install a few packages. We will install the following:

- `Statistics`
- `DataFrames`
- `CSV`
- `Plots`
- `PythonCall`
- `RCall`

For more Julia packages, see [JuliaHub](https://juliahub.com/ui/Packages).

In [None]:
# Installing packages (takes a bit to do)
using Pkg
Pkg.add("Statistics")
Pkg.add("CSV")
Pkg.add("DataFrames")
Pkg.add("Plots")
Pkg.add("PythonCall")
Pkg.add("RCall")
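
Installation is needed only once per environment; after that, a package is loaded with `using`. The sketch below assumes `Statistics` is available (it ships with current Julia releases as a standard library) and uses made-up numbers.

```julia
# Load the package; its exported functions become available
using Statistics

sample = [30, 21, 11]      # illustrative values

println(mean(sample))      # arithmetic mean
println(median(sample))    # middle value
println(std(sample))       # sample standard deviation
```

`Pkg.add` also accepts a vector of names, for example `Pkg.add(["CSV", "DataFrames"])`, which installs several packages in one call.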

## Loading Data in Julia

Let us load the education dataset.

We read the file with `CSV.read`, passing `DataFrame` as the second argument so that the result is returned as a data frame:

In [None]:
# Loading packages to read the data
using CSV
using DataFrames

## Education Expenditure Dataset
educ = CSV.read(download("https://raw.githubusercontent.com/umbertomig/POLI175public/main/data/educexp.csv"), DataFrame)
println("Done") # Trick to avoid displaying it

In [None]:
# The head of the dataset
println(first(educ, 3))
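
To see what `CSV.read` does without depending on a download, the same call can be tried on a small in-memory table. This is only a sketch: the two columns below are hypothetical and are not the columns of `educexp.csv`.

```julia
using CSV
using DataFrames

# Hypothetical two-row dataset; these column names are NOT from educexp.csv
csv_text = """
state,spending
A,10.5
B,12.0
"""

# CSV.read takes a source (here an in-memory buffer) and a sink type
toy = CSV.read(IOBuffer(csv_text), DataFrame)

println(size(toy))     # (rows, columns)
println(names(toy))    # column names as strings
```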

## Data Types in Julia

Julia has roughly the same data types as Python and R, with a few differences in how to use them.

In [None]:
# Numeric variable
x = 30

# Result
println(x)

In [None]:
# Julia is case-sensitive
x # not the same variable as X

## Data Types in Julia

Variable names can contain Unicode characters, but they cannot start with a digit or contain spaces.

We can check the type of a variable with the `typeof` function.

In [None]:
x2 = 30 # But 2x is not a valid name: Julia reads it as 2 * x

In [None]:
# Data type
println(typeof(x))

In [None]:
# Data type
println(typeof(x / 11))
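
A short sketch of the naming and typing rules above. The names `α` and `n` are illustrative, and the integer type shown assumes a 64-bit machine.

```julia
# Unicode names are allowed (type \alpha then Tab in the REPL or Jupyter)
α = 0.05
println(typeof(α))        # Float64

# Integer literals are Int64 on 64-bit machines
n = 30
println(typeof(n))

# Dividing two integers with / gives a Float64
println(typeof(n / 11))

# isa checks whether a value belongs to a type
println(n isa Integer)    # true
```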

## Data Types in Julia

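**Booleans**: `true` or `false`. Note the lowercase spelling, unlike R (`TRUE`) and Python (`True`).

**Strings**: written with double quotes, `"string"`, and not `'string'`. Single quotes are reserved for a single character, as in `'c'`, which has type `Char` rather than `String`.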

In [None]:
# True maps to 1
println(true + false)

In [None]:
# My string
println("my string is awesome")
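
The difference between single and double quotes, and the way Booleans behave in arithmetic and logic, can be checked directly. A minimal sketch with illustrative names:

```julia
letter = 'c'      # single quotes: a Char
word = "c"        # double quotes: a String

println(typeof(letter))   # Char
println(typeof(word))     # String

# Booleans behave like 0 and 1 in arithmetic
println(true + true)      # 2

# Logical operators
println(true && false)    # false
println(true || false)    # true
println(!true)            # false
```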

## Data Types in Julia

We can slice strings much as we do in Python, to get either single characters or substrings.

Note that indexing in Julia starts at `1`, and that a range such as `1:3` includes both endpoints.

In [None]:
# My seat in the Padres game
s = "Row 23 Seat 10"
println(s[1:3] * ": " * s[5:6])
println(s[end-6:end-3] * ": " * s[end-1:end])
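
A few more details on string indexing, sketched on the same kind of text. The example is ASCII-only on purpose: for non-ASCII text, Julia string indices refer to bytes, so not every integer is a valid index.

```julia
seat = "Row 23 Seat 10"   # ASCII-only example

println(length(seat))     # number of characters: 14
println(seat[1])          # a single index returns a Char
println(seat[1:1])        # a range returns a String

# Unlike Python, both ends of the range are included
println(seat[5:6])        # 23

# parse converts the extracted text into a number
row = parse(Int, seat[5:6])
println(row + 1)          # 24
```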

## Intro to Julia

### Data Types in Julia

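**Multiline string literals**: delimited by triple double quotation marks.

Concatenation can be done with `*`. We can also use the `$` sign for interpolation, which inserts the value of a variable or expression into a string.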

In [None]:
lit = """
My literal
is awesome!"""
println(lit)

In [None]:
name = "John"
coffee = "Coffee for "
println(coffee * name)

In [None]:
x = 30
println("x times two is $(x * 2)")
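
Some further ways to build strings, as a minimal sketch. The variable names and texts are illustrative.

```julia
first_name = "John"

# $variable works without parentheses for a plain variable name
println("Coffee for $first_name")

# string(...) concatenates its arguments, converting non-strings
label = string("Order ", 12, " for ", first_name)
println(label)

# ^ repeats a string
repeated = "ab" ^ 3
println(repeated)     # ababab

# join glues a collection together with a separator
menu = join(["espresso", "latte", "mocha"], ", ")
println(menu)
```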

## Data Types in Julia

Data types in Julia, in a nutshell (see [this tutorial](https://syl1.gitbook.io/julia-language-a-concise-tutorial/language-core/data-types)):

- Scalars: `Int64`, `Float64`, `Char`, `String` and `Bool`
- `Arrays`, `Tuples` (immutable), `NamedTuples`, `Dict`ionaries
- `Set`s
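
The container types in this list that are not covered in later cells can be sketched as follows. All names and values are illustrative.

```julia
# Tuple: fixed-size and immutable
point = (3, "north")
println(point[1])

# NamedTuple: like a tuple, but the fields have names
person = (name = "John", age = 30)
println(person.name)

# Dict: key => value pairs
prices = Dict("coffee" => 3.5, "tea" => 2.0)
prices["juice"] = 4.0          # add a new key
println(prices["coffee"])

# Set: unordered collection of unique elements
ids = Set([1, 2, 2, 3])
println(length(ids))           # 3, duplicates are dropped
println(2 in ids)              # true
```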

When converting (coercing) a value to another type, make sure the value can actually be represented in the target type.
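
A minimal sketch of what can go wrong with coercion, and of the usual alternatives. The values are illustrative.

```julia
# Conversions that are exact succeed
println(Float64(3))          # 3.0
println(Int(3.0))            # 3

# Int(3.7) would throw an InexactError, so round explicitly instead
println(round(Int, 3.7))     # 4
println(floor(Int, 3.7))     # 3

# Strings must be parsed, not converted
println(parse(Int, "42"))            # 42
println(parse(Float64, "3.14"))      # 3.14

# tryparse returns nothing instead of throwing an error
println(tryparse(Int, "abc"))        # nothing
```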

## Data Types in Julia

**Arrays**: one-dimensional arrays are called vectors. They are similar to Python lists.

In [None]:
# Numeric vector
x = [30, 21, 11]
println(x)

In [None]:
println(typeof(x))

In [None]:
println(eltype(x))
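
A few basic operations on vectors, as a sketch with illustrative names (`v`, `w`, `empty_ints`):

```julia
v = [30, 21, 11]

println(length(v))      # 3
println(sum(v))         # 62
println(v[end])         # last element: 11

# A decimal point in any element makes the whole vector Float64
w = [30, 21.5, 11]
println(eltype(w))      # Float64

# collect turns a range into a vector
println(collect(1:5))   # [1, 2, 3, 4, 5]

# An empty vector with a declared element type
empty_ints = Int[]
println(length(empty_ints))   # 0
```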

## Data Types in Julia

Like Python lists (and unlike R vectors), arrays do not need to hold a single data type. When the elements are mixed, the element type of the array becomes `Any`:

In [None]:
x = ["John", "coffee", 2, true]
println(x)

In [None]:
println(typeof(x))
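
To see how Julia picks the element type, compare a genuinely mixed vector with a vector of mixed numbers. A short sketch with illustrative names:

```julia
mixed = ["John", "coffee", 2, true]
println(eltype(mixed))          # Any

# Each element keeps its own concrete type
println(typeof.(mixed))

# Numbers of different types are promoted to a common type instead
nums = [1, 2.5]
println(eltype(nums))           # Float64
```

Vectors with a concrete element type such as `Float64` are generally faster to work with than `Vector{Any}`, so mixed arrays are best kept for small, heterogeneous records.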

## Data Types in Julia

We can slice arrays in much the same way as in R or Python.

In [None]:
x[1]

In [None]:
x[1] * "'s " * x[2]

In [None]:
println(x[end-1:end])
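
Beyond ranges, vectors can be indexed with a list of positions or with a Boolean mask. A minimal sketch on an illustrative numeric vector:

```julia
vals = [30, 21, 11, 45, 3]

println(vals[2:3])          # [21, 11]
println(vals[end])          # 3
println(vals[[1, 4]])       # positions 1 and 4: [30, 45]

# Logical indexing: keep elements greater than 20
println(vals[vals .> 20])   # [30, 21, 45]

# Slices are copies; changing one leaves the original untouched
part = vals[1:2]
part[1] = 0
println(vals[1])            # 30
```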

## Data Types in Julia

The Julia equivalent of Python's `.append` is `push!`, which adds a single element to the end of a vector.

The `append!` function adds all the elements of another collection.

In [None]:
x = [30, 21, 11]
push!(x, 31)
println(x)

In [None]:
x = [30, 21, 11]
append!(x, [31, 45, 3])
println(x)
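
Related functions add elements at other positions, and a typed vector rejects values it cannot store. A sketch with illustrative names:

```julia
queue = [30, 21, 11]

pushfirst!(queue, 5)        # add at the front
insert!(queue, 2, 99)       # insert 99 at position 2
println(queue)              # [5, 99, 30, 21, 11]

# A Vector{Int64} only accepts values that convert exactly to Int64
err_type = try
    push!(queue, 2.5)
    nothing
catch err
    typeof(err)
end
println(err_type)           # InexactError
```

The trailing `!` is a naming convention for functions that modify their argument in place.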

## Data Types in Julia

We also have `pop!` and `sort`. Note that `sort` returns a sorted copy, like Python's `sorted`; the in-place version is `sort!`.

In [None]:
x = [30, 21, 11]
println("Popped element $(pop!(x))")
println(x)

In [None]:
x = [30, 21, 11]
x = sort(x)
println(x)
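
The difference between `sort` and `sort!`, together with a couple of related functions, can be sketched as follows. The names are illustrative.

```julia
marks = [30, 21, 11]

# sort returns a new sorted vector; the original is unchanged
ordered = sort(marks)
println(marks)                  # [30, 21, 11]
println(ordered)                # [11, 21, 30]

# sort! sorts in place
sort!(marks)
println(marks)                  # [11, 21, 30]

# rev = true sorts in descending order
descending = sort(marks, rev = true)
println(descending)             # [30, 21, 11]

# popfirst! removes and returns the first element
first_mark = popfirst!(marks)
println(first_mark)             # 11
println(marks)                  # [21, 30]
```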

## Data Types in Julia

As in R, we do not need a special package for vectorized operations. Prefixing an operator with a dot (for example `.+`) applies it element by element.

Scalar operations:

In [None]:
x = [30, 21, 11]
println(x .+ 5)

In [None]:
println(x .- 5)

In [None]:
println(x .* 1.2)

In [None]:
println(x ./ 2.1)

In [None]:
println(x .^ 2)
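
The dot is not optional everywhere. As a sketch of what happens without it (the name `a` is illustrative): multiplying a vector by a scalar is defined, but adding a scalar to a vector is not.

```julia
a = [30, 21, 11]

# Multiplying a vector by a scalar works without the dot
println(a * 2)          # [60, 42, 22]

# Adding a scalar to a vector does not: the dot is required
err_type = try
    a + 5
    nothing
catch err
    typeof(err)
end
println(err_type)       # MethodError

println(a .+ 5)         # [35, 26, 16]
```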

## Data Types in Julia

Vectorized operations between two vectors of the same length:

In [None]:
x = [30, 21, 11]
y = [2, 3, 1]
println(x .+ y)

In [None]:
println(x .- y)

In [None]:
println(x .* y)

In [None]:
println(x ./ y)

In [None]:
println(x .^ y)
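
The same dot syntax extends to ordinary functions and to whole expressions. A minimal sketch, with illustrative names `p` and `q`:

```julia
p = [30, 21, 11]
q = [2, 3, 1]

# Any function can be broadcast by adding a dot after its name
roots = sqrt.([4, 9, 16])
println(roots)                  # [2.0, 3.0, 4.0]

# Comparisons broadcast too
above_20 = p .> 20
println(above_20)

# @. adds dots to every operation in the expression
combined = @. p^2 + q
println(combined)               # [902, 444, 122]

# Combine with a reduction: the dot product of p and q
println(sum(p .* q))            # 134
```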

## Data Types in Julia

In [None]:
# Equivalent of pandas .head()
println(first(educ, 3))

In [None]:
# Equivalent of pandas .tail()
println(last(educ, 3))

## Data Types in Julia

In [None]:
# Data Frames
println(typeof(educ))

And `describe` tells us about the DataFrame:

In [None]:
println(describe(educ))
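
A few other ways to inspect a data frame, sketched on a small hand-built table so that the example does not depend on the download. The column names below are hypothetical and are not those of `educexp.csv`.

```julia
using DataFrames

# Hypothetical toy data; the column names are NOT those of educexp.csv
toy_df = DataFrame(state = ["A", "B", "C"], spending = [10.5, 12.0, 9.8])

println(nrow(toy_df))            # number of rows: 3
println(ncol(toy_df))            # number of columns: 2
println(names(toy_df))           # column names

# A column is an ordinary vector
println(toy_df.spending)

# Rows can be filtered with a Boolean mask; : keeps all columns
high = toy_df[toy_df.spending .> 10, :]
println(high)
```

The same calls should work on `educ` once its actual column names are substituted.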

# Great work! See you in the next class.