# Learning Julia

This notebook is an introduction to the Julia language. It walks through examples of basic operations, data types, loops, conditionals, and DataFrames, following [this Analytics Vidhya tutorial](https://www.analyticsvidhya.com/blog/2017/10/comprehensive-tutorial-learn-data-science-julia-from-scratch/).

The full Julia documentation (Version 1) is available [here](https://docs.julialang.org/en/v1/).

## Operations

Addition

In [2]:
4+5

9

Exponentiation

In [3]:
3^4

81

Division

In [4]:
91/2

45.5
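Note that '/' returns a floating-point result even when both operands are integers. As a small sketch of the related Base operators (the variable names are only for illustration):

```julia
# '/' always performs floating-point division on integers
q_float = 91 / 2      # 45.5

# '÷' (typed \div<TAB>) or div() gives the truncated integer quotient
q_int = 91 ÷ 2        # 45
q_div = div(91, 2)    # 45

# '%' gives the remainder, '//' builds an exact rational number
r = 91 % 2            # 1
frac = 91 // 2        # 91//2

println((q_float, q_int, q_div, r, frac))
```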

## Data Types

### Arrays

In [5]:
A = [10, 20, 30]

3-element Array{Int64,1}:
 10
 20
 30

Access the first element of the array

In [6]:
A[1] #Julia is a 1-indexed language

10
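A few other common array operations, sketched on the same values (the variable names here are illustrative):

```julia
A = [10, 20, 30]

first_elem = A[1]      # 10, indexing starts at 1
last_elem = A[end]     # 30, 'end' refers to the last index
n = length(A)          # 3

push!(A, 40)           # append in place; A is now [10, 20, 30, 40]
println(A)
```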

### Column Vectors

In [7]:
B = [10; 20; 30] # Semicolons concatenate vertically (each one starts a new row)

3-element Array{Int64,1}:
 10
 20
 30

Change the value of the first entry

In [8]:
B[1] = 199
B

3-element Array{Int64,1}:
 199
  20
  30
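As the output types show, a comma-separated literal and a semicolon-separated literal both produce a 1-dimensional array. A short sketch of how the separators differ (names are illustrative):

```julia
v_comma = [10, 20, 30]   # 3-element 1-dimensional array
v_semi  = [10; 20; 30]   # vertical concatenation, also 1-dimensional
row     = [10 20 30]     # spaces concatenate horizontally: a 1×3 matrix

println(v_comma == v_semi)   # true
println(size(v_comma))       # (3,)
println(size(row))           # (1, 3)
```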

### Matrix operations

Create a matrix (a 2-dimensional array)

In [9]:
M = [1 2 3; 4 5 6; 7 8 9]

3×3 Array{Int64,2}:
 1  2  3
 4  5  6
 7  8  9

Access or change an element of the matrix by its row and column index

In [10]:
M[1,2] = 3663

3663

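Transpose a matrix with a single quote (strictly, the single quote is the adjoint, i.e. the conjugate transpose, which is the same as the transpose for real-valued matrices)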

In [11]:
M'

3×3 LinearAlgebra.Adjoint{Int64,Array{Int64,2}}:
    1  4  7
 3663  5  8
    3  6  9
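The difference between the single quote and transpose() only shows up for complex entries. A minimal sketch, using small made-up matrices R and C:

```julia
using LinearAlgebra

R = [1 2; 3 4]
println(R' == transpose(R))           # true for real-valued matrices

C = [1+2im 3; 4 5-1im]
println(C' == transpose(C))           # false: the single quote also conjugates
println(C' == conj.(transpose(C)))    # true
```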

Take the inverse of a matrix with the inv() command

In [12]:
inv(M)

3×3 Array{Float64,2}:
 -0.000136575  -1.49973       0.999863   
  0.000273149  -0.000546299   0.000273149
 -0.000136575   1.16694      -0.666803   
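One way to sanity-check an inverse is to multiply it back against the original matrix. The sketch below rebuilds M with the modified entry from above; the right-hand side b is a made-up vector for illustration:

```julia
using LinearAlgebra

M = [1 3663 3; 4 5 6; 7 8 9]

# Round-off means the product is only approximately the identity
identity3 = Matrix{Float64}(I, 3, 3)
println(M * inv(M) ≈ identity3)

# To solve M*x = b, the backslash operator avoids forming inv(M) explicitly
b = [1.0, 2.0, 3.0]
x = M \ b
println(M * x ≈ b)
```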

### Dictionaries

A dictionary is an unordered set of key:value pairs, with unique keys in each dictionary.

Create a dictionary with the Dict() function

In [13]:
D = Dict(
    "first_name" => "Zane",
    "last_name" => "Murphy"
)

Dict{String,String} with 2 entries:
  "first_name" => "Zane"
  "last_name"  => "Murphy"

Access the elements using keys

In [14]:
D["first_name"]

"Zane"

In [15]:
D["last_name"]

"Murphy"

Count the number of elements in a dictionary

In [16]:
length(D) # preferred over D.count, which reads an internal field

2
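Dictionaries are mutable, so keys can be added and removed after creation. A brief sketch of some common operations; the "city" and "middle_name" keys are hypothetical and used only for illustration:

```julia
D = Dict("first_name" => "Zane", "last_name" => "Murphy")

D["city"] = "Unknown"                   # hypothetical key, added for illustration
println(haskey(D, "city"))              # true
println(get(D, "middle_name", "n/a"))   # the default is returned for an absent key
delete!(D, "city")

# Iteration yields key/value pairs; the order is not guaranteed
for (k, v) in D
    println(k, " => ", v)
end
println(length(D))                      # 2
```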

### Strings

Strings are defined with double ( " ) or triple ( """ ) quotes. As in Python, strings in Julia are immutable.

Create a string

In [17]:
text = "Sample String"

"Sample String"

Access a letter of a string

In [18]:
text[1]

'S': ASCII/Unicode U+0053 (category Lu: Letter, uppercase)

Get the length of a string

In [19]:
length(text)

13

Get a subset of a string

In [20]:
text[1:6]

"Sample"

Remember that strings are immutable

In [21]:
text[1] = "C"

MethodError: no method matching setindex!(::String, ::String, ::Int64)
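Because a string cannot be modified in place, the usual approach is to build a new one. A minimal sketch using concatenation and replace() from Base (the variable names are illustrative):

```julia
text = "Sample String"

# Concatenate with * and slice with a range to swap the first letter
changed = "C" * text[2:end]
println(changed)                 # Cample String

# replace() also returns a new string; text itself is unchanged
swapped = replace(text, "Sample" => "Simple")
println(swapped)                 # Simple String
println(text)                    # Sample String
```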

## Loops/Conditionals

### For Loop

A for loop in Julia has the general form: for i in [Julia Iterable], followed by the loop body, and closed with end.

“Julia Iterable” can be a vector, string, or other advanced data structure.

Let's compute a factorial

In [22]:
num = 1

for i in range(1, stop=5)
    num = num*i
end

print(num)

120

Or we can do this using the ':' syntax as well

In [23]:
num = 1

for i in 1:5
    num = num*i
end

print(num)

120
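For comparison, the same product is available without an explicit loop. A short sketch using prod() and factorial() from Base:

```julia
println(prod(1:5))       # 120
println(factorial(5))    # 120

# Int64 overflows past 20!, so switch to BigInt for larger inputs
println(factorial(big(25)))
```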

### While Loop

Let's try factorial with a while loop

In [24]:
i = 5
total = 1

while i > 0
    total = total*i
    i = i-1
end

print(total)

120

### Conditionals

The following demonstrates the use of conditionals in Julia

In [25]:
for i in 1:10
    print(i)
    if i < 4
        print(" is less than four\n")
    elseif 4 <= i <= 7  # chained comparison; use && (not bitwise &) to combine conditions
        print(" is between four and seven\n")
    else
        print(" is greater than seven\n")
    end
end
        

1 is less than four
2 is less than four
3 is less than four
4 is between four and seven
5 is between four and seven
6 is between four and seven
7 is between four and seven
8 is greater than seven
9 is greater than seven
10 is greater than seven
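Range checks like the one in the loop can be written either as a chained comparison or with the short-circuit '&&' operator. The bitwise '&' binds more tightly than comparisons, so it is best avoided for combining conditions. A sketch, with describe_number and in_range as hypothetical helper names:

```julia
# Hypothetical helper that mirrors the branches of the loop above
function describe_number(i)
    if i < 4
        return "less than four"
    elseif 4 <= i <= 7          # chained comparison
        return "between four and seven"
    else
        return "greater than seven"
    end
end

# Equivalent range test with short-circuit &&
in_range(i) = i >= 4 && i <= 7

println(describe_number(5))
println(in_range(5), " ", in_range(8))
```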


## DataFrames

Now we can get into the good stuff: DataFrames and their manipulations.

Let's figure out what directory we are working in

In [28]:
pwd()

"/Users/zanemurphy/Documents/julia/julia_scripts/julia_test"

Now let's figure out where the data is stored

In [58]:
readdir()

6-element Array{String,1}:
 ".git"              
 ".gitignore"        
 ".ipynb_checkpoints"
 "data_files"        
 "julia_test.html"   
 "julia_test.ipynb"  

In [62]:
readdir("data_files")

2-element Array{String,1}:
 "vidhya_loan_test.csv" 
 "vidhya_loan_train.csv"

Read the training and testing data into Julia DataFrames

In [50]:
using CSV, DataFrames

In [29]:
# Note: newer CSV.jl releases require a sink argument, e.g. CSV.read(path, DataFrame)
train = CSV.read("data_files/vidhya_loan_train.csv", copycols=true)
test = CSV.read("data_files/vidhya_loan_test.csv", copycols=true)

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome |
|---|---|---|---|---|---|---|---|
| | String | String⍰ | String | String⍰ | String | String⍰ | Int64 |
| 1 | LP001015 | Male | Yes | 0 | Graduate | No | 5720 |
| 2 | LP001022 | Male | Yes | 1 | Graduate | No | 3076 |
| 3 | LP001031 | Male | Yes | 2 | Graduate | No | 5000 |
| 4 | LP001035 | Male | Yes | 2 | Graduate | No | 2340 |
| 5 | LP001051 | Male | No | 0 | Not Graduate | No | 3276 |
| 6 | LP001054 | Male | Yes | 0 | Not Graduate | Yes | 2165 |
| 7 | LP001055 | Female | No | 1 | Not Graduate | No | 2226 |
| 8 | LP001056 | Male | Yes | 2 | Not Graduate | No | 3881 |
| 9 | LP001059 | Male | Yes | 2 | Graduate | missing | 13633 |
| 10 | LP001067 | Male | No | 0 | Not Graduate | No | 2400 |


Run some basic table commands on the training and testing sets

In [30]:
size(train)

(614, 13)

In [31]:
size(test)

(367, 12)

In [32]:
names(test)

12-element Array{Symbol,1}:
 :Loan_ID          
 :Gender           
 :Married          
 :Dependents       
 :Education        
 :Self_Employed    
 :ApplicantIncome  
 :CoapplicantIncome
 :LoanAmount       
 :Loan_Amount_Term 
 :Credit_History   
 :Property_Area    

In [33]:
first(train, 10)

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome |
|---|---|---|---|---|---|---|---|
| | String | String⍰ | String⍰ | String⍰ | String | String⍰ | Int64 |
| 1 | LP001002 | Male | No | 0 | Graduate | No | 5849 |
| 2 | LP001003 | Male | Yes | 1 | Graduate | No | 4583 |
| 3 | LP001005 | Male | Yes | 0 | Graduate | Yes | 3000 |
| 4 | LP001006 | Male | Yes | 0 | Not Graduate | No | 2583 |
| 5 | LP001008 | Male | No | 0 | Graduate | No | 6000 |
| 6 | LP001011 | Male | Yes | 2 | Graduate | Yes | 5417 |
| 7 | LP001013 | Male | Yes | 0 | Not Graduate | No | 2333 |
| 8 | LP001014 | Male | Yes | 3+ | Graduate | No | 3036 |
| 9 | LP001018 | Male | Yes | 2 | Graduate | No | 4006 |
| 10 | LP001020 | Male | Yes | 1 | Graduate | No | 12841 |


In [34]:
describe(train)

| | variable | mean | min | median | max | nunique | nmissing |
|---|---|---|---|---|---|---|---|
| | Symbol | Union… | Any | Union… | Any | Union… | Union… |
| 1 | Loan_ID | | LP001002 | | LP002990 | 614.0 | |
| 2 | Gender | | Female | | Male | 2.0 | 13.0 |
| 3 | Married | | No | | Yes | 2.0 | 3.0 |
| 4 | Dependents | | 0 | | 3+ | 4.0 | 15.0 |
| 5 | Education | | Graduate | | Not Graduate | 2.0 | |
| 6 | Self_Employed | | No | | Yes | 2.0 | 32.0 |
| 7 | ApplicantIncome | 5403.46 | 150 | 3812.5 | 81000 | | |
| 8 | CoapplicantIncome | 1621.25 | 0.0 | 1188.5 | 41667.0 | | |
| 9 | LoanAmount | 146.412 | 9 | 128.0 | 700 | | 22.0 |
| 10 | Loan_Amount_Term | 342.0 | 12 | 360.0 | 480 | | 14.0 |
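The nmissing column shows that several columns contain missing values, which propagate through most arithmetic. A small sketch of handling them with Base functions and the Statistics standard library; the amounts vector is toy data invented for illustration, not values from the loan files:

```julia
using Statistics

# Toy values invented for illustration; not taken from the loan data
amounts = [128, missing, 66, 120]

println(sum(amounts))                   # missing: it propagates through the sum
println(count(ismissing, amounts))      # number of missing entries
println(mean(skipmissing(amounts)))     # mean of the values that are present
println(coalesce.(amounts, 0))          # replace each missing with 0
```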


Set up a basic processing pipeline using the DataFramesMeta package

In [107]:
using DataFramesMeta

In [133]:
@linq train |>
dropmissing(:LoanAmount) |>
where(:LoanAmount .> 400) |>
select(:LoanAmount)

| | LoanAmount |
|---|---|
| | Int64 |
| 1 | 650 |
| 2 | 600 |
| 3 | 700 |
| 4 | 495 |
| 5 | 436 |
| 6 | 480 |
| 7 | 490 |
| 8 | 570 |
| 9 | 405 |
| 10 | 500 |


Set up a basic processing pipeline using the Query package

In [96]:
using Query

In [135]:
@from i in train begin
    @where i.LoanAmount > 400
    @select i
    @collect DataFrame
end

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome |
|---|---|---|---|---|---|---|---|
| | String | String⍰ | String⍰ | String⍰ | String | String⍰ | Int64 |
| 1 | LP001469 | Male | No | 0 | Graduate | Yes | 20166 |
| 2 | LP001536 | Male | Yes | 3+ | Graduate | No | 39999 |
| 3 | LP001585 | missing | Yes | 3+ | Graduate | No | 51763 |
| 4 | LP001610 | Male | Yes | 3+ | Graduate | No | 5516 |
| 5 | LP001907 | Male | Yes | 0 | Graduate | No | 14583 |
| 6 | LP001996 | Male | No | 0 | Graduate | No | 20233 |
| 7 | LP002101 | Male | Yes | 0 | Graduate | missing | 63337 |
| 8 | LP002191 | Male | Yes | 0 | Graduate | No | 19730 |
| 9 | LP002386 | Male | No | 0 | Graduate | missing | 12876 |
| 10 | LP002547 | Male | Yes | 1 | Graduate | No | 18333 |


Set up a basic processing pipeline using the Lazy package

In [137]:
using Lazy

In [145]:
@> begin
    train
    dropmissing(:LoanAmount)
    @where(:LoanAmount .> 400)
    @select(:LoanAmount)
end

| | LoanAmount |
|---|---|
| | Int64 |
| 1 | 650 |
| 2 | 600 |
| 3 | 700 |
| 4 | 495 |
| 5 | 436 |
| 6 | 480 |
| 7 | 490 |
| 8 | 570 |
| 9 | 405 |
| 10 | 500 |
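The same drop, filter, and select steps can also be written with plain DataFrames functions and no macro package. The sketch below assumes a recent DataFrames.jl release (where filter accepts a column => predicate pair) and Julia 1.2 or later for the >(400) form. The df table is toy data invented for illustration, with a column name that mirrors the loan data:

```julia
using DataFrames

# Toy table invented for illustration; not values from the loan files
df = DataFrame(Loan_ID = ["A", "B", "C", "D"],
               LoanAmount = [650, missing, 120, 495])

step1 = dropmissing(df, :LoanAmount)            # drop rows with a missing LoanAmount
step2 = filter(:LoanAmount => >(400), step1)    # keep rows with LoanAmount above 400
result = select(step2, :LoanAmount)             # keep a single column

println(result)
```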
