# <img src="https://github.com/JuliaLang/julia-logo-graphics/raw/master/images/julia-logo-color.png" height="100" /> _for Pythonistas_

> TL;DR: _Julia looks and feels a lot like Python, only much faster. It's dynamic, expressive, extensible, with batteries included, in particular for Data Science_.

This notebook is an **introduction to Julia for Python programmers**.

It will go through the most important Python features (such as functions, basic types, list comprehensions, exceptions, generators, modules, packages, and so on) and show you how to code them in Julia.

## Running on Google Colab
1. Work on a copy of this notebook: _File_ > _Save a copy in Drive_ (you will need a Google account). Alternatively, you can download the notebook using _File_ > _Download .ipynb_, then upload it to [Colab](https://colab.research.google.com/).
2. Execute the following cell (click on it and press Ctrl+Enter) to install Julia, IJulia (the Jupyter kernel for Julia) and other packages. You can update `JULIA_VERSION` and the other parameters, if you know what you're doing. Installation takes 2-3 minutes.
3. Reload this page (press Ctrl+R, or ⌘+R, or the F5 key) and continue to the _Checking the Installation_ section.

* _Note_: If your Colab Runtime gets reset (e.g., due to inactivity), repeat steps 2 and 3.

In [None]:
%%shell
set -e

#---------------------------------------------------#
JULIA_VERSION="1.7.1" # any version ≥ 0.7.0
JULIA_PACKAGES="IJulia CSV DataFrames JWAS Random Statistics"  # Install packages
JULIA_PACKAGES_IF_GPU="CUDA"
JULIA_NUM_THREADS=4
#---------------------------------------------------#

if [ -n "$COLAB_GPU" ] && [ -z "$(which julia)" ]; then
  # Install Julia
  JULIA_VER=`cut -d '.' -f -2 <<< "$JULIA_VERSION"`
  echo "Installing Julia $JULIA_VERSION on the current Colab Runtime..."
  BASE_URL="https://julialang-s3.julialang.org/bin/linux/x64"
  URL="$BASE_URL/$JULIA_VER/julia-$JULIA_VERSION-linux-x86_64.tar.gz"
  wget -nv $URL -O /tmp/julia.tar.gz # -nv means "not verbose"
  tar -x -f /tmp/julia.tar.gz -C /usr/local --strip-components 1
  rm /tmp/julia.tar.gz

  # Install Packages
  if [ "$COLAB_GPU" = "1" ]; then
      JULIA_PACKAGES="$JULIA_PACKAGES $JULIA_PACKAGES_IF_GPU"
  fi
  for PKG in `echo $JULIA_PACKAGES`; do
    echo "Installing Julia package $PKG..."
    julia -e 'using Pkg; pkg"add '$PKG'; precompile;"' &> /dev/null
  done

  # Install kernel and rename it to "julia"
  echo "Installing IJulia kernel..."
  julia -e 'using IJulia; IJulia.installkernel("julia", env=Dict(
      "JULIA_NUM_THREADS"=>"'"$JULIA_NUM_THREADS"'"))'
  KERNEL_DIR=`julia -e "using IJulia; print(IJulia.kerneldir())"`
  KERNEL_NAME=`ls -d "$KERNEL_DIR"/julia*`
  mv -f $KERNEL_NAME "$KERNEL_DIR"/julia  

  echo ''
  echo "Successfully installed `julia -v`!"
  echo "Please reload this page (press Ctrl+R, ⌘+R, or the F5 key) then"
  echo "jump to the 'Checking the Installation' section."
fi

Installing Julia 1.7.1 on the current Colab Runtime...
2022-07-18 23:25:02 URL:https://storage.googleapis.com/julialang2/bin/linux/x64/1.7/julia-1.7.1-linux-x86_64.tar.gz [123374573/123374573] -> "/tmp/julia.tar.gz" [1]
Installing Julia package IJulia...
Installing Julia package CSV...
Installing Julia package DataFrames...
Installing Julia package JWAS...
Installing Julia package Random...
Installing Julia package Statistics...
Installing IJulia kernel...
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mInstalling julia kernelspec in /root/.local/share/jupyter/kernels/julia-1.7

Successfully installed julia version 1.7.1!
Please reload this page (press Ctrl+R, ⌘+R, or the F5 key) then
jump to the 'Checking the Installation' section.




## Checking the Installation
The `versioninfo()` function should print your Julia version and some other info about the system (if you ever ask for help or file an issue about Julia, you should always provide this information).

In [1]:
versioninfo()

Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, broadwell)
Environment:
  JULIA_NUM_THREADS = 4


In [2]:
using Pkg;Pkg.status()

[32m[1m      Status[22m[39m `~/.julia/environments/v1.7/Project.toml`
 [90m [336ed68f] [39mCSV v0.8.5
 [90m [a93c6f00] [39mDataFrames v0.22.7
 [90m [7073ff75] [39mIJulia v1.23.3
 [90m [c9a035f4] [39mJWAS v1.1.2
 [90m [9a3f8284] [39mRandom
 [90m [10745b16] [39mStatistics


# 1. Working with DataFrames

In [3]:
using DataFrames, CSV

## 1.1 Create a `DataFrame`

In [4]:
mydf = DataFrame(ID=1:3, y=randn(3), m1=[0.0,1.0,2.0],m2=[2.0,2.0,1.0])

| Row | ID | y | m1 | m2 |
| --- | --- | --- | --- | --- |
| | Int64 | Float64 | Float64 | Float64 |
| 1 | 1 | 0.888445 | 0.0 | 2.0 |
| 2 | 2 | 0.078831 | 1.0 | 2.0 |
| 3 | 3 | -0.965671 | 2.0 | 1.0 |


Alternatively, you can create a `DataFrame` from a matrix and a vector of column names:

In [5]:
m=[0.0 2.0
   1.0 2.0
   2.0 1.0]
dfnames=["m1","m2"]
mydf = DataFrame(m,dfnames)

| Row | m1 | m2 |
| --- | --- | --- |
| | Float64 | Float64 |
| 1 | 0.0 | 2.0 |
| 2 | 1.0 | 2.0 |
| 3 | 2.0 | 1.0 |


## 1.2 Insert columns into a `DataFrame`

In [6]:
insertcols!(mydf, 1, :ID => 1:3) # 1 is the position at which the new column is inserted

| Row | ID | m1 | m2 |
| --- | --- | --- | --- |
| | Int64 | Float64 | Float64 |
| 1 | 1 | 0.0 | 2.0 |
| 2 | 2 | 1.0 | 2.0 |
| 3 | 3 | 2.0 | 1.0 |


In [7]:
insertcols!(mydf, 2, :y => randn(3)) # 2 inserts the new column as the 2nd column

| Row | ID | y | m1 | m2 |
| --- | --- | --- | --- | --- |
| | Int64 | Float64 | Float64 | Float64 |
| 1 | 1 | 0.917462 | 0.0 | 2.0 |
| 2 | 2 | 1.1245 | 1.0 | 2.0 |
| 3 | 3 | 1.1446 | 2.0 | 1.0 |


## 1.3 Change the type of a column

In [8]:
mydf[!,:ID] = string.(mydf[!,:ID]) # convert the column from Int64 to String

3-element Vector{String}:
 "1"
 "2"
 "3"

In [9]:
mydf

| Row | ID | y | m1 | m2 |
| --- | --- | --- | --- | --- |
| | String | Float64 | Float64 | Float64 |
| 1 | 1 | 0.917462 | 0.0 | 2.0 |
| 2 | 2 | 1.1245 | 1.0 | 2.0 |
| 3 | 3 | 1.1446 | 2.0 | 1.0 |


## 1.4 Save a `DataFrame` as a CSV file

In [10]:
CSV.write("test.csv", mydf)

"test.csv"

## 1.5 Read data from CSV files as `DataFrame`

In [11]:
mydf = CSV.read("test.csv", DataFrame)
mydf[!,:ID] = string.(mydf[!,:ID]); # CSV.read parses ID as Int64, so convert it back to String
mydf

| Row | ID | y | m1 | m2 |
| --- | --- | --- | --- | --- |
| | String | Float64 | Float64 | Float64 |
| 1 | 1 | 0.917462 | 0.0 | 2.0 |
| 2 | 2 | 1.1245 | 1.0 | 2.0 |
| 3 | 3 | 1.1446 | 2.0 | 1.0 |


More options for reading CSV files can be found in the documentation (run `?CSV.File`).
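As one illustration of such options, the sketch below assumes the `types` keyword of `CSV.read` to parse `ID` as `String` at read time, which avoids the `string.(...)` conversion afterwards. The file name `example_ids.csv` and its contents are made up for this example; check the documentation of your installed CSV.jl version for the keywords it supports.

```julia
using CSV, DataFrames

# Write a small file so the example is self-contained (hypothetical file name).
CSV.write("example_ids.csv", DataFrame(ID=1:3, y=[0.5, 1.5, 2.5]))

# Ask CSV.jl to parse the ID column as String while reading,
# instead of converting it afterwards with string.(...).
df = CSV.read("example_ids.csv", DataFrame; types=Dict(:ID => String))

eltype(df.ID)  # element type of the ID column
```

The other columns are still detected automatically, so only the columns listed in the `Dict` are affected.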

## 1.6 Access the columns of a `DataFrame` without copying

In [12]:
mydf[!,"ID"]

3-element Vector{String}:
 "1"
 "2"
 "3"

In [13]:
mydf[!,1:2]

| Row | ID | y |
| --- | --- | --- |
| | String | Float64 |
| 1 | 1 | 0.917462 |
| 2 | 2 | 1.1245 |
| 3 | 3 | 1.1446 |


In [14]:
mydf[!,[:ID,:y]]

| Row | ID | y |
| --- | --- | --- |
| | String | Float64 |
| 1 | 1 | 0.917462 |
| 2 | 2 | 1.1245 |
| 3 | 3 | 1.1446 |


In [15]:
a = mydf[!,"ID"]      # a is the column stored in mydf, not a copy
a[1] = "will change"; # so modifying a also modifies mydf
mydf

| Row | ID | y | m1 | m2 |
| --- | --- | --- | --- | --- |
| | String | Float64 | Float64 | Float64 |
| 1 | will change | 0.917462 | 0.0 | 2.0 |
| 2 | 2 | 1.1245 | 1.0 | 2.0 |
| 3 | 3 | 1.1446 | 2.0 | 1.0 |


## 1.7 Get a copy of a column

In [16]:
mydf[:,"ID"]

3-element Vector{String}:
 "will change"
 "2"
 "3"

In [17]:
mydf[:,1:2]

| Row | ID | y |
| --- | --- | --- |
| | String | Float64 |
| 1 | will change | 0.917462 |
| 2 | 2 | 1.1245 |
| 3 | 3 | 1.1446 |


In [19]:
a = mydf[:,"ID"]
a[1] = "this won't change"; # mydf is unchanged because a is a copy of the column
mydf

| Row | ID | y | m1 | m2 |
| --- | --- | --- | --- | --- |
| | String | Float64 | Float64 | Float64 |
| 1 | will change | 0.917462 | 0.0 | 2.0 |
| 2 | 2 | 1.1245 | 1.0 | 2.0 |
| 3 | 3 | 1.1446 | 2.0 | 1.0 |


In [20]:
mydf[:,:ID] = ["a1","a2","a3"]; # assigning with : writes the new values into the existing column
mydf

| Row | ID | y | m1 | m2 |
| --- | --- | --- | --- | --- |
| | String | Float64 | Float64 | Float64 |
| 1 | a1 | 0.917462 | 0.0 | 2.0 |
| 2 | a2 | 1.1245 | 1.0 | 2.0 |
| 3 | a3 | 1.1446 | 2.0 | 1.0 |


In [21]:
mydf[!,:ID] = ["ind1","ind2","ind3"]; # assigning with ! replaces the column with the new vector
mydf

| Row | ID | y | m1 | m2 |
| --- | --- | --- | --- | --- |
| | String | Float64 | Float64 | Float64 |
| 1 | ind1 | 0.917462 | 0.0 | 2.0 |
| 2 | ind2 | 1.1245 | 1.0 | 2.0 |
| 3 | ind3 | 1.1446 | 2.0 | 1.0 |
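
The difference between `!` and `:` in sections 1.6 and 1.7 can be checked with the identity operator `===`. The following is a minimal sketch on a small made-up data frame; `df`, `stored`, `copied`, and `new_ids` are illustrative names that do not appear elsewhere in this notebook.

```julia
using DataFrames

df = DataFrame(ID=["a", "b", "c"], y=[1.0, 2.0, 3.0])

stored = df[!, :ID]   # the column vector stored in df (no copy)
copied = df[:, :ID]   # a freshly allocated copy of the column

stored === df.ID      # true: the same object
copied === df.ID      # false: a different object with equal values

new_ids = ["x", "y", "z"]

df[:, :ID] = new_ids  # writes the values into the existing column in place
df.ID === stored      # true: still the original vector, now holding the new values

df[!, :ID] = new_ids  # replaces the column with new_ids itself
df.ID === new_ids     # true: the column is now the new vector
```

In practice, `!` avoids an allocation but shares memory with the data frame, while `:` is the safer choice when the result will be modified independently.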


## 1.8 Indexing

In [22]:
mydf[1:2,:y] # returns a vector

2-element Vector{Float64}:
 0.9174624888621485
 1.1245039508348154

In [23]:
mydf[1:2,2] # returns a vector

2-element Vector{Float64}:
 0.9174624888621485
 1.1245039508348154

In [24]:
mydf[1:2,[:y]] #returns a data frame object

| Row | y |
| --- | --- |
| | Float64 |
| 1 | 0.917462 |
| 2 | 1.1245 |


In [25]:
mydf[1:2,[2]] #returns a data frame object

| Row | y |
| --- | --- |
| | Float64 |
| 1 | 0.917462 |
| 2 | 1.1245 |


In [26]:
mydf[1:2,[1,3]] #returns a data frame object

| Row | ID | m1 |
| --- | --- | --- |
| | String | Float64 |
| 1 | ind1 | 0.0 |
| 2 | ind2 | 1.0 |


In [27]:
mydf[1:2,end] # end refers to the last column (here m2)

2-element Vector{Float64}:
 2.0
 2.0

## 1.9 DataFrame to Matrix

In [28]:
Matrix(mydf[:,[:m1,:m2]])

3×2 Matrix{Float64}:
 0.0  2.0
 1.0  2.0
 2.0  1.0

## 1.10 Get basic information about a `DataFrame`

In [31]:
size(mydf)

(3, 4)

In [32]:
size(mydf,1)

3

In [33]:
size(mydf,2)

4

In [34]:
nrow(mydf)

3

In [35]:
ncol(mydf)

4

In [36]:
first(mydf, 2) #show the first 2 rows

| Row | ID | y | m1 | m2 |
| --- | --- | --- | --- | --- |
| | String | Float64 | Float64 | Float64 |
| 1 | ind1 | 0.917462 | 0.0 | 2.0 |
| 2 | ind2 | 1.1245 | 1.0 | 2.0 |


In [37]:
last(mydf, 3) #show the last 3 rows

| Row | ID | y | m1 | m2 |
| --- | --- | --- | --- | --- |
| | String | Float64 | Float64 | Float64 |
| 1 | ind1 | 0.917462 | 0.0 | 2.0 |
| 2 | ind2 | 1.1245 | 1.0 | 2.0 |
| 3 | ind3 | 1.1446 | 2.0 | 1.0 |
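
Besides the size and the first or last rows, the column names and element types are often useful. The sketch below uses `names`, `eachcol`, and `describe` from DataFrames.jl on a small made-up data frame (`df` is an illustrative name); these calls are not part of the original notebook run.

```julia
using DataFrames

df = DataFrame(ID=["ind1", "ind2", "ind3"], m1=[0.0, 1.0, 2.0], m2=[2.0, 2.0, 1.0])

names(df)             # column names as a vector of strings
eltype.(eachcol(df))  # element type of each column
describe(df)          # per-column summary returned as a data frame
```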
