<a href="https://colab.research.google.com/github/xKDR/Julia-Workshop/blob/main/DataStructuresForSpeed.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <img src="https://github.com/JuliaLang/julia-logo-graphics/raw/master/images/julia-logo-color.png" height="100" /> _Colab Notebook Template_

## Instructions
1. Work on a copy of this notebook: _File_ > _Save a copy in Drive_ (you will need a Google account). Alternatively, you can download the notebook using _File_ > _Download .ipynb_, then upload it to [Colab](https://colab.research.google.com/).
2. If you need a GPU: _Runtime_ > _Change runtime type_ > _Hardware accelerator_ = _GPU_.
3. Execute the following cell (click on it and press Ctrl+Enter) to install Julia, IJulia and other packages (if needed, update `JULIA_VERSION` and the other parameters). This takes a couple of minutes.
4. Reload this page (press Ctrl+R, or ⌘+R, or the F5 key) and continue to the next section.

_Notes_:
* If your Colab Runtime gets reset (e.g., due to inactivity), repeat steps 2, 3 and 4.
* After installation, if you want to change the Julia version or activate/deactivate the GPU, you will need to reset the Runtime: _Runtime_ > _Factory reset runtime_ and repeat steps 3 and 4.

In [None]:
%%shell
set -e

#---------------------------------------------------#
JULIA_VERSION="1.10.4" # any version ≥ 0.7.0
JULIA_PACKAGES="IJulia BenchmarkTools"
JULIA_PACKAGES_IF_GPU="CUDA" # or CuArrays for older Julia versions
JULIA_NUM_THREADS=2
#---------------------------------------------------#

if [ -z `which julia` ]; then
  # Install Julia
  JULIA_VER=`cut -d '.' -f -2 <<< "$JULIA_VERSION"`
  echo "Installing Julia $JULIA_VERSION on the current Colab Runtime..."
  BASE_URL="https://julialang-s3.julialang.org/bin/linux/x64"
  URL="$BASE_URL/$JULIA_VER/julia-$JULIA_VERSION-linux-x86_64.tar.gz"
  wget -nv $URL -O /tmp/julia.tar.gz # -nv means "not verbose"
  tar -x -f /tmp/julia.tar.gz -C /usr/local --strip-components 1
  rm /tmp/julia.tar.gz

  # Install Packages
  nvidia-smi -L &> /dev/null && export GPU=1 || export GPU=0
  if [ $GPU -eq 1 ]; then
    JULIA_PACKAGES="$JULIA_PACKAGES $JULIA_PACKAGES_IF_GPU"
  fi
  for PKG in `echo $JULIA_PACKAGES`; do
    echo "Installing Julia package $PKG..."
    julia -e 'using Pkg; pkg"add '$PKG'; precompile;"' &> /dev/null
  done

  # Install kernel and rename it to "julia"
  echo "Installing IJulia kernel..."
  julia -e 'using IJulia; IJulia.installkernel("julia", env=Dict(
      "JULIA_NUM_THREADS"=>"'"$JULIA_NUM_THREADS"'"))'
  KERNEL_DIR=`julia -e "using IJulia; print(IJulia.kerneldir())"`
  KERNEL_NAME=`ls -d "$KERNEL_DIR"/julia*`
  mv -f $KERNEL_NAME "$KERNEL_DIR"/julia

  echo ''
  echo "Successfully installed `julia -v`!"
  echo "Please reload this page (press Ctrl+R, ⌘+R, or the F5 key) then"
  echo "jump to the 'Checking the Installation' section."
fi

Installing Julia 1.10.4 on the current Colab Runtime...
2024-10-23 05:56:43 URL:https://julialang-s3.julialang.org/bin/linux/x64/1.10/julia-1.10.4-linux-x86_64.tar.gz [173704015/173704015] -> "/tmp/julia.tar.gz" [1]
Installing Julia package IJulia...
Installing Julia package BenchmarkTools...


In [None]:
versioninfo()

Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 2 × Intel(R) Xeon(R) CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, broadwell)
Threads: 2 default, 0 interactive, 1 GC (on 2 virtual cores)
Environment:
  LD_LIBRARY_PATH = /usr/local/nvidia/lib:/usr/local/nvidia/lib64
  JULIA_NUM_THREADS = 2


In [None]:
using BenchmarkTools

M = rand(2^11, 2^11)

@btime $M * $M;

  555.310 ms (2 allocations: 32.00 MiB)


In [None]:
try
    using CUDA
catch
    println("No GPU found.")
else
    run(`nvidia-smi`)
    # Create a new random matrix directly on the GPU:
    M_on_gpu = CUDA.CURAND.rand(2^11, 2^11)
    @btime $M_on_gpu * $M_on_gpu; nothing
end

# Data structures for speed

Julia is among the fastest options for manipulating tabular data
structures. This session covers the basics of manipulating tabular
data with DataFrames.jl and time-series data with TSFrames.jl.

In [7]:
using Pkg
Pkg.add("DataFrames")
Pkg.add("TSFrames")
Pkg.add("RDatasets")
Pkg.add("CSV")
Pkg.add("MarketData")
Pkg.add("Impute")

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.10/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.10/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.10/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.10/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.10/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.10/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.10/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.10/Manifest.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.10/Project.toml`
[32m[1m  No Changes[22m[39m to

In [8]:
using DataFrames

In [9]:
df = DataFrame([])
df = DataFrame(a=[1,2], b=[2,3])

 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      2
   2 │     2      3


In [11]:
using CSV
aapl_df = CSV.read("aapl.csv", DataFrame)

 Row │ timestamp   Open      High      Low       Close     AdjClose   Volume
     │ Date        Float64   Float64   Float64   Float64   Float64    Float64
─────┼─────────────────────────────────────────────────────────────────────────
   1 │ 1980-12-12  0.128348  0.128906  0.128348  0.128348  0.0988345  4.69034e8
   2 │ 1980-12-15  0.12221   0.12221   0.121652  0.121652  0.0936782  1.75885e8
   3 │ 1980-12-16  0.113281  0.113281  0.112723  0.112723  0.0868024  1.05728e8
   4 │ 1980-12-17  0.115513  0.116071  0.115513  0.115513  0.0889509  8.64416e7
   5 │ 1980-12-18  0.118862  0.11942   0.118862  0.118862  0.0915298  7.34496e7
   6 │ 1980-12-19  0.126116  0.126674  0.126116  0.126116  0.0971157  4.86304e7
   7 │ 1980-12-22  0.132254  0.132813  0.132254  0.132254  0.101842   3.73632e7
   8 │ 1980-12-23  0.137835  0.138393  0.137835  0.137835  0.10614    4.69504e7
   9 │ 1980-12-24  0.145089  0.145647  0.145089  0.145089  0.111726   4.80032e7
  10 │ 1980-12-26  0.158482  0.15904   0.158482  0.158482  0.122039   5.55744e7


In [12]:
## Pkg.add("MySQL")
## Pkg.add("JSON")

In [26]:
using RDatasets
iris = dataset("datasets", "iris")

 Row │ SepalLength  SepalWidth  PetalLength  PetalWidth  Species
     │ Float64      Float64     Float64      Float64     Cat…
─────┼───────────────────────────────────────────────────────────
   1 │         5.1         3.5          1.4         0.2  setosa
   2 │         4.9         3.0          1.4         0.2  setosa
   3 │         4.7         3.2          1.3         0.2  setosa
   4 │         4.6         3.1          1.5         0.2  setosa
   5 │         5.0         3.6          1.4         0.2  setosa
   6 │         5.4         3.9          1.7         0.4  setosa
   7 │         4.6         3.4          1.4         0.3  setosa
   8 │         5.0         3.4          1.5         0.2  setosa
   9 │         4.4         2.9          1.4         0.2  setosa
  10 │         4.9         3.1          1.5         0.1  setosa


In [27]:
describe(iris)

 Row │ variable     mean     min     median  max        nmissing  eltype
     │ Symbol       Union…   Any     Union…  Any        Int64     DataType
─────┼─────────────────────────────────────────────────────────────────────────────────────────────
   1 │ SepalLength  5.84333  4.3     5.8     7.9               0  Float64
   2 │ SepalWidth   3.05733  2.0     3.0     4.4               0  Float64
   3 │ PetalLength  3.758    1.0     4.35    6.9               0  Float64
   4 │ PetalWidth   1.19933  0.1     1.3     2.5               0  Float64
   5 │ Species               setosa          virginica         0  CategoricalValue{String, UInt8}


In [28]:
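first(iris)                           # first row (a DataFrameRow)
first(iris, 10)                       # first 10 rows
last(iris, 10)                        # last 10 rows
iris[1, :]                            # row 1, all columns
iris[:, 1]                            # column 1, returned as a copy
iris[!, 1]                            # column 1, without copying
iris[!, [1, 2]]                       # columns 1 and 2 as a DataFrame
iris[!, :SepalLength]                 # column selected by name
iris[!, [:SepalLength, :SepalWidth]]  # several columns selected by name
iris.SepalLength                      # same as iris[!, :SepalLength]
iris.SepalWidth                       # only the last expression's value is displayed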

150-element Vector{Float64}:
 3.5
 3.0
 3.2
 3.1
 3.6
 3.9
 3.4
 3.4
 2.9
 3.1
 3.7
 3.4
 3.0
 ⋮
 3.0
 3.1
 3.1
 3.1
 2.7
 3.2
 3.3
 3.0
 2.5
 3.0
 3.4
 3.0
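
The difference between `iris[:, 1]` and `iris[!, 1]` in the cell above is easy to miss: `:` returns a copy of the column, while `!` returns the stored vector itself. A minimal sketch of the consequence, using a small hypothetical table `small` (not part of the workshop data):

```julia
using DataFrames

# Hypothetical two-column table, used only for illustration.
small = DataFrame(a = [1, 2, 3], b = [10, 20, 30])

copied = small[:, :a]   # `:` copies the column
shared = small[!, :a]   # `!` returns the stored vector, no copy

copied[1] = 100         # changes only the copy
shared[2] = 200         # changes the column inside `small`

println(small.a)        # [1, 200, 3]
```

Reach for `!` when the copy would be wasteful, and for `:` when the result must not alias the table.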

In [29]:
iris[!, r"Sepal"]

iris[!, Not(r"Sepal")]
iris[!, Not(:SepalLength)]

iris[!, Between(:SepalWidth, :PetalWidth)]
iris[!, Between(2, 4)]

 Row │ SepalWidth  PetalLength  PetalWidth
     │ Float64     Float64      Float64
─────┼─────────────────────────────────────
   1 │        3.5          1.4         0.2
   2 │        3.0          1.4         0.2
   3 │        3.2          1.3         0.2
   4 │        3.1          1.5         0.2
   5 │        3.6          1.4         0.2
   6 │        3.9          1.7         0.4
   7 │        3.4          1.4         0.3
   8 │        3.4          1.5         0.2
   9 │        2.9          1.4         0.2
  10 │        3.1          1.5         0.1


In [32]:
iris[!, Cols(r"Petal")]

 Row │ PetalLength  PetalWidth
     │ Float64      Float64
─────┼─────────────────────────
   1 │         1.4         0.2
   2 │         1.4         0.2
   3 │         1.3         0.2
   4 │         1.5         0.2
   5 │         1.4         0.2
   6 │         1.7         0.4
   7 │         1.4         0.3
   8 │         1.5         0.2
   9 │         1.4         0.2
  10 │         1.5         0.1


In [33]:
iris[iris.SepalLength .> 4, :]
iris[iris.Species .== "setosa", :]
iris[(iris.SepalLength .> 4) .& (iris.PetalLength .> 3), :]

DataFrames.subset(iris,
                    :SepalLength => s -> s .> 4,
                    :PetalLength => p -> p .> 3)

DataFrames.subset(iris, :Species => s -> s .== "setosa")

iriscopy = copy(iris)
DataFrames.subset!(iriscopy, :Species => s -> s .== "setosa")
nrow(iris)
nrow(iriscopy)

50

In [34]:
select(iris, Not(:SepalLength))
select(iris, :SepalLength => s -> s * 2)
select(iris, :SepalLength => s -> s * 2, :SepalWidth)
select(iris, :SepalLength => s -> s * 2, [:SepalLength, :SepalWidth] => ((x, y) -> x .+ y) => :X) ## X is the row-wise sum of the two columns
select(iris, :SepalLength => :S1, :SepalWidth => :S2) ## Rename columns
#select!(iris, :SepalLength => :S1, :SepalWidth => :S2) ## Don't copy columns

 Row │ S1       S2
     │ Float64  Float64
─────┼──────────────────
   1 │     5.1      3.5
   2 │     4.9      3.0
   3 │     4.7      3.2
   4 │     4.6      3.1
   5 │     5.0      3.6
   6 │     5.4      3.9
   7 │     4.6      3.4
   8 │     5.0      3.4
   9 │     4.4      2.9
  10 │     4.9      3.1


In [35]:
transform(iris, Not(:SepalLength))
transform(iris, Not(:SepalLength)) == select(iris, Not(:SepalLength)) # false: transform keeps every original column
transform(iris, :SepalLength => s -> s * 2) # adds a column named SepalLength_function
transform(iris, :SepalLength => (s -> s * 2) => :SepalLength2) # adds a column with an explicit name
transform(iris, :SepalLength => s -> s * 2, [:SepalLength, :SepalWidth] => ((x, y) -> x .+ y) => :X) # X is the row-wise sum

 Row │ SepalLength  SepalWidth  PetalLength  PetalWidth  Species  SepalLength_function  X
     │ Float64      Float64     Float64      Float64     Cat…     Float64               Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────────
   1 │         5.1         3.5          1.4         0.2  setosa                   10.2      8.6
   2 │         4.9         3.0          1.4         0.2  setosa                    9.8      7.9
   3 │         4.7         3.2          1.3         0.2  setosa                    9.4      7.9
   4 │         4.6         3.1          1.5         0.2  setosa                    9.2      7.7
   5 │         5.0         3.6          1.4         0.2  setosa                   10.0      8.6
   6 │         5.4         3.9          1.7         0.4  setosa                   10.8      9.3
   7 │         4.6         3.4          1.4         0.3  setosa                    9.2      8.0
   8 │         5.0         3.4          1.5         0.2  setosa                   10.0      8.4
   9 │         4.4         2.9          1.4         0.2  setosa                    8.8      7.3
  10 │         4.9         3.1          1.5         0.1  setosa                    9.8      8.0


In [36]:
combine(iris, :SepalLength .=> sum)
combine(iris, Not(:Species) .=> sum)
combine(iris, :SepalLength => x -> sum(x * 10))

 Row │ SepalLength_function
     │ Float64
─────┼──────────────────────
   1 │               8765.0


In [37]:
df = DataFrame(x=[1, 2, missing], y=[1, missing, missing])
combine(df, All() .=> x -> x * 10)
combine(df, All() .=> x -> sum(x * 10))
combine(df, All() .=> x -> sum(skipmissing(x * 10)))

 Row │ x_function  y_function
     │ Int64       Int64
─────┼────────────────────────
   1 │         30          10
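
The three `combine` calls above differ only in how `missing` is handled. The same behaviour shows up on a plain vector, without DataFrames. A small sketch using only Base Julia, with the values of column `x` from `df`:

```julia
# Same values as column `x` of `df` above.
x = [1, 2, missing]

scaled = x * 10                          # missing propagates element-wise
total = sum(scaled)                      # one missing makes the whole sum missing
clean_total = sum(skipmissing(scaled))   # drop missing values before summing

println(total)        # missing
println(clean_total)  # 30
```

This is why only the `skipmissing` variant yields the numeric totals shown in the output.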


In [38]:
gd = groupby(iris, :Species)
combine(gd, :SepalLength => sum)
combine(gd, Not(:Species) .=> sum)
combine(gd, Not(:Species) .=> sum, DataFrames.nrow)
using Statistics
combine(gd, Not(:Species) .=> mean, DataFrames.nrow)

combine(gd, AsTable([:SepalLength, :PetalLength]) => ByRow((x) -> x[1] / x[2]) => :Ratio)

 Row │ Species  Ratio
     │ Cat…     Float64
─────┼──────────────────
   1 │ setosa   3.64286
   2 │ setosa   3.5
   3 │ setosa   3.61538
   4 │ setosa   3.06667
   5 │ setosa   3.57143
   6 │ setosa   3.17647
   7 │ setosa   3.28571
   8 │ setosa   3.33333
   9 │ setosa   3.14286
  10 │ setosa   3.26667
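
`groupby` followed by `combine` is the split-apply-combine pattern: rows are split by the grouping column, a function is applied to each group, and the per-group results are stacked into one table. A minimal sketch on a hypothetical table `toy` (the names `toy`, `g`, `v` and `total` are made up for illustration):

```julia
using DataFrames

# Hypothetical data: two groups, "a" and "b".
toy = DataFrame(g = ["a", "b", "a", "b"], v = [1, 2, 3, 4])

# One output row per group; `:v => sum => :total` names the result column.
totals = combine(groupby(toy, :g), :v => sum => :total)

println(totals.g)      # ["a", "b"]
println(totals.total)  # [4, 6]
```

When the function returns one value per row instead (as `ByRow` does in the last call above), `combine` keeps one output row per input row, which is why the `Ratio` table still lists individual `setosa` rows.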


In [39]:
using TSFrames
ts = TSFrame(1:10)
ts = TSFrame(1:10, 2301:2310)



10×1 TSFrame with Int64 Index
 Index  x1
 Int64  Int64
──────────────
  2301      1
  2302      2
  2303      3
  2304      4
  2305      5
  2306      6
  2307      7
  2308      8
  2309      9
  2310     10

In [41]:
using MarketData
aapl_df = DataFrame(MarketData.yahoo(:AAPL))
aapl_ts = TSFrame(MarketData.yahoo(:AAPL))
aapl_ts = CSV.read("aapl.csv", TSFrame)

11088×6 TSFrame with Date Index
 Index       Open        High        Low         Close       AdjClose     Volume
 Date        Float64     Float64     Float64     Float64     Float64      Float64
────────────────────────────────────────────────────────────────────────────────────
 1980-12-12    0.128348    0.128906    0.128348    0.128348    0.0988345  4.69034e8
 1980-12-15    0.12221     0.12221     0.121652    0.121652    0.0936782  1.75885e8
 1980-12-16    0.113281    0.113281    0.112723    0.112723    0.0868024  1.05728e8
 1980-12-17    0.115513    0.116071    0.115513    0.115513    0.0889509  8.64416e7
 1980-12-18    0.118862    0.11942     0.118862    0.118862    0.0915298  7.34496e7
 1980-12-19    0.126116    0.126674    0.126116    0.126116    0.0971157  4.86304e7
 1980-12-22    0.132254    0.132813    0.132254    0.132254    0.101842   3.73632e

In [43]:
nr(aapl_ts)
nc(aapl_ts)
size(aapl_ts)
length(aapl_ts)
names(aapl_ts)
index(aapl_ts)
TSFrames.describe(aapl_ts)

 Row │ variable  mean         min         median       max         nmissing  eltype
     │ Symbol    Union…       Any         Union…       Any         Int64     DataType
─────┼────────────────────────────────────────────────────────────────────────────────
   1 │ Index                  1980-12-12               2024-12-05         0  Date
   2 │ Open      23.953       0.049665    0.540179     244.05             0  Float64
   3 │ High      24.2085      0.049665    0.549107     244.54             0  Float64
   4 │ Low       23.7092      0.049107    0.53125      242.23             0  Float64
   5 │ Close     23.9704      0.049107    0.540179     243.01             0  Float64
   6 │ AdjClose  23.1336      0.0378149   0.442486     243.01             0  Float64
   7 │ Volume    315893000.0  0.0         203896000.0  7.42164e9          0  Float64


In [44]:
aapl_ts[1]
aapl_ts[2, 1]
aapl_ts[2, [1]]
aapl_ts[[2, 3], [1, 2, 3, 4]]
aapl_ts[[2, 3], [:Open, :High, :Low, :Close]]
aapl_ts.Open


11088-element Vector{Float64}:
   0.1283479928970337
   0.12221000343561172
   0.1132809966802597
   0.11551299691200256
   0.11886200308799744
   0.12611599266529083
   0.1322540044784546
   0.13783499598503113
   0.14508900046348572
   0.15848200023174286
   0.16071400046348572
   0.15736599266529083
   0.1529020071029663
   ⋮
 226.97999572753906
 228.05999755859375
 228.8800048828125
 228.05999755859375
 231.4600067138672
 233.3300018310547
 234.47000122070312
 234.80999755859375
 237.27000427246094
 239.80999755859375
 242.8699951171875
 244.0500030517578

In [45]:
using Dates  # provides Date, Year, Month, Quarter and Week
aapl_ts[Date(2007, 1, 10)]
aapl_ts[Date(2007, 1, 10), [:Open, :High, :Low, :Close]]
aapl_ts[Year(2007), Month(1)]
aapl_ts[Year(2007), Month(1)][:, [:Open, :High, :Low, :Close]]
aapl_ts[Year(2007), Quarter(1)][:, [:Open, :High, :Low, :Close]]

61×4 TSFrame with Date Index
 Index       Open     High     Low      Close
 Date        Float64  Float64  Float64  Float64
────────────────────────────────────────────────
 2007-01-03  3.08179  3.09214  2.925    2.99286
 2007-01-04  3.00179  3.06964  2.99357  3.05929
 2007-01-05  3.06321  3.07857  3.01429  3.0375
 2007-01-08  3.07     3.09036  3.04571  3.0525
 2007-01-09  3.0875   3.32071  3.04107  3.30607
 2007-01-10  3.38393  3.49286  3.3375   3.46429
 2007-01-11  3.42643  3.45643  3.39643  3.42143
 2007-01-12  3.37821  3.395    3.32964  3.37929
 2007-01-16  3.41714  3.47321  3.40893  3.46786
 2007-01-17  3.48429  3.48571  3.38643  3.39107
 2007-01-18  3.28929  3.28964  3.18036  3.18107
     ⋮          ⋮        ⋮        ⋮        ⋮
 2007-03-19  3.22286  3.26964  3.19964  3.25464
 2007-03-20  3.2625   3.28     3.25214  3.26714
 2007-03-21  3.28536  3.35714  3.27321  3.3525
 2007-03-22  3.34

In [None]:
# Pkg.add("Plots")
# using Plots
# plot(aapl_ts, [:AdjClose])

In [46]:
aapl_monthly = apply(aapl_ts, Month(1), last)
aapl_weekly = apply(aapl_ts, Week(1), Statistics.std)
aapl_weekly = apply(aapl_ts, Week(1), Statistics.std, last)

2296×6 TSFrame with Date Index
 Index       Open_std      High_std      Low_std       Close_std     AdjClose_std  Volume_std
 Date        Float64       Float64       Float64       Float64       Float64       Float64
───────────────────────────────────────────────────────────────────────────────────────────────
 1980-12-12  NaN           NaN           NaN           NaN           NaN           NaN
 1980-12-19    0.00513892    0.00522604    0.00522604    0.00522604    0.00402431    4.82168e7
 1980-12-26    0.0113361     0.0113358     0.0113361     0.0113361     0.00872939    7.46981e6
 1981-01-02    0.0035291     0.00356931    0.00365905    0.00365905    0.00281765    3.23055e7
 1981-01-09    0.00601539    0.00601797    0.00601797    0.00601797    0.00463415    1.25715e7
 1981-01-16    0.00231418    0.0023209     0.00205022    0.00205022    0.00157877    5.

In [47]:
ibm_ts = TSFrame(MarketData.yahoo(:IBM))

13575×6 TSFrame with Date Index
 Index       Open      High      Low       Close     AdjClose   Volume
 Date        Float64   Float64   Float64   Float64   Float64    Float64
──────────────────────────────────────────────────────────────────────────────
 1971-02-08   16.0851   16.336    15.8939   16.2882    3.49648  719648.0
 1971-02-09   16.2882   16.3121   16.1687   16.1807    3.47339  673624.0
 1971-02-10   16.1568   16.1568   15.9775   16.1209    3.46057  648520.0
 1971-02-11   16.1209   16.2285   16.097    16.1926    3.47596  579484.0
 1971-02-12   16.1926   16.2285   16.1329   16.2285    3.48365  382836.0
 1971-02-16   16.2285   16.4197   16.1926   16.3301    3.50546  684084.0
 1971-02-17   16.2703   16.2703   16.0492   16.0851    3.45287  652704.0
 1971-02-18   16.0851   16.1209   15.7385   15.7385    3.37848  822156.0
 1971-02-19   15.738

In [48]:
date_from = Date(2021, 06, 01);
date_to = Date(2021, 12, 31);
ibm = TSFrames.subset(ibm_ts, date_from, date_to)
aapl = TSFrames.subset(aapl_ts, date_from, date_to)

150×6 TSFrame with Date Index
 Index       Open     High     Low      Close    AdjClose  Volume
 Date        Float64  Float64  Float64  Float64  Float64   Float64
─────────────────────────────────────────────────────────────────────
 2021-06-01   125.08   125.35   123.94   124.28   121.916  6.76371e7
 2021-06-02   124.28   125.24   124.05   125.06   122.681  5.92789e7
 2021-06-03   124.68   124.85   123.13   123.54   121.19   7.62292e7
 2021-06-04   124.07   126.16   123.85   125.89   123.496  7.51693e7
 2021-06-07   126.17   126.32   124.83   125.9    123.505  7.10576e7
 2021-06-08   126.6    128.46   126.21   126.74   124.329  7.44038e7
 2021-06-09   127.21   127.75   126.52   127.13   124.712  5.68779e7
 2021-06-10   127.02   128.19   125.94   126.11   123.711  7.11864e7
 2021-06-11   126.53   127.44   126.1    127.35   124.928  5.35224e7
 2021-06-14 

In [49]:
ibm_aapl = TSFrames.join(ibm[:, ["AdjClose"]], aapl[:, ["AdjClose"]]; jointype = :JoinBoth)

150×2 TSFrame with Date Index
 Index       AdjClose  AdjClose_1
 Date        Float64   Float64
──────────────────────────────────
 2021-06-01   117.809     121.916
 2021-06-02   119.06      122.681
 2021-06-03   118.921     121.19
 2021-06-04   120.448     123.496
 2021-06-07   120.939     123.505
 2021-06-08   121.797     124.329
 2021-06-09   123.104     124.712
 2021-06-10   122.998     123.711
 2021-06-11   123.602     124.928
 2021-06-14   122.581     127.998
 2021-06-15   122.034     127.174
     ⋮          ⋮          ⋮
 2021-12-17   111.603     168.382
 2021-12-20   111.305     167.014
 2021-12-21   112.978     170.202
 2021-12-22   113.661     172.809
 2021-12-23   114.432     173.439
 2021-12-27   115.299     177.424
 2021-12-28   116.184     176.4
 2021-12-29   116.815     176.489
 2021-12-30   117.305     175.328
 2021-12-31   117.086     174.708
                  129 rows omitted

In [50]:
TSFrames.rename!(ibm_aapl, [:IBM, :AAPL])

150×2 TSFrame with Date Index
 Index       IBM      AAPL
 Date        Float64  Float64
──────────────────────────────
 2021-06-01  117.809  121.916
 2021-06-02  119.06   122.681
 2021-06-03  118.921  121.19
 2021-06-04  120.448  123.496
 2021-06-07  120.939  123.505
 2021-06-08  121.797  124.329
 2021-06-09  123.104  124.712
 2021-06-10  122.998  123.711
 2021-06-11  123.602  124.928
 2021-06-14  122.581  127.998
 2021-06-15  122.034  127.174
     ⋮          ⋮        ⋮
 2021-12-17  111.603  168.382
 2021-12-20  111.305  167.014
 2021-12-21  112.978  170.202
 2021-12-22  113.661  172.809
 2021-12-23  114.432  173.439
 2021-12-27  115.299  177.424
 2021-12-28  116.184  176.4
 2021-12-29  116.815  176.489
 2021-12-30  117.305  175.328
 2021-12-31  117.086  174.708
              129 rows omitted

In [51]:
using Impute
ibm_aapl = ibm_aapl |> Impute.locf()

150×2 TSFrame with Date Index
 Index       IBM      AAPL
 Date        Float64  Float64
──────────────────────────────
 2021-06-01  117.809  121.916
 2021-06-02  119.06   122.681
 2021-06-03  118.921  121.19
 2021-06-04  120.448  123.496
 2021-06-07  120.939  123.505
 2021-06-08  121.797  124.329
 2021-06-09  123.104  124.712
 2021-06-10  122.998  123.711
 2021-06-11  123.602  124.928
 2021-06-14  122.581  127.998
 2021-06-15  122.034  127.174
     ⋮          ⋮        ⋮
 2021-12-17  111.603  168.382
 2021-12-20  111.305  167.014
 2021-12-21  112.978  170.202
 2021-12-22  113.661  172.809
 2021-12-23  114.432  173.439
 2021-12-27  115.299  177.424
 2021-12-28  116.184  176.4
 2021-12-29  116.815  176.489
 2021-12-30  117.305  175.328
 2021-12-31  117.086  174.708
              129 rows omitted

In [52]:
ibm_aapl_weekly = to_weekly(ibm_aapl)

31×2 TSFrame with Date Index
 Index       IBM      AAPL
 Date        Float64  Float64
──────────────────────────────
 2021-06-04  120.448  123.496
 2021-06-11  123.602  124.928
 2021-06-18  116.935  127.979
 2021-06-25  119.975  130.578
 2021-07-02  114.402  137.298
 2021-07-09  115.628  142.35
 2021-07-16  113.487  143.606
 2021-07-23  115.481  145.734
 2021-07-30  115.17   143.086
 2021-08-06  117.728  143.575
 2021-08-13  118.331  146.483
     ⋮          ⋮        ⋮
 2021-10-29  103.389  147.171
 2021-11-05  106.857  148.842
 2021-11-12  104.209  147.573
 2021-11-19  101.66   157.962
 2021-11-26  101.45   154.283
 2021-12-03  104.104  159.232
 2021-12-10  108.703  176.558
 2021-12-17  111.603  168.382
 2021-12-23  114.432  173.439
 2021-12-31  117.086  174.708
               10 rows omitted
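
The weekly table keeps the last available observation of each week, which is why its index lists Fridays, or an earlier day such as 2021-12-23 when the daily data has no Friday row. The sketch below reproduces that idea on the first nine IBM rows printed earlier, using only the `Dates` standard library. It illustrates the concept and is not how TSFrames implements `to_weekly`; all variable names are made up.

```julia
using Dates

# First nine daily IBM values from the joined table above
# (2021-06-01 to 2021-06-11, trading days only).
dates = Date(2021, 6, 1) .+ Day.([0, 1, 2, 3, 6, 7, 8, 9, 10])
ibm_daily = [117.809, 119.06, 118.921, 120.448, 120.939,
             121.797, 123.104, 122.998, 123.602]

# Label every date with the Sunday that closes its week.
week_end = lastdayofweek.(dates)

# For each week, keep the position of its last observation.
last_idx = [findlast(==(w), week_end) for w in unique(week_end)]

weekly_dates = dates[last_idx]
weekly_ibm = ibm_daily[last_idx]

println(weekly_dates)
println(weekly_ibm)    # [120.448, 123.602]
```

The two values match the first two IBM entries of the `to_weekly` output (2021-06-04 and 2021-06-11).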

In [53]:
ibm_aapl_weekly_returns = diff(log.(ibm_aapl_weekly))

31×2 TSFrame with Date Index
 Index       IBM_log            AAPL_log
 Date        Float64?           Float64?
─────────────────────────────────────────────────
 2021-06-04  missing            missing
 2021-06-11        0.0258466          0.0115307
 2021-06-18       -0.055449           0.0241275
 2021-06-25        0.0256603          0.0201091
 2021-07-02       -0.0475583          0.0501808
 2021-07-09        0.0106556          0.0361354
 2021-07-16       -0.0186868          0.00878227
 2021-07-23        0.0174143          0.0147146
 2021-07-30       -0.00269226        -0.0183416
 2021-08-06        0.0219619          0.00341488
 2021-08-13        0.00511147         0.0200523
     ⋮               ⋮                 ⋮
 2021-10-29       -0.0219787          0.00743724
 2021-11-05        0.0329914          0.01129
 2021-11-12       -0.0250883         -0.00856391
 2021-11-19       -0.024766

In [55]:
TSFrames.rename!(ibm_aapl_weekly_returns, [:IBM, :AAPL])

31×2 TSFrame with Date Index
 Index       IBM                AAPL
 Date        Float64?           Float64?
─────────────────────────────────────────────────
 2021-06-04  missing            missing
 2021-06-11        0.0258466          0.0115307
 2021-06-18       -0.055449           0.0241275
 2021-06-25        0.0256603          0.0201091
 2021-07-02       -0.0475583          0.0501808
 2021-07-09        0.0106556          0.0361354
 2021-07-16       -0.0186868          0.00878227
 2021-07-23        0.0174143          0.0147146
 2021-07-30       -0.00269226        -0.0183416
 2021-08-06        0.0219619          0.00341488
 2021-08-13        0.00511147         0.0200523
     ⋮               ⋮                 ⋮
 2021-10-29       -0.0219787          0.00743724
 2021-11-05        0.0329914          0.01129
 2021-11-12       -0.0250883         -0.00856391
 2021-11-19       -0.024766

In [56]:
ibm_std = std(skipmissing(ibm_aapl_weekly_returns[:, :IBM]))

0.034079281937703855
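
The last cells compute weekly log returns, that is `log(P_t) - log(P_{t-1})`, and then their standard deviation. A sketch of the same arithmetic on the first three weekly IBM prices printed above, using plain vectors and the `Statistics` standard library (the variable names are illustrative):

```julia
using Statistics

# First three weekly IBM prices from the to_weekly output above.
prices = [120.448, 123.602, 116.935]

# Log return: difference of consecutive log prices.
returns = diff(log.(prices))

println(round.(returns, digits = 4))  # [0.0258, -0.0554]
println(std(returns))                 # sample standard deviation of the two returns
```

Unlike `diff` on a TSFrame, which keeps the first row as `missing`, `diff` on a vector returns one element fewer, so no `skipmissing` is needed here.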

# References

- Working with DataFrames: https://dataframes.juliadata.org/stable/man/working_with_dataframes/
- DataFrames API reference: https://dataframes.juliadata.org/stable/lib/functions/
- TSFrames user guide: https://xkdr.github.io/TSFrames.jl/stable/user_guide/
- Basic demo of TSFrames: https://xkdr.github.io/TSFrames.jl/stable/demo_finance/

Add new code cells by clicking the `+ Code` button (or _Insert_ > _Code cell_).

Have fun!

<img src="https://raw.githubusercontent.com/JuliaLang/julia-logo-graphics/master/images/julia-logo-mask.png" height="100" />