## Here let's try to get some data from the Alfvén modes and train a few autoencoders on them to see if they capture any structure in the latent layer

In [None]:
using AlfvenDetectors
using PyPlot
using Flux
using CuArrays  # for GPU runs
using ValueHistories
using BSON: @save, @load

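What we are doing is unsupervised training on columns of the magnitude squared coherence (MSC) time histograms: each column, i.e. the coherence over all frequencies at one time instant, is one training sample.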
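As a minimal sketch of what training on columns means: every column of the coherence matrix is treated as one independent sample, and a minibatch is simply a random subset of columns. The toy matrix and the helper `random_minibatch` are hypothetical and are not part of AlfvenDetectors, whose own batching may differ.

```julia
using Random

# toy stand-in for a coherence matrix: nfreq frequency bins x ntime time slices,
# with values in (0,1) like the magnitude squared coherence
rng = MersenneTwister(0)
nfreq, ntime = 8, 100
toy_msc = rand(rng, nfreq, ntime)

# hypothetical helper: draw `batchsize` random columns (time slices) as one minibatch
function random_minibatch(rng, X::AbstractMatrix, batchsize::Int)
    n = min(batchsize, size(X, 2))
    idx = randperm(rng, size(X, 2))[1:n]
    return X[:, idx]
end

batch = random_minibatch(rng, toy_msc, 64)
size(batch)  # one column per sample, one row per frequency bin
```

The time ordering of the columns is discarded here, which matches the unsupervised setting: the model only ever sees individual frequency profiles.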

### Collect the data

Use shots #10370 and #11096 and several coil pairs. Select only some time slices. The data is already normalized to the interval (0,1).

In [None]:
host = gethostname()
if occursin("vit", host)
    datapath = "/home/vit/vyzkum/alfven/cdb_data/original_data/"
else
    datapath = "/home/skvara/work/alfven/cdb_data/original_data/"
end

In [None]:
function get_msc_array(datapath, shot, coil, timelim = [1.0, 1.25])
    _data = AlfvenDetectors.BaseAlfvenData(joinpath(datapath,"$(shot).h5"), [coil])
    tinds = timelim[1] .<= _data.t .<= timelim[2]
    return _data.msc[coil][:,tinds], _data.t[tinds], _data.f 
end

In [None]:
msc, t, f = get_msc_array(datapath, 11096, 5)

In [None]:
pcolormesh(t,f,msc)

In [None]:
function collect_msc(datapath, shot, coils)
    datalist = map(x-> get_msc_array(datapath, shot, x), coils)
    return hcat([x[1] for x in datalist]...), datalist[1][3]
end

In [None]:
shots_coils = [
#    [10370, [12, 15, 17, 20]],
    [10370, [12, 20]],
#    [11096, [11, 8, 17, 20]]
    [11096, [11, 8, 20]]
]
datalist = map(x->collect_msc(datapath, x[1], x[2]), shots_coils)
data, f = hcat([x[1] for x in datalist]...), datalist[1][2]

In [None]:
pcolormesh(1:size(data,2), f, data)

### Now that we have the data, construct a VAE

A larger dimension of the middle layer is beneficial, but the improvement from 10 to 20 is much larger than from 20 to 200.

Reconstruction works even with zdim = 2 although there are some artifacts.
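The last encoder layer has `zdim*2` units because the encoder describes a distribution over the latent code rather than a single point. Below is a minimal sketch of how such an output is commonly turned into a latent sample with the reparametrization trick. The split into means and log-variances and the names `sample_latent` and `encoder_out` are assumptions made for illustration; AlfvenDetectors may parametrize the variance differently.

```julia
using Random

# assumption: the first zdim rows are means, the last zdim rows are log-variances
function sample_latent(rng, encoder_out::AbstractMatrix, zdim::Int)
    μ = encoder_out[1:zdim, :]
    logσ2 = encoder_out[zdim+1:2*zdim, :]
    ε = randn(rng, zdim, size(encoder_out, 2))   # noise drawn from N(0,1)
    return μ .+ exp.(0.5 .* logσ2) .* ε          # z = μ + σ * ε
end

rng = MersenneTwister(0)
zdim_demo = 2
encoder_out = randn(rng, 2 * zdim_demo, 5)       # fake encoder output for 5 samples
z = sample_latent(rng, encoder_out, zdim_demo)
size(z)  # one zdim-dimensional code per input column
```

Because the noise enters only through `ε`, the sample stays differentiable with respect to the encoder output, which is what makes gradient-based training possible.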

In [None]:
M,N = size(data)
# fortunately data is already normalized in the interval (0,1)
zdim = 2
small_model = AlfvenDetectors.VAE([M, 20, zdim*2], [zdim, 20, M])
large_model = AlfvenDetectors.VAE([M, 200, zdim*2], [zdim, 200, M])
small_train_history = MVHistory()
large_train_history = MVHistory()
batchsize = 64
nepochs = 200
cbit = 1
# progress bars are broken in notebooks, so be verbose only when run as a script
verb = occursin(".jl", @__FILE__)
# VAE specific settings
L = 1
β = 0.01

In [None]:
@info "Training small CPU model"
@time AlfvenDetectors.fit!(small_model, data, batchsize, 1;
    β = β, L = L,
    cbit = cbit, history = small_train_history, verb = verb)
@time AlfvenDetectors.fit!(small_model, data, batchsize, nepochs-1;
    β = β, L = L,
    cbit = cbit, history = small_train_history, verb = verb)

Doing a fast run saves about 20% of the allocations, which is still more than twice as many as for the plain autoencoder (AE). Setting $\beta$ to large values around 1 is detrimental to the reconstruction. However, it does not harm the clustering in the latent space, which still shows even when more weight is put on the KL term. The generated samples also look more realistic.
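The role of $\beta$ can be illustrated with a minimal sketch of a weighted VAE objective: a reconstruction term plus $\beta$ times the KL divergence between the encoder distribution and $N(0,1)$. The function names are hypothetical, a squared error stands in for the negative log-likelihood, and the exact loss used inside `AlfvenDetectors.fit!` may differ.

```julia
# KL( N(μ, diag(σ2)) || N(0, I) ), summed over latent dimensions and averaged over samples
kl_to_standard_normal(μ, σ2) = sum(0.5 .* (μ .^ 2 .+ σ2 .- log.(σ2) .- 1)) / size(μ, 2)

# squared reconstruction error as a stand-in for the negative log-likelihood
reconstruction_error(x, xrec) = sum((x .- xrec) .^ 2) / size(x, 2)

# β weights the KL term against the reconstruction term
weighted_loss(x, xrec, μ, σ2; β = 1.0) =
    reconstruction_error(x, xrec) + β * kl_to_standard_normal(μ, σ2)

# toy values: 2 input dimensions, 2 latent dimensions, 2 samples
x    = [0.2 0.8; 0.5 0.1]
xrec = [0.25 0.7; 0.5 0.2]
μ    = [1.0 -1.0; 0.0 2.0]
σ2   = [0.5 0.5; 1.0 2.0]

loss_small_β = weighted_loss(x, xrec, μ, σ2; β = 0.01)
loss_large_β = weighted_loss(x, xrec, μ, σ2; β = 1.0)
```

With a small $\beta$ the optimizer can lower the loss almost purely through the reconstruction term, so the latent code is free to drift away from $N(0,1)$. With $\beta$ near 1 the code is pulled towards the prior at the expense of reconstruction quality.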

In [None]:
@info "Training large CPU model"
@time AlfvenDetectors.fit!(large_model, data, batchsize, nepochs;
    L = L, β = 1.0,
    cbit = cbit, history = large_train_history, verb = verb)

In [None]:
figure()
plot(get(small_train_history, :loss)...,label="loss")
plot(get(small_train_history, :loglikelihood)...,label="-loglikelihood")
plot(get(small_train_history, :KL)...,label="KL")
title("Training loss - smaller model")
xlabel("iteration")
ylabel("loss")
legend()

In [None]:
figure()
plot(get(large_train_history, :loss)...,label="loss")
plot(get(large_train_history, :loglikelihood)...,label="-loglikelihood")
plot(get(large_train_history, :KL)...,label="KL")
title("Training loss - larger model")
xlabel("iteration")
ylabel("loss")
legend()

In [None]:
X = data;

In [None]:
figure()
pcolormesh(1:size(X,2), f, X)
title("Original data")
xlabel("t")
ylabel("f")

In [None]:
figure()
sX = small_model(X).data
pcolormesh(1:size(sX,2), f, sX)
title("VAE output - smaller model")
xlabel("t")
ylabel("f")

In [None]:
figure()
lX = large_model(X).data
pcolormesh(1:size(lX,2), f, lX)
title("VAE output - larger model")
xlabel("t")
ylabel("f")

## Basic training seems to work, now test the GPU version

In [None]:
# convert to CuArrays
zdim = 2
cudata = data |> gpu
cumodel = AlfvenDetectors.VAE([M, 200, zdim*2], [zdim, 200, M]) |> gpu
cu_train_history = MVHistory()
nepochs = 200
L = 1
β = 0.01

In [None]:
@info "Training a large GPU model with fewer epochs in more iterations"
# precompilation run

@time AlfvenDetectors.fit!(cumodel, cudata, batchsize, 1;
        L=L,β=β,
        cbit = cbit, history = cu_train_history, verb = verb)
    
for i in 1:5
    @time AlfvenDetectors.fit!(cumodel, cudata, batchsize, nepochs;
        L=L,β=β,
        cbit = cbit, history = cu_train_history, verb = verb)
    # run the garbage collector so that unused GPU memory is released
    GC.gc()
end

In [None]:
@info "large CPU model(data) timing"
@time large_model(data);

In [None]:
@info "GPU model(data) timing"
@time cumodel(cudata);

For the VAE, the GPU considerably speeds up both training and evaluation.
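When reading such timings, keep in mind that the first call of a Julia function also includes compilation, which is why the training cells start with a separate one-epoch run. A minimal sketch using only Base is shown here; `toy_forward` is a hypothetical stand-in for a model evaluation, not the actual VAE.

```julia
# hypothetical stand-in for a forward pass: one dense layer with tanh activation
toy_forward(W, b, X) = tanh.(W * X .+ b)

W, b, X = randn(200, 64), randn(200), rand(64, 1000)

t_first  = @elapsed toy_forward(W, b, X)   # includes JIT compilation
t_second = @elapsed toy_forward(W, b, X)   # runs already compiled code
println("first call: $(t_first) s, second call: $(t_second) s")
```

Only the second number is representative of steady-state speed. On a GPU there is a further caveat: calls may return before the computation has finished, so a fair comparison should make sure the result is actually available before the timer stops.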

In [None]:
figure()
plot(get(cu_train_history, :loss)...)
title("GPU model training loss")
xlabel("iteration")
ylabel("loss")

In [None]:
figure()
X = cudata;
_X = cumodel(X).data |> cpu
pcolormesh(1:size(_X,2), f, _X)
title("VAE output with GPU training")
xlabel("t")
ylabel("f")

TODO: investigate GPU memory allocation further.

## In this part, let's try to see some sort of structure in the latent code

In [None]:
# save/load a pretrained model
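# use a separate name so that the frequency vector `f` is not overwritten
modelfile = "large_vae_model.bson"
if !isfile(modelfile)
    @save modelfile large_model
else
    @load modelfile large_model
end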

In [None]:
X1, t1, f1 = get_msc_array(datapath, 11096, 11)
pcolormesh(t1, f1, X1)

In [None]:
X0, t0, f0 = get_msc_array(datapath, 11096, 20)
pcolormesh(t0, f0, X0)

In [None]:
Xα = X1[:,1.06.<=t1.<=1.22]
zα = large_model.encoder(Xα).data
z1 = large_model.encoder(X1).data
z0 = large_model.encoder(X0).data

Clearly the latent code is far from $N(0,1)$, since we have used a very low $\beta$.
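Besides looking at the scatter plot, the deviation from $N(0,1)$ can be quantified by the mean and standard deviation of each latent dimension. The sketch below uses synthetic codes, because the real `z1` and `z0` depend on the trained model. `latent_summary` is a hypothetical helper, and the shift and scale of `z_shifted` are made-up numbers.

```julia
using Statistics, Random

# per-dimension summary: a code close to N(0,1) should give mean ≈ 0 and std ≈ 1
latent_summary(z::AbstractMatrix) =
    (mean = vec(mean(z, dims = 2)), std = vec(std(z, dims = 2)))

rng = MersenneTwister(0)
z_standard = randn(rng, 2, 10_000)                          # what an N(0,1) code looks like
z_shifted  = 0.3 .* randn(rng, 2, 10_000) .+ [1.0, -4.0]    # made-up shifted, narrow code

summary_standard = latent_summary(z_standard)
summary_shifted  = latent_summary(z_shifted)
```

Applied to the real codes, e.g. `latent_summary(z1)`, means far from zero or standard deviations far from one would confirm what the plot suggests.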

In [None]:
figure()
scatter(z1[1,:], z1[2,:], label = "positive")
scatter(z0[1,:], z0[2,:], label = "negative")
scatter(zα[1,:], zα[2,:], label = "alfven mode")
legend()

Now let's "generate" a new diagram.

In [None]:
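# linearly interpolate l points between each pair of consecutive 2D points in zs;
# returns a matrix with one interpolated point per row
function connect(zs, l)
    n = length(zs)
    return vcat([hcat(
            collect(range(zs[i][1], zs[i+1][1]; length = l)),
            collect(range(zs[i][2], zs[i+1][2]; length = l))
        )
        for i in 1:n-1]...)
end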
zs = [[-1,-3.5], [0,-4.5], [1,-4.5], [2,-2], [0,0]]
zpath = Array(connect(zs, 50)');

In [None]:
figure()
scatter(z1[1,:], z1[2,:], label = "positive")
scatter(z0[1,:], z0[2,:], label = "negative")
scatter(zα[1,:], zα[2,:], label = "alfven mode")
plot(zpath[1,:], zpath[2,:], label = "artificial z")
legend()

In [None]:
Xgen = large_model.decoder(zpath).data;

In [None]:
figure()
pcolormesh(Xgen)
title("artificial coherence")

In [None]:
# also, let's try to sample from N(0,1) and pass the samples to the decoder
Xgen2 = AlfvenDetectors.sample(large_model, 100).data;

This is not a good result, since only some strange phenomena were actually encoded to $N(0,1)$, like the ends and beginnings of the shot.

In [None]:
figure()
pcolormesh(Xgen2)
title("generated artificial coherence");

In [None]:
show()

## Also train a diag version of VAE

In [None]:
diag_model = AlfvenDetectors.VAE([M, 200, zdim*2], [zdim, 200, M*2],variant = :diag)
diag_train_history = MVHistory()

In [None]:
@info "Training large CPU model with diagonal covariance"

# precompilation run
@time AlfvenDetectors.fit!(diag_model, data, batchsize, 1;
    L = L, β = 1.0,
    cbit = cbit, history = diag_train_history, verb = verb)

@time AlfvenDetectors.fit!(diag_model, data, batchsize, 100;
    L = L, β = 1.0,
    cbit = cbit, history = diag_train_history, verb = verb)

In [None]:
figure()
plot(get(diag_train_history, :loss)..., label="loss")
plot(get(diag_train_history, :loglikelihood)..., label="-loglikelihood")
plot(get(diag_train_history, :KL)..., label="KL")
title("Training loss - large diagonal model")
xlabel("iteration")
ylabel("loss")
legend()

In [None]:
figure()
# use the CPU array `data`; `X` was reassigned to the GPU array `cudata` above
dlX = diag_model(data).data
pcolormesh(1:size(dlX,2), collect(1:M), dlX[1:M,:])
title("VAE output - diagonal model (means)")
xlabel("t")
ylabel("f")

# so far we have only taken the means, now sample from the output distribution properly
sdlX = AlfvenDetectors.samplenormal(dlX)
figure()
pcolormesh(1:size(sdlX,2), collect(1:M), sdlX)
title("VAE output - diagonal model - samples of output")
xlabel("t")
ylabel("f")


In [None]:
# save/load a pretrained model
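# use a separate name so that the frequency vector `f` is not overwritten
modelfile = "diag_vae_model.bson"
if !isfile(modelfile)
    @save modelfile diag_model
else
    @load modelfile diag_model
end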

In [None]:
X1, t1, f1 = get_msc_array(datapath, 11096, 11)
pcolormesh(t1, f1, X1)

In [None]:
X0, t0, f0 = get_msc_array(datapath, 11096, 20)
pcolormesh(t0, f0, X0)

In [None]:
Xα = X1[:,1.06.<=t1.<=1.22]
zα = diag_model.encoder(Xα).data
z1 = diag_model.encoder(X1).data
z0 = diag_model.encoder(X0).data

The latent code does not seem to be close to $N(0,1)$.

In [None]:
figure()
scatter(z1[1,:], z1[2,:], label = "positive")
scatter(z0[1,:], z0[2,:], label = "negative")
scatter(zα[1,:], zα[2,:], label = "alfven mode")
legend()

### Takeaways

Obviously, the sampling introduces noise in places where we should see clear zeros, but the structure is there. Artifacts also appear in the model output with more training (overfitting?). Tuning $\beta$ no longer plays a role, since the output variance is now estimated. Training is very slow.

Whether or not $\beta$ is tuned does not seem to affect the clustering in the latent space.
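The noise in the sampled output can be reproduced with a minimal sketch. It assumes that the decoder output stacks `M` means on top of `M` positive variances; the plotting cell takes rows `1:M` as the means, but the layout and parametrization of the variance rows are assumptions. `sample_diag_output` is a hypothetical helper, not `AlfvenDetectors.samplenormal`.

```julia
using Random

# assumption: rows 1:m are means, rows m+1:2m are (positive) variances
function sample_diag_output(rng, out::AbstractMatrix)
    m = div(size(out, 1), 2)
    μ, σ2 = out[1:m, :], out[m+1:end, :]
    return μ .+ sqrt.(σ2) .* randn(rng, m, size(out, 2))
end

rng = MersenneTwister(0)
means     = zeros(4, 6)          # "clear zeros" in the mean output
variances = fill(0.01, 4, 6)     # small but nonzero estimated variance
samples = sample_diag_output(rng, vcat(means, variances))
extrema(samples)  # the samples scatter around zero even though every mean is exactly zero
```

Any nonzero estimated variance therefore shows up as speckle in regions where the mean is flat, while the structure carried by the means is preserved on average.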

## Also train a diag version of VAE with GPU

In [None]:
M,N = size(data)
zdim = 2
cudata = data |> gpu
gpu_diag_model = AlfvenDetectors.VAE([M, 200, zdim*2], [zdim, 200, M*2],variant = :diag) |> gpu
gpu_diag_train_history = MVHistory()
L = 1
verb = false
cbit = 1
batchsize = 64

In [None]:
@info "Training large GPU model with diagonal covariance"

# precompilation run
@time AlfvenDetectors.fit!(gpu_diag_model, cudata, batchsize, 1;
    L = L, β = 0.01,
    cbit = cbit, history = gpu_diag_train_history, verb = verb)

@time AlfvenDetectors.fit!(gpu_diag_model, cudata, batchsize, 100;
    L = L, β = 0.01,
    cbit = cbit, history = gpu_diag_train_history, verb = verb)

Here the GPU is actually considerably faster.

## Finally, let's try the scalar VAE

In [None]:
M,N = size(data)
zdim = 2
cudata = data |> gpu
L = 1
verb = false
cbit = 1
batchsize = 64

scalar_model = AlfvenDetectors.VAE([M, 200, zdim*2], [zdim, 200, M+1],variant = :scalar) |> gpu
scalar_train_history = MVHistory()

In [None]:
@info "Training large GPU model with scalar output variance"

# precompilation run
@time AlfvenDetectors.fit!(scalar_model, cudata, batchsize, 1;
    L = L, β = 1.0,
    cbit = cbit, history = scalar_train_history, verb = verb)

@time AlfvenDetectors.fit!(scalar_model, cudata, batchsize, 100;
    L = L, β = 1.0,
    cbit = cbit, history = scalar_train_history, verb = verb)

In [None]:
figure()
plot(get(scalar_train_history, :loss)..., label="loss")
plot(get(scalar_train_history, :loglikelihood)..., label="-loglikelihood")
plot(get(scalar_train_history, :KL)..., label="KL")
title("Training loss - large scalar model")
xlabel("iteration")
ylabel("loss")
legend()

In [None]:
figure()
dlX = scalar_model(cudata).data |> cpu
pcolormesh(1:size(dlX,2), collect(1:M), dlX[1:M,:])
title("VAE output - scalar model (means)")
xlabel("t")
ylabel("f")

# so far we have only taken the means, now sample from the output distribution properly
sdlX = AlfvenDetectors.samplenormal_scalarvar(dlX)
figure()
pcolormesh(1:size(sdlX,2), collect(1:M), sdlX)
title("VAE output - scalar model - samples of output")
xlabel("t")
ylabel("f")


In [None]:
# save/load a pretrained model
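# use a separate name so that the frequency vector `f` is not overwritten
modelfile = "scalar_vae_model.bson"
if !isfile(modelfile)
    # move the model to the CPU before saving
    m = scalar_model |> cpu
    @save modelfile m
else
    @load modelfile m
    scalar_model = m |> gpu
end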

In [None]:
X1, t1, f1 = get_msc_array(datapath, 11096, 11)
pcolormesh(t1, f1, X1)

In [None]:
X0, t0, f0 = get_msc_array(datapath, 11096, 20)
pcolormesh(t0, f0, X0)

In [None]:
X1 = X1 |> gpu
X0 = X0 |> gpu
Xα = X1[:,40:60]
zα = scalar_model.encoder(Xα).data |> cpu
z1 = scalar_model.encoder(X1).data |> cpu
z0 = scalar_model.encoder(X0).data |> cpu

The latent code does not seem to be close to $N(0,1)$.

In [None]:
figure()
scatter(z1[1,:], z1[2,:], label = "positive")
scatter(z0[1,:], z0[2,:], label = "negative")
scatter(zα[1,:], zα[2,:], label = "alfven mode")
legend()

### Takeaways

Obviously, the sampling introduces noise in places where we should see clear zeros, but the structure is there. Artifacts also appear in the model output with more training (overfitting?). Tuning $\beta$ no longer plays a role, since the output variance is now estimated. Training is very slow.

Whether or not $\beta$ is tuned does not seem to affect the clustering in the latent space.