## Description of Notebook

This notebook takes a closer look at the VDFs (velocity distribution functions) and the fields files in order to identify the stable and unstable periods. It also prepares the data for machine learning; the partitioning into training and test data sets is done separately.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import kineticsim_reader as kr
import pickle
import os
import random
from scipy.signal import savgol_filter
from tqdm import tqdm
from matplotlib.animation import FuncAnimation

In [2]:
simfiles = ['particles.d11_A0.5Hepp_beta0.5eps1e-4_256',\
    'particles.d11_A0.75Hepp_beta1_256',\
    'particles.d11_E11Ap3.3Aa2.0Vd0.42',\
    'particles.d11_E11Ap4.3Aa1.6',\
    'particles.d11_E11Ap4.3Aa1.6Vd0.32',\
    'particles.d11_E12Ap1.86Aa1.0Vd0.32_256_256x256',\
    'particles.d11_E12Ap1.86Aa1.0Vd0.32_512_256x256',\
    'particles.d11_He++A10_256_iden0eps0',\
    'particles.d11_He++v2_256_iden0eps1e-4t600',\
    'particles.d11_He++vd1.5_256_iden0eps1e-4',\
    'particles.d11_pv1.5_128_64_iden0eps1e-4_dx0.75_long',\
    'particles.d11_pv1Ap2Apb2betac0.214betab0.858_128_128x2_dx0.75_t3000',\
    'particles.d11_pv2a_128x3_iden0eps1e-4_dx0.75',\
    'particles.d11_pv2Ap1Ab1betac0.429betab0.858_128_128x2_dx0.75_t3000',\
    'particles.d11_pv2Ap1Ab2betac0.429betab0.858_128_128x2_dx0.75_t3000',\
    'particles.d11_pv2Ap2Apb2betac0.214betab0.858_128_128x2_dx0.75_t3000',\
    'particles.d11_pv2av2.3_128x3_iden0eps1e-4_dx0.75',\
    'particles.d11_pv2av2Ap1Aa1beta0.429_128_128x2_dx0.75_t3000',\
    'particles.d11_pv2av2_rdna0.03375_128x3_iden0eps1e-4_dx0.75_t6000',\
    'particles.d11_vap1.2Ap1Aa0.75_rdna_0.05',\
    'particles.d11_vap1.2Ap3.35Aa2.05rdna_0.007',\
    'particles.d11_vap1.5Ap1.5Aa1rdna_0.007']

fldfiles = ['fields.d10_A0.5Hepp_beta0.5eps1e-4_256',\
    'fields.d10_A0.75Hepp_beta1_256',\
    'fields.d10_E11Ap3.3Aa2.0Vd0.42',\
    'fields.d10_E11Ap4.3Aa1.6',\
    'fields.d10_E11Ap4.3Aa1.6Vd0.32',\
    'fields.d10_E12Ap1.86Aa1.0Vd0.32_256_256x256',\
    'fields.d10_E12Ap1.86Aa1.0Vd0.32_512_256x256',\
    'fields.d10_He++A10_256_iden0eps0',\
    'fields.d10_He++v2_256_iden0eps1e-4t600',\
    'fields.d10_He++vd1.5_256_iden0eps1e-4',\
    'fields.d10_pv1.5_128_64_iden0eps1e-4_dx0.75_long',\
    'fields.d10_pv1Ap2Apb2betac0.214betab0.858_128_128x2_dx0.75_t3000',\
    'fields.d10_pv2a_128x3_iden0eps1e-4_dx0.75',\
    'fields.d10_pv2Ap1Ab1betac0.429betab0.858_128_128x2_dx0.75_t3000',\
    'fields.d10_pv2Ap1Ab2betac0.429betab0.858_128_128x2_dx0.75_t3000',\
    'fields.d10_pv2Ap2Apb2betac0.214betab0.858_128_128x2_dx0.75_t3000',\
    'fields.d10_pv2av2.3_128x3_iden0eps1e-4_dx0.75',\
    'fields.d10_pv2av2Ap1Aa1beta0.429_128_128x2_dx0.75_t3000',\
    'fields.d10_pv2av2_rdna0.03375_128x3_iden0eps1e-4_dx0.75_t6000',\
    'fields.d10_vap1.2Ap1Aa0.75_rdna_0.05',\
    'fields.d10_vap1.2Ap3.35Aa2.05rdna_0.007',\
    'fields.d10_vap1.5Ap1.5Aa1rdna_0.007']

populations = [[0.9,0.05], [0.9,0.05], [0.986,0.007], [0.986,0.007], [0.986,0.007],\
               [0.986,0.007], [0.986,0.007], [0.9,0.05], [0.9,0.05], [0.9,0.05], [1.00,0.00],\
               [1.00,0.00], [0.91,0.045], [1.00,0.00], [1.00,0.00], [1.00,0.00], [0.91,0.045],\
               [0.91,0.045], [0.91,0.045], [0.9,0.05], [0.986,0.007], [0.986,0.007]]
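The three lists above are matched by position, so a misplaced entry would silently pair a particle file with the wrong fields file or population. The following is a minimal sketch of a consistency check. It assumes the naming convention visible in the lists (a fields file name is the particle file name with `particles.d11_` replaced by `fields.d10_`); the helper `check_run_lists` is hypothetical and not part of the original notebook.

```python
def check_run_lists(simfiles, fldfiles, populations):
    """Return a list of problems found; an empty list means the lists are consistent."""
    problems = []
    # All three lists must describe the same number of runs
    if not (len(simfiles) == len(fldfiles) == len(populations)):
        problems.append("list lengths differ: %d, %d, %d"
                        % (len(simfiles), len(fldfiles), len(populations)))
    # Assumed convention: same suffix, different prefix
    for sim, fld in zip(simfiles, fldfiles):
        expected = sim.replace('particles.d11_', 'fields.d10_', 1)
        if fld != expected:
            problems.append("fields file %r does not match particle file %r" % (fld, sim))
    # Each population entry is a pair of relative populations in [0, 1]
    for sim, pop in zip(simfiles, populations):
        if len(pop) != 2 or not all(0.0 <= p <= 1.0 for p in pop):
            problems.append("unexpected population %r for %r" % (pop, sim))
    return problems


# Two entries taken from the lists above
example_sim = ['particles.d11_E11Ap4.3Aa1.6', 'particles.d11_A0.75Hepp_beta1_256']
example_fld = ['fields.d10_E11Ap4.3Aa1.6', 'fields.d10_A0.75Hepp_beta1_256']
example_pop = [[0.986, 0.007], [0.9, 0.05]]
print(check_run_lists(example_sim, example_fld, example_pop))
```

Calling `check_run_lists(simfiles, fldfiles, populations)` on the full lists before processing would flag a mismatch early.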

## Determination of stable and unstable VDFs

How to define a stable or an unstable VDF remains an open question. Inspection of the VDFs and of the time-dependent parameters revealed cases in which the magnetic energy remains stable while the anisotropies and temperatures still change or exchange.

Given the uncertainty in the definition above, we implement several labeling approaches. Separate labels are produced for the magnetic energy and for the anisotropies, using the following thresholds on the relative rate of change:
1. Classification: the VDFs are called unstable if the anisotropies or the perpendicular magnetic energy change by more than 0.1% per gyroperiod.
2. Classification: the VDFs are called unstable if the anisotropies or the perpendicular magnetic energy change by more than 0.5% per gyroperiod.
3. Classification: the VDFs are called unstable if the anisotropies or the perpendicular magnetic energy change by more than 1.0% per gyroperiod.
4. No classification: instead, a regression problem is solved for both the anisotropies and the magnetic energies.

The code in this section also computes labels for two lower thresholds, 0.01% and 0.05% per gyroperiod.

The figures seem reasonable, except for the following cases where the labeling has to be adjusted:
- Simulation run 'particles.d11_A0.5Hepp_beta0.5eps1e-4_256': the entire run looks stable and should be counted as stable when solving the classification problem with the smallest threshold. The code in this section sets its magnetic-energy labels to 0 for every threshold; the anisotropy labels are not overridden.
- Simulation run 'particles.d11_A0.75Hepp_beta1_256': the entire run looks stable and is treated in the same way.
- All simulation runs: the very first and the very last points are excluded to avoid boundary effects.
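The labeling rule can be sketched compactly on synthetic numbers. The example below computes the relative rate of change between consecutive time points (the finite difference divided by the midpoint value, as in the notebook code) and compares its absolute value with the five thresholds. The function names `relative_rate` and `threshold_labels` and the sample values are hypothetical, and the time axis is assumed to be expressed in gyroperiods.

```python
import numpy as np

# Relative change per unit time: 0.01%, 0.05%, 0.1%, 0.5% and 1.0%
THRESHOLDS = (0.0001, 0.0005, 0.001, 0.005, 0.010)


def relative_rate(values, times):
    """Finite-difference rate of change divided by the midpoint value."""
    values = np.asarray(values, dtype=float)
    times = np.asarray(times, dtype=float)
    rate = np.diff(values) / np.diff(times)
    midpoint = 0.5 * (values[1:] + values[:-1])
    return rate / midpoint


def threshold_labels(rel_rate, thresholds=THRESHOLDS):
    """One row per time step, one column per threshold; 1 means 'unstable'."""
    magnitude = np.abs(np.asarray(rel_rate, dtype=float))
    return (magnitude[:, None] > np.asarray(thresholds)[None, :]).astype(int)


# Synthetic quantity: flat, then a slow change, then a fast change
times = np.array([0.0, 1.0, 2.0, 3.0])
values = np.array([1.0, 1.0, 1.002, 1.05])
labels = threshold_labels(relative_rate(values, times))
print(labels)
```

The first step is stable at every threshold, the second step (about 0.2% per unit time) exceeds only the three lowest thresholds, and the third step (about 4.7%) exceeds all five.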

In [3]:
def sort_labels_classification(rate_of_change):
    # Thresholds on the relative rate of change: 0.01%, 0.05%, 0.1%, 0.5% and 1.0%.
    # Each label is 1 if the rate exceeds the corresponding threshold, otherwise 0.
    thresholds = [0.0001, 0.0005, 0.001, 0.005, 0.010]
    return [1 if rate_of_change > threshold else 0 for threshold in thresholds]

def prepare_mldata_vdfmoments(simfile, fieldsfile, population):
    
    # loading arrays
    timep_array = np.load('./processing_results/' + simfile + '.timep_array.npy')
    anisotropies_p = np.load('./processing_results/' + simfile + '.anisotropies_p.npy')
    moments_p = np.load('./processing_results/' + simfile + '.moments_p.npy')
    anisotropies_he = np.load('./processing_results/' + simfile + '.anisotropies_he.npy')
    moments_he = np.load('./processing_results/' + simfile + '.moments_he.npy')
    
    # loading VDFs and accounting for the case with large file VDF
    if (simfile == 'particles.d11_pv1.5_128_64_iden0eps1e-4_dx0.75_long'):
        vdfp_array_p1 = np.load('./processing_results/' + simfile + '_p1.vdfp_array.npy')
        vdfhe_array_p1 = np.load('./processing_results/' + simfile + '_p1.vdfhe_array.npy')
        vdfp_array_p2 = np.load('./processing_results/' + simfile + '_p2.vdfp_array.npy')
        vdfhe_array_p2 = np.load('./processing_results/' + simfile + '_p2.vdfhe_array.npy')
        vdfp_array = np.concatenate((vdfp_array_p1, vdfp_array_p2))
        vdfhe_array = np.concatenate((vdfhe_array_p1, vdfhe_array_p2))
    else:
        vdfp_array = np.load('./processing_results/' + simfile + '.vdfp_array.npy')
        vdfhe_array = np.load('./processing_results/' + simfile + '.vdfhe_array.npy')
    
    timep_array_fields = np.load('./processing_results/' + fieldsfile + '.timing.npy')[5:,1]
    me_perp = np.load('./processing_results/' + fieldsfile + '.me_perp.npy')[5:]
    me_tot = np.load('./processing_results/' + fieldsfile + '.me_tot.npy')[5:]
    # applying smoothing to remove a periodic signal
    dtime = timep_array_fields[1] - timep_array_fields[0]
    npoints = int(len(me_tot)/10)
    if (npoints % 2 == 0): npoints = npoints + 1
    me_tot = savgol_filter(me_tot - me_tot[0], npoints, 3)
    me_perp = savgol_filter(me_perp, npoints, 3)
    
    # resampling to the timing of the VDFs
    me_tot = np.interp(timep_array,timep_array_fields,me_tot)
    me_perp = np.interp(timep_array,timep_array_fields,me_perp)
    
    # time derivatives (relative)
    dt_anisotropies_p = (anisotropies_p[1:]-anisotropies_p[:-1])/(timep_array[1:]-timep_array[:-1])
    dt_anisotropies_p = 2*(dt_anisotropies_p)/(anisotropies_p[1:]+anisotropies_p[:-1])
    if (anisotropies_he[0] == 0.0):
        # no He population: zero rate, with the same length as the other derivatives
        dt_anisotropies_he = np.zeros(len(anisotropies_he) - 1)
    else:
        dt_anisotropies_he = (anisotropies_he[1:]-anisotropies_he[:-1])/(timep_array[1:]-timep_array[:-1])
        dt_anisotropies_he = 2*(dt_anisotropies_he)/(anisotropies_he[1:]+anisotropies_he[:-1])
    dt_me_perp = (me_perp[1:]-me_perp[:-1])/(timep_array[1:]-timep_array[:-1])
    dt_me_perp = 2*(dt_me_perp)/(me_perp[1:]+me_perp[:-1])
    dt_me_tot = (me_tot[1:]-me_tot[:-1])/(timep_array[1:]-timep_array[:-1])
    dt_me_tot = 2*(dt_me_tot)/(me_tot[1:]+me_tot[:-1])
    
    # declaring feature vectors and moments
    simnames = []
    featurevector_allmoments = []
    labels_allmoments_me_001 = []
    labels_allmoments_me_005 = []
    labels_allmoments_me_01 = []
    labels_allmoments_me_05 = []
    labels_allmoments_me_10 = []
    labels_allmoments_an_001 = []
    labels_allmoments_an_005 = []
    labels_allmoments_an_01 = []
    labels_allmoments_an_05 = []
    labels_allmoments_an_10 = []
    labels_allmoments_me_re = []
    labels_allmoments_an_re = []
    timep_array_out = []
    # constructing the feature vector. The feature vector includes:
    # - moments 0-3 along and across the field (taking care of b-parallel)
    # - anisotropies in addition
    # - particle relative populations in addition
    # REMINDER: the very first and the very last time moments omitted
    for i in range (1, len(timep_array)-2, 1):
        subvector = []
        subvector.append(moments_p[i,0,0])
        subvector.append(moments_p[i,0,1])
        subvector.append(moments_p[i,1,0])
        subvector.append(moments_p[i,1,1])
        subvector.append(moments_p[i,2,0])
        subvector.append(moments_p[i,2,1])
        subvector.append(moments_p[i,3,0])
        subvector.append(moments_p[i,3,1])
        subvector.append(moments_he[i,0,0])
        subvector.append(moments_he[i,0,1])
        subvector.append(moments_he[i,1,0])
        subvector.append(moments_he[i,1,1])
        subvector.append(moments_he[i,2,0])
        subvector.append(moments_he[i,2,1])
        subvector.append(moments_he[i,3,0])
        subvector.append(moments_he[i,3,1])
        subvector.append(anisotropies_p[i])
        subvector.append(anisotropies_he[i])
        subvector.append(population[0])
        subvector.append(population[1])
        # writing the common properties (rates and names) into the file
        featurevector_allmoments.append(subvector)
        simnames.append(simfile)
        labels_allmoments_me_re.append(dt_me_perp[i])
        labels_allmoments_an_re.append([dt_anisotropies_p[i], dt_anisotropies_he[i]])
        timep_array_out.append(timep_array[i])
        # sorting the labels depending on the strength of the change
        labels_an_p = sort_labels_classification(np.abs(dt_anisotropies_p[i]))
        labels_an_he = sort_labels_classification(np.abs(dt_anisotropies_he[i]))
        labels_me = sort_labels_classification(np.abs(dt_me_perp[i]))
        # fixing the cases for two specific simulations
        if ((simfile == 'particles.d11_A0.5Hepp_beta0.5eps1e-4_256') or (simfile == 'particles.d11_A0.75Hepp_beta1_256')):
            labels_me = [0,0,0,0,0]
        # writing the labels into the arrays
        labels_allmoments_me_001.append(labels_me[0])
        labels_allmoments_me_005.append(labels_me[1])
        labels_allmoments_me_01.append(labels_me[2])
        labels_allmoments_me_05.append(labels_me[3])
        labels_allmoments_me_10.append(labels_me[4])
        # for anisotropies, one positive label is sufficient
        labels_allmoments_an_001.append(np.amax([labels_an_p[0],labels_an_he[0]]))
        labels_allmoments_an_005.append(np.amax([labels_an_p[1],labels_an_he[1]]))
        labels_allmoments_an_01.append(np.amax([labels_an_p[2],labels_an_he[2]]))
        labels_allmoments_an_05.append(np.amax([labels_an_p[3],labels_an_he[3]]))
        labels_allmoments_an_10.append(np.amax([labels_an_p[4],labels_an_he[4]]))
    
    # converting to numpy arrays
    simnames = np.array(simnames)
    featurevector_allmoments = np.array(featurevector_allmoments, dtype=float)
    labels_allmoments_me_001 = np.array(labels_allmoments_me_001, dtype=int)
    labels_allmoments_me_005 = np.array(labels_allmoments_me_005, dtype=int)
    labels_allmoments_me_01 = np.array(labels_allmoments_me_01, dtype=int)
    labels_allmoments_me_05 = np.array(labels_allmoments_me_05, dtype=int)
    labels_allmoments_me_10 = np.array(labels_allmoments_me_10, dtype=int)
    labels_allmoments_an_001 = np.array(labels_allmoments_an_001, dtype=int)
    labels_allmoments_an_005 = np.array(labels_allmoments_an_005, dtype=int)
    labels_allmoments_an_01 = np.array(labels_allmoments_an_01, dtype=int)
    labels_allmoments_an_05 = np.array(labels_allmoments_an_05, dtype=int)
    labels_allmoments_an_10 = np.array(labels_allmoments_an_10, dtype=int)
    labels_allmoments_me_re = np.array(labels_allmoments_me_re, dtype=float)
    labels_allmoments_an_re = np.array(labels_allmoments_an_re, dtype=float)
    timep_array_out = np.array(timep_array_out, dtype=float)
    
    # returning all arrays
    return simnames, featurevector_allmoments, labels_allmoments_me_001, labels_allmoments_me_005, \
           labels_allmoments_me_01, labels_allmoments_me_05, \
           labels_allmoments_me_10, labels_allmoments_an_001, labels_allmoments_an_005, \
           labels_allmoments_an_01, labels_allmoments_an_05, labels_allmoments_an_10, \
           labels_allmoments_me_re, labels_allmoments_an_re, timep_array_out

simnames, featurevector_allmoments, labels_allmoments_me_001, labels_allmoments_me_005, \
           labels_allmoments_me_01, labels_allmoments_me_05, \
           labels_allmoments_me_10, labels_allmoments_an_001, labels_allmoments_an_005, \
           labels_allmoments_an_01, labels_allmoments_an_05, labels_allmoments_an_10, \
           labels_allmoments_me_re, labels_allmoments_an_re, timep_array = \
    prepare_mldata_vdfmoments(simfiles[0], fldfiles[0], populations[0])
print("ML data for the simulation " + simfiles[0] + " generated")
print("Number of data points: " + str(len(labels_allmoments_me_01)))
print("Positive samples with 0.1% (magnetic energy): " + str(np.sum(labels_allmoments_me_01)))
simnames_all = np.copy(simnames)
featurevector_allmoments_all = np.copy(featurevector_allmoments)
labels_allmoments_me_001_all = np.copy(labels_allmoments_me_001)
labels_allmoments_me_005_all = np.copy(labels_allmoments_me_005)
labels_allmoments_me_01_all = np.copy(labels_allmoments_me_01)
labels_allmoments_me_05_all = np.copy(labels_allmoments_me_05)
labels_allmoments_me_10_all = np.copy(labels_allmoments_me_10)
labels_allmoments_an_001_all = np.copy(labels_allmoments_an_001)
labels_allmoments_an_005_all = np.copy(labels_allmoments_an_005)
labels_allmoments_an_01_all = np.copy(labels_allmoments_an_01)
labels_allmoments_an_05_all = np.copy(labels_allmoments_an_05)
labels_allmoments_an_10_all = np.copy(labels_allmoments_an_10)
labels_allmoments_me_re_all = np.copy(labels_allmoments_me_re)
labels_allmoments_an_re_all = np.copy(labels_allmoments_an_re)
timep_array_all = np.copy(timep_array)

for i in range (1, len(simfiles), 1):
    simnames, featurevector_allmoments, labels_allmoments_me_001, labels_allmoments_me_005, \
           labels_allmoments_me_01, labels_allmoments_me_05, \
           labels_allmoments_me_10, labels_allmoments_an_001, labels_allmoments_an_005, \
           labels_allmoments_an_01, labels_allmoments_an_05, labels_allmoments_an_10, \
           labels_allmoments_me_re, labels_allmoments_an_re, timep_array = \
        prepare_mldata_vdfmoments(simfiles[i], fldfiles[i], populations[i])
    print("ML data for the simulation " + simfiles[i] + " generated")
    print("Number of data points: " + str(len(labels_allmoments_me_01)))
    print("Positive samples with 0.1% (magnetic energy): " + str(np.sum(labels_allmoments_me_01)))
    
    simnames_all = np.concatenate((simnames_all, simnames))
    featurevector_allmoments_all = np.concatenate((featurevector_allmoments_all, featurevector_allmoments))
    labels_allmoments_me_001_all = np.concatenate((labels_allmoments_me_001_all, labels_allmoments_me_001))
    labels_allmoments_me_005_all = np.concatenate((labels_allmoments_me_005_all, labels_allmoments_me_005))
    labels_allmoments_me_01_all = np.concatenate((labels_allmoments_me_01_all, labels_allmoments_me_01))
    labels_allmoments_me_05_all = np.concatenate((labels_allmoments_me_05_all, labels_allmoments_me_05))
    labels_allmoments_me_10_all = np.concatenate((labels_allmoments_me_10_all, labels_allmoments_me_10))
    labels_allmoments_an_001_all = np.concatenate((labels_allmoments_an_001_all, labels_allmoments_an_001))
    labels_allmoments_an_005_all = np.concatenate((labels_allmoments_an_005_all, labels_allmoments_an_005))
    labels_allmoments_an_01_all = np.concatenate((labels_allmoments_an_01_all, labels_allmoments_an_01))
    labels_allmoments_an_05_all = np.concatenate((labels_allmoments_an_05_all, labels_allmoments_an_05))
    labels_allmoments_an_10_all = np.concatenate((labels_allmoments_an_10_all, labels_allmoments_an_10))
    labels_allmoments_me_re_all = np.concatenate((labels_allmoments_me_re_all, labels_allmoments_me_re))
    labels_allmoments_an_re_all = np.concatenate((labels_allmoments_an_re_all, labels_allmoments_an_re))
    timep_array_all = np.concatenate((timep_array_all, timep_array))
    
np.save('./mldata_vdfmoments/allsimulations.simnames_all.npy', simnames_all)
np.save('./mldata_vdfmoments/allsimulations.featurevector_allmoments_all.npy', featurevector_allmoments_all)
np.save('./mldata_vdfmoments/allsimulations.labels_allmoments_me_001_all.npy', labels_allmoments_me_001_all)
np.save('./mldata_vdfmoments/allsimulations.labels_allmoments_me_005_all.npy', labels_allmoments_me_005_all)
np.save('./mldata_vdfmoments/allsimulations.labels_allmoments_me_01_all.npy', labels_allmoments_me_01_all)
np.save('./mldata_vdfmoments/allsimulations.labels_allmoments_me_05_all.npy', labels_allmoments_me_05_all)
np.save('./mldata_vdfmoments/allsimulations.labels_allmoments_me_10_all.npy', labels_allmoments_me_10_all)
np.save('./mldata_vdfmoments/allsimulations.labels_allmoments_an_001_all.npy', labels_allmoments_an_001_all)
np.save('./mldata_vdfmoments/allsimulations.labels_allmoments_an_005_all.npy', labels_allmoments_an_005_all)
np.save('./mldata_vdfmoments/allsimulations.labels_allmoments_an_01_all.npy', labels_allmoments_an_01_all)
np.save('./mldata_vdfmoments/allsimulations.labels_allmoments_an_05_all.npy', labels_allmoments_an_05_all)
np.save('./mldata_vdfmoments/allsimulations.labels_allmoments_an_10_all.npy', labels_allmoments_an_10_all)
np.save('./mldata_vdfmoments/allsimulations.labels_allmoments_me_re_all.npy', labels_allmoments_me_re_all)
np.save('./mldata_vdfmoments/allsimulations.labels_allmoments_an_re_all.npy', labels_allmoments_an_re_all)
np.save('./mldata_vdfmoments/allsimulations.timep_array_all.npy', timep_array_all)

ML data for the simulation particles.d11_A0.5Hepp_beta0.5eps1e-4_256 generated
Number of data points: 78
Positive samples with 0.1% (magnetic energy): 0
ML data for the simulation particles.d11_A0.75Hepp_beta1_256 generated
Number of data points: 46
Positive samples with 0.1% (magnetic energy): 0
ML data for the simulation particles.d11_E11Ap3.3Aa2.0Vd0.42 generated
Number of data points: 46
Positive samples with 0.1% (magnetic energy): 16
ML data for the simulation particles.d11_E11Ap4.3Aa1.6 generated
Number of data points: 46
Positive samples with 0.1% (magnetic energy): 39
ML data for the simulation particles.d11_E11Ap4.3Aa1.6Vd0.32 generated
Number of data points: 46
Positive samples with 0.1% (magnetic energy): 40
ML data for the simulation particles.d11_E12Ap1.86Aa1.0Vd0.32_256_256x256 generated
Number of data points: 48
Positive samples with 0.1% (magnetic energy): 0
ML data for the simulation particles.d11_E12Ap1.86Aa1.0Vd0.32_512_256x256 generated
Number of data points: 48
Po
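Inside `prepare_mldata_vdfmoments`, the magnetic energy is smoothed with a Savitzky-Golay filter and then interpolated onto the VDF time stamps. The sketch below reproduces that step on a synthetic signal (a linear trend plus a fast oscillation) to show its effect. The helper name `smooth_and_resample` and all numbers are hypothetical; only the window rule (one tenth of the series, forced to be odd) and the cubic polynomial order follow the notebook code.

```python
import numpy as np
from scipy.signal import savgol_filter


def smooth_and_resample(t_fields, signal, t_vdf):
    """Smooth `signal` sampled at `t_fields`, then interpolate it onto `t_vdf`."""
    # Window of about one tenth of the series, forced to be odd
    npoints = int(len(signal) / 10)
    if npoints % 2 == 0:
        npoints = npoints + 1
    smoothed = savgol_filter(signal, npoints, 3)
    # Linear interpolation onto the (coarser) VDF time stamps
    return np.interp(t_vdf, t_fields, smoothed)


# Synthetic field series: slow linear growth plus a periodic component
t_fields = np.linspace(0.0, 100.0, 401)
trend = 1.0 + 0.01 * t_fields
signal = trend + 0.05 * np.sin(2.0 * np.pi * t_fields / 2.0)

# Coarser time stamps, kept away from the edges of the series
t_vdf = np.linspace(10.0, 90.0, 9)
resampled = smooth_and_resample(t_fields, signal, t_vdf)
max_error = np.max(np.abs(resampled - (1.0 + 0.01 * t_vdf)))
print("largest deviation from the trend:", max_error)
```

In this example the window spans several oscillation periods, so the periodic component is strongly reduced while the linear trend is preserved. A window shorter than the oscillation period would leave most of the oscillation in place.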

## Understanding Statistics of Runs

In [4]:
print("Threshold 0.0001:")
print("Positive (ME):", np.sum(labels_allmoments_me_001_all))
print("Negative (ME):", len(labels_allmoments_me_001_all) - np.sum(labels_allmoments_me_001_all))
print("Positive (AN):", np.sum(labels_allmoments_an_001_all))
print("Negative (AN):", len(labels_allmoments_an_001_all) - np.sum(labels_allmoments_an_001_all))
print("Positive (Intersection):", np.sum(labels_allmoments_an_001_all*labels_allmoments_me_001_all))
print("Positive (Union):", np.sum(labels_allmoments_me_001_all) + np.sum(labels_allmoments_an_001_all) - \
                           np.sum(labels_allmoments_an_001_all*labels_allmoments_me_001_all))
print("--------------------------------")
print("Threshold 0.0005:")
print("Positive (ME):", np.sum(labels_allmoments_me_005_all))
print("Negative (ME):", len(labels_allmoments_me_005_all) - np.sum(labels_allmoments_me_005_all))
print("Positive (AN):", np.sum(labels_allmoments_an_005_all))
print("Negative (AN):", len(labels_allmoments_an_005_all) - np.sum(labels_allmoments_an_005_all))
print("Positive (Intersection):", np.sum(labels_allmoments_an_005_all*labels_allmoments_me_005_all))
print("Positive (Union):", np.sum(labels_allmoments_me_005_all) + np.sum(labels_allmoments_an_005_all) - \
                           np.sum(labels_allmoments_an_005_all*labels_allmoments_me_005_all))
print("--------------------------------")
print("Threshold 0.001:")
print("Positive (ME):", np.sum(labels_allmoments_me_01_all))
print("Negative (ME):", len(labels_allmoments_me_01_all) - np.sum(labels_allmoments_me_01_all))
print("Positive (AN):", np.sum(labels_allmoments_an_01_all))
print("Negative (AN):", len(labels_allmoments_an_01_all) - np.sum(labels_allmoments_an_01_all))
print("Positive (Intersection):", np.sum(labels_allmoments_an_01_all*labels_allmoments_me_01_all))
print("Positive (Union):", np.sum(labels_allmoments_me_01_all) + np.sum(labels_allmoments_an_01_all) - \
                           np.sum(labels_allmoments_an_01_all*labels_allmoments_me_01_all))
print("--------------------------------")
print("Threshold 0.005:")
print("Positive (ME):", np.sum(labels_allmoments_me_05_all))
print("Negative (ME):", len(labels_allmoments_me_05_all) - np.sum(labels_allmoments_me_05_all))
print("Positive (AN):", np.sum(labels_allmoments_an_05_all))
print("Negative (AN):", len(labels_allmoments_an_05_all) - np.sum(labels_allmoments_an_05_all))
print("Positive (Intersection):", np.sum(labels_allmoments_an_05_all*labels_allmoments_me_05_all))
print("Positive (Union):", np.sum(labels_allmoments_me_05_all) + np.sum(labels_allmoments_an_05_all) - \
                           np.sum(labels_allmoments_an_05_all*labels_allmoments_me_05_all))
print("--------------------------------")
print("Threshold 0.01:")
print("Positive (ME):", np.sum(labels_allmoments_me_10_all))
print("Negative (ME):", len(labels_allmoments_me_10_all) - np.sum(labels_allmoments_me_10_all))
print("Positive (AN):", np.sum(labels_allmoments_an_10_all))
print("Negative (AN):", len(labels_allmoments_an_10_all) - np.sum(labels_allmoments_an_10_all))
print("Positive (Intersection):", np.sum(labels_allmoments_an_10_all*labels_allmoments_me_10_all))
print("Positive (Union):", np.sum(labels_allmoments_me_10_all) + np.sum(labels_allmoments_an_10_all) - \
                           np.sum(labels_allmoments_an_10_all*labels_allmoments_me_10_all))

Threshold 0.0001:
Positive (ME): 768
Negative (ME): 485
Positive (AN): 949
Negative (AN): 304
Positive (Intersection): 634
Positive (Union): 1083
--------------------------------
Threshold 0.0005:
Positive (ME): 486
Negative (ME): 767
Positive (AN): 439
Negative (AN): 814
Positive (Intersection): 331
Positive (Union): 594
--------------------------------
Threshold 0.001:
Positive (ME): 363
Negative (ME): 890
Positive (AN): 240
Negative (AN): 1013
Positive (Intersection): 205
Positive (Union): 398
--------------------------------
Threshold 0.005:
Positive (ME): 93
Negative (ME): 1160
Positive (AN): 51
Negative (AN): 1202
Positive (Intersection): 36
Positive (Union): 108
--------------------------------
Threshold 0.01:
Positive (ME): 53
Negative (ME): 1200
Positive (AN): 28
Negative (AN): 1225
Positive (Intersection): 20
Positive (Union): 61


Overall, the 0.001 threshold seems to be the best choice for the classification problem, because it keeps the data set more balanced between positive and negative samples than the higher thresholds, which are highly imbalanced. In addition, it is probably best to treat a VDF as unstable when either the anisotropy or the magnetic energy changes, which brings the total number of positive cases to 398 out of 1253 VDFs.
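Treating a VDF as unstable when either quantity changes amounts to taking the element-wise union of the two label arrays. A minimal sketch with made-up labels follows; the helper `combine_labels` is hypothetical and not part of the notebook.

```python
import numpy as np


def combine_labels(labels_me, labels_an):
    """Return (union, intersection) of two binary label arrays."""
    labels_me = np.asarray(labels_me, dtype=int)
    labels_an = np.asarray(labels_an, dtype=int)
    union = np.maximum(labels_me, labels_an)   # unstable if either label is 1
    intersection = labels_me * labels_an       # unstable only if both labels are 1
    return union, intersection


# Made-up labels for five VDFs
example_me = [1, 0, 1, 0, 0]
example_an = [1, 1, 0, 0, 0]
union, intersection = combine_labels(example_me, example_an)
print(union, intersection)
```

The number of positives in the union equals the sum of the two positive counts minus the intersection, which is the expression used in the statistics cell above.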

However, the threshold of 0.0005 or even 0.0001 might also be good enough. At least the intersections and unions for these thresholds look balanced.
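To make the comparison between thresholds concrete, the counts printed above can be converted into positive fractions. The snippet below only re-uses the numbers from the output; the variable names are hypothetical.

```python
# Counts copied from the output above:
# (positive ME, positive AN, intersection, union)
TOTAL_VDFS = 1253
counts = {
    0.0001: (768, 949, 634, 1083),
    0.0005: (486, 439, 331, 594),
    0.001: (363, 240, 205, 398),
    0.005: (93, 51, 36, 108),
    0.01: (53, 28, 20, 61),
}

union_fraction = {}
for threshold, (n_me, n_an, n_both, n_union) in counts.items():
    # Inclusion-exclusion check: |ME or AN| = |ME| + |AN| - |ME and AN|
    assert n_me + n_an - n_both == n_union
    union_fraction[threshold] = n_union / TOTAL_VDFS
    print(f"{threshold:<7} ME {n_me / TOTAL_VDFS:6.1%}  "
          f"AN {n_an / TOTAL_VDFS:6.1%}  union {union_fraction[threshold]:6.1%}")
```

With these counts, the union of positive cases is about 32% of all VDFs at the 0.001 threshold and about 47% at 0.0005.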