# Automatize Sample Code

Sample Python code for using Automatize as a Python library.


**Observations:**
- Not yet compatible with Windows; a Linux shell is required.

---

### 1. Paths Configurations
Configure the paths below. If you move the directories, only these variables need to change.

In [3]:
import os

root     = './'
# We assume this folder organization for the experimental environment:
prg_path = os.path.join(root, 'programs')
data     = os.path.join(root, 'data')
res_path = os.path.join(root, 'results')
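The cell above assumes a `programs` / `data` / `results` layout under `root`. A minimal sketch for creating that layout when it does not exist yet; the helper `make_layout` is hypothetical and not part of Automatize:

```python
import os

def make_layout(root):
    """Hypothetical helper (not part of Automatize): create the
    programs/data/results folders under root and return their paths."""
    paths = {name: os.path.join(root, name) for name in ('programs', 'data', 'results')}
    for path in paths.values():
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    return paths
```

Calling `make_layout(root)` is safe to repeat, since `exist_ok=True` leaves existing folders untouched.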

### 2. Scripting
To run the feature extraction methods, import the `run.py` or `script.py` module from the `automatize` package:

In [9]:
import sys, os
from automatize.script import *


The `gensh` function is the starting point for generating scripts for the available methods:

- `method`: name of the method to generate scripts for;
- `datasets`: dictionary with the dataset configuration, where
    - key: dataset category folder + `.` + dataset name (the same as the descriptor JSON file prefix);
    - value: list of subsets (the second part of the descriptor JSON file name);
- `params`: dictionary of configuration parameters for scripting (see the comments in the example below).
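To make the `datasets` format concrete: judging from the descriptor names used later in this notebook (e.g. `FoursquareNY_specific`), each key/subset pair appears to correspond to a descriptor named `<DatasetName>_<subset>`. A minimal sketch of that mapping, assuming the `_` separator and a `.json` extension; the helper `descriptor_files` is hypothetical:

```python
def descriptor_files(datasets):
    """Hypothetical helper: expand a gensh-style datasets dict into
    (category folder, descriptor file name) pairs.
    Assumes descriptors are named '<DatasetName>_<subset>.json'."""
    pairs = []
    for key, subsets in datasets.items():
        # 'multiple_trajectories.FoursquareNYC' -> category, dataset name
        category, name = key.split('.', 1)
        for subset in subsets:
            pairs.append((category, name + '_' + subset + '.json'))
    return pairs

print(descriptor_files({'multiple_trajectories.FoursquareNYC': ['specific']}))
# [('multiple_trajectories', 'FoursquareNYC_specific.json')]
```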


In [2]:
method = 'hiper'
datasets = {'multiple_trajectories.FoursquareNYC': ['specific']}

params = {
    'sh_folder': 'scripts',      # where to generate script files
    'folder':    'EXP2022',      # folder prefix for result files
    'k':         5,              # number of folds - optional
    'root':      root,           # root folder of the experimental environment
    'threads':   10,             # number of threads allowed (for movelets methods) - optional
    'gig':       100,            # RAM limit in GB (for movelets methods) - optional
    'pyname':    'python3',      # Python command - optional

    'runopts':   '-TR 0.5',      # other arguments passed to the method command line (-TR is the τ for HiPerMovelets) - optional
    'timeout':   '7d',           # timeout for the method runtime ('7d' limits it to 7 days)
}

gensh(method, datasets, params)


To merge the per-class results back into `train.csv` / `test.csv`, run the following subroutine (`prefix` and `method_name` are set as in the k-fold cell below):

In [None]:
mergeAndMove(os.path.join(res_path, prefix, method_name), 'MASTERMovelets', prg_path)

To run k-fold experiments:

In [None]:
# My Configurations:
data_folder = os.path.join(data, 'FoursquareNY')
res_path    = os.path.join(root, 'results')
prefix      = 'FoursquareNY'
method_name = 'Hiper-Log'
descriptor  = 'FoursquareNY_specific'
k = 5

# Run the experiment (with print_only=True, the commands are only printed):
k_run(k, data_folder, res_path, prefix, method_name, descriptor, 'hiper', ms=-1, Ms=-3, extra='-T 0.5',
      prg_path=prg_path, print_only=False, keep_folder=True)

To print the run commands into a shell script (here, for a scalability experiment):

In [None]:
import os, contextlib

# My Configurations:
data_folder = os.path.join(data, 'scalability')
res_path    = os.path.join(root, 'results')

prefixes    = ['100_trajectories_50_points', '500_trajectories_50_points',
               '1000_trajectories_50_points', '2000_trajectories_50_points',
               '4000_trajectories_50_points']

jar = 'HIPERMovelets'

variation   = 'Vary_Number_Of_Trajectories'
descriptor  = os.path.join(data_folder, 'descriptors', 'Scalability_1_Dimension')
results_dir = os.path.join(res_path, 'Scalablity', variation)
data_dir    = os.path.join(data_folder, variation)

# Everything printed inside this block (the shebang and the commands
# printed by run with print_only=True) is written to Scalability.sh
with open('Scalability.sh', 'w') as f, contextlib.redirect_stdout(f):
    print('#!/bin/bash')
    for prefix in prefixes:
        run(data_dir, results_dir, prefix, 'Hiper-Log', descriptor, 'hiper', Ms=-3,
            prg_path=prg_path, print_only=True, java_opts='-Xmx60G', jar_name=jar, n_threads=3)

In [9]:
import os, contextlib

# My Configurations:
k = 5
datasets = ['geo_only', 'specific', 'generic', 'poi_only']
prefixes = ['brightkite', 'gowalla', 'foursquare_nyc', 'foursquare_global']
descriptors = ['Brightkite_Gowalla', 'Brightkite_Gowalla', 'FoursquareNYC', 'FoursquareGlobal']

jar = 'SUPERMovelets'
methods = [
    ['logd', 'SMLD', 'super'],
    ['log',  'SML', 'super'],
    ['d',    'SMD', 'super'],
    ['x',    'SM', 'super'],
]
extra  = ['-Al true', False, '-Al true', False]
Ms     = [-3, -3, False, False]


# jar = 'MASTERMovelets'
# methods = [
#     ['log',  'MML', 'master'],
#     ['x',    'MM', 'master'],
# ]
# extra  = [False, False]
# Ms     = [-3, False]

for method, method_Ms, method_extra in zip(methods, Ms, extra):
    for prefix, descriptor_name in zip(prefixes, descriptors):
        for dataset in datasets:
            # Skip the 'generic' subset for Brightkite and Gowalla
            if prefix in ['brightkite', 'gowalla'] and dataset == 'generic':
                continue

            script_file = './scripts/'+method[2]+'/run5-'+method[2]+'_'+method[0]+'-'+prefix+'-'+dataset+'.sh'
            descriptor  = os.path.join(data, '5fold', 'descriptors', descriptor_name+'_'+dataset)
            results_dir = os.path.join(res_path, method[2]+'-'+method[0])
            data_dir    = os.path.join(data, '5fold', prefix)

            # Write the shebang and the commands printed by k_run into the script file
            with open(script_file, 'w') as f, contextlib.redirect_stdout(f):
                print('#!/bin/bash')
                k_run(k, data_dir, results_dir, prefix, method[1]+'-'+dataset, descriptor, Ms=method_Ms, extra=method_extra,
                      prg_path=prg_path, print_only=True, java_opts='-Xmx60G', jar_name=jar, n_threads=3)
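The nested loops above write one script per (method, prefix, subset) combination, skipping the `generic` subset for Brightkite and Gowalla. A quick sketch to list the expected script paths before generating them; it reuses the naming pattern of the cell above, and `expected_scripts` is a hypothetical helper:

```python
from itertools import product

def expected_scripts(methods, prefixes, datasets):
    """Hypothetical helper: list the script paths the loop writes,
    applying the same skip rule and the same iteration order."""
    names = []
    for method, prefix, dataset in product(methods, prefixes, datasets):
        if prefix in ['brightkite', 'gowalla'] and dataset == 'generic':
            continue
        names.append('./scripts/' + method[2] + '/run5-' + method[2] + '_' + method[0]
                     + '-' + prefix + '-' + dataset + '.sh')
    return names

methods  = [['logd', 'SMLD', 'super'], ['log', 'SML', 'super'],
            ['d', 'SMD', 'super'], ['x', 'SM', 'super']]
prefixes = ['brightkite', 'gowalla', 'foursquare_nyc', 'foursquare_global']
datasets = ['geo_only', 'specific', 'generic', 'poi_only']

scripts = expected_scripts(methods, prefixes, datasets)
print(len(scripts))  # 56: 4 methods x (3 + 3 + 4 + 4) subsets
```

Note that `open()` does not create missing folders, so `./scripts/super/` must exist before the loop runs; `os.makedirs('./scripts/super', exist_ok=True)` creates it.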

In [3]:
import os, contextlib

# My Configurations:
k = 5
datasets = ['geo_only', 'specific', 'generic', 'poi_only']
prefixes = ['brightkite', 'gowalla', 'foursquare_nyc', 'foursquare_global']
descriptors = ['Brightkite_Gowalla', 'Brightkite_Gowalla', 'FoursquareNYC', 'FoursquareGlobal']

jar = 'HIPERMovelets'

methods = [
    ['logp', 'HpL', 'hiper-pvt'],
    ['log',  'HL',  'hiper'],
    ['p',    'Hp',  'hiper-pvt'],
    ['x',    'H',   'hiper'],
]
Ms     = [-3, -3, False, False]

for method, method_Ms in zip(methods, Ms):
    for prefix, descriptor_name in zip(prefixes, descriptors):
        for dataset in datasets:
            # Skip the 'generic' subset for Brightkite and Gowalla
            if prefix in ['brightkite', 'gowalla'] and dataset == 'generic':
                continue

            script_file = './scripts/hiper/run5-hiper_'+method[0]+'-'+prefix+'-'+dataset+'.sh'
            descriptor  = os.path.join(data, '5fold', 'descriptors', descriptor_name+'_'+dataset+'_hp')
            results_dir = os.path.join(res_path, 'hiper-'+method[0])
            data_dir    = os.path.join(data, '5fold', prefix)

            # Write the shebang and the commands printed by k_run into the script file
            with open(script_file, 'w') as f, contextlib.redirect_stdout(f):
                print('#!/bin/bash')
                k_run(k, data_dir, results_dir, prefix, method[1]+'-'+dataset, descriptor, method[2], Ms=method_Ms,
                      prg_path=prg_path, print_only=True, java_opts='-Xmx60G', jar_name=jar, n_threads=3)

### 3. Classification
To run the classifiers on the HIPERMovelets results, import the `analysis.py` module from the `automatize` package:

In [None]:
from automatize.analysis import def_random_seed, ACC4All, ALL3, MLP, RF, SVM, results2df, printLatex

Set the random number and the seed used by the classifiers:

In [None]:
save_results = True

def_random_seed(random_num=1, seed_num=1)
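`def_random_seed` fixes the random number and seed so that classifier runs can be repeated. Its internals are not shown here; as a general illustration only (not Automatize's implementation), seeding a generator explicitly makes its sequence reproducible:

```python
import random

def sample_with_seed(seed, n=5):
    """Illustration only: draw n numbers from a generator seeded with seed."""
    rng = random.Random(seed)  # independent generator with an explicit seed
    return [rng.random() for _ in range(n)]

# The same seed gives the same sequence
print(sample_with_seed(1) == sample_with_seed(1))  # True
```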

---
a. To run the classifiers for every folder inside a results path prefix:

In [None]:
ACC4All(res_path, prefix, save_results)
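To check beforehand which result folders exist under `res_path/prefix`, a small sketch that lists its immediate subfolders. The helper `result_folders` is hypothetical, and it assumes each method writes to its own subfolder, as in `os.path.join(res_path, prefix, method_name)` used earlier:

```python
import os

def result_folders(res_path, prefix):
    """Hypothetical helper: sorted names of the subfolders of res_path/prefix."""
    base = os.path.join(res_path, prefix)
    with os.scandir(base) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())
```

Usage: `result_folders(res_path, prefix)` returns, for example, the `method_name` folders created by the runs above; plain files in that folder are ignored.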

b. To run the classifiers for a specific results folder:

In [None]:
ALL3(res_path, prefix, method_name)

c. To run a single classifier (MLP in this example):

In [None]:
MLP(res_path, prefix, method_name)

---
To load the results into a DataFrame:

In [None]:
df = results2df(res_path, prefix)
df

---
To print the resulting DataFrame as a LaTeX-formatted table:

In [None]:
printLatex(df)
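To show the kind of output a LaTeX table consists of, a minimal standard-library sketch that renders a header and rows as a `tabular` environment. The helper `to_tabular` is hypothetical, is not how `printLatex` is implemented, and does not escape LaTeX special characters; pandas also offers `DataFrame.to_latex` for this purpose:

```python
def to_tabular(header, rows):
    """Hypothetical helper: render rows as a LaTeX tabular (no escaping)."""
    lines = ['\\begin{tabular}{' + 'l' * len(header) + '}', '\\hline',
             ' & '.join(header) + ' \\\\', '\\hline']
    for row in rows:
        lines.append(' & '.join(str(cell) for cell in row) + ' \\\\')
    lines += ['\\hline', '\\end{tabular}']
    return '\n'.join(lines)

print(to_tabular(['Dataset', 'Method'], [['FoursquareNY', 'Hiper-Log']]))
```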

### 4. Pre-processing data
To use the data pre-processing helpers, import the `preprocessing.py` module from the `automatize` package:

In [None]:
from automatize.preprocessing import joinTrainAndTest, kfold_trainAndTestSplit

To join split train and test files into a single dataset:

In [None]:
dir_path = os.path.join(data, 'foursquare_global')
cols = ['tid','label','lat','lon','day','hour','poi','category','price','rating']

df = joinTrainAndTest(dir_path, cols, train_file="specific_train.csv", test_file="specific_test.csv", class_col='label')
df
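The idea behind joining the split files, sketched with the standard `csv` module under the assumption that both files have a header row containing the listed columns. The helper `join_train_test` is hypothetical and returns a plain list of dicts rather than a DataFrame:

```python
import csv
import os

def join_train_test(dir_path, cols, train_file='specific_train.csv', test_file='specific_test.csv'):
    """Hypothetical helper: read both CSV files and concatenate their rows
    (train rows first, then test rows), keeping only the columns in cols."""
    rows = []
    for name in (train_file, test_file):
        with open(os.path.join(dir_path, name), newline='') as f:
            for record in csv.DictReader(f):
                rows.append({c: record[c] for c in cols})
    return rows
```

Values read this way are strings; converting `lat`, `lon` and similar columns to numbers is left to the caller.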

To split a dataset into k train/test folds:

In [None]:
k = 5

kfold_trainAndTestSplit(dir_path, k, df, random_num=1, class_col='label')
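A k-fold split of trajectory data should keep all points of a trajectory (`tid`) in the same fold, and a common approach is to stratify by class so that each fold keeps roughly the class proportions. A standard-library sketch of that idea, assuming one label per `tid`; the helper `kfold_tids` is hypothetical and not necessarily how `kfold_trainAndTestSplit` assigns folds:

```python
import random
from collections import defaultdict

def kfold_tids(tid_labels, k, seed=1):
    """Hypothetical helper: assign trajectory ids to k folds, stratified by label.

    tid_labels maps each tid to its class label. Returns k
    (train_tids, test_tids) pairs; fold i is the test set of pair i.
    """
    by_label = defaultdict(list)
    for tid, label in sorted(tid_labels.items()):
        by_label[label].append(tid)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_label):
        tids = by_label[label]
        rng.shuffle(tids)
        # Deal each class round-robin so every fold gets a similar share of it
        for i, tid in enumerate(tids):
            folds[i % k].append(tid)

    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(t for j, fold in enumerate(folds) if j != i for t in fold)
        splits.append((train, test))
    return splits
```

Each `(train_tids, test_tids)` pair can then be used to select the points of `df` whose `tid` belongs to each set.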

\# By Tarlis Portela (2020)