In [None]:
# Name : Joseph M. O'Connor
# Date : April 2023
# Project : Creole Forth for Python in a Jupyter notebook demo.
#           Has the following sections:
#           1. Initial setup with simple examples.
#           2. Machine learning example that builds and validates a binary classification system to mark sites as phishing or
#              non-phishing.
#           3. To-do list/daily log app using the Dropbox API. 
# References: 
# https://www.statology.org/plot-roc-curve-python/
# https://python.plainenglish.io/using-k-fold-cross-validation-to-evaluate-the-performance-of-logistic-regression-4439215f24c4
# Notebook and associated code is at http://github.com/tiluser/cfpy_jn

Overview of demo execution
-----------------------------------------

1. Import the Creole Forth scripting language and set up some preliminary defs
2. Do some demo executions using the Python functions with embedded Forth commands
3. It's inconvenient to embed Forth in a Python call every time you want to do something - what's the alternative?
4. Creating your own kernel - a nontrivial undertaking, even with Xeus to simplify it.
5. Path of least resistance - use magic commands.
6. Demo executions with magic commands - line and cell. %cfpy and %compdef for line execution and compilation, 
   %%cfpy and %%compdef for cell execution and compilation. 
7. Example(s) from my machine learning class.
8. Demo app - todo list/daily log. The front end is written in Lazarus. It builds and saves a todo list and a daily log
   entry, then uploads both to Dropbox.


Why do Forth in a Jupyter notebook?
-----------------------------------------------------

- Jupyter notebook is very commonly used in machine learning/data science.
- Very interactive and easy to use, like a one-dimensional spreadsheet. 
- Supported in many different programming languages. Commonly Python, R, and Julia are used.
- Has lots of built-in tools.


Is Forth supported?
----------------------------

- Not directly, but Python is.
- That means a Forth written in Python can work.
- Fortunately, I’ve written one: Creole Forth for Python.
- Today we'll take a look at how it can be used with Jupyter notebook.


Initial setup
-----------------

- Import Creole Forth for Python along with two helper definitions to execute and compile the Forth code.
- Then show some simple examples.

In [None]:
# Simple wrapper definitions to execute and compile Forth code
from CreoleForth import *

def execCF(oneLine):
    gsp.InputArea = oneLine
    cfb1.Modules.Interpreter.doParseInput(gsp)
    cfb1.Modules.Interpreter.doOuter(gsp)
    return None

def buildColon(oneLine):
    gsp.InputArea = oneLine
    cfb1.buildHighLevel(gsp,oneLine,"")
    return None

In [None]:
execCF('HELLO')

In [None]:
execCF('TEST')

In [None]:
execCF('3 4 + .')

In [None]:
execCF('VLIST')

Limitations to this approach
----------------------------------------

- Wrapping Forth code inside a Python function is cumbersome.
- It would be nice to do it more conveniently.

One alternative
----------------------

- Create your own Jupyter notebook kernel.
- There are tools available such as Xeus which are designed for this.
- It's still a fair amount of work.

A simpler solution
--------------------------

- Stick with the Python kernel.
- IPython, which powers the Python kernel, has a facility called magic commands: a line (prefixed with %) or a whole cell (prefixed with %%) is passed as a string to a function you register, and that function is called.
- It only requires writing a few lines of code.

The magic commands
-------------------------------

- %cfpy – executes Forth commands on a single line.
- %%cfpy – executes Forth commands in a cell.
- %compdef – compiles Forth on a single line.
- %%compdef – compiles Forth in a cell. 


In [None]:
# Set up the magic commands
from IPython.core.magic import (register_line_magic, register_cell_magic, register_line_cell_magic)

@register_line_magic
def cfpy(line):
    "%cfpy: executes the Forth commands on a single line"
    execCF(line)

@register_line_magic
def compdef(line):
    "%compdef: compiles the Forth definition on a single line"
    buildColon(line)

@register_cell_magic
def cfpy(line, cell):
    "%%cfpy: executes the Forth commands in the whole cell"
    # Return nothing, so Jupyter doesn't echo a ('', None) tuple after every cell
    execCF(cell)

@register_cell_magic
def compdef(line, cell):
    "%%compdef: compiles the Forth definitions in the whole cell"
    buildColon(cell)

@register_line_cell_magic
def lcmagic(line, cell=None):
    "Magic that works both as %lcmagic and as %%lcmagic"
    if cell is None:
        print("Called as line magic")
        return line
    else:
        print("Called as cell magic")
        return line, cell

# In an interactive session, we need to delete these to avoid
# name conflicts for automagic to work on line magics.
# The magics stay registered; only the Python names are removed.
del cfpy, compdef, lcmagic

Use of magic commands - some simple examples
----------------------------------------------------------------------

In [None]:
%cfpy HELLO

In [None]:
%%cfpy

HELLO

In [None]:
%compdef : T2 TEST TEST ;

In [None]:
%cfpy T2

In [None]:
%%compdef
: TESTS 0 DO TEST LOOP ;

In [None]:
%cfpy 3 TESTS

In [None]:
%cfpy 3 4 + .

In [None]:
%%cfpy
// HELLO if 1, TULIP if 0
0 HT

In [None]:
# Examples of building primitives directly within the notebook. The primitives can be based on instance methods,
# static methods, or standalone functions. Primitives should always take the GlobalSimpleProps object gsp as an argument. 
import os

class Stuff:
    def __init__(self):
        self.Title = "Just some test stuff"

    # ( -- ) prints "This is cool"
    def doCool(self, gsp):
        print("This is cool")
        
    @staticmethod
    def doMore(gsp):
        print("This is more")
    
    def doMore2(gsp):
        print("This is more 2")
    doMore2 = staticmethod(doMore2)
               
stuff = Stuff()

def foobar(gsp):
    print("This is a foobar")


In [None]:
# After defining the code for the primitives, add them to the dictionary. Then they're immediately available for execution.
cfb1.buildPrimitive("COOL",stuff.doCool, "stuff.doCool", "FORTH", "COMPINPF","( -- ) Prints this is cool")
cfb1.buildPrimitive("FOOBAR",foobar, "foobar", "APPSPEC", "COMPINPF","( -- ) Foobaring away")
cfb1.buildPrimitive("MORE2",Stuff.doMore2, "Stuff.doMore2", "APPSPEC", "COMPINPF","( -- ) More2")


In [None]:
%cfpy FOOBAR

In [None]:
%cfpy COOL

In [None]:
%cfpy MORE2

Machine learning demo
---------------------------------

- The data set consists of thousands of URLs, each classified as phishing, suspicious, or legitimate.
- Exploratory data analysis, data cleaning, and a search for correlations between fields were done first.
- This was followed by binary classification to mark sites as phishing or non-phishing.
- Logistic regression was done first, with plots to show the effectiveness of the model.
- The model was then validated with K-fold cross-validation.
- K-fold cross-validation is a resampling method for estimating how well a machine learning model performs.
- The data is split into K subsamples (folds). Each fold is used once as the test set, while the remaining K-1 folds are used for training.
- This measures the model's performance on data it was not trained on, which helps detect overfitting or underfitting.
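The fold mechanics described above can be sketched in plain Python. This is a minimal illustration, not the scikit-learn implementation the notebook uses: the folds here are contiguous and unshuffled (the notebook's CROSSVAL shuffles with KFold(n_splits=5, random_state=0, shuffle=True)), and kfold_indices and majority_class_accuracy are hypothetical helpers, the latter a deliberately trivial stand-in for a real model.

```python
import statistics

def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for each of k contiguous folds (no shuffling)."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        # Every sample outside the test fold is used for training
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def majority_class_accuracy(y, train_idx, test_idx):
    """Hypothetical stand-in model: predict the most common label in the training folds."""
    train_labels = [y[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(1 for i in test_idx if y[i] == majority) / len(test_idx)

y = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
scores = [majority_class_accuracy(y, tr, te) for tr, te in kfold_indices(len(y), 5)]
# Report the mean and population standard deviation across folds, in the same format as CROSSVAL
print("Accuracy: %.3f%% (%.3f%%)" % (statistics.mean(scores) * 100.0, statistics.pstdev(scores) * 100.0))
```

The last fold shows why averaging over folds matters: it holds both minority samples, so a model that looks perfect on the other folds scores 0 there.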


In [None]:
# Bring in all the libraries needed. 
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import time
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

In [None]:
# Define the primitives needed for the exercise

# ( -- ) Empties the data stack
def doClearDataStack(gsp):
    gsp.DataStack[:] = []
    return 0

cfb1.buildPrimitive("CLSDS", doClearDataStack, "doClearDataStack", "APPSPEC", "COMPINPF","(  -- ) Empties the data stack")

# ( csvfile -- df ) Loads a csv data set
def doLoadCsv(gsp):
    returnVal = gsp.pop(gsp.DataStack)
    csvFile = gsp.Scratch
    df = pd.read_csv(csvFile)
    gsp.Scratch = df
    gsp.push(gsp.DataStack) 
    return 0
     
cfb1.buildPrimitive("LOADCSV", doLoadCsv, "doLoadCsv", "APPSPEC", "COMPINPF","( csvfile -- df ) Loads a csv data set")

# ( df -- ) Outputs the correlation matrix
def doCorrMatrix(gsp):
    returnVal = gsp.pop(gsp.DataStack)
    df = gsp.Scratch
    corrmat = df.corr()
    print(corrmat)
    return 0

cfb1.buildPrimitive("CORRMAT", doCorrMatrix, "doCorrMatrix", "APPSPEC", "COMPINPF","( df -- ) Outputs the correlation matrix")

# Collapse values in the dataframe from -1, 0, and 1 to 0 and 1: -1 and 0 (phishing or suspicious) become 0,
#    and 1 (non-phishing) stays 1.
def collapse_vals(val):
    lookup = {-1 : 0, 0 : 0, 1 : 1}
    return lookup[val]

# The highly correlated fields (r > .5), plus the dependent variable Result
COLLAPSE_COLUMNS = ['Prefix_Suffix', 'having_Sub_Domain', 'SSLfinal_State', 'Domain_registeration_length',
                    'age_of_domain', 'web_traffic', 'Page_Rank', 'Google_Index', 'Result']

# ( df -- dfc ) Collapse all highly correlated fields (r > .5) to 0 and 1
def doCollapseFields(gsp):
    dfc = pd.DataFrame()
    returnVal = gsp.pop(gsp.DataStack)
    df = gsp.Scratch
    for col in COLLAPSE_COLUMNS:
        dfc[col] = [collapse_vals(val) for val in df[col]]
    gsp.Scratch = dfc
    gsp.push(gsp.DataStack)
    return 0
                                                                                     
cfb1.buildPrimitive("COLLAPSE_FIELDS", doCollapseFields, "doCollapseFields", "APPSPEC", "COMPINPF",
    "( df -- dfc ) Collapse all highly correlated fields (r > .5) to 0 and 1")

# ( dfc -- X  y ) Partition data into independent and dependent dataframes
def doPartitionData(gsp):
    returnVal = gsp.pop(gsp.DataStack)
    dfc = gsp.Scratch       
    y = dfc['Result']
    X = dfc.drop('Result', axis='columns')
    gsp.Scratch = X
    gsp.push(gsp.DataStack)
    gsp.Scratch = y
    gsp.push(gsp.DataStack)
    return 0

cfb1.buildPrimitive("PARTITION_DATA", doPartitionData, "doPartitionData", "APPSPEC", "COMPINPF",
    "( dfc -- X  y ) Partition data into independent and dependent dataframes")

# ( X field --  ) Does a pie chart of one of the fields for the X dataframe 
def doPiechartX(gsp):
    returnVal = gsp.pop(gsp.DataStack)
    field = gsp.Scratch
    returnVal = gsp.pop(gsp.DataStack)
    X = gsp.Scratch 
    a_X = X[field].value_counts().plot.pie(autopct='%.2f')
    _ = a_X.set_title(field)
    return 0
    
cfb1.buildPrimitive("PIECHARTX", doPiechartX, "doPiechartX", "APPSPEC", "COMPINPF",
    "( X field --  ) Does a pie chart of one of the fields for the X dataframe")

# ( y --  ) Does a pie chart of the Result values in y (the dependent variable)
def doPiechartY(gsp):
    returnVal = gsp.pop(gsp.DataStack)
    y = gsp.Scratch 
    ay = y.value_counts().plot.pie(autopct='%.2f')
    _ = ay.set_title('Result')
    return 0
    
cfb1.buildPrimitive("PIECHARTY", doPiechartY, "doPiechartY", "APPSPEC", "COMPINPF",
    "( y --  ) Does a pie chart of the Result values in y")

# ( X y -- X_numpy y_numpy ) Convert to numpy matrices 
def doToMatrix(gsp):
    returnVal = gsp.pop(gsp.DataStack)
    y = gsp.Scratch 
    returnVal = gsp.pop(gsp.DataStack)
    X = gsp.Scratch 
    X_numpy = X.to_numpy()
    y_numpy = y.to_numpy()
    gsp.Scratch = X_numpy
    gsp.push(gsp.DataStack)
    gsp.Scratch = y_numpy
    gsp.push(gsp.DataStack)
    return 0

cfb1.buildPrimitive(">MATRIX", doToMatrix, "doToMatrix", "APPSPEC", "COMPINPF",
    "( X y -- X_numpy y_numpy ) Convert to numpy matrices")

# ( X_numpy y_numpy -- x_train x_test y_train y_test ) Does the train/test split
def doTrainTestSplit(gsp):
    returnVal = gsp.pop(gsp.DataStack)
    y_numpy = gsp.Scratch 
    returnVal = gsp.pop(gsp.DataStack)
    X_numpy = gsp.Scratch 
    x_train, x_test, y_train, y_test = train_test_split(X_numpy, y_numpy, test_size=0.25,shuffle=True)
    gsp.Scratch = x_train
    gsp.push(gsp.DataStack)
    gsp.Scratch = x_test
    gsp.push(gsp.DataStack)
    gsp.Scratch = y_train
    gsp.push(gsp.DataStack)
    gsp.Scratch = y_test
    gsp.push(gsp.DataStack)
    return 0

cfb1.buildPrimitive("TRAIN_TEST_SPLIT", doTrainTestSplit, "doTrainTestSplit", "APPSPEC", "COMPINPF",
    "( X_numpy y_numpy -- x_train x_test y_train y_test ) Does the train/test split")

# ( X_numpy -- model ) Build, summarize, and compile a logistic regression model (TRAIN_MODEL does the fitting)
def doLogitCompile(gsp):
    returnVal = gsp.pop(gsp.DataStack)
    X_numpy = gsp.Scratch
    # A single sigmoid unit trained with binary cross-entropy is a logistic regression model
    model = Sequential()
    model.add(Dense(1, input_dim = X_numpy.shape[1], activation='sigmoid'))
    model.summary()
    model.compile(loss = 'binary_crossentropy', optimizer='rmsprop', metrics = ['accuracy'])
    gsp.Scratch = model
    gsp.push(gsp.DataStack)
    return 0

cfb1.buildPrimitive("LOGITCOMP", doLogitCompile, "doLogitCompile", "APPSPEC", "COMPINPF",
    "( X_numpy -- model ) Build, summarize, and compile a logistic regression model")

# ( model x_train x_test y_train y_test epochs -- x_train x_test y_train y_test train ) trains the model
def doTrainModel(gsp):
    returnVal = gsp.pop(gsp.DataStack)
    epochsNum = int(gsp.Scratch)
    returnVal = gsp.pop(gsp.DataStack)
    y_test = gsp.Scratch 
    returnVal = gsp.pop(gsp.DataStack)
    y_train = gsp.Scratch 
    returnVal = gsp.pop(gsp.DataStack)
    x_test = gsp.Scratch 
    returnVal = gsp.pop(gsp.DataStack)
    x_train = gsp.Scratch  
    returnVal = gsp.pop(gsp.DataStack)
    model = gsp.Scratch 
    train = model.fit(x_train,y_train,validation_data=(x_test,y_test),epochs=epochsNum)
    gsp.Scratch = x_train
    gsp.push(gsp.DataStack)
    gsp.Scratch = x_test
    gsp.push(gsp.DataStack)
    gsp.Scratch = y_train
    gsp.push(gsp.DataStack)
    gsp.Scratch = y_test
    gsp.push(gsp.DataStack)
    gsp.Scratch = train
    gsp.push(gsp.DataStack)
    return 0
    
cfb1.buildPrimitive("TRAIN_MODEL", doTrainModel, "doTrainModel", "APPSPEC", "COMPINPF",
    "( model x_train x_test y_train y_test epochs -- x_train x_test y_train y_test train ) trains the model")
 
# ( train -- ) plot of loss over epochs
def doPlotLoss(gsp):
    returnVal = gsp.pop(gsp.DataStack)
    train = gsp.Scratch 
    plt.figure(figsize=(7,5))
    plt.plot(train.history['loss'],label='Training loss')
    plt.plot(train.history['val_loss'],label='Validation loss')
    plt.xlabel('epochs')
    plt.ylabel('loss')
    plt.legend()
    return 0

cfb1.buildPrimitive("PLOT_LOSS", doPlotLoss, "doPlotLoss", "APPSPEC", "COMPINPF","( train -- ) plot of loss over epochs")

# ( train -- ) plot of accuracy over epochs
def doPlotAccuracy(gsp):
    returnVal = gsp.pop(gsp.DataStack)
    train = gsp.Scratch 
    plt.figure(figsize=(7,5))
    plt.plot(train.history['accuracy'],label='Training accuracy')
    plt.plot(train.history['val_accuracy'],label='Validation accuracy')
    plt.xlabel('epochs')
    plt.ylabel('accuracy')
    plt.legend()
    return 0

cfb1.buildPrimitive("PLOT_ACC", doPlotAccuracy, "doPlotAccuracy", "APPSPEC", "COMPINPF",
    "( train -- ) plot of accuracy over epochs")

# ( x_train y_train -- ) Validate model with K-fold cross-validation
def doCrossValidation(gsp):
    returnVal = gsp.pop(gsp.DataStack)
    y_train = gsp.Scratch 
    returnVal = gsp.pop(gsp.DataStack)
    x_train = gsp.Scratch  
    kfold = KFold(n_splits=5, random_state=0, shuffle=True)
    model = LogisticRegression(solver='liblinear')
    results = cross_val_score(model, x_train, y_train, cv=kfold)
    # Output the accuracy. Calculate the mean and std across all folds. 
    print("Accuracy: %.3f%% (%.3f%%)" % (results.mean()*100.0, results.std()*100.0))
    return 0

cfb1.buildPrimitive("CROSSVAL", doCrossValidation, "doCrossValidation", "APPSPEC", "COMPINPF",
    "( x_train y_train -- ) Validate model with K-fold cross-validation")

# ( x_train x_test y_train y_test -- ) fit the model using the training data and plot the ROC curve
def doPlotRocCurve(gsp):
    returnVal = gsp.pop(gsp.DataStack)
    y_test = gsp.Scratch 
    returnVal = gsp.pop(gsp.DataStack)
    y_train = gsp.Scratch
    returnVal = gsp.pop(gsp.DataStack)
    x_test = gsp.Scratch
    returnVal = gsp.pop(gsp.DataStack)
    x_train = gsp.Scratch  
    model = LogisticRegression(solver='liblinear')
    model.fit(x_train,y_train)   
    # Probability of the positive class (second column)
    y_pred_proba = model.predict_proba(x_test)[:, 1]
    fpr, tpr, _ = metrics.roc_curve(y_test,  y_pred_proba)
    # create ROC curve
    plt.plot(fpr,tpr)
    plt.ylabel('True Positive Rate')
    plt.xlabel('False Positive Rate')
    plt.show()
    return 0

cfb1.buildPrimitive("PLOTROC", doPlotRocCurve, "doPlotRocCurve", "APPSPEC", "COMPINPF",
    "( x_train x_test y_train y_test -- ) fit the model using the training data and plot the ROC curve")
                                                                                    

In [None]:
%%compdef
: SETPART  
    LOADCSV COLLAPSE_FIELDS PARTITION_DATA ;

Print a correlation matrix
------------------------------------

In [None]:
%%cfpy 

CLSDS kf_dataset.csv LOADCSV COLLAPSE_FIELDS CORRMAT

Piechart of dependent variable
--------------------------------------------

In [None]:
%%cfpy 

CLSDS kf_dataset.csv SETPART PIECHARTY DROP

Piechart of one of the predictor variables
-----------------------------------------------------------

In [None]:
%%cfpy 

CLSDS kf_dataset.csv SETPART SWAP Prefix_Suffix PIECHARTX DROP

- Compile the logistic regression model
- Do a train/test split
- Train the model
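LOGITCOMP builds a Keras model with a single Dense unit, a sigmoid activation, and binary cross-entropy loss, which is logistic regression. As a rough illustration of what training that unit does, here is a plain-Python sketch using full-batch gradient descent. It is an assumption-laden simplification: the notebook compiles the model with the rmsprop optimizer, and the learning rate, epoch count, and tiny one-feature data set below are illustrative choices, not values from the demo.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_bce(w, b, X, y, eps=1e-12):
    """Mean binary cross-entropy, the loss LOGITCOMP compiles the model with."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)

def train_logistic(X, y, epochs=500, lr=0.5):
    """Full-batch gradient descent for a single sigmoid unit (logistic regression)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the cross-entropy with respect to the pre-activation
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

X = [[0.0], [0.2], [0.8], [1.0]]
y = [0, 0, 1, 1]
w, b = train_logistic(X, y)
print("loss before:", mean_bce([0.0], 0.0, X, y), "after:", mean_bce(w, b, X, y))
print("P(class 1 | x=0):", sigmoid(b), "P(class 1 | x=1):", sigmoid(w[0] + b))
```

The loss printed before and after corresponds to the first and last points of the training-loss curve that PLOT_LOSS draws for the real model.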

In [None]:
%%cfpy
CLSDS 
kf_dataset.csv SETPART >MATRIX DROP LOGITCOMP
kf_dataset.csv SETPART >MATRIX TRAIN_TEST_SPLIT
50 TRAIN_MODEL

Plot of training and validation loss
-------------------------------------------------

In [None]:
%%cfpy 
DUP PLOT_LOSS

In [None]:
%%cfpy
PLOT_ACC

K-fold cross validation
--------------------------------

In [None]:
%%cfpy
DROP SWAP DROP CROSSVAL
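The data stack persists between cells, so this cell works on whatever the training and plotting cells left behind. The sketch below traces that in plain Python, assuming a list as the stack with its top at the end. The strings are stand-ins for the real arrays and training history, and dup, drop, and swap are hypothetical helpers that mirror the Forth words of the same names; they are not Creole Forth's own implementation.

```python
def dup(stack):
    """( a -- a a ) copy the top item."""
    stack.append(stack[-1])

def drop(stack):
    """( a -- ) discard the top item."""
    stack.pop()

def swap(stack):
    """( a b -- b a ) exchange the top two items."""
    stack[-2], stack[-1] = stack[-1], stack[-2]

# Stack after "50 TRAIN_MODEL"; the top of the stack is the end of the list
stack = ["x_train", "x_test", "y_train", "y_test", "train"]

dup(stack)
stack.pop()   # PLOT_LOSS consumes the copy of train
stack.pop()   # PLOT_ACC consumes the original train
drop(stack)   # DROP -> x_train x_test y_train
swap(stack)   # SWAP -> x_train y_train x_test
drop(stack)   # DROP -> x_train y_train

print(stack)  # the ( x_train y_train -- ) input that CROSSVAL expects
```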

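ROC (Receiver Operating Characteristic) curve plot
------------------------------------------------------------------------

The ROC curve shows the diagnostic ability of a binary classifier: it plots the true positive rate against the false positive rate as the classification threshold is varied.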
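PLOTROC gets its points from metrics.roc_curve. To show what those points are, here is a minimal plain-Python sketch that sweeps the threshold over every distinct predicted probability and adds the area under the curve (AUC) by the trapezoidal rule. roc_points and auc are hypothetical helpers, not scikit-learn functions, and scikit-learn may return fewer points because by default it drops points that don't change the curve's shape.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists for every distinct score threshold, from highest to lowest."""
    pos = sum(1 for label in y_true if label == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Visit samples from the most to the least confident positive prediction
    pairs = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2 for k in range(1, len(fpr)))

y_test = [0, 0, 1, 1]
y_pred_proba = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_test, y_pred_proba)
print(fpr, tpr, auc(fpr, tpr))
```

A perfect classifier's curve runs straight up the left edge (AUC 1.0). A random one follows the diagonal (AUC 0.5).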

In [None]:
%%cfpy
CLSDS 
kf_dataset.csv SETPART >MATRIX TRAIN_TEST_SPLIT PLOTROC


Todo list/log application
----------------------------------

- The GUI front end is built in Lazarus.
- It has two tabs, one for the todo list and the other for the daily log.
- The dialog box is launched as a separate executable.
- The next cell executed runs Creole Forth for Python code that uploads the saved text files to Dropbox.
- The files in Dropbox can then be viewed from any device with access to Dropbox, such as an iPad or an Android device.

In [None]:
import os
import dropbox

# ( -- ) Executes todo dialog box
def doToDoDialog(gsp):
    # Launch the Lazarus front end
    os.system('lazproj.exe')
    return 0
    
cfb1.buildPrimitive("TODODLG",doToDoDialog, "doToDoDialog", "APPSPEC", "COMPINPF","( -- ) Executes todo dialog box")

# ( access_token -- ) Uploads saved files to Dropbox
def doDropBoxUploads(gsp):
    returnVal = gsp.pop(gsp.DataStack)
    access_token = gsp.Scratch
    # (local file, Dropbox destination) pairs
    uploads = [('todo.txt', '/todo.txt'), ('dailylog.txt', '/dailylog.txt')]
    client = dropbox.Dropbox(access_token)
    for local_path, dropbox_path in uploads:
        # Close each file after reading it
        with open(local_path, "rb") as f:
            # Overwrite the copy from an earlier upload instead of failing with a write conflict
            client.files_upload(f.read(), dropbox_path, mode=dropbox.files.WriteMode.overwrite)
    return 0

cfb1.buildPrimitive("DB_UPLOADS",doDropBoxUploads, "doDropBoxUploads", "APPSPEC", "COMPINPF",
    "( access_token -- ) Uploads saved files to Dropbox")

In [None]:
%cfpy TODODLG

To use the code below, you need a Dropbox account. Then do the following steps:
1. Go to the apps section at https://www.dropbox.com/developers/apps/ .
2. Create a new app.
3. Get the app key and app secret and put them in a safe place.
4. Generate an access token. Without a refresh token, it will expire in a few hours. Creating a refresh token is outside
   the scope of this demo. To use the code again after the token expires, simply generate a new access token.

In [None]:
# https://www.dropbox.com/developers/apps
# Paste the access token generated in step 4. If it doesn't work, just generate another one.
access_token = '<paste your access token here>'
gsp.Scratch = access_token
gsp.push(gsp.DataStack)


In [None]:
%cfpy DB_UPLOADS

Summary
-------------

- Jupyter notebook is an effective IDE for interactive development.
- A Forth written in Python can be adapted to use it without great effort. 


Questions/Comments?
---------------------------------

- Reach me at tiluser0@gmail.com
- Code for demo is available on Github at https://github.com/tiluser/cfpy_jn
