## Node usage analysis

This notebook demonstrates how the `workflowgrader` class can be used to extract data about the nodes used in a set of workflows.

In [1]:
import knime
from utils import workflowgrader
import pandas as pd
import numpy as np
import os, re
from pathlib import Path
from IPython.display import SVG
import xml.etree.ElementTree as ET

In [2]:
workspace = r"C:/Users/s11006381/knime-workspace/small_gradespace"

In [3]:
# Build the full path of every entry in the workspace and list them with an index
workflow_dir_list = [os.path.join(workspace, name) for name in os.listdir(workspace)]
for i, path in enumerate(workflow_dir_list):
    print(i, path)

0 C:/Users/s11006381/knime-workspace/small_gradespace\.metadata
1 C:/Users/s11006381/knime-workspace/small_gradespace\11006381_complete
2 C:/Users/s11006381/knime-workspace/small_gradespace\11006381_complete_more
3 C:/Users/s11006381/knime-workspace/small_gradespace\11006381_partial_correct
4 C:/Users/s11006381/knime-workspace/small_gradespace\C5_Lab5_Task1_Data_prep_clean
5 C:/Users/s11006381/knime-workspace/small_gradespace\C5_Lab5_Task1_Data_prep_clean_with_COT
6 C:/Users/s11006381/knime-workspace/small_gradespace\Example Workflows
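
The listing above also contains entries such as `.metadata` that are not workflows. The following is a minimal sketch of one way to keep only workflow directories, assuming that a workflow is identified by a `workflow.knime` file at the top level of its folder (the usual KNIME on-disk layout). `list_workflow_dirs` is a hypothetical helper for illustration and is not part of `workflowgrader`.

```python
import os
import tempfile

def list_workflow_dirs(workspace):
    """Return sorted paths of sub-directories that contain a workflow.knime file."""
    found = []
    for name in sorted(os.listdir(workspace)):
        path = os.path.join(workspace, name)
        # Assumption: a top-level workflow.knime file marks a workflow folder
        if os.path.isfile(os.path.join(path, "workflow.knime")):
            found.append(path)
    return found

# Demo on a throwaway directory tree instead of a real workspace
with tempfile.TemporaryDirectory() as demo_ws:
    for name, is_workflow in [(".metadata", False), ("wf_a", True), ("wf_b", True)]:
        os.makedirs(os.path.join(demo_ws, name))
        if is_workflow:
            open(os.path.join(demo_ws, name, "workflow.knime"), "w").close()
    demo_names = [os.path.basename(p) for p in list_workflow_dirs(demo_ws)]
    print(demo_names)
```

Workflow groups that only hold nested workflows would be skipped by this check, so a recursive walk may be needed for deeper workspaces.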


A `workflowgrader` is initialized with the workspace of interest and the name of the reference workflow, which serves as the baseline against which the other workflows are graded.

In [4]:
wfg = workflowgrader(workspace,'C5_Lab5_Task1_Data_prep_clean_with_COT')

The `.ref_nodes` attribute holds the count of each type of node used in the reference workflow.

In [13]:
wfg.ref_nodes

{'CSV Reader': 2,
 'Column Filter': 1,
 'Concatenate': 1,
 'Container Output _Table_': 2,
 'Joiner': 1,
 'Nominal Value Row Filter': 1,
 'Numeric Binner': 1,
 'Reference Column Filter': 1,
 'Reference Row Filter': 1,
 'Row Filter': 2,
 'String Manipulation': 1,
 'String To Number': 1}
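
How `workflowgrader` derives these counts is not shown in this notebook. One plausible approach, sketched here under the assumption that each node is stored in a sub-folder named like `CSV Reader (#1)`, is to strip the ID suffix from the folder names and tally what remains. `count_nodes_by_folder` and the demo folder names are hypothetical.

```python
import os
import re
import tempfile
from collections import Counter

# Assumed folder naming scheme: "<node name> (#<node id>)"
NODE_DIR = re.compile(r"^(?P<name>.+) \(#\d+\)$")

def count_nodes_by_folder(workflow_dir):
    """Tally node names from the sub-folder names of a workflow directory."""
    counts = Counter()
    for entry in os.listdir(workflow_dir):
        match = NODE_DIR.match(entry)
        if match and os.path.isdir(os.path.join(workflow_dir, entry)):
            counts[match.group("name")] += 1
    return dict(sorted(counts.items()))

# Demo on a throwaway folder that mimics the assumed layout
with tempfile.TemporaryDirectory() as demo_wf:
    for folder in ["CSV Reader (#1)", "CSV Reader (#2)", "Joiner (#3)"]:
        os.makedirs(os.path.join(demo_wf, folder))
    open(os.path.join(demo_wf, "workflow.knime"), "w").close()
    demo_counts = count_nodes_by_folder(demo_wf)
    print(demo_counts)
```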

The `.accumulate_workflow_nodes()` method collects the node counts of every workflow to be graded. Node types that appear in the reference workflow but not in a graded workflow are reported with a count of 0.

In [14]:
# Keep the result so it can be converted to a DataFrame later
d = wfg.accumulate_workflow_nodes()
d

100%|███████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 250.54it/s]


{'11006381_complete': {'CSV Reader': 2,
  'Column Filter': 1,
  'Concatenate': 1,
  'Container Output _Table_': 2,
  'Joiner': 1,
  'Nominal Value Row Filter': 1,
  'Numeric Binner': 1,
  'Reference Column Filter': 1,
  'Reference Row Filter': 1,
  'Row Filter': 2,
  'String Manipulation': 1,
  'String To Number': 1},
 '11006381_complete_more': {'CSV Reader': 2,
  'Column Filter': 1,
  'Concatenate': 1,
  'Container Output _Table_': 3,
  'Joiner': 1,
  'Nominal Value Row Filter': 1,
  'Numeric Binner': 1,
  'Reference Column Filter': 1,
  'Reference Row Filter': 1,
  'Row Filter': 2,
  'String Manipulation': 1,
  'String To Number': 1},
 '11006381_partial_correct': {'CSV Reader': 2,
  'Column Filter': 1,
  'Container Output _Table_': 1,
  'Joiner': 1,
  'Nominal Value Row Filter': 1,
  'Numeric Binner': 1,
  'Reference Column Filter': 1,
  'Row Filter': 1,
  'String Manipulation': 1,
  'String To Number': 1,
  'Concatenate': 0,
  'Reference Row Filter': 0}}
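
With counts in this shape, a submission can be compared against the reference node by node. The sketch below uses the counts printed above for the reference and for `11006381_partial_correct`; `node_diff` is a hypothetical helper, not a `workflowgrader` method.

```python
def node_diff(reference, submitted):
    """Return {node: submitted_count - reference_count} for nodes whose counts differ."""
    diff = {}
    for node in sorted(set(reference) | set(submitted)):
        delta = submitted.get(node, 0) - reference.get(node, 0)
        if delta != 0:
            diff[node] = delta
    return diff

# Counts copied from the outputs shown in this notebook
ref = {
    'CSV Reader': 2, 'Column Filter': 1, 'Concatenate': 1,
    'Container Output _Table_': 2, 'Joiner': 1, 'Nominal Value Row Filter': 1,
    'Numeric Binner': 1, 'Reference Column Filter': 1, 'Reference Row Filter': 1,
    'Row Filter': 2, 'String Manipulation': 1, 'String To Number': 1,
}
partial = {
    'CSV Reader': 2, 'Column Filter': 1, 'Container Output _Table_': 1,
    'Joiner': 1, 'Nominal Value Row Filter': 1, 'Numeric Binner': 1,
    'Reference Column Filter': 1, 'Row Filter': 1, 'String Manipulation': 1,
    'String To Number': 1, 'Concatenate': 0, 'Reference Row Filter': 0,
}

partial_diff = node_diff(ref, partial)
print(partial_diff)
```

A negative value means the submission uses fewer nodes of that type than the reference, a positive value means more, and an empty result means the counts match exactly.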

The resulting dictionary converts directly to a pandas DataFrame, which can then be written to a CSV file.

In [16]:
df = pd.DataFrame.from_dict(d,orient='index')
df.head()

| | CSV Reader | Column Filter | Concatenate | Container Output _Table_ | Joiner | Nominal Value Row Filter | Numeric Binner | Reference Column Filter | Reference Row Filter | Row Filter | String Manipulation | String To Number |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11006381_complete | 2 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 |
| 11006381_complete_more | 2 | 1 | 1 | 3 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 |
| 11006381_partial_correct | 2 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |


In [20]:
df.to_csv('out.csv')
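
By default `to_csv` writes the index column with an empty header, which pandas labels `Unnamed: 0` when the file is read back. A small sketch of one way to avoid that, using an in-memory buffer and a two-column subset of the counts above; the label `workflow` is an arbitrary choice made for this example.

```python
import io
import pandas as pd

# Subset of the node counts shown earlier, for illustration only
counts = {
    "11006381_complete": {"CSV Reader": 2, "Row Filter": 2},
    "11006381_partial_correct": {"CSV Reader": 2, "Row Filter": 1},
}
df_demo = pd.DataFrame.from_dict(counts, orient="index")

# Name the index column on write, then use it as the index on read
buffer = io.StringIO()
df_demo.to_csv(buffer, index_label="workflow")
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="workflow")
print(restored)
```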