## Integrate runtime information

### 1. Explanation with a simple example

Consider the following script. If we separate the assignments from the operations, it splits into 2 cells.

The data file contains the array [[43,52,73],[41,18,94]], which has shape [2,3] (one can simulate it with x = np.array([[43,52,73],[41,18,94]])).

In [None]:
# cell 1
import tensorflow as tf
import numpy as np

x = np.loadtxt("data.csv", delimiter=",")
x = tf.constant(x)

# cell 2
x.set_shape([3,None])

In a notebook environment, if we execute cell 1, we can inspect x directly:

In [21]:
# cell 1
import tensorflow as tf
import numpy as np

x = np.loadtxt("data.csv", delimiter=",")
x = tf.constant(x)
x

<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[43, 52, 73],
       [41, 18, 94]])>

We aim to use Python assignment instructions to represent the **type** and **shape** runtime information of the variables of interest. For example, the following instruction yields the same runtime information for variable x:

In [20]:
# to simulate runtime info of x
x = tf.constant(value=[[43,52,73],[41,18,94]],shape=(2,3))
x

<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[43, 52, 73],
       [41, 18, 94]])>

Then we can inject this runtime information as Python code into the original script to get the runtime-injected version:

In [None]:
# cell 1
import tensorflow as tf
import numpy as np

x = np.loadtxt("data.csv", delimiter=",")
x = tf.constant(x)

# runtime info from cell 1
x = tf.constant(value=[[43,52,73],[41,18,94]],shape=(2,3))

# cell 2
x.set_shape([3,None])

To generate runtime information as Python instructions systematically and automatically, we developed a script, nb_runtime_parser.py. The workflow is:

1. Split the script into cells
2. Execute one cell
3. Run the script to generate Python code that represents the runtime information of the executed cell
4. Repeat steps 2 and 3
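
Step 3 can be pictured with a minimal sketch. The function below is a hypothetical stand-in written for illustration only. It is not the code of nb_runtime_parser.py, and the real script may handle more types (for example TensorFlow variables and placeholders) and choose different expressions. The sketch assumes that scalars are reproduced literally and that array-like values are replaced by random data of the same shape.

```python
from types import SimpleNamespace


def sketch_runtime_info_to_code(namespace, names):
    """Return Python assignments that mimic the type and shape of each named variable.

    Hypothetical illustration; not the actual nb_runtime_parser implementation.
    """
    lines = []
    for name in sorted(names):
        value = namespace[name]
        if isinstance(value, (bool, int, float, str)):
            # Plain scalars are reproduced exactly from their literal form.
            lines.append(f"{name} = {value!r}")
        elif hasattr(value, "shape"):
            # Array-like values keep only their shape; the contents are
            # replaced by random numbers of the same shape.
            shape = tuple(value.shape)
            lines.append(f"{name} = np.random.normal(size={shape})")
        # Anything else (modules, functions, ...) is skipped in this sketch.
    return "\n".join(lines)


# Stand-in object with a .shape attribute, so the demo runs without NumPy.
fake_array = SimpleNamespace(shape=(60000, 6))
demo_namespace = {"learning_rate": 0.01, "n_input": 6, "train_X": fake_array}
print(sketch_runtime_info_to_code(demo_namespace, demo_namespace.keys()))
```

The generated lines are ordinary assignments, so a static analyzer can read them like any other code in the script.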

A static analyzer such as Pythia is sensitive to the order of the code. The runtime information only makes sense when it is placed right after its corresponding executed cell and used to analyze the code that follows. However, one can always rearrange the original code and the runtime information to get the most out of a static analyzer. After all, notebook cells can be executed in any order.
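
The default placement rule can be sketched as a small helper that puts each block of runtime information directly after the cell that produced it. The name `inject_runtime_info` and its arguments are hypothetical and not part of nb_runtime_parser.py. The sketch assumes the last cell gets no runtime block, since no code follows it.

```python
def inject_runtime_info(cells, runtime_info):
    """Join cell sources, placing each runtime block right after its cell.

    `cells` holds the source of every cell in execution order; `runtime_info`
    holds the generated code for all cells except the last one.
    Hypothetical helper for illustration only.
    """
    if len(runtime_info) != len(cells) - 1:
        raise ValueError("expected runtime info for every cell except the last")
    parts = []
    for index, source in enumerate(cells):
        parts.append(f"# cell {index + 1}\n{source}")
        if index < len(runtime_info):
            # Runtime info only helps the analysis of the code that follows it.
            parts.append(f"# runtime info from cell {index + 1}\n{runtime_info[index]}")
    return "\n\n".join(parts) + "\n"


cells = [
    'x = np.loadtxt("data.csv", delimiter=",")\nx = tf.constant(x)',
    "x.set_shape([3,None])",
]
runtime_info = ["x = tf.constant(value=[[43,52,73],[41,18,94]],shape=(2,3))"]
script = inject_runtime_info(cells, runtime_info)
print(script)
```

A different ordering strategy would only need to change how `parts` is assembled.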

### 2. Example UT-10B

Run cell 1 in a notebook environment to generate the runtime information.

In [4]:
# cell 1

import sys
import tensorflow as tf
import numpy

assert tf.__version__ == "1.8.0"
tf.set_random_seed(20180130)
numpy.random.seed(20180130)

rng = numpy.random

# Parameters
learning_rate = 0.01
training_epochs = 2000
display_step = 50

# data:

f = open("required_dataset_by_ut/ut10/data.csv")
data = numpy.loadtxt(f, delimiter=",")

train_X = data[:60000, :-1]
train_Y = data[:60000, -1]

test_X = data[60000:80000, :-1]
test_Y = data[60000:80000, -1]

X_val = data[80000:, :-1]
y_val = data[80000:, -1]

# Training Data
n_input = train_X.shape[1]
n_samples = train_X.shape[0]

print(n_input)

# tf Graph Input
X = tf.placeholder(tf.float32, [n_input])
Y = tf.placeholder(tf.float32)

# Create Model

# Set model weights
W = tf.Variable(tf.zeros([6]), name="weight")
b = tf.Variable(tf.zeros([1]), name="bias")

6


Use our script to convert the runtime information to Python code.

In [6]:
import nb_runtime_parser

w = %who_ls
print(nb_runtime_parser.runtime_info_to_code(locals(), w))

W = tf.Variable(tf.zeros([6,]))
X = tf.placeholder(tf.float32, [6,])
X_val = np.random.normal(size=(19999, 6))
b = tf.Variable(tf.zeros([1,]))
data = np.random.normal(size=(99999, 7))
display_step = 50
learning_rate = 0.01
n_input = 6
n_samples = 60000
test_X = np.random.normal(size=(20000, 6))
test_Y = np.random.normal(size=(20000,))
train_X = np.random.normal(size=(60000, 6))
train_Y = np.random.normal(size=(60000,))
training_epochs = 2000
y_val = np.random.normal(size=(19999,))


Nothing needs to be done for cell 2, which is the last cell. As explained earlier, Pythia can only use runtime information to analyze the code that follows it, and there is no code after this cell.

In [None]:
# Construct a linear model
activation = tf.add(tf.multiply(X, W), b)

# Minimize the squared errors
cost = tf.reduce_sum(tf.pow(activation - Y, 2)) / (2 * n_samples)  # L2 loss
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)  # Gradient descent

# Initializing the variables
init = tf.initialize_all_variables()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Fit all training data
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})

        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), "cost=",
                  "{:.9f}".format(sess.run(cost, feed_dict={X: train_X, Y: train_Y})),
                  "W=", sess.run(W), "b=", sess.run(b))

    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
    print("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')

    print("Testing... (L2 loss Comparison)")
    testing_cost = sess.run(tf.reduce_sum(tf.pow(activation - Y, 2)) / (2 * test_X.shape[0]),
                            feed_dict={X: test_X, Y: test_Y})  # same function as cost above
    print("Testing cost=", testing_cost)
    print("Absolute l2 loss difference:", abs(training_cost - testing_cost))

Now we can combine all the code into a single Python script with the runtime information injected.