# Analyzing actual program instruction memory traces

This notebook analyzes *instruction memory traces* from three real programs.  An *instruction memory trace* is the ordered sequence of memory addresses of the instructions a program executes.  The notebook fetches each data set and, for each one, computes and graphs how many unique pages have been visited after every *k* instructions.  Each graph therefore shows the cumulative number of unique pages visited.  In these examples, *k* is set to 500 so that the graphs are easier to draw.
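
As a rough illustration of the counting described above, here is a minimal sketch that samples the cumulative unique-page count every *k* addresses. The function name `cumulative_unique_pages` and the four-address trace are made up for this example and are not part of the notebook's code or data. The sketch assumes addresses are already integers and records each sample before the current address is processed.

```python
def cumulative_unique_pages(addresses, offset_bits=12, k=500):
    """Return (time_steps, unique_counts), sampled every k addresses.

    Hypothetical helper for illustration; addresses are plain integers.
    """
    seen = set()
    time_steps, unique_counts = [], []
    for index, address in enumerate(addresses):
        if index % k == 0:
            # Record how many distinct pages were seen before this address.
            time_steps.append(index)
            unique_counts.append(len(seen))
        # Dropping the offset bits leaves the page number.
        seen.add(address >> offset_bits)
    return time_steps, unique_counts

# Synthetic trace (not real data): four addresses on two 4096-byte pages.
trace = [0x1000, 0x1004, 0x2000, 0x1008]
steps, counts = cumulative_unique_pages(trace, offset_bits=12, k=2)
print(steps, counts)
```

Because samples are taken before an address is processed, the first sample is always 0 and any pages first touched after the last sample point do not appear in the output.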

On line `56`, the `offset_size_bits` variable sets how many of the rightmost bits of a virtual/logical address are used for the *page/frame offset*.  The remaining bits are used as the *page number*.  The default value in the code is `12`, which gives 2^12 (4096) bytes per page/frame.
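
The page/offset split can also be sketched with integer bit operations instead of binary strings. This is an alternative illustration, not the method used in the notebook's code: `split_address` is a hypothetical helper and the address `7f3a1c2d4e10` is invented for the example.

```python
def split_address(hex_address, offset_bits=12):
    """Split a hexadecimal address string into (page_number, offset).

    Hypothetical helper for illustration only.
    """
    address = int(hex_address, 16)
    page_number = address >> offset_bits          # drop the offset bits
    offset = address & ((1 << offset_bits) - 1)   # keep only the offset bits
    return page_number, offset

# Made-up address: with 12 offset bits, the last three hex digits are the offset.
page, offset = split_address("7f3a1c2d4e10")
print(hex(page), hex(offset))
```

With 12 offset bits, each hexadecimal digit covers 4 bits, so the offset is exactly the last three hex digits of the address and the page number is everything to their left.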

In [None]:
import urllib.request
import matplotlib.pyplot as plotter

def fetch_data(input_url_arg):
    response = urllib.request.urlopen(input_url_arg)
    data = response.read()  
    text = data.decode('utf-8')
    lines = text.splitlines()    
    return lines

def analyze(instructions_list_arg, address_size_arg, split_point_arg):
    scale = 16
    visited_pages = {}

    time_steps = []
    unique_pages = []
    shifts = []
    previous_page = -1

    number_of_instructions = len(instructions_list_arg)
    for lineIndex in range(number_of_instructions):
        if (lineIndex % 500 == 0):
            time_steps.append(lineIndex)
            unique_pages.append(len(visited_pages))      
        line = instructions_list_arg[lineIndex]
        words = line.split()
        hex_string = words[0]
        binary_string = bin(int(hex_string, scale))[2:].zfill(address_size_arg)
        page = binary_string[0:split_point_arg]
        offset = binary_string[split_point_arg:]
        if (page in visited_pages):
            count = visited_pages[page]
            visited_pages[page] = count + 1
        else:
            visited_pages[page] = 1
        if (page != previous_page):
            shifts.append(1)
        else:
            shifts.append(0)
        previous_page = page
        
    #sub_plot = figure_arg.add_subplot(3, 1, instance_arg)
    #sub_plot.plot(time_steps, unique_pages,"o",color="black")
    fig = plotter.figure()
    axes = plotter.axes()
    plotter.plot(time_steps,unique_pages,"o",color="black")
    plotter.xlabel("Instruction number")
    plotter.ylabel("Unique pages visited")
    plotter.xticks(range(min(time_steps), max(time_steps)+5, 1000))
    plotter.yticks(range(min(unique_pages), max(unique_pages)+5, 3))
    axes.grid()
    plotter.show()
    
# parameters
address_size_bits = 64
offset_size_bits = 12 
split_point_bits = address_size_bits - offset_size_bits

# fetch, analyze, and graph the instruction trace information
%matplotlib inline 
# fetch the data from the web
analyze(fetch_data("http://csweb.cs.wfu.edu/~turketwh/341/Fall2019/ls_10000.txt"),address_size_bits,split_point_bits)
analyze(fetch_data("http://csweb.cs.wfu.edu/~turketwh/341/Fall2019/df_10000.txt"),address_size_bits,split_point_bits)
analyze(fetch_data("http://csweb.cs.wfu.edu/~turketwh/341/Fall2019/who_10000.txt"),address_size_bits,split_point_bits)




## To investigate

*Add your answers to the Google Doc that is associated with the same assignment as these Python notebooks.*

1. How many instructions are analyzed for each graph, given the data files used by the provided code?
2. Given the number of instructions and 4096-byte pages, what is the maximum number of pages visited for each of the three programs? Write down three values, one from each graph. The graphs represent the Unix programs `ls`, `df`, and `who`, respectively.
3. Before investigating Question 4 below, make a prediction: should the number of unique pages visited *increase* or *decrease* as the `offset_size_bits` variable value increases?
4. Increase the page size to 16384-byte pages by adjusting the `offset_size_bits` variable on line `56`. Given the number of instructions and 16384-byte pages, what is the maximum number of pages visited for each of the three programs? How does your answer here compare with your answer to Question 3 above?
5. Shrink the page size to 512-byte pages by adjusting the `offset_size_bits` variable on line `56`. Given the number of instructions and 512-byte pages, what is the maximum number of pages visited for each of the three programs?
6. On lines `62` through `64`, change the filenames so that they no longer end in `10000.txt` but instead end in `all.txt`.  For example, on line `62`, the parameter to the `fetch_data` function should now be: `http://csweb.cs.wfu.edu/~turketwh/341/Fall2019/ls_all.txt`.  Which program has the *longest* memory trace (*maximum* number of instructions executed)?
7. Set `offset_size_bits` back to `12` to support 4096-byte pages. Keeping the filenames the same, which program visits the *most* pages (*maximum number*) during program execution?
8. Assume that the TLB for the machine that these programs are running on has `128` entries. For 4096-byte pages, is this TLB large enough to hold information on all the pages visited by each program? Consider the programs independently.
9. For the `ls` program (the first graph drawn), is there any *time range* where the number of pages visited does not grow much? 
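
Several questions above give a page size in bytes, while the code expects a number of offset bits. The conversion is a base-2 logarithm, sketched below. The function `offset_bits_for_page_size` is a hypothetical helper, not part of the provided code, and it assumes the page size is a power of two.

```python
def offset_bits_for_page_size(page_size_bytes):
    """Return the number of offset bits needed for a power-of-two page size.

    Hypothetical helper for illustration only.
    """
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    if page_size_bytes <= 0 or page_size_bytes & (page_size_bytes - 1):
        raise ValueError("page size must be a positive power of two")
    # For n = 2**b, bit_length() is b + 1.
    return page_size_bytes.bit_length() - 1

print(offset_bits_for_page_size(4096))
```

The default page size of 4096 bytes maps back to the default `offset_size_bits` value of `12`, which is a quick way to check the conversion before trying other page sizes.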