# Accessing the Vector Multiplier via PYNQ
In this demo, we show how to access the simple vector multiplier using a shared memory interface via PYNQ.

The files for the overlay should already be part of the GitHub repo. Alternatively, you can build them from scratch by following the instructions in [shared memory demo](https://sdrangan.github.io/hwdesign/sharedmem/) to build the vector multiplier IP, export it to a Vivado project, and build the overlay.

**NOTE**: This notebook does not currently work. The overlay loads and can be inspected, but the output `c` is all zeros. I have not yet identified the cause.

We load the overlay that we created using the `Overlay` class from PYNQ.

In [None]:
import numpy as np
from pynq import Overlay
overlay = Overlay("../overlay/vector_mult.bit")

We can then print out information on the overlay.

In [75]:
overlay?

Type:            Overlay
String form:     <pynq.overlay.Overlay object at 0xffff90ccf640>
File:            /usr/local/share/pynq-venv/lib/python3.10/site-packages/pynq/overlay.py
Docstring:
Default documentation for overlay ../overlay1/vector_mult.bit. The following
attributes are available on this overlay:

IP Blocks
----------
vec_mult_0           : pynq.overlay.DefaultIP
zynq_ultra_ps_e_0    : pynq.overlay.DefaultIP

Hierarchies
-----------
None

Interrupts
----------
None

GPIO Outputs
------------
None

Memories
------------
PSDDR                : Memory
Class docstring:
This class keeps track of a single bitstream's state and contents.

The overlay class holds the state of the bitstream and enables run-time
protection of bindings.

Our definition of overlay is: "post-bitstream configurable design".
Hence, this class must expose configurability through content discovery
and runtime protection.

The overlay class exposes 

We see that the overlay has two IPs: the Zynq processing system (`zynq_ultra_ps_e_0`) and the vector multiplier IP (`vec_mult_0`). We access the vector multiplier IP and get its register map. The registers for the addresses `a`, `b`, and `c` are each split into two 32-bit words: `a_1`, `a_2`, `b_1`, `b_2`, and `c_1`, `c_2`. This split is done automatically to support 64-bit addressing.
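
To make the split concrete, here is a minimal sketch of how a 64-bit address maps onto two 32-bit words and back. It assumes the `_1` register holds the low word and the `_2` register holds the high word, which matches how the `split_address` helper in this notebook is used. The `join_address` function and the example address are hypothetical and only serve as a check.

```python
MASK32 = 0xFFFFFFFF


def split_address(addr):
    """Return the (low, high) 32-bit words of a 64-bit address."""
    return addr & MASK32, (addr >> 32) & MASK32


def join_address(low, high):
    """Rebuild the 64-bit address from its two 32-bit words."""
    return (high << 32) | low


# Hypothetical physical address above 4 GiB, so the high word is non-zero
addr = 0x0000_0008_7FF0_1000
low, high = split_address(addr)
print(hex(low), hex(high))  # 0x7ff01000 0x8

# Round trip: the two words carry the full address
assert join_address(low, high) == addr
```

For buffers located below 4 GiB, the high word is simply zero.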

In [77]:
ip = overlay.vec_mult_0
ip.register_map

RegisterMap {
  CTRL = Register(AP_START=0, AP_DONE=0, AP_IDLE=1, AP_READY=0, RESERVED_1=0, AUTO_RESTART=0, RESERVED_2=0, INTERRUPT=0, RESERVED_3=0),
  GIER = Register(Enable=0, RESERVED=0),
  IP_IER = Register(CHAN0_INT_EN=0, CHAN1_INT_EN=0, RESERVED_0=0),
  IP_ISR = Register(CHAN0_INT_ST=0, CHAN1_INT_ST=0, RESERVED_0=0),
  a_1 = Register(a=write-only),
  a_2 = Register(a=write-only),
  b_1 = Register(b=write-only),
  b_2 = Register(b=write-only),
  c_1 = Register(c=write-only),
  c_2 = Register(c=write-only),
  n = Register(n=write-only)
}
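
The `CTRL` register packs the handshake flags into a single word. As a rough illustration, the sketch below decodes a raw control word into named flags. The bit positions are an assumption based on the usual Vitis HLS `ap_ctrl_hs` layout and should be checked against the driver header generated for your IP. `CTRL_BITS` and `decode_ctrl` are hypothetical names, not part of PYNQ.

```python
# Assumed bit positions (usual Vitis HLS ap_ctrl_hs layout); verify them
# against the generated driver header for your IP.
CTRL_BITS = {
    "AP_START": 0,
    "AP_DONE": 1,
    "AP_IDLE": 2,
    "AP_READY": 3,
    "AUTO_RESTART": 7,
}


def decode_ctrl(word):
    """Return a dict mapping each flag name to 0 or 1 for a raw CTRL word."""
    return {name: (word >> bit) & 1 for name, bit in CTRL_BITS.items()}


# An idle core (AP_IDLE=1, everything else 0) corresponds to the word 0x4
print(decode_ctrl(0x4))
```

Under this layout, the register map printed above (only `AP_IDLE` set) corresponds to a raw value of `0x4`.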

We can now run a simple test.

* Allocate buffers in DDR memory.
* Send the buffer addresses to the IP.
* Fill the input buffers with some data.
* Start the IP and wait for it to finish.
* Read the output data.

**THIS DOES NOT WORK**:  For some reason, I get all zeros in `c_buf`.

In [None]:
from pynq import allocate


# Allocate buffers in DDR memory
n=10
a_buf = allocate(shape=(n,), dtype='float32')
b_buf = allocate(shape=(n,), dtype='float32')
c_buf = allocate(shape=(n,), dtype='float32')


In [None]:
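# Set the addresses of the buffers in the IP registers
def split_address(addr):
    """Split a 64-bit address into (low, high) 32-bit words."""
    return addr & 0xFFFFFFFF, (addr >> 32) & 0xFFFFFFFF

# Write through register_map. Assigning to ip.a_1 directly would only set a
# Python attribute on the driver object and not write the hardware register,
# which is a possible cause of an all-zero output.
regs = ip.register_map
regs.a_1, regs.a_2 = split_address(a_buf.physical_address)
regs.b_1, regs.b_2 = split_address(b_buf.physical_address)
regs.c_1, regs.c_2 = split_address(c_buf.physical_address)
regs.n = n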
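
One thing to keep in mind when writing registers: in Python, assigning to a new attribute name on an object just stores the value on that object unless the class defines a property or a `__setattr__` hook. The sketch below uses two hypothetical classes, `FakeRegisterMap` and `FakeIP` (neither is PYNQ's implementation), to show the difference between writing through a register-map object and assigning directly on the driver object. Whether the real driver class behaves like `FakeIP` is an assumption worth verifying on your PYNQ version.

```python
class FakeRegisterMap:
    """Hypothetical register map: attribute writes are routed to 'hardware'."""

    def __init__(self, hw):
        # Bypass our own __setattr__ while storing the backing dict
        object.__setattr__(self, "_hw", hw)

    def __setattr__(self, name, value):
        self._hw[name] = value  # models a memory-mapped register write


class FakeIP:
    """Hypothetical driver object with no attribute hook of its own."""

    def __init__(self):
        self.hw = {}  # models the device registers
        self.register_map = FakeRegisterMap(self.hw)


ip_demo = FakeIP()

ip_demo.n = 10               # only creates a Python attribute on ip_demo
print(ip_demo.hw)            # {}

ip_demo.register_map.n = 10  # routed through __setattr__ into the registers
print(ip_demo.hw)            # {'n': 10}
```

A quick way to check the real IP is to read back a register through `register_map` after writing it, where the register is readable.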

In [None]:
from time import sleep

# Initialize the input buffers
a_buf[:] = np.arange(n, dtype=np.float32)
b_buf[:] = 3*np.arange(n, dtype=np.float32)

a_buf.sync_to_device()
b_buf.sync_to_device()

# Start the execution
ip.register_map.CTRL.AP_START = 1

# Wait for the execution to complete.
# Sleep briefly between polls so that we do not flood the bus with reads.
while not ip.register_map.CTRL.AP_DONE:
    sleep(0.001)

# Get the output buffer
c_buf.sync_from_device()

# Print the output buffer
print(c_buf)

PynqBuffer([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
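
To judge any future output, it helps to have a software reference for the same inputs. The sketch below assumes the IP computes an element-wise product, `c[i] = a[i] * b[i]`. That is an assumption about the vector multiplier, so adjust it if the IP does something else. The helper `matches_reference` is a hypothetical name.

```python
import numpy as np

n = 10
a = np.arange(n, dtype=np.float32)
b = 3 * np.arange(n, dtype=np.float32)

# Assumption: the IP computes an element-wise product
c_expected = a * b


def matches_reference(c_out, c_ref, rtol=1e-5):
    """Return True if the hardware output agrees with the software reference."""
    return bool(np.allclose(np.asarray(c_out), c_ref, rtol=rtol))


print(c_expected)

# The all-zero output shown above does not match the reference
print(matches_reference(np.zeros(n, dtype=np.float32), c_expected))  # False
```

On the board, `c_buf` can be passed directly as `c_out` after `c_buf.sync_from_device()`.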