# EE194 Lab 0: Chisel - Part 3: Chisel Interfaces for Modularity, Decoupling
Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE" or `YOUR ACTION NEEDED HERE`.

If you see `???` right below `YOUR CODE HERE`, make sure to remove that after you have implemented your solution (and before you run the code block).

### Import the necessary Chisel dependencies. 
> There will be cells like these in every lab. Make sure you run them before proceeding to bring the Chisel Library into the Jupyter Notebook scope!

In [None]:
interp.configureCompiler(_.settings.processArguments(List("-Wconf:cat=deprecation:s"), true))
interp.load.module(os.Path(s"${System.getProperty("user.dir")}/resource/chisel_deps.sc"))

In [None]:
import chisel3._
import chisel3.util._
import chiseltest._
import chiseltest.RawTester.test

## Ready/Valid Protocol
### Recap:
* **valid** - output from the producer indicating that `bits` holds data it wants to send
* **ready** - output from the consumer indicating that it is able to receive data
* **bits** - the payload the producer is sending to the consumer
* A transfer occurs when both `ready` and `valid` are high in the same cycle

![](./images/handshake-wave.svg)

![](./images/readyValid.svg)
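
The transfer rule from the recap can be sketched with plain `Bool` signals before bringing in any library support. The `TransferCounter` module below is a hypothetical helper (not part of the lab) that counts the cycles in which both `valid` and `ready` are high.

```scala
import chisel3._
import chiseltest._
import chiseltest.RawTester.test

// Hypothetical helper module: counts cycles in which a transfer occurs.
class TransferCounter extends Module {
    val io = IO(new Bundle {
        val valid     = Input(Bool())   // would be driven by the producer
        val ready     = Input(Bool())   // would be driven by the consumer
        val transfers = Output(UInt(8.W))
    })
    val count = RegInit(0.U(8.W))
    // A transfer happens only when both signals are high in the same cycle
    when (io.valid && io.ready) {
        count := count + 1.U
    }
    io.transfers := count
}

test(new TransferCounter) { c =>
    // (valid, ready) per cycle: only the third and fourth cycles transfer
    val stimulus = Seq((true, false), (false, true), (true, true), (true, true), (false, false))
    for ((v, r) <- stimulus) {
        c.io.valid.poke(v.B)
        c.io.ready.poke(r.B)
        c.clock.step()
    }
    c.io.transfers.expect(2.U)
}
```

Cycles where only one of the two signals is high leave the count unchanged, which is exactly the "no transfer" case in the waveform.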

### Chisel Ready/Valid
* Supported by default!
* To use, wrap the data type to transfer in the desired protocol, e.g. `Valid(UInt(8.W))`
    * The library adds the extra signals & provides helper functions
    * By default, the data is sent in the output direction; wrap the interface in `Flipped` to reverse it

#### [Valid](https://javadoc.io/doc/edu.berkeley.cs/chisel3_2.13/latest/chisel3/util/Valid.html) - only `valid` signal
* Consumer can't say no
    * Must consume when sent
* Indicates the existence of data
    * Almost like a hardware equivalent of Scala's `Option`
* Contains fields - `.valid`, `.bits` (containing the actual data)
    * See example `MakeValid` below. 

In [None]:
class MakeValid(w: Int) extends Module {
    val io = IO(new Bundle {
        val en  = Input(Bool())
        val in  = Input(UInt(w.W))
        val out = Valid(UInt(w.W))
    })
    io.out.valid := io.en
    io.out.bits := io.in
}

printVerilog(new MakeValid(4))
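
To make the `Option` analogy concrete, here is a small sketch of a test helper that views a `Valid` port as a Scala `Option`. Both `ValidSource` (a copy of `MakeValid`, repeated so the cell stands alone) and `peekOption` are hypothetical names introduced for illustration.

```scala
import chisel3._
import chisel3.util._
import chiseltest._
import chiseltest.RawTester.test

// Hypothetical module with the same behavior as MakeValid
class ValidSource(w: Int) extends Module {
    val io = IO(new Bundle {
        val en  = Input(Bool())
        val in  = Input(UInt(w.W))
        val out = Valid(UInt(w.W))
    })
    io.out.valid := io.en
    io.out.bits  := io.in
}

// Hypothetical test helper: Some(bits) when valid is high, None otherwise
def peekOption(v: Valid[UInt]): Option[BigInt] =
    if (v.valid.peek().litToBoolean) Some(v.bits.peek().litValue) else None

test(new ValidSource(4)) { c =>
    c.io.in.poke(9.U)
    c.io.en.poke(false.B)
    println(peekOption(c.io.out)) // None: bits is still driven, but it is not meaningful
    c.io.en.poke(true.B)
    println(peekOption(c.io.out)) // Some(9)
}
```

The analogy is only approximate: in hardware the `bits` wires always carry some value, and `valid` tells the consumer whether to trust it.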

In [None]:
class ValidReceiver(w: Int) extends Module {
    val io = IO(new Bundle {
        val in = Flipped(Valid(UInt(w.W))) // on the receiving end, the interface is wrapped in `Flipped`
    })
    when (io.in.valid) {
        printf("  received %d\n", io.in.bits)
    }
}

// printVerilog(new ValidReceiver(4))
test(new ValidReceiver(4)) { c =>
    for (cycle <- 0 until 8) {
        c.io.in.bits.poke(cycle.U)
        println(s"cycle: $cycle")
        c.io.in.valid.poke((cycle%2 == 0).B)
        c.clock.step()
    }
}

#### Valid Accumulator
> The following `Accumulator` class stores and outputs the accumulated value. When the `rst` signal is high, the accumulated value should reset to `0.U`. Using the given `Accumulator` class as a starting point, modify it so that it takes a `Valid[UInt]` as the `data` input and only accumulates the incoming `data` when it is `valid`. Reset (`rst`) takes priority over `valid`. Name the modified class `ValidAccumulator`.

In [None]:
class Accumulator(width: Int) extends Module {
    val io = IO(new Bundle {
        val rst = Input(Bool())
        val data = Input(UInt(width.W))
        val count = Output(UInt(width.W))
    })

    val count_reg = Reg(UInt(width.W))
    when (io.rst === true.B) {
        count_reg := 0.U
    } .otherwise {
        count_reg := count_reg + io.data
    }
    io.count := count_reg
}

// YOUR CODE HERE
???

In [None]:
def testValidAccumulator: Boolean = {
    test(new ValidAccumulator(4)) { dut =>
        dut.io.data.bits.poke(5.U)
        dut.io.data.valid.poke(false.B)
        dut.io.rst.poke(false.B)
        dut.io.count.expect(0.U)
        dut.clock.step()
        
        dut.io.data.bits.poke(5.U)
        dut.io.data.valid.poke(true.B)
        dut.io.rst.poke(false.B)
        dut.io.count.expect(0.U)
        dut.clock.step()
        
        dut.io.data.bits.poke(6.U)
        dut.io.data.valid.poke(true.B)
        dut.io.rst.poke(false.B)
        dut.io.count.expect(5.U)
        dut.clock.step()
        
        dut.io.data.bits.poke(7.U)
        dut.io.data.valid.poke(false.B)
        dut.io.rst.poke(false.B)
        dut.io.count.expect(11.U)
        dut.clock.step()
        
        dut.io.data.bits.poke(7.U)
        dut.io.data.valid.poke(true.B)
        dut.io.rst.poke(false.B)
        dut.io.count.expect(11.U)
        dut.clock.step()
        
        dut.io.data.bits.poke(0.U)
        dut.io.data.valid.poke(false.B)
        dut.io.rst.poke(true.B)
        dut.io.count.expect(2.U) // 5 + 6 + 7 = 18 wraps to 2 in 4 bits; rst only takes effect at the next clock edge
        dut.clock.step()
        
        dut.io.count.expect(0.U)
    }
    true
}
assert(testValidAccumulator)

#### [Decoupled](https://github.com/chipsalliance/chisel/blob/v3.6.1/src/main/scala/chisel3/util/Decoupled.scala#L125-L155) - `ready & valid` signals
* Consumer can apply backpressure
* **BEWARE** of combinational loops
    * Avoid using ready/valid input to combinationally create ready/valid output
* Built-in helper functions for `Decoupled` interfaces:
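    * `fire` - Bool that is true if and only if both `ready` and `valid` are high
    * `enq(data)` - (producer side) sets `bits` to `data` and `valid` to true (doesn't check `ready`)
    * `noenq()` - (producer side) sets `valid` to false
    * `deq()`/`nodeq()` - the consumer-side counterparts: `deq()` sets `ready` to true and returns `bits`, `nodeq()` sets `ready` to false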
* Contains fields `.valid`, `.ready`, & `.bits` - Similar to the `Valid` interface above.
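
As an illustration of the helper functions listed above, the following is a minimal sketch of a single-entry buffer sitting between two `Decoupled` interfaces. `OneSlotBuffer` is a hypothetical example, not part of the lab. Note that `ready` and `valid` are derived only from the `full` register, so no ready/valid input combinationally drives a ready/valid output.

```scala
import chisel3._
import chisel3.util._
import chiseltest._
import chiseltest.RawTester.test

// Hypothetical example: holds at most one value between a producer and a consumer
class OneSlotBuffer(w: Int) extends Module {
    val io = IO(new Bundle {
        val in  = Flipped(Decoupled(UInt(w.W))) // receiving side: this module drives ready
        val out = Decoupled(UInt(w.W))          // sending side: this module drives valid and bits
    })
    val full = RegInit(false.B)
    val data = Reg(UInt(w.W))

    // Defaults: accept nothing, offer nothing
    io.in.nodeq()   // in.ready := false
    io.out.noenq()  // out.valid := false

    when (full) {
        io.out.enq(data)             // out.valid := true, out.bits := data
        when (io.out.fire) { full := false.B }
    } .otherwise {
        val incoming = io.in.deq()   // in.ready := true, returns in.bits
        when (io.in.fire) {
            data := incoming
            full := true.B
        }
    }
}

test(new OneSlotBuffer(4)) { c =>
    c.io.in.valid.poke(true.B)
    c.io.in.bits.poke(9.U)
    c.io.out.ready.poke(false.B)
    c.io.in.ready.expect(true.B)    // empty, so it can accept
    c.io.out.valid.expect(false.B)
    c.clock.step()
    c.io.in.ready.expect(false.B)   // full, so it applies backpressure
    c.io.out.valid.expect(true.B)
    c.io.out.bits.expect(9.U)
    c.io.out.ready.poke(true.B)     // the consumer takes the value this cycle
    c.clock.step()
    c.io.out.valid.expect(false.B)
    c.io.in.ready.expect(true.B)
}
```

Setting defaults with `nodeq()`/`noenq()` first and overriding them inside `when` blocks relies on Chisel's last-connect semantics.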

> Using your `ValidAccumulator` class as a starting point, write a `DecoupledAccumulator` class that uses a `Decoupled[UInt]` for the input data so that it can apply backpressure (via ready/valid signalling). It must wait `coolDown` cycles after accepting an input before it can accept another. For example, if `coolDown = 1`, it can accept new numbers no faster than every other cycle. As in the previous accumulator problems, add the proper reset logic. You can assume `coolDown > 0`. At start-up or when coming out of reset/`rst`, it must also wait `coolDown` cycles before accepting an input. Note that `rst` is not the same as the module's implicit reset signal: `rst` clears the accumulator at any point after the initial reset (i.e. `rst` will not always be high when the circuit starts).

In [None]:
class DecoupledAccumulator(width: Int, coolDown: Int) extends Module {
    // YOUR CODE HERE
    ???
}

In [None]:
def testDecoupledAccumulator: Boolean = {
    test(new DecoupledAccumulator(4, 1)) { dut =>
        dut.io.data.bits.poke(1.U) // ignored
        dut.io.data.valid.poke(true.B) 
        dut.io.rst.poke(false.B)
        dut.io.count.expect(0.U)
        dut.io.data.ready.expect(false.B)
        dut.clock.step()

        dut.io.data.bits.poke(2.U) // accumed
        dut.io.data.valid.poke(true.B)
        dut.io.rst.poke(false.B)
        dut.io.count.expect(0.U)
        dut.io.data.ready.expect(true.B)
        dut.clock.step()

        dut.io.data.bits.poke(3.U) // ignored
        dut.io.data.valid.poke(true.B)
        dut.io.rst.poke(false.B)
        dut.io.count.expect(2.U)
        dut.io.data.ready.expect(false.B)
        dut.clock.step()

        dut.io.data.bits.poke(4.U) // accumed
        dut.io.data.valid.poke(true.B)
        dut.io.rst.poke(false.B)
        dut.io.count.expect(2.U)
        dut.io.data.ready.expect(true.B)
        dut.clock.step()

        dut.io.data.bits.poke(5.U) // ignored since invalid
        dut.io.data.valid.poke(false.B) // doesn't count
        dut.io.rst.poke(false.B)
        dut.io.count.expect(6.U)
        dut.io.data.ready.expect(false.B)
        dut.clock.step()

        dut.io.data.bits.poke(6.U) // ignored since invalid
        dut.io.data.valid.poke(false.B) // doesn't count
        dut.io.rst.poke(false.B)
        dut.io.count.expect(6.U)
        dut.io.data.ready.expect(true.B)
        dut.clock.step()

        dut.io.data.bits.poke(7.U) // accumed
        dut.io.data.valid.poke(true.B)
        dut.io.rst.poke(false.B)
        dut.io.count.expect(6.U)
        dut.io.data.ready.expect(true.B)
        dut.clock.step()

        dut.io.count.expect(13.U)
        dut.io.rst.poke(true.B) // reset
        dut.clock.step()
        dut.io.count.expect(0.U)
    }
    test(new DecoupledAccumulator(4, 2)) { dut =>
        dut.io.data.bits.poke(1.U) // ignored
        dut.io.data.valid.poke(true.B) 
        dut.io.rst.poke(false.B)
        dut.io.count.expect(0.U)
        dut.io.data.ready.expect(false.B)
        dut.clock.step()

        dut.io.data.bits.poke(2.U) // ignored
        dut.io.data.valid.poke(true.B)
        dut.io.rst.poke(false.B)
        dut.io.count.expect(0.U)
        dut.io.data.ready.expect(false.B)
        dut.clock.step()

        dut.io.data.bits.poke(3.U) // accum
        dut.io.data.valid.poke(true.B)
        dut.io.rst.poke(false.B)
        dut.io.count.expect(0.U)
        dut.io.data.ready.expect(true.B)
        dut.clock.step()

        dut.io.data.bits.poke(4.U) // ignored
        dut.io.data.valid.poke(true.B)
        dut.io.rst.poke(false.B)
        dut.io.count.expect(3.U)
        dut.io.data.ready.expect(false.B)
        dut.clock.step()
    }
    true
}
assert(testDecoupledAccumulator)

## Chisel `Queue`
* AKA: FIFO 
* Natively supported in Chisel via `Queue` generator!

### Review
* Use when traffic is bursty - a queue can smooth traffic rate
    * Queue fills up when too much demand
    * When demand wanes, can completely empty the queue
* A queue can't solve a throughput mismatch
    * If the production rate always exceeds the consumption rate, a queue can't help
* A queue is a great place to use decoupled interfaces

![](./images/queue.svg)
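
The two review points (a queue smooths bursts, but cannot fix a sustained throughput mismatch) can be checked with a small software-only model. This is plain Scala rather than Chisel, and `stalledCycles` and the traffic patterns are hypothetical names chosen for this sketch. It assumes at most one enqueue and one dequeue per cycle, both decided from the occupancy at the start of the cycle.

```scala
// Software-only model of a bounded FIFO (not Chisel).
// Returns the number of cycles in which the producer wanted to send but the queue was full.
def stalledCycles(depth: Int, cycles: Int)(wantsToSend: Int => Boolean, canReceive: Int => Boolean): Int = {
    var occupancy = 0
    var stalls = 0
    for (cycle <- 0 until cycles) {
        val enq = wantsToSend(cycle) && occupancy < depth // producer valid and queue ready
        val deq = canReceive(cycle) && occupancy > 0      // consumer ready and queue valid
        if (wantsToSend(cycle) && !enq) stalls += 1
        occupancy += (if (enq) 1 else 0) - (if (deq) 1 else 0)
    }
    stalls
}

val everyOtherCycle = (cycle: Int) => cycle % 2 == 1 // consumer accepts on odd cycles
val bursty          = (cycle: Int) => cycle % 8 < 4  // 4 cycles on, 4 cycles off (same average rate)
val always          = (_: Int) => true               // producer is twice as fast as the consumer

for (depth <- Seq(1, 2, 4)) {
    println(s"depth $depth: bursty stalls=${stalledCycles(depth, 64)(bursty, everyOtherCycle)} " +
            s"always-on stalls=${stalledCycles(depth, 64)(always, everyOtherCycle)}")
}
```

With the bursty producer, a deep enough queue removes the stalls entirely because the average rates match. With the always-on producer, a deeper queue only postpones the first stall.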

### Chisel `Queue` Interface
* Docs: https://javadoc.io/doc/edu.berkeley.cs/chisel3_2.13/latest/chisel3/util/Queue.html
* Uses `Decoupled` on both enqueue & dequeue sides
* Specify the type and the max number of entries, e.g. `new Queue(UInt(4.W), 8)`
* Additional arguments to the constructor above:
    * `pipe` - if full, allow an enqueue in the same cycle as a dequeue
    * `flow` - if empty, an enqueued value is available immediately (in the same cycle) for dequeue

![](./images/queueReady.svg)

In [None]:
class CountIntoQueue(maxVal: Int, numEntries: Int, pipe: Boolean, flow: Boolean) extends Module {
    val io = IO(new Bundle {
        val en  = Input(Bool())
        val out = Decoupled(UInt())
        val count = Output(UInt())
    })
    val q = Module(new Queue(UInt(), numEntries, pipe=pipe, flow=flow))
    val (count, wrap) = Counter(q.io.enq.fire, maxVal) // Counter is a generator built into Chisel: count increments (0 to maxVal-1, then wraps) on each cycle in which q.io.enq.fire is true
    q.io.enq.valid := io.en
    q.io.enq.bits := count
    io.out <> q.io.deq // note that out is a `Decoupled` interface, which means it has `ready` and `valid` signals that dictate when we can dequeue a value from the queue (`valid` signal is set by the Queue on the dequeue side)
    io.count := count // for visibility
}

printVerilog(new CountIntoQueue(3,1,false,false))
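
Since `Counter` does a lot of work in one line above, here is a minimal sketch of it in isolation. `EnabledCounter` is a hypothetical wrapper that exposes both values returned by `Counter(cond, n)`: the current count and a `wrap` flag.

```scala
import chisel3._
import chisel3.util._
import chiseltest._
import chiseltest.RawTester.test

// Hypothetical wrapper around Counter(cond, n)
class EnabledCounter(n: Int) extends Module {
    require(n > 1)
    val io = IO(new Bundle {
        val en    = Input(Bool())
        val value = Output(UInt(log2Ceil(n).W))
        val wrap  = Output(Bool())
    })
    // value counts 0 until n; wrap is high when en is high and value is n-1
    val (value, wrap) = Counter(io.en, n)
    io.value := value
    io.wrap  := wrap
}

test(new EnabledCounter(3)) { c =>
    c.io.en.poke(true.B)
    c.io.value.expect(0.U)
    c.io.wrap.expect(false.B)
    c.clock.step()
    c.io.value.expect(1.U)
    c.clock.step()
    c.io.value.expect(2.U)
    c.io.wrap.expect(true.B)   // last value and en is high
    c.clock.step()
    c.io.value.expect(0.U)     // wrapped around
    c.io.en.poke(false.B)
    c.clock.step()
    c.io.value.expect(0.U)     // holds while en is low
}
```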

In [None]:
test(new CountIntoQueue(4,3,pipe=false,flow=false)) { c =>
    c.io.en.poke(true.B)
    c.io.out.ready.poke(false.B)
    for (cycle <- 0 until 4) {   // Fill up queue
        println(s"f count:${c.io.count.peek()} out:${c.io.out.bits.peek()} v:${c.io.out.valid.peek()}")
        c.clock.step()
    }
    println()
    c.io.en.poke(false.B)
    c.io.out.ready.poke(true.B)
    for (cycle <- 0 until 4) {   // Drain queue
        println(s"d count:${c.io.count.peek()} out:${c.io.out.bits.peek()} v:${c.io.out.valid.peek()}")
        c.clock.step()
    }
    println()
    c.io.en.poke(true.B)
    for (cycle <- 0 until 4) {   // Simultaneous
        println(s"s count:${c.io.count.peek()} out:${c.io.out.bits.peek()} v:${c.io.out.valid.peek()}")
        c.clock.step()
    }
}
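
The effect of the `pipe` and `flow` options is easiest to see on a one-entry queue. The sketch below uses the `Queue(enq, entries, ...)` factory method, which instantiates a queue, connects the given interface to its enqueue side, and returns the dequeue side. `QueueDemo` is a hypothetical wrapper introduced for this illustration.

```scala
import chisel3._
import chisel3.util._
import chiseltest._
import chiseltest.RawTester.test

// Hypothetical wrapper exposing a queue's two Decoupled sides
class QueueDemo(entries: Int, pipe: Boolean, flow: Boolean) extends Module {
    val io = IO(new Bundle {
        val in  = Flipped(Decoupled(UInt(8.W)))
        val out = Decoupled(UInt(8.W))
    })
    io.out <> Queue(io.in, entries, pipe = pipe, flow = flow)
}

// flow = true: an empty queue passes the enqueued value straight through
test(new QueueDemo(entries = 1, pipe = false, flow = true)) { c =>
    c.io.in.valid.poke(true.B)
    c.io.in.bits.poke(5.U)
    c.io.out.ready.poke(false.B)
    c.io.out.valid.expect(true.B)   // visible in the same cycle
    c.io.out.bits.expect(5.U)
}

// pipe = true: a full queue accepts an enqueue if it is dequeued in the same cycle
test(new QueueDemo(entries = 1, pipe = true, flow = false)) { c =>
    c.io.in.valid.poke(true.B)
    c.io.in.bits.poke(5.U)
    c.io.out.ready.poke(false.B)
    c.clock.step()                  // the queue is now full
    c.io.in.ready.expect(false.B)
    c.io.out.ready.poke(true.B)
    c.io.in.ready.expect(true.B)    // the dequeue frees the slot for this cycle's enqueue
}
```

Both options add a combinational path through the queue (`flow` from `enq.valid` to `deq.valid`, `pipe` from `deq.ready` to `enq.ready`), which is worth keeping in mind given the warning about combinational loops.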

## Arbiters in Chisel
* Native in Chisel!
* Uses `Decoupled` for requests and outcome/arbiter output
    * `valid` (from requestor) indicates if actually sending request
    * `ready` (to requestor) indicates request granted
    * The arbiter output side is a normal ready/valid interface: the selected input is passed to the output, and the transfer occurs when `ready` && `valid`

### Review
* Arbitration is needed to choose between multiple components attempting to access a scarce resource
    * Needs a way to choose (arbitrate) when there are multiple simultaneous requests
* Different tie-breaking algorithms are available, e.g. fixed priority or round-robin
* Examples of when to use an Arbiter
    * Structural hazard in a processor, such as core & memory both trying to write to cache at same time

### Types of Chisel Arbiters
* [`Arbiter`](https://javadoc.io/doc/edu.berkeley.cs/chisel3_2.13/latest/chisel3/util/Arbiter.html) - fixed priority, the lowest-indexed requesting port wins (e.g. port 0 beats port 1)
* [`RRArbiter`](https://javadoc.io/doc/edu.berkeley.cs/chisel3_2.13/latest/chisel3/util/RRArbiter.html) - round robin decides who wins ties
* [`LockingRRArbiter`](https://javadoc.io/doc/edu.berkeley.cs/chisel3_2.13/latest/chisel3/util/LockingRRArbiter.html) - round robin, but once a port wins, it keeps the grant for `count` transfers

![](./images/arbiter.svg)
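
For contrast with the round-robin variants, here is a minimal sketch of the fixed-priority `Arbiter`. `FixedPriorityDemo` is a hypothetical wrapper that also exposes the arbiter's `chosen` output, which reports the index of the selected port.

```scala
import chisel3._
import chisel3.util._
import chiseltest._
import chiseltest.RawTester.test

// Hypothetical wrapper around the fixed-priority Arbiter
class FixedPriorityDemo(numPorts: Int, w: Int) extends Module {
    val io = IO(new Bundle {
        val req    = Flipped(Vec(numPorts, Decoupled(UInt(w.W))))
        val out    = Decoupled(UInt(w.W))
        val chosen = Output(UInt(log2Ceil(numPorts).W))
    })
    val arb = Module(new Arbiter(UInt(w.W), numPorts))
    arb.io.in <> io.req
    io.out <> arb.io.out
    io.chosen := arb.io.chosen
}

test(new FixedPriorityDemo(3, 8)) { c =>
    c.io.out.ready.poke(true.B)
    for (p <- 0 until 3) {
        c.io.req(p).bits.poke((10 * (p + 1)).U)
        c.io.req(p).valid.poke((p != 1).B)  // ports 0 and 2 request
    }
    c.io.chosen.expect(0.U)                 // port 0 has the highest priority
    c.io.out.bits.expect(10.U)
    c.io.req(0).ready.expect(true.B)        // granted
    c.io.req(2).ready.expect(false.B)       // must wait

    c.io.req(0).valid.poke(false.B)         // now only port 2 requests
    c.io.chosen.expect(2.U)
    c.io.out.bits.expect(30.U)
    c.io.req(2).ready.expect(true.B)
}
```

With fixed priority, port 2 is served only once every lower-numbered port stops requesting, which is the starvation risk that the round-robin arbiters avoid.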

In [None]:
class UtilArbDemo(numPorts: Int, w: Int) extends Module {
    val io = IO(new Bundle {
        val req = Flipped(Vec(numPorts, Decoupled(UInt(w.W))))
        val out = Decoupled(UInt(w.W))
    })
    require (numPorts > 0)
    val arb = Module(new LockingRRArbiter(UInt(w.W), numPorts, 2))
    for (p <- 0 until numPorts) {
        arb.io.in(p) <> io.req(p) 
    }
    // arb.io.in <> io.req // This is the same as that `for` loop above. For why, see: https://www.chisel-lang.org/docs/explanations/connection-operators
    io.out <> arb.io.out
    printf("req: ")
    for (p <- numPorts-1 to 0 by -1) {
        printf("%b", arb.io.in(p).valid)
    }
    printf(" winner: %d (v: %b)\n", arb.io.out.bits, arb.io.out.valid)
}

In [None]:
// printVerilog(new UtilArbDemo(2,8))
val numPorts = 4
test(new UtilArbDemo(numPorts,8)) { c =>
    c.io.out.ready.poke(true.B)
    for (cycle <- 0 until 5) {
        for (p <- 0 until numPorts) {
            c.io.req(p).bits.poke(p.U)
            c.io.req(p).valid.poke((p >= cycle).B)
        }
        c.clock.step()
    }
}