## Agile Hardware Design
***
# Decoupling

## Prof. Scott Beamer
### sbeamer@ucsc.edu

## [CSE 293](https://classes.soe.ucsc.edu/cse293/Winter22/)

## Plan for Today

* Scala case classes
* Decoupling blocks in Chisel
* Chisel Queue demo

## Scala Case Classes

* Special type of class with additional features built in
  * Companion object with an `apply` factory method (so `new` isn't needed to instantiate)
  * All constructor parameters are automatically public (no need to declare them `val`)
  * Automatic implementations of `toString`, `equals`, `hashCode`, and `copy`
  * Great for pattern matching (future lecture)


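```scala
case class Movie(name: String, year: Int, genre: String) {
    // String interpolation avoids the deprecated Int + String concatenation
    def decade(): String = s"${year - year % 10}s"
}

val m1 = Movie("Gattaca", 1997, "drama")        // no `new` needed
m1.genre
val m2 = Movie("The Avengers", 1998, "action")
m2.copy(year=2012)                               // new instance, only `year` changed
m2.decade()                                      // "1990s"
```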
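Case classes also compare by value and can be destructured in a `match`. A minimal sketch, redeclaring a field-only `Movie` so the block stands alone:

```scala
case class Movie(name: String, year: Int, genre: String)

// Structural equality: separately constructed instances with equal fields are ==
val a = Movie("Gattaca", 1997, "drama")
val b = Movie("Gattaca", 1997, "drama")
val sameMovie = a == b              // a plain class would compare references instead

// copy builds a new instance, changing only the named fields
val recategorized = a.copy(genre = "sci-fi")

// Pattern matching extracts fields through the compiler-generated unapply
def describe(m: Movie): String = m match {
  case Movie(n, y, "drama") if y < 2000 => s"$n is a 20th-century drama"
  case Movie(n, _, g)                   => s"$n is a $g film"
}
```

The generated `toString` prints the fields, e.g. `a.toString` gives `Movie(Gattaca,1997,drama)`.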

## Using `case class` for Parameters in Chisel

```scala
import chisel3._
import chisel3.util._ // for log2Ceil

case class CounterParams(limit: Int, start: Int = 0) {
    def width = log2Ceil(limit + 1) // bits needed to represent 0 to limit
}

class MyCounter(cp: CounterParams) extends Module {
    val io = IO(new Bundle {
        val en  = Input(Bool())
        val out = Output(UInt(cp.width.W))
    })
    val count = RegInit(cp.start.U(cp.width.W))
    when (io.en) {
        // Count up to limit, then wrap back to start
        when (count < cp.limit.U) {
            count := count + 1.U
        } .otherwise {
            count := cp.start.U
        }
    }
    io.out := count
}

println(getVerilog(new MyCounter(CounterParams(14))))
```
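Because the parameters live in a case class, sanity checks and derived values can sit right next to them. The sketch below is plain Scala so it runs without Chisel: `log2CeilModel` is a hypothetical stand-in written to mirror `chisel3.util.log2Ceil`, and the `require` checks are an assumed addition, not part of the original `CounterParams`.

```scala
// Hypothetical stand-in for chisel3.util.log2Ceil: bits needed for values 0 until x
def log2CeilModel(x: Int): Int = {
  require(x > 0, "log2Ceil is only defined for positive inputs")
  BigInt(x - 1).bitLength
}

case class CounterParams(limit: Int, start: Int = 0) {
  // Assumed validation: fail as soon as bad parameters are constructed
  require(limit > 0, s"limit must be positive, got $limit")
  require(start >= 0 && start <= limit, s"start ($start) must lie in [0, $limit]")
  def width: Int = log2CeilModel(limit + 1)
}

val base  = CounterParams(14)          // counts 0..14, needs 4 bits
val wider = base.copy(limit = 16)      // counts 0..16, needs 5 bits
```

Since `copy` re-runs the constructor, derived variants are validated too.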

## Motivation for Handshaking Protocol

* It can already be difficult to correctly implement a single sequential component, but what about two sequential components interacting?

* For today, let's only focus on transferring data
  * A _producer_ sending data to a _consumer_

* _**Challenge:**_ recognize when a side is (or is not) able to send/receive data

<img src="images/producer.svg" alt="ready/valid schematic" style="width:75%;margin-left:auto;margin-right:auto"/>

## Best to Distribute Control

* When to use _centralized_ vs _distributed_ control?
  * Common tradeoff throughout systems
  * Centralized can be more efficient and easier to implement (at small scale)
  * Distributed (peer-to-peer) can scale to larger designs much more easily
  * _Common outcome:_ centralized within components and distributed between them
  * Thus, the question becomes: _"At what scale should we switch from centralized to distributed?"_

* For data transfer between components, we may need:
  * Ability for producer to indicate no data is being sent
  * Ability for consumer to indicate inability to receive data (_back pressure_)

## Ready/Valid Protocol

* Common hardware design pattern for producer-consumer data transfer

* _**valid**_ - output from producer indicating it is sending data

* _**ready**_ - output from consumer indicating it is able to receive data

* Transfer occurs only when both _ready & valid_ are high in the same cycle

<img src="images/readyValid.svg" alt="ready/valid schematic" style="width:75%;margin-left:auto;margin-right:auto"/>
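The transfer rule can be checked with a cycle-by-cycle software model. This is a plain Scala sketch, not Chisel: each cycle is a hypothetical `Cycle` record of the producer's `valid`/`bits` and the consumer's `ready`, and a transfer happens only in cycles where both handshake signals are high.

```scala
// One cycle of a ready/valid channel as seen on the wires (hypothetical model type)
case class Cycle(valid: Boolean, ready: Boolean, bits: Int)

// Returns (cycle index, data) for every cycle where valid && ready
def transfers(trace: Seq[Cycle]): Seq[(Int, Int)] =
  trace.zipWithIndex.collect {
    case (Cycle(true, true, data), cycle) => (cycle, data)
  }

val trace = Seq(
  Cycle(valid = false, ready = true,  bits = 0),  // producer idle: no transfer
  Cycle(valid = true,  ready = false, bits = 7),  // back pressure: 7 is held
  Cycle(valid = true,  ready = true,  bits = 7),  // both high: 7 transfers
  Cycle(valid = true,  ready = true,  bits = 8)   // back-to-back: 8 transfers
)
```

Here `transfers(trace)` yields `(2, 7)` and `(3, 8)`: the consumer being ready in cycle 0 does not matter, because nothing is valid.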

## Chisel Supports Ready/Valid

* Best to use the standard library's support for these patterns
  * Less code to write, less chance of error, and standardization improves readability
* To use, wrap the data to transfer with the desired protocol
  * The library adds the needed extra signals & provides helper functions
  * By default, data flows in the output direction (producer side); use `Flipped` to reverse it (consumer side)

### [Valid](https://www.chisel-lang.org/api/latest/chisel3/util/Valid.html) - only `valid`

* Consumer can't say no
  * Must consume when sent
* Indicates the existence of data
  * Almost like a hardware equivalent of Scala's `Option`

### [Decoupled](https://www.chisel-lang.org/api/latest/chisel3/util/Decoupled$.html) - `ready & valid`

* Consumer can apply backpressure
* _**BEWARE**_ of _combinational loops_
  * Avoid using a ready/valid input to combinationally generate a ready/valid output

## Combinational Loops

* (Uncontrolled) feedback paths that do NOT pass through state elements (registers or memories)
    * State elements provide _synchronization_ and thus control feedback
    * Generated hardware can have unpredictable values, or even get trapped in a metastable state
* Generally want to avoid combinational loops (usually a mistake)
    * It is sometimes possible to prove a loop will converge, but such a loop should be very deliberate

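```scala
class LoopyCounter(width: Int) extends Module {
    val io = IO(new Bundle {
        val count = Output(UInt(width.W))
    })
    // Intentional combinational loop: io.count feeds back into itself with no register,
    // so the combinational-loop check is expected to reject this when generating Verilog
    io.count := io.count + 1.U
    // Breaking the loop with a register removes it:
//     io.count := RegNext(io.count + 1.U)
}
println(getVerilog(new LoopyCounter(4)))
```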

<img src="images/combo.svg" alt="combinational loop example" style="width:60%;margin-left:auto;margin-right:auto"/>
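One way to see why `LoopyCounter` is broken: any steady value `x` on a combinational loop must satisfy `x == f(x)`, where `f` is the logic around the loop. The plain Scala sketch below is a software analogy, not a hardware analysis tool; the helper `stableValues` is hypothetical. It enumerates every 4-bit value and finds that `x + 1` has no stable value at all, while a loop like `x | 1` does.

```scala
// Hypothetical helper: all values of a width-bit wire that are stable under feedback f
def stableValues(width: Int)(f: Int => Int): Seq[Int] = {
  val mask = (1 << width) - 1                 // truncate to the wire width, as hardware would
  (0 to mask).filter(x => (f(x) & mask) == x)
}

val loopyCounter = stableValues(4)(x => x + 1) // models io.count := io.count + 1.U
val orWithOne    = stableValues(4)(x => x | 1) // every odd value is a steady state
```

Note that a stable value existing does not by itself prove the loop converges to it, which is why such loops need deliberate reasoning.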

## Example: Using Chisel `Valid` (1/2)

```scala
class MakeValid(n: Int) extends Module {
    val io = IO(new Bundle {
        val en  = Input(Bool())
        val in  = Input(UInt(n.W))
        val out = Valid(UInt(n.W)) // valid + bits, both outputs by default
    })
    io.out.valid := io.en
    io.out.bits := io.in
}

println(getVerilog(new MakeValid(4)))
```

## Example: Using Chisel `Valid` (2/2)

```scala
class ValidReceiver(n: Int) extends Module {
    val io = IO(new Bundle {
        // Flipped: valid and bits become inputs on the receiving side
        val in = Flipped(Valid(UInt(n.W)))
    })
    // No ready signal: the receiver must accept whenever valid is high
    when (io.in.valid) {
        printf("  received %d\n", io.in.bits)
    }
}

// println(getVerilog(new ValidReceiver(4)))
test(new ValidReceiver(4)) { c =>
    for (cycle <- 0 until 8) {
        c.io.in.bits.poke(cycle.U)
        println(s"cycle: $cycle")
        c.io.in.valid.poke((cycle%2 == 0).B) // valid on even cycles only
        c.clock.step()
    }
}
```
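Following the `Option` analogy, the stimulus in this test can be modeled in plain Scala as one `Option[Int]` per cycle (`Some` when `valid` is high). A minimal sketch, assuming the receiver prints exactly the `bits` of the cycles where `valid` is asserted:

```scala
// Each cycle's input as an Option: Some(bits) when valid, None otherwise
val stimulus: Seq[Option[Int]] =
  (0 until 8).map(cycle => if (cycle % 2 == 0) Some(cycle) else None)

// A Valid consumer cannot refuse, so it sees every Some, in order
val received: Seq[Int] = stimulus.flatten
```

Under that assumption, the `received` lines in the simulation output should show 0, 2, 4, and 6.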

## Example: Using Chisel `Decoupled` (1/2)

```scala
class CountWhenReady(n: Int) extends Module {
    val io = IO(new Bundle {
        val en  = Input(Bool())
        val out = Decoupled(UInt())
    })
    // Advance only when a transfer actually happens (valid && ready)
    val advanceCounter = io.en && io.out.ready
    val (count, wrap) = Counter(advanceCounter, n)
    io.out.bits := count
    io.out.valid := io.en
}

println(getVerilog(new CountWhenReady(4)))
```

## `Decoupled` Helper Functions

* Convenience functions that wrap up functionality & improve readability ([code](https://github.com/chipsalliance/chisel3/blob/v3.5.0/src/main/scala/chisel3/util/Decoupled.scala#L44))
    * Internally, they are Scala functions working on Chisel things
* `fire` - Bool that is true if and only if ready & valid
* `enq(data)` - Sends data and sets valid to true (doesn't check ready)
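* `noenq()` - Sets valid to false (and bits to `DontCare`)
* `deq()`/`nodeq()` - Receiver-side counterparts: `deq()` sets ready to true and returns the data; `nodeq()` sets ready to false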

## Example: Using Chisel `Decoupled` (2/2)

```scala
class CountWhenReady(maxVal: Int) extends Module {
    val io = IO(new Bundle {
        val en  = Input(Bool())
        val out = Decoupled(UInt())
    })
    val (count, wrap) = Counter(io.out.fire, maxVal)
    when (io.en) {
        io.out.enq(count)
//         io.out.bits := count
//         io.out.valid := true.B
    } .otherwise {
        io.out.noenq()
//         io.out.bits := DontCare
//         io.out.valid := false.B
    }
}

// println(getVerilog(new CountWhenReady(3)))

test(new CountWhenReady(3)) { c =>
    c.io.en.poke(true.B)
    for (cycle <- 0 until 7) {
        c.io.out.ready.poke((cycle%2 == 1).B)
        println(s"cycle: $cycle, count: ${c.io.out.bits.peek()}")
        c.clock.step()
    }
}
```
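The counts printed by this test can be predicted with a small software model of the counter: it holds its value in cycles where `ready` is low and advances (wrapping at `maxVal`) in cycles where `fire` is true. This plain Scala sketch assumes `en` stays high, as in the test, and that `peek` observes the register value before the clock step; `countTrace` is a hypothetical helper.

```scala
// Model of CountWhenReady's counter: the value visible on io.out.bits in each cycle
def countTrace(maxVal: Int, ready: Seq[Boolean], en: Boolean = true): Seq[Int] =
  ready.scanLeft(0) { (count, rdy) =>
    val fire = en && rdy                       // io.out.fire: valid (= en) && ready
    if (fire) (count + 1) % maxVal else count  // Counter wraps after maxVal - 1
  }.init                                       // drop the state after the final step

val readyPattern = (0 until 7).map(cycle => cycle % 2 == 1)
```

With `ready` high only on odd cycles, the model gives counts 0, 0, 1, 1, 2, 2, 0: each value is held for one stalled cycle and then transferred.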

## Using Queues to Handle Backpressure

* If traffic is bursty, a _queue_ can smooth the traffic rate
  * Queue fills up when demand is too high
  * When demand wanes, the queue drains
* A queue can't solve a throughput mismatch
  * If the production rate is always greater than the consumption rate, a queue can't help
* A queue is a great place to use _decoupled_ interfaces
* Chisel's `util` package provides a `Queue` generator

<img src="images/queue.svg" alt="ready/valid schematic" style="width:65%;margin-left:auto;margin-right:auto"/>
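Both claims above can be illustrated with a toy occupancy model. This plain Scala sketch makes simplifying assumptions (unbounded queue, consumer removes up to a fixed number of items per cycle, `occupancy` is a hypothetical helper): a bursty producer whose average rate matches the consumer needs only a small queue, while a producer that is always faster makes occupancy grow without limit.

```scala
// Toy model: arrivals(i) items enter in cycle i, the consumer removes up to `service`
// items per cycle. Returns the queue occupancy at the end of each cycle.
def occupancy(arrivals: Seq[Int], service: Int): Seq[Int] =
  arrivals.scanLeft(0) { (inQueue, arriving) =>
    math.max(0, inQueue + arriving - service)
  }.tail

val bursty  = Seq.fill(4)(Seq(3, 0, 0)).flatten // average 1 item/cycle, in bursts of 3
val tooFast = Seq.fill(12)(2)                   // always 2 items/cycle
```

With one item served per cycle, the bursty pattern never holds more than 2 items and keeps returning to empty, while the `tooFast` pattern grows by one item every cycle.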

## Using Chisel's `Queue`

* Part of `util` ([docs](https://www.chisel-lang.org/api/latest/chisel3/util/Queue.html))
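* Uses `Decoupled` for both input (`enq`) and output (`deq`)
* Specify type and number of entries: `Module(new Queue(UInt(4.W), 8))`
  * Alternatively, `Queue(enqPort, 8)` wraps an existing `Decoupled` port and returns the dequeue side
* Additional optional arguments
  * `pipe` - if full, allow an enqueue in the same cycle as a dequeue (matters most for a 1-entry queue)
  * `flow` - if empty, an enqueued value is available immediately for dequeue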

<img src="images/queueReady.svg" alt="ready/valid schematic" style="width:85%;margin-left:auto;margin-right:auto"/>
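A rough way to read `pipe` and `flow` is in terms of which handshake signals become combinationally coupled. The plain Scala sketch below is a simplified model of the queue's port logic written to match the description above, not the library's implementation: by default `enq.ready` depends only on fullness and `deq.valid` only on emptiness; `pipe` lets `deq.ready` raise `enq.ready`, and `flow` lets `enq.valid` raise `deq.valid`.

```scala
// Simplified model of Queue's port logic for one cycle (not the library source)
def enqReady(full: Boolean, deqReady: Boolean, pipe: Boolean): Boolean =
  !full || (pipe && deqReady)    // with pipe, ready depends combinationally on deq.ready

def deqValid(empty: Boolean, enqValid: Boolean, flow: Boolean): Boolean =
  !empty || (flow && enqValid)   // with flow, valid depends combinationally on enq.valid
```

This coupling ties back to the combinational-loop warning for `Decoupled`: with `pipe` or `flow`, a ready or valid output now depends on a ready or valid input.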

## Chisel `Queue` Demo (1/2)

```scala
class CountIntoQueue(maxVal: Int, numEntries: Int, pipe: Boolean, flow: Boolean) extends Module {
    val io = IO(new Bundle {
        val en  = Input(Bool())
        val out = Decoupled(UInt())
        val count = Output(UInt())
    })
    val q = Module(new Queue(UInt(), numEntries, pipe=pipe, flow=flow))
    val (count, wrap) = Counter(q.io.enq.fire, maxVal)
    q.io.enq.valid := io.en
    q.io.enq.bits := count
    io.out <> q.io.deq
    io.count := count // for visibility
}

// println(getVerilog(new CountIntoQueue(3,1,false,false)))
```

## Chisel `Queue` Demo (2/2)

```scala
test(new CountIntoQueue(4,3,pipe=false,flow=false)) { c =>
    c.io.en.poke(true.B)
    c.io.out.ready.poke(false.B)
    for (cycle <- 0 until 4) {   // Fill up queue
        println(s"f count:${c.io.count.peek()} out:${c.io.out.bits.peek()} v:${c.io.out.valid.peek()}")
        c.clock.step()
    }
    println()
    c.io.en.poke(false.B)
    c.io.out.ready.poke(true.B)
    for (cycle <- 0 until 4) {   // Drain queue
        println(s"d count:${c.io.count.peek()} out:${c.io.out.bits.peek()} v:${c.io.out.valid.peek()}")
        c.clock.step()
    }
    println()
    c.io.en.poke(true.B)
    for (cycle <- 0 until 4) {   // Simultaneous
        println(s"s count:${c.io.count.peek()} out:${c.io.out.bits.peek()} v:${c.io.out.valid.peek()}")
        c.clock.step()
    }
}
```
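To check the printed trace, here is a cycle-level software model of `CountIntoQueue(4, 3, pipe = false, flow = false)`. It is a plain Scala sketch under stated assumptions: the queue is a FIFO with `enq.ready = !full` and `deq.valid = !empty`, enqueue and dequeue take effect at the clock edge, and each line is sampled before `step()`. The names `Sample` and `CountIntoQueueModel` are hypothetical, and `bits` is recorded only when `valid` is high, since it carries no meaning otherwise.

```scala
import scala.collection.mutable

// One sampled line of the demo: count, deq.valid, and deq.bits when valid
case class Sample(count: Int, valid: Boolean, bits: Option[Int])

// Hypothetical model of CountIntoQueue with a default (non-pipe, non-flow) Queue
class CountIntoQueueModel(maxVal: Int, entries: Int) {
  private var count = 0
  private val fifo = mutable.Queue.empty[Int]

  // Sample the outputs, then apply one clock edge
  def cycle(en: Boolean, outReady: Boolean): Sample = {
    val enqFire  = en && fifo.size < entries     // enq.valid && enq.ready (!full)
    val deqValid = fifo.nonEmpty                 // deq.valid = !empty
    val sample = Sample(count, deqValid, if (deqValid) Some(fifo.head) else None)
    if (deqValid && outReady) fifo.dequeue()     // deq.fire
    if (enqFire) {
      fifo.enqueue(count)
      count = (count + 1) % maxVal               // Counter advances on enq.fire
    }
    sample
  }
}

val dut   = new CountIntoQueueModel(maxVal = 4, entries = 3)
val fill  = Seq.fill(4)(dut.cycle(en = true,  outReady = false))
val drain = Seq.fill(4)(dut.cycle(en = false, outReady = true))
val both  = Seq.fill(4)(dut.cycle(en = true,  outReady = true))
```

In the model, `count` stalls at 3 once the 3-entry queue is full, and in the first simultaneous cycle `valid` is still low: with `flow = false`, a value enqueued into an empty queue only becomes visible on `deq` in the following cycle.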