[WIP] just to check - conflict detection #109

Open. Wants to merge 135 commits into base: master.

Commits (135)
cae5bd6
Default to using single-ported scratchpad memories
jerryz123 Jan 9, 2021
48c2046
Attempt to fix
hngenc Jan 27, 2021
b9a22d5
Fix about_to_fire_all_rows
hngenc Jan 27, 2021
3dfca8b
Make SRAMs single-ported
hngenc Jan 27, 2021
94a53f0
Simplify the scratchpad
hngenc Jan 27, 2021
f44473d
Merge pull request #43 from ucb-bar/singleported
hngenc Jan 27, 2021
61edc05
Fix max block len computation (#53)
jerryz123 Feb 4, 2021
9dc6ec8
Stopgap measure to fix convs (#54)
hngenc Feb 7, 2021
c7a2100
Remove popcounts in dma (#55)
hngenc Feb 8, 2021
8335f1b
pipeline scaling parameters (#56)
hngenc Feb 8, 2021
6d183b1
Chip config (#58)
SeahK Feb 11, 2021
5b4d235
Group modules to be retimed together (#57)
jerryz123 Feb 11, 2021
bed471f
Update ChipConfig
jerryz123 Feb 11, 2021
e75b5e9
Merge pull request #59 from ucb-bar/chipconfig-update
jerryz123 Feb 11, 2021
2f645ad
Add pipeline register after LoopMatmul units (#60)
jerryz123 Feb 13, 2021
21ec295
Time-multiplex acc-scale onto fewer FMAs
jerryz123 Feb 13, 2021
07171fd
Add new acc_scale_latency parameter for accumulator scale unit pipeli…
jerryz123 Feb 13, 2021
4609996
Accumulator mem must respond in-order
jerryz123 Feb 13, 2021
279718a
Multiplex single accumulator scale unit between both accumulator banks
jerryz123 Feb 14, 2021
c9e9b1a
Maintain in-order accesses to AccScale unit
jerryz123 Feb 15, 2021
3714ad5
Fix Firesim Freeze (#62)
hngenc Feb 16, 2021
344c76c
Time-multiplex the VectorScalarMultiplier
jerryz123 Feb 16, 2021
1a6ed24
Add FP configs and fix them (#64)
hngenc Feb 16, 2021
66c1513
Merge pull request #66 from ucb-bar/vsm-scale
jerryz123 Feb 16, 2021
e4ce774
Merge remote-tracking branch 'origin/dev' into multiplex-scale
jerryz123 Feb 16, 2021
63e2bfe
Short circuit past vector-scalar-multiplier for identity-scales
jerryz123 Feb 17, 2021
cf191ba
Bank the accumulator memories into single-ported mems
jerryz123 Feb 17, 2021
96b59a4
Add pipeline register after accumulatorscale register
jerryz123 Feb 17, 2021
353b9e5
Allow parallel read/writes from different sub-banks of accumulator
jerryz123 Feb 17, 2021
3ef8907
Allow back-to-back accumulator writes if their masks do not overlap
jerryz123 Feb 18, 2021
ff4825d
Clean up configs for scale-multplexing
jerryz123 Feb 18, 2021
2d1c508
debugging
jerryz123 Feb 18, 2021
adf41e2
Move scale units into ScalePipe modules for better retiming
jerryz123 Feb 19, 2021
e1d512f
Bump acc-scale pipelining to 5
jerryz123 Feb 19, 2021
a356ec6
Fix tlb req cmd for DMA reads (#68)
hngenc Feb 19, 2021
6c0e603
Merge remote-tracking branch 'origin/dev' into multiplex-scale
jerryz123 Feb 19, 2021
1f704d6
Merge bad_dataflow asserts into a single assert
jerryz123 Feb 24, 2021
9a92fa0
Include used opcode in gemmini_params.h | clean up configs
jerryz123 Feb 26, 2021
be80e39
Bump Chipyard
jerryz123 Feb 26, 2021
b29a14b
Bump Spike
jerryz123 Feb 27, 2021
98374db
Fix support for dualported unbanked accumulators
jerryz123 Mar 3, 2021
961f625
Scale LoopMatmul with rob size
jerryz123 Mar 7, 2021
347c48b
Move max_lds/exs/sts calculation to Controller.scala
jerryz123 Mar 7, 2021
765e9ab
Merge pull request #72 from ucb-bar/loopmatmul-robsize
jerryz123 Mar 7, 2021
7f77625
Merge pull request #70 from ucb-bar/parameterize-opcode
jerryz123 Mar 8, 2021
728801e
Merge branch 'dev' into multiplex-scale
jerryz123 Mar 8, 2021
a7b0d8c
Fix conv FSM freeze (#69)
hngenc Mar 10, 2021
e832e2d
Merge remote-tracking branch 'origin/dev' into multiplex-scale
jerryz123 Mar 10, 2021
1c5fb09
Address PR feedback
jerryz123 Mar 11, 2021
373bf99
merging with dev, fixed loopmatmul bug
SeahK Mar 12, 2021
a62a404
change to loopconv
SeahK Mar 12, 2021
efaf15c
save
SeahK Mar 14, 2021
c7613a8
first trial with conv_fsm
SeahK Mar 14, 2021
49786e9
first trial with conv_fsm
SeahK Mar 14, 2021
e67ecf5
fixed tag initialization
SeahK Mar 14, 2021
50147e9
printf
SeahK Mar 14, 2021
8879d2c
ISA change
SeahK Mar 14, 2021
8e429ad
config bugs
SeahK Mar 14, 2021
a7d0b31
debugging
SeahK Mar 14, 2021
908a9bc
decreased wire width
SeahK Mar 14, 2021
0c24167
fixing bugs
SeahK Mar 14, 2021
548d16c
start/end for conv
SeahK Mar 14, 2021
a3b497e
fixing bugs
SeahK Mar 14, 2021
820fcf8
adding support for sw padding
SeahK Mar 15, 2021
91f7283
sw padding support for conv fsm
SeahK Mar 15, 2021
9853765
fixing bugs
SeahK Mar 15, 2021
9ec4e50
Fix garbage writes locking up scratchpad pipeline
jerryz123 Mar 15, 2021
144c880
stride bug
SeahK Mar 16, 2021
929ff92
stride bug
SeahK Mar 16, 2021
d9237b5
Bump gemmini-rocc-tests (#81)
hngenc Mar 17, 2021
d2420af
added och division
SeahK Mar 17, 2021
9ce629d
ROB stall tracker should track io.completed as well
jerryz123 Mar 17, 2021
2622d07
Use all singleported accumulator write queue entries
jerryz123 Mar 17, 2021
54449a7
Revert DefaultConfig to original behavior
jerryz123 Mar 17, 2021
bccb2f6
Fix full-data for un-multiplexed acc-scale
jerryz123 Mar 18, 2021
b631f0b
Merge pull request #61 from ucb-bar/multiplex-scale
hngenc Mar 18, 2021
8e262b4
Double-buffer conv layers that have pooling (#82)
hngenc Mar 18, 2021
cc299e7
Only flush spatial array in OS mode (#85)
hngenc Mar 19, 2021
2f1b242
Bump gemmini-rocc-tests (#88)
hngenc Mar 20, 2021
2620d30
Make the internal conv-fsm truly weight-stationary (#86)
hngenc Mar 20, 2021
88279c4
Split rob entries based off requested resources for efficiency (#73)
jerryz123 Mar 20, 2021
2b0cfc9
merge dev change
SeahK Mar 21, 2021
3c845e3
pooling fixed
SeahK Mar 21, 2021
b49679d
Increase the throughput of matmuls in WS mode (#87)
hngenc Mar 24, 2021
d39b7b4
Perform global-averaging on Gemmini (#90)
hngenc Mar 24, 2021
0c6e603
Merge branch 'dev' of https://github.com/ucb-bar/gemmini into conflic…
SeahK Mar 26, 2021
3807665
start working on profiling
SeahK Apr 4, 2021
b6e61ad
adding profiler
SeahK Apr 4, 2021
133a4d4
fixing bugs
SeahK Apr 4, 2021
125192a
debugging
SeahK Apr 5, 2021
4a982ce
debugging
SeahK Apr 5, 2021
6c5b02c
changing profiling to looploader
SeahK Apr 5, 2021
8d3b214
debugging
SeahK Apr 5, 2021
cda843f
debugging
SeahK Apr 5, 2021
ca0fe4f
adding more profile function
SeahK Apr 5, 2021
286d2a2
added dontTouch
SeahK Apr 5, 2021
0f8eb29
increase bitwidth
SeahK Apr 5, 2021
0533ccb
add maximum, fix profiling start bug
SeahK Apr 6, 2021
4e71bbc
parameterize selection btw average and max
SeahK Apr 6, 2021
45e9679
fix how to calculate latency
SeahK Apr 6, 2021
3ffa0d4
debugging
SeahK Apr 6, 2021
47323af
debugging, added reset signals
SeahK Apr 6, 2021
d67ebe5
debugging
SeahK Apr 6, 2021
667ec2d
debugging
SeahK Apr 6, 2021
f2966f0
debugging
SeahK Apr 6, 2021
d2eeae4
added reset for translate enq
SeahK Apr 6, 2021
ae3e381
more reset
SeahK Apr 6, 2021
99d89c8
debugging
SeahK Apr 6, 2021
9e946f3
remove division
SeahK Apr 6, 2021
a6f4ccb
add dw mapping
SeahK Apr 8, 2021
98aa967
looploader dw
SeahK Apr 8, 2021
cff3e4d
fix monitoring trigger signal, added in_channel divide for dw conv
SeahK Apr 8, 2021
4de86ce
divide output channel for squeezenet concat
SeahK Apr 9, 2021
45f60d6
change stride
SeahK Apr 14, 2021
91505b4
bump rocc-tests
SeahK Apr 16, 2021
5ebb224
start working on 2 out
SeahK Apr 18, 2021
31c8ce0
adding pooling after normal output
SeahK Apr 18, 2021
ccea00c
debugging
SeahK Apr 19, 2021
6994f37
debugging
SeahK Apr 19, 2021
3086b04
debugging
SeahK Apr 19, 2021
6570b11
fixing looploader
SeahK Apr 26, 2021
7ec19bc
debugging
SeahK Apr 26, 2021
9f61b44
debugging
SeahK Apr 26, 2021
c107cb5
deleted counter
SeahK Apr 26, 2021
1a7fa4a
debugging
SeahK Apr 26, 2021
2cde9ce
changed latency
SeahK Apr 26, 2021
60a1103
changed latency
SeahK Apr 26, 2021
123a980
code for partial sum move out
SeahK Apr 30, 2021
e7feb86
stride for partial sum
SeahK Apr 30, 2021
f2d1841
debugging
SeahK Apr 30, 2021
d54d314
fixing bias for loopconv
SeahK Apr 30, 2021
520268e
changing bubble insertion part
May 4, 2021
e18f5c3
debugging
May 5, 2021
b1c38c4
delete bubble enable signal
May 5, 2021
0fcbe86
add enable signal again
May 5, 2021
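Several of the commits above, such as 21ec295 ("Time-multiplex acc-scale onto fewer FMAs") and 279718a ("Multiplex single accumulator scale unit between both accumulator banks"), share one scaling unit across many output elements instead of instantiating a multiplier per lane. The Chisel sketch below is a minimal, hypothetical illustration of that time-multiplexing pattern, not code from this PR; SharedScaler, nLanes, and the plain multiply standing in for the real accumulator-scale function are all assumed names.

import chisel3._
import chisel3.util._

// Hypothetical sketch (not the PR's code) of time-multiplexing one scale unit:
// a whole row is buffered, then a single multiplier is stepped across its
// lanes one per cycle, instead of instantiating one multiplier per lane.
class SharedScaler(nLanes: Int, w: Int) extends Module {
  require(nLanes >= 2)
  val io = IO(new Bundle {
    val in    = Flipped(Decoupled(Vec(nLanes, UInt(w.W))))
    val scale = Input(UInt(w.W))
    val out   = Decoupled(Vec(nLanes, UInt(w.W)))
  })

  val buf  = Reg(Vec(nLanes, UInt(w.W)))
  val lane = RegInit(0.U(log2Ceil(nLanes).W))
  val busy = RegInit(false.B)
  val done = RegInit(false.B)

  io.in.ready  := !busy && !done
  io.out.valid := done
  io.out.bits  := buf

  when (io.in.fire()) {       // accept a whole row, then work through it
    buf  := io.in.bits
    lane := 0.U
    busy := true.B
  }
  when (busy) {               // one multiply per cycle through the shared unit
    buf(lane) := (buf(lane) * io.scale)(w - 1, 0)
    when (lane === (nLanes - 1).U) {
      busy := false.B
      done := true.B          // all lanes scaled; result visible next cycle
    } .otherwise {
      lane := lane + 1.U
    }
  }
  when (io.out.fire()) { done := false.B }
}

The cost is several extra cycles of latency per row, which is presumably why the PR also adds an acc_scale_latency parameter and pipeline registers around the scale unit.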
2 changes: 1 addition & 1 deletion .circleci/build-toolchains.sh
@@ -28,5 +28,5 @@ if [ ! -d "$HOME/$1-install" ]; then
cd $HOME

# init all submodules including the tools (doesn't use CI_MAKE_PROC due to mem. constraints)
-CHIPYARD_DIR="$LOCAL_CHIPYARD_DIR" NPROC=$CI_MAKE_PROC $LOCAL_CHIPYARD_DIR/scripts/build-toolchains.sh esp-tools
+CHIPYARD_DIR="$LOCAL_CHIPYARD_DIR" NPROC=$CI_MAKE_NPROC $LOCAL_CHIPYARD_DIR/scripts/build-toolchains.sh esp-tools
fi
2 changes: 1 addition & 1 deletion .circleci/defaults.sh
@@ -14,7 +14,7 @@
#############

# make parallelism
-CI_MAKE_NPROC=8
+CI_MAKE_NPROC=4
LOCAL_MAKE_NPROC=$CI_MAKE_NPROC

# verilator version
2 changes: 1 addition & 1 deletion CHIPYARD.hash
@@ -1 +1 @@
-6b0d57d60690cc223013ea228b687b519b716c50
+1e2f778a6705033d67ccbcc932e66083e4646f15
2 changes: 1 addition & 1 deletion SPIKE.hash
@@ -1 +1 @@
-8626fb144e019895767830d850deca7711773e5c
+dbd3b0874dde4eead6b8d0c4195ee8b41dd113fc
221 changes: 150 additions & 71 deletions src/main/scala/gemmini/AccumulatorMem.scala
@@ -17,19 +17,21 @@ class AccumulatorReadReq[T <: Data](n: Int, shift_width: Int, scale_t: T) extend
override def cloneType: this.type = new AccumulatorReadReq(n, shift_width, scale_t.cloneType).asInstanceOf[this.type]
}

class AccumulatorReadResp[T <: Data: Arithmetic](rdataType: Vec[Vec[T]], fullDataType: Vec[Vec[T]]) extends Bundle {
val data = rdataType.cloneType
val full_data = fullDataType.cloneType
class AccumulatorReadResp[T <: Data: Arithmetic, U <: Data](fullDataType: Vec[Vec[T]], scale_t: U, shift_width: Int) extends Bundle {
val data = fullDataType.cloneType
val fromDMA = Bool()

override def cloneType: this.type = new AccumulatorReadResp(rdataType.cloneType, fullDataType.cloneType).asInstanceOf[this.type]
val scale = scale_t.cloneType
val relu6_shift = UInt(shift_width.W)
val act = UInt(2.W)
val acc_bank_id = UInt(2.W) // TODO don't hardcode
override def cloneType: this.type = new AccumulatorReadResp(fullDataType.cloneType, scale_t, shift_width).asInstanceOf[this.type]
}

class AccumulatorReadIO[T <: Data: Arithmetic, U <: Data](n: Int, shift_width: Int, rdataType: Vec[Vec[T]], fullDataType: Vec[Vec[T]], scale_t: U) extends Bundle {
val req = Decoupled(new AccumulatorReadReq(n, shift_width, scale_t))
val resp = Flipped(Decoupled(new AccumulatorReadResp(rdataType.cloneType, fullDataType.cloneType)))
class AccumulatorReadIO[T <: Data: Arithmetic, U <: Data](n: Int, shift_width: Int, fullDataType: Vec[Vec[T]], scale_t: U) extends Bundle {
val req = Decoupled(new AccumulatorReadReq[U](n, shift_width, scale_t))
val resp = Flipped(Decoupled(new AccumulatorReadResp[T, U](fullDataType, scale_t, shift_width)))

override def cloneType: this.type = new AccumulatorReadIO(n, shift_width, rdataType.cloneType, fullDataType.cloneType, scale_t.cloneType).asInstanceOf[this.type]
override def cloneType: this.type = new AccumulatorReadIO(n, shift_width, fullDataType.cloneType, scale_t.cloneType).asInstanceOf[this.type]
}

class AccumulatorWriteReq[T <: Data: Arithmetic](n: Int, t: Vec[Vec[T]]) extends Bundle {
@@ -42,16 +44,19 @@ class AccumulatorWriteReq[T <: Data: Arithmetic](n: Int, t: Vec[Vec[T]]) extends
override def cloneType: this.type = new AccumulatorWriteReq(n, t).asInstanceOf[this.type]
}

class AccumulatorMemIO [T <: Data: Arithmetic, U <: Data](n: Int, t: Vec[Vec[T]], rdata: Vec[Vec[T]], scale_t: U) extends Bundle {
val read = Flipped(new AccumulatorReadIO(n, log2Ceil(t.head.head.getWidth), rdata, t, scale_t))
class AccumulatorMemIO [T <: Data: Arithmetic, U <: Data](n: Int, t: Vec[Vec[T]], scale_t: U) extends Bundle {
val read = Flipped(new AccumulatorReadIO(n, log2Ceil(t.head.head.getWidth), t, scale_t))
// val write = Flipped(new AccumulatorWriteIO(n, t))
val write = Flipped(Decoupled(new AccumulatorWriteReq(n, t)))

override def cloneType: this.type = new AccumulatorMemIO(n, t, rdata, scale_t).asInstanceOf[this.type]
override def cloneType: this.type = new AccumulatorMemIO(n, t, scale_t).asInstanceOf[this.type]
}

class AccumulatorMem[T <: Data, U <: Data](n: Int, t: Vec[Vec[T]], rdataType: Vec[Vec[T]], mem_pipeline: Int, scale_args: ScaleArguments[T, U], read_small_data: Boolean, read_full_data: Boolean)
(implicit ev: Arithmetic[T]) extends Module {
class AccumulatorMem[T <: Data, U <: Data](
n: Int, t: Vec[Vec[T]], scale_args: ScaleArguments[T, U],
acc_singleported: Boolean, num_acc_sub_banks: Int
)
(implicit ev: Arithmetic[T]) extends Module {
// TODO Do writes in this module work with matrices of size 2? If we try to read from an address right after writing
// to it, then we might not get the written data. We might need some kind of cooldown counter after addresses in the
// accumulator have been written to for configurations with such small matrices
@@ -64,9 +69,8 @@ class AccumulatorMem[T <: Data, U <: Data](n: Int, t: Vec[Vec[T]], rdataType: Ve
import ev._

// TODO unify this with TwoPortSyncMemIO
val io = IO(new AccumulatorMemIO(n, t, rdataType, scale_args.multiplicand_t))
val io = IO(new AccumulatorMemIO(n, t, scale_args.multiplicand_t))

val mem = TwoPortSyncMem(n, t, t.getWidth / 8) // TODO We assume byte-alignment here. Use aligned_to instead

// For any write operation, we spend 2 cycles reading the existing address out, buffering it in a register, and then
// accumulating on top of it (if necessary)
@@ -75,83 +79,158 @@ class AccumulatorMem[T <: Data, U <: Data](n: Int, t: Vec[Vec[T]], rdataType: Ve
val acc_buf = ShiftRegister(io.write.bits.acc, 2)
val mask_buf = ShiftRegister(io.write.bits.mask, 2)
val w_buf_valid = ShiftRegister(io.write.fire(), 2)

val w_sum = VecInit((RegNext(mem.io.rdata) zip wdata_buf).map { case (rv, wv) =>
val acc_rdata = Wire(t)
acc_rdata := DontCare
val read_rdata = Wire(t)
read_rdata := DontCare
val block_read_req = WireInit(false.B)
val w_sum = VecInit((RegNext(acc_rdata) zip wdata_buf).map { case (rv, wv) =>
VecInit((rv zip wv).map(t => t._1 + t._2))
})

mem.io.waddr := waddr_buf
mem.io.wen := w_buf_valid
mem.io.wdata := Mux(acc_buf, w_sum, wdata_buf)
mem.io.mask := mask_buf

mem.io.raddr := Mux(io.write.fire() && io.write.bits.acc, io.write.bits.addr, io.read.req.bits.addr)
mem.io.ren := io.read.req.fire() || (io.write.fire() && io.write.bits.acc)

class PipelinedRdataAndActT extends Bundle {
val data = mem.io.rdata.cloneType
val full_data = mem.io.rdata.cloneType
val scale = io.read.req.bits.scale.cloneType
val relu6_shift = io.read.req.bits.relu6_shift.cloneType
val act = io.read.req.bits.act.cloneType
val fromDMA = io.read.req.bits.fromDMA.cloneType
if (!acc_singleported) {
val mem = TwoPortSyncMem(n, t, t.getWidth / 8) // TODO We assume byte-alignment here. Use aligned_to instead
mem.io.waddr := waddr_buf
mem.io.wen := w_buf_valid
mem.io.wdata := Mux(acc_buf, w_sum, wdata_buf)
mem.io.mask := mask_buf
acc_rdata := mem.io.rdata
read_rdata := mem.io.rdata
mem.io.raddr := Mux(io.write.fire() && io.write.bits.acc, io.write.bits.addr, io.read.req.bits.addr)
mem.io.ren := io.read.req.fire() || (io.write.fire() && io.write.bits.acc)
} else {
val mask_len = t.getWidth / 8
val mask_elem = UInt((t.getWidth / mask_len).W)
val reads = Wire(Vec(2, Decoupled(UInt())))
reads(0).valid := io.write.valid && io.write.bits.acc
reads(0).bits := io.write.bits.addr
reads(0).ready := true.B
reads(1).valid := io.read.req.valid
reads(1).bits := io.read.req.bits.addr
reads(1).ready := true.B
block_read_req := !reads(1).ready
for (i <- 0 until num_acc_sub_banks) {
def isThisBank(addr: UInt) = addr(log2Ceil(num_acc_sub_banks)-1,0) === i.U
def getBankIdx(addr: UInt) = addr >> log2Ceil(num_acc_sub_banks)
val mem = SyncReadMem(n / num_acc_sub_banks, Vec(mask_len, mask_elem))

val ren = WireInit(false.B)
val raddr = WireInit(getBankIdx(reads(0).bits))
val nEntries = 3
// Writes coming 2 cycles after read leads to bad bank behavior
// Add another buffer here
class W_Q_Entry[T <: Data](mask_len: Int, mask_elem: T) extends Bundle {
val valid = Bool()
val data = Vec(mask_len, mask_elem)
val mask = Vec(mask_len, Bool())
val addr = UInt(log2Ceil(n/num_acc_sub_banks).W)
override def cloneType: this.type = new W_Q_Entry(mask_len, mask_elem).asInstanceOf[this.type]
}
val w_q = Reg(Vec(nEntries, new W_Q_Entry(mask_len, mask_elem)))
for (e <- w_q) {
when (e.valid) {
assert(!(
io.write.valid && io.write.bits.acc &&
isThisBank(io.write.bits.addr) && getBankIdx(io.write.bits.addr) === e.addr &&
((io.write.bits.mask.asUInt & e.mask.asUInt) =/= 0.U)
))
when (io.read.req.valid && isThisBank(io.read.req.bits.addr) && getBankIdx(io.read.req.bits.addr) === e.addr) {
reads(1).ready := false.B
}
}
}
val w_q_head = RegInit(1.U(nEntries.W))
val w_q_tail = RegInit(1.U(nEntries.W))
when (reset.asBool) {
w_q.foreach(_.valid := false.B)
}
val wen = WireInit(false.B)
val wdata = Mux1H(w_q_head.asBools, w_q.map(_.data))
val wmask = Mux1H(w_q_head.asBools, w_q.map(_.mask))
val waddr = Mux1H(w_q_head.asBools, w_q.map(_.addr))
when (wen) {
w_q_head := w_q_head << 1 | w_q_head(nEntries-1)
for (i <- 0 until nEntries) {
when (w_q_head(i)) {
w_q(i).valid := false.B
}
}
}

when (w_buf_valid && isThisBank(waddr_buf)) {
assert(!((w_q_tail.asBools zip w_q.map(_.valid)).map({ case (h,v) => h && v }).reduce(_||_)))
w_q_tail := w_q_tail << 1 | w_q_tail(nEntries-1)
for (i <- 0 until nEntries) {
when (w_q_tail(i)) {
w_q(i).valid := true.B
w_q(i).data := Mux(acc_buf, w_sum, wdata_buf).asTypeOf(Vec(mask_len, mask_elem))
w_q(i).mask := mask_buf
w_q(i).addr := getBankIdx(waddr_buf)
}
}

}
val bank_rdata = mem.read(raddr, ren && !wen).asTypeOf(t)
when (RegNext(ren && reads(0).valid && isThisBank(reads(0).bits))) {
acc_rdata := bank_rdata
} .elsewhen (RegNext(ren)) {
read_rdata := bank_rdata
}
when (wen) {
mem.write(waddr, wdata, wmask)
}
// Three requestors, 1 slot
// Priority is incoming reads for RMW > writes from RMW > incoming reads
when (reads(0).valid && isThisBank(reads(0).bits)) {
ren := true.B
when (isThisBank(reads(1).bits)) {
reads(1).ready := false.B
}
} .elsewhen ((w_q_head.asBools zip w_q.map(_.valid)).map({ case (h,v) => h && v }).reduce(_||_)) {
wen := true.B
when (isThisBank(reads(1).bits)) {
reads(1).ready := false.B
}
} .otherwise {
ren := isThisBank(reads(1).bits)
raddr := getBankIdx(reads(1).bits)
}
}
}

val q = Module(new Queue(new PipelinedRdataAndActT, 1, true, true))
q.io.enq.bits.data := mem.io.rdata
q.io.enq.bits.full_data := mem.io.rdata
val q = Module(new Queue(new AccumulatorReadResp(t, scale_args.multiplicand_t, log2Ceil(t.head.head.getWidth)), 1, true, true))
q.io.enq.bits.data := read_rdata
q.io.enq.bits.scale := RegNext(io.read.req.bits.scale)
q.io.enq.bits.relu6_shift := RegNext(io.read.req.bits.relu6_shift)
q.io.enq.bits.act := RegNext(io.read.req.bits.act)
q.io.enq.bits.fromDMA := RegNext(io.read.req.bits.fromDMA)
q.io.enq.bits.acc_bank_id := DontCare
q.io.enq.valid := RegNext(io.read.req.fire())

val p = Pipeline(q.io.deq, mem_pipeline, Seq.fill(mem_pipeline)((x: PipelinedRdataAndActT) => x) :+ {
x: PipelinedRdataAndActT =>
val activated_rdata = VecInit(x.data.map(v => VecInit(v.map { e =>
// val e_scaled = e >> x.shift
val e_scaled = scale_args.scale_func(e, x.scale)
val e_clipped = e_scaled.clippedToWidthOf(rdataType.head.head)
val e_act = MuxCase(e_clipped, Seq(
(x.act === Activation.RELU) -> e_clipped.relu,
(x.act === Activation.RELU6) -> e_clipped.relu6(x.relu6_shift)))

e_act
})))
val p = q.io.deq

val result = WireInit(x)
result.data := activated_rdata

result
})
io.read.resp.bits.data := p.bits.data
io.read.resp.bits.fromDMA := p.bits.fromDMA
io.read.resp.bits.relu6_shift := p.bits.relu6_shift
io.read.resp.bits.act := p.bits.act
io.read.resp.bits.scale := p.bits.scale
io.read.resp.bits.acc_bank_id := DontCare // This is set in Scratchpad
io.read.resp.valid := p.valid
p.ready := io.read.resp.ready

val q_will_be_empty = (q.io.count +& q.io.enq.fire()) - q.io.deq.fire() === 0.U
io.read.req.ready := q_will_be_empty && (
// Make sure we aren't accumulating, which would take over both ports
!(io.write.fire() && io.write.bits.acc) &&
// Make sure we aren't reading something that is still being written
!(RegNext(io.write.fire()) && RegNext(io.write.bits.addr) === io.read.req.bits.addr) &&
!(w_buf_valid && waddr_buf === io.read.req.bits.addr)
)
io.read.resp.bits.data := p.bits.data
io.read.resp.bits.full_data := p.bits.full_data
io.read.resp.bits.fromDMA := p.bits.fromDMA
io.read.resp.valid := p.valid
p.ready := io.read.resp.ready

if (read_small_data)
io.read.resp.bits.data := p.bits.data
else
io.read.resp.bits.data := 0.U.asTypeOf(p.bits.data) // TODO make this DontCare instead

if (read_full_data)
io.read.resp.bits.full_data := p.bits.full_data
else
io.read.resp.bits.full_data := 0.U.asTypeOf(q.io.enq.bits.full_data) // TODO make this DontCare instead
!(w_buf_valid && waddr_buf === io.read.req.bits.addr) &&
!block_read_req
)

// io.write.current_waddr.valid := mem.io.wen
// io.write.current_waddr.bits := mem.io.waddr
io.write.ready := !io.write.bits.acc || (!(io.write.bits.addr === mem.io.waddr && mem.io.wen) &&
io.write.ready := !io.write.bits.acc || (!(io.write.bits.addr === waddr_buf && w_buf_valid) &&
!(io.write.bits.addr === RegNext(io.write.bits.addr) && RegNext(io.write.fire())))

// assert(!(io.read.req.valid && io.write.en && io.write.acc), "reading and accumulating simultaneously is not supported")
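For reference, the read-modify-write path described in the comments above ("we spend 2 cycles reading the existing address out, buffering it in a register, and then accumulating on top of it") can be condensed into the self-contained Chisel sketch below. It is an illustration under assumed names (RmwPath, rows, lanes), not the module as it exists in this PR.

import chisel3._
import chisel3.util._

// Hypothetical condensed model of the 2-cycle read-modify-write path:
// cycle 0 reads the old row, cycle 1 captures it, cycle 2 writes back
// either old + new (accumulate) or just the new data.
class RmwPath(rows: Int, lanes: Int, w: Int) extends Module {
  val io = IO(new Bundle {
    val wen   = Input(Bool())
    val waddr = Input(UInt(log2Ceil(rows).W))
    val wdata = Input(Vec(lanes, UInt(w.W)))
    val acc   = Input(Bool())
  })
  val mem = SyncReadMem(rows, Vec(lanes, UInt(w.W)))

  // Read the existing contents only when accumulating.
  val rdata = mem.read(io.waddr, io.wen && io.acc)

  // Buffer the request for two cycles, mirroring the ShiftRegister(_, 2)
  // pattern in the diff, so the old data and the sum have time to settle.
  val waddr_buf = ShiftRegister(io.waddr, 2)
  val wdata_buf = ShiftRegister(io.wdata, 2)
  val acc_buf   = ShiftRegister(io.acc, 2)
  val wen_buf   = ShiftRegister(io.wen, 2)

  val sum = VecInit((RegNext(rdata) zip wdata_buf).map { case (r, d) => r + d })

  when (wen_buf) {
    mem.write(waddr_buf, Mux(acc_buf, sum, wdata_buf))
  }
}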
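The single-ported configuration splits each accumulator row address into a sub-bank select and an index inside that sub-bank, exactly as the isThisBank/getBankIdx helpers in the diff do. Below is a standalone sketch of that address split, with hypothetical names and assuming a power-of-two number of sub-banks.

import chisel3._
import chisel3.util._

// Hypothetical standalone version of the isThisBank/getBankIdx helpers:
// the low log2Ceil(numSubBanks) bits pick the sub-bank and the remaining
// bits index a row inside it, so consecutive rows land in different
// single-ported sub-banks and can often be accessed in parallel.
class AccSubBankSplit(n: Int, numSubBanks: Int) extends Module {
  require(isPow2(numSubBanks) && numSubBanks >= 2 && n % numSubBanks == 0)
  val io = IO(new Bundle {
    val addr    = Input(UInt(log2Ceil(n).W))
    val bankSel = Output(UInt(log2Ceil(numSubBanks).W))
    val bankIdx = Output(UInt(log2Ceil(n / numSubBanks).W))
  })
  io.bankSel := io.addr(log2Ceil(numSubBanks) - 1, 0)
  io.bankIdx := io.addr >> log2Ceil(numSubBanks)
}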
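Finally, each sub-bank grants its single SRAM port by the fixed priority stated in the diff's "Three requestors, 1 slot" comment: reads issued for a read-modify-write first, then buffered RMW write-backs, then ordinary incoming reads. A minimal sketch of that priority policy, with assumed names and not the PR's code, is:

import chisel3._
import chisel3.util._

// Hypothetical fixed-priority grant of one SRAM port to three requestors:
// RMW read > buffered RMW write-back > ordinary read.
class OnePortArb(addrBits: Int) extends Module {
  val io = IO(new Bundle {
    val rmwRead   = Flipped(Decoupled(UInt(addrBits.W))) // read for accumulate
    val rmwWrite  = Flipped(Decoupled(UInt(addrBits.W))) // write-back from the queue
    val plainRead = Flipped(Decoupled(UInt(addrBits.W))) // incoming read request
    val memAddr   = Output(UInt(addrBits.W))
    val memWen    = Output(Bool())
    val memRen    = Output(Bool())
  })

  io.rmwRead.ready   := true.B                  // never blocked
  io.rmwWrite.ready  := !io.rmwRead.valid
  io.plainRead.ready := !io.rmwRead.valid && !io.rmwWrite.valid

  io.memWen  := io.rmwWrite.fire()
  io.memRen  := io.rmwRead.fire() || io.plainRead.fire()
  io.memAddr := Mux(io.rmwRead.valid, io.rmwRead.bits,
                Mux(io.rmwWrite.valid, io.rmwWrite.bits, io.plainRead.bits))
}

In the real module the lowest-priority read is additionally held off when its address matches an entry still waiting in that sub-bank's write queue.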