very large clocked process makes it impossible to bring out non-clocked signals #17

wallclimber21 · 2016-08-30T19:01:50Z

In the current code, there are some huge clocked always clauses, like the one that starts on line 1168.

There is nothing fundamentally wrong with this as long as you don't need to bring out anything combinationally.

The problem is that this is what I'd like to do. :-)

The immediate issue is that, for one reason or the other, cpuregs is not detected by Quartus as a memory block, so it consumes tons of regular registers. (From your documentation, it seems that Vivado is detecting this just fine.)

I'd like to experiment by replacing constructs like this:
cpuregs[latched_rd] <= reg_pc + (latched_compr ? 2 : 4);

to
cpuregs_wr = 1'b1;
cpuregs_addr = latched_rd
cpuregs_wdata = reg_pc + (latched_compr ? 2 : 4);

and then have 1 place with a single cpuregs[cpuregs_addr] ... clause.

If that still doesn't do the trick, I'd even instantiate an Altera memory macro to force a memory block.

With the current clocked always block, such an experiment requires a full-out rewrite of the whole always block instead of a small surgical patch.

Would you be open to converting this clock process into 2 processes, a pure combinational one and a clock one that just contains the register? I'm willing to do the work if you just tell me the naming convention. My standard convention is:

always(*)
<var_name>_nxt = ....

always @(posedge clk)
<var_name> <= <var_name>_nxt;

But I'm willing to use whatever way you prefer.

(I'm NOT asking to make the change for the cpuregs isolation itself, that could be a forked branch on my side of you think it makes the code look too ugly.)

In addition to cpuregs experiments, such a change would also make it possible to later insert logic to scan out register contents via jtag etc.

cliffordwolf · 2016-08-30T21:56:38Z

Hmm.. Does Quartus handle the memory fine if you use this pre-synthesized version of the design?
http://scratch.clifford.at/picorv32_syn.v

(PicoRV32 in its default configuration, generated with yosys -p 'prep -nomem -top picorv32; memory -nomap; opt' -o picorv32_syn.v picorv32.v.)

cliffordwolf · 2016-08-30T22:10:32Z

I don't have quartus installed to test this, but I think commit 82d837b might solve the issue. Please give it a try and let me know if it does.

wallclimber21 · 2016-08-30T22:38:57Z

With our picorv32_syn.v file, Quartus infers not one but two cpuregs, a block memory not MLABs. They'r each dual ported too, 32-wide and 32-deep. I think that's progress. :-)

However the commit 82d837b synthesizes to FFs.

BTW: one other reasons to make cpuregs an explicit RAM is because it's the only way to go forward when using a non-FPGA synthesis flow. Those typically don't support automated RAM inference.

wallclimber21 · 2016-08-30T22:40:34Z

It's uplifting (or depressing, depending on once pov) that a tool like Yosys is better at this than Quartus!

Looks like I should install Yosys for experimentation.

cliffordwolf · 2016-08-31T10:03:26Z

Quartus infers not one but two cpuregs

That's to be expected. The PicoRV32 default configuration is to infer a register file with two read ports (ENABLE_REGS_DUALPORT), which is usually implemented from two block rams.

However commit 82d837b synthesizes to FFs.

What's about commit c9519df?

one other reasons to make cpuregs an explicit RAM is because it's the only way to go forward when using a non-FPGA synthesis flow.

Well... with Yosys you can infer custom RAM resources pretty easily. Also: on an ASIC the area difference between a memory generated by a memory compiler and the memory from FFs that the synth tool creates is nowhere near what you would have on an FPGA. Usually it's not worth the effort for a 1 kBit memory like the RV32 register file. Especially when considering that usually you have something like a program memory right next to the CPU, and that will have to hold much more than just 32 words.

wallclimber21 · 2016-08-31T15:05:35Z

What's about commit c9519df?

Success!

That's to be expected. The PicoRV32 default configuration is to infer a register file with two read ports (ENABLE_REGS_DUALPORT), which is usually implemented from two block rams.

From quick inspection of the code, I thought that there weren't cases where you read and write to the register files during the same cycle (cpu_state_fetch. In that case, a single dualported memory with 2 RW ports would have been sufficient. But I see now that this is not the case. Carry on. :-)

With this, the issue can be closed. If one day, I really need to hand-instantiate a small RAM, it's only a minor change in the code.

cliffordwolf self-assigned this Aug 30, 2016

cliffordwolf closed this as completed Aug 31, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

very large clocked process makes it impossible to bring out non-clocked signals #17

very large clocked process makes it impossible to bring out non-clocked signals #17

wallclimber21 commented Aug 30, 2016 •

edited

Loading

cliffordwolf commented Aug 30, 2016

cliffordwolf commented Aug 30, 2016

wallclimber21 commented Aug 30, 2016 •

edited

Loading

wallclimber21 commented Aug 30, 2016

cliffordwolf commented Aug 31, 2016

wallclimber21 commented Aug 31, 2016

very large clocked process makes it impossible to bring out non-clocked signals #17

very large clocked process makes it impossible to bring out non-clocked signals #17

Comments

wallclimber21 commented Aug 30, 2016 • edited Loading

cliffordwolf commented Aug 30, 2016

cliffordwolf commented Aug 30, 2016

wallclimber21 commented Aug 30, 2016 • edited Loading

wallclimber21 commented Aug 30, 2016

cliffordwolf commented Aug 31, 2016

wallclimber21 commented Aug 31, 2016

wallclimber21 commented Aug 30, 2016 •

edited

Loading

wallclimber21 commented Aug 30, 2016 •

edited

Loading