Cycle count of rocket chip zynq infrastructure #63

kenzhang82 · 2017-08-10T00:41:10Z

Hi there,

Maybe I posted this in the wrong forum, but I have searched many places and couldn't find an answer.

How can we count how many cycles each instruction takes to execute for a C program running on the rocket chip FPGA infrastructure (say, the default config, i.e. the pre-built image)?

The closest answer I could find is to compile the cycle-accurate C++ emulator and simulate the program on it, but even a simple "hello world" C program takes a long time to simulate.

Any help would be much appreciated! Thanks.

Ken

davidbiancolin · 2017-08-10T03:23:04Z

I'm going to point you at section 2.8 of the RISC-V user level specification. https://riscv.org/specifications/ :)
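For readers following along: section 2.8 of the user-level spec (v2.2 at the time) covers the CSR instructions, including the `RDCYCLE` and `RDINSTRET` counter reads. Below is a minimal sketch of how a C program might bracket a region of code with those counters. It assumes an RV64 target where the counters are readable from user mode (which depends on the platform configuration); the names `read_cycle`, `read_instret`, `measure`, and `sum_below` are made up for illustration, and the non-RISC-V fallback exists only so the sketch compiles on a host machine.

```c
#include <stdint.h>
#include <time.h>

/* Counter deltas across a measured region. */
typedef struct {
    uint64_t cycles;   /* elapsed cycles (clock() ticks on non-RISC-V hosts) */
    uint64_t instret;  /* retired instructions (0 on non-RISC-V hosts) */
} perf_delta;

/* Read the cycle counter. Assumes RV64, where RDCYCLE returns all 64 bits. */
static inline uint64_t read_cycle(void) {
#if defined(__riscv) && (__riscv_xlen == 64)
    uint64_t v;
    __asm__ volatile ("rdcycle %0" : "=r"(v));
    return v;
#else
    return (uint64_t)clock();  /* host fallback, not a cycle count */
#endif
}

/* Read the retired-instruction counter (same RV64 assumption). */
static inline uint64_t read_instret(void) {
#if defined(__riscv) && (__riscv_xlen == 64)
    uint64_t v;
    __asm__ volatile ("rdinstret %0" : "=r"(v));
    return v;
#else
    return 0;  /* no portable equivalent on the host */
#endif
}

/* Hypothetical workload: sum of the integers below n. */
uint64_t sum_below(uint64_t n) {
    volatile uint64_t acc = 0;  /* volatile keeps the loop from being optimized away */
    for (uint64_t i = 0; i < n; i++) {
        acc += i;
    }
    return acc;
}

/* Run fn(arg) and report how far the counters advanced around the call. */
perf_delta measure(uint64_t (*fn)(uint64_t), uint64_t arg, uint64_t *result) {
    uint64_t c0 = read_cycle();
    uint64_t i0 = read_instret();
    uint64_t r = fn(arg);
    uint64_t i1 = read_instret();
    uint64_t c1 = read_cycle();
    if (result) {
        *result = r;
    }
    perf_delta d = { c1 - c0, i1 - i0 };
    return d;
}
```

Calling `measure(sum_below, 100, &r)` gives the cycles and retired instructions for the whole region, so dividing one by the other yields an average CPI for that region rather than a per-instruction cost.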

kenzhang82 · 2017-08-10T07:54:42Z

Thanks @davidbiancolin, much appreciated! I understand why the emulator is slow, but is there any way to know how many cycles each instruction (for example, add, sub, mul, etc.) takes to execute? I tried turning verbose mode on, but the output doesn't include a cycle count. Or did I do something wrong?

Also, I just figured out that we could use spike pk -s to do the same thing (understood that spike is just a functional simulator), but what would be the best (and most accurate) way to profile the cycle count of a C program running on the rocket chip zynq infrastructure? Thanks.

aswaterman · 2017-08-10T08:17:17Z

Hi @z419379295 - you're asking for a metric that's fundamentally ambiguous, because pipelined processors overlap latencies. Suppose MUL has 3-cycle latency and LW has 2-cycle latency:

MUL x1, x1, x2
LW x2, 0(x2)
ADD x2, x2, x1

A single-issue in-order pipeline would incur one stall cycle before the ADD, so the sequence completes over the course of four cycles. But since the ADD is stalled on both the MUL and the LW, how do you decide how to apportion those cycles between the instructions?
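To make the arithmetic above concrete, here is a small sketch of a toy issue model: one instruction issues per cycle, in order, and a consumer may issue no earlier than `latency` cycles after its producer. The model, the `insn` struct, and the function names are assumptions for illustration only (the 1-cycle ADD latency is also assumed); it is not a description of Rocket's actual pipeline.

```c
#include <stddef.h>

#define NUM_REGS 32

/* One instruction in the toy model. */
typedef struct {
    const char *name;
    int dest;     /* destination register number */
    int src1;     /* first source register, or -1 if unused */
    int src2;     /* second source register, or -1 if unused */
    int latency;  /* cycles until a dependent instruction may issue */
} insn;

/* Compute the 1-based issue cycle of each instruction for a single-issue,
 * in-order pipeline. Returns the total number of stall cycles. */
int schedule_in_order(const insn *prog, size_t n, int *issue) {
    int ready[NUM_REGS] = {0};  /* earliest cycle a reader of each register may issue */
    int prev_issue = 0;
    int stalls = 0;

    for (size_t i = 0; i < n; i++) {
        int earliest = prev_issue + 1;  /* in order, one issue per cycle */
        int cycle = earliest;

        /* Wait for whichever source operand becomes available last. */
        if (prog[i].src1 >= 0 && ready[prog[i].src1] > cycle) {
            cycle = ready[prog[i].src1];
        }
        if (prog[i].src2 >= 0 && ready[prog[i].src2] > cycle) {
            cycle = ready[prog[i].src2];
        }

        stalls += cycle - earliest;
        issue[i] = cycle;
        ready[prog[i].dest] = cycle + prog[i].latency;
        prev_issue = cycle;
    }
    return stalls;
}

/* The three-instruction sequence from the comment above.
 * Writes the issue cycles into issue[0..2] and returns the stall count. */
int example_sequence(int issue[3]) {
    const insn prog[] = {
        { "MUL", 1, 1,  2, 3 },  /* MUL x1, x1, x2 : 3-cycle latency */
        { "LW",  2, 2, -1, 2 },  /* LW  x2, 0(x2)  : 2-cycle latency */
        { "ADD", 2, 2,  1, 1 },  /* ADD x2, x2, x1 : 1-cycle latency (assumed) */
    };
    return schedule_in_order(prog, 3, issue);
}
```

In this model MUL issues in cycle 1, LW in cycle 2, and ADD in cycle 4 after one stall. Both of ADD's operands become ready in the same cycle, so the single stall cannot be charged to MUL or LW alone, which is the ambiguity described above.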

kenzhang82 · 2017-08-10T22:39:38Z

Thanks @aswaterman for your help!! Aha, that makes sense to me now. Maybe I was not seeing the big picture. What I was really trying to do was to identify the power consumption of C code executing on a rocket chip synthesized in the Zynq PL, and I thought it might be useful to see which instructions take the most cycles. Is there any way to achieve this (i.e. power profiling of instructions)? Thanks.

aswaterman · 2017-08-10T22:44:46Z

I'm not really sure... maybe run several benchmarks, measure their power consumption and instruction mix, and then attempt to correlate power consumption with instruction mix?
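One possible reading of "correlate power consumption with instruction mix" is a least-squares fit of measured power against the fraction of each instruction class per benchmark. The sketch below assumes a purely linear model, which may well be too simple in practice; `fit_power_weights`, `MAX_CLASSES`, and the data layout are hypothetical, and the measurements themselves would have to come from your own benchmarks.

```c
#include <stddef.h>

#define MAX_CLASSES 8  /* arbitrary cap on instruction classes for this sketch */

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Fit power[i] ~= sum over c of weight[c] * mix[i*k + c] by least squares.
 * n: number of benchmarks, k: number of instruction classes,
 * mix: n*k row-major matrix of per-benchmark instruction fractions.
 * Returns 0 on success, -1 if the inputs are unusable or the system is singular. */
int fit_power_weights(size_t n, size_t k, const double *mix,
                      const double *power, double *weight) {
    double a[MAX_CLASSES][MAX_CLASSES + 1];

    if (k == 0 || k > MAX_CLASSES || n < k) {
        return -1;
    }

    /* Build the augmented normal equations [X^T X | X^T y]. */
    for (size_t r = 0; r < k; r++) {
        for (size_t c = 0; c < k; c++) {
            double s = 0.0;
            for (size_t i = 0; i < n; i++) {
                s += mix[i * k + r] * mix[i * k + c];
            }
            a[r][c] = s;
        }
        double s = 0.0;
        for (size_t i = 0; i < n; i++) {
            s += mix[i * k + r] * power[i];
        }
        a[r][k] = s;
    }

    /* Gauss-Jordan elimination with partial pivoting. */
    for (size_t col = 0; col < k; col++) {
        size_t piv = col;
        for (size_t r = col + 1; r < k; r++) {
            if (abs_d(a[r][col]) > abs_d(a[piv][col])) {
                piv = r;
            }
        }
        if (abs_d(a[piv][col]) < 1e-12) {
            return -1;  /* instruction mixes are not independent enough */
        }
        for (size_t c = 0; c <= k; c++) {
            double tmp = a[col][c];
            a[col][c] = a[piv][c];
            a[piv][c] = tmp;
        }
        for (size_t r = 0; r < k; r++) {
            if (r == col) {
                continue;
            }
            double f = a[r][col] / a[col][col];
            for (size_t c = col; c <= k; c++) {
                a[r][c] -= f * a[col][c];
            }
        }
    }

    for (size_t r = 0; r < k; r++) {
        weight[r] = a[r][k] / a[r][r];
    }
    return 0;
}
```

The fit needs at least as many benchmarks as instruction classes, and the benchmarks need sufficiently different mixes; otherwise the function reports a singular system instead of returning weights.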

kenzhang82 · 2017-08-10T22:59:46Z

The benchmarks? You mean running them on the cycle-accurate C++ emulator? How do we measure the power consumption of a software algorithm running on a RISC-V processor?

ben-k · 2017-08-11T22:09:27Z

You would need some sort of RTL-based power model. The details are something of an open research question, so there's not going to be a push-button answer here.

kenzhang82 · 2017-08-14T04:33:03Z

Cool, thanks!

kenzhang82 closed this as completed Aug 14, 2017
