Analyzing the RISC-V Instruction Set Architecture

I was very excited to read the recent EE Times article “RISC-V: An Open Standard for SoC” by Krste Asanović and David A. Patterson announcing their open source processor architecture. As we near the end of CMOS scaling, the bulk of future performance and efficiency improvements will need to come from architecture improvements. The availability of a high-quality open processor architecture will help tremendously in this area.

Looking through the RISC-V ISA specifications [1], I couldn’t resist comparing it to the Epiphany ISA [2], and wanted to share the results in case others might find the comparison useful.

The book “Computer Architecture: A Quantitative Approach” by John Hennessy and David Patterson was my constant companion during 2008 when I created the Epiphany architecture, so it should come as no surprise that the two architectures are more similar than different. :-) Having defended my own ISA choices for 6 years now, I can confidently state that there is no right answer to most of the “zero-sum” decisions that need to be made during the definition process. My favorite Fallacy/Pitfall advice from the Hennessy/Patterson book:

  • Fallacy: You can design a flawless architecture

  • Fallacy: An architecture with flaws cannot be successful

Figures 1 and 2 show a summary of the two architectures, highlighting some of the similarities and differences as well as some of the architectural design choices. For comparison purposes, I chose the “IMAF” version of the RISC-V since it most closely resembles the Epiphany architecture in terms of a standard feature set. Both architectures were designed for extensibility and modularity, so it’s easy to add and remove key functionality for special purpose applications. However, as we have seen in the past, the importance of binary compatibility should never be underestimated. For this reason, even if an ISA is modular and configurable, once it reaches mass adoption it is likely to stabilize around one configuration.


Figure 1: Comparing the RISC-V and Epiphany


Figure 2: Design Philosophy

The two architectures were designed for completely different purposes; the Epiphany was designed to function as a math coprocessor while the RISC-V was designed to run the full software stack. As a general purpose processor, the RISC-V must be good at everything, so it needs a larger instruction set to be complete.

Both architectures are silicon proven. A 40nm 64-bit RISC-V implementation (RV64IMA, no floating point) with 32KB of cache occupies 0.39 mm^2 [3]. In contrast, a 28nm 32-bit Epiphany implementation with 32KB of local SRAM and a Network-On-Chip occupies 0.13 mm^2 [4].

The sections below include a more detailed analysis of the ISAs, with my own Epiphany-biased comments added.

Branching

Both architectures implement complete support for branching and function calls, albeit with different approaches. The Epiphany performs conditional moves and branches based on condition codes (as on ARM and PowerPC), while the RISC-V performs conditional branching based on a comparison between two registers. Refer to the specifications [1,2] for more details.
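As a rough illustration of the two styles, here is a small C function with comments sketching how each approach typically lowers the comparison; the function name and the mnemonics in the comments are illustrative, not exact encodings from either manual:

    /* Hedged sketch: how a simple comparison is typically lowered
       under the two branching styles discussed above. */
    int clamp_diff(int a, int b)
    {
        /* Condition-code style (Epiphany/ARM-like):
             sub  rtmp, ra, rb     ; arithmetic sets the flags
             blt  .skip            ; branch on the flags

           Compare-and-branch style (RISC-V-like):
             blt  ra, rb, .skip    ; compare two registers and branch
                                   ; in a single instruction           */
        if (a < b)
            return 0;
        return a - b;
    }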


Figure 3: Branch/Control Instructions

Load/Store

The RISC-V has support for signed byte and half-word loads, which is clearly an important feature for working with signed byte and half-word data types. Some may argue that the RISC-V could use more addressing modes to boost efficiency in compute-intensive applications. True to its DSP heritage, the Epiphany architecture places more emphasis on addressing modes and even includes a post-modify addressing mode for accessing sequential in-memory buffers more efficiently. The Epiphany also includes support for double-word loads/stores, which is needed to keep the floating-point unit working at maximum capacity in most applications.
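To make the post-modify point concrete, here is a minimal C sketch of the kind of streaming loop it targets; the function name and the mnemonic in the comment are illustrative rather than taken from the manual:

    #include <stddef.h>

    /* Streaming sum over a sequential buffer.  With a post-modify load the
       inner loop can load a value and advance the pointer in a single
       instruction, roughly:  ldr r0, [rbuf], +4
       Without that mode, a separate add is needed to bump the address. */
    float sum_buffer(const float *buf, size_t n)
    {
        float acc = 0.0f;
        for (size_t i = 0; i < n; i++)
            acc += *buf++;
        return acc;
    }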


Figure 4: Load/Store Instructions

Integer

The RISC-V has significantly more integer instructions than the Epiphany to support a broader set of applications. It should be noted that the Epiphany performs surprisingly well on non-DSP benchmarks despite its minuscule instruction set [5]. The ingenuity of compiler developers and the optimization prowess of state-of-the-art compilers like GCC and LLVM should never be underestimated. The Epiphany does not include many operations on immediates because they were assumed to be (a) non-essential and (b) possible to implement effectively in the compiler.
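As a small, hedged example of what “leave it to the compiler” looks like in practice, consider masking off the low byte of a word when the ISA lacks an and-immediate form; the mnemonics in the comment are illustrative only:

    /* Masking the low byte of a word.  With an and-immediate instruction
       this is a single operation; without one, the compiler simply
       materializes the constant in a register first:
           mov  r1, 0xff
           and  r0, r0, r1
       One extra instruction, and the constant load is usually hoisted
       out of loops, so the practical cost is small. */
    unsigned low_byte(unsigned x)
    {
        return x & 0xffu;
    }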


Figure 5: Integer Instructions


Figure 6: Multiply/Divide Instructions

Floating Point

The RISC-V represents a well tested standard floating point instruction set with a very broad application range. The Epiphany again shows traces of its DSP heritage. Since most signal processing algorithms boil down to a simple multiply-accumulate operation, the original DSPs were designed to be highly optimized multiply-accumulate units running in a tight programmable loop.
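For readers less familiar with the DSP angle, the dot product is the canonical example. Below is a minimal C sketch in which every loop iteration is one multiply-accumulate, which a fused multiply-add unit can retire as a single operation:

    #include <stddef.h>

    /* Dot product: one multiply-accumulate per element in a tight loop.
       On machines with a fused multiply-add, each iteration maps to a
       single FMADD-style operation, exactly the case that DSP-style
       floating-point units are optimized for. */
    float dot(const float *a, const float *b, size_t n)
    {
        float acc = 0.0f;
        for (size_t i = 0; i < n; i++)
            acc += a[i] * b[i];
        return acc;
    }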


Figure 7: Floating Point Instructions

Conclusion

The RISC-V and Epiphany are both representative of the RISC approach to computer architecture that Patterson and others invented long ago. The RISC-V architecture is not revolutionary, but it is an excellent general purpose architecture with solid design decisions. The true breakthrough here is really the open source licensing model and the maturity of the design compared to most other open source hardware projects. I am personally very enthusiastic about the kinds of low-cost systems that will be built around the RISC-V architecture going forward. A royalty-free 64-bit RISC-V core would have a raw silicon cost of a couple of cents in current CMOS process nodes. Now that is exciting!

Andreas Olofsson is the founder of Adapteva and the creator of the Epiphany architecture and the Parallella open source computing project.

References: 

[1] RISC-V Instruction Set Architecture: http://riscv.org/download.html#tab_isaspec

[2] Epiphany Instruction Set Architecture: http://adapteva.com/docs/epiphany_arch_ref.pdf

[3] Implementation of the RISC-V architecture: http://riscv.org/download.html#tab_rocket

[4] A 1024 core 70 GFLOP/W Floating Point Manycore Microprocessor: http://adapteva.com/docs/papers/hpec2011_1024cores.pdf

[5] CoreMark Benchmark Results: http://www.eembc.org/coremark/index.php


12 Comments

  1. Hi! Interesting write-up! I’m a Berkeley RISC-V user, so I’ll try to add my own opinions on top of yours.

    As the manual describes, the AUIPC instruction was added for dealing with dynamically linked/position-independent code. That was one of those instructions that was added and tweaked after Andrew spent considerable time seeing how real code was behaving. It’s not an obvious one you’d think to include at the start.

    I assume movcc is a conditional move? We considered that a classic case of overly optimizing for a single micro-arch design point (and thus complicating other micro-arch designs). It’s been tempting though.

    As for the Mul/Div stuff being expensive, well, that’s why it’s an optional extension! =) And of course, there’s no reason why you can’t punt the weirder, rarely used ones to software. Although I remember being shocked (and annoyed) that printf requires a number of the weirder bastards (how do I test my SW rem/div implementation? Using printf. What does printf need working to print out whether rem/div is working? rem/div. Doh!)
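    (For the curious, here is a minimal shift-and-subtract sketch of what an unsigned software divide/remainder looks like, assuming a nonzero divisor; the function name is made up and this is not the actual runtime-library routine:)

        #include <stdint.h>

        /* Restoring shift-and-subtract division in software (assumes den != 0).
           Illustrative only; real runtime libraries are more heavily tuned. */
        void udivmod32(uint32_t num, uint32_t den,
                       uint32_t *quot, uint32_t *rem)
        {
            uint32_t q = 0, r = 0;
            for (int i = 31; i >= 0; i--) {
                r = (r << 1) | ((num >> i) & 1u);  /* bring down the next bit */
                if (r >= den) {                    /* subtract when it fits   */
                    r -= den;
                    q |= 1u << i;
                }
            }
            *quot = q;
            *rem  = r;
        }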

    “The RISC-V architecture is not revolutionary, but it is an excellent general purpose architecture with solid design decisions.”

    Yup! The base ISA isn’t supposed to be researchy, just able-to-be-used for research. It’s been working out pretty well for us so far.

    • Hi Chris,

      Thanks for the comments and expanding on some of the RISC-V design choices! Agree with comments on MUL/AUIPC/MOVCC in general. Since the RISC-V and Epiphany were designed for completely different purposes, some of the design choices are bound to be slightly different.

      Look forward to following the RISC-V development!

      Andreas

  2. To what extent were the RISC-V (capitalisation?) and Epiphany (camelisation?) ISAs empirically motivated?

    I am always drawn towards doing aggregate analyses of run-traces for various workloads. I have done static analysis of the Linux kernel on ARMv8, for example, which demonstrates a highly skewed register usage pattern, motivating a more CISC’y architecture in terms of addressing modes (5052 being the extreme).

    Pity that there is only (at the moment) a JavaScript RISC-V emulator – would love to play with (instrument) something more programmatically accessible (does anyone remember C?)

    My goal is hardware ecology – what is the empirical best bang for buck in terms of silicon space and power consumption. It depends entirely, of course, on target application.

    R

    • Not sure I am getting the camelisation vs capitalization reference?

      Certainly both architectures were empirically motivated. The Epiphany architecture, for example, went through 50 complete iterations of instruction-to-layout to gauge the power/area penalty of features. In terms of program performance, things were more subjective. The optimization goal wasn’t the best average performance on a benchmark; the goal was to get peak performance on a certain class of problems (digital signal processing) while keeping performance on most other programs “reasonable”.

      In terms of running traces, both architectures are available in silicon. The RISC-V is available on a Zynq FPGA. Hopefully soon we will be able to run it on the Parallella board as well.

  3. OK, post-login, I will try to post again.

    (Love your architecture, b.t.w, and just cos it is different)

    My angle is silicon ecology. How can you get the best performance, for targeted workloads, within power constraints? For large-scale computing, power restrictions are more collective than local – rack electricity, and room cooling. For small-scale computing, power restrictions relate to whether you’re going to burn out your silicon for some loads if you pack as densely as you can.

    Large-scale computing is going to be dominated by cost, and cost is going to be dominated by power consumption. Not unit cost (in an ideal world).

    Do you have empirical evidence for your design choices?

    Some interesting silicon angles:

    – Clock speed is bounded by your longest path. Most computations are addition, and have low entropy (mostly small additions).
    – Pipelining for general purpose computing is a waste of silicon space, cos branch prediction takes over (space-wise).
    – Register banks are a waste of space, because empirical register usage is highly skewed.

    R

    • We reach 70 GFLOPS/Watt computing efficiency for the Epiphany at 28nm with an ANSI-C programmable MIMD (non-vector) architecture. As far as I know, nobody else in the world has come close to that.

      • There should be a better benchmark than GFLOPS/Watt. During the Internet bubble, network processors also had impressive theoretical numbers, but the problem was that no one knew how to program them to get the throughput. I look forward to some other benchmark.

      • I was disappointed by the ARM 64-bit redesign. Hey, they told their engineers not to focus on performance. Engineers are very literal-minded, so guess what… They actually filled up the chip with hundreds of SIMD instructions that no one will ever use. I think the scalar side of the processor is actually quite nice but cannot be unfused from the SIMD side. I think it’s like a 33% performance improvement after spending mega dollars.
        One issue for the future is non-volatile solid-state memory. At the moment it is very slow, with something like 1ms set-up times! However, there are a lot of technical improvements that are possible. We should see DRAM-speed access times in the next 5 to 7 years and, obviously, with 10nm technology, very high memory density. This brings up the issue of table-based computing. Obviously, if one memory look-up can replace a 1000-instruction-cycle function then you save a great amount of time and energy dissipation. Of course there must be a way to side-step the CPU cache or you lose most of what you would gain energy-wise.

  4. Pingback: RISC-V Analysis by Adapteva Founder | RISC-V BLOG

  5. I strongly recommend, in the next version after the 64-core board, that you push harder to implement the floating point DIVIDE instruction. I have been in computers a long time, and machines that fail to implement DIVIDE always seem to suffer in the market. I consider the Raspberry Pi an anomaly; it is often given as a gift and is only notable for a low retail price. It used the ARMv6 architecture, which also lacks a divide. Maybe it is coincidence, but look at how ARM shot up like a rocket in the cellphone world when they moved to the v7 architecture and finally implemented DIVIDE, which they had been resisting for the same reasons Adapteva has dodged it. I would imagine software divide is 10 to 100 times slower, and that is going to hurt for general purpose computation. Although Adapteva has to survive in the near term making 16- or 64-core chips, the real win for Adapteva would be the 4096-core model, which in a cluster architecture would be a Cray-stomping monster. I think it is obvious to all that the amount of shared memory has to scale even faster than the number of cores, because wasted memory is kind of a given when you are trying to manage so many cores. You can’t be super tidy with memory and tidy with execution planning; it is just too much to ask of the programmer. Anyway, keep up the good work. I really like the simplicity of the Epiphany system.

    The real frontier of programming is parallel programming, and the lack of simple general purpose languages is really hampering the field. I venture to speculate that Adapteva is more threatened by the difficulty of debugging parallel code than by making the chips, which you will eventually get just right, but look at how difficult it is to get decent parallel code out of programmers. Very tough to make robust code when the tools are so crude. There was some great research in the 70s, but it all went south when automatic programming became the buzzword of the day.
