The E64G401 is a 64-core microprocessor/coprocessor reference design based on the fourth generation of the Epiphany multicore architecture and designed as a direct replacement for the E16G301. The Epiphany™ architecture defines a scalable, shared-memory, multicore parallel computing fabric consisting of a 2D array of compute nodes connected by a low-latency mesh network-on-chip. The main components of the E64G401 product are shown below. For more detailed information about the Epiphany architecture, please refer to the Epiphany Architecture Reference Manual.
E64G401 Datasheet (PDF) (Updated June 17, 2013)
The chip reference design IP is available for licensing in the GLOBALFOUNDRIES 28SLP process. Epiphany-IV silicon devices are available as part of Parallella boards in Q3/2013.
- 64 High Performance RISC CPU Cores
- 800 MHz Operating Frequency
- 102 GFLOPS Peak Performance
- 1.6 TB/s Local Memory Bandwidth
- 102 GB/s Network-On-Chip Bisection Bandwidth
- 6.4 GB/s Off-Chip Bandwidth
- 2 MB On-Chip Distributed Shared Memory
- 2 Watt Maximum Chip Power Consumption
- IEEE Floating Point Instruction Set
- Fully-featured ANSI-C/C++ programmable
- GNU/Eclipse based tool chain
- Source-synchronous sub-LVDS off-chip links for host or direct chip-to-chip interfacing
- Chip-to-chip links for integrating up to 64 chips on a single board
- 324-ball 15x15mm flip-chip BGA
Each compute node contains an independent superscalar floating-point RISC CPU operating at 800 MHz and delivering 1.6 GFLOPS. The CPU has an efficient general-purpose instruction set that excels at compute-intensive applications and can be programmed efficiently in C/C++ without resorting to assembly or processor-specific intrinsics.
The Epiphany memory architecture is based on a flat memory map in which each compute node has a small amount of local memory as a uniquely addressable slice of the total 32-bit address space. A processor can access its own local memory and other processors' memory through regular load/store instructions; the only difference is the latency and effective throughput of the transactions. The local memory system consists of 4 separate banks, allowing simultaneous memory access by the instruction fetch engine, by local load/store instructions, and by load/store transactions initiated by other processors within the system.
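The flat memory map described above can be illustrated with a small address-composition sketch. The bit layout used here (upper 12 bits as a 6-bit row and 6-bit column core ID, lower 20 bits as the offset within that core's slice) is an assumption that is not stated on this page and should be checked against the Epiphany Architecture Reference Manual; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED layout, not taken from this page: [31:26] row, [25:20] column,
 * [19:0] offset inside the addressed core's slice of the address space. */
enum { ROW_BITS = 6, COL_BITS = 6, OFFSET_BITS = 20 };

/* Compose a 32-bit global address from mesh coordinates and a local offset. */
uint32_t global_address(uint32_t row, uint32_t col, uint32_t offset)
{
    assert(row < (1u << ROW_BITS));
    assert(col < (1u << COL_BITS));
    assert(offset < (1u << OFFSET_BITS));
    return (row << (COL_BITS + OFFSET_BITS)) | (col << OFFSET_BITS) | offset;
}

/* Decompose a global address back into its three fields. */
uint32_t address_row(uint32_t addr)
{
    return addr >> (COL_BITS + OFFSET_BITS);
}

uint32_t address_col(uint32_t addr)
{
    return (addr >> OFFSET_BITS) & ((1u << COL_BITS) - 1u);
}

uint32_t address_offset(uint32_t addr)
{
    return addr & ((1u << OFFSET_BITS) - 1u);
}
```

Under this assumed layout, a core would reach another core's memory by dereferencing an ordinary pointer built from such an address, which matches the "regular load/store" model described above.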
The eMesh Network-on-Chip is a 2D mesh network that handles all on-chip and off-chip communication. The network is based on atomic 32-bit memory transactions and is transparent to the running program. It consists of three separate and orthogonal mesh structures, each serving a different type of transaction traffic: one network for on-chip write traffic, one for off-chip write traffic, and one for all read traffic.
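The split of traffic across the three meshes can be sketched as a simple classification. This is only an illustration of the rule stated above; the type and function names are hypothetical, and the real selection is done in hardware, not software.

```c
#include <stdbool.h>

/* The three mesh structures, named here by the traffic they carry
 * (hypothetical identifiers, chosen for this sketch). */
typedef enum {
    MESH_ONCHIP_WRITE,
    MESH_OFFCHIP_WRITE,
    MESH_READ
} mesh_t;

/* Minimal description of a memory transaction for this sketch. */
typedef struct {
    bool is_read;       /* true for a read, false for a write */
    bool dest_on_chip;  /* true if the destination is on this chip */
} transaction_t;

/* All reads share one mesh; writes are split by destination. */
mesh_t select_mesh(transaction_t t)
{
    if (t.is_read) {
        return MESH_READ;
    }
    return t.dest_on_chip ? MESH_ONCHIP_WRITE : MESH_OFFCHIP_WRITE;
}
```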
The eMesh network and memory architecture are extended off-chip using source-synchronous LVDS-based serial links that provide up to 1.6 GB/s of effective bandwidth per link. Each E64G401 has 4 links, one in each direction (north, east, west, south), allowing chips to be easily interfaced with FPGAs and/or other E64G401 chips on a board.
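Several headline figures in the feature list follow directly from the per-core and per-link numbers quoted in the text. The sketch below redoes that arithmetic; the function names are hypothetical, and the inputs are only the values stated on this page (64 cores, 1.6 GFLOPS per core, 4 links at 1.6 GB/s, 2 MB of on-chip memory).

```c
/* Values taken from this page. */
enum { CORES = 64, LINKS = 4, ONCHIP_MEMORY_KB = 2 * 1024 };

static const double GFLOPS_PER_CORE = 1.6;   /* at 800 MHz */
static const double GBYTES_PER_LINK = 1.6;   /* effective, per link */

/* 64 cores x 1.6 GFLOPS = 102.4 GFLOPS, listed as 102 GFLOPS. */
double peak_gflops(void)
{
    return CORES * GFLOPS_PER_CORE;
}

/* 4 links x 1.6 GB/s = 6.4 GB/s of off-chip bandwidth. */
double offchip_gbytes_per_second(void)
{
    return LINKS * GBYTES_PER_LINK;
}

/* 2 MB shared evenly by 64 cores = 32 KB of local memory per core. */
unsigned local_kbytes_per_core(void)
{
    return ONCHIP_MEMORY_KB / CORES;
}
```

Dividing 1.6 GFLOPS by the 800 MHz clock also gives 2 floating-point operations per core per cycle.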
The E64G401 product can be used in a number of different system configurations, some of which are shown in this section.
- Smartphone and tablet app acceleration
- High end audio
- Computational photography
- Speech Recognition
- Face detection/recognition
- Supercomputers
- Big Data Analytics
- Software Defined Networking
- Data-center Appliances
- High Frequency Trading
- Extremely Large Sensor Imaging
- Hyperspectral Imaging
- Communication Jamming
- Military Radios
- Communication test-bed
- Software defined radio
- Adaptive Pre-distortion
- Machine Vision
- Autonomous Robots/Navigation
- Automotive Safety
- High Speed Data Acquisition/Generation
- Security Cameras
- Video Transcoding