Epiphany-IV 64-core 28nm Microprocessor (E64G401)

Introduction
The E64G401 is a 64-core microprocessor/coprocessor reference design based on the fourth generation of the Epiphany multicore architecture and designed as a direct replacement for the E16G301. The Epiphany™ architecture defines a scalable, shared-memory, parallel computing fabric consisting of a 2D array of compute nodes connected by a low-latency mesh network-on-chip. The main components of the E64G401 product are shown below. For more detailed information about the Epiphany architecture, please refer to the Epiphany Architecture Reference Manual.

Datasheet:

E64G401 Datasheet (PDF) (Updated June 17, 2013)

Availability:

Chip reference design IP available for licensing in GLOBALFOUNDRIES 28SLP process. Epiphany-IV silicon devices available as part of Parallella boards in Q3/2013.

Features:

64 High Performance RISC CPU Cores
800 MHz Operating Frequency
102 GFLOPS Peak Performance
1.6 TB/s Local Memory Bandwidth
102 GB/s Network-On-Chip Bisection Bandwidth
6.4 GB/s Off-Chip Bandwidth
2 MB On-Chip Distributed Shared Memory
2 Watt Maximum Chip Power Consumption
IEEE Floating Point Instruction Set
Fully-featured ANSI-C/C++ programmable
GNU/Eclipse based tool chain
Source-synchronous sub-LVDS off-chip links for host or direct chip-to-chip interfacing
Chip-to-chip links for integrating up to 64 chips on a single board
324-ball 15x15mm flip-chip BGA

RISC Processor:
Each compute node contains an independent superscalar floating-point RISC CPU operating at 800 MHz and delivering 1.6 GFLOPS. The CPU has an efficient general-purpose instruction set that excels at compute-intensive applications and can be programmed in C/C++ without any need for assembly or processor-specific intrinsics.
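The per-core and chip-level performance figures on this page can be cross-checked with a little arithmetic. The sketch below uses only numbers quoted here (64 cores, 800 MHz, 1.6 GFLOPS per core); the function names are illustrative. The result of two floating-point operations per cycle would be consistent with one multiply-add issued each cycle, but that interpretation is an assumption and is not stated on this page.

```python
# Figures quoted on this page.
CORES = 64
CLOCK_HZ = 800e6          # 800 MHz operating frequency
PER_CORE_GFLOPS = 1.6     # per-core floating-point rate


def flops_per_cycle(per_core_gflops: float, clock_hz: float) -> float:
    """Floating-point operations each core must complete per clock cycle."""
    return per_core_gflops * 1e9 / clock_hz


def peak_gflops(cores: int, per_core_gflops: float) -> float:
    """Chip-level peak, assuming every core sustains its per-core rate."""
    return cores * per_core_gflops


if __name__ == "__main__":
    print(f"FLOP per cycle per core: {flops_per_cycle(PER_CORE_GFLOPS, CLOCK_HZ):.1f}")
    print(f"Peak chip performance:   {peak_gflops(CORES, PER_CORE_GFLOPS):.1f} GFLOPS")
```

The chip-level result of 102.4 GFLOPS agrees with the 102 GFLOPS peak performance listed under Features.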

Memory System:
The Epiphany memory architecture is based on a flat memory map in which each compute node has a small amount of local memory as a uniquely addressable slice of the total 32-bit address space. A processor can access its own local memory and other processors' memory through regular load/store instructions, with the only difference being the latency and effective throughput of the transactions. The local memory system comprises 4 separate banks, allowing simultaneous memory access by the instruction fetch engine, by local load/store instructions, and by load/store transactions initiated by other processors within the system.
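To make the flat memory map concrete, the sketch below composes and decodes a global address for a given compute node. The bit layout it uses (a 6-bit mesh row and a 6-bit mesh column in the upper 12 bits, and a 20-bit offset within the node's slice) is an assumption based on the Epiphany Architecture Reference Manual and is not stated on this page. The per-node and per-bank sizes are derived from the 2 MB, 64-core and 4-bank figures above, assuming 2 MB means 2 MiB split equally. All function names are illustrative.

```python
# Assumed address layout (not stated on this page):
ROW_BITS = 6       # mesh row, address bits 31..26
COL_BITS = 6       # mesh column, address bits 25..20
OFFSET_BITS = 20   # offset inside the node's slice, bits 19..0

# Figures quoted on this page (2 MB taken as 2 MiB, an assumption).
CORES = 64
ON_CHIP_BYTES = 2 * 1024 * 1024
BANKS = 4


def global_address(row: int, col: int, offset: int) -> int:
    """Compose a 32-bit global address from mesh coordinates and an offset."""
    if not (0 <= row < (1 << ROW_BITS) and 0 <= col < (1 << COL_BITS)):
        raise ValueError("mesh coordinates out of range")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset out of range")
    return (row << (COL_BITS + OFFSET_BITS)) | (col << OFFSET_BITS) | offset


def decode_address(addr: int) -> tuple[int, int, int]:
    """Split a 32-bit global address into (row, col, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    col = (addr >> OFFSET_BITS) & ((1 << COL_BITS) - 1)
    row = (addr >> (OFFSET_BITS + COL_BITS)) & ((1 << ROW_BITS) - 1)
    return row, col, offset


def local_memory_bytes() -> int:
    """Local memory per compute node if the on-chip total is split evenly."""
    return ON_CHIP_BYTES // CORES


def bank_bytes() -> int:
    """Size of one local bank, assuming four equal banks."""
    return local_memory_bytes() // BANKS


if __name__ == "__main__":
    addr = global_address(row=32, col=8, offset=0x100)
    print(hex(addr), decode_address(addr))
    print(local_memory_bytes(), "bytes per node,", bank_bytes(), "bytes per bank")
```

Under this assumed layout, a store to an address whose upper bits name another node is an ordinary store instruction; only the destination encoded in the address differs. A 12-bit node field covers 4096 nodes, which matches 64 chips of 64 cores each.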

Network-On-Chip:
The eMesh Network-on-Chip is a 2D mesh network that handles all on-chip and off-chip communication. The network is based on atomic 32-bit memory transactions and is transparent to the running program. The network consists of three separate and orthogonal mesh structures, each serving a different type of transaction traffic: one network for on-chip write traffic, one network for off-chip write traffic, and one network for all read traffic.
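The division of traffic among the three meshes can be summarized as a simple selection rule. The sketch below is only a model of the description above; the enum and function names are hypothetical, and the real selection is done in hardware and is invisible to software.

```python
from enum import Enum


class Mesh(Enum):
    """The three mesh structures described above (names are illustrative)."""
    ON_CHIP_WRITE = "on-chip write"
    OFF_CHIP_WRITE = "off-chip write"
    READ = "read"


def select_mesh(is_write: bool, dest_on_chip: bool) -> Mesh:
    """Pick the mesh that would carry a transaction of the given kind."""
    if not is_write:
        # All read traffic shares one network, wherever the destination is.
        return Mesh.READ
    return Mesh.ON_CHIP_WRITE if dest_on_chip else Mesh.OFF_CHIP_WRITE


if __name__ == "__main__":
    print(select_mesh(is_write=True, dest_on_chip=True).value)
    print(select_mesh(is_write=True, dest_on_chip=False).value)
    print(select_mesh(is_write=False, dest_on_chip=True).value)
```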

Off-Chip IO:
The eMesh network and memory architecture is extended off-chip using source-synchronous LVDS-based serial links that provide up to 1.6 GB/s of effective bandwidth per link. Each E64G401 has 4 links, one in each direction (north, east, west, south), allowing chips to be easily interfaced with FPGAs and/or other E64G401 chips on a board.
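The per-link figure above can be related to the chip-level off-chip bandwidth with a short calculation. The sketch below assumes that all four links can be driven at their effective rate at the same time and that 1 GB means 10^9 bytes; the function names are illustrative, and the transfer-time estimate ignores protocol overhead and latency.

```python
# Figures quoted on this page.
LINKS_PER_CHIP = 4
LINK_GB_PER_S = 1.6   # effective bandwidth per link


def aggregate_off_chip_gb_per_s(links: int = LINKS_PER_CHIP,
                                per_link: float = LINK_GB_PER_S) -> float:
    """Total off-chip bandwidth if every link runs at its effective rate."""
    return links * per_link


def transfer_seconds(num_bytes: float, links_used: int = 1,
                     per_link: float = LINK_GB_PER_S) -> float:
    """Idealized time to move num_bytes over links_used links (1 GB = 1e9 bytes)."""
    if links_used < 1:
        raise ValueError("at least one link is required")
    return num_bytes / (links_used * per_link * 1e9)


if __name__ == "__main__":
    print(f"Aggregate: {aggregate_off_chip_gb_per_s():.1f} GB/s")
    print(f"64 MB over one link: {transfer_seconds(64e6) * 1e3:.1f} ms")
```

The aggregate of 6.4 GB/s agrees with the off-chip bandwidth listed under Features.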

System Examples:
The E64G401 product can be used in a number of different system configurations, some of which are shown in this section.

Potential Applications
Consumer:

Smartphone and tablet app acceleration
High end audio
Computational photography
Speech Recognition
Face detection/recognition

Computing Infrastructure:

Supercomputers
Big Data Analytics
Software Defined Networking
Data-center Appliances
High Frequency Trading

Mil/Aero:

Radar/Sonar
Extremely Large Sensor Imaging
Hyperspectral Imaging
Communication Jamming
Military Radios
Munitions/Guidance

Medical:

Ultrasound
CT

Communication:

Communication test-bed
Software defined radio
Adaptive Pre-distortion

Industrial/Instrumentation:

Machine Vision
Autonomous Robots/Navigation
Automotive Safety
High Speed Data Acquisition/Generation

Other:

Compression
Security Cameras
Video Transcoding