Make Chips Do More and Last Longer with Embedded FPGA


By Geoff Tate, Flex Logix

The cost and time to design ASSPs, ASICs and SoCs keep rising.

This is a challenge because the market for a chip must be bigger to deliver a good return on investment (ROI), and long design times make it harder to hit changing customer specs and standards.

Also, customers are demanding more flexibility in chips so their systems can be upgraded for critical changes (such as protocols/standards), which increases the useful life of their systems and increases their ROI.

For example, in data centers, customers are now seeking reconfigurability. Rather than a fork-lift upgrade when standards evolve, data centers want programmable chips so they can upgrade the data center’s capabilities during the life of the center without touching the hardware. This also gives the data center the option to customize for added competitive advantage. As Doug Burger of Microsoft said in a recent talk at FPL 2016, (Re)Configurable Clouds will change the world with the ability to reprogram a datacenter’s hardware protocols: networking, storage, security. Adding FPGA technology into the mix is key to doing this. Embedded FPGA technology is now available to increase performance while lowering cost and power.

Another example is microcontrollers. In older process nodes such as 90nm, where masks are cheap, a line card can have dozens or hundreds of versions. This offers each customer small differences in, for example, the number and types of serial interfaces (SPI, I2C, UART, etc.). However, now that leading-edge microcontrollers are moving to 40nm where masks cost $1M each, microcontroller manufacturers need a programmable way to customize their chips and offer multiple SKUs. Adding this capability also opens the path for their customers to customize the MCUs themselves, similar to how they now write C code for the on-board processors. There are a few microcontrollers today, such as Cypress’ PSoC, which offer some limited customizability. However, only embedded FPGA can provide greater and scalable customizability.

What is embedded FPGA?

Embedded FPGA has the same DNA as an FPGA chip, but the two are optimized for very different applications.

FPGA chips have a lot of high-speed I/O, and now often on-chip processors, to provide an accelerator subsystem for use on a printed circuit board. Most FPGAs are used in low-volume applications; if volume ramps up, the designs are cost-reduced as ASICs.

Embedded FPGAs are like FPGA chips without the SERDES/PHYs/GPIO/PLL and without the packaging.  This reduces die area, increases the number of high-speed on-chip connections and enables the embedded FPGA to be integrated into a chip.

To get FPGA-chip-like density, the embedded FPGA needs to be a hard IP implemented in the chip’s target process with matching Vt and metal stack. Some vendors deliver a few embedded FPGA arrays with fixed size and features, while other vendors deliver building blocks that enable construction of embedded FPGAs of any size with optional DSP and RAM as needed.

Flex Logix F1

Embedded FPGAs can be optimized for low power or for high performance. Typically, advanced process nodes are used for applications emphasizing performance, while older process nodes are used for power-sensitive applications. As an example, in TSMC40ULP, Flex Logix’s EFLX-100 embedded FPGA core has been optimized to allow a wide range of threshold voltages, multiple power management states and 0.5V state retention for MCU and IoT applications. In this process, EFLX-100 embedded FPGAs from 100 to 3000 LUTs can be implemented. In TSMC16FF+/FFC, Flex Logix’s EFLX-100 and EFLX-2.5K are optimized for high performance for applications such as networking and base stations, with ~1GHz operation for single-stage control logic. In this case, embedded FPGAs from 100 to >100K LUTs can be implemented. Despite the node-specific optimizations, the EFLX digital architecture is the same across nodes and across array sizes.

Embedded FPGA enables new architectures not possible with FPGA chips:

  • Interfaces to embedded FPGA on chip can be 64-bit, 128-bit, 256-bit or 512-bit wide and clock at on-chip data rates. Embedding the FPGA on-chip therefore removes bandwidth bottlenecks that might exist between an SoC and a separate FPGA chip. Also, the on-chip interface has very low latency (single-digit nanoseconds), whereas FPGA chips today use SERDES for high-bandwidth connections, which have high latency (and high power).
  • FPGAs are often thought of as high power, largely because of the SERDES and PHYs involved in making an FPGA chip useful.  When stripped of this overhead, the FPGA fabric itself can be very power efficient, especially when optimized in a process such as TSMC40ULP.
  • Embedded FPGAs as small as 100 LUTs are available so they can be used in multiple locations in a single chip.
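
As a rough illustration of the first point above, the peak bandwidth of a parallel on-chip interface is just bus width times clock rate. The sketch below assumes a 500 MHz interface clock and one full-width transfer per cycle; both are illustrative assumptions rather than figures from any vendor datasheet, and the function name is hypothetical.

```python
def peak_bandwidth_gbytes_per_s(bus_width_bits: int, clock_hz: float) -> float:
    """Peak bandwidth in GB/s, assuming one full-width transfer per clock."""
    bits_per_second = bus_width_bits * clock_hz
    return bits_per_second / 8 / 1e9


# Assumed on-chip interface clock; illustrative only, not a vendor figure.
ASSUMED_CLOCK_HZ = 500e6

# Bus widths are the ones listed in the text.
bandwidth_by_width = {
    width: peak_bandwidth_gbytes_per_s(width, ASSUMED_CLOCK_HZ)
    for width in (64, 128, 256, 512)
}

for width, gbytes in bandwidth_by_width.items():
    print(f"{width:>3}-bit bus at 500 MHz: {gbytes:.0f} GB/s peak")
```

Under these assumptions a 512-bit interface peaks at 32 GB/s. Sustained throughput in a real design would depend on the bus protocol and on how often both sides can actually transfer data.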

Flex Logix F2

Example of fast control logic

Applications such as programmable networking/base stations/signal processing chips can make use of embedded FPGA for fast control logic. In this application, the emphasis is on keeping up with the clock rate of the control plane of the hard-wired logic (or perhaps a multiple of that clock rate). Typically, the control logic consists of a “logic cone” with 100-500 control input signals and, most likely, single-stage (one LUT between flops) pipelined logic, which then generates 50-100 output signals. In older networking/base station chips, this logic would be hard-wired. By making it programmable, the chip is able to adapt to changing networking/storage/security protocols. In TSMC28HPM/HPC, control logic like this can run in the ~500MHz range; in TSMC16FF+/FFC, in the ~1GHz range. Actual speed will vary with the array size, input/output layout and the RTL.
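
To make “one LUT between flops” concrete, here is a minimal behavioral sketch of a single programmable pipeline stage. It is not a model of the EFLX architecture: the 4-input LUT size and the class names `Lut` and `RegisteredStage` are assumptions chosen for illustration. The point is that the logic function is held in configuration bits, so loading new bits changes the function without changing the hardware.

```python
class Lut:
    """A k-input lookup table: 2**k configuration bits select the output."""

    def __init__(self, num_inputs: int, truth_table: int) -> None:
        if not 0 <= truth_table < (1 << (1 << num_inputs)):
            raise ValueError("truth table does not fit this LUT size")
        self.num_inputs = num_inputs
        self.truth_table = truth_table

    def evaluate(self, inputs) -> int:
        """Return the configured output bit for a sequence of 0/1 inputs."""
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        index = 0
        for position, value in enumerate(inputs):
            if value:
                index |= 1 << position
        return (self.truth_table >> index) & 1


class RegisteredStage:
    """One pipeline stage: inputs -> a single LUT -> an output flop."""

    def __init__(self, lut: Lut) -> None:
        self.lut = lut
        self.flop = 0

    def clock(self, inputs) -> int:
        """Model a rising clock edge: capture the LUT output in the flop."""
        self.flop = self.lut.evaluate(inputs)
        return self.flop


# Assumed 4-input LUT configurations (16 truth-table bits each).
AND4 = 0x8000  # only the all-ones input pattern (index 15) produces 1
OR4 = 0xFFFE   # every pattern except all-zeros (index 0) produces 1

stage = RegisteredStage(Lut(4, AND4))
and_result = stage.clock([1, 0, 1, 1])  # AND of the inputs

stage.lut = Lut(4, OR4)                 # "reprogram" the same stage
or_result = stage.clock([1, 0, 1, 1])   # OR of the same inputs

print(and_result, or_result)
```

A real control-logic cone would use many such stages side by side, with the vendor's software tool deriving the configuration bits from RTL.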

Example of I/O flexibility

Microcontroller customers require a wide range of serial interface protocols for different applications.  Today that means many dozens of mask variations.  As advanced microcontrollers move to 40nm where mask costs are $1M, a new solution is desirable to make variation economical for the manufacturer and the customer.  Today there are already some microcontrollers such as Cypress’ PSoC with limited programmability.  Embedded FPGA will enable much more scalable programmability.

Below is an example of an embedded FPGA that interfaces to the APB bus on one side (with the slave interface logic running in the array itself, although it could instead be hard-wired outside the array) and to the GPIO on the other side. Reconfigurable RTL logic can be implemented in the array to implement any kind of serial or parallel interface that the customer desires. The manufacturer can offer dozens of variations just by changing the RTL and giving each combination a different SKU. Of course, the customer could program the RTL as well, but this would require a higher level of support, documentation and debug tools.

Flex Logix F3

Example of co-processor

An embedded FPGA on a processor bus (AHB, AXI) can be any size desired and can optionally include DSP acceleration (pipelined multiplier-accumulators) and RAM.  Literally any kind of co-processor function can be implemented if the array is of sufficient size.

In a microcontroller, different customers want different cryptographic algorithms (AES, DES, SHA, PKA, etc.) and some want several at the same time. This drives a lot of variation in design and mask costs. With an embedded FPGA of sufficient size, the algorithm can be reconfigurable and can be changed or updated if needed.

The embedded FPGA can also be connected to some of the I/O to offload I/O processing from the host, perhaps at lower power/energy as well by not waking up the host until it is required for more complex tasks.

Physical design considerations

To ensure first time success, it is important to use an embedded FPGA that has been proven in silicon.

For example, Flex Logix’s EFLX IP has been proven in silicon for TSMC40ULP, TSMC28HPM/HPC and is in fab now for TSMC16FF+/FFC.  Validation chips prove out 2×2 or larger arrays with on-chip PLL and PVT monitors to allow precise testing at high speed and over temperature and voltage range.

Flex Logix F4

Your vendor should give you an array of the size you need, in the process you need and with the DSP/RAM options you need, if any.

Typically for your application, you have determined Vt (threshold voltage) mask options and a metal stack, and you need to make sure your embedded FPGA IP supplier is compatible with your Vt choices and metal stack. Compatibility is more likely if the embedded FPGA uses only four to six layers of metal for routing, because metal stack variation increases with more layers of metal.

The physical design deliverables should include at least GDS-II, LIB, LEF, a CDL/Spice netlist, a Verilog model and integration guidelines, plus a detailed datasheet covering inputs, outputs, configuration interface and clock interface, along with timing/power specifications. Below is an example of the pin descriptions for the EFLX-2.5K IP core in TSMC28HPM/HPC.

Flex Logix F5

Most vendors will have run Voltus or Apache to determine the IR drop for static and full-speed dynamic operation to ensure the embedded FPGA can fit in the IR stack.  This will demonstrate that the embedded FPGA has been designed with a robust power grid to handle worst case conditions.

A software tool should also be provided to determine critical path timing for any RTL at worst case conditions (typically low voltage, 125C Tj and SS corner).  The software tool also should generate the bit stream to load into the embedded FPGA to program it to execute the RTL.

Timing closure is easiest if all the inputs to the embedded FPGA are flopped and all the outputs from the embedded FPGA are flopped. This ensures that timing closure on the rest of the chip isn’t affected by changing RTL code inside the embedded FPGA. It is possible to close timing with timing paths that reach into the embedded FPGA, but that may then make it difficult or constraining to change RTL in the future.
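
The reasoning can be sketched with a toy timing check. All delay numbers below are hypothetical, and setup time, clock skew and routing delay are ignored; the sketch only shows why a registered boundary decouples the SoC's timing from whatever RTL is later loaded into the array.

```python
def meets_timing(path_delays_ns, clock_period_ns: float) -> bool:
    """True if every register-to-register path fits in one clock period."""
    return all(delay <= clock_period_ns for delay in path_delays_ns)


CLOCK_PERIOD_NS = 2.0   # assumed 500 MHz clock
SOC_LOGIC_NS = 1.2      # hypothetical SoC logic feeding the eFPGA
EFPGA_RTL_V1_NS = 0.6   # hypothetical critical path of the first RTL
EFPGA_RTL_V2_NS = 1.5   # hypothetical critical path after an RTL update

# Flopped boundary: the SoC path and the eFPGA path are separate
# register-to-register paths, each checked against the clock on its own.
registered_v1 = meets_timing([SOC_LOGIC_NS, EFPGA_RTL_V1_NS], CLOCK_PERIOD_NS)
registered_v2 = meets_timing([SOC_LOGIC_NS, EFPGA_RTL_V2_NS], CLOCK_PERIOD_NS)

# Unflopped boundary: one path runs through the SoC logic and into the
# array, so its delay is the sum and it changes whenever the RTL changes.
unregistered_v1 = meets_timing([SOC_LOGIC_NS + EFPGA_RTL_V1_NS], CLOCK_PERIOD_NS)
unregistered_v2 = meets_timing([SOC_LOGIC_NS + EFPGA_RTL_V2_NS], CLOCK_PERIOD_NS)

print(registered_v1, registered_v2, unregistered_v1, unregistered_v2)
```

With these made-up numbers, both RTL versions pass when the boundary is flopped, while the unflopped design passes with the first RTL and fails after the update. The cost of the flopped boundary is an extra cycle of latency in each direction.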

Most importantly, it is critical to ensure the vendor’s team, especially the manager, is experienced in high-volume, advanced chip design and has a proven track record of designing, delivering, integrating and taking semiconductor IP into volume production.


Embedded FPGA is a new option for chip designers that enables new, reconfigurable chip architectures that provide more customization, more flexibility, and more upgradeability.  This results in chips and systems that are more effective for longer.  Given the increasing cost and time to design chips, it is important they have long lifetimes to maximize ROI.
