Microcontroller Architects Look to Embedded FPGAs for Flexibility

Why embedded FPGAs are set to enhance and complement ARM processors

Today, microcontroller families typically have dozens of versions, each with a different combination of GPIO configurations and serial peripherals (SPIs, UARTs, I2Cs, etc.) to address the needs of different customers. Each version requires its own mask changes, and a new version takes several quarters to go through design and verification. Now that microcontrollers are moving to the 40nm node, where mask costs are roughly $1M, a new solution is required.

Embedded FPGAs provide this solution by making part or all of a microcontroller's GPIO subsystem programmable and reconfigurable. Any GPIO can then carry any serial protocol, and some processing can be done in the embedded FPGA itself, speeding response, offloading the embedded processor, and perhaps saving energy.

Flex Logix offers its EFLX embedded FPGA at the 40nm node, which will be integrated into the I/O subsystems of future microcontrollers to provide this flexibility and reconfigurability (Figure 1). An embedded FPGA can also connect to higher-performance buses such as AHB and AXI.

Figure 1: 40nm node EFLX embedded FPGA depicted as part of a microcontroller I/O subsystem.

What is an Embedded FPGA?

An FPGA combines an array of programmable/reconfigurable logic blocks with a programmable interconnect fabric. In an FPGA chip, the outer rim consists of a combination of GPIO, SERDES and specialized PHYs such as DDR3/4. In advanced FPGAs, the I/O ring occupies roughly 1/4 of the chip and the "fabric" roughly 3/4. The fabric itself is mostly interconnect in today's FPGA chips: 20-25% of the fabric area is programmable logic and 75-80% is programmable interconnect.
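Combining the two rough ratios above gives a back-of-envelope estimate of how much of an advanced FPGA die is actually programmable logic. This is simple arithmetic on the article's approximate figures, not a measured value:

```latex
\underbrace{\tfrac{3}{4}}_{\text{fabric share of die}} \times \underbrace{(0.20 \text{ to } 0.25)}_{\text{logic share of fabric}} \approx 0.15 \text{ to } 0.19 \ \text{of the die}
```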

Figure 2: Standard digital signaling connects an embedded FPGA to the rest of the chip.

An embedded FPGA is an FPGA fabric without the surrounding ring of GPIO, SERDES, and PHYs (Figure 2). Instead, an embedded FPGA connects to the rest of the chip using standard digital signaling, enabling very wide, very fast on-chip interconnects.

To achieve high density, Flex Logix provides its embedded FPGA as hard IP proven in silicon. For microcontrollers moving to 40nm, Flex Logix provides the EFLX-100 IP core in TSMC 40ULP/LP. By itself, an EFLX-100 is a stand-alone embedded FPGA with 120 LUTs, reconfigurable interconnect, 152 inputs and 152 outputs. Each EFLX LUT is actually a dual 4-input LUT: each half has its own output flip-flop, and the two halves can be combined to form a single 5-input LUT.
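The article does not say how the two 4-input halves are combined. The behavioral sketch below assumes the common approach: both halves share four inputs, and the fifth input acts as a 2:1 mux select between their outputs (Shannon expansion). The function names and the truth-table bit ordering are illustrative choices, not the EFLX hardware description.

```python
def lut4(truth_table: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate a 4-input LUT. Bit i of the 16-bit truth table is the output
    for index i = d<<3 | c<<2 | b<<1 | a (an arbitrary ordering)."""
    index = (d << 3) | (c << 2) | (b << 1) | a
    return (truth_table >> index) & 1


def split_lut5(tt5: int) -> tuple:
    """Split a 32-entry 5-input truth table into two 16-entry halves:
    the low half covers e=0, the high half covers e=1."""
    return tt5 & 0xFFFF, (tt5 >> 16) & 0xFFFF


def lut5_from_dual_lut4(tt_low: int, tt_high: int,
                        a: int, b: int, c: int, d: int, e: int) -> int:
    """Assumed combination: both halves see a..d, and the fifth input e
    selects between them, f = e ? f_high(a..d) : f_low(a..d)."""
    return lut4(tt_high, a, b, c, d) if e else lut4(tt_low, a, b, c, d)


# Example: 5-input parity (XOR of all five inputs) mapped onto one dual LUT.
parity_tt = sum((bin(i).count("1") & 1) << i for i in range(32))
low, high = split_lut5(parity_tt)
```

Any 5-input function can be split this way, which is why a pair of 4-input LUTs plus a mux is a natural building block for a 5-input LUT.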

For larger arrays, the EFLX-100 can be "tiled" to build arrays of up to 5×5 tiles, or 3,000 LUTs of reconfigurable logic with 760 inputs and 760 outputs. The EFLX-100 comes in two versions that can be mixed in the same array: one is all logic, and the other has two 22-bit MACs for DSP functions.

EFLX arrays can be integrated into an SoC or MCU in three common ways as depicted in Figure 3.


Figure 3: Three ways to integrate EFLX arrays into an SoC or MCU.

The embedded FPGA is programmed using Verilog or VHDL, which is input to a synthesis tool such as Synopsys Synplify. The EFLX Compiler then packs, places and routes the design onto the array and generates a bit stream that programs the EFLX array to emulate the RTL. The bit stream for a single EFLX-100 is ~50K bits and can be stored in the same flash memory that holds the embedded processor's code. Like the processor's code, the EFLX array can be reprogrammed at any time.
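To put the ~50K-bit figure in flash terms, the sketch below estimates the bitstream footprint for an array of tiles. It treats "~50K" as 50,000 bits and assumes (not stated in the article) that bitstream size scales linearly with tile count and is stored uncompressed.

```python
import math

BITS_PER_EFLX100 = 50_000  # "~50K bits" per tile, from the text (approximate)


def bitstream_flash_bytes(tiles: int) -> int:
    """Rough flash footprint of an EFLX bitstream, in bytes.
    Assumes linear scaling with tile count and no compression."""
    if tiles < 1:
        raise ValueError("need at least one tile")
    return math.ceil(tiles * BITS_PER_EFLX100 / 8)
```

Under these assumptions a single tile needs about 6.25KB of flash and a 2×2 array about 25KB, small next to typical microcontroller code storage.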

Connection to the APB Bus

An APB bus requires 102 connections: 32 address, 32 write data (data in), 32 read data (data out), the clock, and 5 controls.

An EFLX-100 has 152 inputs and 152 outputs, so even a small embedded FPGA can connect directly to the APB bus. After connecting to the APB bus, a single EFLX-100 still has 83 inputs and 119 outputs available.
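The I/O budget above can be checked with a short tally. The address, data and clock widths come from the text; the text only says "5 controls", so the split used here (four control signals into the slave, one out) and the AMBA APB signal names are assumptions, chosen because they reproduce the 83/119 figures.

```python
EFLX100_INPUTS = 152
EFLX100_OUTPUTS = 152

# Widths of address, data and clock are from the text. The control-signal
# names and their in/out split are assumptions (AMBA APB naming).
APB_INTO_SLAVE = {"PADDR": 32, "PWDATA": 32, "PCLK": 1,
                  "PSEL": 1, "PENABLE": 1, "PWRITE": 1, "PRESETn": 1}
APB_OUT_OF_SLAVE = {"PRDATA": 32, "PREADY": 1}


def apb_connections() -> int:
    """Total signals crossing the APB slave boundary."""
    return sum(APB_INTO_SLAVE.values()) + sum(APB_OUT_OF_SLAVE.values())


def free_io_after_apb(inputs: int = EFLX100_INPUTS,
                      outputs: int = EFLX100_OUTPUTS) -> tuple:
    """EFLX inputs/outputs left after wiring an APB slave port.
    Signals into the slave use EFLX inputs; signals out of it use EFLX outputs."""
    return (inputs - sum(APB_INTO_SLAVE.values()),
            outputs - sum(APB_OUT_OF_SLAVE.values()))
```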

The EFLX can interface directly to the APB bus without any additional logic, except for the PREADY signal, which must be isolated when the EFLX is power gated. An EFLX-100 has 120 LUTs, and only seven of them are needed to implement an APB slave interface. Alternatively, the slave-interface logic can be hardwired outside the embedded FPGA, which frees a little more room for reconfigurable logic.

The EFLX array supports only unidirectional I/Os (i.e., separate inputs and outputs), and all of its I/Os are general-purpose CMOS I/Os. When the EFLX interfaces to the I/O pins of the MCU/SoC, an external buffer is required between the EFLX array and the I/O pad (Figure 4). The buffer can be unidirectional or bidirectional; if it is bidirectional, the EFLX array must control the direction of that buffer.
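Because the array has only unidirectional I/Os, a bidirectional pin appears to it as three separate signals: a data output, a direction (output-enable) output, and a data input. The sketch below models one such pad buffer behaviorally. The signal names `out`, `oe` and `external`, the always-enabled input receiver, and the use of `None` for an undriven pin are assumptions for illustration; bus contention (both sides driving) is not modeled.

```python
from typing import Optional, Tuple


def pad_buffer(out: int, oe: int,
               external: Optional[int]) -> Tuple[Optional[int], Optional[int]]:
    """Model one bidirectional pad buffer between the EFLX array and a pin.

    out, oe  : unidirectional outputs from the array (data and direction control)
    external : level driven onto the pin by an off-chip device, or None if undriven
    Returns (pin_level, value_seen_by_array); None models a floating pin.
    """
    if oe:
        pin = out        # output mode: the buffer drives the array's value onto the pin
    else:
        pin = external   # input mode: buffer tri-stated, pin follows the external driver
    # Assumption: the input receiver is always on, so the array sees the pin level.
    return pin, pin
```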


Figure 4: An external buffer is required between the EFLX array and the I/O pad when the EFLX interfaces to the I/O pins of the MCU/SoC.

GPIO and Serial Peripherals

The EFLX array can also perform I/O functions in an MCU/SoC. There are numerous serial peripheral standards, and every customer wants a different combination; the most common are UART, SPI and I2C. Microcontroller companies typically design and stock slightly different versions with a couple of each or, in some cases, up to five or six of each. However, no customer needs all of them, and some customers want other serial I/O peripherals. With an embedded FPGA, it is possible to implement exactly the serial I/O peripheral needed in the EFLX fabric.


Figure 5: It’s possible to deploy exactly the serial I/O peripheral required.

For example, an EFLX array can be configured either as an APB 32-bit GPIO port or as an APB simple UART (Figure 5). Configured as a 32-bit GPIO port, the function uses only 93 LUTs (seven for the APB interface and 86 for the GPIO registers) and runs at 50MHz on an eHVT/SVT EFLX array (0.85V, 125°C, slow/slow process corner). Reconfigured as a simple UART (no flow control or FIFOs), the function uses 75 LUTs and runs at 36MHz (UART data rate) on the same eHVT/SVT EFLX array under the same conditions. In both examples, the EFLX array uses the same I/O buffers, repurposed for whichever function is configured in the array.
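The article gives LUT counts for the APB 32-bit GPIO port but not its register layout. The behavioral sketch below uses a hypothetical register map (DATA_OUT at 0x0, DIR at 0x4, read-only DATA_IN at 0x8) to show what such a port does. It follows APB's two-phase transfer (setup with PSEL high and PENABLE low, then access with both high) and treats the slave as zero-wait-state, with PREADY tied high. It models behavior only; the real design would be RTL compiled into the array.

```python
MASK32 = 0xFFFF_FFFF

# Hypothetical register map, for illustration only (not from the article).
GPIO_DATA_OUT = 0x0  # levels driven on pins configured as outputs
GPIO_DIR = 0x4       # per-pin direction: 1 = output, 0 = input
GPIO_DATA_IN = 0x8   # read-only: levels sampled from the pads


class ApbGpio32:
    """Behavioral sketch of a 32-bit GPIO port behind a zero-wait-state APB slave."""

    def __init__(self) -> None:
        self.data_out = 0
        self.dir = 0
        self.pins_in = 0  # levels currently present on the pads

    def clock(self, psel: int, penable: int, pwrite: int,
              paddr: int, pwdata: int = 0) -> tuple:
        """One PCLK edge; returns (pready, prdata). A transfer completes
        in the access phase (PSEL=1, PENABLE=1)."""
        pready, prdata = 1, 0  # PREADY tied high: no wait states
        if psel and penable:
            if pwrite:
                if paddr == GPIO_DATA_OUT:
                    self.data_out = pwdata & MASK32
                elif paddr == GPIO_DIR:
                    self.dir = pwdata & MASK32
                # Writes to DATA_IN or unmapped addresses are ignored.
            else:
                prdata = {GPIO_DATA_OUT: self.data_out,
                          GPIO_DIR: self.dir,
                          GPIO_DATA_IN: self.pins_in}.get(paddr, 0)
        return pready, prdata

    def write(self, addr: int, data: int) -> None:
        self.clock(1, 0, 1, addr, data)  # setup phase
        self.clock(1, 1, 1, addr, data)  # access phase

    def read(self, addr: int) -> int:
        self.clock(1, 0, 0, addr)            # setup phase
        return self.clock(1, 1, 0, addr)[1]  # access phase returns PRDATA

    def pin_levels(self) -> int:
        """Output pins show DATA_OUT; input pins show the external level."""
        return (self.data_out & self.dir) | (self.pins_in & ~self.dir & MASK32)
```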

If the goal is to provide one 32-bit GPIO port plus up to four UARTs, then 93 + 4 × 75 = 393 LUTs are required. A 2×2 EFLX-100 array has 480 LUTs and can fit this set of I/O functions. A 2×2 array can also fit a full-function UART such as a 16550, which requires 365 LUTs even without optimizing the design for a reconfigurable fabric.
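The sizing arithmetic above can be captured in a few lines. The per-function LUT counts and the 120 LUTs per tile are the article's figures; the check assumes LUT counts simply add and ignores any utilization headroom a real place-and-route might need.

```python
LUTS_PER_EFLX100 = 120
GPIO32_LUTS = 93        # 7 (APB interface) + 86 (GPIO registers), from the text
SIMPLE_UART_LUTS = 75   # no flow control or FIFOs, from the text
UART16550_LUTS = 365    # unoptimized 16550, from the text


def array_luts(rows: int, cols: int) -> int:
    """Logic capacity of a rows x cols array of EFLX-100 tiles."""
    return rows * cols * LUTS_PER_EFLX100


def fits(luts_needed: int, rows: int, cols: int) -> bool:
    """True if the LUT count fits, assuming 100% of LUTs are usable."""
    return luts_needed <= array_luts(rows, cols)


io_mix = GPIO32_LUTS + 4 * SIMPLE_UART_LUTS  # one GPIO port plus four UARTs
```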

A wide range of other serial peripherals can be implemented in EFLX in the same way. Many suppliers of serial-peripheral RTL already provide IP for FPGA chips, and that IP can be used in an embedded FPGA as well, since EFLX is programmed using Verilog or VHDL.

Another option, which reduces silicon area, is to implement the minimum set of desired serial peripherals in hard-wired RTL, route their signals through the EFLX to any of the GPIOs, and then add further serial peripherals in the EFLX array. Figure 6 shows the EFLX array functioning as an I/O switch while also adding I/O functionality.
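The I/O-switch role can be pictured as a reconfigurable routing table from peripheral signals to pins. The sketch below models only the output direction. All signal and pin names (`uart0_tx`, `PA0`, and so on) are hypothetical, and the table stands in for what the bitstream would configure; it is not an EFLX API.

```python
from typing import Dict

# Hypothetical signal names, for illustration only.
HARDWIRED = {"uart0_tx", "spi0_sck", "spi0_mosi"}  # hard-wired RTL peripherals
IN_EFLX = {"uart1_tx", "uart2_tx"}                 # extra peripherals built in the fabric


def check_pin_map(pin_map: Dict[str, str]) -> None:
    """Reject routes to signals no hardwired or EFLX peripheral provides."""
    known = HARDWIRED | IN_EFLX
    for pin, signal in pin_map.items():
        if signal not in known:
            raise ValueError(f"{pin}: unknown signal {signal!r}")


def route_outputs(pin_map: Dict[str, str], levels: Dict[str, int]) -> Dict[str, int]:
    """Act as the I/O switch: copy each selected signal's level to its pin."""
    check_pin_map(pin_map)
    return {pin: levels[signal] for pin, signal in pin_map.items()}


# Two configurations of the same silicon; reprogramming changes the map.
config_a = {"PA0": "uart0_tx", "PA1": "spi0_sck"}
config_b = {"PB3": "uart0_tx", "PA0": "uart1_tx"}
```

Moving `uart0_tx` from `PA0` to `PB3` requires only a new configuration, not a new mask.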


Figure 6: An I/O switch with additional I/O functionality.

Processing to Save Energy and Improve Responsiveness

RTL to process inputs and outputs can be added to the EFLX embedded FPGA. For example, a 16-bit 5-tap FIR filter can be implemented using the MACs in the DSP version of the EFLX-100. This RTL will fit in 3 EFLX-100 tiles using 5 MACs with no additional logic LUTs (Figure 7).
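For reference, the sketch below shows what a 5-tap FIR computes, y[n] = Σ h[k]·x[n−k], with one multiply-accumulate per tap. It assumes (consistent with the article's "5 MACs" but not spelled out there) a fully parallel direct-form filter with one MAC per tap, so with two MACs per DSP tile it needs ⌈5/2⌉ = 3 tiles. The accumulator is unbounded here, whereas hardware would be limited by the MAC's accumulator width.

```python
import math

MACS_PER_DSP_TILE = 2  # from the text: DSP EFLX-100 has two MACs


def dsp_tiles_for(taps: int) -> int:
    """Tiles needed if every tap gets its own MAC (fully parallel)."""
    return math.ceil(taps / MACS_PER_DSP_TILE)


def fir_direct_form(samples, coeffs):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n-k], one MAC per tap.
    Samples and coefficients must be signed 16-bit integers."""
    samples = list(samples)
    for v in samples + list(coeffs):
        if not -32768 <= v <= 32767:
            raise ValueError(f"{v} is outside the signed 16-bit range")
    delay = [0] * len(coeffs)  # x[n], x[n-1], ..., x[n-taps+1]
    out = []
    for x in samples:
        delay = [x] + delay[:-1]  # shift the new sample into the delay line
        acc = 0
        for h, d in zip(coeffs, delay):
            acc += h * d          # one multiply-accumulate per tap
        out.append(acc)
    return out
```

Feeding an impulse through the filter returns the coefficients themselves, a quick sanity check for any FIR implementation.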

Figure 7: Implementing a 16-bit 5-tap FIR filter.

This RTL executes a 16-bit 5-tap FIR filter faster than an ARM processor and, more importantly, using less energy. In TSMC 40ULP eHVT, this function requires only 10.75nJ, one-fifth the energy of an ARM processor core, and that comparison does not even count the ARM processor's memory-access energy. As a result, an EFLX array can extend battery life by handling simpler, repetitive I/O processing and waking the ARM processor only when more complex tasks require it.
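To see what the per-operation figures mean for a battery, multiply energy per filter evaluation by the evaluation rate. The 10.75nJ and 5x figures are from the text; the 1 kHz sample rate is a hypothetical value chosen only for illustration.

```python
EFLX_FIR_ENERGY_NJ = 10.75  # per 16-bit 5-tap FIR evaluation, TSMC 40ULP eHVT (from the text)
ARM_CORE_MULTIPLE = 5       # ARM core uses ~5x, excluding memory access (from the text)


def average_power_uw(energy_per_op_nj: float, ops_per_second: float) -> float:
    """nJ/op x ops/s = nW; divide by 1000 to get microwatts."""
    return energy_per_op_nj * ops_per_second / 1000


SAMPLE_RATE_HZ = 1_000  # hypothetical sensor sample rate, not from the article
eflx_uw = average_power_uw(EFLX_FIR_ENERGY_NJ, SAMPLE_RATE_HZ)
arm_core_uw_lower_bound = average_power_uw(
    EFLX_FIR_ENERGY_NJ * ARM_CORE_MULTIPLE, SAMPLE_RATE_HZ)
```

At the assumed rate this works out to about 10.75µW for the EFLX filter versus at least 53.75µW for the ARM core alone. The ARM figure is a lower bound because memory-access energy is excluded, and neither figure includes leakage or the cost of waking the processor.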

Many other kinds of I/O processing are possible, limited only by the size of the EFLX array chosen. For example, an RTL state machine in the EFLX array can identify and respond to critical input patterns much more quickly than the more distant ARM processor.
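As an illustration of such a state machine, the sketch below builds a pattern detector for a serial bit stream. The state is the number of pattern bits matched so far, and the next-state table handles overlapping matches, the same structure an RTL version would encode in LUTs. In the EFLX array this would be written in Verilog or VHDL; the Python here only models the behavior, and the pattern "1011" is an arbitrary example.

```python
def next_state(pattern: str, state: int, bit: str) -> int:
    """Longest prefix of `pattern` that is a suffix of the bits seen so far."""
    seq = pattern[:state] + bit
    for k in range(min(len(seq), len(pattern)), 0, -1):
        if seq.endswith(pattern[:k]):
            return k
    return 0


def build_table(pattern: str) -> list:
    """Next-state table for states 0..len(pattern) and input bits '0'/'1'."""
    return [{b: next_state(pattern, s, b) for b in "01"}
            for s in range(len(pattern) + 1)]


def detect(pattern: str, bits: str) -> list:
    """Return the indices at which a full match of `pattern` completes."""
    table = build_table(pattern)
    state, hits = 0, []
    for i, b in enumerate(bits):
        state = table[state][b]  # one state transition per input bit
        if state == len(pattern):
            hits.append(i)       # in hardware: assert a "match" output here
    return hits
```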

Embedded FPGA provides a new way for microcontroller architects to deliver a wide range of serial and other I/O functionality using a single mask set. It even gives end customers the option of programming the array to offload processing tasks from the ARM processor, saving energy and improving responsiveness. Embedded FPGAs won't displace ARM processors; instead, they will enhance and complement them.

Tony Kozaczuk is Director, Solutions Architecture, Flex Logix Technologies, Inc.