Don’t Swallow the Camel and Other FPGA Wisdom



For FPGAs, higher gate counts and fabric speeds are not all that is new.

Jeff Milrod, BittWare president and CEO, does not want FPGA developers to get indigestion. To prevent this problem, he recommends the three-step process you’ll read about here, but before making his recommendations, Milrod also weighs in on a number of other issues.

EECatalog: As FPGAs are used more often as co-processors, accelerators, or offload engines, what are the design challenges of writing optimized code to take advantage of these capabilities?

Milrod, BittWare: The concept of FPGA optimization is rather complicated. Unlike software, it is not about the number of cycles, nor can it be thought of as a single-threaded problem. While maximum toggle rate frequency (often called Fmax) can be important for some applications, it alone doesn’t necessarily indicate FPGA performance, because it neglects the inherent parallelism of FPGAs. Bus widths can be doubled, quadrupled, and so on, and whole algorithmic streams can be run simultaneously, resulting in far greater performance increases than improving Fmax alone. Improving performance via parallelism often requires focusing on the FPGA’s resource allocation and optimization so that buses can be widened and streams can run in parallel without overflowing the part.
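
To make the comparison between Fmax and parallelism concrete, here is a minimal back-of-the-envelope sketch. It assumes an idealized model in which each stream moves one bus word per clock cycle; the function name and all clock, width, and stream figures are hypothetical illustrations, not numbers from the interview or from any particular device.

```python
def throughput_gbps(fmax_mhz: float, bus_width_bits: int, streams: int = 1) -> float:
    """Idealized throughput in Gb/s, assuming one bus word per clock per stream."""
    return fmax_mhz * 1e6 * bus_width_bits * streams / 1e9


# Hypothetical baseline: 200 MHz clock, 64-bit bus, a single stream.
baseline = throughput_gbps(200, 64)

# Option A: tune the design to gain 10 percent on the clock.
faster_clock = throughput_gbps(220, 64)

# Option B: keep the clock, quadruple the bus width, run two streams in parallel.
wider_parallel = throughput_gbps(200, 256, streams=2)

for label, value in [
    ("baseline", baseline),
    ("10% higher Fmax", faster_clock),
    ("4x bus width, 2 streams", wider_parallel),
]:
    print(f"{label:>24}: {value:7.2f} Gb/s ({value / baseline:.1f}x baseline)")
```

Under these assumptions the clock tuning buys about 1.1x, while widening and duplicating buys 8x, provided the wider design still fits in the part.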

However, it is often most important to focus on optimizing for human resources: time-to-market can often be more important than that last 10 percent of optimization. Similarly, designing modular and structured function blocks might not yield the optimal implementation, but can result in huge advantages down the road by allowing code reuse and reducing debug times. In fact, I now believe that for most applications the performance advantages of using FPGAs are compelling enough that optimization of the FPGA implementation is often not essential, but getting the system deployed is. To that end, we focus a great deal of our efforts on getting our customers up and running quickly using our FPGA Developers Kit (FDK), board support packages, example projects, and proactive support. For any given application, our function blocks and interfaces might not be completely optimal, but they work out of the box and enable much quicker development and deployment.

Developers often “strain at gnats and swallow camels,” which can cause huge and often unnecessary delays. FPGA development and optimization is better served by first getting the basic project functioning, [second] identifying bottlenecks and [third] targeting any required optimizations with a more focused approach.
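
The second step, identifying bottlenecks, can be sketched as follows once the basic project is running. This is only an illustration: it assumes a simple streaming pipeline whose overall throughput is limited by its slowest stage, and the stage names and measured rates are hypothetical.

```python
def find_bottleneck(stage_rates_gbps: dict[str, float]) -> tuple[str, float]:
    """Return the (name, rate) of the slowest stage, which caps a streaming pipeline."""
    return min(stage_rates_gbps.items(), key=lambda item: item[1])


# Hypothetical sustained rates measured on a working first version, in Gb/s.
measured = {
    "pcie_dma": 6.0,
    "ddr_buffer": 12.0,
    "filter_core": 3.5,
    "network_out": 10.0,
}

name, rate = find_bottleneck(measured)
print(f"Optimize '{name}' first: it limits the pipeline to {rate} Gb/s")
```

Only the stage reported here is worth tuning at this point; effort spent on the faster stages would not raise the end-to-end rate.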

EECatalog: Is there really any practical difference between volatile and nonvolatile FPGAs?

Milrod, BittWare: The only conceptual difference I’m aware of would be boot time. Nonvolatile FPGAs are instant-on, which can be important for some applications. In practice, volatile FPGAs are so much bigger and faster that they are highly advantageous for all other applications. On many of our boards, we use an instant-on nonvolatile FPGA to boot the board and the bigger volatile FPGA for all other FPGA-based tasks.

EECatalog: Design partitioning, long bring-up, debug difficulties, performance and reusability have been named as challenges to FPGA-based prototyping. Which of these do you think has the best chance of being addressed successfully in the next couple of years, and how do you define success?

Milrod, BittWare: For those who either need to use, or insist on using, HDL coding, all of those are quite challenging and will continue to be; that is simply the nature of any low-level coding. However, great advancements have been made with respect to the tools, and these advancements will continue, albeit incrementally. Maybe the biggest thing that has helped with these challenges, and will continue to help, is the fact that gate counts and fabric speeds have increased tremendously, thus requiring much less code optimization and tuning.

Previous efforts at abstracting away these challenges have been underwhelming at best and have generally only been helpful for developing code components rather than complete projects. However, there are some major paradigm shifts occurring now involving high-level coding abstractions that look like they could blow this problem away. For example, Altera’s SDK for OpenCL enables high-level coding practices to implement complete, complex FPGA algorithms. This does require a one-time, low-level “board support package” (BSP) to be built. But once that is done, the coding of the FPGA becomes essentially a software task with the associated ease of use and reusability.

EECatalog: What’s the focus of your buy versus build discussions?

Milrod, BittWare: We focus our buy vs. build discussions on low-level board support IP, software and drivers, technical support, life cycle management, design and manufacturing quality, and technology refresh (i.e., the fact that we will continue to design similar COTS boards with future generations of FPGAs that customers will be able to easily upgrade to). The “build” option isn’t simply a matter of designing and building a board (no easy task on its own). It also includes everything mentioned above: high- and low-level software, technical difficulties, possibly manufacturing issues, as well as having to deal with part sourcing, part EOL and next-generation designs. Unless a company has the capabilities to deal with all of these tasks, buying is often the best decision, not only from a cost standpoint but also in terms of time-to-market.

EECatalog: C-to-gates, OpenCL and other high-level languages seek to simplify the job of FPGA programming. Do they work?

Milrod, BittWare: Except for what I call the “last micron” problem, for the most part the “guts” of FPGAs have actually been fairly easy to program for quite a while. Both major FPGA vendors, and even third parties, have long had pretty good tools for high-level coding of algorithms. For example, it doesn’t get much easier to implement complex processing than clicking a button to compile a MATLAB/Simulink model to RTL/HDL, and that’s been widely available for many years now.

However, like the “last mile” problem of getting high-speed network backbones connected to every house, the key challenge in FPGA design is getting that algorithm developed in MATLAB to connect to off-chip peripherals such as off-chip memory, network interfaces, and PCIe ports. Since those interfaces don’t exist on an FPGA, they need to be developed. Even with those in place, it becomes an iterative problem, as the peripheral interfaces then need to be integrated with the algorithm and, since it’s actually programming hardware rather than coding software, the timing must be “closed.” I believe that this “last micron” problem (the peripheral implementation and integration) generally takes at least as long as the algorithm development, and often far longer.

Great strides have been made, and are still being made, to simplify these peripheral implementation and integration challenges with better tools such as Xilinx’s IP Integrator and Altera’s Qsys. Altera’s SDK for OpenCL takes a more extreme approach: it first requires the development of a locked-down, low-level hardware peripheral implementation, called a Board Support Package (BSP), that interfaces to the compiler. The development of the BSP requires special expertise and is quite hardware-centric, so we provide several standard versions to our customers along with optional customization services. Once the BSP is implemented, the OpenCL compiler then completely abstracts the FPGA, enabling it to be coded like a processor and allowing the “masses” to have “easy” access to the performance of FPGAs.
