Picturing ARM for GPGPU Applications: Q&A with Doug Patterson, Aitech Defense Systems

The factors that go into balancing performance and cost-effectiveness for GPGPU-based systems

Editor’s note: Just prior to Aitech Defense Systems announcing the availability of its fanless rugged GPGPU supercomputer, Doug Patterson, the company’s VP, Military & Aerospace Business Sector, responded to a few questions about GPGPU-based systems.

EECatalog: What are some of the arguments for and against the choice of ARM for the Tegra GPGPU?

Doug Patterson, Aitech: This choice was made by NVIDIA engineers, and it’s a smart one. For the past several years, we’ve seen a real boost in ARM architecture performance while power consumption has stayed low. Big players like Hewlett Packard (HP) are already selling ARM-based servers (see its “Moonshot” servers), a very strong sign that ARM has successfully penetrated not only the mobile market, where it is without doubt a leader, but also enterprise servers.

HP has become the first major vendor to add a 64-bit ARM server to its server product line price list, but it will not be the last. Dell has built 64-bit ARM-based servers for particular customers, but HP is the first big vendor that’s selling one as a standard product. ARM architecture has always been a good choice for embedded systems. The main selling points—low power consumption with a superb level of compute power in a small physical space—make ARM good for building SWaP-optimized embedded systems.

The Linux OS and application developer community behind the well-known and widely used Ubuntu distribution supports ARM very well. So, if your target is building high-performance embedded systems, where ARM’s key advantages play nicely into NVIDIA’s strengths, ARM is definitely a good choice. Aitech’s rugged A176 Cyclone GPGPU, using the NVIDIA Tegra TX1, brings 1 TFLOP, or >60 GFLOPS/Watt of performance, into ~20 cu-in.
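As a rough check on those figures, the quoted numbers can be turned into an implied power budget and a compute density. The short Python sketch below uses only the three values quoted above; the function and variable names are hypothetical and chosen for illustration, and the results are back-of-the-envelope estimates, not Aitech specifications.

```python
# Figures quoted in the interview, treated here as approximate.
PEAK_FLOPS = 1e12                 # "1 TFLOP"
EFFICIENCY_FLOPS_PER_WATT = 60e9  # ">60 GFLOPS/Watt" (a lower bound)
VOLUME_CUBIC_INCHES = 20.0        # "~20 cu-in"


def implied_power_watts(flops: float, flops_per_watt: float) -> float:
    """Power needed to deliver `flops` at the given efficiency."""
    return flops / flops_per_watt


def compute_density(flops: float, volume: float) -> float:
    """FLOPS delivered per unit of enclosure volume."""
    return flops / volume


# Because the efficiency is a lower bound, this power is an upper bound.
power_w = implied_power_watts(PEAK_FLOPS, EFFICIENCY_FLOPS_PER_WATT)
density_gflops = compute_density(PEAK_FLOPS, VOLUME_CUBIC_INCHES) / 1e9

print(f"Implied power budget: under {power_w:.1f} W")
print(f"Compute density: about {density_gflops:.0f} GFLOPS per cubic inch")
```

Under these assumptions the quoted efficiency implies a power draw below roughly 17 W and a density of about 50 GFLOPS per cubic inch, which is the kind of SWaP arithmetic the answer alludes to.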

Comparing ARM to x86 architecture, there are some caveats with the development and integration process. ARM architecture doesn’t support Windows OS, which limits the ARM target market to Linux-oriented designs. Development and integration therefore require special knowledge, since target hardware is not always available for native development during the first stage of a project.

This means that a cross compiler is needed for software development. Although these differences exist, they are easily overcome with training, so the learning curve is gentle and relatively short.

Figure 1: Patterson explained why he believes that other major vendors will follow HP in looking to ARM architecture.

EECatalog: What trends and developments in advanced sensor processing have you been keeping an eye on?

Patterson, Aitech: Sensors are becoming even more intelligent. With embedded processing platforms becoming smaller, lighter and lower power, and with orders of magnitude more performance, it’s easier to push much more of the sensor pre-processing out to the sensor. Tasks range from pre-filtering/noise rejection and video motion/edge detection to the most advanced automatic target recognition. This has long been the generally accepted definition of distributed processing, and now it’s here, with 1 TFLOP in small form factors making it a reality.

EECatalog: What are the persistent problems defense applications face when trying to first, select the correct GPGPU for their challenges, and second, deploy the GPU effectively?

Patterson, Aitech: In our experience, the main problems are heat/power dissipation and, in the application development process, the gap between expected and actual performance.

Performance always comes at the price of higher power consumption, unless you are moving to the more specialized architectures that provide more performance and less power consumption compared with previous models. Since performance is the primary “key” requirement for GPGPU systems, the customer usually compromises and picks the “as fast as possible” GPGPU, performs a software evaluation and then starts to search for a “rugged” solution. Among the factors to consider are:

  • What is the form factor needed?
  • How small must the unit be, and how high is the power density?
  • How is the unit to be cooled?

It’s up to the hardware/system supplier (Aitech, for example) to understand all the system architecture aspects and propose the proper, most cost-effective solution without jeopardizing GPGPU performance. Only a company designing and building not only the electronics, but also developing rugged enclosures and building rugged systems is equipped to deal with these (and related) challenges.

EECatalog: How nervous are companies trying to keep up with mil-aero advanced sensor processing demands that the solution they select will be out of date too quickly?

Patterson, Aitech: The demand for more and more processing power from GPGPU is constantly growing. We see it from our customers. Each new project starts with the question, “Can you do it faster?” Not only is actual performance important, but you also need to have enough “provisioned power” for the next deployment without changing the hardware.

Companies thinking about the longer term, like Aitech, provide a very modular solution. So even when the customer is utilizing and investing the time with a certain GPGPU, we can easily swap and move to the next generation GPGPU device, and in most cases, maintain backward software compatibility for the new one. The customer can again be on the “safe side” with the best performance he can get. We even have a program, COTSLifecycle+, dedicated to providing our customers with a minimum of 12 years of same-product continuity.

So, are they nervous about GPGPU technology? Probably not; “concerned” may be the better word. Once the evaluation is done, and they see that they have enough “GPGPU power,” they seem to be fine.

EECatalog: Closing thoughts?

Patterson, Aitech: The Tegra TX1 is a generation ahead in performance and requires less than half the power dissipation of products using older NVIDIA Maxwell technology.

Aitech is working closely with NVIDIA in the GPGPU applications field, and at NVIDIA’s request presented at GTC Europe Sept. 28-29 in Amsterdam. We’re constantly benchmarking and choosing the best available performance with the lowest power consumption in the smallest form factor. Today, Aitech is deploying GPGPU solutions not only at the board level (C874+C530), but also on a fully integrated system level (A191, A195, A196).

Since we are the “design house,” we do the electronics as well as the mechanics, system integration, qualification, and software development. This combination makes us an ideal supplier of GPGPU-based systems. Seeing that the ARM and NVIDIA architectures are getting a performance boost each year, we’re already actively designing two successor products and building a new product line based on NVIDIA Jetson TX1, which has a quad-core ARM CPU teamed with the NVIDIA CUDA GPU in the same device package.