Accelerated Synthesis Runtimes Increase Productivity



As FPGAs grow ever bigger and more complex, hard-working synthesis tools are stepping up to help designers find optimum solutions for balancing runtime and quality of results.

The size and complexity of FPGAs are growing, but schedules are not. Designers are expected to deliver products within tight deadlines and budgets, making it critical to find ways to improve productivity. One area that continues to drive innovation is FPGA synthesis runtime, which ranges from several hours to an entire day for the largest and most complex designs. Designers therefore need tools that deliver the best runtime performance without compromising area, timing and quality of results (QoR).

A robust synthesis tool can offer designers several technologies and methodologies to strike the best balance of runtime and QoR. Getting to first hardware quickly is critical, because it enables early system software development and system validation. Once the first hardware is delivered, it typically takes several more iterations to stabilize it as bugs are found. FPGA designers face several considerations when trying to achieve very fast runtimes and accelerate design completion:

  • Achieving the first successful synthesis quickly and efficiently
  • Scaling synthesis turnaround time when incorporating design changes and new design modules
  • Avoiding resynthesis of pre-verified, static modules such as IPs
  • Running synthesis for modules under development without having to spend time resynthesizing completed modules

Synthesis tools that offer a range of runtime options are therefore essential for FPGA designers. These options include manual and automatic compile points, which can be enhanced further with multiprocessing and, most recently, distributed synthesis.

With “compile point” technology, designers can automatically or manually create RTL partitions (“compile points”) in their FPGA designs, so that large designs can be synthesized in parallel and incrementally. By combining compile points with multiprocessing, FPGA designers achieve significantly faster runtimes with no degradation in QoR. For even higher runtime performance, the combination of multiprocessing and automatic compile points (ACP), used in conjunction with fast synthesis mode, can yield up to a 10x improvement.
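
These runtime options are typically driven from a Synplify project (Tcl) script. Purely as a hedged sketch, with placeholder file names, a placeholder device and command spellings that should be checked against the tool documentation, a minimal project script might look like the following; the option snippets shown later build on this kind of project setup.

    # Minimal Synplify project script (illustrative sketch; file names and
    # the target device are placeholders)
    add_file -verilog "rtl/top.v"
    add_file -verilog "rtl/core.v"
    add_file -fdc "constraints/top.fdc"   ;# FPGA design/timing constraints

    # Target device and top-level module (placeholder values)
    set_option -technology VIRTEX7
    set_option -part XC7V2000T
    set_option -top_module top

    # Output netlist location, then run synthesis
    project -result_file "rev_1/top.edf"
    project -run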

The latest version of Synplify adds distributed synthesis technology, which can boost runtimes and productivity well beyond what is possible with compile points alone.

Distributed synthesis takes the compile point concept to the next level, employing distributed processing across multiple cores and multiple machines throughout the entire synthesis flow. This gives designers the best runtime acceleration and the best utilization of compute resources, as shown in Figure 1. Combined, these technologies provide a fully scalable flow while automating the overall process; slashing FPGA runtimes translates directly into higher productivity.

Figure 1: The Synplify Premier synthesis tool from Synopsys is an example of a tool that supports multiprocessing across multiple machines, making it possible to accelerate runtimes.


The Standard Synthesis Flow: What’s under the Hood?

Leading the way to accelerating runtimes is a solution that integrates two technologies: compile points and distributed synthesis. To understand the benefits of integration, let’s first examine the overall synthesis process, which includes several steps, shown in Figure 2.

  1. Compile: Compile RTL, taking into account RTL-level guidelines, known as compiler design constraints (CDC). Compiler constraints include high-level directives to the synthesis engine such as “during synthesis, preserve this node in the network for probing.”
  2. Pre-map: Pass compiled netlist to a pre-map stage where FPGA design and timing constraints (FDC) are applied. Design constraints include attributes placed on nodes to preserve them, I/O standard definitions and RTL partition definitions. Timing constraints include clock definitions, timing exceptions and timing goals.
  3. Map: Pass results to the map stage to optimize and implement the design using the primitives of the target FPGA device technology.
Figure 2: Standard synthesis flow example

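To give a feel for what the pre-map stage consumes, the following is a small, hedged sketch of FDC content; the port and instance names are placeholders, and the exact command set supported varies by tool version.

    # Sketch of FDC constraints (placeholder object names)

    # Timing constraints: clock definitions and a timing exception
    create_clock -name {clk}     -period 10.0 [get_ports {clk}]
    create_clock -name {clk_aux} -period 8.0  [get_ports {clk_aux}]
    set_false_path -from [get_clocks {clk}] -to [get_clocks {clk_aux}]

    # Design constraint: keep a register through optimization so it can
    # be observed later (attribute name per Synplify conventions)
    define_attribute {n:u_debug.status_reg} syn_preserve {1}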

Synplify Premier from Synopsys allows FPGA designers to apply compile points to support parallel and incremental synthesis of large designs. User-defined compile points can be used on any FPGA design, whether small or very large. Because a compile point is an independent synthesis unit, the user determines how many compile points there are and how large each one is. This gives designers a runtime advantage through incremental synthesis of compile points.

Using Compile Points for Faster Turnaround Time and Design Preservation

Using Synplify, FPGA designers can define RTL partitions, or “compile points,” which break the design down into smaller modules. These modules can then be synthesized independently on an as-needed basis, so a small change to RTL code or constraints requires resynthesizing only the corresponding compile point. Compile points are defined manually in the constraints file, or Synplify can create them automatically as automatic compile points. The tool treats each compile point as a separate block, which can be synthesized and optimized independently.
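
As a hedged sketch of the manual approach, with placeholder module and library names and a -type value that may differ by tool version, compile points are declared in the constraints file along these lines:

    # Manual compile point definitions in the constraints file
    # (module paths are placeholders; -type values vary by tool version)

    # Treat the video pipeline as an independent synthesis unit with a
    # locked boundary so its interface is preserved between runs
    define_compile_point {v:work.video_pipeline} -type {locked}

    # A module instantiated inside video_pipeline becomes a nested,
    # child compile point (see Figure 3)
    define_compile_point {v:work.scaler_core} -type {locked}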

The tool creates time budgets for each automatic compile point, which is important for achieving timing goals on paths that cross compile point boundaries. Compile points can also be nested, with a parent compile point acting as a container for child compile points, as shown in Figure 3.

Figure 3: RTL partitions (compile points) nested hierarchically, with several child compile points contained within a given parent compile point.


Synplify synthesis tools do not resynthesize compile points whose logic and constraints are unchanged, letting designers take advantage of incremental synthesis. This approach accelerates turnaround time for incremental design changes. Additionally, using multiprocessing together with compile points to synthesize in parallel on multiple processors further accelerates runtime without sacrificing QoR. An option to have the tool perform this parallel synthesis automatically, using automatic compile points with no special setup required, generally halves the runtime. Synplify decides where to place compile points based on the size of the design, the sizes of hierarchical modules, boundary logic, the number of ports driven by constants and other parameters.
Automatic compile points are recommended when fast runtime is the priority; in general, some QoR can be lost with ACP compared with un-partitioned synthesis. Automatic compile points are also useful when many localized updates to the design or design constraints are expected.
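
In the project script, automatic compile points are typically switched on with a single option; the spelling below is an assumption and should be confirmed against the current Synplify documentation:

    # Enable automatic compile points (option name is an assumption;
    # confirm the exact spelling in the tool documentation)
    set_option -automatic_compile_point 1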

Manual compile points allow designers to specify RTL partitions and define boundary timing constraints individually. With manual compile points, a designer can achieve high QoR by forcing a critical path to remain within a partition rather than crossing a partition boundary. Manual compile points also allow designers to mark certain modules as completed while continuing to develop others.

Synplify treats compile points as individual blocks for incremental synthesis and resynthesizes only those compile points whose design or constraints have changed. It then resynthesizes the top level, automatically detecting design changes to determine whether resynthesis is necessary.

Coupling multiprocessing with compile point technology, by specifying the number of parallel processor cores that run synthesis jobs, further speeds up runtimes. Synplify supports synthesis on up to four parallel cores with a single license, which accelerates synthesis runtime by 2x, and scales to a much higher number of processor cores with additional licenses.
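
Multiprocessing is likewise a project-level setting; the option name below is an assumption used only to illustrate the idea of capping the number of parallel synthesis jobs:

    # Allow up to four synthesis jobs to run in parallel on the local
    # machine (option name is an assumption; a single license covers
    # up to four cores, per the text above)
    set_option -max_parallel_jobs 4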

Figure 4: Distributed synthesis accelerates runtime using parallel synthesis across multiple machines and processors.


Distributed Synthesis

Integrated distributed synthesis technology makes it possible to apply distributed processing to every step of the synthesis process, an approach that pairs advanced capabilities with the ability to run incremental flows. Designers using Synplify control how RTL partitions are defined. As FPGA designs increase in size and complexity, it is crucial to have a flow that scales with them.

Such a scalable flow can give FPGA designers significant runtime acceleration, up to 2x beyond what is possible using compile points alone. Distributed synthesis can also be enabled across multiple machines, such as a server farm, to support very large designs.

Synplify can run distributed synthesis across a server farm using the common distributed processing library (CDPL) mechanism, shown in Figure 4. This extends design synthesis beyond a single computer and takes advantage of multiple machines as well as multiprocessing, so parallel processing is no longer limited by the number of processor cores on one machine but rather by the total compute resources available to the designer.

Individual portions of the design, known as “groups,” are created at each stage of the synthesis process, allowing distributed processing to occur on these individual groups. The groups are created automatically and are analogous to compile points in the compile point flow. Compared with the standard synthesis flow, distributed synthesis separates the pre-map stage into two stages, on-demand constraint application (ODC) and global preparation, as shown in Figure 5.

Figure 5: Distributed synthesis flow with groups


  • Compile Stage: Synplify creates the RTL partitions (“groups”) for the design. This step is performed automatically and can be distributed across one or more processor cores.
  • On Demand Constraint Application Stage: In the standard synthesis flow, timing and design constraint application is part of the pre-map stage; in the distributed flow this is a stand-alone step in the ODC stage. When constraints are applied to a design object, only the group containing that object is loaded into machine memory on demand and not the entire design. This reduces the memory footprint, since only the target group is in memory. It also decreases runtime by distributing the constraint application across processor cores.
  • Global Prep Stage: After constraints are applied, the global prep stage performs various preparation steps for map in parallel for each group, including constant propagation, tri-state resolution and gated clock conversion. Running these steps in parallel gives designers feedback sooner and shortens iteration times.
  • Map Stage: As in the compile point flow, each group undergoes technology mapping individually. Unlike the compile point flow, however, synthesis netlist generation and constraint forward-annotation also occur in a distributed manner for improved efficiency.

Distributing the processing makes the memory ceiling of a single machine less of a limiting factor in completing synthesis. The overall memory footprint is determined by the maximum size of a group (partition), not by the full design. Even when running larger designs without switching to a machine with more memory, synthesis will still finish because groups can be sized accordingly.

Runtime performance improves further with distributed synthesis compared with the compile point flow, thanks to the scalable distributed architecture and a grouping methodology that produces an optimal number and size of partitions. This enables even the largest designs to complete synthesis quickly when distributed across an appropriate number of processors and machines.

Maximizing Time in Market for FPGA Applications

Advances in power efficiency, performance and cost are changing the FPGA landscape. As designs move to larger, more complex and more powerful FPGAs, designers need tools and methodologies that deliver automation, faster turnaround times and the fastest possible runtimes. Solutions such as Synplify Premier from Synopsys can accelerate runtime by combining distributed synthesis with multiprocessing. Using a feature-rich implementation tool helps designers focus on their own product differentiation while accelerating time to market and meeting cost targets.
_______________________________________________________________________________________________________

Joe Mallett is senior manager, product marketing for FPGA-based synthesis software tools at Synopsys. He has 20 years of experience in design and implementation in the semiconductor and EDA industries. Before joining Synopsys he was a senior product marketing manager at Xilinx, where he worked to define and launch FPGA products. His background includes SoC design/prototyping, embedded software, HDL synthesis, IP, and product/segment marketing. He holds a BSEE from Portland State University.
