Re-defining Quality’s Boundaries to Serve Mission-Critical Interests in Several Spheres



We’ll look at today’s approach to quality as practiced by companies whose products and systems operate in mission-critical fields such as military and aerospace (mil-aero).

When products are deployed in data centers, on battlefields, in space, and in financial systems, the consequences of failure include not only financial loss but, far more importantly, the potential loss of life. Such products must therefore perform flawlessly, without intermittent problems, and meet or exceed their mean-time-between-failures (MTBF) expectations.
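To make an "MTBF expectation" concrete, the sketch below assumes a constant failure rate (the exponential model commonly applied to electronics during their useful life). Under that assumption, the probability of completing a mission of length t without failure is R(t) = exp(-t / MTBF). The MTBF and mission figures are hypothetical, chosen only for illustration.

```python
import math


def reliability(mission_hours: float, mtbf_hours: float) -> float:
    """Probability of no failure during the mission, assuming a constant
    failure rate (exponential model), so R(t) = exp(-t / MTBF)."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return math.exp(-mission_hours / mtbf_hours)


if __name__ == "__main__":
    # Hypothetical figures: a 100,000-hour MTBF board on a 1,000-hour mission.
    r = reliability(1_000, 100_000)
    print(f"Mission reliability: {r:.4f}")
```

Under this model, a unit operated for exactly one MTBF survives with a probability of only about 37% (e^-1), which is why mission-critical designs target an MTBF far longer than the mission itself.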

Some of the companies that operate in these spheres now define quality to include the whole ecosystem, helping users and the supply chain manage quality. These companies also support customers promptly during field use and deliver firmware upgrades and bug fixes over the product's life. All this is easier said than done. To achieve reliability, availability, and maintainability, companies need to invest heavily in defining and developing systems that ensure quality. A quality system not only yields customer satisfaction; it also helps manufacturers stay competitive.

A Holistic Plan
As the inventor of the FPGA, Xilinx operates in a world full of complexity. It produces intricate products, such as its multiprocessor system-on-chip (MPSoC) and Adaptive Compute Acceleration Platform (ACAP), through elaborate manufacturing procedures for markets such as data centers, defense and aerospace, and automotive. To adapt to new market trends and advancing technologies in these sectors, Xilinx is going beyond traditional testing and industry quality standards, using artificial intelligence (AI) to address quality holistically.

Early Integration 
Instead of serving as an outside review function, the Xilinx quality team is fully incorporated into the product development process. This integrated approach enables efficient tracking: a unique 2D barcode is marked on top of every product, and a unique internal ID (electrical signature) is burned inside the device. In this way, a DNA record “is created for each product with all electrical and parametric test data. Every part is traceable to a specific die, a supplier, and a customer,” according to Vincent Tong, EVP of Global Operations and Quality at Xilinx. Beyond its own quality program, Xilinx integrates system-level testing and remote debug capability, which helps customers improve productivity, achieve higher quality, and reach market faster.
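The traceability described above amounts to a lookup from a part's markings to its history. The minimal sketch below assumes a hypothetical record layout (the field names `barcode`, `device_id`, `die_id`, and so on are illustrative, not Xilinx's actual schema) and indexes each part both by its printed 2D barcode and by the ID burned into the device, so a part can still be traced if its marking is damaged.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class PartRecord:
    # Hypothetical fields; a real record would carry far more test data.
    barcode: str      # 2D barcode marked on the package
    device_id: str    # unique ID burned into the device
    die_id: str       # die the part was built from
    supplier: str
    customer: str
    test_data: Dict[str, float] = field(default_factory=dict)


class TraceabilityDB:
    """Hypothetical in-memory index of parts by barcode and by device ID."""

    def __init__(self) -> None:
        self._by_barcode: Dict[str, PartRecord] = {}
        self._by_device_id: Dict[str, PartRecord] = {}

    def register(self, rec: PartRecord) -> None:
        self._by_barcode[rec.barcode] = rec
        self._by_device_id[rec.device_id] = rec

    def trace(self, *, barcode: Optional[str] = None,
              device_id: Optional[str] = None) -> Optional[PartRecord]:
        """Look up a part by its printed barcode or, failing that,
        by the ID read electrically from the device."""
        if barcode is not None and barcode in self._by_barcode:
            return self._by_barcode[barcode]
        if device_id is not None:
            return self._by_device_id.get(device_id)
        return None


if __name__ == "__main__":
    db = TraceabilityDB()
    db.register(PartRecord("BC-0001", "ID-7F3A", "W12-D045", "FoundryA", "CustomerX"))
    # The barcode is unreadable, so fall back to the electrically read ID.
    part = db.trace(barcode="unreadable", device_id="ID-7F3A")
    print(part.die_id, part.supplier, part.customer)
```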

Machine Learning
Xilinx has built a database of all the data points collected at every step of its supply chain, from manufacturing through delivery, including customer feedback. By analyzing internal tests under numerous environmental conditions (temperature, humidity, or climate) together with issues customers have faced, it has created a massive library of failure (or reliability) signatures. Using machine learning, Xilinx can identify and remove individual dies that have passed the traditional QA and QC processes but still exhibit one or more failure signatures. This way, only the highest-quality dies remain in production, and only products that will operate flawlessly in extreme conditions are delivered to customers.
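The article does not describe Xilinx's model, so the sketch below swaps in a deliberately simple stand-in: nearest-signature matching. Each die that passed its spec limits is represented by a vector of normalized parametric measurements, and it is flagged if it lies within a distance threshold of any known failure signature. The feature choice, signatures, and threshold are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length parametric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def screen_dies(dies: Dict[str, Sequence[float]],
                failure_signatures: List[Sequence[float]],
                threshold: float) -> List[str]:
    """Return IDs of dies whose parametric vector lies within `threshold`
    of any known failure signature, even though they passed spec limits."""
    flagged = []
    for die_id, vec in dies.items():
        if any(distance(vec, sig) <= threshold for sig in failure_signatures):
            flagged.append(die_id)
    return flagged


if __name__ == "__main__":
    # Hypothetical normalized measurements (e.g. z-scores of three parameters).
    passed_dies = {
        "D001": [0.1, -0.2, 0.0],
        "D002": [2.4, 1.9, -1.1],   # close to the first signature below
        "D003": [-0.3, 0.4, 0.2],
    }
    signatures = [[2.5, 2.0, -1.0], [-3.0, 0.0, 2.5]]
    print(screen_dies(passed_dies, signatures, threshold=0.5))  # ['D002']
```

A production system would learn signatures and decision boundaries from data (for example, with a trained classifier); this nearest-neighbor rule only illustrates the idea of rejecting in-spec parts that resemble known failures.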

Cumulative Learning for Product Longevity
The typical life cycle of a product from a graphics processing unit (GPU) company is about two years. With a lead time of four months, such a company has the opportunity for only one or two product iterations. But because quality has extended the life cycle of Xilinx’s products beyond a decade (it is still shipping products it created 15 years ago), Xilinx can make more continuous improvements that further enhance the quality of those products. Furthermore, Xilinx’s database enables cumulative learning, folding what it has learned from older product generations (28nm and 20nm) into newer ones (16nm and 7nm).

Going Beyond Industry Standards
Xilinx has developed additional successful methodologies that test beyond industry standards, which is especially beneficial in areas of innovation like advanced driver-assistance systems (ADAS) and autonomous driving:

  • Joint safe launch programs leverage customer reference designs for targeted system-level testing, giving developers more immediate system-level feedback. Xilinx not only provides products to customers but also tests them in real-world applications on the customers' behalf.
  • Remote debug capabilities with state-of-the-art diagnostic tools detect device or board issues and provide near-real-time support.

Working with Suppliers
Lastly, because Xilinx is fabless, it works closely with suppliers such as TSMC to improve the quality and yield of its products and to monitor in-line data from them. Delivery is another critical issue for suppliers. Xilinx has used analytics to help shorten delivery times and make sure the right material is shipped to the right customers, with on-time delivery performance reaching 98%. Xilinx also has formal processes for periodically reviewing customer feedback, which covers both quality and delivery metrics. Currently, the average customer feedback score is over 9.5 out of 10.
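Both metrics in the paragraph above are simple ratios. The sketch below assumes a shipment counts as on time when it is delivered on or before its committed date; the article does not give Xilinx's exact definition, and the shipment dates and scores are hypothetical.

```python
from datetime import date
from statistics import mean
from typing import List, Tuple


def on_time_rate(shipments: List[Tuple[date, date]]) -> float:
    """Fraction of shipments delivered on or before the committed date.
    Each tuple is (committed_date, delivered_date)."""
    if not shipments:
        raise ValueError("no shipments")
    on_time = sum(1 for committed, delivered in shipments if delivered <= committed)
    return on_time / len(shipments)


if __name__ == "__main__":
    shipments = [
        (date(2020, 3, 2), date(2020, 3, 1)),
        (date(2020, 3, 9), date(2020, 3, 9)),
        (date(2020, 3, 16), date(2020, 3, 18)),  # two days late
        (date(2020, 3, 23), date(2020, 3, 20)),
    ]
    scores = [9.6, 9.4, 9.8]  # hypothetical feedback scores out of 10
    print(f"On-time delivery: {on_time_rate(shipments):.0%}")  # 75%
    print(f"Average feedback: {mean(scores):.2f}")
```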

While many other companies, including startups, talk a good game, very few, if any, have managed to walk the talk. “Customer trust is earned over time, and it has taken us years to get to where we are. Xilinx will continue to deploy AI technologies and invest in people to deliver industry-best quality to the Adaptable Intelligent World,” concludes Vincent Tong.

Figure 1: A 2D barcode with a unique internal ID (electronic signature) forms the DNA of Xilinx’s Zynq product. This process enables every part to be traceable to a specific die, a supplier, and a customer.

Where Quality Means the Difference between Life and Death
In the consumer electronics space, customers tolerate products whose quality declines over a few years, possibly because those devices, though important, are not mission critical. In military and aerospace electronics, however, quality literally means the difference between life and death. For the U.S. armed services, “Quality is never optional.” Combat demands systems that boot up consistently within seconds, regardless of the surrounding environment. The Mission Assurance metrics of reliability, availability, and maintainability (RAM) are paramount in every defense and aerospace system.
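One standard way to see how the three RAM metrics interact is inherent availability, A = MTBF / (MTBF + MTTR), where MTTR (mean time to repair) stands in for maintainability. The sketch below uses hypothetical figures and counts only active repair time, not logistics delays.

```python
def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Inherent availability A = MTBF / (MTBF + MTTR): the long-run fraction
    of time a system is operable, counting only active repair time."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical: same reliability (MTBF), but one board is much faster to swap.
    for mttr in (8.0, 0.5):
        a = inherent_availability(5_000.0, mttr)
        downtime_h_per_year = (1 - a) * 8_760  # hours in a non-leap year
        print(f"MTTR {mttr:>4} h -> A = {a:.5f}, ~{downtime_h_per_year:.1f} h down/year")
```

The example shows that better maintainability (a shorter MTTR) raises availability even when reliability (MTBF) is unchanged.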

The concept of Mission Assurance spans product development, testing, and deployment, and includes aspects of systems engineering, risk management, and quality and process control. Mission Assurance aims to achieve complete user success in every application, in every situation, every time. The Mission Assurance Category (MAC) has three levels of criticality, ranging from MAC 1, the highest, where system failure would be catastrophic to the mission, down to MAC 3, the lowest.

During the autopsy of a failed system, everything is audited. Through fault-tree analysis (or a fault chain), the failure can often be traced back to a single part that was overstressed by extreme conditions such as temperature or voltage. The failure of that one part usually causes a cascade of failures that drives the system to complete breakdown. One tiny screw, nut, bolt, washer, or rivet, or one electronic component overstressed by temperature or overvoltage, failing in the wrong place at the wrong time can set in motion a chain of events that leads to system failure, then mission failure, or, even worse, loss of life. By tracking failures, manufacturers can improve overall quality and ultimately achieve zero defects. Aitech understands that to ensure every mission is accomplished, quality must be integrated from the very outset of product development and at the lowest level of a system.
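Fault-tree analysis combines the probabilities of basic events through logic gates. The minimal sketch below assumes all basic events are independent, in which case an AND gate gives the product of its input probabilities and an OR gate gives 1 minus the product of the complements. The tree and the probabilities are hypothetical.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Union


@dataclass
class Basic:
    name: str
    p: float  # probability that this basic event occurs during the mission


@dataclass
class Gate:
    kind: str  # "AND" or "OR"
    children: List["Node"]


Node = Union[Basic, Gate]


def probability(node: Node) -> float:
    """Top-event probability, assuming all basic events are independent."""
    if isinstance(node, Basic):
        return node.p
    ps = [probability(child) for child in node.children]
    if node.kind == "AND":
        return prod(ps)                      # every input must fail
    if node.kind == "OR":
        return 1 - prod(1 - p for p in ps)   # any single input failing suffices
    raise ValueError(f"unknown gate kind {node.kind!r}")


if __name__ == "__main__":
    # Hypothetical tree: the system fails if the (redundant) power path fails
    # OR a single, non-redundant overstressed connector fails.
    power = Gate("AND", [Basic("primary regulator", 1e-3), Basic("backup regulator", 1e-3)])
    top = Gate("OR", [power, Basic("overstressed connector", 1e-4)])
    print(f"P(system failure) = {probability(top):.3e}")
```

In this example the lone connector dominates the result even though it is only one small part, which mirrors the point that a single overstressed component can bring down the whole system.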

During the development of the SP0-S space-qualified 3U CompactPCI single board computer (SBC), Aitech Defense Systems used a radiation-hardened board and added voltage and temperature monitoring resources, including three radiation-tolerant temperature sensors that monitor the CPU and the thermal interface at two separate points. A radiation-tolerant microcontroller monitors five critical power supplies and the board’s main power inputs. At the same time, the SP0-S doubled the previous version’s processing performance. With Aitech’s proprietary algorithms and OS board support packages (BSPs), the SP0-S enables L1 and L2 caches that allow zero-wait-state access to program and data, increasing performance by a factor of eight in most instances. The SP0-S board has been fully tested and characterized at UC Davis’ proton cyclotron and Texas A&M’s heavy-ion cyclotron, both NASA-approved facilities, to a dose of over 100 kRad (Si). The SBC can tolerate a linear energy transfer (LET) greater than 65 MeV·cm²/mg during operation in or between any of the three main space orbits.
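Aitech's monitoring firmware is not described here, so the following is only a sketch of the general idea of health monitoring: compare each sensor reading against its allowed window and report anything out of range or missing. The channel names and limits are hypothetical and cover just a few of the channels mentioned above; real limits would come from the board's specifications.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Limit:
    low: float
    high: float


# Hypothetical channels and limits, for illustration only.
LIMITS: Dict[str, Limit] = {
    "temp_cpu_C": Limit(-40.0, 105.0),
    "temp_thermal_if_a_C": Limit(-40.0, 85.0),
    "temp_thermal_if_b_C": Limit(-40.0, 85.0),
    "rail_3v3_V": Limit(3.135, 3.465),      # 3.3 V +/- 5%
    "rail_1v0_core_V": Limit(0.95, 1.05),   # 1.0 V +/- 5%
}


def check(readings: Dict[str, float]) -> List[str]:
    """Return a fault message for each channel that is out of range or missing."""
    faults = []
    for channel, lim in LIMITS.items():
        value = readings.get(channel)
        if value is None:
            # Treat a silent sensor as a fault rather than assuming all is well.
            faults.append(f"{channel}: no reading")
        elif not (lim.low <= value <= lim.high):
            faults.append(f"{channel}: {value} outside [{lim.low}, {lim.high}]")
    return faults


if __name__ == "__main__":
    sample = {"temp_cpu_C": 71.5, "temp_thermal_if_a_C": 60.2,
              "temp_thermal_if_b_C": 92.0, "rail_3v3_V": 3.31, "rail_1v0_core_V": 1.00}
    for fault in check(sample):
        print(fault)  # only the second thermal-interface sensor is reported
```

Reporting a missing reading as a fault is a design choice in this sketch: in a harsh environment, a sensor that stops responding is itself a symptom worth flagging.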

Figure 2: Final test of Aitech’s SP0-S space-qualified 3U CompactPCI single board computer verifies that the radiation-hardened board passes all stringent quality test requirements.

Conclusion
Applications in data centers, on battlefields, in space, and in financial systems all demand quality products that deliver failure-free performance and a long mean time between failures (MTBF). Both Xilinx and Aitech understand the real meaning of quality and are thriving as a result. They are responding and adapting to current technology trends and customer needs by treating quality as a multi-faceted concept and a multi-step process. They have invested significant effort in creating and modeling scenarios of system failure under a wide variety of conditions and have developed preventive measures or coping strategies accordingly. In doing so, they are raising the bar for themselves as well as their peers.


John Koon’s current roles include embedded technology research and content creation. Prior to this, he was the Editor-in-Chief of RTC Magazine and COTS Journal. As a researcher, he presents findings on technology trends such as aviation and aerospace, AI, automotive including autonomous driving, robotics and automation, medical innovations, wireless technology including 5G, low-power WAN (including NB-IoT and LoRaWAN), fog computing (beyond cloud) and edge, IoT, cybersecurity, blockchain, M2M, software, manufacturing, and COTS advancements. His writings can be found in eBooks, blogs, and technical trade journals. He holds a BS in engineering (California State Polytechnic University, Pomona) and an MBA (San Diego State University).
