## Creation, Validation, and Implementation of a Highly Accelerated Life Testing (HALT) Procedure to Improve the Reliability of Printed Circuit Boards

by

Abhishek Singh

Bachelor of Science in Mechanical Engineering, National Institute of Technology - Trichy, 2013

Submitted to the Department of Mechanical Engineering in partial fulfillment of the requirements for the degree of Master of Engineering in Advanced Manufacturing and Design at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 2016

© 2016 Abhishek Singh. All rights reserved.

The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

| Signature of Author: | Signature redacted | Department of Mechanical Engineering, August 05, 2016 |
|---|---|---|
| | Signature redacted | David E. Hardt, Ralph E. and Eloise F. Cross Professor of Mechanical Engineering, Thesis Advisor |
| Accepted by: | Signature redacted | Rohan Abeyaratne, Quentin Berg Professor of Mechanics; Committee of Graduate Students |

Submitted to the Department of Mechanical Engineering on 15th August, 2016 in partial fulfillment of the requirements for the degree of Master of Engineering in Advanced Manufacturing and Design

Keywords: Reliability, HALT, PCBs, Vibration testing

#### **ABSTRACT**

Highly Accelerated Life Testing (HALT) is a test methodology to evaluate the reliability of electronic and electromechanical devices.
This thesis aims at implementing HALT on printed circuit boards (PCBs). A standard operating procedure for implementing HALT to increase the reliability of PCBs used in various Waters equipment is developed and validated. To create the standard operating procedure, a very reliable Alliance PCB is selected to act as a benchmark for screening other boards. Extensive testing is done on the Alliance PCB by subjecting it to thermal and vibration stresses through a Hot Step Stress profile, Cold Step Stress profile, Thermal Cycling, Vibration Step Stress profile and Combined Cycle profile. For validation, the standard operating procedure is applied to a less reliable Acquity board to check whether the same procedure is effective in precipitating and detecting failures on the variety of boards that Waters equipment uses. As expected, the operating limits of temperature and vibration of the Acquity board are found to be lower than those of the Alliance board, i.e. the Acquity board fails much earlier than the Alliance board when subjected to the same stress profiles. This result is consistent with the field failure data obtained for these two boards: the less reliable board has lower operating limits.

Thesis Supervisor: Dr. David E. Hardt

Title: Ralph E. and Eloise F. Cross Professor of Mechanical Engineering

#### **ACKNOWLEDGEMENTS**

I take this opportunity to express my gratitude to the people who have been instrumental in the successful completion of this project.

First, I would like to thank Professor David Hardt for guiding us through the entire period of this project work and pushing us to do our best. His valuable guidance and constant encouragement helped us greatly during the course of the project.

I would like to thank my wonderful teammates Chun Ming Zhang and Dexter Chew.
We had a very cross-functional team and each of us brought a different skill set to the table, which was important for the success of this project. We gelled well as a team and I thoroughly enjoyed working with them for the last six months.

I would like to express my heartfelt gratitude to Mr. Jim McPherson, Mr. Gabriel Kelly and Mr. Gregory Puszko from Waters Corporation for their constant support, encouragement, and diligence in providing us the necessary tools and resources to be successful. The guidance and support received from the employees of the Reliability Engineering, Testing Engineering and Marketing divisions at Waters Corporation have been phenomenal and were vital for the success of the project. I am grateful for their constant support and help. Special thanks to Mr. Jared Ouillette, Ian Klobucher and Brian from the Reliability Engineering Lab and Mr. Antonio from the Testing Department for their enthusiasm and support during the implementation phases of this project.

Last and definitely not the least, I would like to thank my parents Mr. Yasbir Singh and Mrs. Neeta Singh, my brother Abhijeet Singh and my sister Priyanka Singh for being very supportive through the entire course of the project.

## **Table of Contents**

- ABSTRACT
- ACKNOWLEDGEMENTS
- LIST OF TABLES
- CHAPTER 1: INTRODUCTION
  - 1.1 Background Information on Waters Corporation
  - 1.2 Liquid Chromatography and Mass Spectrometry
    - 1.2.1 High-Performance Liquid Chromatography (HPLC)
    - 1.2.2 Ultra-Performance Liquid Chromatography (UPLC)
  - 1.3 Motivation
  - 1.4 Objective
  - 1.5 Problem Statement
  - 1.6 Task Division
- CHAPTER 2: INTRODUCTION TO RELIABILITY ENGINEERING
  - 2.1 Background
  - 2.2 Why is Reliability Important?
  - 2.3 Basic Reliability Concepts
  - 2.5 Reliability of PCBs at Waters
- CHAPTER 3 – LITERATURE REVIEW
  - 3.1 Introduction to Highly Accelerated Life Test (HALT)
  - 3.2 HALT Profiles
    - 3.2.1 Temperature Homogeneity
    - 3.2.2 Cold Step Stress
    - 3.2.3 Hot Step Stress
    - 3.2.4 Thermal Cycling
    - 3.2.5 Vibration
      - 3.2.5.1 Failure of PCBs due to Vibration
      - 3.2.5.3 HALT vibration chamber
      - 3.2.5.4 HALT vibration Profile
    - 3.2.4 Combined Vibration and Temperature
    - 3.2.5 Failure Analysis and Corrective Actions
    - 3.2.6 Benefits of HALT
- CHAPTER 4 – PREPARATION OF STANDARD OPERATING PROCEDURE
  - 4.1 Overview – The Methodology
  - 4.2 Functional Testing of the Alliance board
  - 4.3 Data Acquisition and Equipment Used
    - 4.3.1 Temperature Measurement
    - 4.3.2 Vibration Response Measurement
  - 4.4 Experimental Step and Analysis
    - 4.4.1 Hot Step Stress
    - 4.4.2 Cold Step Stress
    - 4.4.3 Thermal cycling
    - 4.4.5 Vibration
      - 4.4.5.1 Vibration Fixture
      - 4.4.5.2 Response Characterization of the Fixture
        - 4.4.5.2.1 Experimental Setup
        - 4.4.5.2.2 Location of Accelerometers
        - 4.4.5.2.3 Experimental Results
      - 4.4.5.3 Vibration Step Stress
        - 4.4.5.3.1 Vibration Step Stress Profile
        - 4.4.5.3.2 Experimental Setup and Procedure
        - 4.4.5.3.3 Experimental Results
    - 4.4.6 Combined Thermal Cycling and Vibration
      - 4.4.6.1 Combined Cycle Profile
      - 4.4.6.2 Experimental Setup and Procedure
      - 4.4.6.3 Experimental Results
- CHAPTER 5 – VALIDATION OF THE STANDARD OPERATING PROCEDURE
  - 5.1 Introduction
  - 5.2 Selection of Acquity PCB
    - 5.2.1 Ease of testing and root cause analysis
    - 5.2.2 Field Failure Data
  - 5.2 Functional testing of the Acquity PCB
  - 5.3 Experimental setup
  - 5.4 Experiments – Acquity PCB
    - 5.4.1 Hot Step Stress
    - 5.4.2 Cold Step Stress
    - 5.4.3 Thermal Cycling
    - 5.4.3 Vibration Step Stress
      - 5.4.3.1 Vibration Step Stress Profile
      - 5.4.3.2 Vibration Fixture
      - 5.4.3.3 Experimental Results
      - 5.4.3.4 Vibrational step stress – 4 module configuration of the fixture
    - 5.4.5 Combined Cycle
      - 5.4.5.1 Combined Cycle Profile
      - 5.4.5.2 Combined cycle experimental results
- CHAPTER 6: CONCLUSION AND FUTURE WORK
- BIBLIOGRAPHY

## **List of Figures**

- Figure 1: Waters Corporation, Milford, MA
- Figure 2: Diversification of Waters Corporation in various fields
- Figure 3: The Bath tub curve: The green, red and cyan curves represent the infant mortality, constant and wear out failures respectively, while the blue curve is a combined representation of the three individual failure rates
- Figure 4: Life Cycle of electronic components in the absence of burn-in test [10]
- Figure 5: Ruptured Lead Wires due to Fatigue [16]
- Figure 6: Solder joint fatigue failure [17]
- Figure 7: Solder joint crack initiation [17]
- Figure 8: Acceleration vs Time graph as obtained from an accelerometer
- Figure 9: HALT Chamber
- Figure 10: HALT Vibration Profile
- Figure 11: Spending rate against time: cost reductions on HALT [12]
- Figure 13: Alliance Plunger drive PCB
- Figure 15: The Waters Alliance Separations module
- Figure 16: Functionality of Plunger Drive PCB
- Figure 17: HyperTerminal functional test output
- Figure 18: The triaxial accelerometer used for monitoring vibration
- Figure 19: Data acquisition device
- Figure 20: National Instruments DAQ
- Figure 21: LabVIEW output that measures vibration in X, Y & Z directions
- Figure 22: Module of the Alliance PCB fixture
- Figure 23: Alliance fixture used for vibration step stress and combined cycle
- Figure 24: Exploded view of the vibration step stress arrangement
- Figure 25: Alliance fixture mounted on the HALT chamber vibration table
- Figure 26: The triaxial accelerometer mounted on the Alliance
- Figure 27: Vibration Step Stress experimental setup
- Figure 28: Vibration Simulation on the Alliance PCB
- Figure 29: Locations for placing the accelerometers
- Figure 30: Graphical representation of Vibrational variation at location 1 on the PCB
- Figure 31: Graphical representation of Vibrational variation at location 2 on the PCB
- Figure 32: Graphical representation of Vibrational variation at location 3 on the PCB
- Figure 33: Vibration Step Stress Profile
- Figure 34: Vibration step stress experimental setup
- Figure 35: Vibration step stress experimental setup
- Figure 36: Vibration Step Stress Profile
- Figure 37: Error message observed during vibrational step stress
- Figure 38: Alliance PCB functional test fixture
- Figure 39: Alliance PCB functional test fixture
- Figure 40: Combined Cycle HALT profile
- Figure 41: Combined Cycle HALT profile
- Figure 42: Capacitor breakage observed during combined cycle of 2100000425 PCB
- Figure 43: Microscopic Image of the fractured surface
- Figure 46: The Binary Solvent Manager that uses Acquity PCB
- Figure 47: Arrangement of Pumps and PCBs inside the Binary Solvent Manager
- Figure 48: Screen shot of the user interface of the Console
- Figure 49: Experimental setup for testing Acquity board
- Figure 50: Hot step stress profile
- Figure 51: Console output showing pressure fluctuation during Hot Step Stress
- Figure 52: Console output showing pressure fluctuation during Hot Step Stress
- Figure 53: Console output showing pressure fluctuation during Hot Step Stress
- Figure 54: Cold step stress profile
- Figure 55: Console output showing pressure fluctuation during Cold Step Stress
- Figure 56: Thermal cycling profile
- Figure 57: HALT profile for vibration step stress
- Figure 58: Module of the Acquity PCB fixture
- Figure 59: Standoffs provided on the Acquity fixture (five module configuration)
- Figure 60: Acquity PCB mounted on the fixture
- Figure 61: Acquity PCB mounted on the fixture
- Figure 62: M4 & M3 screws for mounting the PCB on the standoffs of the Acquity fixture
- Figure 63: Console output showing operational failure of Acquity PCB during vibration step stress
- Figure 64: 4 module configuration for vibration step stress
- Figure 65: Operational failure observed at 30 G<sub>rms</sub> for four module fixture configuration
- Figure 66: Combined Cycle profile for Acquity
- Figure 67: Error message observed during combined cycle Acquity
- Figure 68: Capacitor breakage observed during combined cycle of Acquity PCB

## **List of Tables**

- Table 1: Vibration set point of HALT chamber and the response of the Alliance PCB at location 1
- Table 2: Vibration set point of HALT chamber and the response of the Alliance PCB at location 2
- Table 3: Vibration set point of HALT chamber and the response of the Alliance PCB at location 3
- Table 4: List of PCBs for which both in-circuit and functional test fixtures were available

## **Chapter 1: Introduction**

This thesis concentrates on the creation and validation of a standard operating procedure (SOP) for implementing Highly Accelerated Life Testing (HALT) to increase the reliability of PCBs. It is based on an industrial project at Waters Corporation, which designs, manufactures and services analytical laboratory instruments used in pharmaceutical, industrial, academic and other laboratory applications. The company's focus is liquid chromatography, mass spectrometry and thermal analysis systems, together with consumable products such as columns and other support products. Its headquarters is in Milford, Massachusetts, as shown in Figure 1.

Currently, there is a need at Waters Corporation to implement and integrate a Highly Accelerated Life Testing procedure in the product development process. This chapter provides background information on Waters Corporation, the motivation and the problem statement that this thesis seeks to address.
Figure 1: Waters Corporation, Milford, MA

#### 1.1 Background Information on Waters Corporation

Waters Corporation was founded by James Logan Waters in 1958. It designs and manufactures analytical laboratory instruments that are used in pharmaceutical, industrial, academic and other laboratories. The company has advanced the field of analytical chemistry by producing breakthrough research and state-of-the-art technological systems, and has become a major player in the market. Its revenue for the year 2013 was \$1.9 billion. It has offices in 27 countries, including 11 manufacturing facilities. On the road to becoming a major player in the analytical instruments industry, Waters has acquired a number of companies and thus expanded its business further, as shown in Figure 2.

Figure 2: Diversification of Waters Corporation in various fields

Waters has divided its products into two divisions: the Biochemical and Chemical Analysis Division, and the Physical Testing Division. The Biochemical and Chemical Analysis Division is based in Milford, MA and produces liquid chromatography instruments, while the Physical Testing Division is based in Manchester, England and Wexford, Ireland and produces mass spectrometry instruments. Thermal analysis and calorimetric instruments are also produced by the Physical Testing Division.

With customer success being the prime mission for Waters Corporation, the company maintains a global network of authorized service centers that can install, repair and replace parts, thereby creating a strong bond between customers and the company. This thesis describes the implementation of Highly Accelerated Life Testing (HALT) on PCBs at the Waters Reliability Engineering Lab, Milford, MA.

#### 1.2 Liquid Chromatography and Mass Spectrometry

One of the important tools in analytical chemistry is liquid chromatography. It was defined by the Russian botanist Mikhail S. Tswett in the early 1900s.
His studies focused on separating plant compounds by using a solvent and a column packed with material [1]. This opened a pathway for many scientists to use the technique to separate a sample into its individual components. The technique is based on the phenomenon that different compounds have different strengths of chemical attraction to particles. When the sample is carried by a solvent (the mobile phase) through a column filled with particles (the stationary phase), its components separate according to their chemical attraction to the particles in the column, creating bands of different colors. Based on these color bands, the individual components of the sample can be identified.

#### 1.2.1 High-Performance Liquid Chromatography (HPLC)

In early liquid chromatography systems, a high pressure of about 35 bar was used to generate the flow in packed columns. These systems were known as High-Pressure Liquid Chromatography, or HPLC. The 1970s saw tremendous improvement in HPLC technology, with systems that could develop pressures up to 400 bar and that incorporated improved injectors, detectors and columns. With continued advances in performance from technologies such as smaller particles and higher pressures, the acronym remained the same but the name was changed to High-Performance Liquid Chromatography (HPLC). HPLC systems are among the most powerful instruments used in analytical chemistry today, as they can easily identify compounds in trace concentrations as low as parts per trillion (ppt).
They can separate, identify, and quantify the compounds present in any sample that can be dissolved in a liquid, and they find application in many areas, such as pharmaceuticals, food, cosmetics, environmental matrices, forensic samples and industrial chemicals.

#### 1.2.2 Ultra-Performance Liquid Chromatography (UPLC)

In the last decade, advances in instrumentation and column technology have led to significant increases in resolution, speed, and sensitivity in liquid chromatography. A very high level of performance is achieved by using columns with particles as small as 1.7 microns and instrumentation with specialized capabilities designed to deliver the mobile phase at about 1000 bar. This new system, created holistically with updated capabilities, is called Ultra-Performance Liquid Chromatography (UPLC) technology. The UPLC system consists of four components: a solvent pump, a sample injector, a stationary phase or 'column' which allows the separation, and a detector to analyze the separated components.

#### 1.3 Motivation

Waters Corporation is a leading manufacturer of analytical instruments, including high-performance liquid chromatography (HPLC) and mass spectrometry systems. In addition to the intricate mechanical components such as columns, pumps, syringes, and valves, the electronic components such as power supplies and printed circuit boards (PCBs) that control all of the hardware also play a critical role in the instrument. In fact, it was reported that the field failure rate of electronic components is often higher than that of the mechanical components. Although design specifications are set and verified by reliability engineering during product development and by testing engineering after production, the unpredictability of shipment and end-user environments still imposes additional stresses that have led to premature failures of PCBs within the warranty period.
It is believed that establishing a generic Highly Accelerated Life Test (HALT) process would not only facilitate communication between the reliability engineering, R&D engineering, and testing engineering departments, allowing product design weaknesses to be assessed within a shorter period during the product development cycle, but would also increase product robustness and reduce warranty cost.

#### 1.4 Objective

The primary objective of this project is to create and establish a standard operating procedure (SOP) for implementing Highly Accelerated Life Testing (HALT). The objectives can be further categorized into three:

- (1) Creation of a standard operating procedure that covers, step by step, the entire process of Highly Accelerated Life Testing (HALT), from fixture design all the way through to obtaining a reliable product.
- (2) Validation of the standard operating procedure using a different PCB from the one used to create the HALT procedure.
- (3) A cost impact analysis that includes the cost of implementing Highly Accelerated Life Testing (HALT) on a PCB and briefly examines the benefits of implementing HALT.

#### 1.5 Problem Statement

While design verification and instrument evaluation are embedded in Waters' product development process, the current design process does not use any generic HALT procedure to identify latent defects in the design or production of components that may not manifest until after years of field operation. Without a generic method in place, the current process requires specialized test plans and analyses developed specifically for each component or assembly. Although Waters already examines the quality of PCBs after production via In-Circuit Testing (ICT) or Functional Testing (FCT) to make sure each board is functioning properly, premature failures still occur in the field.
Implementation of a generic HALT procedure for various electronic components such as power supplies and PCBs would help identify these latent issues early on with minimum resources. Waters is looking for a HALT program that can help identify design issues before parts are approved for use in instruments.

On the same note, questions arise about the extent to which these components should be tested. There is often a trade-off between production costs, testing costs, and cost of quality. For example, a new component could be introduced that meets the required functional specifications at a lower cost, but if its failure rate is higher than that of the old part, it may lead to a high cost of quality (the need for service engineers to replace the part) that outweighs the production cost savings. On the other hand, a component could be redesigned using the latest manufacturing techniques in order to improve quality and practically guarantee zero failures over the life of the component, but at a grossly increased production cost. There is a need to find the balance between production costs and cost of quality when it comes to component, design, or production changes. A successful HALT program would facilitate efficient screening of these scenarios and provide data to augment the design process.

#### 1.6 Task Division

This thesis is written in conjunction with the work done by Chun Ming Chang [1] and Dexter Chew [2], and several sections and descriptions in this thesis are written in common with their works. Each team member took the lead on one of three aspects of the HALT implementation. Dexter Chew was responsible for the design and fabrication of the fixtures used in HALT. Chun Ming Chang worked on conducting the Highly Accelerated Life Tests with a focus on thermal stresses, and the author of this thesis was responsible for conducting the Highly Accelerated Life Tests with a focus on vibration stresses.
## **Chapter 2: Introduction to Reliability Engineering**

Manufacturers and designers of electronic products constantly face new challenges in trying to meet the continuous demand from consumers and the market for new features and improved performance in a smaller size. The component sizes, I/O (input/output) pitches and line widths of printed circuit boards (PCBs) have been decreasing rapidly, and the increasing packaging density is causing manufacturing and reliability challenges. Higher operating speeds require shorter interconnection distances to minimize propagation delays, so components become smaller while the number of electrical contacts increases. As a consequence, the components are brought even closer to the PCB with smaller interconnections, adding to the stresses and strains on the interconnections. While the smaller and more versatile components increase current densities and the number of I/Os, the enhanced performance creates thermal challenges due to the increased power densities.

It is generally recognized that the reliability of electronics is to a large extent determined by the ability of electrical interconnections to withstand various loadings during the product's operational lifetime. Reliability testing is needed to provide assurance that designs and products are reliable and durable in service, and because of today's tight time-to-market requirements and shortening product cycles, these tests should be conducted in a timely fashion. The reliability testing of interconnections has traditionally been carried out with accelerated reliability tests, in which the test units are used more frequently than usual or are subjected to higher than usual levels of accelerating variables such as temperature or voltage. These tests can be grouped into two categories: single load tests, where only one loading type is used, and multiple loading tests, where two or more different loading types are used.
Today, the vast majority of accelerated tests are still conducted as single load tests, mainly because of their simplicity and because the interaction effects of various combined loads are still not known or understood. There are, however, various potential advantages in multiple loading tests. Most modern products are designed to operate without failure for years, and it has become more difficult, even with an accelerated single load test, to obtain a sufficient amount of failure data within a reasonable amount of time. Multiple loading tests have recently been employed as a means of overcoming such difficulties. They can also offer a much better representation of the strains and stresses of the actual use environment, since the components in actual products are rarely exposed to a single type of strain or stress.

This thesis focuses on single as well as multiple loading tests and, more specifically, on the combination of mechanical and thermal loads. The theory part of the thesis reviews the current status of the subject, while the experimental part attempts to combine two separate loads (namely vibration and temperature) into a single, more comprehensive and efficient concurrent reliability testing method.

#### 2.1 Background

Reliability engineering emerged as a separate engineering discipline during the 1950s but traces its roots to the Second World War. At the onset of the war, the essential ingredients of reliability engineering, statistics (theory of sampling) and mass production, were well established and reliability engineering was ripe to emerge. Electronics played a critical role in the war, but the increasing complexity of military electronic systems was generating failure rates that resulted in greatly reduced availability and increased costs.
During the war, the availability of American naval and army electronic equipment was approximately 70% and less than 40% respectively, while over 60% of airborne equipment shipped to the Far East arrived damaged [3].

During the 1950s the gathering pace of electronic device technology meant that the developers of new military systems in particular were making increasing use of large numbers of new component types and new manufacturing processes, with the inevitable consequence of low reliability. The increasing complexity of military electronic systems was generating intolerable failure rates and greatly influenced the emergence of reliability engineering. As a consequence, several methods to achieve higher reliability were developed, such as identifying the root cause of field failures and determining mitigating actions, or specifying quantitative reliability requirements. A permanent committee (the Advisory Group on Reliability of Electronic Equipment, AGREE) was established to guide the emerging reliability discipline, and the birth date of reliability engineering is often taken to coincide with the committee's foundational report, published in 1957, which provided assurance that reliability could be specified, allocated and demonstrated [4].

The most widely known and used reliability prediction handbook, MIL-HDBK-217, was first published in 1961. Although originally developed for military and aerospace applications, it became widely used for industrial and commercial electronic equipment applications throughout the world. The most recent revision, "Military Handbook, Reliability Prediction of Electronic Equipment", MIL-HDBK-217, Revision F, Notice 2, was released in February of 1995. MIL-HDBK-217 is an influential military standard by which reliability predictions are still performed.

The development of reliability in the western world was driven to a large extent by customers, and particularly by the US military.
In order to improve reliability, several standards and procedures were set up, while the suppliers saw little motivation to improve, since they were paid for spares and repairs. By contrast, the Japanese had a different view on improving reliability. Guided by the doctrines of the late W. Edwards Deming, Japanese industry had learned the fundamental connections between quality, productivity and competitiveness, and its basic philosophy was that quality provided the key to greatly increased productivity and competitiveness [5].

The inductive western approach to reliability led to inventions, breakthroughs and greater reliance on systems for the control of people and processes, as opposed to the more deductive approach applied by the Japanese. The effectiveness of the deductive approach was shown by the way many products, such as Japanese cars, machine tools and consumer electronic products, won dominant positions in world markets in the last 40 to 50 years. The deductive approach seeks to generate continuous improvements across a broad front, and new ideas are subjected to careful evaluation. Furthermore, it allows a clearer view, particularly in discriminating between sense and nonsense, but it is not conducive to the development of radical new ideas. Obviously, these two philosophies are not mutually exclusive, and elements of both can be seen in modern reliability engineering.

#### 2.2 Why is Reliability Important?

Today the reliability of electronics is an important factor of success in highly competitive global markets. The modern consumer will not tolerate products that do not perform reliably, or as promised or advertised, and if a manufacturer does not design its products with reliability and quality in mind, someone else certainly will. In addition, customer dissatisfaction with a product's reliability can have disastrous financial consequences for the manufacturer.
A company's reputation is very closely related to the reliability of its products, and while it takes a long time to build up a reputation for reliability, only one malfunctioning product can brand the whole company as unreliable. Furthermore, if a product fails to perform its function within the warranty period, the replacement and repair costs can be substantial, in addition to the unwanted attention. Thus it is of utmost importance that a manufacturer knows the reliability of its products; continual assessment of new product reliability and ongoing control of the reliability of everything shipped are critical necessities in today's competitive business arena.

Besides financial issues, reliability can be of paramount importance in some application areas. In today's technological world we depend upon the continued functioning of a wide array of complex equipment and machinery. More and more important operations are performed with automated equipment, whose failures have effects ranging from minor nuisances to catastrophic results. It is clear that in critical fields and key applications, such as medical devices or avionic electronics, where failures can lead to life-threatening situations, reliability should be of the highest priority to avoid the catastrophic consequences of failure.

#### 2.3 Basic Reliability Concepts

This section introduces, defines and briefly discusses some basic concepts related to reliability.

#### **Reliability**

An illustrative description of the meaning of reliability today is [6] (originally cited by an unknown reliability engineer): "Reliability - it is when the customer comes back, not the product". More formally, reliability is commonly defined as the ability of a product or a component to operate without failure under a set of predetermined conditions over a specified period of time [7].
The definition contains several key requirements: the functions of the product, its normal operation and stated operating conditions, and its anticipated lifetime must all be specified. Predetermining these conditions makes the design easier and the reliability considerations much easier to implement.

#### Quality

Reliability should not be confused with quality or quality control, since the two concepts are quite different. Reliability grew out of the need for a time-based concept of quality, and it is the dimension of time that marks the difference between traditional quality control and the modern approach to reliability. The International Organization for Standardization (ISO) defines quality as "the totality of features or characteristics of a product or service that bear on its ability to satisfy stated or implied needs" [8]. The definition of quality refers to this totality at a particular instant of time, whereas reliability deals with the behavior of the failure rate over a long period of performance.

#### **Failure**

Failure is an important concept in reliability, since the measure of reliability is the infrequency with which failures occur over time. For all life tests in particular, some time-to-failure information for the product is required, since the failure of a product is the event we want to understand. Failure is generally defined as "the non-performance or inability of a component or system to perform its intended function for a specified time under specified environmental conditions" [9]. Once again, the intended function of the device and its operating conditions need to be specified.

#### **Bathtub Curve**

The so-called bathtub curve (presented in Figure 3) is typically used to illustrate reliability over time and the three principal periods of product failure. The curve does not depict the failure rate of a single item, but describes the relative failure rate of an entire population of products over time.
Although it is rare to have enough short- and long-term failure information to actually model a population of products with a calibrated bathtub curve, the curve is nonetheless a widely used concept in industry. The product's life cycle can be divided into three distinct periods based on the dominant failure mode, as shown in Figure 3.

Figure 3: The bathtub curve. The green, red and cyan curves represent the infant mortality, constant and wear-out failures respectively, while the blue curve is a combined representation of the three individual failure rates.

The curve consists of the following three periods:

- **Infant mortality period**: characterized by a decreasing failure rate. These infant mortality failures are highly undesirable and are caused by defects and blunders.
- **Normal-life period** (also known as "useful life"): characterized by a constant and relatively low failure rate. The failures during this period are considered random cases of stresses and strains exceeding the product strength.
- **Wear-out period**: exhibits an increasing failure rate due to fatigue or depletion of materials.

#### 2.4 Reliability of PCBs

Unlike most non-electronic equipment, the reliability of electronic components is often difficult to test and verify, since most of the time they are already encapsulated after production. In most cases, PCBs conform to specification, are identified as good, and last throughout the life of the equipment. Those that are defective fail when first tested under ICT or FCT and are either fixed or removed, so they do not cause equipment failures. However, the boards that drive warranty cost are those with defects that do not cause an immediate failure during traditional testing but later degrade performance and reduce reliability.
Examples include weak solder joints; silicon, oxide and conductor imperfections; impurities; inclusions; and non-hermetic packages. Components with such defects are called "freaks". [10]

"Consider a typical failure mechanism in an electronic component: a weak mechanical bond of a lead-in conductor to the conducting material of the device. This may be the case in a resistor, a capacitor, a transistor or an IC. Such a device might function satisfactorily on test after manufacture and in all functional tests when built into a system. No practical inspection method will detect the flaw. However, because the bond is defective it may fail at some later time, due to mechanical stress or overheating due to a high current density at the bond. Several other failure mechanisms lead to this sort of effect, for example, flaws in semiconductor material and defective hermetic sealing. Similar types of failure occur in non-electronic systems, but generally they do not predominate. The typical 'electronic' failure mechanism is a wear out or stress induced failure of a defective item. In this context, 'good' components do not fail, since the application of specified loads during the anticipated life will not lead to failure. While every defective item will have a unique life characteristic, depending upon the nature of the defect and the load(s) applied, it is possible to generalize about the nature of the failure distributions of electronic components. Taking the case of the defective bond, its time to failure is likely to be affected by the voltage applied across it, the ambient temperature around the device and mechanical loading, for example, vibration. Other failure mechanisms, say a flaw in a silicon crystal, may be accelerated mainly by temperature variations. Defects in devices can result in high localized current densities, leading to failure when the critical value for a defective device is exceeded.
Of course, by no means all electronic system unreliability is due to defective components. Interconnects such as solder joints and wire bonds can be reliability 'weak links' especially in harsh environment applications (automotive, avionics, military, oil drilling, etc.)." [10]

Figure 4: Life cycle of electronic components in the absence of burn-in test [10]

"Figure 4 shows the three categories of component that can be manufactured in a typical process. Most are 'good', and are produced to specification. These should not fail during the life of the equipment. Some are initially defective and fail when first tested, and are removed. They, therefore do not cause equipment failures. However, a proportion might be defective but nevertheless, pass the tests. The defects will be potential causes of failure at some future time. Typical defects of this type are weak wire bond connections, silicon, oxide and conductor imperfections, impurities, inclusions and non-hermetic packages. These components are called 'freaks'." [11]

#### 2.5 Reliability of PCBs at Waters

The Reliability Engineering and Testing Engineering groups at Waters face problems in identifying the root causes of field failures, especially for printed circuit boards (PCBs) that fail in the field but pass functional and ICT tests when brought back for testing. Although there were efforts to conduct the HALT procedure, resource issues and opposition from the R&D group to testing beyond product specifications delayed its implementation. These issues resulted in an inconsistent screening process using the HALT chamber and an inaccurate data collection process. The team also consulted reliability engineers in the group and found inconsistencies between the way the currently proposed HALT procedure is conducted and industry practice.
Although the HALT chamber has been available for more than 5 years, the inefficient use of this resource has meant a loss of potential savings [12] from a ruggedized product and depreciation of idle inventory that could otherwise turn into profit.

### Chapter 3 – Literature Review

#### 3.1 Introduction to Highly accelerated life test (HALT)

HALT is conducted to stimulate as many failure modes as possible so that design weaknesses can be located early in the product design and development process, improving the overall robustness of the product. As mentioned in chapter 2, many flaws in electronics may be latent and cannot be detected by traditional test methods such as ICT (in-circuit testing) and FCT (functional testing). Unlike traditional environmental stress screening (ESS), HALT uses a series of thermal and vibration steps to expose intermittent failures and latent defects within a shorter period of time. After the HALT process, the early failure rate can be reduced significantly and the product becomes more mature, so that both the failure rate in the field and the variation in quality are reduced.

#### 3.2 HALT Profiles

The HALT profiles fall into two categories: temperature stressing and vibration stressing. Before the detailed profiles are discussed, the key metrics are defined as follows:

- Lower Operating Limit (LOL): The temperature during cold step stressing at which the system or equipment shows an abnormality but returns to normal, without any repair, once brought back to room temperature. Such events are intermittent, or so-called soft, failures.
- Lower Destruct Limit (LDL): Similar in concept to the LOL, but at this temperature the system does not return to normal when simply brought back to room temperature. Some degree of repair or component replacement may be required. This is usually called a hard failure.
- Upper Operating Limit (UOL): The temperature during hot step stressing at which the system or equipment shows an abnormality or does not function as intended but returns to normal, without any repair, once brought back to room temperature. Such events are intermittent, or so-called soft, failures.
- Upper Destruct Limit (UDL): Similar in concept to the UOL, but at this temperature the system does not return to normal when simply brought back to room temperature. Some degree of repair or component replacement may be required. This is usually a hard failure.
- Vibration Operating Limit (VOL): The point at which the product stops operating or a specification is no longer being met, but the product returns to normal after the vibration level is decreased.
- Vibration Destruct Limit (VDL): The point beyond which the product does not return to normal even after the vibration level is decreased below the operating limit.

#### 3.2.1 Temperature Homogeneity

The temperature homogeneity test should be conducted at the beginning of HALT to make sure that the ducting positions in the chamber are optimized and that there is maximum airflow across the products, so as to obtain the best thermal response. The actual product temperature is measured by several thermocouples across the boards and compared with the chamber setpoint, and the air duct positions in the chamber are adjusted to maximize thermal energy transfer and temperature uniformity on the board. [13]

#### 3.2.2 Cold Step Stress

During the cold step stress the temperature is lowered in increments less than or equal to 20°C.
The test should be started at ambient temperature (which can be defined anywhere between 20°C and 30°C), and the dwell time at each temperature is a minimum of ten minutes or as determined by the temperature homogeneity step. The loading is continued until either the operational limit (OL) of the sample is determined or the chamber limit is reached. If the OL can be determined, the test should continue to the destruct limit (DL) or the chamber limit. However, since the sample will most likely not be operational beyond its OL, it becomes necessary to return to a less severe temperature (i.e. within the OL) after each additional dwell to determine whether the sample is still operational.

The cold step stress is reported to be the least detrimental test and is therefore performed first. When irregularities or abnormalities occur, the temperature is returned to either the previous level or room temperature and the functional test is conducted again to see whether the sample returns to normal. If it does, an LOL has been identified. If the equipment still does not work even after returning to room temperature, an LDL has been identified.

#### 3.2.3 Hot Step Stress

The hot step stress follows the same logic as the cold step stress. It also begins at room temperature, and the temperature is increased by about 10 to 20°C per step until the UOL and UDL are identified.

#### 3.2.4 Thermal Cycling

Thermal cycle tests are designed to simulate the thermal excursions experienced in regular operating conditions. The tests are conducted over wider temperature ranges and at higher frequencies to induce failures in a much shorter time than in field use, under the assumption that the failure modes are identical to those observed in field use. The actual test process is straightforward: the test specimens are continuously exposed to alternating high- and low-temperature extremes.
The tests are typically conducted until a predetermined number of cycles is reached or a failure is detected. The test parameters are the temperature range (maximum [Tmax] and minimum [Tmin] temperature), the dwell time (time spent at both Tmax and Tmin) and the ramp time (how quickly the temperature changes from Tmax to Tmin and from Tmin to Tmax).

The tests are conducted in specially designed environmental test chambers, which are typically equipped with one, two or three zones. In a one-zone configuration the specimens are heated and cooled in one chamber according to the test specifications. A two-zone configuration features hot and cold zones, which can be controlled independently, and a product carrier to move the test specimen between the zones. A three-zone configuration uses a third, ambient zone between the hot and cold zones to enable a more constant change of temperature than in two-zone chambers. Regardless of the configuration, the chambers are typically air filled, and heating and cooling are achieved by convection.

Typically the thermal cycling phase consists of a minimum of five thermal cycles (performed at the maximum attainable rate of change), which shall be completed unless a destructive failure is encountered first. The minimum thermal cycle temperature range should be within 10°C of both the lower and upper thermal operating limits as discovered during the temperature step stress phase. The minimum dwell time is ten minutes following stabilization of the sample at the set point, as determined by the sample thermocouple response. The dwell time is increased to allow higher-mass components to reach at least 80 percent of the thermal range. [14]

#### 3.2.5 Vibration

Vibration is one of the most important loading conditions for electronic components, especially PCBs. The life cycle of electronic components includes vibration loading at different phases, such as during transportation, handling, captive carry flight and free flight.
Dynamic analysis of PCBs under vibration loading can be performed by three different approaches: (i) analytical methods, (ii) finite element analysis and (iii) experimental studies. However, common practice in industry is to employ experimental studies using a shaker. This trend is due to the complexity of today's electronic assemblies. The sophisticated structure of PCBs makes analytical modeling and finite element analysis difficult and sometimes impossible. HALT, which includes vibration testing, has emerged as a promising method to improve the reliability of complex PCBs. [15]

#### 3.2.5.1 Failure of PCBs from Vibration

Excessive deformations and accelerations of PCBs result in damage to mounted components, solder joints and electrical interfaces, as well as to the circuit board itself. Failure mechanisms due to vibration include the following:

- Lead wire fatigue failure (see example shown in Figure 5)
- Connector contact fretting corrosion
- Structural fatigue failure
- Solder joint fatigue failure (see example shown in Figure 6)
- Excessive deflection
- Loose hardware

Figure 5: Ruptured lead wires due to fatigue [16]

Figure 6: Solder joint fatigue failure [17]

Soldering is unavoidable when mounting electronic components on printed circuit boards. High solder reliability is required, since failures of solder joints and lead wires directly result in failure of systems. The main failure mode of solder joints under vibration loading is fatigue failure. High-cycle fatigue causes crack growth in solder joints (Figure 7) and excessive stresses in lead wires.

Figure 7: Solder joint crack initiation [17]

#### 3.2.5.2 Random Vibration

HALT random vibration is measured in gravity root mean squared (G<sub>rms</sub>), which is the square root of the area under the acceleration spectral density (ASD) curve in the frequency domain or, equivalently, the root mean square of the acceleration signal in the time domain. The acceleration spectral density curve can be obtained using an accelerometer.
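The frequency-domain definition of G<sub>rms</sub> can be illustrated with a minimal sketch. It is not part of the test procedure; the function name `grms_from_asd`, the flat example spectrum and the use of linear trapezoidal integration between breakpoints are all assumptions made for illustration (test specifications frequently interpolate on log-log axes instead).

```python
import math


def grms_from_asd(freqs_hz, asd_g2_per_hz):
    """Return G_rms as the square root of the area under an ASD curve.

    Assumption: the ASD is supplied at discrete breakpoints and the area
    is approximated with the linear trapezoidal rule.
    """
    if len(freqs_hz) != len(asd_g2_per_hz) or len(freqs_hz) < 2:
        raise ValueError("need at least two matching (frequency, ASD) points")
    area = 0.0
    for i in range(1, len(freqs_hz)):
        df = freqs_hz[i] - freqs_hz[i - 1]
        if df <= 0:
            raise ValueError("frequencies must be strictly increasing")
        # Trapezoid between two neighbouring breakpoints, in g^2
        area += 0.5 * (asd_g2_per_hz[i] + asd_g2_per_hz[i - 1]) * df
    return math.sqrt(area)


# Hypothetical flat spectrum of 0.01 g^2/Hz between 20 Hz and 10 kHz
flat_grms = grms_from_asd([20.0, 10000.0], [0.01, 0.01])
print(round(flat_grms, 2))
```

For the hypothetical flat spectrum the area is 0.01 × 9980 = 99.8 g², so the result is just under 10 G<sub>rms</sub>.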
A typical real-time signal from an accelerometer mounted on a HALT table is shown in Figure 8. The root mean square (rms) value of this signal can be calculated by squaring the magnitude of the signal at every point, finding the average (mean) value of the squared magnitude, and then taking the square root of that average. The resulting number is the G<sub>rms</sub> metric.

Figure 8: Acceleration vs. time graph as obtained from an accelerometer

The ASD or power spectral density (PSD) bandwidth in a HALT vibration system can extend from 20 Hz to 10 kHz. Unlike with an electrodynamic shaker, the shape or frequency spectrum of the ASD input cannot be directly controlled in a typical multi-axis HALT vibration chamber; only its overall intensity, expressed in G<sub>rms</sub>, can be set.

#### 3.2.5.3 HALT vibration chamber

Figure 9: HALT chamber

The HALT chamber (Figure 9) uses pneumatic hammers to induce vibration. It is also referred to as a repetitive shock (RS) system. HALT chambers provide multi-directional, high-pulse shock peaks that can be as high as ±10 sigma peak accelerations, resulting in a higher rate of fatigue damage. These are multiple-axis systems that can simultaneously stimulate six degrees of freedom: three translational along the x, y and z axes, and three rotational about the x, y and z axes. Because of this, HALT vibration is sometimes referred to as 6 DOF (six degrees of freedom). Generally, RS vibration in most HALT systems is highest in the z-axis, which is the axis normal to the plane of the vibration table. The vibration level in RS systems is typically controlled by feedback from an accelerometer under the center of the table.

#### 3.2.5.4 HALT vibration Profile

The HALT methodology involves applying vibration loading in a stepped manner. Hence HALT vibration testing is also called vibration step stress. The vibration step stress profile consists of two parts: ramp and soak (dwell).
The vibration step stress test begins at a chamber set point of 5 to 10 G<sub>rms</sub> (or the lowest stable chamber control set point), as measured over a 2 Hz to 10,000 Hz or greater bandwidth. Upon completion of each soak (dwell) period and the subsequent functional test, the level is increased (ramped) in 5 to 10 G<sub>rms</sub> increments (5 G<sub>rms</sub> recommended). The minimum dwell time is ten minutes, with functional testing performed at the very least at the conclusion of the dwell. As in temperature step stress, the loading is increased until either the operational and destruct limits of the sample are determined or the chamber maximum is reached. Figure 10 represents a typical vibration step stress profile.

Figure 10: HALT vibration profile

### 3.2.4 Combined Vibration and Temperature

The last step in ruggedizing a product during HALT is to combine all the stresses that were previously used. In this step, the product is subjected to the combination of the vibration and thermal cycling profiles. According to McLean [16], there should in sum be five thermal cycles in conjunction with vibration and other stresses such as power cycling or humidity. For temperature, the same profile that was used during rapid thermal transitions is applied, or the temperature is cycled to within 10°C of the thermal operating limits. For vibration, each step is equal to the step stress operating limit divided by five. For example, if a vibration operating limit of 50 G<sub>rms</sub> was found, each thermal cycle has a vibration dwell at a multiple of 10 G<sub>rms</sub> (10, 20, 30, 40 and 50 G<sub>rms</sub> in turn) for 15 minutes.

The combination of thermal cycling and vibration stepping has been found to be important because the vibration response of many products may change as the temperature varies. Cross-over effects may also occur and stimulate failure modes that would otherwise go unseen. A new set of operating and destruct limits may be determined for this combined environmental stimulus.
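The two vibration schedules described in this section can be generated with a short sketch. This is illustrative only: the function names and the 50 G<sub>rms</sub> chamber maximum are hypothetical, and the reading that the combined profile steps the vibration level up by one increment in each successive thermal cycle is an assumption based on the 50 G<sub>rms</sub> example in the text.

```python
def vibration_step_levels(start_grms, step_grms, chamber_max_grms):
    """Set points for a vibration step stress run.

    The run stops at the chamber maximum here; in practice it would stop
    earlier if the operating and destruct limits were found first.
    """
    if start_grms <= 0 or step_grms <= 0:
        raise ValueError("start and step must be positive")
    levels = []
    level = start_grms
    while level <= chamber_max_grms:
        levels.append(level)
        level += step_grms
    return levels


def combined_cycle_levels(vol_grms, n_cycles=5):
    """Vibration level for each thermal cycle of the combined profile.

    Each step equals the vibration operating limit divided by the number
    of thermal cycles (five in the text); the level is assumed to rise by
    one step per cycle.
    """
    step = vol_grms / n_cycles
    return [step * (i + 1) for i in range(n_cycles)]


# Recommended 5 G_rms start and increment; 50 G_rms maximum is hypothetical
print(vibration_step_levels(5, 5, 50))
# Worked example from the text: operating limit of 50 G_rms, five cycles
print(combined_cycle_levels(50))  # [10.0, 20.0, 30.0, 40.0, 50.0]
```

Each level in either schedule would be held for the dwell time given in the text (at least ten minutes for step stress, 15 minutes in the combined example) before the functional test is repeated.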
[13]

#### 3.2.5 Failure Analysis and Corrective Actions

After the operating limit or destruct limit is identified in each of the aforementioned steps, failure analysis is conducted and the root cause must be investigated and fixed whenever possible, so that the stress can be raised further to expose higher-level failures. If modifications cannot be made because the failure is not easily correctable, the sensitive areas can be kept at a lower stress level than the rest of the product, for example by using thermal barrier material and epoxy to reduce the temperature and vibration response, respectively. In HALT, most failures can be fixed at low or no cost because they are often simply due to incorrect component or material selection and design flaws. [13] [16]

#### 3.2.6 Benefits of HALT

Increased product reliability achieved during pre-production provides indirect cost savings by freeing up resources, as well as competitive advantages. For instance, R&D engineers can focus on developing the next product instead of carrying out rework and design adjustments on an existing one. Figure 11 illustrates the long-run cost savings and the reduction in failure rates. Figure 11 shows a step increase in spending rate in periods 7-9 because of HALT testing, but products can be launched earlier and at lower cost because economies of scale are reached much more quickly, owing to confidence in the robustness of the products. Long-term savings are illustrated by the shaded regions after period 10.

Figure 11: Spending rate against time: cost reductions on HALT [12]

# Chapter 4 - Preparation of Standard Operating Procedure

### 4.1 Overview - The Methodology

The basic approach to establishing a HALT standard operating procedure for reliability testing of PCBs was to work with a very robust board and create a standard operating procedure that would serve as a benchmark for screening other PCBs.
Waters uses over 200 different kinds of PCBs in its various product lines. The robustness of each PCB was characterized by analyzing its field failure data. The aim was to find a board with an extremely low failure rate (less than 1%). The board identified for creating the standard operating procedure was the plunger drive board 210000425 (Figure 12) used in Waters Alliance equipment (Figure 13). The field failure data for the Alliance indicate that the failure rate of this PCB is around 0.1%. In the subsequent chapters, this board is referred to as the 'Alliance PCB'.

Figure 12: Alliance plunger drive PCB

### 4.2 Functional Testing of the Alliance board

An important part of HALT is to establish the operating and destruct limits of the PCB with respect to both temperature and vibration. To do so it is imperative to know at what stage of the HALT step the PCB failed operationally and destructively. Hence it is very important to monitor the vitals of the equipment continuously during the entire HALT procedure. This is done by conducting functional tests that monitor the performance and functionality of the PCB.

Figure 13: The Waters Alliance separations module

In the case of the Alliance plunger drive PCB, this is done by connecting it with extended cables to its parent equipment, the Waters Alliance e2695 separations module (Figure 13), and monitoring the essential functions controlled by the PCB through HyperTerminal software. The main function of the Alliance PCB is to provide the drive electronics for the following (Figure 14):

- a pump with two step motors (Primary and Accumulator)
- a four-solvent Gradient Proportioning Valve
- a Plunger Seal Wash solenoid
- four Sparge Valves to provide helium sparging
- a 24 VDC fan, including detection of fan failure by monitoring a fan tachometer signal.
The Pump Driver PCB opto-isolates all input and output control signals from the host. On-board intelligence is provided by logic and state machines within the PLD. The Pump Driver PCB requires three power supplies with their grounds tied together at the power supply. The required voltages are +24 VDC, +5 VDC (VCC) and an analog +5 VDC.

# 4.2.1 Functional theory of operation

Figure 14: Functionality of the plunger drive PCB

### 4.2.1.1 The pump, primary and accumulator motors

The Pump Driver PCB provides a micro-step drive to both the Primary and Accumulator step motors. Inherent in a micro-step drive are two driven phases, one with a sine current waveform and the other with a cosine waveform. A change of direction is accomplished by inverting either the sine or the cosine. The micro-step resolution is determined by the sine and cosine values sent. For example, the full period (360 electrical degrees) of the sine and cosine represents four full motor steps. Dividing the 90 degrees of one full step by 10 and sending the sine and cosine current values for every 9 degrees executes a 1/10 micro-step. This is what is currently implemented in the 2690 and 690 products. However, since the sine and cosine generation is done by the host, any micro-step resolution may be used within the bandwidth of data transmission. The other component of the sine and cosine is the current amplitude, which is proportional to the torque produced by the motor. With the current gain of the current sense circuit, the maximum current amplitude is 4.13 amperes.

### 4.2.1.2 The Gradient Proportional Valve (4)

There are two modes of operation of the GPV drive: Slam Mode, used to open the GPV valve, and Hold Mode, used to hold the valve open. All four valves are normally closed with no power applied. The four solvent select signals, which come from the host CPU, go directly to four N-channel FETs, one for each respective GPV valve A, B, C or D.
These signals also go to the PLD, which contains logic to determine the proper time to slam or hold the valves. Slam is defined as applying 24 VDC to the selected valve until the current through it reaches 340 mA; the 24 VDC is then removed and 5 VDC is applied in its place to hold the valve open. Slam and hold modes are therefore determined by which P-channel FET is activated, which in turn determines what supply voltage is applied to the top of the valve coils, +24 VDC or +5 VDC. Two comparators monitor the current through the selected valve by measuring the voltage drop across a resistor. SLAMED is low when the valve current has reached 340 mA, while GPVFAIL is low only if the control logic has failed or the valve coil is shorted and the current is greater than 450 mA. The host or CPU commanding this board is responsible for ensuring that only one GPV valve is commanded at a time, and if GPVFAIL is low then any GPV actuation must be removed.

### 4.2.1.3 Plunger seal wash drive

The Plunger Seal Wash Drive uses two channels of an octal Darlington transistor array driver chip. The activation signal SEAL\_WASH\_ON comes directly from the host CPU. The Darlington transistors close the circuit between the Seal Wash solenoid and the +24 VDC supply, which activates the solenoid.

### 4.2.1.4 Four sparge valve drives

The Sparge Valve Drive uses four channels of an octal Darlington transistor array driver chip. Each channel drives one Sparge Valve. The activation signals are SPARGE\_A, SPARGE\_B, SPARGE\_C and SPARGE\_D, which come directly from the host CPU. The four valves are totally independent and may be selected in any combination. The Darlington transistors close the circuit between the Sparge Valve and the +24 VDC supply through a 300 ohm resistor, which drops 12 volts. The Sparge Valves have 12 volt coils and are normally closed when no power is applied.
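The valve-drive thresholds quoted in sections 4.2.1.2 and 4.2.1.4 can be checked with a small sketch. It is a simplified model for illustration, not the PLD logic itself: the function names, the `holding` latch argument and the decision to model only steady-state values are assumptions. The 340 mA, 450 mA, 24 V, 5 V, 12 V and 300 ohm figures come from the text.

```python
SLAM_THRESHOLD_MA = 340.0  # SLAMED goes low once the valve current reaches this
FAIL_THRESHOLD_MA = 450.0  # GPVFAIL goes low when the current exceeds this


def gpv_signals(valve_current_ma):
    """Return the (SLAMED, GPVFAIL) logic levels; both are active low."""
    slamed = 0 if valve_current_ma >= SLAM_THRESHOLD_MA else 1
    gpvfail = 0 if valve_current_ma > FAIL_THRESHOLD_MA else 1
    return slamed, gpvfail


def gpv_coil_supply_v(valve_current_ma, holding):
    """Coil supply voltage: 24 V while slamming, 5 V once in hold mode.

    `holding` latches hold mode so the model does not return to 24 V when
    the current falls after the switch to 5 V (a modelling assumption).
    """
    if holding or valve_current_ma >= SLAM_THRESHOLD_MA:
        return 5.0
    return 24.0


def sparge_coil_current_ma(supply_v=24.0, coil_v=12.0, series_ohm=300.0):
    """Current through the series resistor, which drops supply_v - coil_v."""
    return (supply_v - coil_v) * 1000.0 / series_ohm


print(gpv_signals(200.0))        # below the slam threshold
print(gpv_signals(460.0))        # above the fail threshold
print(sparge_coil_current_ma())  # 40.0
```

With the values from the text, the 300 ohm resistor dropping 12 volts implies a sparge coil current of 40 mA.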
#### 4.2.1.5 Fan

The fan has a tachometer output, which is simply an open-collector transistor capable of sinking 100 mA maximum. A normally operating fan emits a clock of 100 Hz. Should the fan slow down to less than 65% of its normal speed (65 Hz) or lock up, a failure is signaled. The failure signal returned to the host CPU is MTR\_FAIL high. This is a multiplexed signal used to identify one of three failures: Fan Fail, Primary Motor Failure or Accumulator Motor Failure. The host CPU should run through its failure diagnostics to determine which failure has occurred.

# 4.2.2 Failure Diagnostics using HyperTerminal software

Two signals returned to the host CPU identify failures: the first is GPV\_FAIL and the second is MTR\_FAIL.

#### 4.2.2.1 GPV failure

GPV\_FAIL is low when the current through one of the four GPV valves has exceeded 450 mA. This can happen if the control logic is not operational or if there is a short across the coil. If the signal GPV\_FAIL goes low, the host CPU must remove any commanded GPV valve signals (SOL\_SELA, SOL\_SELB, SOL\_SELC or SOL\_SELD). Once the command signal is removed, the fail signal GPV\_FAIL should go inactive (high). If it does not, there is a hard failure in the electronics.

To determine whether the control logic is at fault (all four valves not operating) or a single valve is defective, the following diagnostic procedure may be used. If GPV\_FAIL went high after the commanded signal was removed, the host CPU may activate each of the four GPV valves A, B, C and D one at a time. If only one of the valves causes GPV\_FAIL to go low, that valve is shorted; if more than one valve causes GPV\_FAIL to go low, a control logic problem may be assumed.
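The GPV diagnostic procedure above amounts to a small decision rule, sketched here for illustration. The function name, its arguments and the returned strings are hypothetical; the inputs stand for observations the host CPU would make, and no real firmware interface is implied.

```python
def diagnose_gpv(fail_clears_after_release, fail_low_per_valve):
    """Classify a GPV fault from host-side observations.

    fail_clears_after_release: True if GPV_FAIL returned high once all
        commanded valve signals were removed.
    fail_low_per_valve: dict mapping "A".."D" to True if GPV_FAIL went
        low when that valve alone was commanded.
    """
    if not fail_clears_after_release:
        # GPV_FAIL stuck low with no valve commanded
        return "hard failure in the electronics"
    faulty = [v for v in "ABCD" if fail_low_per_valve.get(v, False)]
    if not faulty:
        return "fault not reproduced"
    if len(faulty) == 1:
        return "valve " + faulty[0] + " shorted"
    # More than one valve trips GPV_FAIL: suspect the shared control logic
    return "control logic problem"


print(diagnose_gpv(True, {"A": False, "B": True, "C": False, "D": False}))
print(diagnose_gpv(True, {"A": True, "B": True, "C": True, "D": True}))
print(diagnose_gpv(False, {}))
```

The "fault not reproduced" branch is not described in the text; it is included only so the function returns a value for every input.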
# 4.2.2.1 Primary motor, accumulator motor or fan failure

MTR\_FAIL is an active-high signal that is multiplexed to represent three different failures: Primary Motor Fail, Accumulator Motor Fail or Fan Fail. If the Pump Driver develops a motor drive problem (e.g. a shorted FET) on either the Primary or the Accumulator motor drive, both motor drive circuits are disabled. The host CPU detects via the encoder that the motors are no longer moving, and MTR\_FAIL should be high. Because the motors have stopped, the failure is known to be either a Primary or an Accumulator Motor Failure; a Fan Fail is ruled out because a Fan Fail does not stop the motors. If there is a dual failure, however, the failure status cannot be determined reliably.

The following diagnostic procedure may be used to determine which motor has actually failed. The signal MTR\_PWR from the host CPU should have been high if the motors were moving when the failure occurred. The Pump Driver PCB has taken control and disabled the motors. The Pump Driver PCB PLD gives control of the motor that did not fail back to the host CPU after the host CPU toggles MTR\_PWR from high to low and back to high. Next, the host CPU tries to move both motors: the one that moves is fine, and the one that does not move caused the failure. If neither moves, both can be assumed to have failed.

Determining a Fan Failure is simple: if MTR\_FAIL is high and both motors are operational, a Fan Failure has occurred. In the case of a Fan Failure, the motors should not be run and MTR\_PWR should be brought low (inactive).

The diagnostics are run continuously throughout the entire HALT test via HyperTerminal, connected through the serial communication port (RS232). Commands are sent continuously from the computer through HyperTerminal to the CPU board to activate the signals on the plunger drive board.
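The MTR\_FAIL decision logic described in this section can be summarized in a short sketch. The function and argument names are hypothetical, and the sketch assumes a single fault; as noted in the text, a dual failure cannot be resolved this way.

```python
def diagnose_mtr_fail(mtr_fail_high, primary_moves, accumulator_moves):
    """Interpret the multiplexed MTR_FAIL signal.

    primary_moves / accumulator_moves: whether each motor moved when the
    host tried to drive it after toggling MTR_PWR high -> low -> high.
    Assumes a single fault is present.
    """
    if not mtr_fail_high:
        return "no failure flagged"
    if primary_moves and accumulator_moves:
        # Both motors run, so the remaining multiplexed cause is the fan
        return "fan failure"
    if not primary_moves and not accumulator_moves:
        return "both motors failed"
    if not primary_moves:
        return "primary motor failure"
    return "accumulator motor failure"


print(diagnose_mtr_fail(True, True, True))
print(diagnose_mtr_fail(True, False, True))
print(diagnose_mtr_fail(True, True, False))
```

After a fan failure is identified, the text calls for MTR\_PWR to be brought low so that the motors are not run; that action is outside this sketch.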
The operational limit was reached once the HyperTerminal output reported "hardware failure" or once any specific functionality could no longer be activated. Figure 17 shows a screenshot of the HyperTerminal software while it was performing the functional test.

Figure 15- HyperTerminal functional test output

# 4.3 Data Acquisition and Equipment Used

In order to execute the HALT process accurately, the following equipment was used in different settings to measure and optimize the product response.

# 4.3.1 Temperature Measurement

Within the HALT chamber there are two thermocouples: one, in the shape of a cylindrical steel bar, measures the process temperature, while the other measures the Device Under Test (DUT) temperature. For more details on the temperature measurement setup, refer to the thesis by Zhang [1].

#### 4.3.2 Vibration Response Measurement

A triaxial accelerometer (figure 18) was used to monitor the vibrational response of the fixture and the vibrational response at various locations on the PCB (when mounted on the fixture). The signal from the accelerometer was processed by a DAQ (Data Acquisition) device (figure 19). The DAQ was connected to a National Instruments device (figure 20) to output the vibration levels in the X, Y and Z directions on a computer through LabVIEW code (figure 21).

Figure 16- The triaxial accelerometer used for monitoring vibration. 3 wires (X, Y & Z) from the triaxial accelerometer connect to the DAQ

Figure 17- Data acquisition device. 3 wires (X, Y & Z) from the DAQ connect to the National Instruments DAQ

Figure 18- National Instruments DAQ.

Figure 19: LabVIEW output that measures vibration in X, Y & Z directions

# 4.4 Experimental Step and Analysis

In each experimental step, the operating limit and the destructive limit were identified.
# 4.4.1 Hot Step Stress

The hot step stress involved taking the Alliance PCB to extremely high temperatures in a stepped manner until operational failure was observed. The temperature at which the operational failure is observed is called the operational limit. The operational limit of the Alliance board was found to be 140 °C. For details of the experiments and results, refer to the thesis by Chun Ming Zhang [1].

# 4.4.2 Cold Step Stress

The cold step stress involved taking the Alliance PCB to extremely low temperatures in a stepped manner until operational failure was observed. The temperature at which the operational failure is observed is called the operational limit. The operational limit of the Alliance board was found to be -80 °C. For details of the experiments and results, refer to the thesis by Chun Ming Zhang [1].

#### 4.4.3 Thermal cycling

In thermal cycling, the PCB was exposed to a series of rapid thermal transitions between extremely high and extremely low temperatures. The upper temperature limit was set to 75 °C and the lower temperature limit to -75 °C. The dwell time at each extreme temperature was set to 15 minutes. The Alliance PCB was taken through five such cycles. No failures were observed in thermal cycling. For details of the experiments and results, refer to the thesis by Chun Ming Zhang [1].

#### 4.4.5 Vibration

#### 4.4.5.1 Vibration Fixture

The vibration fixture for the Alliance PCB consisted of four identical modules. Each module was cuboidal in shape and was machined from a 0.5-inch thick aluminum plate. The external dimensions of each module were 3" X 3" X 0.5". Four counterbores were made in each module to screw it down to the vibration table of the HALT chamber. In addition, one tapped hole was machined in each module to screw in the standoff for mounting the PCB. The module used for the Alliance PCB fixture is shown in figure 22.
Each module of the fixture was mounted to the vibration table using four 3/8"-16 screws torqued to 20 ft-pounds. Each module had a standoff screwed onto it. The spatial arrangement of the entire fixture is shown in figure 23. The PCB was mounted on the fixture using four M4 screws at the four corners (figure 24). For more details on the design of the fixtures, refer to the thesis by Dexter Chew [2].

Figure 20 – Module of the Alliance PCB fixture

Figure 21 – Alliance fixture used for vibration step stress and combined cycle

Figure 22 – Exploded view of the vibration step stress arrangement

# 4.4.5.2 Response Characterization of the Fixture

During vibration step-stress testing, the goal is to find vibration-related failures and to mechanically fatigue the PCB so that its weakest portion fails quickly. Hence, during the vibration step stress, accelerometers were placed on the PCB to evaluate the overall transmission of vibration to the PCB by the fixture. Experiments were conducted to measure the response of the PCB using accelerometers at various vibrational set points (G<sub>rms</sub>) of the HALT chamber. Since the HALT chamber vibration table provides energy in all three axes, a triaxial accelerometer was used for this purpose.

# 4.4.5.2.1 Experimental Setup

The experimental setup comprised a HALT chamber, the Alliance fixture, a triaxial accelerometer, National Instruments data acquisition equipment and the Alliance PCB. Each module of the fixture was mounted to the table of the HALT chamber using four 3/8"-16 bolts and washers, torqued to 20 ft-pounds (figure 25). The modules were arranged on the table so that the mounting holes on the PCB matched the standoffs on the fixture. The PCB was mounted onto the fixture using M4 screws, which were hand-tightened with a screwdriver. The triaxial accelerometer was mounted on the surface of the PCB using wax (figure 26).
The triaxial accelerometer was connected to the National Instruments data acquisition equipment. Data from this equipment was processed through LabVIEW code that output the vibration level in the x, y and z directions in G<sub>rms</sub>.

Figure 23 – Alliance fixture mounted on the HALT chamber vibration table

Figure 24 – The triaxial accelerometer mounted on the Alliance

Figure 25 – Vibration Step Stress experimental setup

#### 4.4.5.2.2 Location of Accelerometers

Since vibration cannot be characterized at every point on the PCB, a few points were selected on the PCB to measure the vibrational response. Vibration simulations at various frequencies were performed in SolidWorks, treating the PCB as a flat plate rigidly fixed at its four mounting points (figure 28). The points of maximum deflection were selected as the locations for placing the accelerometers. The three points selected are 1, 2 and 3, as shown in figure 29. The x, y and z directions shown are the reference for orienting the accelerometer.

Figure 26 - Vibration Simulation on the Alliance PCB

Figure 27 – Locations for placing the accelerometers

# 4.4.5.2.3 Experimental Results

# **Experiment 1: Location 1**

The variation of the vibration level at Location 1, as read from the accelerometer, with the vibration set point of the chamber is given in Table 1.
Figure 30 shows the graphical variation of vibration at location 1 with the set point vibration of the HALT chamber.

| S.No | Vibration Level of Chamber (G<sub>rms</sub>) | Response X (G<sub>rms</sub>) | Response Y (G<sub>rms</sub>) | Response Z (G<sub>rms</sub>) |
|------|------|------|------|------|
| 1 | 10 | 7 | 7.7 | 2 |
| 2 | 20 | 13 | 19 | 23 |
| 3 | 30 | 19 | 23 | 35 |
| 4 | 40 | 26 | 30 | 42 |
| 5 | 50 | 33 | 38 | 60 |
| 6 | 60 | 42 | 44 | 45 |
| 7 | 70 | 49 | 50 | 48 |
| 8 | 80 | 58 | 52 | 50 |

Table 1 - Vibration set point of HALT chamber and the response of the Alliance PCB at location 1

Figure 28: Graphical representation of vibrational variation at location 1 on the PCB. Note that the vibration response is around 50% of the input vibration. The spikes on the graph of the Z-axis vibration response are due to fluctuations observed in the G<sub>rms</sub> readings from the accelerometers.

# **Experiment 2: Location 2**

The variation of the vibration level at Location 2, as read from the accelerometer, with the vibration set point of the chamber is given in Table 2. Figure 31 shows the graphical variation of vibration at location 2 with the set point vibration of the HALT chamber.

| S.No | Vibration Level of Chamber (G<sub>rms</sub>) | Response X (G<sub>rms</sub>) | Response Y (G<sub>rms</sub>) | Response Z (G<sub>rms</sub>) |
|------|------|------|------|------|
| 1 | 10 | 4.9 | 4.9 | 5.8 |
| 2 | 20 | 9.7 | 10.5 | 11 |
| 3 | 30 | 15 | 16 | 24 |
| 4 | 40 | 19 | 22 | 31 |
| 5 | 50 | 22 | 27 | 40 |
| 6 | 60 | 25.5 | 32 | 34 |
| 7 | 70 | 29 | 37 | 38 |
| 8 | 80 | 33 | 43 | 45 |

Table 2 - Vibration set point of HALT chamber and the response of the Alliance PCB at location 2

Figure 29: Graphical representation of vibrational variation at location 2 on the PCB. Note that the vibration response is around 50% of the input vibration.
The spikes on the graph of the Z-axis vibration response are due to fluctuations observed in the $G_{rms}$ readings from the accelerometers.

# **Experiment 3: Location 3**

The variation of the vibration level at Location 3, as read from the accelerometer, with the vibration set point of the chamber is given in Table 3. Figure 32 shows the graphical variation of vibration at location 3 with the set point vibration of the HALT chamber.

| S.No | Vibration Level of Chamber (G<sub>rms</sub>) | Response X (G<sub>rms</sub>) | Response Y (G<sub>rms</sub>) | Response Z (G<sub>rms</sub>) |
|------|------|------|------|------|
| 1 | 10 | 5 | 8 | 7 |
| 2 | 20 | 9.5 | 15 | 17 |
| 3 | 30 | 14 | 24 | 21 |
| 4 | 40 | 18.5 | 32 | 26 |
| 5 | 50 | 23 | 39 | 42 |
| 6 | 60 | 27 | 47 | 37 |
| 7 | 70 | 32 | 54 | 40 |
| 8 | 80 | 34.5 | 58 | 45 |

Table 3 - Vibration set point of HALT chamber and the response of the Alliance PCB at location 3

Figure 30: Graphical representation of vibrational variation at location 3 on the PCB. Note that the vibration response is around 50% of the input vibration. The spikes on the graph of the Z-axis vibration response are due to fluctuations observed in the G<sub>rms</sub> readings from the accelerometers.

Based on the results of the response experiments, the vibration level measured on the PCB is roughly half of the set point vibration of the HALT table. Thus it can be concluded that the transmissibility of the fixture is around 50%.

# 4.4.5.3 Vibration Step Stress

# 4.4.5.3.1 Vibration Step Stress Profile

The vibration step stress profile used for the vibrational loading test of the Alliance PCB is shown in figure 33. The vibration test starts at 10 $G_{rms}$ and increases in increments of 10 $G_{rms}$ until the vibration value reaches 90 $G_{rms}$, which is the HALT chamber limit for vibration.
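The step levels just described can be enumerated programmatically. The following is a minimal sketch under stated assumptions: `step_stress_schedule` is a hypothetical helper, the 15-minute dwell is the value stated for this experiment, and the transition time between levels is ignored.

```python
def step_stress_schedule(start, step, limit, dwell_min):
    """Return (start_minute, level_grms) pairs for a step-stress profile.

    Hypothetical helper: levels run from `start` to `limit` inclusive in
    increments of `step`, each held for `dwell_min` minutes. Ramp time
    between levels is neglected (an assumption of this sketch).
    """
    levels = range(start, limit + 1, step)
    return [(i * dwell_min, level) for i, level in enumerate(levels)]


# Values taken from the profile described in the text.
schedule = step_stress_schedule(start=10, step=10, limit=90, dwell_min=15)
total_minutes = len(schedule) * 15  # duration if no failure stops the test

for t, g in schedule:
    print(f"t = {t:3d} min: {g} Grms")
```

Under these assumptions the profile has nine levels, so a run that reaches the chamber limit without a failure lasts 135 minutes of dwell time.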
The dwell time selected at each $G_{rms}$ value was 15 minutes, and the temperature of the chamber was set to 25 °C throughout the experiment.

Figure 31 – Vibration Step Stress Profile

# 4.4.5.3.2 Experimental Setup and Procedure

The experimental setup comprised a HALT chamber, the Alliance fixture, Waters Alliance equipment and the Alliance PCB. Each module of the fixture was mounted to the table of the HALT chamber using four 3/8"-16 bolts and washers, torqued to 20 ft-pounds. The modules were arranged on the table so that the mounting holes on the PCB matched the standoffs on the fixture. The PCB was mounted onto the fixture using M4 screws, which were hand-tightened with a screwdriver. The PCB was connected to the Waters Alliance equipment (located outside the chamber) through a series of extended cables. The Waters Alliance equipment was connected to a computer running the HyperTerminal software. HyperTerminal was used to continuously monitor the vitals of the equipment and run functional tests. Figure 34 and figure 35 show the experimental setup.

Figure 32 – Vibration step stress experimental setup

Figure 33 – Vibration step stress experimental setup

# 4.4.5.3.3 Experimental Results

The operational limit of the Alliance board was found to be 60 G<sub>rms</sub>. At this value, HyperTerminal showed a hardware failure, and the error message 'solvent delivery hardware failure' (figure 37) was observed on the equipment's screen. When the vibration was stopped and the equipment restarted, however, the PCB started functioning normally. When the vibration was increased to 70 G<sub>rms</sub>, the equipment failed again, this time destructively. This was verified using a separate functional test fixture (figure 38). Figure 39 depicts the error message observed at the destructive limit of the Alliance PCB – 'GPV valve drive failed'.
The vibration step stress was performed on five different Alliance PCBs, and consistent failure modes were observed at 70 G<sub>rms</sub>, i.e. the same error message was observed after the vibration step stress experiment.

Figure 34 – Vibration Step Stress Profile

Figure 35 – Error message observed during vibrational step stress

Figure 36 - Alliance PCB functional test fixture

# 4.4.6 Combined Thermal Cycling and Vibration

# 4.4.6.1 Combined Cycle Profile

The combined cycle consisted of stressing the PCB through a combination of thermal and vibrational stresses. The thermal stress was induced by cycling the temperature from -60 degrees C to +80 degrees C. These temperature limits were set by decreasing the thermal operational limits of the Alliance board by 10 degrees C. The dwell time at each extreme was set to 15 minutes, and the ramp rate was set to 35 degrees per minute. For vibration, each step was set equal to the step stress operational limit divided by 5. Thus vibration starts at 12 G<sub>rms</sub> and increases to 60 G<sub>rms</sub> in steps of 12 G<sub>rms</sub>. Hence, the combined cycle consists of 5 thermal cycles with vibration increasing in steps from one cycle to the next. The combined cycle used for the Alliance board is shown in figure 40.

Figure 38 – Combined Cycle HALT profile

# 4.4.6.2 Experimental Setup and Procedure

The experimental setup and procedure were the same as those used for the vibration step stress experiment. The experiment was conducted on two Alliance boards separately to verify the consistency of the result.

Figure 39 - Combined Cycle HALT profile

# 4.4.6.3 Experimental Results

Capacitor breakage was observed in the 4<sup>th</sup> cycle, when the vibration level was 50 G<sub>rms</sub>. The same result was replicated on another PCB at around the same stress level. The actual combined cycle profile is shown in figure 41. Figure 42 shows the capacitor that broke during the combined cycle HALT testing.
Figure 43 shows a microscopic image of the leads of the capacitor at the point of fracture. From the image it was inferred that the failure was due to fatigue stress.

Figure 40 - Capacitor breakage observed during combined cycle of 2100000425 PCB

Figure 41 - Microscopic image of the fractured surface

# Chapter 5 – Validation of the Standard Operating Procedure

#### 5.1 Introduction

Since the standard operating procedure for implementing and performing HALT was prepared by extensively testing the Alliance board, it was important to apply the same procedure to another board to validate that it is effective in precipitating and detecting failures on the variety of boards that Waters equipment uses. The reason for choosing the Alliance board was its high field reliability, i.e. its very small number of on-field failures. The failure data revealed that the failure rate of the Alliance board between 2011 and 2015 was about 0.1%. The idea was that a standard operating procedure prepared using the Alliance board would set a high standard for all other PCBs to meet. The PCB used for validating the HALT standard operating procedure should have a higher field failure rate (>1%), i.e. it must be a less reliable and less robust board.

# 5.2 Selection of Acquity PCB

The PCB selected for validating the standard operating procedure was the 210000422 – Binary Solvent Manager Pump Driver Board (figure 44). This board is used in a different class of Waters equipment called Acquity. The following sections discuss in detail the reasons for selecting the Acquity board.

Figure 44 – Acquity PCB selected for validating the HALT SOP

### 5.2.1 Ease of testing and root cause analysis

Though HALT enables us to find the weak links in a design, the most important reason for performing HALT is to be able to redesign the product to eliminate those weak links.
An important step in this process is root cause analysis. Performing the root cause analysis of a failed PCB can be tedious because of the complexity and the number of minute electrical connections and components. Thus most companies, including Waters, use customized functional and in-circuit test fixtures to identify failure modes and troubleshoot them. But since the PCBs used in Waters equipment are manufactured in Singapore, most of these fixtures are located there, and we had access to only a limited number of PCB testing fixtures. Hence the first step was to identify the PCBs for which Waters had both in-circuit and functional test fixtures at its Milford, USA facility. The list of such PCBs is shown in Table 4.

| S.no | Part No | Description |
|------|-----------|-------------------------------------------|
| 1 | 210000114 | PCB - Fluidic Driver 2790 |
| 2 | 210000150 | PCB - 2690 Heater Cooler |
| 3 | Acquity | PCB - Acquity Binary Pumps |
| 4 | 210000319 | ASSY, PCB Alliance CPU w/ACV and Ethernet |
| 5 | 210000339 | PCB - Prep Pump |
| 6 | 210000411 | PCB - EALLIANCE_CPU_ROHS |
| 7 | 210000444 | PCB - Alliance Heater Cooler 2690 |

Table 4 - List of PCBs for which both in-circuit and functional test fixtures were available

#### 5.2.2 Field Failure Data

The next step in selecting the PCB was to identify the board from the list with the highest on-field failure rate. Based on the data obtained from Waters, the PCB with the highest failure rate under warranty was the 210000422 PCB, with around 8% failures. In addition, a large amount of data on the field failure modes of the 210000422 PCB was available from the service reports. Figure 45 shows the most prominent failures observed in the field, obtained by analyzing over a hundred service reports. In subsequent sections the 210000422 PCB is referred to as the Acquity PCB.
Our aim through the HALT methodology was to identify the operational limits and to precipitate the same failures, or identify the same failure modes, as observed in the field. It was also predicted that the operational margin of the Acquity board would be smaller than that of the Alliance board. Thus validation was performed on the PCBs and the results were compared.

Figure 45: On field failure modes of the Waters Acquity Binary Solvent Manager

# 5.2 Functional testing of the Acquity PCB

The Waters Acquity Binary Solvent Manager (figure 46) contains four pumps: Primary pump A, Accumulator pump A, Primary pump B and Accumulator pump B. These pumps are controlled by two Acquity PCBs, designated PCB A and PCB B. Each PCB controls one accumulator pump and one primary pump. Figure 47 shows the arrangement of pumps and PCBs inside the Binary Solvent Manager.

Figure 42 - The Binary Solvent Manager that uses Acquity PCB

Figure 43: Arrangement of Pumps and PCBs inside the Binary Solvent Manager

The Binary Solvent Manager can mix two solvents, A and B, in any required proportion and pump the mixture through the system at any required flow rate. Waters has proprietary software called 'CONSOLE' that is used to monitor the flow rate and pressure output by each of the four pumps, in addition to the overall system flow rate and pressure. Figure 48 shows a screenshot of the user interface of the Console.

Figure 44: Screen shot of the user interface of the Console

All the HALT testing was done on the 210000422 board B. The Console was used to continuously monitor the flow rate and pressure, i.e. to carry out the functional tests.
## 5.3 Experimental setup

Figure 45 – Experimental setup for testing Acquity board

The experimental setup for validating the HALT standard operating procedure consisted of the HALT chamber, the Waters Acquity Binary Solvent Manager equipment, the Acquity PCB, a set of connection cables connecting the PCB to the equipment and a fixture for mounting the PCB (figure 49). The Acquity PCB was connected to the Binary Solvent Manager using a set of 14 extended connection cables so that the PCB could sit inside the chamber while still being connected to the equipment. The Console was used to perform functional tests on the Acquity board. Since the Acquity is a pump driver board, the Console monitored the pressure and flow rate of the accumulator and primary pumps in the equipment to check for any deviation from expected behavior while the Acquity was stressed inside the HALT chamber.

# 5.4 Experiments – Acquity PCB

# 5.4.1 Hot Step Stress

In the Hot Step Stress, the PCB was subjected to elevated temperatures in a stepped manner. The experiment was started at 20 °C and the temperature was increased in steps of 20 °C. The PCB was soaked at each step for 15 minutes. The flow rate of the system was set to 1 ml/min for the entire experiment, with 50% from pump A and 50% from pump B. The Console was used to monitor the pressure of the primary pump of the Waters Acquity equipment in real time. The temperature was increased until the operational limit of the PCB was reached. The profile used for the HALT hot step stress is shown in figure 50.

Figure 46: Hot step stress profile

Figure 47 - Console output showing pressure fluctuation during Hot Step Stress

Figure 51 shows the output of the Console as observed during the Hot Step Stress experiment. Rapid pressure fluctuations were observed for both the A and B accumulator pumps, as well as at the system level, at 90 °C.
When the temperature was brought down to 80 °C, the system pressure went back to normal. Thus the UOL (Upper Operational Limit) was found to be 90 °C. The same experiment was repeated at flow rates of 0.5 ml/min and 1.25 ml/min to check the dependence of the upper operating limit on the system flow rate. Operational failure was again observed at 90 °C (figure 52).

Figure 48 – Console output showing pressure fluctuation during Hot Step Stress

Figure 49 – Console output showing pressure fluctuation during Hot Step Stress

At 0.5 ml/min and 1.25 ml/min, operational failure was observed at the same temperature of 90 °C (figure 53). Thus the UOL of the Acquity board was fixed at 90 °C.

# 5.4.2 Cold Step Stress

In the Cold Step Stress, the PCB was subjected to extremely low temperatures in a stepped manner. The experiment was started at 0 °C and the temperature was decreased in steps of 20 °C. The PCB was soaked at each step for 15 minutes. The flow rate of the system was set to 1 ml/min for the entire experiment, with 50% from pump A and 50% from pump B. The Console was used to monitor the pressure of the primary pump of the Waters Acquity equipment in real time. The temperature was decreased until the operational limit of the PCB was reached. The profile used for the cold step stress is shown in figure 54.

Figure 50: Cold step stress profile

Figure 51 – Console output showing pressure fluctuation during Cold Step Stress

Figure 55 shows the output of the Console as observed during the Cold Step Stress experiment. Rapid pressure fluctuations were observed for both the A and B accumulator pumps, as well as at the system level, at -90 °C. When the temperature was raised back to -80 °C, the system pressure went back to normal. Thus the LOL (Lower Operational Limit) was found to be -90 °C.

## 5.4.3 Thermal Cycling

In thermal cycling, the PCB was exposed to a series of rapid thermal transitions between extremely high and extremely low temperatures.
The upper temperature limit was set to 75 °C and the lower temperature limit to -75 °C. The dwell time at each extreme temperature was set to 15 minutes. The PCB was taken through five such cycles. The profile used for thermal cycling is shown in figure 56.

Figure 52: Thermal cycling profile

No failures were observed during the thermal cycling.

## 5.4.3 Vibration Step Stress

## 5.4.3.1 Vibration Step Stress Profile

The vibration step stress profile used for the vibrational loading test of the Acquity PCB is the same as that of the Alliance board and is shown in figure 57. The vibration test was started at 10 $G_{rms}$ and increased in increments of 10 $G_{rms}$ until either the vibration value reached 90 $G_{rms}$, which is the HALT chamber limit for vibration, or an operational failure was observed. The dwell time selected at each $G_{rms}$ value was 15 minutes, and the temperature of the chamber was set to 25 °C throughout the experiment.

Figure 53 – HALT profile for vibration step stress

#### 5.4.3.2 Vibration Fixture

The vibration fixture for the Acquity PCB consisted of five identical modules. Each module was cuboidal in shape and was machined from a 0.5-inch thick aluminum plate. The external dimensions of each module were 3" X 3" X 0.5". Four counterbores were made in each module to screw it down to the vibration table of the HALT chamber. In addition, several tapped holes were machined in each module to screw in the standoffs for mounting the PCB. These tapped holes were added to make the modules identical, avoiding the need for different modules for the same PCB. The module used for the Acquity fixture is shown in figure 58. For more details on the fixture design, refer to the thesis by Dexter Chew [2].

Figure 54 – Module of the Acquity PCB fixture

Each module of the fixture was mounted to the vibration table using four 3/8"-16 screws torqued to 20 ft-pounds. Each module had standoffs screwed onto it at different locations (figure 59).
Figure 55 – Standoffs provided on the Acquity fixture (five module configuration)

The spatial arrangement of the entire fixture is shown in figure 60 and figure 61.

Figure 56 – Acquity PCB mounted on the fixture

Figure 57 – Acquity PCB mounted on the fixture

The PCB was mounted on the fixture using four M4 screws at the four corners and one M3 screw at the center (figure 62).

Figure 58 – M4 & M3 screws for mounting the PCB on the standoffs of the Acquity fixture

# 5.4.3.3 Experimental Results

The vibrational operating limit was observed to be 50 $G_{rms}$. Consistent operational failures were observed at this level. The error message observed on the Console was 'Pump B: Pump Motor lost synchronization'. At this point the pumps automatically turned off, which led to a loss of pressure in the entire system (figure 63). Hence the vibration operational limit was fixed at 50 $G_{rms}$.

Figure 59 – Console output showing operational failure of Acquity PCB during vibration step stress

## 5.4.3.4 Vibrational step stress – 4 module configuration of the fixture

Though the standard operating procedure calls for a fixture that simulates the end-use environment, the literature also recommends, in some cases, mounting the PCB with boundary conditions that differ from those of its end-use application in order to precipitate additional failure modes [18]. Thus vibration step stress was also carried out on the Acquity PCB using a four-module configuration of the fixture (figure 64). The same vibration step stress profile was used.

Figure 60: 4 module configuration for vibration step stress

Operational failure was observed at a much lower level than with the five-module configuration of the fixture. At the 3rd step of the profile, i.e.
at 30 G<sub>rms</sub>, a Pump B overpressure error was observed (figure 65).

Figure 61: Operational failure observed at 30 Grms for four module fixture configuration

# 5.4.5 Combined Cycle

# 5.4.5.1 Combined Cycle Profile

The combined cycle consisted of stressing the PCB through a combination of thermal and vibrational stresses. The thermal stress was induced by cycling the temperature from -75 degrees C to +75 degrees C. These temperature limits were set by decreasing the thermal operational limits of the Acquity board by 15 degrees C. The dwell time at each extreme was set to 15 minutes, and the ramp rate was set to 35 degrees per minute. For vibration, each step was set equal to the step stress operational limit divided by 5. Thus vibration starts at 10 G<sub>rms</sub> and increases to 50 G<sub>rms</sub> in steps of 10 G<sub>rms</sub>. Hence, the combined cycle consists of 5 thermal cycles with vibration increasing in steps from one cycle to the next. The combined cycle used for the Acquity board is shown in figure 66.

Figure 62: Combined Cycle profile for Acquity

# 5.4.5.2 Combined cycle experimental results

Operational failure was observed at the beginning of the 4<sup>th</sup> cycle, when the vibration level was 40 $G_{rms}$. The error message observed on the Console was 'Pump B – Pump motor lost synchronization'.

Figure 63: Error message observed during combined cycle Acquity

After the operational failure, the combined cycle experiment was continued for another cycle. Capacitor breakage was observed in the 5<sup>th</sup> cycle, when the vibration level was 50 $G_{rms}$. Figure 68 shows the capacitor that broke during the combined cycle HALT testing.

Figure 64: Capacitor breakage observed during combined cycle of Acquity PCB

#### 5.5 Determinants of HALT success

For a HALT test to be successful, the closed-loop corrective action process must be followed. If failure analysis is not carried through to root cause, the benefits of HALT are lost.
To be effective, results must be:

- Fed back to design to make a circuit design change, select a different supplier, or improve the existing supplier's process.
- Fed back to manufacturing for a process change.
- Used to determine the production test profile.

Another key to success is the close involvement of members of several key areas. These include:

- Management: Management must allocate sufficient resources, time, and funds for HALT testing to take place. They must also provide support during the failure analysis phase so that closed-loop corrective action happens in a timely fashion.
- Development / Engineering: Design Engineering needs to be immediately available to troubleshoot failures, which can sometimes be beyond the scope of the HALT test engineers.
- Suppliers: Suppliers must be willing and able to provide component failure analysis in order to obtain root cause.

Other key factors for success include the placement of HALT in the product development timeline, sample size, the perceived relevance of failures, and failure reporting. A HALT test should begin as soon as hardware and software are available and stable. The testing should use as many units as possible, as the probability of uncovering a defect increases with sample size. Failures that occur during testing should be treated as relevant and pursued to root cause. Finally, failure reporting must be visible enough to prevent the failures and solutions gleaned from HALT from being overlooked.

## 5.5 Cost of Implementing HALT

For HALT to be cost effective, its cost must be less than the anticipated benefits. HALT requires destruction of a few prototypes at a critical stage, and prototype build is a leading cost item in product development. There are also other costs, such as the manpower required to conduct HALT, the depreciation of test equipment, consumables, and corrective actions. The costs involved in HALT are as follows.
- Cost of PCBs: The HALT standard operating procedure calls for testing 3-5 PCBs to establish operating and destructive margins with a certain amount of confidence. HALT requires destruction of a few prototypes at a critical stage, and prototype build is a leading cost item in product development.
- Cost of Fixture: As per the thesis by Dexter Chew [2], different sizes of PCBs require customized fixtures. In general, most boards are supported at a minimum of four points. Hence in most cases at least a four-module fixture configuration will be required. These modules are made of 0.5 inch to 1 inch stainless steel plates and can be expensive. There are also the added costs of screws and washers.
- Cost of Liquid Nitrogen: Liquid nitrogen is continuously consumed by the HALT chamber during HALT testing, especially during the Cold Step Stress, Thermal Cycling and Combined Cycle steps. As per our estimate, a total of 4-5 cylinders is required to take one PCB through the HALT process. Hence, for testing 3-5 PCBs, one might require anywhere between 12 and 25 cylinders of liquid nitrogen.
- Cost of other Consumables: Other consumables include insulated copper cables, connectors, crimps, heat shrink tubes, etc. that are required to prepare the cables so that the unit under test (UUT) can be isolated from the equipment and kept inside the HALT chamber.
- Cost of Manpower: The HALT process requires the continuous engagement of a technician and/or a reliability engineer to carry out the tests effectively and document the findings. Hence the salary of that person is also included in the operational costs.

#### 5.6 Cost benefits of Implementing HALT

The following are the cost benefits of implementing HALT:

- Lower Warranty Costs: Warranty cost is one of the main overhead costs that a company bears on account of less reliable products.
An increase in reliability through HALT leads to a decrease in the field failure rate, which reduces the warranty costs associated with replacing the product.
- Lower Inventory Costs: As the reliability of the product increases, the number of field failures decreases. As a result, the company needs to hold fewer spares in its warehouse for replacement, which frees up warehouse space and lowers inventory storage costs.
- Lower Costs Associated with Product Design Iterations: Usually a product, especially a PCB, has to go through a number of design iterations or engineering changes during its life cycle. With the implementation of HALT, the number of design iterations or engineering changes will decrease drastically, as the weak links and potential failure modes will be identified early in the product development process. The following costs associated with an engineering change will be saved:
  - (1) Scrap costs
  - (2) Rework costs
  - (3) Cost of new parts
  - (4) Cost of redesign/engineering
- Lower Product Development Costs: With the implementation of HALT, the product development time decreases significantly. New products can be brought to market much faster, and thus the resources spent per product will decrease, resulting in great financial benefit for the company.
- Increased Market Reputation: Implementation of HALT also provides intangible benefits apart from the direct cost benefits. An increase in product reliability improves the reputation of the company, which is crucial in a highly competitive market. HALT gives a company an edge over its competitors.

# Chapter 6: Conclusion and Future Work

Today the reliability of electronics is an important factor of success in the highly competitive global markets.
The modern consumer will not tolerate products that do not perform reliably, or as promised or advertised. If a manufacturer does not design its products with reliability and quality in mind, someone else certainly will. In addition, customer dissatisfaction with a product's reliability can have disastrous financial consequences for the manufacturer. A company's reputation is very closely related to the reliability of its products, and while it takes a long time to build up a reputation for reliability, only one malfunctioning product can brand the whole company as unreliable.

Manufacturers and designers of electronic products constantly face new challenges while trying to cope with consumers' and the market's continuous demands for new features and improved performance in a smaller size. The component sizes, I/O (input/output) pitches, and line widths of printed circuit boards (PCBs) have been decreasing rapidly, and the increasing packaging density is causing manufacturing and reliability challenges. It is generally recognized that the reliability of electronics is to a large extent determined by the ability of electrical interconnections to withstand various loadings during the product's operational lifetime. Reliability testing is needed to provide assurance that designs and products are reliable and durable in service, and due to today's tight time-to-market requirements and shortening product cycles, these tests should be conducted in a timely fashion.

Highly Accelerated Life Testing (HALT) is one of the most popular stress testing methodologies used for determining and increasing product reliability. HALT is currently used by most major manufacturing organizations to improve product reliability in a variety of industries, including electronics, computer, medical, and military.
This thesis concentrates on the creation and validation of a standard operating procedure for implementing Highly Accelerated Life Testing (HALT) to increase the reliability of PCBs used in various Waters equipment. Waters uses over 200 different types of PCBs in its wide range of products, so ensuring good reliability of its PCBs is of high importance to Waters as a company. Another important point is that Waters currently does not perform root cause analysis on PCBs that fail in the field. The failed PCB is simply swapped for a new one and disposed of. Under such circumstances, implementing HALT to improve the reliability of PCBs is critical for Waters to stay cost competitive. At present Waters is losing money on warranty costs, as some of the PCBs have a failure rate of over 5%. Also, as mentioned earlier, it is not only about the money but also about the reputation of Waters as a reliable brand.

Contrary to its name, HALT is not a lifetime test: it is not intended to estimate lifetimes or mean times to failure, but rather to find the weakest locations on the product with extreme time compression. In HALT the loadings are stepped up to well beyond the expected field environments of the test specimen. The loadings are not meant to simulate the field environments at all but to find the weak links in the design in a very short timescale. Flaws and weaknesses are eliminated until the fundamental limit of technology is reached, but only those failures that are likely to occur in normal field conditions are addressed.

The fundamental stresses that are applied during HALT are thermal (high and low temperature) and vibration. These stresses are applied individually as well as in combination to precipitate failures and to determine the product's operational and destructive limits. The upper and lower operating limits indicate the stress levels at which the product ceases to function.
However, if the upper or lower destruction levels are not reached, the product can be restored to normal operation by decreasing the stress level to the specification limits. The upper and lower destruction limits (also referred to as endurance limits) indicate the stress levels at which the product fails catastrophically and cannot be restored.

The basic approach towards establishing a HALT standard operating procedure for reliability testing of PCBs at Waters was to work with a very robust board and create a standard operating procedure that would serve as a benchmark for screening other PCBs. The robustness of each PCB was characterized by analyzing the field failure data of various PCBs. The aim was to find a board with an extremely low failure rate (less than 1%). The board identified for creating the standard operating procedure was the Alliance plunger drive board used in Waters Alliance equipment.

Extensive accelerated life testing was done on the Alliance PCB by subjecting it to thermal and vibration stresses through the Hot Step Stress, Cold Step Stress, Thermal Cycling, Vibration Step Stress, and Combined Cycle profiles, and the operating and destructive limits were identified. The upper operating limit for temperature was found to be 140°C and the lower operating limit was found to be -80°C. The Alliance passed all five cycles of thermal cycling. Under vibration loading, the operating limit was found to be 60 G<sub>rms</sub> and the destructive limit was found to be 70 G<sub>rms</sub>. During the combined cycle, destructive failure was observed in the form of breakage of the capacitor in the 4<sup>th</sup> cycle, i.e. at 50 G<sub>rms</sub>. All the vibration and combined cycle tests were done on custom-designed fixtures. For more detail on the fixture design, refer to Dexter's thesis [2]. The values of the operating and destructive limits were used to characterize the Alliance, i.e.
they define the stress limits that the board can sustain. A standard operating procedure for HALT was prepared from the experiments and testing done on the Alliance PCB.

Since the standard operating procedure was developed by extensively testing the Alliance board, it was important to apply the same procedure to another board to validate that it is effective in precipitating and detecting failures on the variety of boards that Waters equipment uses. The PCB used for validation was the Acquity PCB, the pump driver board for the Waters Acquity Binary Solvent Manager. The Acquity PCB had a much higher field failure rate (>5%). In addition, we had access to specific failure modes in terms of the errors observed by customers in the field, and our expectation going into testing of the Acquity was to precipitate those exact failures using the standard operating procedure. It was also predicted that the operational margins of the Acquity obtained through HALT would be lower than those of the Alliance, as the latter has a lower failure rate and is more reliable.

Accelerated life testing was done on the Acquity PCB using the standard operating procedure by subjecting it to thermal and vibration stresses through the Hot Step Stress, Cold Step Stress, Thermal Cycling, Vibration Step Stress, and Combined Cycle profiles, and the operating limits were identified. The upper operating limit for temperature was found to be 90°C and the lower operating limit was found to be -90°C. The Acquity passed all five cycles of thermal cycling. Under vibration loading, the operating limit was found to be 50 G<sub>rms</sub>.
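
The step-stress operating limits reported so far for the two boards can be put side by side with a short script. This is only an illustrative sketch: the class and function names (`OperatingLimits`, `compare`) and the idea of summarizing the temperature limits as a single "thermal window" are assumptions made here for the comparison, not part of the standard operating procedure. The numerical values are the ones quoted in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingLimits:
    """Operating limits quoted in the text for one board (hypothetical container)."""
    board: str
    upper_temp_c: float    # upper operating limit, degrees C
    lower_temp_c: float    # lower operating limit, degrees C
    vibration_grms: float  # vibration operating limit, Grms

    @property
    def thermal_window_c(self) -> float:
        """Width of the temperature range between the two operating limits."""
        return self.upper_temp_c - self.lower_temp_c


# Values as reported in this chapter.
ALLIANCE = OperatingLimits("Alliance", 140.0, -80.0, 60.0)
ACQUITY = OperatingLimits("Acquity", 90.0, -90.0, 50.0)


def compare(reference: OperatingLimits, candidate: OperatingLimits) -> dict:
    """Return candidate-minus-reference differences for each limit."""
    return {
        "upper_temp_c": candidate.upper_temp_c - reference.upper_temp_c,
        "lower_temp_c": candidate.lower_temp_c - reference.lower_temp_c,
        "thermal_window_c": candidate.thermal_window_c - reference.thermal_window_c,
        "vibration_grms": candidate.vibration_grms - reference.vibration_grms,
    }


if __name__ == "__main__":
    # Negative values mean the Acquity figure is below the Alliance figure.
    for name, delta in compare(ALLIANCE, ACQUITY).items():
        print(f"{name}: {delta:+.0f}")
```

With these values, the Acquity's upper operating limit is 50°C lower than the Alliance's, its thermal operating window is 40°C narrower (180°C versus 220°C), and its vibration operating limit is 10 G<sub>rms</sub> lower. Its lower temperature limit, by contrast, is 10°C colder than that of the Alliance.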
During the combined cycle, operational failure was observed during the 4<sup>th</sup> cycle at 40 G<sub>rms</sub>, and destructive failure was observed in the form of breakage of the capacitor in the 5<sup>th</sup> cycle, i.e. at 50 G<sub>rms</sub>. All the vibration and combined cycle tests were done on custom-designed fixtures. For more detail on the fixture design, refer to Dexter's thesis [2]. In addition, we were able to precipitate the exact failure modes observed by customers in the field. Some of the errors observed were: pump motor lost synchronization, pump homing error, pump over-pressure error, and pump inter-communication hardware failure.

Upon comparison of the HALT results for the Alliance plunger drive board and the Acquity pump driver board, the results were as predicted. As mentioned in the previous paragraphs, the operating limits of the Acquity board were lower than those of the Alliance board. This result is in direct correlation with the field failure data obtained for these two boards. While the Alliance has extremely high reliability with a failure rate of only 0.1%, the Acquity is much less reliable with a 5% failure rate under warranty.

The next step would be to decrease the field failure rate of the Acquity board by increasing its reliability. The reliability can be increased by raising the operational limits of the Acquity board in both temperature and vibration and bringing them as close to those of the Alliance board as possible. This can be done by working in conjunction with the R&D engineers, first performing root cause analysis to find the reason for the operational failure (the weakest link). This weakest link has to be eliminated to increase the operating limit.

HALT can be implemented in two possible situations. One is for validating a new board being developed by Waters, i.e. a new PCB in its alpha or beta phase. The second is to validate existing printed circuit boards in order to improve their reliability.
It is highly recommended that Waters implement HALT in both situations. Fortunately, Waters already has a HALT chamber in its reliability engineering department; the chamber is an expensive piece of equipment and is a primary concern for companies looking to implement HALT. Thus the added operational cost of implementing HALT is almost negligible compared to the equipment cost, and the benefits of implementing HALT outweigh its cost.

It is also highly recommended that Waters start bringing failed PCBs back to its facility for root cause analysis and identification of the common field failure modes. This will help the company better understand how the equipment reacts to specific failures of the PCB. Not only will it give great insight into the behavior of the products, but it will also aid in successfully completing the cycle of HALT implementation.

#### **Bibliography**

- [1] C. M. Chang, "Developing Highly Accelerated Life Test (HALT) Method to Improve Product Robustness and Shorten Development Cycle," MIT, Cambridge, 2016.
- [2] D. Chew, "Standard Operating Procedure for Highly Accelerated Life Testing (HALT): Design and Standardization of Fixture Setup for Printed Circuit Boards," MIT, Cambridge, 2016.
- [3] B. Dodson and D. Nolan, *Reliability Engineering Handbook*, New York, 2002.
- [4] J. Saleh and K. Marais, "Highlights from the Early History of Reliability Engineering," *Reliability Engineering and System Safety*, vol. 91, no. 2, pp. 249-256, 2006.
- [5] P. O'Connor, "Series Foreword," in *Wiley Series in Quality and Reliability Engineering*, Wiley, England, 1994, pp. xvii-xix.
- [6] E. Suhir, "Accelerated Life Testing (ALT) in Microelectronics and Photonics," *Journal of Electronic Packaging*, vol. 124, pp. 281-292, 2002.
- [7] U.S. Department of Defense, *MIL-STD-721C Definition of Terms for Reliability and Maintainability*, Washington DC, 1981.
- [8] ISO, "Quality Management and Quality Assurance," ISO 8402:1994, Geneva, Switzerland, 1994.
- [9] R. Fries, *Reliable Design of Medical Devices*, Boca Raton, Florida: CRC/Taylor & Francis, 2006.
- [10] P. D. T. O'Connor, "Chapter 9: Electronic Systems Reliability," in *Practical Reliability Engineering*, 5th ed., John Wiley & Sons, Ltd, 2012, pp. 225-258.
- [11] P. O'Connor, "Component Types and Failure Mechanisms," in *Practical Reliability Engineering*, John Wiley & Sons, Ltd, 2012, p. 237.
- [12] H. W. McLean, *HALT, HASS, and HASA Explained*, Milwaukee: American Society for Quality, 2009.
- [13] M. Silverman, "Summary of HALT and HASS Results at an Accelerated Reliability Test Center," in *Proceedings Annual Reliability and Maintainability Symposium*, 1998.
- [14] H. W. McLean, "The Importance of High-Reliability Products at Market Introduction How and Why to do a HALT," in *HALT, HASS, and HASA Explained*, ASQ Quality Press, 2009, pp. 1-35.
- [15] B. Aytekin, "Vibration Analysis of PCBs and Electronic Components," The Graduate School of Natural and Applied Sciences, Middle East Technical University, 2008.
- [16] Espec Corp., "Espec Technology Report No. 17," 2004.
- [17] A. D. B. H. M. O. D. Barker, "CALCE Pb-free Solder Interconnect Reliability Modeling Activities," University of Maryland, 2005.
- [18] H. W. McLean, *HALT, HASS, and HASA Explained: Accelerated Reliability Techniques*, Milwaukee, Wisconsin: ASQ Quality Press, 2009.
- [19] H. W. McLean, "Comparisons of Products With and Without HALT," in *HALT, HASS, and HASA Explained: Accelerated Reliability Techniques*, Revised ed., ASQ Quality Press, 2009, pp. 7-9.
- [20] N. Meadowsong and E. L. Kyser, "Economic Justification of HALT Tests: The Relationship Between Operating Margin, Test Costs, and the Costs of Field Returns," in *Proc. IEEE/CPMT Workshop on AST*, 2002.
- [21] P. D. T.
O'Connor, "Chapter 3: Life Data Analysis and Probability Plotting," in *Practical Reliability Engineering*, 5th ed., John Wiley & Sons, Ltd, 2012, p. 71.
- [22] [Online]. Available: http://www.keysight.com/en/pd-1756491-pn-34972A/lxi-data-acquisition-data-logger-switch-unit?cc=US&lc=eng.