Reliability engineering


Reliability engineering is a sub-discipline of systems engineering that emphasizes the ability of equipment to function without failure. Reliability is defined as the probability that a product, system, or service will perform its intended function adequately for a specified period of time; or will operate in a defined environment without failure. Reliability is closely related to availability, which is typically described as the ability of a component or system to function at a specified moment or interval of time.
The reliability function is theoretically defined as the probability of success. In practice, it is calculated using different techniques, and its value ranges between 0 and 1, where 0 indicates no probability of success while 1 indicates definite success. This probability is estimated from detailed analysis, previous data sets, or through reliability testing and reliability modeling. Availability, testability, maintainability, and maintenance are often defined as a part of "reliability engineering" in reliability programs. Reliability often plays a key role in the cost-effectiveness of systems.
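As an illustration (and only an illustration: the constant-failure-rate model and the numbers below are assumptions for the sketch, not something prescribed by this article), the reliability function is often written R(t) = P(T > t), the probability that the time to failure T exceeds t. Under a constant failure rate λ this becomes R(t) = exp(−λt), and steady-state availability can be approximated as MTBF / (MTBF + MTTR). A minimal Python sketch:

    import math

    def reliability(failure_rate: float, hours: float) -> float:
        # R(t) = exp(-lambda * t): probability of surviving `hours` without failure,
        # under the assumed constant-failure-rate (exponential lifetime) model.
        return math.exp(-failure_rate * hours)

    def steady_state_availability(mtbf: float, mttr: float) -> float:
        # A = MTBF / (MTBF + MTTR): long-run fraction of time the item is operable.
        return mtbf / (mtbf + mttr)

    # Illustrative numbers only (hypothetical, not taken from the article):
    lam = 1e-4  # failures per hour, i.e. MTBF = 10,000 hours
    print(round(reliability(lam, 1000.0), 3))                 # 0.905 -> ~90.5% chance of surviving 1,000 h
    print(round(steady_state_availability(1 / lam, 8.0), 4))  # 0.9992 with an 8-hour mean time to repair

Both values fall between 0 and 1, as the definition above requires; in practice λ (or a more general lifetime distribution) is estimated from analysis, testing, and field data rather than assumed.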
Reliability engineering deals with the prediction, prevention, and management of high levels of "lifetime" engineering uncertainty and risks of failure. Although stochastic parameters define and affect reliability, reliability is not achieved by mathematics and statistics alone. "Nearly all teaching and literature on the subject emphasize these aspects and ignore the reality that the ranges of uncertainty involved largely invalidate quantitative methods for prediction and measurement." For example, it is easy to represent "probability of failure" as a symbol or value in an equation, but in practice failure depends on so many interacting variables that its true magnitude is almost impossible to predict; having an equation for reliability is therefore not the same as having an accurate predictive measurement of reliability.
Reliability engineering relates closely to quality engineering, safety engineering, and system safety, in that they use common methods for their analysis and may require input from each other. It can be said that a system must be reliably safe.
Reliability engineering also addresses the costs of failure, including system downtime, spare parts, repair equipment, personnel, and warranty claims.

History

The word reliability can be traced back to 1816 and is first attested in the poetry of Samuel Taylor Coleridge. Before World War II the term was linked mostly to repeatability; a test was considered "reliable" if the same results would be obtained repeatedly. In the 1920s, product improvement through the use of statistical process control was promoted by Dr. Walter A. Shewhart at Bell Labs, around the time that Waloddi Weibull was working on statistical models for fatigue. The development of reliability engineering here followed a path parallel to that of quality. The modern use of the word reliability was defined by the U.S. military in the 1940s, characterizing a product that would operate when expected and for a specified period.
In World War II, many reliability issues were due to the inherent unreliability of electronic equipment available at the time, and to fatigue issues. In 1945, M.A. Miner published a seminal paper titled "Cumulative Damage in Fatigue" in an ASME journal. A main application for reliability engineering in the military was for the vacuum tube as used in radar systems and other electronics, for which reliability proved to be very problematic and costly. The IEEE formed the Reliability Society in 1948. In 1950, the United States Department of Defense formed a group called the "Advisory Group on the Reliability of Electronic Equipment" (AGREE) to investigate reliability methods for military equipment. This group recommended three main ways of working:
  • Improve component reliability.
  • Establish quality and reliability requirements for suppliers.
  • Collect field data and find root causes of failures.
In the 1960s, more emphasis was given to reliability testing at the component and system levels. The famous military standard MIL-STD-781 was created at that time. Around this period, the much-used predecessor to military handbook 217 was also published by RCA and was used for the prediction of failure rates of electronic components. The emphasis on component reliability and empirical research alone slowly decreased, and more pragmatic approaches, as used in the consumer industries, came into use. In the 1980s, televisions were increasingly made up of solid-state semiconductors. Automobiles rapidly increased their use of semiconductors with a variety of microcomputers under the hood and in the dash. Large air conditioning systems developed electronic controllers, as did microwave ovens and a variety of other appliances. Communications systems began to adopt electronics to replace older mechanical switching systems. Bellcore issued the first consumer prediction methodology for telecommunications, and SAE developed a similar document, SAE870050, for automotive applications. The nature of predictions evolved during the decade, and it became apparent that die complexity wasn't the only factor that determined failure rates for integrated circuits.
Kam Wong published a paper questioning the bathtub curve (see also reliability-centered maintenance). During this decade, the failure rate of many components dropped by a factor of 10. Software became important to the reliability of systems. By the 1990s, the pace of IC development was picking up. Wider use of stand-alone microcomputers was common, and the PC market helped keep IC densities following Moore's law, doubling about every 18 months. Reliability engineering was now changing as it moved towards understanding the physics of failure. Failure rates for components kept dropping, but system-level issues became more prominent, and systems thinking became more and more important. For software, the Capability Maturity Model (CMM) was developed, which gave a more qualitative approach to reliability. ISO 9000 added reliability measures as part of the design and development portion of certification.
The expansion of the World Wide Web created new challenges of security and trust. The older problem of too little reliable information available had now been replaced by too much information of questionable value. Consumer reliability problems could now be discussed online in real time using data. New technologies such as micro-electromechanical systems, handheld GPS, and hand-held devices that combine cell phones and computers all represent challenges to maintaining reliability. Product development time continued to shorten through this decade, and what had been done in three years was being done in 18 months. This meant that reliability tools and tasks had to be more closely tied to the development process itself. In many ways, reliability has become part of everyday life and consumer expectations.

Overview

Reliability is the probability of a product performing its intended function under specified operating conditions in a manner that meets or exceeds customer expectations.

Objective

The objectives of reliability engineering, in decreasing order of priority, are:
  1. To apply engineering knowledge and specialist techniques to prevent or to reduce the likelihood or frequency of failures.
  2. To identify and correct the causes of failures that do occur despite the efforts to prevent them.
  3. To determine ways of coping with failures that do occur, if their causes have not been corrected.
  4. To apply methods for estimating the likely reliability of new designs, and for analysing reliability data.
The reason for this order of priority is that preventing or reducing failures in the first place is by far the most effective way of working, in terms of minimizing costs and generating reliable products. The primary skills that are required, therefore, are the ability to understand and anticipate the possible causes of failures, and knowledge of how to prevent them. It is also necessary to know the methods that can be used for analyzing designs and data.

Scope and techniques

Reliability engineering for "complex systems" requires a different, more elaborate systems approach than for non-complex systems. Reliability engineering may in that case involve:
  • System availability and mission readiness analysis and related reliability and maintenance requirement allocation
  • Functional system failure analysis and derived requirements specification
  • Inherent design reliability analysis and derived requirements specification for both hardware and software design
  • System diagnostics design
  • Fault tolerant systems
  • Predictive and preventive maintenance
  • Human factors / human interaction / human errors
  • Manufacturing- and assembly-induced failures
  • Maintenance-induced failures
  • Transport-induced failures
  • Storage-induced failures
  • Use studies, component stress analysis, and derived requirements specification
  • Software failures
  • Failure / reliability testing
  • Field failure monitoring and corrective actions
  • Spare parts stocking
  • Technical documentation, caution and warning analysis
  • Data and information acquisition/organisation
  • Chaos engineering
Effective reliability engineering requires understanding of the basics of failure mechanisms, which in turn requires experience, broad engineering skills, and good knowledge from many different specialist fields of engineering.
Reliability may be defined in the following ways:
  • The idea that an item is fit for a purpose
  • The capacity of a designed, produced, or maintained item to perform as required
  • The capacity of a population of designed, produced or maintained items to perform as required
  • The resistance to failure of an item
  • The probability that an item will perform a required function under stated conditions
  • The durability of an object