Often, as a reliability engineer, or as anyone responsible for researching or calculating the reliability of an item, you will find oversimplified published data that gives the impression that reliability is an unchanging physical property like mass or volume, something intrinsic to the materials the item is made of. This runs against the common sense approach: we know an old thing is less reliable than a new copy of the same thing. But this common sense gets argued out of us when we are faced with reconciling tables of MTBF (mean time between failures) values, nines (e.g. 0.99999, a measure of reliability), failure rates and other figures. Let’s get back to the common sense approach, but with math.
So, what do we already know? We know that older machines are generally less reliable than newer machines. This phenomenon is generally called wearout (even though some things, like electronics, don’t wear out in the traditional sense). The possible exception is that something brand new that has never been tested might be less reliable than one that has been operating for a little while. This phenomenon is called infant mortality. The message is that the reliability of an item changes over time.
So, when we say or read that an item has an MTBF of 8,000 hours, that value can only refer to a specific period of time. Whether it is a point value or an average value may not be obvious without further information.
If we use the Weibull failure distribution, we can represent these mechanisms with an equation that also shows how the failure rate or MTBF changes over time for a given item. The Weibull failure rate equation lets us calculate the failure rate over time, given the values of the characteristic life (eta) and the shape factor (beta).
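For reference, the standard two-parameter Weibull failure rate (hazard) function, assumed here to be the form the text refers to, can be written as follows, where t is operating time, η (eta) is the characteristic life and β (beta) is the shape factor:

```latex
h(t) = \frac{\beta}{\eta} \left( \frac{t}{\eta} \right)^{\beta - 1}, \qquad t \ge 0
```

In this form, a shape factor below 1 gives a failure rate that falls over time (infant mortality), a shape factor of exactly 1 gives a constant failure rate, and a shape factor above 1 gives a failure rate that rises over time (wearout).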
If we apply some reasonable numbers to that equation, what do our results look like? For a characteristic life of 9,500 hours, and a shape factor of 2.4, we will see the following curve with a steadily increasing failure rate.
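The numbers behind such a curve can be sketched in a few lines of Python. This is a minimal illustration assuming the standard two-parameter Weibull failure rate formula; the function name `weibull_failure_rate` and the sample times are chosen here for illustration only.

```python
ETA = 9500.0   # characteristic life in hours (value from the text)
BETA = 2.4     # shape factor (value from the text)


def weibull_failure_rate(t, eta=ETA, beta=BETA):
    """Instantaneous Weibull failure rate at time t, in failures per hour."""
    return (beta / eta) * (t / eta) ** (beta - 1)


# Sample times are arbitrary illustration points, not taken from the text.
for hours in (1000, 2500, 5000, 7500, 10000):
    rate = weibull_failure_rate(hours)
    print(f"{hours:>6} h  {rate:.3e} failures/h")
```

Because the shape factor is greater than 1, each printed rate is higher than the one before it, which is the steadily increasing failure rate described above.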
Further, we can plot the observed MTBF (the value one could calculate by counting failed units to date and accumulated operating time) over the same time scale to produce this curve, which shows the MTBF declining from near infinite at the start to a value of around 3,500 hours near the end.
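A declining MTBF curve of this kind can be approximated with a short Python sketch. It assumes one simple reading: the MTBF at time t is taken as the reciprocal of the instantaneous Weibull failure rate. The exact calculation behind the plotted curve is not given, so this is an assumption, and a value computed from actual failure counts and accumulated hours would differ in detail. The function name `instantaneous_mtbf` is hypothetical.

```python
ETA = 9500.0   # characteristic life in hours (value from the text)
BETA = 2.4     # shape factor (value from the text)


def instantaneous_mtbf(t, eta=ETA, beta=BETA):
    """Reciprocal of the Weibull failure rate at time t (assumed MTBF reading).

    t must be greater than zero: the failure rate is zero at t = 0 when
    beta > 1, so the reciprocal is unbounded there.
    """
    rate = (beta / eta) * (t / eta) ** (beta - 1)
    return 1.0 / rate


# Sample times are arbitrary illustration points, not taken from the text.
for hours in (100, 1000, 5000, 9500, 10500):
    print(f"{hours:>6} h  MTBF ~ {instantaneous_mtbf(hours):,.0f} h")
```

Under this assumption the value is very large at early times and falls steadily as the item ages, which matches the general shape described above.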
So, what does it mean when we see a constant value for failure rate or MTBF printed somewhere? Essentially, the publisher assumed that the reliability is constant over a certain period of time. To use that information, however, we need to know what period of time they were referring to; otherwise we risk greatly underestimating or overestimating the reliability of our system.
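To see how much the chosen period matters, here is a hedged Python sketch that averages the Weibull failure rate over a time window and converts it to an equivalent constant MTBF. It assumes the same illustrative parameters as before (characteristic life 9,500 hours, shape factor 2.4) and uses the Weibull cumulative hazard, (t/eta)^beta. The function names and the two windows are hypothetical choices for illustration, not a published method.

```python
ETA = 9500.0   # characteristic life in hours (value from the text)
BETA = 2.4     # shape factor (value from the text)


def cumulative_hazard(t, eta=ETA, beta=BETA):
    """Weibull cumulative hazard H(t) = (t / eta) ** beta."""
    return (t / eta) ** beta


def equivalent_mtbf(t_start, t_end, eta=ETA, beta=BETA):
    """Constant MTBF that matches the average failure rate over a window."""
    avg_rate = (cumulative_hazard(t_end, eta, beta)
                - cumulative_hazard(t_start, eta, beta)) / (t_end - t_start)
    return 1.0 / avg_rate


# Two illustrative windows of equal length: early life and late life.
for start, end in ((0, 2000), (8000, 10000)):
    print(f"{start:>5}-{end:<5} h  equivalent MTBF ~ {equivalent_mtbf(start, end):,.0f} h")
```

The same item yields a far larger equivalent MTBF in the early window than in the late one, so a single published figure is only meaningful alongside the period it was derived from.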