The Myth of the Random Error

As presented at the 1998 National Conference of Standards Laboratories Workshop & Symposium

Paper Author: Dr. Henrik S. Nielsen

Abstract

Man views phenomena as random when he does not understand the underlying mechanisms. The emphasis on statistical tools for uncertainty estimation and a lack of knowledge of physics drive our focus towards the apparently random properties of errors.

The paper demonstrates the arbitrary nature of the distinction between systematic and random error. It proposes that there is no reason to believe that any error is random. Finally, it concludes that a thorough analysis of the mechanisms that govern variations in measurements, integrated into the GUM [1] method, can not only yield an estimate of the uncertainty but also help improve it.

Introduction

Traditionally we have divided errors into systematic and random components. Anything we could explain, such as a temperature influence, as well as errors that followed a certain pattern and looked systematic, was characterized as systematic error. Anything else was considered random error.

This allowed us to use statistical tools to predict certain aspects of the behavior of the random component of the error. We could find the standard deviation to describe the magnitude of the error and we could perform F-tests or t-tests to convince ourselves that the error was indeed random.

What we ignored, although it was there all along, was that the harder we looked at a measuring process and the more resources we put into understanding it, the more errors started to appear systematic to us.

In this paper we will look at the errors found in one measuring process and show how they can be interpreted using different tools. We will see that the only logical explanation is that all errors are systematic; they only appear random when our information is limited or our sampling is not dense enough.

The repeatability study

The measuring process we are considering is that of measuring a two-point size. Table 1 gives the value of the observed deviation from nominal size in microns for 60 individual measurements.

Observation number = row label + column label

Row   1   2   3   4   5   6   7   8   9  10
  0  10  10  10   9   9  10  11  12  12  11
 10  10  10  11  12  12  12  11  10  11  11
 20  12  12  11  10  10  10  11  11  11  10
 30   9   9   9  10  10  10   8   8   8   8
 40   9   9   8   7   7   7   8   9   9   8
 50   7   7   8   9   9   9   8   7   8   8

Table 1: Observed deviation from nominal size in micrometers.

There are different techniques that can be used to find the standard deviation of the sample. The traditional Gage Repeatability and Reproducibility (GR&R) study from the Measurement Systems Analysis Reference Manual [2], for example, uses the ranges of each subset of the observations to derive it. Had the 60 observations represented 2 measurements of each of 10 parts by 3 different observers, the 6σ "repeatability" or "instrument error" would have been assessed to be 7.8 µm and the "reproducibility" or "observer error" 8.1 µm, yielding a total GR&R of 11.3 µm. This is the value the automotive industry uses as a measure of how capable a measuring process is.
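In the MSA manual's method, the total GR&R combines repeatability (equipment variation) and reproducibility (appraiser variation) as a root sum of squares. A minimal sketch of that step, using the 6σ values quoted above; with the rounded 7.8 µm and 8.1 µm as inputs it gives about 11.2 µm, so the 11.3 µm presumably reflects unrounded components (an assumption, since the intermediate values are not given).

```python
import math


def total_grr(repeatability_um: float, reproducibility_um: float) -> float:
    """Total GR&R as the root sum of squares of the two components."""
    return math.hypot(repeatability_um, reproducibility_um)


# 6-sigma values from the text, in micrometres (as rounded in the paper)
EV_UM = 7.8  # "repeatability" / "instrument error"
AV_UM = 8.1  # "reproducibility" / "observer error"

grr = total_grr(EV_UM, AV_UM)
print(f"GR&R = {grr:.2f} um")
```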

Note that the true value never enters into this analysis; we are purely using the instrument's ability to yield consistent values as a measure of goodness.

If our analysis were a little more sophisticated and we used calibrated parts for our experiment, we could look at the measurement error by subtracting the calibrated size X from the observed value Y to get the error E.

E=Y-X

If the values in Table 1 represented the measurement error (the deviation from the calibrated value), the average error would be 9.5 µm, which is what we would traditionally have called the systematic error.
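The average can be checked directly from Table 1. A short sketch, assuming (as the text does) that each tabulated value is a measurement error E = Y - X in micrometres:

```python
from statistics import mean

# Table 1, one list per row of ten observations (micrometres)
TABLE_1 = [
    [10, 10, 10, 9, 9, 10, 11, 12, 12, 11],
    [10, 10, 11, 12, 12, 12, 11, 10, 11, 11],
    [12, 12, 11, 10, 10, 10, 11, 11, 11, 10],
    [9, 9, 9, 10, 10, 10, 8, 8, 8, 8],
    [9, 9, 8, 7, 7, 7, 8, 9, 9, 8],
    [7, 7, 8, 9, 9, 9, 8, 7, 8, 8],
]

errors = [e for row in TABLE_1 for e in row]  # all 60 values of E
bias = mean(errors)                           # the traditional "systematic error"
print(f"n = {len(errors)}, bias = {bias:.2f} um, range {min(errors)}..{max(errors)} um")
```

The same data also show the narrow +7 µm to +12 µm spread that the GR&R study is based on.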

Based on this analysis we now have a systematic error (9.5 µm) and a random error (11.3 µm). We would like an overall measure of how wrong our measurement can be. One technique for doing this is to use the formula:

W = B + 3σ

where W is "how wrong we can be" (a measure conceptually equivalent to uncertainty), B is the systematic error or bias, and 3σ is one half of the 6σ GR&R value.

With B = 9.5 µm and 3σ = 11.3 µm / 2 ≈ 5.65 µm, this gives W ≈ 15.2 µm as a measure of how wrong we can be.
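The same arithmetic as a small function; the name `how_wrong` is illustrative, and the 6σ convention for the GR&R value is taken from the text above.

```python
def how_wrong(bias_um: float, grr_6sigma_um: float) -> float:
    """W = B + 3*sigma, where 3*sigma is half of a 6-sigma GR&R value."""
    three_sigma = grr_6sigma_um / 2.0
    return bias_um + three_sigma


w = how_wrong(bias_um=9.5, grr_6sigma_um=11.3)
print(f"W = {w:.2f} um")
```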

Stability

If we look at the same measuring process over a little longer timescale to evaluate the drift or stability of the process, we may use a series of observations over a 24-hour period. These observations may look as follows:

Figure 1: Observed values over a 24 hour period.

To analyze this data, we use engineering judgement to interpret what the data means. A typical way of doing this is to draw some arbitrary smooth curve through the data and decide that this line represents the systematic error. The deviation between each data point and the curve then becomes the random error.

The curve in Figure 1 is such a curve. It is fitted by judging that there is something inexplicably wrong with the 3rd and 6th data points and then fitting a sine curve through the remaining data points.

We can then take the difference between the curve and each data point to find what we consider the random error in this model. This is shown in Figure 2. If we again disregard the 3rd and 6th data points, we find a standard deviation of 1.24 µm for the rest of the population. Expressed as a 6σ value, comparable to the GR&R figure, this is 7.44 µm.
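The curve-fitting step can be sketched as an ordinary least-squares fit. This sketch assumes the sine has a fixed 24-hour period (the paper does not say how its curve was fitted) and uses hypothetical hourly readings with a 1 µm offset and a 10 µm amplitude, chosen to echo the +11 µm to -9 µm range, since the data behind Figure 1 are only plotted. The rejected points are passed in explicitly, which makes the engineering judgement visible in the code.

```python
import math
import statistics

PERIOD_H = 24.0  # assumption: the fitted sine has a fixed 24-hour period


def fit_daily_sine(times_h, values):
    """Least-squares fit of y = c + a*sin(w t) + b*cos(w t) with w fixed.

    With the period fixed, the model is linear in (c, a, b), so the 3x3
    normal equations (X^T X) p = X^T y can be solved directly.
    """
    w = 2 * math.pi / PERIOD_H
    rows = [(1.0, math.sin(w * t), math.cos(w * t)) for t in times_h]
    # Augmented normal-equation matrix [X^T X | X^T y]
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, values))]
        for i in range(3)
    ]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, 3):
            factor = m[k][col] / m[col][col]
            for j in range(col, 4):
                m[k][j] -= factor * m[col][j]
    # Back substitution
    p = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        p[i] = (m[i][3] - sum(m[i][j] * p[j] for j in range(i + 1, 3))) / m[i][i]
    c, a, b = p
    return lambda t: c + a * math.sin(w * t) + b * math.cos(w * t)


def stability_analysis(times_h, values, rejected):
    """Fit the curve to the points not judged to be fliers.

    Returns the fitted curve and the standard deviation of the residuals,
    i.e. what the stability study calls the random error.
    """
    kept = [(t, y) for i, (t, y) in enumerate(zip(times_h, values)) if i not in rejected]
    curve = fit_daily_sine([t for t, _ in kept], [y for _, y in kept])
    residuals = [y - curve(t) for t, y in kept]
    return curve, statistics.stdev(residuals)


# Hypothetical hourly readings: 1 um offset plus a 10 um daily sine
hours = list(range(24))
readings = [1.0 + 10.0 * math.sin(2 * math.pi * t / PERIOD_H) for t in hours]
readings[2] += 8.0  # hypothetical disturbance at the 3rd point
readings[5] -= 6.0  # hypothetical disturbance at the 6th point

curve, sigma = stability_analysis(hours, readings, rejected={2, 5})
print(f"curve(06:00) = {curve(6):.2f} um, residual sigma = {sigma:.3f} um")
```

Because the hypothetical readings are an exact sine apart from the two disturbed points, the residual σ here is essentially zero; with real data it plays the role of the 1.24 µm found above.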

So, all in all, we have a systematic error that varies roughly between +11 µm and -9 µm, a random error of 7.44 µm (6σ) and two inexplicable events that do not fit the model, which we usually refer to as fliers or outliers. Adding 3σ of the random error (one side of the distribution, 3.72 µm) to the worst-case systematic error of 11 µm gives a worst-case uncertainty of 14.72 µm at about 99.87 % one-sided coverage, disregarding the fliers and assuming that the rest of the observations follow a normal distribution.

Figure 2: The random error, interpreted as the difference between each individual observation and the fitted line representing the systematic error.
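The 7.44 µm and 14.72 µm figures of the stability study follow from the 1.24 µm residual standard deviation and the 11 µm worst-case systematic error. A sketch of that arithmetic, which also evaluates the one-sided coverage of a normal distribution at 3σ with the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

SIGMA_UM = 1.24             # residual standard deviation from the stability study
WORST_SYSTEMATIC_UM = 11.0  # largest excursion of the fitted curve

six_sigma = 6 * SIGMA_UM                          # comparable to the 6-sigma GR&R
uncertainty = WORST_SYSTEMATIC_UM + 3 * SIGMA_UM  # one side of the distribution
one_sided_coverage = NormalDist().cdf(3.0)        # P(X < mean + 3 sigma)

print(f"6 sigma = {six_sigma:.2f} um")
print(f"uncertainty = {uncertainty:.2f} um")
print(f"one-sided coverage at 3 sigma = {one_sided_coverage:.2%}")
```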

What is important to notice is that although we have put a lot of numbers around our model and done a lot of analysis, we still do not even begin to understand the cause of the variation we are seeing. We do not know whether the variation is representative of what the measuring system will do over time, nor whether it is liable to change and, if so, what will make it change. What we have come up with is a pseudo-explanation of our measurement system.

The GUM approach

The GUM approach [1, 3] is more analytical than either of the previous approaches. It starts with an analysis of the variations in the influence factors that are the root causes of the variation in the measuring system, and propagates those variations through the laws of physics into variations that can be observed in the values measured by the measuring system.

Assume we are measuring the diameter of parts coming off a production machine. The parts come off the machine at a more or less constant temperature, since they are immersed in cutting fluid. The production machine runs 24 hours per day. We measure the parts on a gage that sits off to the side of the machine by a west-facing window. If we want to use the GUM approach to estimate the uncertainty of that measuring process, we have to start by identifying the factors that may cause variation in the results of the measurements.

We identify these factors as the following:

Gage temperature

Temperature is the limiting factor in most dimensional measurements. In this case we expect the gage temperature to peak in the afternoon and early evening, when the sun is on the gage and the overall temperature in the shop is rising. When the gage is warm and the workpiece temperature is constant, the gage will see the workpieces as smaller than they really are.

If the temperature of the cutting fluid that determines the temperature of the workpieces is equal to the average temperature of the gage, then we can model the influence caused by the variation in gage temperature over a 24-hour period as a sinewave.

Figure 3 shows this influence "translated" into microns using the laws of thermal expansion. The amplitude of the sinewave is 10 µm, which corresponds to a temperature variation of about ±5 °C if the part diameter is 200 mm.

Figure 3: Gage temperature influence.

Part Temperature

The part temperature is to a large extent governed by the temperature of the cutting fluid. In this particular situation, the fluid comes from a central reservoir in the plant and is shared by a number of machines, not all of which run 24 hours per day. Since the volume of fluid is large, the temperature of the fluid changes only slowly and a sinewave is once again a good model for the variation.

Figure 4 shows the variation in cutting fluid temperature over a 24-hour period "translated" into microns using the laws of thermal expansion. The amplitude of the sinewave is 2 µm. This corresponds to a temperature variation of about ±1 °C if the part diameter is 200 mm.

Figure 4: Part temperature influence, the part temperature is governed by the temperature of the cutting fluid.
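Both conversions above use linear thermal expansion, ΔL = α · L · ΔT. The paper does not state the expansion coefficient α; the sketch below assumes α = 10 × 10⁻⁶ per kelvin, which is the value implied by the quoted 10 µm for ±5 °C and 2 µm for ±1 °C at a 200 mm diameter.

```python
ALPHA_PER_K = 10e-6  # assumed expansion coefficient, implied by the text's numbers


def thermal_deviation_um(length_mm: float, delta_t_k: float,
                         alpha_per_k: float = ALPHA_PER_K) -> float:
    """Linear thermal expansion dL = alpha * L * dT, returned in micrometres."""
    return alpha_per_k * length_mm * delta_t_k * 1000.0  # mm -> um


gage_amplitude = thermal_deviation_um(200.0, 5.0)  # gage temperature, +/- 5 degC
part_amplitude = thermal_deviation_um(200.0, 1.0)  # part temperature, +/- 1 degC
print(f"gage: {gage_amplitude:.1f} um, part: {part_amplitude:.1f} um")
```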

The Operator

The operator influence is in many cases the hardest one to quantify. It is also the one about which it is hardest to be honest with ourselves. If we are studying a measuring system and encounter a bad reading, we want to disregard it, rationalizing that it will not happen to a well-trained operator in the real measuring situation.

This is, of course, self-deception. Gage misreadings happen at least as often in production measurements as they do during gage studies; we just do not know about them.

If we try to model the operator influence, including the small variations caused by the way the operator puts the parts in the gage and an occasional misreading of the gage, the resulting influence may look as shown in Figure 5. The variations are modeled as a sinewave. This should not be taken to imply that this is a typical shape for this kind of variation; the operator influence can rarely be described by a simple equation, but the simple sinewave model serves to illustrate the magnitude of this influence relative to the other ones.

An offset is included in the influence, modeling a slight difference between the way the operator uses the gage and the way the laboratory technician who sets up and calibrates the gage uses it.

Finally one bad reading is included in the operator influence.

Figure 5 shows the operator influence over a 24-hour period. The influence is shown as continuous, meaning: "If the operator were measuring at this time, this would be his influence." Although the operator is not measuring continuously, it is easy to envision that the operator offset will have a finite value at any given time during the 24-hour period.

Figure 5: Operator influence; contains the combined effect of a ±1 µm variation, an offset of 0.1 µm and a scale misreading of 8 µm.

Digitization/Resolution

The effect of limited resolution, be it in the form of a digital display or the operator's ability to resolve the scale, is interrelated with the operator influence as well as the other effects. In this example a resolution of 1 µm is used. The resolution influence is calculated by adding up all the other influences, rounding the sum to the nearest micron and taking the difference between the rounded and the unrounded values.

As with the operator influence, the effect of the limited resolution only comes into play when a measurement takes place, but we can envision that the rounding error has a value at any given time, namely the value it would have if a measurement took place at that time.

Figure 6 shows the value of the digitization/resolution influence over time.

Figure 6: Rounding error. Since the resolution is 1 µm, the rounding error varies between -0.5 µm and +0.5 µm.

Combining the influences

We get the total error by superimposing all the influences on one another. The result is shown in Figure 7. In the normal situation we do not know what the error is; otherwise it would be easy to correct for it. Instead we use uncertainty statements to characterize the nature and magnitude of the error.

Figure 7: The total error consists of all the individual error components superimposed on each other.
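The superposition can be sketched as a simulation. The amplitudes follow Figures 3 to 5 and the 1 µm resolution; the phases, the 6-hour period of the operator variation and the time of the misreading are hypothetical, since the paper shows these influences only as plots.

```python
import math


def sine(amplitude_um: float, period_h: float, phase_h: float):
    """Return a sinusoidal influence as a function of time in hours."""
    return lambda t: amplitude_um * math.sin(2 * math.pi * (t - phase_h) / period_h)


# Amplitudes from the text; phases and the operator period are hypothetical.
gage_temperature = sine(10.0, 24.0, 10.0)
part_temperature = sine(2.0, 24.0, 4.0)
operator_variation = sine(1.0, 6.0, 0.0)
OPERATOR_OFFSET_UM = 0.1
MISREADING_UM = 8.0
MISREADING_SAMPLE = 140  # hypothetical: one bad reading at 14:00
STEP_H = 0.1


def simulate(hours: float = 24.0):
    """Superimpose the influences and round to the 1 um display resolution."""
    samples = []
    for i in range(int(round(hours / STEP_H)) + 1):
        t = i * STEP_H
        raw = (gage_temperature(t) + part_temperature(t) + operator_variation(t)
               + OPERATOR_OFFSET_UM
               + (MISREADING_UM if i == MISREADING_SAMPLE else 0.0))
        shown = round(raw)  # nearest micron: the total error we would observe
        samples.append((t, raw, shown, shown - raw))  # last field: rounding error
    return samples


samples = simulate()
worst_rounding = max(abs(s[3]) for s in samples)
print(f"{len(samples)} samples, worst rounding error {worst_rounding:.2f} um")
```

Every sample in this model is fully determined by the time at which it is taken, and the rounding error never exceeds ±0.5 µm, as in Figure 6.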

If we apply the GUM uncertainty estimation method in this case, using the format recommended in ISO/TR 14253-2 [3], we can sum up our analysis in Table 2.

Contributor            Eval. type  Distribution  No. of meas.  Limit a [µm]  Limit a [infl. unit]  Corr. coeff.  Distr. factor  Uncertainty [µm]
Gage temperature       B           U-shaped      -             10            5 °C                  0             0.7            7
Part temperature       B           U-shaped      -             2             1 °C                  0             0.7            1.4
Operator influence     B           U-shaped      -             1.1           1.1 µm                0             0.7            0.77
Digitization/rounding  B           Step          -             1             1 µm                  0             0.3            0.3
Combined uncertainty (square root of the sum of the squares of the uncertainty components): 7.2 µm
Expanded uncertainty (combined uncertainty multiplied by k = 2): 14.4 µm

Table 2: GUM uncertainty budget summarizing the analysis of the measuring process.

The GUM analysis results in an expanded uncertainty of 14.4 µm. It disregards the outlier and takes a slightly conservative approach, modeling the operator influence as a 1.1 µm variation rather than as a 0.1 µm offset plus a 1 µm variation.
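The arithmetic of Table 2 can be reproduced in a few lines. A sketch, assuming uncorrelated contributors (all correlation coefficients in the table are 0) and the component form u = a × distribution factor used there:

```python
import math

# (contributor, variation limit a [um], distribution factor) from Table 2
BUDGET = [
    ("Gage temperature", 10.0, 0.7),
    ("Part temperature", 2.0, 0.7),
    ("Operator influence", 1.1, 0.7),
    ("Digitization/rounding", 1.0, 0.3),
]
K = 2.0  # coverage factor


def combined_uncertainty(budget):
    """Root sum of squares of u_i = a_i * factor_i (uncorrelated contributors)."""
    return math.sqrt(sum((a * factor) ** 2 for _, a, factor in budget))


for name, a, factor in BUDGET:
    print(f"{name:<22} u = {a * factor:.2f} um")
u_c = combined_uncertainty(BUDGET)
expanded = K * u_c
print(f"combined u_c = {u_c:.2f} um, expanded U = {expanded:.2f} um")
```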

Comparing the approaches

Having evaluated the same measuring situation several different ways, we can now compare the different approaches.

First it is important to understand that we have been using the same data in all the estimations.

Figure 8 shows how the data for the repeatability study and the stability evaluation were taken from the total error function that was generated based on the influences discussed in the GUM analysis.

Figure 8: The data for the GR&R study and the stability study, taken from the Total Error for the measuring process.

When we see this, we can make several observations that illustrate how inadequate both the GR&R study and the 24-hour stability study are at properly analyzing a measuring process.

The results of the studies are given in table 3.

Error/Uncertainty
Method        Random                                  Systematic   Total
GR&R          7.8 µm (instrument), 8.1 µm (operator)  -            11.3 µm
GR&R + Bias   11.3 µm                                 9.5 µm       15.2 µm
Stability     7.4 µm                                  11 µm        14.7 µm
GUM           -                                       14.4 µm      14.4 µm

Table 3: Components of Error/Uncertainty as evaluated by the different methods. GR&R, GR&R+Bias and the Stability study all evaluate errors, whereas the GUM method evaluates uncertainty.

The range of values in the underlying data set is -12 µm to +12 µm, except for the outlier, which is 18 µm. There are no values outside this range. This underlines the fact that none of the underlying effects follows a normal distribution; if they did, the distribution would be unbounded. In practice, as in this example, we never see unbounded distributions, where the more data points we consider, the wider the range becomes. We always see the range grow to a finite size that is governed by the underlying effects.

We see that the random errors we found in the GR&R, GR&R+Bias and the Stability studies are but a myth. All the variation is due to underlying systematic effects. The only random occurrence is the outlier, where the operator misread the gage.

As stated above, the operator influence is somewhat unrealistically modeled, but even when the usual random appearance is modeled more faithfully, it is still clear that the only thing that may be random is the variation of the operator's actions. The response of the measuring system and the workpiece to the operator's actions is fully systematic, e.g., the higher the measuring force, the larger the elastic deformation of the workpiece.

It is only as long as we do not understand these underlying systematic effects that the variation appears random to us. As soon as we understand the underlying effects, the random semblance disappears.

Discussion of the GR&R study

Taken in its purest form, the GR&R study tells us only how much the measuring results vary during the short period of time the study takes. The 11.3 µm GR&R value is not related to the level or nature of the errors we are trying to analyze. The GR&R study cannot see the full extent of the variations; it is based only on data varying between +7 µm and +12 µm, a range of 5 µm.

Even when we enhance the GR&R study and investigate the bias of the measurement, we can only see what the process is doing at the particular time when we do our study.

We can also see that terms like operator error and instrument error become meaningless when the major error source lies outside the measuring process, as in this case, where the ambient temperature is the main factor.

Looking at Figure 8, we find that we would have gotten different results at different times of the day. Had we carried out our study from 11:00 to 15:00 instead of from 7:00 to 11:00, for example, we would have seen a higher GR&R value but a lower bias, so even the relationship between the two is not fixed.
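The dependence on the time of the study can be illustrated with the dominant influence alone. This sketch assumes a hypothetical 10 µm daily sine peaking at 09:00, chosen so that the 07:00 to 11:00 window falls near the peak, as the high and narrow GR&R data suggest, and reports the mean (bias) and spread of evenly spaced readings in each window:

```python
import math
from statistics import mean, pstdev


def dominant_influence(t_h: float) -> float:
    """Hypothetical 10 um daily sine peaking at 09:00 (time in hours)."""
    return 10.0 * math.sin(2 * math.pi * (t_h - 3.0) / 24.0)


def window_stats(start_h: float, end_h: float, n: int = 41):
    """Mean (bias) and population std of n evenly spaced readings in a window."""
    times = [start_h + (end_h - start_h) * i / (n - 1) for i in range(n)]
    values = [dominant_influence(t) for t in times]
    return mean(values), pstdev(values)


stats = {}
for start, end in [(7, 11), (11, 15)]:
    stats[(start, end)] = window_stats(start, end)
    bias, spread = stats[(start, end)]
    print(f"{start:02d}:00-{end:02d}:00  bias = {bias:5.2f} um, std = {spread:4.2f} um")
```

Moving the window from the peak onto the falling slope lowers the bias and raises the spread; with a different phase the numbers change, but they remain dependent on the window.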

The largest problem, however, is that the GR&R study does not help us understand the measuring process. It only provides us with a couple of numbers to characterize the process. Neither the Repeatability (Instrument Error), the Reproducibility (Operator Error) nor the Bias tells us what its name suggests. Furthermore, they do not help us understand the measuring process in a way that lets us improve its uncertainty or find lower-cost ways of achieving the same uncertainty.

Discussion of the stability study

The stability study does better than the GR&R study in this example, because all the influences happen to go through their full cycle within the 24-hour period of the study. There is no guarantee that this will be the case when we set out to do the study, so we cannot count on this benefit.

The study arbitrarily concludes that there is something wrong with two of the measuring points. As we see when we look at the full data set, one of the points is indeed an outlier, but the other is very much part of the underlying distribution.

The study uses a quite arbitrary distinction between what is considered systematic and what is considered random. Although it describes the variations more completely than the GR&R study in this case, it still has the fundamental shortcoming that it cannot tell us how to improve the measuring process.

Discussion of the GUM method

The fact that the GUM method is based as much on theoretical analysis as on actual measurements is both a strength and a weakness.

It is a strength because it allows us to include influences that are hard or cost-prohibitive to determine experimentally, such as seasonal variations.

It is a weakness because it requires the person doing the analysis to successfully identify at least all the major contributors to the uncertainty of a measuring process. If a contributor is not identified, its influence is not reflected in the uncertainty budget, and if it is a major contributor, its omission may invalidate the whole budget. If, for example, we had overlooked the gage temperature influence in the study above, we would have found an expanded uncertainty of 3.2 µm rather than the 14.4 µm we find when we include it. If, on the other hand, we had overlooked the operator influence, we would have found an expanded uncertainty of 14.3 µm, because this influence is so much smaller than the dominating one.
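The effect of overlooking a contributor can be checked by dropping it from the Table 2 budget and recombining the rest. A sketch, with the components taken from Table 2; it gives about 3.25 µm without the gage temperature and about 14.29 µm without the operator influence, in line with the 3.2 µm and 14.3 µm quoted above, given rounding.

```python
import math

# Uncertainty components u_i = a_i * factor_i from Table 2, in micrometres
COMPONENTS = {
    "Gage temperature": 10.0 * 0.7,
    "Part temperature": 2.0 * 0.7,
    "Operator influence": 1.1 * 0.7,
    "Digitization/rounding": 1.0 * 0.3,
}


def expanded_uncertainty(components, k=2.0):
    """k times the root sum of squares of the components."""
    return k * math.sqrt(sum(u ** 2 for u in components))


results = {}
for omitted in COMPONENTS:
    rest = [u for name, u in COMPONENTS.items() if name != omitted]
    results[omitted] = expanded_uncertainty(rest)
    print(f"without {omitted:<22} U = {results[omitted]:.2f} um")
```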

The major strength of the GUM method, however, is that it provides an understanding of the contributors that cause the uncertainty of the measuring process and of their relative magnitudes. None of the other studies does that.

We find in our example that if we want to improve the measuring uncertainty, we either have to improve the control of the gage temperature, or we have to measure it and correct for it, since it is by far the largest contributor. Changing any of the other contributors will not have any appreciable effect until the gage temperature is under tighter control.

A limitation of the method is that it is based on assumed distributions, and the estimate of the measuring uncertainty cannot be any better than these assumptions. Fortunately, assuming the wrong distribution generally changes the effect of a contributor by no more than 15 to 20 %, and we can always err on the safe side by choosing the more conservative of the two candidate distributions when in doubt. As Table 2 shows, it is only critical to assume the correct distribution for the largest contributor, since a 15 to 20 % change in the influence of any other contributor would be negligible.
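The claim that only the largest contributor's distribution matters can be checked against Table 2. This sketch swaps one contributor at a time from U-shaped (factor 0.7) to rectangular, assuming a rectangular factor of 0.6 (about 1/√3, rounded in the same way as the 0.7 ≈ 1/√2 used in Table 2):

```python
import math

U_SHAPED = 0.7     # distribution factor used in Table 2 (about 1/sqrt(2))
RECTANGULAR = 0.6  # assumed factor for a rectangular distribution (about 1/sqrt(3))
STEP = 0.3         # digitization factor from Table 2


def expanded_uncertainty(components, k=2.0):
    """k times the root sum of squares of the components."""
    return k * math.sqrt(sum(u ** 2 for u in components))


# Components in the order gage, part, operator, rounding (limits a from Table 2)
scenarios = {
    "as in Table 2": [10.0 * U_SHAPED, 2.0 * U_SHAPED, 1.1 * U_SHAPED, 1.0 * STEP],
    "gage rectangular": [10.0 * RECTANGULAR, 2.0 * U_SHAPED, 1.1 * U_SHAPED, 1.0 * STEP],
    "part rectangular": [10.0 * U_SHAPED, 2.0 * RECTANGULAR, 1.1 * U_SHAPED, 1.0 * STEP],
}
results = {label: expanded_uncertainty(c) for label, c in scenarios.items()}
for label, value in results.items():
    print(f"{label:<17} U = {value:.2f} um")
```

Changing the gage temperature distribution moves U by roughly 13 %; the same change for the part temperature moves it by well under 1 %.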

An interesting observation about the GUM method is that the generally accepted coverage factor k = 2, which is intended to provide a confidence level of approximately 95 %, actually yields an interval wider than the full range of the deviations in cases where:

  1. The influences of the contributors have been correctly estimated.
  2. The distribution of the major contributor is rectangular or U-shaped.

I suggest that these conditions hold in the majority of cases, at least in dimensional metrology, where temperature is the major contributor. In these cases the GUM method overestimates the uncertainty.
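For a single dominant contributor bounded by ±a, the standard uncertainty is a/√3 for a rectangular distribution and a/√2 for a U-shaped one, so the k = 2 interval extends to about 1.15a or 1.41a, beyond the full range of possible values. A small check, assuming the dominant contributor alone determines the spread:

```python
import math

# Standard uncertainty as a fraction of the half-width a of a bounded distribution
SIGMA_OVER_HALF_WIDTH = {
    "rectangular": 1 / math.sqrt(3),
    "U-shaped": 1 / math.sqrt(2),
}
K = 2.0  # generally accepted coverage factor

for name, ratio in SIGMA_OVER_HALF_WIDTH.items():
    interval = K * ratio  # half-width of the k = 2 interval, in units of a
    print(f"{name:<12} k*u = {interval:.3f} a, covers the whole range: {interval >= 1.0}")
```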

Conclusion

Using an example that I propose is fairly typical of dimensional measuring processes, I show how different approaches to evaluating the goodness of a measuring process yield very different results and lead the examiner to draw very different conclusions about the process.

I show that when a number of systematic effects are superimposed on one another and evaluated using statistical tools, the resulting variations can easily be interpreted as random, and can pass various tests for normality when observed for a limited time.

Finally I show that when the underlying mechanisms causing the variation are understood, the apparently random variation loses its random appearance and can be shown to be systematic and governed by the laws of physics.

Thus I conclude that, unless our measurements approach the atomic level of resolution, where additional effects beyond the scope of this paper come into play, the variation we observe is systematic in nature, and the concept of a random error is but a myth, fueled by our inadequate analysis of the root causes of the variations we see.

References

1. Guide to the Expression of Uncertainty in Measurement. BIPM, IEC, IFCC, ISO, IUPAC, IUPAP, OIML, 1995.

2. Measurement Systems Analysis Reference Manual. Automotive Industry Action Group (AIAG), 1995.

3. ISO/TR 14253-2:1997. Geometrical Product Specifications (GPS) - Inspection by measurement of workpieces and measuring equipment - Part 2: Guide to the estimation of uncertainty in GPS measurement, in calibration of measuring equipment and in product verification.
