Confidence Interval




Confidence intervals for the percentage of measurements that are out of spec provide information about the relationship between the actual percent out of spec for a given simulation and the theoretical percent associated with the exact distribution.

 

A 90% confidence interval, for example, specifies a range based on the actual percent out of spec within which the theoretical percent would be expected to fall 90% of the time. The width of the interval will increase for a higher confidence level.
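This definition can be checked empirically with a short simulation (a sketch, not 3DCS code; the true out-of-spec rate, sample size, and trial count are invented for illustration): build a 90% interval from each simulated sample and count how often it captures the true value.

```python
import math
import random

rng = random.Random(1)
p_true = 0.02    # assumed true fraction out of spec
n = 2000         # simulated measurements per run
z90 = 1.645      # two-sided 90% normal multiplier

covered = 0
trials = 1000
for _ in range(trials):
    # count simulated measurements falling out of spec
    out = sum(rng.random() < p_true for _ in range(n))
    p = out / n
    hw = z90 * math.sqrt(p * (1 - p) / n)
    # did this run's interval capture the true value?
    covered += (p - hw) <= p_true <= (p + hw)

coverage = covered / trials   # should come out close to 0.90
```

Over many repeated runs, the fraction of intervals that contain the true percent out of spec should be near the stated 90% confidence level.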

 

There are two main uses for this confidence interval.

1. The confidence interval helps the user decide whether enough simulations have been run. If the confidence interval is too wide for the application at hand, not enough simulations have been run; the width of the interval decreases as the number of simulations increases.

2. The confidence interval helps the user assess the validity of the curve fit. If the estimated percents out of spec based on the Pearson curve fit do not fall within the confidence intervals for the percents out of spec, then the exact measurement distribution is probably not one of the Pearson types.

 

Definitions:  The following terms are used frequently in discussing items on the 3DCS Simulation report.  Several of these terms are sometimes used loosely, which can lead to confusion.  The following definitions are provided to clear up any ambiguity.

 

Population – A population of, say, manufactured items is a theoretically infinite set of all possible outcomes consistent with all the inputs and assumptions that go into a model.  These inputs include tolerances, assembly operations, conditional logic, assembly sequence, and fixture and process tolerances.  Also included are assumptions such as rigidity of components, the sigma level corresponding to the tolerance zone, etc.  With this set of assumptions, we use simulation to generate a sufficiently large subset of all possibilities so that we can make inferences about what the results of the planned actions – design, process, assembly, etc. – will be.  Ideally, we want to accurately estimate the location, spread and shape of the population distribution using simulated data.  As seen below, we will use the sample distribution to learn about, and make estimates of, the population distribution.

 

Parameter – A Parameter is a characteristic of the population. It is these characteristics that we are trying to estimate using the tool of simulation modeling.  While we can never know the exact value of a population parameter with certainty – the population is infinitely large – we can make very accurate estimates. An example is the familiar μ (mean) or σ (standard deviation) in the expression for the normal distribution.

 

Sample – This can be one or more simulations from the theoretically infinite population.  We might say that the 47th sample was a no-build, or that we ran a sample of size 20,000. Both usages are widespread, and the difference should be clear from the context.

 

Statistic – A Statistic is a mathematical function of sample data.  Statistics can be thought of as summarizing a specific characteristic of the data. Familiar examples of statistics include X-bar (the average), which is the sum of the measured values divided by the number of measurements, and S² (the sample variance), which is the sum of the squares of the differences of each value from X-bar, divided by the number of values minus 1.
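These two statistics can be written directly from the definitions above (a minimal sketch; the data values are made up for illustration):

```python
def x_bar(values):
    # sum of the measured values divided by the number of measurements
    return sum(values) / len(values)

def s_squared(values):
    # sum of squared differences from X-bar, divided by (n - 1)
    m = x_bar(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

data = [1.90, 1.92, 1.89, 1.91]   # hypothetical measurement values
mean = x_bar(data)       # 1.905
var = s_squared(data)
```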

 

Statistical Estimation – The process of using statistics, calculated from simulated samples, to estimate the parameters and characteristics of a population.

 

Estimator – An estimator is a statistic that is used to estimate some unknown parameter or characteristic of the population.  A particular realization of the estimator is called an estimate – we might say the mean of 1000 samples was 1.906.  The estimator here is the familiar X-bar, while the estimate in this particular case is 1.906.  Another set of 1000 samples might produce, using the same estimator, an estimate of 1.917.  This is an example of sampling variation.
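Sampling variation is easy to see in a small sketch (not 3DCS code; the population mean 1.906 and standard deviation 0.05 are assumed values for illustration): applying the same estimator, X-bar, to two different samples yields two different estimates.

```python
import random

rng = random.Random(0)

def sample_mean(n):
    # X-bar computed on a fresh sample of n draws from an assumed
    # normal population (mean 1.906, standard deviation 0.05)
    return sum(rng.gauss(1.906, 0.05) for _ in range(n)) / n

estimate_1 = sample_mean(1000)
estimate_2 = sample_mean(1000)
# same estimator (X-bar), two different estimates: sampling variation
```

Both estimates land close to the assumed population mean, but they are not identical.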

 

 

Samples have statistics, while populations have parameters.  We use statistics, calculated from the sample, to make inferences about the population.

 

The 3DCS software simulates a sample of the population in order to make inferences about that population.  The exact parameters of the population (e.g. population mean or population standard deviation) are forever unknown.  However, these parameters can be estimated to a high degree of accuracy using a simulated subset (sample) from that population.  The sample statistics (e.g. sample mean, sample standard deviation) are used to estimate the parameters of the actual population.

 

In general, there are two types of estimators: point and interval.  

A point estimator is a single number that represents a population parameter, like population mean or population percent out of spec.  

An interval estimator is a pair of numbers, with an associated probability level, that defines a numerical range within which the parameter is likely to fall.  As previously stated, the sample statistics calculated from simulated assemblies are used to estimate the parameters of the population. Interval estimates are called confidence intervals and are typically calculated at the 90%, 95% or 99% probability levels.

 

Example:

The 3DCS simulated population mean was found to be 1.906.  How confident are we that the actual population has a mean value close to this number?  The actual population mean may fall anywhere between 1.794 and 2.113.  This is where interval estimates come in.  A confidence interval uses probability to provide a range within which the true value is likely to fall.  For this example, we might say that "there is a 95% probability that the actual parameter value will fall within the confidence interval range of 1.794 to 2.113".

 

Several things should be noted about confidence intervals:

1. As the sample size increases, a fixed percentage confidence interval becomes smaller. This result agrees with our intuition – more data gives us more confidence, and it becomes less and less likely that the true value will differ from our point estimate by any specific amount.

2. As we require a higher level of confidence, for a fixed sample size, we must accept a wider confidence interval. For example, in the above case, a 99% confidence interval might be something like 1.901 to 1.911, while a 50% confidence interval might be something like 1.904 to 1.906. Again this result agrees with our intuition, and shows that there is a tradeoff between being very confident of a certain interval, and the size of the interval.

3. Frequently, we want both a high level of confidence and a narrow interval. Using the above results, it is clear that we must make appropriate increases in the sample size.
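The tradeoffs in points 1 and 2 can be shown numerically with the normal-approximation interval for a proportion (a sketch; the proportion 0.01 and sample sizes are assumed values):

```python
import math

def half_width(p, n, z):
    # half-width of the normal-approximation interval for a proportion
    return z * math.sqrt(p * (1 - p) / n)

p = 0.01  # assumed out-of-spec proportion

# point 1: quadrupling the sample size halves the interval width
w_5000  = half_width(p, 5000, 1.960)
w_20000 = half_width(p, 20000, 1.960)

# point 2: a higher confidence level (larger z) widens the interval
w_90 = half_width(p, 20000, 1.645)
w_99 = half_width(p, 20000, 2.576)
```

The width shrinks like 1/√n, which is why large increases in sample size are needed to get both high confidence and a narrow interval (point 3).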

 

 

In the particular case of the confidence interval for percent out of spec, 3DCS uses the following statistical results:

 

For large sample sizes – in the thousands – the sampling distribution of the percent out of spec is approximately normal, with

Mean = (samples out of spec / total samples)                (1)

and Standard Deviation = √(pq/n)                        (2)

where p equals the right hand side of (1), q = 1-p, and n = the sample size.

 

The 90%, 95% and 99% confidence intervals for percent out of spec will therefore be given by

Mean ± 1.645 √(pq/n),

Mean ± 1.960 √(pq/n),

and Mean ± 2.576 √(pq/n)

respectively. Because the convergence to normality may be very slow, it is recommended that sample sizes of around 20,000 be run whenever possible. This assures confidence intervals that are both accurate and narrow.
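Formulas (1) and (2) and the two-sided normal multipliers translate directly into code (a sketch, not the 3DCS implementation; the counts in the usage line are invented for illustration):

```python
import math

# two-sided normal multipliers by confidence level (percent)
Z = {90: 1.645, 95: 1.960, 99: 2.576}

def oos_confidence_interval(n_out, n_total, level=95):
    # p = samples out of spec / total samples      -- formula (1)
    p = n_out / n_total
    # standard deviation = sqrt(p*q/n), q = 1 - p  -- formula (2)
    sd = math.sqrt(p * (1 - p) / n_total)
    z = Z[level]
    return p - z * sd, p + z * sd

# e.g. 260 out-of-spec results in a 20,000-run simulation
low, high = oos_confidence_interval(260, 20000, level=95)
```

The returned pair brackets the observed proportion (here 260/20000 = 0.013), and the interval narrows as `n_total` grows.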