This page explains what the confidence interval for the average is and how customers can use it. Computing the confidence interval for the average, along with the other analyses described on this page, is part of the data analysis services we offer at CRI.

Standard deviation and confidence interval for the average

How certain is that average? We usually compute the standard deviation as a first step in evaluating how certain an average is. The standard deviation measures how widely the data are scattered around their average, and it grows as the data scatter more. We then compute the confidence interval for the average from this standard deviation. At this point we must specify how certain the interval is, as a percentage called the confidence level. For example, a 95% confidence interval means the following: if we repeatedly draw samples from the population and compute a confidence interval for the average each time (the sample average differs from draw to draw because the samples differ), about 95% of these intervals will contain the population average. This is what we mean when we say that the confidence level of the interval is 95%. The interval therefore tells us how certain the sample average is as an estimate of the true average. Confidence levels of 99%, 95% and 90% are customary.

The standard deviation also tells us how much of the data lie within a given distance of the average. Tchebycheff's theorem states that at least 1 - 1/C^2 of the data (C^2 is the square of C, and C must be greater than 1) lie within C standard deviations of the average. The blue line in figure 1c shows the fraction of data lying within a given distance of the average, measured in standard deviations, when the data are normally distributed. In this case, 68.3% of the data lie within one standard deviation of the average and 95.5% within two standard deviations. Tchebycheff's theorem (red line) gives smaller values, but its advantage is that it also applies to data that are not normally distributed.
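A minimal sketch comparing the two curves described above: the Tchebycheff lower bound 1 - 1/C^2, which holds for any distribution, and the exact fraction for a normal distribution, erf(C/√2). The function names are illustrative only.

```python
import math


def chebyshev_min_fraction(c: float) -> float:
    """Lower bound on the fraction of data within c standard deviations
    of the average, valid for any distribution (Tchebycheff's theorem)."""
    if c <= 1:
        raise ValueError("c must be greater than 1")
    return 1.0 - 1.0 / c**2


def normal_fraction(c: float) -> float:
    """Exact fraction of a normal distribution lying within c standard
    deviations of its mean: P(|Z| < c) = erf(c / sqrt(2))."""
    return math.erf(c / math.sqrt(2))


for c in (1.5, 2.0, 3.0):
    print(f"C = {c}: normal {normal_fraction(c):.3f}, "
          f"Tchebycheff bound {chebyshev_min_fraction(c):.3f}")
```

For C = 2, for instance, the bound guarantees only 75% of the data, whereas normally distributed data have about 95% within that distance. The price of the bound's generality is that it is conservative.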

Now, returning to our examples: the mean and standard deviation of population A are 100.0 and 10.0, and those of population B are 120.0 and 20.0, respectively. We drew 500 data points from each population. Figure 2 shows the frequency distributions of these samples; we call them sample A (Fig. 2a) and sample B (Fig. 2b). The average and standard deviation of sample A are 99.34 and 9.43, and those of sample B are 119.76 and 19.66, respectively. The 95% confidence interval for the sample average is ±0.83 (from 98.51 to 100.17) for sample A and ±1.73 (from 118.03 to 121.49) for sample B. We computed these intervals from the sample standard deviations because, unlike in our examples, we rarely know the population standard deviation in the real world when we are trying to estimate the population average. Both population averages fall within the confidence intervals of the corresponding sample averages. As you might have noticed, the confidence interval for sample B is wider than that for sample A. This is because the data of sample B are more scattered than those of sample A, so the standard deviation of sample B is larger. The average of sample B is therefore a less certain estimate of the average of population B, and its confidence interval is wider.
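The intervals above can be sketched from the summary statistics alone, assuming the usual formula mean ± k·s/√n, where s is the sample standard deviation and k is a multiplier for the chosen confidence level. By default this sketch uses the standard normal quantile (a large-sample approximation). That reproduces ±0.83 for sample A but gives ±1.72 for sample B; the quoted ±1.73 is consistent with a Student-t multiplier for 499 degrees of freedom (about 1.9647, taken from t tables). The function name `ci_for_mean` is hypothetical.

```python
import math
from statistics import NormalDist


def ci_for_mean(mean, sd, n, level=0.95, multiplier=None):
    """Two-sided confidence interval for the mean: mean +/- k * sd / sqrt(n).

    By default k is the standard normal quantile for the given level
    (a large-sample approximation). Pass `multiplier` to use another
    value, e.g. a Student-t quantile with n - 1 degrees of freedom.
    """
    if multiplier is None:
        multiplier = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = multiplier * sd / math.sqrt(n)
    return half_width, mean - half_width, mean + half_width


# Summary statistics of samples A and B quoted in the text (n = 500 each).
for name, mean, sd in (("A", 99.34, 9.43), ("B", 119.76, 19.66)):
    hw, lo, hi = ci_for_mean(mean, sd, 500)
    print(f"sample {name}: +/-{hw:.2f} ({lo:.2f} to {hi:.2f})")

# Student-t multiplier for 499 degrees of freedom (about 1.9647, from tables).
hw_t, lo_t, hi_t = ci_for_mean(119.76, 19.66, 500, multiplier=1.9647)
print(f"sample B with t multiplier: +/-{hw_t:.2f} ({lo_t:.2f} to {hi_t:.2f})")
```

With 500 data points the two multipliers differ only in the third digit, which is why the normal approximation is usually acceptable for samples this large.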

Now let us check how reliable the confidence level itself, the "95%" of a 95% confidence interval, actually is. We ran 200 experiments; in each, we drew 500 data points from population A and computed the confidence interval for the sample average. Each data point in population A was used exactly once across these experiments (100000/500 = 200). The true average fell within the confidence interval in 192 of the 200 experiments, which is 96%. This shows that, in realistic situations, the "95" of a 95% confidence interval itself carries some uncertainty. It therefore usually makes little sense to distinguish between, say, 95% and 96% confidence intervals.

How many samples do we need?

The number of samples you draw from the population also affects the confidence interval for the sample average. Figure 3 shows how the half width of the 95% confidence interval for the sample average changes with the number of samples; we computed three cases, each with a fixed standard deviation. The confidence interval shrinks rapidly at first as the number of samples increases, and a narrower interval means a more accurate estimate of the true average. However, the decrease slows as you draw more samples and eventually becomes small, so further effort to obtain more data is rewarded less and less. Because obtaining data usually costs money, you need a strategy for deciding how many samples you need. This requires three numbers. First, specify the confidence level; as described above, 99%, 95% or 90% is customary. Next, decide how large a difference between the sample average and the true average, that is, the half width of the confidence interval, you can accept.
At this point you have essentially fixed how certain and how accurate the sample average must be. Finally, you need an estimate of the population standard deviation, which you can base on past experience or simply give as a rough guess. From these three numbers you can draw a figure like figure 3 and estimate how many samples you need. If you are unsure of your estimate of the population standard deviation, you can repeat the computation with different values and base your decision on those results together with your budget. Figure 3 indicates that 500 samples (the vertical black line) were enough for population A, but that we probably needed more samples for population B in our examples.
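The planning procedure just described can be sketched as follows, assuming the normal-approximation half width E = z·σ/√n, which gives the smallest adequate number of samples as n = ceil((z·σ/E)^2). The acceptable half width of 1.0 used below is a hypothetical requirement chosen for illustration, not a value from our examples, and the function names are ours.

```python
import math
from statistics import NormalDist


def half_width(sigma, n, level=0.95):
    """Half width of the confidence interval for the average: z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sigma / math.sqrt(n)


def required_samples(sigma, max_half_width, level=0.95):
    """Smallest n whose half width does not exceed max_half_width."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil((z * sigma / max_half_width) ** 2)


# A table in the spirit of figure 3: half width versus number of samples
# for the standard deviations of populations A (10) and B (20).
for n in (10, 50, 100, 500, 1000, 5000):
    print(n, round(half_width(10.0, n), 2), round(half_width(20.0, n), 2))

# Hypothetical requirement: a 95% half width of at most 1.0.
print(required_samples(10.0, 1.0))  # population A
print(required_samples(20.0, 1.0))  # population B
```

Under this hypothetical requirement, about 385 samples would suffice for population A, while about 1537 would be needed for population B, because doubling the standard deviation quadruples the required number of samples. Repeating the call with several guesses for sigma is a cheap way to follow the advice above when the population standard deviation is uncertain.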

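The 200-experiment check of the 95% confidence level described earlier can be imitated with a short simulation. This is a sketch under our own assumptions: population A is generated as 100,000 normally distributed values with mean 100 and standard deviation 10, shuffled, and split into 200 non-overlapping samples of 500, and each interval uses the normal multiplier for 95%. The name `coverage_experiment` is hypothetical. Because the data are random, the number of intervals containing the true average will vary around 190 (95% of 200) from run to run, much as our 192 did.

```python
import math
import random
import statistics
from statistics import NormalDist


def coverage_experiment(pop_size=100_000, sample_size=500, level=0.95, seed=1):
    """Count how many confidence intervals contain the population average
    when the population is split into non-overlapping samples."""
    rng = random.Random(seed)
    population = [rng.gauss(100.0, 10.0) for _ in range(pop_size)]
    true_mean = statistics.fmean(population)
    rng.shuffle(population)  # each value is used exactly once below
    z = NormalDist().inv_cdf(0.5 + level / 2)

    hits = 0
    n_experiments = pop_size // sample_size
    for i in range(n_experiments):
        sample = population[i * sample_size:(i + 1) * sample_size]
        mean = statistics.fmean(sample)
        half = z * statistics.stdev(sample) / math.sqrt(sample_size)
        if mean - half <= true_mean <= mean + half:
            hits += 1
    return hits, n_experiments


hits, total = coverage_experiment()
print(f"{hits} of {total} intervals ({hits / total:.0%}) contain the true average")
```

Changing `seed` and rerunning shows how much the observed coverage fluctuates around the nominal 95%.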
The value of my data is different from the sample average. Is this difference significant?
