The terms "standard error" and "standard deviation" are often confused. The contrast between these two terms reflects the important distinction between data description and inference, one that all researchers should appreciate.

The standard deviation (often SD) is a measure of variability. When we calculate the standard deviation of a sample, we are using it as an estimate of the variability of the population from which the sample was drawn. For data with a normal distribution, about 95% of individuals will have values within 2 standard deviations of the mean, the other 5% being equally scattered above and below these limits. Contrary to popular misconception, the standard deviation is a valid measure of variability regardless of the distribution. About 95% of observations of any distribution usually fall within the 2 standard deviation limits, though those outside may all be at one end. We may choose a different summary statistic, however, when data have a skewed distribution.
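The claim about the 2 SD limits can be checked with a quick simulation. This is a minimal sketch, not taken from the source: it assumes a standard normal distribution as the symmetric case and an exponential distribution (rate 1) as the skewed case, both drawn with Python's `random` module and a fixed seed. It counts how many values fall inside mean ± 2 SD, and how many fall below and above those limits.

```python
import random
import statistics

def coverage_within_2sd(values):
    """Return (fraction inside mean +/- 2 SD, count below, count above)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)          # sample SD (n - 1 divisor)
    low, high = mean - 2 * sd, mean + 2 * sd
    below = sum(1 for v in values if v < low)
    above = sum(1 for v in values if v > high)
    inside = len(values) - below - above
    return inside / len(values), below, above

rng = random.Random(2023)                  # fixed seed so the run is repeatable
n = 50_000
normal = [rng.gauss(0.0, 1.0) for _ in range(n)]     # symmetric
skewed = [rng.expovariate(1.0) for _ in range(n)]    # right-skewed, never negative

normal_frac, normal_below, normal_above = coverage_within_2sd(normal)
skewed_frac, skewed_below, skewed_above = coverage_within_2sd(skewed)

print(f"normal:      {normal_frac:.3f} inside, {normal_below} below, {normal_above} above")
print(f"exponential: {skewed_frac:.3f} inside, {skewed_below} below, {skewed_above} above")
```

Both cases should come out at roughly 95% inside. For the exponential data every value outside the limits lies above the upper one, because the lower limit (about 1 − 2 × 1 = −1) is below zero.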
When we calculate the sample mean we are usually interested not in the mean of this particular sample, but in the mean for individuals of this type (in statistical terms, of the population from which the sample comes). We usually collect data in order to generalize from them and so use the sample mean as an estimate of the mean for the whole population. Now the sample mean will vary from sample to sample; the way this variation occurs is described by the "sampling distribution" of the mean. We can estimate how much sample means will vary from the standard deviation of this sampling distribution, which we call the standard error (SE) of the estimate of the mean. As the standard error is a type of standard deviation, confusion is understandable. Another way of considering the standard error is as a measure of the precision of the sample mean.

The standard error of the sample mean depends on both the standard deviation and the sample size, by the simple relation SE = SD/√(sample size). The standard error falls as the sample size increases, as the extent of chance variation is reduced; this idea underlies the sample size calculation for a controlled trial, for example. By contrast the standard deviation will not tend to change as we increase the size of our sample.

So, if we want to say how widely scattered some measurements are, we use the standard deviation. If we want to indicate the uncertainty around the estimate of the mean measurement, we quote the standard error of the mean. The standard error is most useful as a means of calculating a confidence interval. For a large sample, a 95% confidence interval is obtained as the values 1.96 × SE either side of the mean. We will discuss confidence intervals in more detail in a subsequent Statistics Note. The standard error is also used to calculate P values in many circumstances.

The principle of a sampling distribution applies to other quantities that we may estimate from a sample, such as a proportion or regression coefficient, and to contrasts between two samples, such as a risk ratio or the difference between two means or proportions. All such quantities have uncertainty due to sampling variation, and for all such estimates a standard error can be calculated to indicate the degree of uncertainty.

In many publications a ± sign is used to join the standard deviation (SD) or standard error (SE) to an observed mean, for example, 69.4±9.3 kg. That notation gives no indication whether the second figure is the standard deviation or the standard error (or indeed something else). A review of 88 articles published in 2002 found that 12 (14%) failed to identify which measure of dispersion was reported (and three failed to report any measure of variability). The policy of the BMJ and many other journals is to remove ± signs and request authors to indicate clearly whether the standard deviation or standard error is being quoted. All journals should follow this practice.
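The difference between SD and SE can be seen by drawing samples of increasing size from one population. This is a minimal sketch under stated assumptions: the population is simulated as normal, and its mean of 69.4 kg and SD of 9.3 kg are borrowed from the ± example above purely for illustration. The SD should stay near 9.3 at every sample size, while SE = SD/√n shrinks (halving each time n is multiplied by 4). The confidence interval uses the large-sample 1.96 × SE rule from the text; for a sample as small as 25 a t-based multiplier would usually replace 1.96.

```python
import math
import random
import statistics

def describe(sample):
    """Return (mean, SD, SE of the mean, large-sample 95% CI)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)               # describes the scatter of the data
    se = sd / math.sqrt(n)                      # SE = SD / sqrt(sample size)
    ci = (mean - 1.96 * se, mean + 1.96 * se)   # 1.96 x SE either side of the mean
    return mean, sd, se, ci

rng = random.Random(42)
results = {}
for n in (25, 100, 400, 1600):
    # Hypothetical population: weights with mean 69.4 kg and SD 9.3 kg (assumed).
    sample = [rng.gauss(69.4, 9.3) for _ in range(n)]
    results[n] = describe(sample)
    mean, sd, se, (lo, hi) = results[n]
    print(f"n={n:5d}  mean={mean:5.1f}  SD={sd:4.1f}  SE={se:5.2f}  95% CI {lo:.1f} to {hi:.1f}")
```

Reporting "mean (SD)" describes the data; reporting "mean (SE)" or the interval describes how precisely the mean is estimated.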
The Standard Error of a Proportion
Sometimes, it's easier to do the algebra than wave hands. It has already been argued that a proportion is the mean of a variable that is 1 when the individual has a characteristic and 0 otherwise. The standard deviation of any variable involves the expression

$$\sum_{i=1}^{n} (x_i - \bar{x})^2$$
Let's suppose there are $m$ 1s (and $n-m$ 0s) among the $n$ subjects. Then $\bar{x} = m/n$, and $x_i - \bar{x}$ is equal to $1 - m/n$ for $m$ observations and $0 - m/n$ for $n-m$ observations. When these results are combined, writing $p = m/n$, the final result is

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = m\left(1 - \frac{m}{n}\right)^2 + (n-m)\left(0 - \frac{m}{n}\right)^2 = \frac{m(n-m)}{n} = np(1-p)$$

and the sample variance (square of the SD) of the 0/1 observations is

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{n}{n-1}\, p(1-p)$$
The sample proportion is the mean of $n$ of these observations, so the standard error of the proportion is calculated like the standard error of the mean, that is, the SD of one of them divided by the square root of the sample size, or

$$\mathrm{SE}(p) = \frac{s}{\sqrt{n}} = \sqrt{\frac{p(1-p)}{n-1}}$$
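The derivation above can be checked with exact arithmetic. This is a minimal sketch using Python's `fractions.Fraction`, so each identity is tested exactly rather than up to rounding; the `(m, n)` pairs are arbitrary illustrative choices, not data from the source. It also cross-checks the sample variance against `statistics.variance`, and prints the common `pq/n` version of the SE (used in the next section) next to the `n − 1` version, which differ only by the factor $\sqrt{n/(n-1)}$.

```python
import math
import statistics
from fractions import Fraction

def check_proportion_algebra(m, n):
    """Check the closed forms above for m ones and n - m zeros (exact arithmetic)."""
    data = [1] * m + [0] * (n - m)
    p = Fraction(m, n)
    mean = Fraction(sum(data), n)                     # x-bar = m / n
    ss = sum((x - mean) ** 2 for x in data)           # sum of squared deviations
    assert mean == p
    assert ss == m * (1 - p) ** 2 + (n - m) * p ** 2  # term-by-term expansion
    assert ss == n * p * (1 - p)                      # = n p (1 - p)
    s2 = ss / (n - 1)                                 # sample variance
    assert s2 == Fraction(n, n - 1) * p * (1 - p)
    assert math.isclose(float(s2), statistics.variance(data))  # library cross-check
    se = math.sqrt(s2 / n)                            # SD / sqrt(sample size)
    assert math.isclose(se, math.sqrt(p * (1 - p) / (n - 1)))
    return float(p), se

for m, n in [(6, 10), (3, 7), (250, 1000)]:
    p, se = check_proportion_algebra(m, n)
    se_n = math.sqrt(p * (1 - p) / n)                 # the pq/n version
    print(f"m={m}, n={n}: p={p:.3f}, SE (n-1) = {se:.4f}, SE (n) = {se_n:.4f}")
```

For small samples the two versions differ noticeably; for a sample of 1,000 they agree to about three decimal places.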
Computing Standard Deviations for Proportions
You already learned about the standard error for the sampling distribution of means,

$$\text{s.e.}_{\text{mean}} = \frac{s}{\sqrt{n}}$$
My lecture notes for yesterday gave the formula for computing the standard error of a proportion. A proportion is simply a mean computed for data scored 1 (for p) or 0 (for q, where q = 1 − p). It so happens that the variance for data in proportions is simply

$$\text{Variance} = pq$$

So the standard deviation is

$$\text{SD} = \sqrt{pq}$$
In case you don't believe this, here is a computed example using data inspired by the CBS/New York Times poll reported on October 29, 2001.
Sixty-one percent think the war in Afghanistan would be worth it even if it meant several thousand American troops would lose their lives; 27 percent say the war there would not be worth that cost. Let's round off the 61% to 60% for easier computation and consider only a sub-sample of ten cases:
Case | Worth it? | Score (X) | Mean | X − mean | (X − mean)² |
1 | yes | 1 | 0.6 | 0.4 | 0.16 |
2 | no | 0 | 0.6 | -0.6 | 0.36 |
3 | no | 0 | 0.6 | -0.6 | 0.36 |
4 | yes | 1 | 0.6 | 0.4 | 0.16 |
5 | yes | 1 | 0.6 | 0.4 | 0.16 |
6 | yes | 1 | 0.6 | 0.4 | 0.16 |
7 | yes | 1 | 0.6 | 0.4 | 0.16 |
8 | no | 0 | 0.6 | -0.6 | 0.36 |
9 | yes | 1 | 0.6 | 0.4 | 0.16 |
10 | no | 0 | 0.6 | -0.6 | 0.36 |
Total | | 6/10 = .6 (mean, i.e. the proportion) | | | Σ = 2.4 (sum of squares) |

- Given a sum of squares of 2.4 for ten cases, the variance is 2.4/10 = .24 (dividing by n, not n − 1).
- Now let's multiply p (.6) by q (.4): .6 × .4 = .24, so pq = variance.

We can compute the s.e. of the proportion for the CBS/New York Times poll of 1,024 respondents, using yesterday's formula:

$$\text{s.e.}_p = \sqrt{\frac{pq}{n}} = \sqrt{\frac{pq}{1024}} \approx .015$$
- This result is one standard error of a proportion; we multiply by 100 to make it a percentage: 1.5%.
- But remember we need to double the 1.5% (more precisely, multiply it by 1.96) to produce an estimate of ±3%, such that it will embrace 95% of the possible samples.
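The worked example can be reproduced in a few lines. This is a minimal sketch: the ten scores are copied from the table, and for the full poll it assumes p = .6 (the rounded value used above) with n = 1,024; `se_proportion` is a hypothetical helper name, not part of any library. As in the notes, the variance divides by n rather than n − 1.

```python
import math

# Ten-case sub-sample from the table: 1 = "worth it", 0 = "not worth it".
scores = [1, 0, 0, 1, 1, 1, 1, 0, 1, 0]

n_small = len(scores)
p = sum(scores) / n_small                   # proportion = mean of the 0/1 scores
q = 1 - p
ss = sum((x - p) ** 2 for x in scores)      # sum of squares
variance = ss / n_small                     # divides by n, as in the notes

print(f"p = {p:.1f}, sum of squares = {ss:.2f}, variance = {variance:.2f}, pq = {p * q:.2f}")

def se_proportion(prop, n):
    """Hypothetical helper: standard error of a proportion, sqrt(pq / n)."""
    return math.sqrt(prop * (1 - prop) / n)

n_poll = 1024
se = se_proportion(0.6, n_poll)             # assumes p = .6 for the full poll
print(f"SE = {se:.4f} ({100 * se:.1f}%)")
print(f"2 x SE    = +/- {100 * 2 * se:.1f}%")
print(f"1.96 x SE = +/- {100 * 1.96 * se:.1f}%")
```

Doubling the unrounded SE gives about 3.1%, while 1.96 × SE gives about 3.0%; both correspond to the ±3% margin quoted for the poll.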