|Introductory Statistics:Concepts,Models,and Applications by David W. Stockburger|
source ref: ebookstat.html
Suppose an educator had a theory which argued that a great deal of learning occurrs before children enter grade school or kindergarten. This theory explained that socially disadvantaged children start school intellectually behind other children and are never able to catch up. In order to remedy this situation, he proposes a head-start program, which starts children in a school situation at ages three and four.
A politician reads this theory and feels that it might be true. However, before he is willing to invest the billions of dollars necessary to begin and maintain a head-start program, he demands that the scientist demonstrate that the program really does work. At this point the educator calls for the services of a researcher and statistician.
Because this is a fantasy, the following research design would probably never be used in practice. This design will be used to illustrate the procedure and the logic underlying the hypothesis test. At a later time, we will discuss a more appropriate design.
A random sample 64 four-year old children is taken from the population of all four-year old children. The children in the sample are all enrolled in the head-start program for a year, at the end of which time they are given a standardized intelligence test. The mean I.Q. of the sample is found to be 103.27.
On the basis of this information, the educator wishes to begin a nationwide head-start program. He argues that the average I.Q. in the population is 100 (m =100) and that 103.27 is greater than that. Therefore, the head-start program had an effect of about 103.27-100 or 3.27 I.Q. points. As a result, the billions of dollars necessary for the program would be well invested.
The statistician, being in this case the devil's advocate, is not ready to act so hastily. He wants to know whether chance could have caused the large mean. In other words, head start doesn't make a bit of difference. The mean of 103.27 was obtained because the sixty-four students selected for the sample were slightly brighter than average. He argues that this possibility must be ruled out before any action is taken. If not ruled out completely, he argues that although possible, the likelihood must be small enough that the risk of making a wrong decision outweighs possible benefits of making a correct decision.
To determine if chance could have caused the difference, the hypothesis test proceeds as a thought experiment. First, the statistician assumes that there were no effects; in this case, the head-start program didn't work. He then creates a model of what the world would look like if the experiment were performed an infinite number of times under the assumption of no effects. The sampling distribution of the mean is used as this model. The reasoning goes something like this:
He or she then compares the results of the actual experiment with those expected from the model, given there were no effects and the experiment was repeated an infinite number of times. He or she concludes that the model probably could explain the results.
Therefore, because chance could explain the results, the educator was premature in deciding that head-start had a real effect.
Suppose that the researcher changed the experiment. Instead of a sample of sixty-four children, the sample was increased to N=400 four-year old children. Furthermore, this sample had the same mean (=103.27) at the conclusion as had the previous study. The statistician must now change the model to reflect the larger sample size.
The conclusion reached by the statistician states that it is highly unlikely the model could explain the results. The model of chance is rejected and the reality of effects accepted. Why? The mean that resulted from the study fell in the tail of the sampling distribution.
The different conclusions reached in these two experiments may seem contradictory to the student. A little reflection, however, reveals that the second experiment was based on a much larger sample size (400 vs. 64). As such, the researcher is rewarded for doing more careful work and taking a larger sample. The sampling distribution of the mean specifies the nature of the reward.
At this point it should also be pointed out that we are discussing statistical significance: whether or not the results could have occurred by chance. The second question, that of practical significance, occurs only after an affirmative decision about the reality of the effects. The practical significance question is tackled by the politician, who must decide whether the effects are large enough to be worth the money to begin and maintain the program. Even though head-start works, the money may be better spent in programs for the health of the aged or more nuclear submarines. In short, this is a political and practical decision made by people and not statistical procedures.
1.) A significance test comparing a single mean to a population parameter () was discussed.
2.) A model of what the world looks like, given there were no effects and the experiment was repeated an infinite number of times, was created using the sampling distribution of the mean.
3.) The mean of the experiment was compared to the model to decide whether the effects were due to chance or whether another explanation was necessary (the effects were real). In the first case, the decision was made to retain the model. It could explain the results. In the second case, the decision was to reject the model and accept the reality of the effect.
4.) A final discussion concerned the difference between statistical significance and practical significance.