Summarising and presenting data
source ref: ebook.html
Measures of spread are statistics which summarise how spread out the data values are. They are also called measures of dispersion.
The range is the difference between the lowest value and the highest value: the maximum minus the minimum. For the Commodore data, the maximum is $29,500 and the minimum is $2,200:
Range = (Maximum - Minimum) = (29500 - 2200) = 27300
The range depends only on the extreme values in the data set.
Mistakes in data, such as reversing digits (e.g. 52 for 25) or omitting digits (e.g. 12 for 132), may produce extreme values. A measure of spread that is less affected by extreme values than the range can be obtained by taking the values 5% in from either end of the ordered data, or one quarter in from either end.
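To see how a single recording error can inflate the range, here is a minimal Python sketch. The five values are hypothetical, chosen only for illustration, and the function name value_range is our own; the error is the digit reversal mentioned above (52 recorded for 25).

```python
def value_range(values):
    """Return the range: maximum minus minimum."""
    return max(values) - min(values)


# Hypothetical values, for illustration only (not the Commodore data).
correct = [21, 25, 27, 30, 33]
# The same values with 25 mistakenly recorded as 52 (digits reversed).
with_error = [21, 52, 27, 30, 33]

range_correct = value_range(correct)        # 33 - 21
range_with_error = value_range(with_error)  # 52 - 21

print("Range without the error:", range_correct)
print("Range with the error:   ", range_with_error)
```

One wrong value more than doubles the range here, even though the other four values are unchanged.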
When the data are arranged in order of magnitude (i.e. they are ranked) the quartiles are 3 numbers which divide the data into four groups each having approximately the same number of values.
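The interquartile range is defined as IQR = Q3 - Q1, where Q1 is the first (lower) quartile and Q3 is the third (upper) quartile. The second quartile, Q2, is the median.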
EXAMPLE. Consider the first 9 Commodore prices (in $'000): 6.0, 6.7, 3.8, 7.0, 5.8, 9.975, 10.5, 5.99, 20.0
Arrange these in order of magnitude: 3.8, 5.8, 5.99, 6.0, 6.7, 7.0, 9.975, 10.5, 20.0
The median is Q2 = 6.7 (there are 4 values on either side)
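Q1 = 5.9 (median of the 4 smallest values: (5.8 + 5.99)/2 = 5.895, rounded to one decimal place)
Q3 = 10.2 (median of the 4 largest values: (9.975 + 10.5)/2 = 10.2375, rounded to one decimal place)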
IQR = 10.2 - 5.9 = 4.3.
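The calculation in this example can be checked with a short Python sketch. The function name quartiles_by_halves is our own, and the handling of an even number of values (splitting the data into two equal halves) is an assumption, since the example only covers an odd count.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    Q2 is the median of all values. Q1 and Q3 are the medians of the
    lower and upper halves. With an odd count, the middle value is
    left out of both halves, as in the worked example.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                  # the smallest values
    upper = ordered[len(ordered) - half:]   # the largest values
    return median(lower), median(ordered), median(upper)


# First 9 Commodore prices from the example, in $'000.
prices = [6.0, 6.7, 3.8, 7.0, 5.8, 9.975, 10.5, 5.99, 20.0]

q1, q2, q3 = quartiles_by_halves(prices)
iqr = q3 - q1

print(f"Q1 = {q1:.1f}, Q2 = {q2:.1f}, Q3 = {q3:.1f}, IQR = {iqr:.1f}")
```

Working with unrounded quartiles gives an IQR of about 4.34, which agrees with the 4.3 obtained above from the rounded values.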
[Some textbooks and computer programs use slightly different definitions for Q1 and Q3 from the ones given here. The calculated values, however, are usually very similar. Use HELP DESCRIBE to see the MINITAB definition.]
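As an illustration of how definitions can differ, the sketch below applies two interpolation rules from Python's standard library (statistics.quantiles with method="exclusive" and method="inclusive") to the same 9 prices. Python is used here only as a convenient example; neither rule is claimed to be the MINITAB definition.

```python
from statistics import quantiles

# First 9 Commodore prices from the example, in $'000.
prices = [6.0, 6.7, 3.8, 7.0, 5.8, 9.975, 10.5, 5.99, 20.0]

# n=4 asks for the three quartile cut points (Q1, Q2, Q3).
q1_excl, _, q3_excl = quantiles(prices, n=4, method="exclusive")
q1_incl, _, q3_incl = quantiles(prices, n=4, method="inclusive")

iqr_excl = q3_excl - q1_excl
iqr_incl = q3_incl - q1_incl

print(f"exclusive: Q1 = {q1_excl:.3f}, Q3 = {q3_excl:.3f}, IQR = {iqr_excl:.3f}")
print(f"inclusive: Q1 = {q1_incl:.3f}, Q3 = {q3_incl:.3f}, IQR = {iqr_incl:.3f}")
```

For this sample the "exclusive" rule reproduces the hand calculation (Q1 about 5.9, Q3 about 10.2), while the "inclusive" rule picks the data values 5.99 and 9.975, giving an IQR of about 4.0 rather than 4.3.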
Just as the median is not affected much by extreme values, neither is the IQR. For example, for the Commodore prices MINITAB gives
Quartiles divide the ordered data into quarters, but we can consider any fractions we please. The most common are "percentiles", where we take hundredths. The first quartile is thus the 25th percentile, the median is the 50th percentile and the upper quartile is the 75th percentile.
The percentiles most commonly used, after the 50th, are those close to 100. Thus the 90th percentile is the value that is exceeded by only 10% of the sample or the population, and the 99th percentile is exceeded by only 1 in 100.
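The "exceeded by only 10%" reading can be checked with a small Python sketch. The sample is hypothetical (the whole numbers 1 to 1000), chosen so that the counts are easy to verify, and the percentile rule is the standard library's default, which is one of several definitions in use.

```python
from statistics import quantiles

# Hypothetical sample: the whole numbers 1 to 1000 (illustration only).
sample = list(range(1, 1001))

# n=100 returns the 99 cut points between hundredths;
# index k - 1 holds the k-th percentile.
percentiles = quantiles(sample, n=100)
p90 = percentiles[89]
p99 = percentiles[98]

# Fraction of the sample lying strictly above each percentile.
share_above_p90 = sum(x > p90 for x in sample) / len(sample)
share_above_p99 = sum(x > p99 for x in sample) / len(sample)

print("90th percentile:", p90, "share above:", share_above_p90)
print("99th percentile:", p99, "share above:", share_above_p99)
```

In this sample 100 of the 1000 values exceed the 90th percentile and 10 exceed the 99th, matching the 10% and 1 in 100 described above.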
You will occasionally also see "deciles", which are found by dividing the data into tenths, and "quintiles", which divide the data into fifths. The first quintile is identical to the 20th percentile, the median is the fifth decile, and so on.
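The correspondences between quartiles, quintiles, deciles and percentiles can be verified numerically. The sketch below uses the 9 Commodore prices from the earlier example and the default rule of Python's statistics.quantiles; the variable names are our own.

```python
from math import isclose
from statistics import median, quantiles

# First 9 Commodore prices from the example, in $'000.
prices = [6.0, 6.7, 3.8, 7.0, 5.8, 9.975, 10.5, 5.99, 20.0]

quartile_cuts = quantiles(prices, n=4)       # 3 cut points
quintile_cuts = quantiles(prices, n=5)       # 4 cut points
decile_cuts = quantiles(prices, n=10)        # 9 cut points
percentile_cuts = quantiles(prices, n=100)   # 99 cut points

# In each list, index k - 1 holds the k-th cut point.
first_quartile_is_p25 = isclose(quartile_cuts[0], percentile_cuts[24])
upper_quartile_is_p75 = isclose(quartile_cuts[2], percentile_cuts[74])
first_quintile_is_p20 = isclose(quintile_cuts[0], percentile_cuts[19])
fifth_decile_is_median = isclose(decile_cuts[4], median(prices))

print(first_quartile_is_p25, upper_quartile_is_p75,
      first_quintile_is_p20, fifth_decile_is_median)
```

All four comparisons hold because each pair of names refers to the same fraction of the ordered data (for example, one fifth equals twenty hundredths).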