Summarising and presenting data (source ref: ebook.html)
A measure of central tendency is a number which indicates the middle of the distribution of data values. The three main measures are the median, the mode and the mean.
The median is a number which is greater than half the data values and less than the other half. If there is an odd number of values, the median is the middle one when they are sorted in order of magnitude. If there is an even number of values, the median is the average of the two middle values.
E.g. 1 The first five Commodore prices in $,000 are:
6, 6.7, 3.8, 7, 5.8
Arranged in order of magnitude these are:
3.8, 5.8, 6, 6.7, 7
so the median is the middle (3rd) value, 6, i.e. $6,000.
E.g. 2 The first six Commodore prices in $,000 are:
6, 6.7, 3.8, 7, 5.8, 9.975
Arranged in order of magnitude these are:
3.8, 5.8, 6, 6.7, 7, 9.975
so the median is the average of the two middle values, (6 + 6.7)/2 = 6.35, i.e. $6,350.
For all Commodores there are n = 38 values, so the middle ones are the 19th and 20th values (so that there are 18 on either side). The 19th and 20th values are both $9,500, so the median is $9,500.
The MINITAB command for obtaining the median of a data set stored in column C1 is
MTB> median c1
MEDIAN = 9500.0
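For readers without MINITAB, the odd/even rule described above can be sketched in Python. This is only an illustration: the function name `median_by_hand` is made up here, and the data are the first five and first six Commodore prices quoted in the examples, not the full set of 38.

```python
from statistics import median

# First five and first six Commodore prices (in $,000), as quoted in the text.
first_five = [6, 6.7, 3.8, 7, 5.8]
first_six = [6, 6.7, 3.8, 7, 5.8, 9.975]


def median_by_hand(values):
    """Sort the values, then take the middle one (odd n) or the
    average of the two middle ones (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sorted(first_five), median_by_hand(first_five))
print(sorted(first_six), median_by_hand(first_six))

# The standard-library function applies the same rule.
print(median(first_five), median(first_six))
```

The hand-written function and `statistics.median` should agree; the first is there only to make the sorting and middle-value steps visible.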
The mode is the value or category which occurs most frequently. If several data values occur with the same maximal frequency, they are all modes. For example, in the Commodore data, using the grouped data, the class interval, [8,000 - 10,999], is the mode.
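The rule that all values tied for the maximal frequency are modes can be sketched in Python as follows. The helper name `modes` and both small data sets are illustrative assumptions, not the Commodore data.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the maximal frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


# Illustrative values only (not the Commodore data).
single_mode = [2, 3, 3, 4, 5, 7, 8]
tied_modes = ["red", "blue", "red", "blue", "green"]

print(modes(single_mode))  # [3]
print(modes(tied_modes))   # ['red', 'blue']
```

The second list shows that the mode also applies to categories, and that a tie produces more than one mode.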
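Denote the data values by $x_1, x_2, \ldots, x_n$, where $n$ is the total number of values.
The mean is denoted by $\bar{x}$ (read as 'x bar') and defined as the arithmetic mean of all the data values:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$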
e.g. for the Commodore prices (n = 38), the mean is the sum of all 38 prices divided by 38, which comes to about $10,080.
The MINITAB command is MEAN C1. The mean and other summary values are also given by DESCRIBE C1.
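As a small worked illustration of the definition, using only the first five Commodore prices quoted earlier (in $,000) rather than the full data set:

```latex
\bar{x} = \frac{6 + 6.7 + 3.8 + 7 + 5.8}{5} = \frac{29.3}{5} = 5.86
```

That is a mean of $5,860, slightly below the median of $6,000 for the same five values.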
Data set A: 2,3,3,4,5,7,8
Data set B: 2,3,3,4,5,8,20
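Both have n = 7 values and the same median, 4, but the means differ: 32/7 ≈ 4.57 for data set A and 45/7 ≈ 6.43 for data set B.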
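The two data sets can be checked with a short Python sketch. Python's standard `statistics` module is used here as a stand-in for MINITAB, which is an assumption made purely for illustration.

```python
from statistics import mean, median

data_a = [2, 3, 3, 4, 5, 7, 8]
data_b = [2, 3, 3, 4, 5, 8, 20]

# Both sets share the same middle value; only the largest values differ.
for name, data in (("A", data_a), ("B", data_b)):
    print(f"Data set {name}: median = {median(data)}, mean = {mean(data):.2f}")
```

Replacing 7 and 8 by 8 and 20 leaves the middle value untouched but pulls the mean upwards.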
It is necessary to sort the data in order of magnitude before you can find the median. For large data sets this may be time-consuming, which is why medians were not used much until computers became readily available.
The median is not affected by extreme values, but the mean is changed (compare results for data sets A and B above).
In many situations the median is a better description of central tendency (e.g. many more people have less than the average income than have more).
The median for grouped data is calculated as the midpoint of the class interval that comes closest to having half the values above and below it.
(or 18.75 + 0.5 years if we use the mid-point interval)
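For grouped data like the Commodore prices, take the x-values as the interval mid-points, e.g. for the interval 2000-4999 use $x = (2000 + 4999)/2 = 3499.5$, etc. Then
$$\bar{x} = \frac{\sum f x}{\sum f}$$
where $f$ is the number of values in each interval and $\sum f = n$. The result is close to the mean calculated from the individual values, 10080.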
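The mid-point calculation can be sketched in Python as below. Only the intervals 2000-4999 and 8000-10999 appear in the text; the remaining intervals and every frequency in `counts` are hypothetical placeholders chosen for illustration, so the printed mean is not the Commodore result.

```python
# Class intervals in dollars. The rest of the grid and ALL of the counts
# below are hypothetical placeholders, not the actual Commodore frequencies.
intervals = [(2000, 4999), (5000, 7999), (8000, 10999), (11000, 13999)]
counts = [3, 5, 8, 4]  # hypothetical number of values in each interval

# Use each interval's mid-point as its representative x-value.
midpoints = [(low + high) / 2 for low, high in intervals]

# Weight each mid-point by its frequency, then divide by the total count.
n = sum(counts)
grouped_mean = sum(f * x for f, x in zip(counts, midpoints)) / n

print(midpoints)
print(n, grouped_mean)
```

With the real class frequencies substituted for `counts`, the same three steps give the grouped mean described above.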