Summarising and presenting data
In this section we will consider ways of presenting data on two or more variables in tables.
A table containing the number of applicants for admission to the University of California at Berkeley according to gender and major (equivalent to a Faculty in Australia) is shown below.
How can you discover and present the main "messages" in the data?
To compare preferences for major (the outcome variable) between men and women (the predictor variable), we might calculate, for each gender, the percentage of applicants applying to each major (i.e. column percentages). To do this, multiply each number in the "Men" column by 100/2691 and each number in the "Women" column by 100/1835, where 2691 and 1835 are the column totals.
*Does not add to 100% because of rounding.
Conclusion - men preferred majors A or B, while women preferred majors C, D, E or F.
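As a minimal Python sketch of the column-percentage calculation, the code below divides each count by its column total. The counts are an assumption: the table is not reproduced here, so the sketch uses the widely published 1973 Berkeley applicant figures, whose column totals (2691 men, 1835 women) match those in the text. The helper name `column_percentages` is hypothetical.

```python
# ASSUMED applicant counts by major and gender: the widely published 1973
# Berkeley figures, whose totals (2691 men, 1835 women) match the text.
applicants = {
    "A": {"Men": 825, "Women": 108},
    "B": {"Men": 560, "Women": 25},
    "C": {"Men": 325, "Women": 593},
    "D": {"Men": 417, "Women": 375},
    "E": {"Men": 191, "Women": 393},
    "F": {"Men": 373, "Women": 341},
}

def column_percentages(table):
    """Express each cell as a percentage of its column (gender) total."""
    genders = next(iter(table.values())).keys()
    totals = {g: sum(row[g] for row in table.values()) for g in genders}
    return {
        major: {g: round(row[g] * 100 / totals[g], 1) for g in genders}
        for major, row in table.items()
    }

col_pct = column_percentages(applicants)
for major, pct in col_pct.items():
    print(major, pct["Men"], pct["Women"])
```

With these figures the men's percentages, rounded to one decimal place, sum to 100.1, which is the kind of rounding discrepancy the footnote refers to.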
You could show the column %s on a bar graph or a dot chart.
To compare gender distributions (outcome) across majors (predictor), calculate the percentages of men and women for each major (i.e. row percentages).
Conclusion - Majors A and B were predominantly men, C and E were predominantly women, and majors D and F were approximately balanced between men and women.
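A comparable Python sketch for row percentages divides each count by its row (major) total. The counts are again an assumption, taken from the widely published 1973 Berkeley applicant figures because the table itself is not reproduced here; `row_percentages` is a hypothetical helper.

```python
# ASSUMED 1973 Berkeley applicant counts per major: (men, women).
applicants = {"A": (825, 108), "B": (560, 25), "C": (325, 593),
              "D": (417, 375), "E": (191, 393), "F": (373, 341)}

def row_percentages(table):
    """Percentage of each major's applicants who were men and who were women."""
    result = {}
    for major, (men, women) in table.items():
        total = men + women                      # row total for this major
        result[major] = (round(100 * men / total, 1),
                         round(100 * women / total, 1))
    return result

row_pct = row_percentages(applicants)
for major, (pct_men, pct_women) in row_pct.items():
    print(f"{major}: {pct_men}% men, {pct_women}% women")
```

With these assumed counts, men make up about 88% and 96% of applicants to A and B, women about 65% and 67% for C and E, and D and F are close to an even split, in line with the conclusion above.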
A Newcastle Restaurant Survey was conducted in 1990. Restaurants were classified by type of ownership and size.
The variable OWNER takes 3 values:
The variable SIZE also takes 3 values:
The raw data for 20 restaurants are as follows.
For example, the first restaurant in the data set was owned by a corporation (code 3) and had more than 20 employees (code 3).
The data were entered into a MINITAB worksheet by the following commands:
MTB> read c1 c2
DATA> 3 3
DATA> 1 2
DATA> 1 1
DATA> 3 1
....
....
....
DATA> end
      20 ROWS READ
MTB> name c1 'owner' c2 'size'
The command TABLE C1 C2 will result in a table in which the rows are OWNER, the columns are SIZE and the numbers in the cells of the table are the counts (frequencies).
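For readers without MINITAB, the counting that TABLE C1 C2 performs can be sketched in Python. The helper `cross_table` is hypothetical, not a MINITAB or library function, and it is demonstrated only on the four restaurants listed in the session above, because the remaining rows are elided.

```python
from collections import Counter

def cross_table(owner, size, levels=(1, 2, 3)):
    """Count cases in each (owner, size) cell and add ALL margins.

    A rough Python analogue of MINITAB's TABLE C1 C2
    (rows = owner codes, columns = size codes).
    """
    counts = Counter(zip(owner, size))            # missing cells count as 0
    table = {r: {c: counts[(r, c)] for c in levels} for r in levels}
    for r in levels:
        table[r]["ALL"] = sum(table[r][c] for c in levels)   # row totals
    table["ALL"] = {c: sum(table[r][c] for r in levels) for c in levels}
    table["ALL"]["ALL"] = len(owner)                         # grand total
    return table

# Only the first four restaurants are listed in the text.
owner = [3, 1, 1, 3]
size = [3, 2, 1, 1]
t = cross_table(owner, size)
for key in (1, 2, 3, "ALL"):
    print(key, [t[key][c] for c in (1, 2, 3, "ALL")])
```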
A table is often easier to interpret if the counts are converted to percentages.
The subcommand COLPERCENT calculates column percentages and the subcommand ROWPERCENT calculates row percentages.
MTB> table c1 c2

ROWS: owner     COLUMNS: size

           1       2       3     ALL
   1       3       4       0       7
   2       2       0       1       3
   3       4       3       3      10
 ALL       9       7       4      20

CELL CONTENTS -- COUNT

MTB> table c1 c2;
SUBC> colpercent.

ROWS: owner     COLUMNS: size

           1       2       3     ALL
   1   33.33   57.14      --   35.00
   2   22.22      --   25.00   15.00
   3   44.44   42.86   75.00   50.00
 ALL  100.00  100.00  100.00  100.00

CELL CONTENTS -- % OF COL

MTB> table c1 c2;
SUBC> rowpercent.

ROWS: owner     COLUMNS: size

           1       2       3     ALL
   1   42.86   57.14      --  100.00
   2   66.67      --   33.33  100.00
   3   40.00   30.00   30.00  100.00
 ALL   45.00   35.00   20.00  100.00

CELL CONTENTS -- % OF ROW
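The COLPERCENT and ROWPERCENT results can be checked with a short Python sketch that starts from the cell counts in the first MINITAB table. The function names are hypothetical; unlike MINITAB, the sketch shows 0.0 rather than -- for empty cells and leaves out the ALL margins.

```python
# Cell counts from the MINITAB output: rows = owner 1-3, columns = size 1-3.
counts = [
    [3, 4, 0],
    [2, 0, 1],
    [4, 3, 3],
]

def col_percent(table):
    """Each cell as a % of its column total (like the COLPERCENT subcommand)."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[round(100 * x / t, 2) for x, t in zip(row, col_totals)]
            for row in table]

def row_percent(table):
    """Each cell as a % of its row total (like the ROWPERCENT subcommand)."""
    return [[round(100 * x / sum(row), 2) for x in row] for row in table]

print(col_percent(counts))
print(row_percent(counts))
```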
The table below shows the numbers of deaths in Australia in 1995 for people aged 15-24 years (Source: Australian Bureau of Statistics, 3303.0, pp.33-35):
|Cause of death||Males||Females||Total|
|Motor vehicle accident||448||146||594|
Each person who died was categorised by sex (M or F) and by cause of death. A cross-classified table is sometimes called a contingency table.
Do males and females in this age group die from the same causes?
To compare patterns of cause of death you need to consider relative frequencies or percentages because the total numbers of deaths are not the same for males and females.
(e.g. 1408/1915 is approximately 0.74)
Conclusion - In this age group there are about three times as many male deaths as female deaths.
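The relative frequencies behind this conclusion can be reproduced from the two totals quoted in the text (1408 male deaths out of 1915). The female total is not quoted; this sketch derives it by subtraction.

```python
male_deaths = 1408                          # from the text's 1408/1915 example
all_deaths = 1915
female_deaths = all_deaths - male_deaths    # derived, not quoted: 507

male_share = male_deaths / all_deaths       # relative frequency of male deaths
ratio = male_deaths / female_deaths         # male deaths per female death

print(f"male share {male_share:.2f}, female share {1 - male_share:.2f}")
print(f"about {ratio:.1f} male deaths per female death")
```

The ratio comes out at about 2.8, which the conclusion rounds to "about three times".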
|Motor vehicle accident||594||0.31|
This table is obtained by collapsing the original table over the factor 'sex'.
Conclusion - The main causes of death in this age group are motor vehicle accidents, which account for 31% of all deaths, and suicides, which account for 23%.
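Collapsing over sex means adding each cause's counts across the sex columns and then dividing by the grand total. A minimal sketch using the one cause whose row appears in the text (the other rows are not reproduced); the 1915 grand total is taken from the 1408/1915 example, and `collapse_over_sex` is a hypothetical helper.

```python
def collapse_over_sex(rows):
    """Collapse a cause-by-sex table over 'sex' by summing each cause's row."""
    return {cause: sum(by_sex.values()) for cause, by_sex in rows.items()}

# Only this row is given in the text; the other causes are not reproduced.
rows = {"Motor vehicle accident": {"Males": 448, "Females": 146}}
all_deaths = 1915   # grand total, both sexes

totals = collapse_over_sex(rows)
shares = {cause: round(n / all_deaths, 2) for cause, n in totals.items()}
print(totals, shares)
```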
|Motor vehicle accidents||448||32||146||29|
Conclusion - Motor vehicle accidents were the major cause of death for both males and females in the age group 15-24 years, accounting for about 30% of deaths. Suicides were more common for males than for females.
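The percentages in the motor vehicle accident row (32% of male deaths, 29% of female deaths) are percentages within each sex. A minimal sketch reproducing them; the female total of 507 is derived as 1915 - 1408 rather than quoted in the text.

```python
deaths_by_sex = {"Males": 1408, "Females": 1915 - 1408}   # female total derived
mva = {"Males": 448, "Females": 146}                        # motor vehicle accidents

# Percentage of each sex's deaths due to motor vehicle accidents,
# rounded to whole percentages as in the table row.
mva_pct = {sex: round(100 * mva[sex] / deaths_by_sex[sex]) for sex in mva}
print(mva_pct)
```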
In order to obtain valid comparisons among groups it is necessary to consider the sizes of the groups and to report the results similarly for all groups.
Numbers of babies who died before or just after birth in the Hunter Region in 1989.
|Area||Neonatal deaths||Live births||Total births||Death rate (%)|
Source: Hunter Health Statistics Unit
You cannot directly compare the numbers of deaths in each area because these depend on the number of births. You need to convert the numbers of deaths to death rates:
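A minimal sketch of the conversion from counts to rates. The Hunter Region figures are not reproduced here, so the two areas below are purely hypothetical. The text also does not say whether the rate uses live births or total births as the denominator, so the function simply takes whichever births figure is passed to it.

```python
def death_rate_percent(deaths, births):
    """Deaths per 100 births (which births count is the caller's choice)."""
    return 100 * deaths / births

# HYPOTHETICAL areas, invented purely for illustration (not the Hunter data).
areas = {"Area X": (12, 1500), "Area Y": (30, 6000)}
for name, (deaths, births) in areas.items():
    print(name, deaths, f"{death_rate_percent(deaths, births):.1f}%")
```

Area Y has more deaths than Area X but the lower rate, which is why raw counts cannot be compared directly.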
United Kingdom Merchant Vessels in Service
(500 gross tons and over)
An "improved" version of Table 1
Unemployment in Great Britain - Original Version
Unemployment in GB - Rounded
Rows and Columns Interchanged
Appl. Statist. (1986) 35, No. 3, pp. 237-244
Reading a Table: An Example
By A. S. C. EHRENBERG
London Business School, UK**
[Received February 1985. Revised January 1986]
This note tries to develop some precepts in terms of a specific example. It concerns a table about paper tissue which was shown to me by a journalist who was then in the paper industry (Ehrenberg, 1984). She had just received it from a Finnish source (without any supporting text) and asked: "What do I do with this? I can't just reproduce it in my article, can I?"
In starting to tell her what the table was saying and how one could get at this, I quickly realised that I had seldom tried to explain this process to others. Nor did there seem to be much literature on it: Wright (1981), for example, mainly discusses formal tables rather than statistical ones.
Section 2 therefore now describes what I did in reading the table, as a personal case-history. Subsequently this appeared representative of what I do more generally and also seemed to make sense to others. Hence this note.
Section 3 then discusses an improved lay-out for the table, using earlier rules or guidelines like rounding and ordering by size (e.g. Ehrenberg 1982). This should make the data easier to read. But reading a given table, whether quite well laid-out or not, remains a separate process from improving it.
**Address for correspondence: Professor A. S. C. Ehrenberg, London Business School, Sussex Place, Regent's Park, London NW1 4SA, UK.
© 1995 1986 Royal Statistical Society
2. The Case History
This section deals with Table 1. It tries to make explicit the five steps which I think I took in first reading it, arrived at by an attempt at self-observation.
What does the table tell us? (Some readers may find it useful to stop reading the text at this stage and just look at the table instead, making notes of the steps they took in trying to read it themselves. How would they tell somebody else to do so?)
Step 1: What are the Variables?
At this early stage there seemed to be no need to worry about how "Demand" etc had been defined and measured, or whether the EEC did or did not include Greece, or about the author's purpose in giving the table, or about how the forecasts were made. Nor yet to read the footnotes. (This may be more productive later, when one has begun to know something about the numbers, as in Step 4 here. Or one may never need to do so, if the table turns out uninteresting.)
Step 2: Focussing on One Row or Column
Reading across the bottom line, "Demand" in 1978 was 7.99, "Supply" 8.04, and the "Balance" virtually zero at 0.05. So the 1978 Demand should almost equal Supply. Yes, it does!
Next, glancing all the way across the bottom line showed that the highest figure was 16.00 over on the right (for "Demand in 1995") and that some of the other figures were negative (but smallish). This gave some basic markers about the range of variation: small negative up to 16.00.
Given that there were three variables in the bottom line, it seemed best to go on with just one: Perhaps "Balance" would always be near-zero, and hence nice and simple to take in and to remember? But the next entry, the Balance for 1985, was rather bigger, at 0.93. Nonetheless, it still represented only a small short-fall in Supply, less than 10%. (This I noted by glancing at the 1985 Supply and Demand figures "out of the corner of my eye", i.e. without as yet trying to take them in and in any way remember them.)
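The two checks in Step 2 (1978 Supply roughly equal to Demand; the 1985 shortfall under 10% of Demand) amount to simple arithmetic on the bottom-line figures. The sketch assumes Balance = Supply - Demand, which the text implies by describing negative Balances as shortfalls; `balance_and_shortfall` is a hypothetical helper.

```python
def balance_and_shortfall(demand, supply):
    """Return (Balance, shortfall as % of Demand).

    ASSUMPTION: Balance = Supply - Demand, so a shortfall is a negative Balance.
    """
    balance = supply - demand
    shortfall_pct = 100 * max(0.0, -balance) / demand
    return balance, shortfall_pct

# Total (bottom-line) figures quoted in the text: year -> (Demand, Supply).
totals = {1978: (7.99, 8.04), 1985: (10.67, 9.74)}
for year, (demand, supply) in totals.items():
    balance, short = balance_and_shortfall(demand, supply)
    print(f"{year}: Balance {balance:+.2f}, shortfall {short:.1f}% of Demand")
```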
Mental Rounding. In looking at these various figures it helps one to round each to two or even just one effective digits in one's head: This mental rounding is essential for then doing mental arithmetic on the numbers. It means reading the 1985 Total Balance as .9 or 1, and the Demand as 10.7 or 11. By mental arithmetic this gave just under 10%.
But in describing the process I think it helps the reader to quote the full numbers, like 0.93 and 10.67, for ease of visual identification.
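Mental rounding can be imitated in code by rounding to a fixed number of significant digits. This is only an approximation: Ehrenberg's "effective digits" are the digits that vary across the figures being compared, which is not always the same as significant digits. `round_sig` is a hypothetical helper.

```python
import math

def round_sig(x, digits):
    """Round x to `digits` significant digits.

    Approximation only: effective digits (those that vary across the
    figures) are not always the same as significant digits.
    """
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))   # position of leading digit
    return round(x, digits - 1 - exponent)

balance, demand = 0.93, 10.67      # 1985 Total figures quoted in the text
print(round_sig(balance, 1), round_sig(demand, 2), round_sig(demand, 3))
print(f"rounded: {round_sig(balance, 1) / round_sig(demand, 2):.0%}",
      f"exact: {balance / demand:.1%}")
```

Either way the answer stays clearly under 10%, which is all the mental arithmetic needs to establish.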
In summary, there were four crucial elements here: Focussing on just one row or column (and one variable); getting some "feel" for the data; noting the range of variation; drastic mental rounding throughout.
Step 3: Are the Balance Totals Representative?
Step 4: The Full Trend for "Balance"
To see how this bigger shortfall in 1990 had come about (was it Twyman's Law: "Any figure that looks different or interesting is usually wrong"?), I first compared the Demand total in 1985 with that in 1990 and saw a sizeable increase of 2 or 3 million, from 10.67 to 13.06. But the 1985 and 1990 Supply totals were identical, both 9.74! And so they were in each country or region for 1985 and 1990!
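Step 4's reasoning can be written out as arithmetic on the quoted Total figures for 1985 and 1990. Because Supply is unchanged, the change in Balance is exactly the negative of the change in Demand. The sketch assumes Balance = Supply - Demand, as the text's description of negative Balances as shortfalls suggests.

```python
# Total figures quoted in the text (Step 4).
demand = {1985: 10.67, 1990: 13.06}
supply = {1985: 9.74, 1990: 9.74}

# ASSUMPTION: Balance = Supply - Demand (negative Balance = shortfall).
balance = {year: supply[year] - demand[year] for year in demand}

demand_change = demand[1990] - demand[1985]    # a rise of about 2.4
supply_change = supply[1990] - supply[1985]    # identical forecasts: 0
balance_change = balance[1990] - balance[1985]

print(f"Demand {demand_change:+.2f}, Supply {supply_change:+.2f}, "
      f"Balance {balance_change:+.2f}")
print({year: round(b, 2) for year, b in balance.items()})
```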
Checking Footnote 2 now showed that
Step 5: The Trends in "Demand"
Having a Focus. Looking at the individual rows was vastly helped by using the single row of summary figures at the bottom of the table as a mental norm. Did the overall doubling of Total Demand between '78 and '95, from 8 to 16, also hold for the individual regions or countries? Without such a memorable focus one would hardly know which figures inside the table to compare with which, or see the row-by-column interaction, i.e. the less than 2-fold versus the 4-fold increases noted in (iii) above.
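Using the Total row as a norm amounts to comparing each row's growth ratio with the overall one. Only the Total Demand figures (7.99 in '78 and 16.00 in '95) come from the text; the two regions below are hypothetical, since the individual rows of Table 1 are not reproduced here.

```python
def growth_ratio(start, end):
    """How many times larger the later figure is than the earlier one."""
    return end / start

total_ratio = growth_ratio(7.99, 16.00)   # Total Demand '78 -> '95: about 2-fold

# HYPOTHETICAL rows ('78, '95), invented only to illustrate the comparison.
regions = {"Region P": (3.0, 5.5), "Region Q": (0.5, 2.0)}
for name, (d78, d95) in regions.items():
    r = growth_ratio(d78, d95)
    relation = "above" if r > total_ratio else "at or below"
    print(f"{name}: x{r:.1f} ({relation} the overall x{total_ratio:.1f})")
```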
If no summary figures like the Totals had been given one could pick either a row with big figures, or one which is visually well-positioned, or both. Thus to look at the columns in Table 1 (which have no summary), one can use the '78 Demand on the far left as a focus:
This concluded my initial reading of the table. It seemed to make two points, which one could then also communicate to others:
More work on the table would only be needed if it became part of some bigger study or writeup (but this seemed unlikely in this instance).
If a table which one has to look at in detail lacks summary figures, I normally get some row and/or column averages (or totals) worked out first, before even looking at it. (Averages are usually easier to compare with the individual figures since they are in the same units. But Totals are occasionally meaningful in their own right, as for the columns in Table 1, but not the rows.)
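Working out row and column averages before reading a table is easy to automate. A minimal sketch with hypothetical figures; `with_averages` is an illustrative helper, not a library function.

```python
def with_averages(rows):
    """Return a copy of the table with a row-average column and an 'Average' row."""
    n_cols = len(next(iter(rows.values())))
    # Append each row's average as an extra column (input lists are not modified).
    out = {name: vals + [sum(vals) / len(vals)] for name, vals in rows.items()}
    # Column averages, including the average of the row averages.
    col_avgs = [sum(vals[j] for vals in out.values()) / len(out)
                for j in range(n_cols + 1)]
    out["Average"] = col_avgs
    return out

# HYPOTHETICAL figures, for illustration only.
table = {"Row 1": [2.0, 4.0, 6.0], "Row 2": [1.0, 3.0, 5.0]}
for name, vals in with_averages(table).items():
    print(name, [round(v, 2) for v in vals])
```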
3. Better Table Lay-out
Redesigning such a table can make it a good deal easier to read (especially for the quantitative detail) and better to present to others. We can use various rules of rounding, ordering by size, giving averages (or totals), and generally attending to the lay-out, as has been described before (e.g. Ehrenberg 1982, Chapters 3, 15 and 16). The process is illustrated in Section 3.1 for the Demand figures.
A further point is that when a table contains several variables in juxtaposition (here Demand, Supply, and Balance), it seems generally better to give a separate table for each, as is shown in Section 3.2.
Reading such improved tables still requires the procedures in Section 2, like first picking out the extreme variation in a summary row or column. But the process should be simpler.
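Two of the lay-out rules mentioned here, ordering rows by size and rounding, can be sketched as below. Ordering by row average and rounding to two significant digits are assumptions chosen for illustration (significant digits only approximate Ehrenberg's effective digits), and the figures are hypothetical.

```python
import math

def round_sig(x, digits=2):
    """Round to `digits` significant digits (a stand-in for effective digits)."""
    if x == 0:
        return 0.0
    return round(x, digits - 1 - math.floor(math.log10(abs(x))))

def reorder_and_round(rows, digits=2):
    """Order rows by their average, largest first, and round every figure."""
    ordered = sorted(rows.items(),
                     key=lambda kv: sum(kv[1]) / len(kv[1]), reverse=True)
    return [(name, [round_sig(v, digits) for v in vals]) for name, vals in ordered]

# HYPOTHETICAL figures in an unordered, over-precise layout.
table = {"Small": [0.412, 0.587], "Large": [5.83, 9.61], "Medium": [1.27, 2.94]}
for name, vals in reorder_and_round(table):
    print(f"{name:<8}" + "".join(f"{v:>6}" for v in vals))
```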
3.1. A Better Demand Table
We can now read the data better (including doing the mental arithmetic required) to see, more easily than in Section 2, that
Table 2 satisfies the criterion of a good table: One can see the patterns and exceptions at a glance, especially once one knows (e.g. has been told) what they are. This is not so with Table 1.
We can probably also recall the known patterns more easily than from Table 1. Just a glance back at Table 2 seems to remind us almost instantly that the Demand increases to '95 were very roughly two-fold at the top of the table, and three- to four-fold lower down.
3.2. Separate Tables
The key to understanding the data does not lie in facilitating detailed comparisons of every triplet of Demand, Supply and Balance figures for each region, country, and year in turn. Instead, it is to establish (and take in) the main patterns. This can best be done for one variable at a time, comparing like with like. After that we can compare one pattern with that for the next variable.
Thus for Demand in Table 2 we could establish the 2-fold or 4-fold trends over time, and the relative size of the regions. We can now compare this with the Supply patterns in Table 3. Only after that might we have to make specific comparisons for some individual pairs of figures. This itself is simpler once we have an overt and broadly memorable framework for all the figures.
To comment in more detail, in Table 3 the relative steadiness of the Supply figures over the three years stands out (in contrast with the doubling of Demand in Table 2). So does the fact that the forecast Supplies for '85 and '90 are in each case identical, especially once one has noticed or been told. These patterns are more immediately clear, and also much easier to keep in mind, than in Table 1 or probably even with any improved (e.g. rounded and reordered) three-variable table. Similarly, in Table 4 the small Balances generally, the larger (negative) Balances in 1990, and the positive ones for Scandinavia stand out more.
The steps in reading Table 1 that were outlined in Section 2 can probably be reduced to five general precepts for reading a table:
If more work is to be done on the table, or the results are to be presented to others, we need also to be prepared to revamp the table, with rounded figures, rows and columns ordered by size, adequate summary figures to provide a focus, and better lay-out generally.
Two more general points about the reading process may be worth making:
In conclusion, this note does not mean to suggest that everyone would or should follow exactly the sequence of steps or guidelines in reading a table that have been outlined here. They are merely ones which I believe I tend to follow in such situations and which seem to have made sense to some others.
If we are to teach people better numeracy, we need to make such guidelines explicit. But how far do other people work very differently? If so, what are their procedures? And which of them are generalisable? It would be good to debate more cases and more points of view.
I am indebted for helpful comments on drafts of this note from Chris Beaumont, John Bound, Derek Bunn, Len England, Peter Gorb, Helen Lewis, Bill Kruskal, John Nelder, David Targett, Patricia Wright, and especially also the editor and one of the referees. The note is part of a programme of work at the LBS's Centre for Marketing and Communication which is supported by some forty leading companies and institutions in the UK and USA.
I am also indebted to the Editors for allowing some departures from the Journal's house style, such as being able to use vertical rules in printing numerical tables, and using visually helpful capitalised initials in table headings.
Baddeley, A. D. (1976) The Psychology of Memory. New York: Basic Books.
Baddeley, A. D. (1982) Your Memory: A User's Guide. London and New York: Penguin Books.
Ehrenberg, A. S. C. (1982) A Primer in Data Reduction. Chichester and New York: Wiley.
Ehrenberg, D. S. (1984) Personal communication.
Wright, P. (1981) Tables in Text: The Subskills Needed for Reading Formatted Information. In The Reader and the Text (C. J. Chapman, ed.). London: Heinemann.