Summarising and presenting data

9. Exploring Data in Tables

EXPLORING DATA IN TABLES

In this section we will consider ways of presenting data on two or more variables in tables.

Example - Number of applicants to UCB
Source: Freedman, Pisani & Purves, "Statistics", Norton 1978.

A table containing the number of applicants for admission to the University of California at Berkeley according to gender and major (equivalent to Faculty in Australia) is shown below.

How can you discover and present the main "messages" in the data?

No. of applicants
Major Men Women Total
A 825 108 933
B 560 25 585
C 325 593 918
D 417 375 792
E 191 393 584
F 373 341 714
TOTAL 2691 1835 4526

To compare preferences for major (outcome variable) between men and women (predictor variable) we might calculate the percentage of total applicants by major for each gender (i.e. column percentages). To do this, multiply each number in the "Men" column by 100/2691 and multiply each number in the "Women" column by 100/1835.

Major Men Women
A 31 6
B 21 1
C 12 32
D 15 20
E 7 21
F 14 19
TOTAL 100 99*

*Doesn't add to 100% due to rounding.

Conclusion - men preferred majors A or B, while women preferred majors C, D, E or F.
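The column percentages above can be checked with a short script. This is a minimal sketch in Python rather than part of the original notes (which use MINITAB); the names `applicants` and `column_percentages` are illustrative assumptions.

```python
# Applicant counts by major and gender, copied from the table above.
applicants = {
    "A": {"Men": 825, "Women": 108},
    "B": {"Men": 560, "Women": 25},
    "C": {"Men": 325, "Women": 593},
    "D": {"Men": 417, "Women": 375},
    "E": {"Men": 191, "Women": 393},
    "F": {"Men": 373, "Women": 341},
}


def column_percentages(table):
    """Express every cell as a whole-number percentage of its column total."""
    columns = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        label: {c: round(100 * row[c] / totals[c]) for c in columns}
        for label, row in table.items()
    }


col_pct = column_percentages(applicants)
for major, row in col_pct.items():
    print(major, row["Men"], row["Women"])

# Rounded percentages need not sum to exactly 100 within a column.
men_sum = sum(row["Men"] for row in col_pct.values())
women_sum = sum(row["Women"] for row in col_pct.values())
print("column sums:", men_sum, women_sum)
```

Each column is divided by its own total, so the two genders are compared on the same scale even though there were more male than female applicants.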

Dot Chart

You could show the column %s on a bar graph or a dot chart.

To compare gender distributions (outcome) for majors (predictor), calculate the % of men and women for each major (i.e. row percentages).

Major Men Women Total
A 88 12 100
B 96 4 100
C 35 65 100
D 53 47 100
E 33 67 100
F 52 48 100

Conclusion - Majors A and B were predominantly men, C and E were predominantly women, and majors D and F were approximately balanced between men and women.
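Row percentages use the same counts but divide by the row total instead of the column total. The sketch below is an illustration in Python, not the notes' own method; `applicants` and `row_percentages` are assumed names.

```python
# Applicant counts by major and gender, copied from the first table.
applicants = {
    "A": {"Men": 825, "Women": 108},
    "B": {"Men": 560, "Women": 25},
    "C": {"Men": 325, "Women": 593},
    "D": {"Men": 417, "Women": 375},
    "E": {"Men": 191, "Women": 393},
    "F": {"Men": 373, "Women": 341},
}


def row_percentages(table):
    """Express every cell as a whole-number percentage of its row total."""
    result = {}
    for label, row in table.items():
        total = sum(row.values())
        result[label] = {c: round(100 * n / total) for c, n in row.items()}
    return result


row_pct = row_percentages(applicants)
for major, row in row_pct.items():
    print(major, row["Men"], row["Women"])
```

Which total to divide by depends on the question being asked: column totals when gender is the predictor, row totals when major is the predictor.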

Example - Newcastle restaurant survey

A Newcastle Restaurant Survey was conducted in 1990. Restaurants were classified by type of ownership and size.

The variable OWNER takes 3 values:

  1. sole proprietorship
  2. partnership
  3. corporation

The variable SIZE also takes 3 values:

  1. fewer than 5 employees
  2. between 5 and 20 employees
  3. more than 20 employees

The raw data for 20 restaurants are as follows.

Owner Size   Owner Size
3 3   1 2
1 2   3 2
1 1   3 2
3 1   3 3
1 1   1 2
3 1   3 3
1 2   3 1
1 1   3 2
2 1   3 1
2 3   2 1

For example, the first restaurant in the data set was owned by a corporation (code 3) and had more than 20 employees (code 3).

The data were entered into a MINITAB worksheet by the following commands:

MTB> read c1 c2
DATA> 3 3
DATA> 1 2
DATA> 1 1
DATA> 3 1
....
....
....
DATA> end
     20 ROWS READ
MTB> name c1 'owner' c2 'size'

The command TABLE C1 C2 will result in a table in which the rows are OWNER, the columns are SIZE and the numbers in the cells of the table are the counts (frequencies).

A table is often easier to interpret if the counts are converted to percentages.

The subcommand COLPERCENT calculates column percentages and the subcommand ROWPERCENT calculates row percentages.

MTB> table c1 c2

 ROWS: owner     COLUMNS: size

           1        2        3      ALL
 
  1        3        4        0        7
  2        2        0        1        3
  3        4        3        3       10
 ALL       9        7        4       20

  CELL CONTENTS --
                  COUNT

MTB> table c1 c2;
SUBC> colpercent.
 
 ROWS: owner     COLUMNS: size

           1        2        3      ALL
 
  1    33.33    57.14      --     35.00
  2    22.22      --     25.00    15.00
  3    44.44    42.86    75.00    50.00
 ALL  100.00   100.00   100.00   100.00

  CELL CONTENTS --
                  % OF COL

MTB> table c1 c2;
SUBC> rowpercent.
 
 ROWS: owner     COLUMNS: size

           1        2        3      ALL
 
  1    42.86    57.14      --    100.00
  2    66.67      --     33.33   100.00
  3    40.00    30.00    30.00   100.00
 ALL   45.00    35.00    20.00   100.00

  CELL CONTENTS --
                  % OF ROW
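For readers without MINITAB, the same three tables can be rebuilt from the raw data. The following is a rough Python sketch using only the standard library; it is an assumed equivalent, not a description of how MINITAB computes its output. The names `restaurants`, `row_percent` and `col_percent` are illustrative, and the 20 pairs are read down the left-hand pair of columns and then the right-hand pair (the order does not affect the counts).

```python
from collections import Counter

# (owner, size) codes for the 20 restaurants, copied from the raw data listing.
restaurants = [
    (3, 3), (1, 2), (1, 1), (3, 1), (1, 1),
    (3, 1), (1, 2), (1, 1), (2, 1), (2, 3),
    (1, 2), (3, 2), (3, 2), (3, 3), (1, 2),
    (3, 3), (3, 1), (3, 2), (3, 1), (2, 1),
]

counts = Counter(restaurants)                        # cell counts
row_totals = Counter(owner for owner, _ in restaurants)
col_totals = Counter(size for _, size in restaurants)


def row_percent(owner, size):
    """Cell count as a percentage of its OWNER (row) total."""
    return round(100 * counts[(owner, size)] / row_totals[owner], 2)


def col_percent(owner, size):
    """Cell count as a percentage of its SIZE (column) total."""
    return round(100 * counts[(owner, size)] / col_totals[size], 2)


# Print the table of counts with row totals, then the column totals.
for owner in (1, 2, 3):
    cells = [counts[(owner, size)] for size in (1, 2, 3)]
    print(owner, cells, row_totals[owner])
print("ALL", [col_totals[size] for size in (1, 2, 3)], len(restaurants))
```

Empty cells, which MINITAB prints as `--` in the percentage tables, come out as 0.0 in this sketch.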

Which category of OWNER had the highest frequency? What percentage of all OWNER were in that category? Which category of SIZE had the highest frequency?

Contingency Tables

Example - Number of deaths in Australia (1989)

The table below shows the numbers of deaths in Australia in 1995 for people aged 15-24 years (Source: Australian Bureau of Statistics, 3303.0, pp.33-35):

Cause of death Males Females Total
Motor vehicle accident 448 146 594
Suicide 350 84 434
Other accident 257 74 331
Malignant cancer 86 50 136
Other diseases 267 153 420
Total 1,408 507 1,915

Each person who died was categorised by sex (M or F) and by cause of death. A cross-classified table is sometimes called a contingency table.

Do males and females in this age group die from the same causes?

To compare patterns of cause of death you need to consider relative frequencies or percentages because the total numbers of deaths are not the same for males and females.

Marginal frequency distribution for Deaths By Gender
  Males Females Totals
Numbers 1408 507 1915
Relative Frequency 0.74 0.26 1.00

(e.g. 1408/1915 is approximately 0.74)

Conclusion - In this age group there are about 3 times as many male deaths as female deaths.

Marginal frequency distribution for Cause of Death
Cause Number Relative frequency
Motor vehicle accident 594 0.31
Suicide 434 0.23
Other accidents 331 0.17
Malignant neoplasms 136 0.07
Other diseases 420 0.22
Total 1915 1.00

This table is obtained by collapsing the original table over the factor 'sex'. Conclusion - The main causes of death in this age group are motor vehicle accidents, which cause 31% of all deaths, and suicides, which account for 23%.
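Collapsing a contingency table simply means summing over one of its factors. A small Python sketch of both marginal distributions follows; the variable names are illustrative assumptions and the counts are those of the deaths table above.

```python
# Deaths by cause and sex, copied from the contingency table.
deaths = {
    "Motor vehicle accident": {"Males": 448, "Females": 146},
    "Suicide": {"Males": 350, "Females": 84},
    "Other accident": {"Males": 257, "Females": 74},
    "Malignant cancer": {"Males": 86, "Females": 50},
    "Other diseases": {"Males": 267, "Females": 153},
}

grand_total = sum(sum(row.values()) for row in deaths.values())

# Collapse over sex: one total per cause of death.
by_cause = {cause: sum(row.values()) for cause, row in deaths.items()}

# Collapse over cause: one total per sex.
by_sex = {
    sex: sum(row[sex] for row in deaths.values())
    for sex in ("Males", "Females")
}

# Relative frequencies are the marginal totals divided by the grand total.
cause_rel = {c: round(n / grand_total, 2) for c, n in by_cause.items()}
sex_rel = {s: round(n / grand_total, 2) for s, n in by_sex.items()}

print(by_sex, sex_rel)
for cause, n in by_cause.items():
    print(cause, n, cause_rel[cause])
```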

Conditional frequency distribution
  Males Females
Cause No. % No. %
Motor vehicle accidents 448 32 146 29
Suicide 350 25 84 17
Other accidents 257 18 74 15
Cancer 86 6 50 10
Other diseases 267 19 153 30
Total 1408 100% 507 100%

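Conclusion - Motor vehicle accidents were the leading specific cause of death for both males and females in the age group 15-24 years, accounting for about 30% of deaths in each sex. (For females the residual category "Other diseases" was marginally larger, at 30% against 29%.) Suicides were more common among males (25% of male deaths) than among females (17%).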
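The conditional distribution divides each count by the total for its own sex, so that males and females can be compared despite the different group sizes. A minimal Python sketch, with assumed names, is:

```python
# Deaths by cause and sex, copied from the contingency table.
deaths = {
    "Motor vehicle accident": {"Males": 448, "Females": 146},
    "Suicide": {"Males": 350, "Females": 84},
    "Other accident": {"Males": 257, "Females": 74},
    "Malignant cancer": {"Males": 86, "Females": 50},
    "Other diseases": {"Males": 267, "Females": 153},
}

sexes = ("Males", "Females")
sex_totals = {s: sum(row[s] for row in deaths.values()) for s in sexes}

# Percentage of deaths from each cause, conditional on sex.
conditional = {
    cause: {s: round(100 * row[s] / sex_totals[s]) for s in sexes}
    for cause, row in deaths.items()
}

for cause, pct in conditional.items():
    print(cause, pct["Males"], pct["Females"])

# Because of rounding, a column of whole-number percentages may not sum to 100.
rounded_sums = {s: sum(pct[s] for pct in conditional.values()) for s in sexes}
print(rounded_sums)
```

With these counts the rounded female percentages add to 101 rather than 100, which is a rounding effect and not an error in the data.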

The need to consider group size

In order to obtain valid comparisons among groups it is necessary to consider the sizes of the groups and to report the results similarly for all groups.

Example - Infant deaths in 1989

Numbers of babies who died before or just after birth in the Hunter Region in 1989.

Area Neonatal deaths Live births Total births Death rate (%)
Lake Macquarie 24 2304 2328 1.03
Newcastle 26 1835 1861 1.40
Maitland 5 814 819 0.61
Cessnock 7 725 732 0.96
Port Stephens 10 631 641 1.56
Muswellbrook 2 295 297 0.67

Source: Hunter Health Statistics Unit

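You cannot directly compare the numbers of deaths in each area because these depend on the number of births. You need to convert the numbers of deaths to death rates, here deaths as a percentage of total births (e.g. for Lake Macquarie, 100 × 24/2328 is approximately 1.03%).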
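The effect of group size can be seen by computing the rates and comparing the ranking by raw count with the ranking by rate. The Python sketch below is illustrative only; it assumes, as the table's figures imply, that the rate is deaths divided by total births, and the variable names are made up for the example.

```python
# (neonatal deaths, total births) per area, copied from the table above.
areas = {
    "Lake Macquarie": (24, 2328),
    "Newcastle": (26, 1861),
    "Maitland": (5, 819),
    "Cessnock": (7, 732),
    "Port Stephens": (10, 641),
    "Muswellbrook": (2, 297),
}

# Death rate as a percentage of total births, to 2 decimal places.
rates = {
    area: round(100 * deaths / births, 2)
    for area, (deaths, births) in areas.items()
}

most_deaths = max(areas, key=lambda area: areas[area][0])
highest_rate = max(rates, key=rates.get)

for area, rate in rates.items():
    print(f"{area:15s} {rate:.2f}")
print("most deaths:", most_deaths)
print("highest rate:", highest_rate)
```

The area with the most deaths is not the area with the highest death rate, which is why counts alone are misleading when the groups differ in size.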

Rules for presentation of tables

  • Two examples demonstrating patterns and exceptions.

EXPLORING DATA IN TABLES - RULES FOR PRESENTATION OF TABLES

  1. The patterns and exceptions should be obvious at a glance
  2. Round the numbers to 2 or even just 1 effective digit
  3. Numbers are easier to compare in columns than in rows
  4. Order rows and columns by size with big numbers first
  5. Don't try to present everything on one table. It is better to use several smaller tables.
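Rules 2 and 4 are mechanical enough to sketch in code. The Python example below rounds to two effective (significant) digits and orders rows with the big numbers first, using the deaths-by-cause totals from earlier in this section. The helper `round_sig` is a hypothetical name written for this illustration, not a standard library function.

```python
from math import floor, log10


def round_sig(x, digits=2):
    """Round x to the given number of effective (significant) digits."""
    if x == 0:
        return 0
    # Position of the leading digit decides how many decimals to keep.
    # Note: Python's round() sends exact ties to the nearest even digit.
    return round(x, digits - 1 - floor(log10(abs(x))))


# Total deaths by cause, from the marginal table earlier in this section.
by_cause = {
    "Motor vehicle accident": 594,
    "Suicide": 434,
    "Other accidents": 331,
    "Malignant neoplasms": 136,
    "Other diseases": 420,
}

# Rule 4: order rows by size, big numbers first.
# Rule 2: round each figure to 2 effective digits.
presented = [
    (cause, round_sig(n))
    for cause, n in sorted(by_cause.items(), key=lambda item: item[1], reverse=True)
]

for cause, n in presented:
    print(f"{cause:25s} {n}")
```

Ordering is done on the unrounded figures so that rounding cannot change the order of the rows.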

Example - UK Merchant Vessels

Table 1
United Kingdom Merchant Vessels in Service
(500 gross tons and over)

Table 2
An "improved" version of Table 1

Example - GB Unemployment

Table 6
Unemployment in Great Britain - Original Version

Table 7
Unemployment in GB - Rounded

Table 8
With Averages

Table 9
Rows and Columns Interchanged

 

  • An article by Ehrenberg on designing readable tables.

SUMMARISING AND PRESENTING DATA - APPENDIX ONE

RULES FOR PRESENTATION OF TABLES

 

Appl Statist. (1986)
35, No. 3, pp. 237-244

      Reading a Table: An Example

      By A. S. C. EHRENBERG

      London Business School, UK**

      [Received February 1985. Revised January 1986]

      SUMMARY
      In reading a table of numbers it helps to focus first on the variation in a single row and a single column, preferably of summary figures such as averages or totals. This then provides a base for looking at the rest of the data.

      Keywords.
      Looking at Numbers; Visual Focus; Averages or Totals; Main Variation; Memorable patterns.

1. Introduction
Many people have at times to look at a table of data to see what the numbers are saying. As statisticians we may be quite good at this. But we usually take our reading skill for granted and do not talk about it very much. We therefore do not find it easy to teach others.

This note tries to develop some precepts in terms of a specific example. It concerns a table about paper tissue which was shown to me by a journalist who was then in the paper industry (Ehrenberg, 1984). She had just received it from a Finnish source (without any supporting text) and asked: "What do I do with this? I can't just reproduce it in my article, can I?"

In starting to tell her what the table was saying and how one could get at this, I quickly realised that I had seldom tried to explain this process to others. Nor did there seem to be much literature on it—Wright (1981) for example mainly discusses formal tables rather than statistical ones.

Section 2 therefore now describes what I did in reading the table, as a personal case-history. Subsequently this appeared representative of what I do more generally and also seemed to make sense to others. Hence this note.

Section 3 then discusses an improved lay-out for the table, using earlier rules or guidelines like rounding and ordering by size (e.g. Ehrenberg 1982). This should make the data easier to read. But reading a given table, whether quite well laid-out or not, remains a separate process from improving it.

**Address for correspondence : Professor A. S. C. Ehrenberg, London Business School, Sussex Place, Regent's Park, London NW1 4SA, UK.

      © 1986 Royal Statistical Society



2. The Case History
This section deals with Table 1. It tries to make explicit the five steps which I think I took in first reading it, arrived at by an attempt at self-observation.

What does the table tell us? (Some readers may find it useful to stop reading the text at this stage and just look at the table instead, making notes of the steps they took in trying to read it themselves. How would they tell somebody else to do so?)

Step 1: What are the Variables?
Faced with an unfamiliar table and no verbal summary or stated purpose, my own first step was to glance briefly at the various captions. They showed that it was about Paper Tissue, about "Demand" (is this purchases?), "Supply" (production?), and "Balance" (the difference?). The table covered various regions and countries and years, presumably actuals for 1978, and then forecasts. (This was in 1984; there were no notes to tell one more.)

At this early stage there seemed to be no need to worry about how "Demand" etc had been defined and measured, or whether the EEC did or did not include Greece, or about the author's purpose in giving the table, or about how the forecasts were made. Nor yet to read the footnotes. (This may be more productive later, when one has begun to know something about the numbers, as in Step 4 here. Or one may never need to do so, if the table turns out uninteresting.)

Step 2: Focussing on One Row or Column
To begin eye-balling the actual numbers I first aimed to focus on just one row or column. Since there is a row of TOTALS that's where I chose to start.

Reading across the bottom line, "Demand" in 1978 was 7.99, "Supply" 8.04, and the "Balance" virtually zero at 0.05. So the 1978 Demand should almost equal Supply. Yes, it does!

Next, glancing all the way across the bottom line showed that the highest figure was 16.00 over on the right (for "Demand in 1995") and that some of the other figures were negative (but smallish). This gave some basic markers about the range of variation: small negative up to 16.00.

Given that there were three variables in the bottom line, it seemed best to go on with just one: Perhaps "Balance" would always be near-zero, and hence nice and simple to take in and to remember? But the next entry, the Balance for 1985, was rather bigger, at -0.93. Nonetheless, it still represented only a small short-fall in Supply, less than 10%. (This I noted by glancing at the 1985 Supply and Demand figures "out of the corner of my eye", i.e. without as yet trying to take them in and in any way remember them.)

Mental Rounding. In looking at these various figures it helps to round each to two or even just one effective digit in one's head. This mental rounding is essential for then doing mental arithmetic on the numbers. It means reading the 1985 Total Balance as -.9 or -1, and the Demand as 10.7 or 11. By mental arithmetic this gave just under 10%.

But in describing the process I think it helps the reader to quote the full numbers, like -0.93 and 10.67, for ease of visual identification.

In summary, there were four crucial elements here: Focussing on just one row or column (and one variable); getting some "feel" for the data; noting the range of variation; drastic mental rounding throughout.

Step 3. Are the Balance Totals Representative?
Next I quickly checked whether the small balances in 1978 and 1985 were at all typical for individual countries and regions. Glancing up and down the balance columns for 1978 and 1985 showed that they were all .0 or .1 something, and sometimes even exactly zero—all small compared with the Demand and Supply columns. This gave more "feel" for the data: near-zero balances generally in 1978 and smallish ones in 1985.

Step 4. The Full Trend for "Balance"
Continuing still with "Balance" in the bottom line, because so far that had been fairly simple, showed that in 1990 it was -3.32 (i.e. about -3). This was much bigger than the -.9 Balance in 1985. Indeed, the short-fall now was about a third of the total 1990 Supply. Was this perhaps the message of the table, i.e. an increasing short-fall of Supply against Demand: -.05, -.9, -3?

To see how this bigger short-fall in 1990 had come about (was it Twyman's Law: "Any figure that looks different or interesting is usually wrong"?), I first compared the Demand total in 1985 with that in 1990 and saw a sizeable increase of 2 or 3 million, from 10.67 to 13.06. But the 1985 and 1990 Supply totals were identical, both 9.74! And so they were in each country or region for 1985 and 1990!

Checking Footnote 2 now showed that

  • (a) Even the Supply figures for 1985 had not been full forecasts. (The reported -0.93 shortfall in the 1985 Balance therefore seemed a bit dubious.)
  • (b) The Supply figures for 1990 were not "forecasts" at all.
  • (c) By 1995 the table-producer had given up on Supply altogether.
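The arithmetic behind Steps 2 to 4 can be checked from the bottom-line totals quoted in the text. The Python sketch below assumes that Balance means Supply minus Demand, which the article itself only conjectures from the captions; the dictionary layout and names are illustrative.

```python
# World totals quoted in the text (Table 1 itself is not reproduced here).
totals = {
    1978: {"demand": 7.99, "supply": 8.04},
    1985: {"demand": 10.67, "supply": 9.74},
    1990: {"demand": 13.06, "supply": 9.74},
}

# Assumption: Balance = Supply - Demand (negative means a short-fall).
balance = {
    year: round(t["supply"] - t["demand"], 2) for year, t in totals.items()
}

# Size of the short-fall relative to that year's Supply.
shortfall_share = {
    year: round(-balance[year] / totals[year]["supply"], 2)
    for year in totals
    if balance[year] < 0
}

# The give-away spotted in Step 4: Supply does not change from 1985 to 1990.
supply_unchanged = totals[1985]["supply"] == totals[1990]["supply"]

print(balance)
print(shortfall_share)
print("1985 and 1990 Supply identical:", supply_unchanged)
```

Under that assumption the computed balances match the figures quoted in the text (0.05, -0.93 and -3.32), and the 1990 short-fall is about a third of Supply.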

Step 5: The Trends in "Demand"
Having now pretty much gone off the Supply and Balance figures since there were hardly any, it seemed possible to concentrate on Demand:

  • (i) The World Total doubled from about 8 (or 7.99) in 1978 to 16 in 1995.
  • (ii) The trend was fairly smooth over the intervening years (10.67 and 13.06).
  • (iii) Looking at the individual rows in the table (and using mentally rounded figures), the '78 to '95 increases were often less than 2-fold for the seven more developed regions or countries higher up in the table (e.g. 3.9 to 5.4 for North America right at the top), and as much as 3- or 4-fold for the others (e.g. 0.5 to 2.0 for Latin America at the bottom).

Having a Focus. Looking at the individual rows was vastly helped by using the single row of summary figures at the bottom of the table as a mental norm. Did the overall doubling of Total Demand between '78 and '95 from 8 to 16 also hold for the individual regions or countries? Without such a memorable focus one would hardly know which figures inside the table to compare with which, or see the row-by-column interaction, i.e. the less than 2-fold versus the 4-fold increases noted in (iii) above.
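Using the summary row as a norm amounts to computing one growth factor for the total and comparing each row's factor against it. A small Python sketch follows, using only the Demand figures quoted in the text (the two regional rows are the mentally rounded values given in (iii)); the names are illustrative assumptions.

```python
# Demand in 1978 and 1995 as quoted in the text.
demand = {
    "World total": (7.99, 16.00),
    "North America": (3.9, 5.4),
    "Latin America": (0.5, 2.0),
}

# Growth factor ("fold" increase) from 1978 to 1995 for each row.
fold = {
    region: round(end / start, 1) for region, (start, end) in demand.items()
}

# Use the summary row as the norm against which the other rows are read.
norm = fold["World total"]
versus_norm = {
    region: "below" if f < norm else "above" if f > norm else "equal to"
    for region, f in fold.items()
    if region != "World total"
}

for region, label in versus_norm.items():
    print(f"{region}: {fold[region]}-fold, {label} the overall {norm}-fold")
```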

If no summary figures like the Totals had been given one could pick either a row with big figures, or one which is visually well-positioned, or both. Thus to look at the columns in Table 1 (which have no summary), one can use the '78 Demand on the far left as a focus:

  • (iv) In 1978 itself, North America, USA, Western Europe and the EEC were much the largest.
  • (v) This also applied all the way across the table (other than for the Balances).

This concluded my initial reading of the table. It seemed to make two points, which one could then also communicate to others:

  • A. Demand for paper tissue was expected to double by 1995; more so in less developed parts of the world.
  • B. There would be a short-fall of Supply unless it was increased!

More work on the table would only be needed if it became part of some bigger study or writeup (but this seemed unlikely in this instance).

If a table which one has to look at in detail lacks summary figures, I normally get some row and/or column averages (or totals) worked out first, before even looking at it. (Averages are usually easier to compare with the individual figures since they are in the same units. But Totals are occasionally meaningful in their own right, as for the columns in Table 1, but not the rows.)

3. Better Table Lay-out

Redesigning such a table can make it a good deal easier to read—especially for the quantitative detail—and better to present to others. We can use various rules of rounding, ordering by size, giving averages (or totals), and generally attending to the lay-out, as has been described before (e.g. Ehrenberg 1982, Chapters 3, 15 and 16). The process is illustrated in Section 3.1 for the Demand figures.

A further point is that when a table contains several variables in juxtaposition—here Demand, Supply, and Balance—it seems generally better to give a separate table for each, as is shown in Section 3.2.

Reading such improved tables still requires the procedures in Section 2, like first picking out the extreme variation in a summary row or column. But the process should be simpler.

3.1. A Better Demand Table
Table 2 concentrates on the "Demand" figures. The improvements are five-fold:

  • (a) Reduce the risks of double-counting and general confusion by separating the main regions from those for the constituent countries.
  • (b) Order the rows by size (using the '77-'79 Actuals as criterion), with the big numbers at the top.
  • (c) Round all figures to just one decimal place (rather than to two effective digits, which is the general precept), since deliberate over-rounding for the smaller entries should be acceptable for such an overview. (The table is clearly not a precise look-up table for Scandinavia or Africa.)
  • (d) Use fewer grid-lines (but keep to single spacing of the rows with gaps, as in the original table).
  • (e) Use better labelling.

We can now read the data better (including doing the mental arithmetic required) to see, more easily than in Section 2, that:

  • (i) World demand doubled from '78 to '95.
  • (ii) North America, much the largest user, increased rather less than 2-fold (i.e. 5.4/4 is about 1.4), while the next two, Western Europe and Japan, increased fractionally more.
  • (iii) The rest (Latin America, USSR etc, China, and Africa) all increased about 3- or 4-fold. (The self-evident over-rounding has reduced the increases for China and Africa from 4- to 3-fold!)
  • (iv) The growth rates for the individual countries in the lower section of the table are roughly in line with their region.
  • (v) In each case, the projected Demand growth through the intervening years is pretty smooth.

Table 2 satisfies the criterion of a good table: One can see the patterns and exceptions at a glance, especially once one knows (e.g. has been told) what they are. This is not so with Table 1.

We can probably also recall the known patterns more easily than from Table 1. Just a glance back at Table 2 seems to remind us almost instantly that the Demand increases to '95 were very roughly two-fold at the top of the table, and three- to four-fold lower down.

3.2. Separate Tables
When a table contains two or more variables like Table 1, it is usually best to construct a separate table for each variable, as Table 2 did for Demand (see also Ehrenberg 1982, p. 228).

The key to understanding the data does not lie in facilitating detailed comparisons of every triplet of Demand, Supply and Balance figures for each region, country, and year in turn. Instead, it is to establish (and take in) the main patterns. This can best be done for one variable at a time, comparing like with like. After that we can compare one pattern with that for the next variable.

Thus for Demand in Table 2 we could establish the 2-fold or 4-fold trends over time, and the relative size of the regions. We can now compare this with the Supply patterns in Table 3. Only after that might we have to make specific comparisons for some individual pairs of figures. This itself is simpler once we have an overt and broadly memorable framework for all the figures.

To comment in more detail, in Table 3 the relative steadiness of the Supply figures over the three years stands out (in contrast with the doubling of Demand in Table 2). So does the fact that the forecast Supplies for '85 and '90 are in each case identical, especially once one has noticed or been told. These patterns are more immediately clear, and also much easier to keep in mind, than in Table 1 or probably even with any improved (e.g. rounded and reordered) three-variable table. Similarly, in Table 4 the small Balances generally, the larger (negative) Balances in 1990, and the positive ones for Scandinavia stand out more.

4. DISCUSSION

The steps in reading Table 1 that were outlined in Section 2 can probably be reduced to five general precepts for reading a table:

  • I. Take in the broad subject-matter and the variables, without yet worrying over details, sources, etc.
  • II. Focus first on one row and/or one column, preferably of averages. Establish the range of variation, i.e. the highest and lowest readings, as mental "markers". (Also note what form the intervening variation seems to take, without yet trying to take the data in fully.)
  • III. Round all figures one looks at to one or two effective digits in one's head, to facilitate mental arithmetic and make the results more memorable.
  • IV. Compare the detailed readings in the body of the table against these patterns as norms.
  • V. Now possibly consider the definitions, sources, the wider meaning of the results, and a more formal analysis.

If more work is to be done on the table, or the results are to be presented to others, we need also to be prepared to revamp the table, with rounded figures, rows and columns ordered by size, adequate summary figures to provide a focus, and better lay-out generally.

Two more general points about the reading process may be worth making:

  • a. If there is text or commentary, one should probably glance at this first; it may, or should, provide a way into what the table is saying. But if one is merely skimming a publication, the more numerate of us may prefer to glance at tables rather than the text! In one's technical work, one usually has seen similar tables before and this will guide one in what to look for.
  • b. In reading a table we have to compare the different entries. For this we need our short-term memory and related information-processing routines (e.g. Baddeley, 1976, 1982). But our short-term memory has very limited capacity and we must not overburden it. In particular, we remember patterns more easily, whether in the short or longer term, than isolated numbers.

In conclusion, this note does not mean to suggest that everyone would or should follow exactly the sequence of steps or guidelines in reading a table that have been outlined here. They are merely ones which I believe I tend to follow in such situations and which seem to have made sense to some others.

If we are to teach people better numeracy, we need to make such guidelines explicit. But how far do other people work very differently? If so, what are their procedures? And which of them are generalisable? It would be good to debate more cases and more points of view.

Acknowledgements

I am indebted for helpful comments on drafts of this note to Chris Beaumont, John Bound, Derek Bunn, Len England, Peter Gorb, Helen Lewis, Bill Kruskal, John Nelder, David Targett, Patricia Wright, and especially also the editor and one of the referees. The note is part of a programme of work at the LBS's Centre for Marketing and Communication which is supported by some forty leading companies and institutions in the UK and USA.

I am also indebted to the Editors for allowing some departures from the Journal's housestyle, such as being able to use vertical rules in printing numerical tables, and using visually helpful capitalised initials in table headings.

References

Baddeley, A. D. (1976) The Psychology of Memory. New York: Basic Books.

----- (1982) Your Memory: A User's Guide. London and New York: Penguin Books.

Ehrenberg, A. S. C. (1982) A Primer in Data Reduction. Chichester and New York: Wiley.

Ehrenberg, D. S. (1984) Personal communication.

Wright, P. (1981) Tables in Text: The Subskills Needed for Reading Formatted Information. In The Reader and the Text (C. J. Chapman, ed.). London: Heinemann.

Progress check

  1. A column of numbers has total T. To convert to percentages you can

    • divide each number by T, then multiply by 100
    • multiply each number by (100/T)
    • divide each number by (T/100)

     

  2. A contingency table classified by two factors A and B is collapsed over B. If A has three levels and B has four levels, how many cells are there in the marginal table, not counting the grand total?

    • 1
    • 3
    • 4

     

  3. In designing tables of statistical data, it is helpful

    • to avoid filling the table with too many unimportant digits
    • to always present row and column percentages as well as the numbers
    • to include as much information as possible in each table

     
