Statistical Thinking for Decision Making: Revealing Facts from Figures. By Dr. Hossein Arsham
Today's good decisions are driven by data. In all aspects of our lives, and importantly in the business context, an amazing diversity of data is available for inspection and enlightenment. Moreover, business managers and professionals are increasingly encouraged to justify decisions on the basis of data.
Business managers need statistical model-based decision support systems. Statistical skills enable you to intelligently collect, analyze, and interpret data relevant to your decision-making. Statistical concepts and statistical thinking enable you to:
In a competitive environment, business managers must design quality into products, and into the processes of making the products. They must facilitate a process of never-ending improvement at all stages of manufacturing and service. This is a strategy that employs statistical methods, particularly statistically designed experiments, and produces processes that provide high yield and products that seldom fail. Moreover, it facilitates the development of robust products that are insensitive to changes in the environment and to internal component variation. Carefully planned statistical studies remove hindrances to high quality and productivity at every stage of production, saving time and money. It is well recognized that quality must be engineered into products as early as possible in the design process. One must know how to use carefully planned, cost-effective statistical experiments to improve, optimize, and make robust products and processes.
Business Statistics is a science assisting you to make business decisions under uncertainty based on numerical and measurable scales. Decision-making processes must be based on data, not on personal opinion or belief.
The Devil is in the Deviations: Variation is inevitable in life! Every process, every measurement, every sample has variation. Managers need to understand variation for two key reasons: first, so that they can lead others to apply statistical thinking in day-to-day activities, and second, to apply the concept for the purpose of continuous improvement. This course will provide you with hands-on experience promoting the use of statistical thinking and techniques, so that you can apply them to make educated decisions whenever you encounter variation in business data. You will learn techniques to intelligently assess and manage the risks inherent in decision-making. Therefore, remember that:
Just like weather, if you cannot control something, you should learn how to measure and analyze it, in order to predict it, effectively.
If you have taken statistics before and feel unable to grasp the concepts, it may be largely due to having been taught by non-statistician instructors. Their deficiencies lead students to develop phobias about the sweet science of statistics. In this respect, Professor Herman Chernoff (1996) made the following remark:
Inadequate statistical teaching during university education leads, even after graduation, to one or a combination of the following scenarios:
Plugging numbers into the formulas and crunching them have no value by themselves. You should continue to put effort into the concepts and concentrate on interpreting the results.
Even when you solve a small-sized problem by hand, I would like you to use the available computer software and Web-based computation to do the dirty work for you.
You must be able to read the logical secret in any formula, not memorize it. For example, in computing the variance, consider its formula. Instead of memorizing, you should start with some whys:
i. Why do we square the deviations from the mean?
Because if we add up all the deviations, we always get zero. So, to deal with this problem, we square the deviations. Why not raise them to the power of four (three will not work)? Squaring does the trick; why should we make life more complicated than it is? Notice also that squaring magnifies the deviations; therefore it works to our advantage in measuring the variability of the data.
ii. Why is there a summation notation in the formula?
To add up the squared deviation of each data point and compute the total sum of squared deviations.
iii. Why do we divide the sum of squares by n-1?
The amount of deviation should also reflect how large the sample is, so we must bring in the sample size. That is, in general, larger samples have a larger sum of squared deviations from the mean. Why n-1 and not n? The reason is that dividing by n-1 gives a sample variance that is, on average, much closer to the population variance than dividing by n does. Note that for a large sample size n (say, over 30), it really does not matter whether we divide by n or n-1: the results are almost the same, and both are acceptable. The factor n-1 is what we call the "degrees of freedom".
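The three answers above can be checked directly in code. The following sketch uses a small invented data set, purely for illustration:

```python
# Small invented data set, purely for illustration.
data = [4.0, 7.0, 6.0, 3.0, 10.0]
n = len(data)
mean = sum(data) / n  # 6.0

# (i) The raw deviations from the mean always add up to zero,
# which is why we square them before summing.
deviations = [x - mean for x in data]
assert abs(sum(deviations)) < 1e-9

# (ii) The summation adds the squared deviation of every data point.
sum_of_squares = sum(d ** 2 for d in deviations)  # 30.0

# (iii) Dividing by n - 1 (the degrees of freedom) gives the sample
# variance; dividing by n would give the population variance.
sample_variance = sum_of_squares / (n - 1)  # 7.5
population_variance = sum_of_squares / n    # 6.0
```

Note that Python's standard `statistics` module makes the same distinction: `statistics.variance` divides by n-1, while `statistics.pvariance` divides by n.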
This example shows how to question statistical formulas, rather than memorizing them. In fact, when you try to understand the formulas, you do not need to remember them, they are part of your brain connectivity. Clear thinking is always more important than the ability to do arithmetic.
When you look at a statistical formula, the formula should talk to you, as when a musician looks at a piece of musical-notes, he/she hears the music.
Computer-Assisted Learning: Computer-assisted learning provides you with "hands-on" experience, which will enhance your understanding of the concepts and techniques covered in this site.
Unfortunately, most classroom courses are not learning systems. The way the instructors attempt to help their students acquire skills and knowledge has absolutely nothing to do with the way students actually learn. Many instructors rely on lectures and tests, and memorization. All too often, they rely on "telling." No one remembers much that's taught by telling, and what's told doesn't translate into usable skills. Certainly, we learn by doing, failing, and practicing until we do it right. The computer assisted learning serves this purpose.
A course in appreciation of statistical thinking gives business professionals an edge. Professionals with strong quantitative skills are in demand. This phenomenon will grow as the impetus for data-based decisions strengthens and the amount and availability of data increases. The statistical toolkit can be developed and enhanced at all stages of a career. Decision making process under uncertainty is largely based on application of statistics for probability assessment of uncontrollable events (or factors), as well as risk assessment of your decision.
The main objectives for this course are to learn statistical thinking; to emphasize concepts more, with less theory and fewer recipes; and, finally, to foster active learning using useful and interesting Web sites. It has been said that "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." So, let's be ahead of our time.
Chernoff H., A Conversation With Herman Chernoff, Statistical Science, Vol. 11, No. 4, 335-350, 1996.
Churchman C., The Design of Inquiring Systems, Basic Books, New York, 1971. Early in the book he stated that knowledge could be considered as a collection of information, or as an activity, or as a potential. He also noted that knowledge resides in the user and not in the collection.
Rustagi M., et al. (eds.), Recent Advances in Statistics: Papers in Honor of Herman Chernoff on His Sixtieth Birthday, Academic Press, 1983.
The Birth of Probability and Statistics
The original idea of "statistics" was the collection of information about and for the "state". The word statistics derives directly, not from any classical Greek or Latin roots, but from the Italian word for state.
The birth of statistics occurred in the mid-17th century. A commoner named John Graunt, a native of London, began reviewing a weekly church publication issued by the local parish clerk that listed the number of births, christenings, and deaths in each parish. These so-called Bills of Mortality also listed the causes of death. Graunt, who was a shopkeeper, organized these data in the form we call descriptive statistics, published as Natural and Political Observations Made upon the Bills of Mortality. Shortly thereafter he was elected a member of the Royal Society. Statistics has since borrowed some concepts from sociology, such as the concept of Population. It has been argued that since statistics usually involves the study of human behavior, it cannot claim the precision of the physical sciences.
Probability has a much longer history. Probability is derived from the verb to probe, meaning to "find out" what is not too easily accessible or understandable. The word "proof" has the same origin; a proof provides the necessary details to understand what is claimed to be true.
Probability originated from the study of games of chance and gambling during the 16th century. Probability theory was developed as a branch of mathematics by Blaise Pascal and Pierre de Fermat in the seventeenth century. Today, in the 21st century, probabilistic modeling is used to control the flow of traffic through a highway system, a telephone interchange, or a computer processor; to determine the genetic makeup of individuals or populations; and in quality control, insurance, investment, and other sectors of business and industry.
New and ever growing diverse fields of human activities are using statistics; however, it seems that this field itself remains obscure to the public. Professor Bradley Efron expressed this fact nicely:
Daston L., Classical Probability in the Enlightenment, Princeton University Press, 1988.
The book points out that early Enlightenment thinkers could not face uncertainty; the Enlightenment view of the world was that of a mechanistic, deterministic machine.
Gillies D., Philosophical Theories of Probability, Routledge, 2000. Covers the classical, logical, subjective, frequency, and propensity views.
Hacking I., The Emergence of Probability, Cambridge University Press, London, 1975. A philosophical study of early ideas about probability, induction and statistical inference.
Peters W., Counting for Something: Statistical Principles and Personalities, Springer, New York, 1987. It teaches the principles of applied economic and social statistics in a historical context. Featured topics include public opinion polls, industrial quality control, factor analysis, Bayesian methods, program evaluation, non-parametric and robust methods, and exploratory data analysis.
Porter T., The Rise of Statistical Thinking, 1820-1900, Princeton University Press, 1986. The author states that statistics became known in the twentieth century as the mathematical tool for analyzing experimental and observational data. Enshrined by public policy as the only reliable basis for judgments as to the efficacy of medical procedures or the safety of chemicals, and adopted by business for such uses as industrial quality control, it is evidently among the products of science whose influence on public and private life has been most pervasive. Statistical analysis has also come to be seen in many scientific disciplines as indispensable for drawing reliable conclusions from empirical results. This new field of mathematics has found an extensive domain of applications.
Stigler S., The History of Statistics: The Measurement of Uncertainty Before 1900, U. of Chicago Press, 1990. It covers the people, ideas, and events underlying the birth and development of early statistics.
Tankard J., The Statistical Pioneers, Schenkman Books, New York, 1984.
This work provides the detailed lives and times of theorists whose work continues to shape much of the modern statistics.
Statistical Modeling for Decision-Making under Uncertainty:
From Data to Instrumental Knowledge
In this diverse world of ours, no two things are exactly the same. A statistician is interested in both the differences and the similarities; i.e., both departures and patterns.
The actuarial tables published by insurance companies reflect their statistical analysis of the average life expectancy of men and women at any given age. From these numbers, the insurance companies then calculate the appropriate premiums for a particular individual to purchase a given amount of insurance.
Exploratory analysis of data makes use of numerical and graphical techniques to study patterns and departures from patterns. The widely used descriptive statistical techniques are: Frequency Distribution; Histograms; Boxplot; Scattergrams and Error Bar plots; and diagnostic plots.
In examining distribution of data, you should be able to detect important characteristics, such as shape, location, variability, and unusual values. From careful observations of patterns in data, you can generate conjectures about relationships among variables. The notion of how one variable may be associated with another permeates almost all of statistics, from simple comparisons of proportions through linear regression. The difference between association and causation must accompany this conceptual development.
Data must be collected according to a well-developed plan if valid information on a conjecture is to be obtained. The plan must identify important variables related to the conjecture, and specify how they are to be measured. From the data collection plan, a statistical model can be formulated from which inferences can be drawn.
As an example of statistical modeling with managerial implications, such as "what-if" analysis, consider regression analysis. Regression analysis is a powerful technique for studying relationship between dependent variables (i.e., output, performance measure) and independent variables (i.e., inputs, factors, decision variables). Summarizing relationships among the variables by the most appropriate equation (i.e., modeling) allows us to predict or identify the most influential factors and study their impacts on the output for any changes in their current values.
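As a minimal sketch of such a model, the following fits a simple least-squares regression line to invented data; both the numbers and the "advertising vs. sales" reading are assumptions made purely for illustration:

```python
# Invented data: x might be advertising spend (input, decision variable),
# y the resulting sales (output, performance measure).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates of the slope and intercept.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx                 # 0.6
intercept = y_bar - slope * x_bar   # ~2.2

# "What-if" analysis: predicted output if the input were raised to 6.
prediction = intercept + slope * 6.0  # ~5.8
```

The fitted equation summarizes the relationship between the variables, and the last line shows the "what-if" use mentioned above: plugging a new input value into the model to predict the output.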
Marketing managers, for example, are frequently faced with the question: What sample size do I need? This is an important and common statistical decision, which should be given due consideration, since an inadequate sample size invariably leads to wasted resources. The sample size determination section provides a practical solution to this risky decision.
Statistical models are currently used in various fields of business and science. However, the terminology differs from field to field. For example, the fitting of models to data, called calibration, history matching, and data assimilation, are all synonymous with parameter estimation.
Your organization's database contains a wealth of information, yet the decision technology group taps only a fraction of it. Employees waste time scouring multiple sources for data. Decision-makers are frustrated because they cannot get business-critical data exactly when they need it. Therefore, too many decisions are based on guesswork, not facts. Many opportunities are also missed, if they are even noticed at all.
Knowledge is what we know. Information is the communication of knowledge. In every knowledge exchange, there is a sender and a receiver. The sender makes common what is private, does the informing, the communicating. Information can be classified into explicit and tacit forms. Explicit information can be expressed in structured form, while tacit information is inconsistent, fuzzy, and difficult to articulate.
Data is known to be crude information and not knowledge by itself. The sequence from data to knowledge is: from Data to Information, from Information to Facts, and finally, from Facts to Knowledge. Data becomes information when it becomes relevant to your decision problem. Information becomes fact when the data can support it. Facts are what the data reveals. However, the decisive instrumental knowledge is expressed together with some statistical degree of confidence.
Fact becomes knowledge, when it is used in the successful completion of a decision process. Knowledge needs wisdom. Wisdom is the power to put our time and our knowledge to the proper use. Once you have a massive amount of facts integrated as knowledge, then your mind will be superhuman in the same sense that mankind with writing is superhuman compared to mankind before writing. The following figure illustrates the statistical thinking process based on data in constructing statistical models for decision making under uncertainties.
The above figure depicts the fact that as the exactness of a statistical model increases, the level of improvements in decision-making increases. That's why we need Business Statistics. Statistics arose from the need to place knowledge on a systematic evidence base. This required a study of the laws of probability, the development of measures of data properties and relationships, and so on.
Statistical inference aims at determining whether any statistical significance can be attached to results after due allowance is made for random variation as a source of error. Intelligent and critical inferences cannot be made by those who do not understand the purpose, the conditions, and the applicability of the various techniques for judging significance.
Statistical Decision-Making Process
- Unlike the deterministic decision-making process, in decision making under pure uncertainty, the variables are often more numerous and more difficult to measure and control. However, the steps are the same. They are:
Fortunately the probabilistic and statistical methods for analysis and decision making under uncertainty are more numerous and powerful today than ever before. The computer makes possible many practical applications. A few examples of business applications are the following:
Questions Concerning the Statistical Decision-Making Process:
Corfield D., and J. Williamson, Foundations of Bayesianism, Kluwer Academic Publishers, 2001. Contains Logic, Mathematics, Decision Theory, and Criticisms of Bayesianism.
Lapin L., Statistics for Modern Business Decisions, Harcourt Brace Jovanovich, 1987.
Pratt J., H. Raiffa, and R. Schlaifer, Introduction to Statistical Decision Theory, The MIT Press, 1994.
What is Business Statistics?
The main objective of Business Statistics is to make inferences (e.g., prediction, making decisions) about certain characteristics of a population based on information contained in a random sample from the entire population. The condition for randomness is essential to make sure the sample is representative of the population.
Business Statistics is the science of "good" decision making in the face of uncertainty and is used in many disciplines, such as financial analysis, econometrics, auditing, production and operations, and marketing research. It provides knowledge and skills to interpret and use statistical techniques in a variety of business applications. A typical Business Statistics course is intended for business majors and covers statistical study, descriptive statistics (collection, description, analysis, and summary of data), probability, the binomial and normal distributions, tests of hypotheses and confidence intervals, linear regression, and correlation.
Statistics is a science of making decisions with respect to the characteristics of a group of persons or objects on the basis of numerical information obtained from a randomly selected sample of the group. Statisticians refer to this numerical observation as a realization of a random sample. However, notice that one cannot see a random sample itself; a random sample is only a finite set of outcomes of a random process.
At the planning stage of a statistical investigation, the question of sample size (n) is critical. For example, one rule of thumb sets the sample size for sampling from a finite population of size N at n = √N + 1, rounded up to the nearest integer. Clearly, a larger sample provides more relevant information, and as a result a more accurate estimation and better statistical judgment regarding tests of hypotheses.
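The rule of thumb quoted above is easy to encode; note that this is the text's heuristic, not a universally accepted sample-size formula:

```python
import math

def rule_of_thumb_sample_size(N: int) -> int:
    """Sample size for a finite population of size N, per the text's
    heuristic: sqrt(N) + 1, rounded up to the nearest integer."""
    return math.ceil(math.sqrt(N) + 1)

# For a population of 400 accounts, the rule suggests sampling 21;
# for 1,000, it suggests 33.
```

A proper sample-size calculation would instead start from the desired margin of error and confidence level; the heuristic merely gives a quick first guess.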
Activities Associated with the General Statistical Thinking
The above figure illustrates the idea of statistical inference from a random sample about the population. It also provides estimation of the population's parameters, namely the expected value μ, the standard deviation σ, and the cumulative distribution function (cdf) F, together with their corresponding sample statistics: the sample mean x̄, the sample standard deviation S, and the empirical cumulative distribution function (ecdf), respectively.
The major task of statistics is to study the characteristics of populations whether these populations are people, objects, or collections of information. For two major reasons, it is often impossible to study an entire population:
The process would be too expensive or too time-consuming. The process would be destructive.
In either case, we would resort to looking at a sample chosen from the population and trying to infer information about the entire population by examining only the smaller sample. Very often the numbers which interest us most about the population are the mean μ and standard deviation σ. Any number -- like the mean or standard deviation -- which is calculated from an entire population is called a Parameter. If the very same numbers are derived only from the data of a sample, then the resulting numbers are called Statistics. Frequently, Greek letters represent parameters and Latin letters represent statistics (as shown in the above Figure).
Statistics is a tool that enables us to impose order on the disorganized cacophony of the real world of modern society. The business world has grown in both size and competition. Corporate executives must take risks in business; hence the need for business statistics.
Business statistics has grown with the art of constructing charts and tables! It is a science of basing decisions on numerical data in the face of uncertainty.
Business statistics is a scientific approach to decision making under risk. In practicing business statistics, we search for insight, not merely a solution. Our search is for the one solution that meets all the business's needs with the lowest level of risk. Business statistics can take a normal business situation and, with the proper data gathering, analysis, and search for a solution, turn it into an opportunity.
While business statistics cannot replace the knowledge and experience of the decision maker, it is a valuable tool that the manager can employ to assist in the decision making process in order to reduce the inherent risk.
Business Statistics provides justifiable answers to the following concerns for every consumer and producer:
- What is your, or your customer's, expectation of the product/service you sell or that your customer buys? That is, what is a good estimate for μ?
- Given the information about your, or your customer's, expectation, what is the quality of the product/service you sell or that your customer buys? That is, what is a good estimate for σ?
- Given the information about your, or your customer's, expectation and the quality of the product/service you sell or your customer buys, how does the product/service compare with other existing similar types? That is, how do several μ's and several σ's compare?
Common Statistical Terminology with Applications
- Like all professionals, statisticians have their own keywords and phrases to ease precise communication. However, one must interpret the results of any decision making in a language that is easy for the decision-maker to understand. Otherwise, he/she will not believe in what you recommend and therefore will not move into the implementation phase. This lack of communication between statisticians and managers is the major roadblock to using statistics.
Population: A population is any entire collection of people, animals, plants, or things on which we may collect data. It is the entire group of interest, which we wish to describe or about which we wish to draw conclusions. In the figure above, the lifetimes of the light bulbs manufactured, say, by GE constitute the population of concern.
Qualitative and Quantitative Variables: Any object or event, which can vary in successive observations either in quantity or quality is called a "variable." Variables are classified accordingly as quantitative or qualitative. A qualitative variable, unlike a quantitative variable does not vary in magnitude in successive observations. The values of quantitative and qualitative variables are called "Variates" and "Attributes", respectively.
Variable: A characteristic or phenomenon which may take different values, such as weight or gender, since these differ from individual to individual.
Randomness: Randomness means unpredictability. The fascinating fact about inferential statistics is that, although each random observation may not be predictable when taken alone, collectively the observations follow a predictable pattern called their distribution function. For example, it is a fact that the distribution of a sample average follows a normal distribution for sample sizes over about 30. In other words, an extreme value of the sample mean is less likely than an extreme value among a few raw data points.
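This predictable pattern can be seen in a short simulation; the uniform population, the sample size of 30, and the number of repetitions are arbitrary choices for illustration:

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# An arbitrary, decidedly non-normal population of 10,000 values.
population = [random.uniform(0.0, 100.0) for _ in range(10_000)]

# Draw many samples of size 30 and record each sample mean.
sample_means = [
    statistics.mean(random.sample(population, 30)) for _ in range(2_000)
]

# Individually unpredictable observations, collectively predictable:
# the sample means cluster around the population mean, with a spread
# of roughly sigma / sqrt(30) -- far smaller than the raw spread.
raw_spread = statistics.pstdev(population)
mean_spread = statistics.stdev(sample_means)
```

A histogram of `sample_means` would show the familiar bell shape even though the underlying population is flat, which is exactly the point made in the paragraph above.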
Sample: A subset of a population or universe.
Statistical Experiment: An experiment in general is an operation in which one chooses the values of some variables and measures the values of other variables, as in physics. A statistical experiment, in contrast, is an operation in which one takes a random sample from a population and infers the values of some variables. For example, in a survey we "survey", i.e., "look at", the situation without aiming to change it, as in a survey of political opinions. A random sample from the relevant population provides information about the voting intentions.
In order to make any generalization about a population, a random sample from the entire population, which is meant to be representative of the population, is often studied. For each population, there are many possible samples. A sample statistic gives information about a corresponding population parameter. For example, the sample mean for a set of data would give information about the overall population mean μ.
It is important that the investigator carefully and completely defines the population before collecting the sample, including a description of the members to be included.
Example: The population for a study of infant health might be all children born in the U.S.A. in the 1980's. The sample might be all babies born on 7th of May in any of the years.
An experiment is any process or study which results in the collection of data, the outcome of which is unknown. In statistics, the term is usually restricted to situations in which the researcher has control over some of the conditions under which the experiment takes place.
Example: Before introducing a new drug treatment to reduce high blood pressure, the manufacturer carries out an experiment to compare the effectiveness of the new drug with that of one currently prescribed. Newly diagnosed subjects are recruited from a group of local general practices. Half of them are chosen at random to receive the new drug, the remainder receives the present one. So, the researcher has control over the subjects recruited and the way in which they are allocated to treatment.
Design of experiments is a key tool for increasing the rate of acquiring new knowledge. Knowledge in turn can be used to gain competitive advantage, shorten the product development cycle, and produce new products and processes which will meet and exceed your customer's expectations.
Primary data and Secondary data sets: If the data are from a planned experiment relevant to the objective(s) of the statistical investigation, collected by the analyst, it is called a Primary Data set. However, if some condensed records are given to the analyst, it is called a Secondary Data set.
Random Variable: A random variable is a real function (yes, it is called " variable", but in reality it is a function) that assigns a numerical value to each simple event. For example, in sampling for quality control an item could be defective or non-defective, therefore, one may assign X=1, and X = 0 for a defective and non-defective item, respectively. You may assign any other two distinct real numbers, as you wish; however, non-negative integer random variables are easy to work with. Random variables are needed since one cannot do arithmetic operations on words; the random variable enables us to compute statistics, such as average and variance. Any random variable has a distribution of probabilities associated with it.
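The quality-control coding described above can be sketched in a few lines; the inspection results are invented for illustration:

```python
# Invented quality-control inspection results (qualitative outcomes).
inspections = ["defective", "ok", "ok", "defective", "ok"]

# The random variable X assigns 1 to each defective item and 0 to each
# non-defective item, turning words into numbers we can do arithmetic on.
X = [1 if outcome == "defective" else 0 for outcome in inspections]

# Now statistics make sense: here the average of X is the defect rate.
defect_rate = sum(X) / len(X)  # 0.4
```

Because the coding is 0/1, the mean of X is simply the proportion of defectives, which illustrates why non-negative integer codings are convenient to work with.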
Probability: Probability (i.e., probing for the unknown) is the tool used for anticipating what the distribution of data should look like under a given model. Random phenomena are not haphazard: they display an order that emerges only in the long run and is described by a distribution. The mathematical description of variation is central to statistics. The probability required for statistical inference is not primarily axiomatic or combinatorial, but is oriented toward describing data distributions.
Sampling Unit: A unit is a person, animal, plant or thing which is actually studied by a researcher; the basic objects upon which the study or experiment is executed. For example, a person; a sample of soil; a pot of seedlings; a zip code area; a doctor's practice.
Parameter: A parameter is an unknown value, and therefore it has to be estimated. Parameters are used to represent a certain population characteristic. For example, the population mean μ is a parameter that is often used to indicate the average value of a quantity.
Within a population, a parameter is a fixed value that does not vary. Each sample drawn from the population has its own value of any statistic that is used to estimate this parameter. For example, the mean of the data in a sample is used to give information about the overall mean μ in the population from which that sample was drawn.
Statistic: A statistic is a quantity that is calculated from a sample of data. It is used to give information about unknown values in the corresponding population. For example, the average of the data in a sample is used to give information about the overall average in the population from which that sample was drawn.
A statistic is a function of an observable random sample; it is therefore itself an observable random variable. Notice that, while a statistic is a "function" of the observations, it is commonly called a random "variable", not a function.
It is possible to draw more than one sample from the same population, and the value of a statistic will in general vary from sample to sample. For example, the average value in a sample is a statistic. The average values in more than one sample, drawn from the same population, will not necessarily be equal.
Statistics are often assigned Roman letters (e.g., x̄ and s), whereas the equivalent unknown values in the population (parameters) are assigned Greek letters (e.g., μ and σ).
The word estimate means to esteem, that is, to give a value to something. A statistical estimate is an indication of the value of an unknown quantity based on observed data.
More formally, an estimate is the particular value of an estimator that is obtained from a particular sample of data and used to indicate the value of a parameter.
Example: Suppose the manager of a shop wanted to know μ, the mean expenditure of customers in her shop in the last year. She could calculate the average expenditure of the hundreds (or perhaps thousands) of customers who bought goods in her shop; that is, the population mean μ. Instead, she could use an estimate of this population mean μ by calculating the mean of a representative sample of customers. If this value were found to be $25, then $25 would be her estimate.
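The shop example can be sketched as a simulation; the population size, the true mean of $25, and the sample size of 100 are invented for illustration:

```python
import random
import statistics

random.seed(1)  # reproducible sketch

# Invented population: last year's expenditures of 5,000 customers,
# generated around a true mean of $25 (mu, the parameter of interest).
expenditures = [random.gauss(25.0, 5.0) for _ in range(5_000)]
mu = statistics.mean(expenditures)  # the population mean (a parameter)

# Instead of tallying every customer, the manager estimates mu from
# a representative random sample of 100 customers.
sample = random.sample(expenditures, 100)
estimate = statistics.mean(sample)  # the sample mean (a statistic)
```

A second sample would generally yield a slightly different estimate, which is precisely the sample-to-sample variability of a statistic discussed above.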
Descriptive Statistics: The numerical statistical data should be presented clearly, concisely, and in such a way that the decision maker can quickly obtain the essential characteristics of the data in order to incorporate them into decision process.
The principal descriptive quantity derived from sample data is the mean (x̄), which is the arithmetic average of the sample data. It serves as the most reliable single measure of the value of a typical member of the sample. If the sample contains a few values that are so large or so small that they have an exaggerated effect on the value of the mean, the sample is more accurately represented by the median -- the value where half the sample values fall below and half above.
The quantities most commonly used to measure the dispersion of the values about their mean are the variance s2 and its square root, the standard deviation s. The variance is calculated by determining the mean, subtracting it from each of the sample values (yielding the deviations of the sample values), and then averaging the squares of these deviations. The mean and standard deviation of the sample are used as estimates of the corresponding characteristics of the entire group from which the sample was drawn. They do not, in general, completely describe the distribution F(x) of values within either the sample or the parent group; indeed, different distributions may have the same mean and standard deviation. They do, however, provide a complete description of the normal distribution, in which positive and negative deviations from the mean are equally common, and small deviations are much more common than large ones. For a normally distributed set of values, a graph showing the dependence of the frequency of the deviations upon their magnitudes is a bell-shaped curve. About 68 percent of the values will differ from the mean by less than the standard deviation, and about 99.7 percent will differ by less than three times the standard deviation.
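These descriptive quantities are easy to compute. The sketch below, on a small invented sample containing one extreme value, shows the mean being pulled toward the outlier while the median stays representative, and computes the variance and standard deviation described above (note that the sample variance conventionally divides by n − 1 rather than n):

```python
import statistics

sample = [12, 14, 15, 15, 16, 18, 95]    # one extreme value (95)

mean = statistics.mean(sample)            # pulled upward by the outlier
median = statistics.median(sample)        # half the values fall on each side
variance = statistics.variance(sample)    # mean squared deviation (n - 1 denominator)
stdev = statistics.stdev(sample)          # square root of the variance

print(f"mean = {mean:.2f}, median = {median}, stdev = {stdev:.2f}")
```

Here the mean (about 26.4) is larger than six of the seven observations, while the median (15) sits in the middle of the bulk of the data, illustrating why the median is preferred when a few values have an exaggerated effect.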
Inferential Statistics: Inferential statistics is concerned with making inferences from samples about the populations from which they have been drawn. In other words, if we find a difference between two samples, we would like to know whether this is a "real" difference (i.e., present in the population) or just a "chance" difference (i.e., merely the result of random sampling error). That is what tests of statistical significance are all about. Any conclusion inferred from sample data about the population from which the sample is drawn must be expressed in probabilistic terms. Probability is the language and the measuring tool for uncertainty in our statistical conclusions.
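One simple, self-contained way to ask the "real or chance?" question is a permutation test, sketched here on two hypothetical samples (say, daily sales under two promotions; the numbers are invented). If the observed difference in means were pure chance, then randomly reshuffling the group labels should often produce a difference just as large:

```python
import random
import statistics

random.seed(7)

# Two hypothetical samples, e.g. daily sales under two promotions
group_a = [21, 25, 24, 27, 23, 26, 28, 22]
group_b = [18, 20, 19, 22, 17, 21, 20, 19]

observed = statistics.mean(group_a) - statistics.mean(group_b)

# Permutation test: shuffle the pooled data, re-split it into two groups
# of the original sizes, and see how often a difference at least as large
# as the observed one arises by chance alone
pooled = group_a + group_b
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:8]) - statistics.mean(pooled[8:])
    if abs(diff) >= abs(observed):
        extreme += 1

p_value = extreme / trials
print(f"observed difference = {observed:.2f}, p-value = {p_value:.4f}")
```

A small p-value (conventionally below 0.05) indicates the difference is unlikely to be a chance artifact of sampling, which is exactly the probabilistic phrasing the paragraph above calls for.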
Inferential statistics can be used to explain a phenomenon or to check the validity of a claim. In these instances, inferential statistics is called Exploratory Data Analysis or Confirmatory Data Analysis, respectively.
Statistical Inference: Statistical inference refers to extending the knowledge obtained from a random sample to the whole population from which it was drawn. This is known in mathematics as Inductive Reasoning, that is, knowledge of the whole from a particular. Its main application is in testing hypotheses about a given population. Statistical inference guides the selection of appropriate statistical models. Models and data interact in statistical work. Inference from data can be thought of as the process of selecting a reasonable model, including a statement in probability language of how confident one can be about the selection.
Normal Distribution Condition: The normal or Gaussian distribution is a continuous symmetric distribution that follows the familiar bell-shaped curve. One of its nice features is that the mean and variance uniquely and independently determine the distribution. It has been noted empirically that many measurement variables have distributions that are at least approximately normal. Even when a distribution is non-normal, the distribution of the mean of many independent observations from the same distribution becomes arbitrarily close to a normal distribution as the number of observations grows large. Many frequently used statistical tests assume that the data come from a normal distribution.
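This tendency of sample means toward normality (the central limit theorem) can be demonstrated numerically. The sketch below starts from a clearly non-normal, right-skewed parent distribution (exponential, with true mean 1.0) and checks that the means of many samples of size 50 show the bell-curve signature: roughly 68% fall within one standard deviation of their average.

```python
import random
import statistics

random.seed(3)

# A clearly non-normal parent distribution: exponential (right-skewed)
def draw_mean(n):
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

# Means of many independent samples of size 50
means = [draw_mean(50) for _ in range(2_000)]

# The sample means cluster near the true mean (1.0), and roughly 68%
# of them lie within one standard deviation of their average,
# as expected for an approximately normal distribution
m = statistics.mean(means)
s = statistics.stdev(means)
within_one_sd = sum(abs(x - m) <= s for x in means) / len(means)
print(f"{within_one_sd:.1%} of sample means within one standard deviation")
```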
Estimation and Hypothesis Testing: Inferences in statistics are of two types. The first is estimation, which involves the determination, with a possible error due to sampling, of the unknown value of a population characteristic, such as the proportion having a specific attribute or the average value µ of some numerical measurement. To express the accuracy of the estimates of population characteristics, one must also compute the standard errors of the estimates. The second type of inference is hypothesis testing. It involves the definition of a hypothesis as one set of possible population values and an alternative, a different set. There are many statistical procedures for determining, on the basis of a sample, whether the true population characteristic belongs to the set of values in the hypothesis or the alternative.
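The estimation side can be sketched in a few lines. On a hypothetical sample of ten measurements (invented numbers), the standard error of the mean expresses the accuracy of the estimate, and an approximate 95% confidence interval follows from the large-sample normal multiplier 1.96 (for a sample this small, a t-distribution multiplier of about 2.26 would be more accurate):

```python
import math
import statistics

# A hypothetical sample of measurements
sample = [9.8, 10.2, 10.1, 9.9, 10.3, 10.0, 9.7, 10.4, 10.1, 9.9]

n = len(sample)
mean = statistics.mean(sample)
stderr = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

# Approximate 95% interval using the large-sample normal multiplier 1.96
low, high = mean - 1.96 * stderr, mean + 1.96 * stderr
print(f"estimate = {mean:.2f}, 95% CI ~ ({low:.2f}, {high:.2f})")
```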
Statistical inference is grounded in probability, idealized concepts of the group under study, called the population, and the sample. The statistician may view the population as a set of balls from which the sample is selected at random, that is, in such a way that each ball has the same chance as every other one for inclusion in the sample.
Notice that, to be able to estimate the population parameters, the sample size n must be greater than one. With a sample of size one, the single observation equals the sample mean, so the sum of squared deviations within the sample is 0. The estimate of the population variance (s2) would then be 0/(n − 1) = 0/0, which is an indeterminate quantity, meaning no estimate is possible.
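Statistical software reflects this limitation directly. Python's `statistics.variance`, for instance, refuses a sample of size one rather than returning the indeterminate 0/0:

```python
import statistics

single = [42.0]  # a sample of size one

# The sample variance estimator divides by n - 1 = 0, so it is undefined;
# the library raises an error instead of returning a value
try:
    statistics.variance(single)
except statistics.StatisticsError as err:
    print(f"cannot estimate variance: {err}")
```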