Descriptive statistics deal with the
characteristics of the available data. Inferential statistics are concerned with
extending any conclusions beyond the immediately available data. For example, if
in our sample of cases we find a significant relationship between the incidence
of lung cancer and the number of cigarettes smoked, we would hope to demonstrate
that this relationship is more general and applicable to a much wider
population. In general this is achieved by the application of statistical tests
of significance. The identity of the statistical test depends on the nature of
the data and the questions to be answered.
Not all statisticians support the use of significance tests.
Many suggest that we obtain much more useful, and less misleading, information
from confidence limits.
The methodology used is common to all statistical tests; all
that differs is the identity of the test statistic. You will need to think
clearly and carefully about the following. It is important that you understand
the logic underlying significance tests.
We begin by specifying a pair of hypotheses.
- Null Hypothesis (Ho): Any variation, difference or relationship is due to chance.
- Alternative Hypothesis (H1, also written Ha): At least some of the variation, difference or relationship is a consequence of some process under test.
For example:
Ho: There is no significant difference in hours of sleep between insomniacs treated with a new drug and those given a placebo.
H1: There is a significant difference in hours of sleep between insomniacs treated with a new drug and those given a placebo.
Note the use of the term 'no significant difference'. It indicates that although a difference may be observed it is not a meaningful one. This could mean that:
- the differences have arisen by chance (experimental error) rather than as a consequence of our experimental treatments; or that
- the effect was so small that we do not consider it to be biologically important.
We begin our analysis by assuming that the Ho is true. This is analogous to the presumption of innocence if you are committed for trial. If the Ho is true, what is the biggest difference between the sample means that can occur by chance at a reasonable level of probability? In other words, how different can we expect means to become simply as a result of chance?
If the observed difference is less than this chance amount we have insufficient evidence for an effect since the difference could have arisen by chance. In such circumstances we must conclude that there is no significant difference between the two mean values. Again this is analogous to finding someone innocent if there was insufficient evidence of guilt.
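One way to see how large a difference can arise by chance is a permutation test: if Ho is true the group labels are arbitrary, so every relabelling of the pooled observations is equally likely. The sketch below is only an illustration of that idea, not the tabulated-statistic approach described in this page. The sleep data are invented, and the function name `permutation_p_value` is hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical hours of sleep (invented for illustration only).
drug = [6.1, 7.0, 6.8, 7.4, 6.5]
placebo = [5.2, 6.0, 5.8, 6.3, 5.5]

def permutation_p_value(a, b):
    """Exact two-sided permutation test on the difference between two means."""
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    n_total = len(pooled)
    as_extreme = 0
    relabellings = 0
    # Enumerate every way of labelling len(a) of the pooled values as group 'a'.
    for idx in combinations(range(n_total), len(a)):
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(n_total) if i not in idx]
        difference = abs(mean(group_a) - mean(group_b))
        # Small tolerance guards against floating-point ties.
        if difference >= observed - 1e-12:
            as_extreme += 1
        relabellings += 1
    # Proportion of relabellings giving a difference at least as large as observed.
    return as_extreme / relabellings

p = permutation_p_value(drug, placebo)
print(f"P = {p:.4f}")
```

The returned value is the proportion of chance relabellings that produce a difference at least as large as the one observed, which is exactly the 'chance amount' question posed above. Full enumeration is only practical for small samples.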
Many people find the concept of statistical significance
rather confusing. Often it is a matter of phrasing, for example Michael Wood
from the AMS Department, University of Portsmouth, suggested that it may be
better if a statistically significant result was described as "surprising if the
null hypothesis is true". Another difficulty is that statistical significance
does not automatically imply practical significance. For example, using a large
enough sample size we may be able to demonstrate
that a particular drug decreases blood pressure by 1%. Although this is a real
drop in blood pressure it is of no practical value.
It is not possible to define an absolute value for
the effect of sampling variation but we can define an effect size that has a
specified probability of occurring. The specified probability is called the
level of significance and is symbolised by alpha (α). We need to assign a
value to alpha before we do any analysis, otherwise our conclusion could still be subjective.
Alpha may be thought of as our definition of 'improbability': anything with a
probability <= alpha is deemed to be improbable and therefore is unlikely
to occur by chance. Any event whose probability is > alpha is said to be
To understand how we arrive at our value for alpha we must
first examine the logic of the analysis. There are 5 steps to perform.
1. Define Ho and Ha.
2. Assign a value to alpha.
3. Calculate a value for the appropriate test statistic (for example, Student's t).
4. Using our calculated value from step 3, look up in tables of test statistics the probability (P) of obtaining results at least as extreme as our 'experimental' data if Ho is true.
5. If P is <= alpha we can reject Ho and accept Ha. If P is > alpha we cannot reject Ho.
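The five steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it uses a large-sample z statistic and the standard normal distribution in place of Student's t and printed tables, the summary statistics are invented, and the function name `five_step_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def five_step_test(mean1, sd1, n1, mean2, sd2, n2, alpha=0.05):
    """Two-sample comparison of means using a large-sample z approximation."""
    # Step 1: Ho: the population means are equal. Ha: they differ.
    # Step 2: alpha is supplied by the caller before the data are analysed.
    # Step 3: calculate the test statistic (z here, standing in for Student's t).
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    # Step 4: two-sided probability of a statistic at least this extreme if Ho is true.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: compare P with alpha.
    decision = "reject Ho" if p <= alpha else "cannot reject Ho"
    return z, p, decision

# Hypothetical summary data: mean hours of sleep, standard deviation, sample size.
z, p, decision = five_step_test(6.8, 1.2, 50, 6.2, 1.3, 50, alpha=0.05)
print(f"z = {z:.2f}, P = {p:.4f}: {decision}")
```

With these invented numbers P is a little under 0.02, so Ho is rejected at alpha = 0.05. For small samples the normal approximation is too optimistic and a t distribution should be used instead.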
According to the rules of significance testing we must establish a strict criterion for rejection of the null hypothesis. If P is less than alpha we can reject Ho and accept Ha. If we reject Ho we are reasonably confident that a real difference exists, i.e. there is evidence beyond 'reasonable doubt' to presume guilt.
Note that this automated process of 'accept/reject' is one of the major criticisms that some statisticians make about the current use of significance testing in scientific research: Fisherian versus Neyman-Pearson Decision-Theoretic Significance Testing versus Bayesian methods versus confidence limits.
Because we are working with probabilities we can never be certain that we have reached the correct conclusion (as in a court case). Suppose we set alpha to 0.05 (5%), so that we will reject Ho if P is less than 5%. Suppose that from our calculations we find that P = 0.04 (4%), i.e. there is a 4% chance of obtaining results at least as extreme as ours if Ho is true. According to our rules we will reject Ho since P < alpha. But an event that has a 4% chance of occurring should occur, on average, 1 in 25 times. Therefore we may have falsely rejected Ho because an event with a 4% probability has occurred. Indeed, whenever Ho is true we expect to make such a mistake 1 in 20 times (0.05 is 5% or 1 in 20).
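The '1 in 20' claim can be checked by simulation. The sketch below repeatedly draws two samples from the same population, so Ho is true by construction, and counts how often a test at alpha = 0.05 rejects it. It assumes normally distributed data with a known standard deviation of 1 (so a z test is exact); the function name `false_rejection_rate` and all parameter values are illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def false_rejection_rate(alpha=0.05, n=20, experiments=10_000, seed=1):
    """Proportion of experiments in which a true Ho is rejected."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(experiments):
        # Both samples come from the same N(0, 1) population, so Ho is true.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        # z statistic for a difference in means with known sigma = 1.
        z = (fmean(a) - fmean(b)) / sqrt(2 / n)
        p = 2 * (1 - std_normal.cdf(abs(z)))
        if p <= alpha:
            rejections += 1
    return rejections / experiments

rate = false_rejection_rate()
print(f"Proportion of true null hypotheses rejected: {rate:.3f}")
```

The proportion should come out close to alpha, which is the Type I error rate discussed in the text.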
Your response to this obvious problem may be to set alpha to
some very low value such as 0.001 (0.1%) to overcome the problem of rejecting a
true Ho. However, now we have the opposite problem: we may fail to reject Ho
even though it is false!
Falsely rejecting a true Ho is called a TYPE I ERROR (finding an innocent person guilty). When Ho is true, the probability of committing a type I error is equal to alpha. Failure to reject a false Ho is called a TYPE II ERROR (finding a guilty person innocent). It is more difficult to calculate the probability of committing a type II error because it depends on the power of the test, which in turn depends on the size of the real effect, the sample size and alpha.
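To make the trade-off between the two errors concrete, the sketch below computes the type II error probability for a deliberately simple case: a one-sided, one-sample z test with known standard deviation. That test, the effect size of 0.5, the sample size of 20 and the function name `type_ii_error` are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(alpha, effect, sigma, n):
    """Type II error probability for a one-sided, one-sample z test.

    `effect` is the true shift in the mean, `sigma` the known standard
    deviation and `n` the sample size.
    """
    std_normal = NormalDist()
    # z value the test statistic must exceed for Ho to be rejected.
    critical_z = std_normal.inv_cdf(1 - alpha)
    # Centre of the test statistic's distribution when the effect is real.
    shift = effect * sqrt(n) / sigma
    # Probability that the statistic falls short of the critical value.
    return std_normal.cdf(critical_z - shift)

for alpha in (0.05, 0.001):
    beta = type_ii_error(alpha, effect=0.5, sigma=1.0, n=20)
    print(f"alpha = {alpha}: type II error = {beta:.3f}, power = {1 - beta:.3f}")
```

Under these assumptions, tightening alpha from 0.05 to 0.001 raises the type II error probability from roughly 0.28 to roughly 0.80: protecting against one error makes the other more likely.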
When we write down a Ho it is either a true statement or a
false statement. The purpose of the statistical test is to decide between these
two alternatives. The outcome of our statistical test will be:
- rejection of Ho or
- failure to reject Ho (this is not the same as acceptance, i.e. proof, of Ho).
We can construct a table that shows the possible relationships between the Ho and the statistical test outcome. The relationships between these errors and the null hypothesis are shown below. If we reject a true Ho we have made a mistake, but we can also make a mistake by 'accepting' a false Ho. β is used to indicate the probability of making a Type II error. 1-β is a measure of the power of a test.
| |Ho is true |Ho is false |
|Reject Ho |Type I Error |Correct decision |
|Do not reject Ho |Correct decision |Type II Error |
We need a compromise value for alpha which takes into
account the chances of making type I and II errors. For most biological purposes
0.05 (5%) is used. In certain circumstances it may be possible to attach 'costs'
to the two types of error, for example when a type I error could result in
damage to patients. If it is possible to cost the two errors the value of alpha
can be adjusted to take account of this information. In the above example we may
wish to reduce alpha to minimise the possibility of damaging patients.
In many parts of the world the arrival of rats on islands
can be disastrous for ground nesting birds. The Shiant islands in the Outer
Hebrides support the largest population of the ship rat Rattus rattus
remaining in the United Kingdom. The islands also support 60,000+ pairs of
breeding puffin Fratercula arctica. It has been suggested that the rats
are responsible for a decline in the puffin on the Shiants. Suppose we decide to
test this theory by experimentation. We begin by setting up two hypotheses:
- Ho: the rats have no significant effect on puffin
- H1: the rats have a significant effect on puffin
Depending on the outcome of our experiment a rat eradication
programme will be started. What are the consequences of type I and Type II
errors in this situation?
If Ho is true the rats
have no significant effect on the puffin.
- A Type I error means that we will begin a costly
eradication of the United Kingdom's most significant population of this mammal
even though it is doing no harm.
If Ho is false the
rats are having an effect on the puffin.
- A Type II error means that we will do nothing to the rats
and the puffin decline will continue.
Which of these mistakes would you rather make? Note that
your conclusion will not be independent of the sample size that you use. This is
because the power of the analysis is related to the
sample size. If the sample size is small you may only be able to detect a large
effect, whereas a very large sample would detect a very small effect (a real but
possibly inconsequential effect). So perhaps we should begin with the question
'what size of effect do you wish to detect?'.
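The link between sample size and detectable effect can be sketched with the usual normal-approximation formula for comparing two means, n per group = 2((z for 1-alpha/2 + z for the power) x sigma / effect)^2. The formula is a standard approximation rather than something taken from this page, and the function name `sample_size_per_group` and the default power of 0.8 are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect, sigma, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample comparison of means."""
    z = NormalDist().inv_cdf
    # Normal-approximation formula; slightly underestimates n for a t test.
    n = 2 * ((z(1 - alpha / 2) + z(power)) * sigma / effect) ** 2
    return ceil(n)

# Effect sizes expressed in the same units as sigma (here sigma = 1).
for effect in (1.0, 0.5, 0.1):
    n = sample_size_per_group(effect, sigma=1.0)
    print(f"effect = {effect}: about {n} per group")
```

With sigma = 1 this gives about 16, 63 and 1570 per group for effects of 1.0, 0.5 and 0.1, which shows why deciding on the effect size worth detecting should come before collecting data.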
The logic behind all statistical tests is that we define a
pair of complementary hypotheses, the first of which is called the Null
Hypothesis (Ho). The Ho states that any differences are due to chance and are
not due to any real or measurable effects. We calculate a value for an index
(the test statistic) that enables us to determine, from statistical tables, the
probability (P) of getting results that are at least as extreme as those
observed, given that Ho is true. If P is less than a predefined value for alpha
we reject Ho and accept Ha as the more likely explanation. If P is greater than
alpha we have insufficient evidence to reject Ho. Note that if we did not
predefine alpha we could assign any value we wished to give us the conclusion we wanted.
- A significance test can never prove a null hypothesis; it
only provides support for the validity of the alternative hypothesis (this is
analogous to the Popperian approach to science).
- A very small p value (e.g. 0.0001) does not
signify a large effect - it signifies that the observed data are highly
improbable given the null hypothesis. A very small p value can arise
when an effect is tiny but the sample sizes are large; conversely a larger
p value can arise when the effect is large but the sample size is
small. In other words the magnitude of p is at least partly dependent
on the power of the test.
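The last point can be illustrated numerically. The sketch below compares a tiny effect measured on very large samples with a large effect measured on very small samples, using a two-sample z test with known standard deviation. The test choice, the numbers and the function name `two_sided_p` are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(difference, sigma, n_per_group):
    """Two-sided P for a difference between two means with known sigma."""
    standard_error = sigma * sqrt(2 / n_per_group)
    z = difference / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# A tiny effect (0.05 standard deviations) with 20,000 observations per group.
tiny_effect_large_sample = two_sided_p(0.05, sigma=1.0, n_per_group=20_000)
# A large effect (1 standard deviation) with only 5 observations per group.
large_effect_small_sample = two_sided_p(1.0, sigma=1.0, n_per_group=5)

print(f"tiny effect, large samples:  P = {tiny_effect_large_sample:.2e}")
print(f"large effect, small samples: P = {large_effect_small_sample:.3f}")
```

The tiny effect gives a P value well below 0.0001, while the effect twenty times larger gives a P value of about 0.11, so P on its own says little about the size of an effect.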
This page, with acknowledgement, from a web site on univariant statistics by Dr Alan Fielding BSc MSc PhD FLS FHEA, Senior Learning and Teaching Fellow, School of Biology, Chemistry and Health Science, Manchester Metropolitan University. Alan has a new site with information on monitoring and statistics. He may be contacted at firstname.lastname@example.org or via his web page.