Environmental Monitoring

Descriptive statistics deal with the characteristics of the available data; inferential statistics are concerned with extending any conclusions beyond the immediately available data. For example, if in our sample of cases we find a significant relationship between the incidence of lung cancer and the number of cigarettes smoked, we would hope to demonstrate that this relationship is more general and applies to a much wider population. In general this is achieved by applying statistical tests of significance. The choice of statistical test depends on the nature of the data and the questions to be answered.

Not all statisticians support the use of significance tests. Many suggest that we obtain much more useful, and less misleading, information from confidence limits.

The methodology is common to all statistical tests; all that differs is the identity of the test statistic. You will need to think clearly and carefully about the following, because it is important that you understand the logic underlying significance tests.

We begin by specifying a pair of hypotheses.

  1. Null Hypothesis (Ho): any variation / difference / relationship is due to chance.
  2. Alternative Hypothesis (H1): at least some of the variation / difference / relationship is a consequence of the process under test.

For example:

Ho : There is no significant difference in hours of sleep between insomniacs treated with a new drug and those given a placebo.
H1 : There is a significant difference in hours of sleep between insomniacs treated with a new drug and those given a placebo.

Note the use of the term 'no significant difference': it indicates that although a difference may be observed, it is not a meaningful one. This could mean that:

  • the differences have arisen by chance (experimental error) rather than as a consequence of our experimental treatments; or that
  • the effect was so small that we do not consider it to be biologically important.

We begin our analysis by assuming that the Ho is true. This is analogous to the presumption of innocence if you are committed for trial. If the Ho is true, what is the biggest difference between the sample means that can occur by chance at a reasonable level of probability? In other words, how different can we expect means to become simply as a result of chance?

If the observed difference is less than this chance amount we have insufficient evidence for an effect, since the difference could have arisen by chance. In such circumstances we must conclude that there is no significant difference between the two mean values. Again, this is analogous to finding someone innocent because there was insufficient evidence of guilt.

Many people find the concept of statistical significance rather confusing. Often it is a matter of phrasing, for example Michael Wood from the AMS Department, University of Portsmouth, suggested that it may be better if a statistically significant result was described as "surprising if the null hypothesis is true". Another difficulty is that statistical significance does not automatically imply practical significance. For example, using a large enough sample size we may be able to demonstrate that a particular drug decreases blood pressure by 1%. Although this is a real drop in blood pressure it is of no practical value.

It is not possible to define an absolute value for the effect of sampling variation, but we can define an effect size that has a specified probability of occurring. The specified probability is called the level of significance and is symbolised by alpha (α). We need to assign a value to alpha before we do any analysis, otherwise we could still be subjective. Alpha may be thought of as our definition of 'improbability': anything with a probability less than or equal to alpha is deemed improbable, and therefore unlikely to occur by chance. Any event whose probability is greater than alpha is said to be probable.

To understand how we arrive at our value for alpha we must first examine the logic of the analysis. There are 5 steps to perform.

1 Define Ho and H1.
2 Assign a value to alpha.
3 Calculate a value for the appropriate test statistic (for example, Student's t).
4 Using the calculated value from step 3, look up in statistical tables the probability (P) of obtaining data at least as extreme as our 'experimental' data, given that Ho is true.
5 If P <= alpha we can reject Ho and accept H1.

If P > alpha we cannot reject Ho.
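The five steps can be sketched in a few lines of code. This is a minimal illustration with invented sleep-hours data for the drug/placebo example above, using a two-sample z-test (a normal approximation, rather than the t tables the text describes) so that only the Python standard library is needed.

```python
import math
import statistics

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Step 1: Ho: no difference in mean hours of sleep; H1: there is a difference.
# Step 2: assign a value to alpha BEFORE looking at the data.
alpha = 0.05

# Hypothetical data: hours of sleep in the drug and placebo groups.
drug    = [7.1, 6.8, 7.4, 6.9, 7.6, 7.2, 6.7, 7.3, 7.0, 7.5]
placebo = [6.2, 6.6, 6.1, 6.8, 6.4, 6.0, 6.5, 6.3, 6.7, 6.2]

# Step 3: calculate the test statistic (the standardised difference in means).
m1, m2 = statistics.mean(drug), statistics.mean(placebo)
se = math.sqrt(statistics.variance(drug) / len(drug) +
               statistics.variance(placebo) / len(placebo))
z = (m1 - m2) / se

# Step 4: probability of data at least this extreme if Ho is true (two-tailed).
p = 2 * (1 - normal_cdf(abs(z)))

# Step 5: compare P with alpha.
if p <= alpha:
    print(f"P = {p:.4g} <= {alpha}: reject Ho and accept H1")
else:
    print(f"P = {p:.4g} > {alpha}: insufficient evidence to reject Ho")
```

In practice the small samples here would call for Student's t rather than z; the point is only to make the order of the five steps concrete.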

According to the rules of significance testing we must establish a strict criterion for rejection of the null hypothesis. If the probability of obtaining our results, given that Ho is true, is less than alpha we can reject Ho and accept H1. If we reject Ho we are reasonably confident that a real difference exists, i.e. there is evidence 'beyond reasonable doubt' to presume guilt.

Note that this automated 'accept/reject' process is one of the major criticisms that some statisticians make of the current use of significance testing in scientific research (Fisherian versus Neyman-Pearson decision-theoretic significance testing, versus Bayesian methods, versus confidence limits).

Because we are working with probabilities we can never be certain that we have reached the correct conclusion (as in a court case). Suppose we set alpha to 0.05 (5%); we will reject Ho if P is less than 5%. Suppose that from our calculations we find that P = 0.04 (4%), i.e. there is a 4% chance of obtaining our results if Ho is true. According to our rules we will reject Ho, since P < alpha. But an event that has a 4% chance of occurring should occur, on average, 1 time in 25. Therefore we may have falsely rejected Ho simply because an event with a 4% probability has occurred. Indeed, we expect to make such a mistake about 1 time in 20 (0.05 is 5%, or 1 in 20).
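This 1-in-20 mistake rate can be checked by simulation. The sketch below (not from the original text; all numbers invented) repeatedly tests two samples drawn from the same population, so Ho really is true, and counts how often the test rejects it anyway.

```python
import math
import random

random.seed(42)

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

alpha = 0.05
n, trials = 30, 2000
rejections = 0

for _ in range(trials):
    # Both samples come from the SAME normal population, so Ho is true.
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    se = math.sqrt(2 / n)              # population sd is known to be 1
    z = (sum(a) / n - sum(b) / n) / se
    p = 2 * (1 - normal_cdf(abs(z)))
    if p <= alpha:
        rejections += 1                # every rejection here is a Type I error

print(f"False rejection rate: {rejections / trials:.3f} (expected about {alpha})")
```

The observed false rejection rate settles close to alpha, which is exactly the point made in the next paragraphs: the probability of a Type I error is always equal to alpha.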

Your response to this obvious problem may be to set alpha to some very low value, such as 0.001 (0.1%), to avoid rejecting a true Ho. However, now we have the opposite problem: we may fail to reject Ho even though it is false!

Falsely rejecting a true Ho is called a TYPE I ERROR (finding an innocent person guilty). The probability of committing a type I error is always equal to alpha. Failure to reject a false Ho is called a TYPE II ERROR (finding a guilty person innocent). It is more difficult to calculate the probability of committing a type II error as it depends upon the power of the test.

When we write down a Ho it is either a true statement or a false statement. The purpose of the statistical test is to decide between these two alternatives. The outcome of our statistical test will be:

  • rejection of Ho or
  • failure to reject Ho (this is not the same as acceptance, i.e. proof, of Ho).

We can construct a table that shows the possible relationships between the Ho and the outcome of the statistical test; it is given below. If we reject a true Ho we have made a mistake, but we can also make a mistake by 'accepting' a false Ho. β (beta) is used to indicate the probability of making a Type II error; 1 - β is a measure of the power of a test.

                      Ho is true      Ho is false
  Reject Ho           Type I error    Valid
  Do not reject Ho    Valid           Type II error

We need a compromise value for alpha which takes into account the chances of making type I and II errors. For most biological purposes 0.05 (5%) is used. In certain circumstances it may be possible to attach 'costs' to the two types of error, for example when a type I error could result in damage to patients. If it is possible to cost the two errors the value of alpha can be adjusted to take account of this information. In the above example we may wish to reduce alpha to minimise the possibility of damaging patients.

An example

In many parts of the world the arrival of rats on islands can be disastrous for ground-nesting birds. The Shiant Islands in the Outer Hebrides support the largest population of the ship rat Rattus rattus remaining in the United Kingdom. The islands also support more than 60,000 pairs of breeding puffins Fratercula arctica. It has been suggested that the rats are responsible for a decline of the puffin on the Shiants. Suppose we decide to test this theory by experimentation. We begin by setting up two hypotheses:

  • Ho : the rats have no significant effect on puffin numbers
  • H1 : the rats have a significant effect on puffin numbers.

Depending on the outcome of our experiment a rat eradication programme will be started. What are the consequences of type I and Type II errors in this situation?

If Ho is true the rats have no significant effect on the puffin.

  • A Type I error means that we will begin a costly eradication of the United Kingdom's most significant population of this mammal even though it is doing no harm.

If Ho is false the rats are having an effect on the puffin.

  • A Type II error means that we will do nothing to the rats and the puffin decline will continue.

Which of these mistakes would you rather make? Note that your conclusion will not be independent of the sample size that you use, because the power of the analysis is related to the sample size. If the sample size is small you may only be able to detect a large effect, whereas a very large sample would detect a very small (real, but possibly inconsequential) effect. So perhaps we should begin with the question 'what size of effect do you wish to detect?'.
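The link between sample size and power can be seen by simulation. This is a rough sketch with invented numbers: we impose a fixed, real difference between two populations (so Ho is false) and estimate how often the test detects it at each sample size.

```python
import math
import random

random.seed(1)

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def estimated_power(n, effect, alpha=0.05, trials=1000):
    """Fraction of simulated experiments that reject Ho when it is false."""
    rejections = 0
    for _ in range(trials):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(effect, 1) for _ in range(n)]   # real difference = effect
        z = (sum(b) / n - sum(a) / n) / math.sqrt(2 / n)  # population sd = 1
        if 2 * (1 - normal_cdf(abs(z))) <= alpha:
            rejections += 1
    return rejections / trials

# The same true effect (half a standard deviation) at three sample sizes.
for n in (10, 30, 100):
    print(f"n = {n:3d}: estimated power = {estimated_power(n, effect=0.5):.2f}")
```

With n = 10 the test misses this effect most of the time (a high Type II error rate, i.e. a large β), while with n = 100 it is detected almost every time.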

The logic behind all statistical tests is that we define a pair of complementary hypotheses, the first of which is called the Null Hypothesis (Ho). The Ho states that any differences are due to chance and not to any real or measurable effect. We calculate a value for an index (the test statistic) that enables us to determine, from statistical tables, the probability (P) of getting results at least as extreme as those observed, given that Ho is true. If P is less than a predefined value of alpha we reject Ho and accept H1 as the more likely explanation. If P is greater than alpha we have insufficient evidence to reject Ho. Note that if we did not predefine alpha we could assign it any value we wished, to give us the conclusion we require!


  • A significance test can never prove a null hypothesis; it can only provide support for the validity of the alternative hypothesis (this is analogous to the Popperian approach to science).
  • A very small P value (e.g. 0.0001) does not signify a large effect; it signifies that the observed data are highly improbable given the null hypothesis. A very small P value can arise when an effect is tiny but the sample size is large; conversely, a larger P value can arise when the effect is large but the sample size is small. In other words, the magnitude of P is at least partly dependent on the power of the test.
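The blood-pressure example from earlier makes this concrete. The back-of-the-envelope sketch below (all numbers invented: a 1.2 mmHg drop from a mean of about 120 mmHg, with an assumed between-patient standard deviation of 15 mmHg) computes the P value a two-sample z-test would give if the observed difference equalled that tiny true effect, at three sample sizes.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def two_tailed_p(effect, sd, n):
    """Two-tailed P for a two-sample z-test observing a given mean difference."""
    z = effect / (sd * math.sqrt(2 / n))
    return 2 * (1 - normal_cdf(abs(z)))

effect = 1.2   # about a 1% drop from a mean of 120 mmHg (invented)
sd = 15.0      # assumed between-patient spread

for n in (20, 500, 20000):
    print(f"n = {n:5d} per group: P = {two_tailed_p(effect, sd, n):.4f}")
```

The effect is identical in every case; only the sample size changes. With 20 patients per group the result is nowhere near significant, while with 20,000 per group P is vanishingly small: statistical significance without practical significance.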

This page, with acknowledgement, is from a web site on univariate statistics by Dr Alan Fielding BSc MSc PhD FLS FHEA, Senior Learning and Teaching Fellow, School of Biology, Chemistry and Health Science, Manchester Metropolitan University. Alan has a new site with information on monitoring and statistics. He may be contacted at alan@alanfielding.co.uk or via his web page.
