# Environmental Monitoring

If the number of observations is large, frequency histograms of many biological variables will have similar shapes (they will be 'bell shaped'). Many, but not all, bell-shaped frequency distributions are examples of a special type of distribution known as the Normal or Gaussian Distribution.

A normal distribution can be characterized by two parameters, the mean and the standard deviation (see most statistics textbooks for the equation describing a normal distribution). Compare this with the binomial distribution, which is also controlled by two parameters, the number of trials and the probability of a success (n and p), and with the Poisson distribution, which is controlled by a single parameter, the mean number of events.

Although populations may be normally distributed this is not always the case for samples, particularly small ones. The term 'normal' has some unfortunate connotations. It should not be taken as an implication that all other frequency distributions are abnormal.

The normal distribution has a central role in many statistical tests because the tests assume that data are normally distributed. If this assumption is not met, many significance tests become unreliable. Part of the mystique surrounding the normal distribution is that measures such as means, derived from reasonably large samples obtained from obviously non-normal populations, will themselves be approximately normally distributed. This relationship is formalised by the central limit theorem.
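The central limit theorem can be illustrated with a small simulation. The sketch below is not part of the original page: it assumes an exponential population (strongly right-skewed), a sample size of 50 and a fixed random seed, all chosen only for illustration. It compares the skewness of raw observations with the skewness of sample means; a symmetric distribution such as the normal has a skewness of 0.

```python
import random
import statistics


def skewness(values):
    """Population skewness: 0 for a perfectly symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


def sample_means(n_samples, sample_size, rng):
    """Mean of each of n_samples samples drawn from an exponential population (mean 1)."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


rng = random.Random(42)  # arbitrary seed, assumed here so the run is repeatable

# Individual observations from the skewed population.
raw = [rng.expovariate(1.0) for _ in range(5000)]

# Means of 5000 samples, each containing 50 observations (assumed sizes).
means = sample_means(5000, 50, rng)

print(f"skewness of raw observations: {skewness(raw):.2f}")
print(f"skewness of sample means:     {skewness(means):.2f}")
```

The raw observations should be clearly skewed, while the sample means should be much closer to symmetric, even though both come from the same non-normal population.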

The reason why many biological variables are normally distributed is that their values are a consequence of multifactorial processes, i.e. they have many causes. If these causes are additive it is almost inevitable that the values will follow a normal distribution. Consider the following rather simplified scenario. The height of a particular organism is due mainly to 5 non-allelic genes, each of which has a dominant (tall) and recessive (short) allele. Since the genes are non-allelic we can assume that their effects would be additive. The possible genotypes and their associated phenotypes are shown below. In this example the heterozygote phenotype is assumed to be identical to that of the homozygote dominant, thus we do not need to differentiate between homozygote dominant and heterozygote individuals.

| Genotype: 'Tall' genes | Genotype: 'Short' genes | Height | Number of combinations (using nCr) |
|---|---|---|---|
| 0 | 5 | 0 | 1 |
| 1 | 4 | 1 | 5 |
| 2 | 3 | 2 | 10 |
| 3 | 2 | 3 | 10 |
| 4 | 1 | 4 | 5 |
| 5 | 0 | 5 | 1 |

From this simple example we can see that most individuals would be of average height, few would be very short or very tall. Indeed, if enough heights were obtained they would follow a normal distribution.
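The combination counts for the five-gene example can be reproduced with a short sketch. It is an illustration only: it assumes, for simplicity, that every combination of 'tall' and 'short' genes is equally likely, which the original text does not state. The names `N_GENES`, `counts` and `middle_share` are introduced here for the example.

```python
from math import comb

N_GENES = 5  # number of non-allelic height genes in the example

# Number of ways of having 0, 1, ..., 5 'tall' genes out of 5 (nCr).
counts = {tall: comb(N_GENES, tall) for tall in range(N_GENES + 1)}
total = sum(counts.values())  # 2**5 = 32 combinations in all

for tall, n_ways in counts.items():
    share = n_ways / total  # assumes all combinations are equally likely
    print(f"{tall} tall, {N_GENES - tall} short: {n_ways:2d} combinations ({share:.1%})")

# Share of combinations giving one of the two middle heights (2 or 3).
middle_share = (counts[2] + counts[3]) / total
print(f"middle heights: {middle_share:.1%}")
```

Under that assumption, 20 of the 32 combinations give one of the two middle heights, while only 2 of the 32 give the extreme heights.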

An understanding of the characteristics of a normal curve is central to your understanding of the whole of parametric statistics. Some examples of normal curves are shown below.

Three distributions in which µ = 0 but which have different standard deviations. Note how increasing the standard deviation 'flattens' the curve and makes it more likely that observations will be further from the mean. Since they have identical means, the three curves all have their maximum height at the same value (µ).
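The flattening effect of a larger standard deviation can be checked numerically. In this sketch the three standard deviations (0.5, 1 and 2) are assumed values, since the ones used in the original figure are not given. For each curve it reports the height at the mean and the proportion of observations lying more than one unit from the mean.

```python
from statistics import NormalDist

results = []
for sigma in (0.5, 1.0, 2.0):  # assumed standard deviations, for illustration only
    dist = NormalDist(mu=0.0, sigma=sigma)
    peak = dist.pdf(0.0)  # height of the curve at the mean
    beyond_one = 2 * (1 - dist.cdf(1.0))  # proportion further than 1 unit from the mean
    results.append((sigma, peak, beyond_one))
    print(f"sd = {sigma}: peak height = {peak:.3f}, beyond +/-1 = {beyond_one:.3f}")
```

As the standard deviation increases, the peak height falls and the proportion of observations far from the mean rises, while the position of the peak stays at the mean.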

Three distributions in which the standard deviations are identical (σ = 1) but which differ in their means (µ = 2, 4 and 6). This time the curves have the same profiles. Indeed, if the curves could be 'slid' along the x axis they would overlap perfectly. Changing the mean has not affected the variability; it has only affected the position of the curve along the x axis.

The normal distribution is very important because it allows us to answer probability questions about many important biological variables. Consider the following example. A normal population of 1000 body weights has a mean of 70 g. This means that half of the weights will be less than 70 g while the other half will be 70 g or more. This is obvious because the distribution is symmetrical about the mean. However, suppose we wish to know what proportion of the population is heavier than 80 g. We can determine the answer to this question if we also know the standard deviation of the population from which the observations were obtained (this is because a normal distribution is controlled by two parameters, µ and σ).

The method involves converting our normal distribution to a standard form in which µ = 0 and σ = 1.

This is relatively simple, since any set of values can be made to have a mean of 0 if we calculate their original mean and then subtract it from each of the observations. For example, the mean of 3, 4 and 5 is 4; if we now subtract 4 from each observation we have -1, 0 and 1, and the mean of these is 0. Similarly, we can make the standard deviation equal to 1 if we divide the centred values by the original standard deviation. If we do this to any set of data we convert from the original units to something called Z scores. The reason for doing all of this is that we will then be able to use one set of standard statistical tables for all populations, irrespective of the original values of µ and σ.
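The two steps, subtracting the mean and then dividing by the standard deviation, can be sketched as follows. The helper `z_scores` is a name introduced here for illustration, and the sketch assumes the values form a whole population, so the standard deviation is calculated with a divisor of N (`pstdev`) rather than N - 1.

```python
from statistics import fmean, pstdev


def z_scores(values):
    """Convert values to Z scores: subtract the mean, divide by the standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values, mu)  # population standard deviation (divisor N), an assumption
    return [(x - mu) / sigma for x in values]


data = [3, 4, 5]

# Step 1: subtracting the mean (4) gives values with a mean of 0.
centred = [x - fmean(data) for x in data]
print(centred)

# Step 2: dividing by the standard deviation gives values with a standard deviation of 1.
z = z_scores(data)
print([round(value, 3) for value in z])
print(round(fmean(z), 6), round(pstdev(z), 6))
```

Whatever the original units, the resulting Z scores have a mean of 0 and a standard deviation of 1.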

Z = (xᵢ - µ) / σ

Z is known as a normal deviate (sometimes symbolised by D). The value of Z can be used to determine the proportion of the population which has a value equal to or greater than the value of x (see diagram above). In the body weight example, if we assume that σ = 10 g, then

Z = (80 - 70) / 10 = 1.0

From tables of standard normal deviates we find that Z = 1.0 corresponds to an upper-tail probability of 0.1587. This means that 15.87% of the population weighs more than 80 g, which is about 159 of the 1000 individuals.
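The table lookup can also be done in software. The sketch below uses Python's standard library `statistics.NormalDist` in place of printed tables; the function name `proportion_above` is introduced here for illustration and is not part of the original page.

```python
from statistics import NormalDist


def proportion_above(x, mu, sigma):
    """Proportion of a normal population with values greater than x."""
    z = (x - mu) / sigma  # convert to a Z score
    return 1.0 - NormalDist().cdf(z)  # upper tail of the standard normal distribution


# Body weight example: mean 70 g, standard deviation 10 g, threshold 80 g.
p = proportion_above(80, 70, 10)
print(f"{p:.4f}")  # 0.1587
```

The same function can be reused for any threshold, mean and standard deviation, because the conversion to Z means only the standard normal distribution is ever consulted.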

This page, with acknowledgement, from a web site on univariant statistics by Dr Alan Fielding BSc MSc PhD FLS FHEA, Senior Learning and Teaching Fellow, School of Biology, Chemistry and Health Science, Manchester Metropolitan University. Alan has a new site with information on monitoring and statistics. He may be contacted at alan@alanfielding.co.uk or via his web page.