Environmental Monitoring

We all use the word, or at least the concept of, probability in everyday speech. For example, "it will probably rain tomorrow", "the Swans will probably win", "he (she) will probably get drunk again". If we make a bet we are making a judgment about relative probabilities. When used in a statistical context the word has a more formal description. A probability provides a quantitative description of the likely occurrence of a particular event. It is usually expressed on a scale from 0 to 1 (proportion rather than percentage). An improbable event has a probability close to 0, a very common event has a probability close to 1.

A subjective probability describes an individual's opinion about how likely a particular event is to occur. It is not based on any precise computation but is often a reasonable assessment by a knowledgeable person. Indeed this kind of information, especially when derived from experts, is used in many expert systems (a computer based artificial intelligence technique). Most statistical techniques require a more robust estimate of a probability which is often based on a large sample.

David Grimmett asked the following question on one of the statistics mailing lists:

If a weatherman has historically been 80% accurate in his forecasts, and now forecasts an 80% chance of rain for tomorrow, what is the true chance of rain?

Some possible answers are given at the end of this page.

Statisticians make use of a particular set of terms to avoid confusion!

  • An outcome is what we observe or measure in any situation where the outcome is subject to some uncertainty, i.e. a particular outcome has a certain probability of occurring. In some situations it may be possible to list all of the possible outcomes.
  • This exhaustive list is called the sample space. For example, the sample space for a coin toss is (Heads, Tails), for the roll of a die it is (1,2,3,4,5,6), and for heart rate it is any real number within the range (0 to maximum possible heart rate).
  • Strictly an event is a collection of outcomes. Thus, it is a subset of the sample space.
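These terms can be sketched in a few lines of Python. This is only an illustration: the names `sample_space`, `even` and `probability` are invented here, and the die is assumed to be fair so that every outcome is equally likely.

```python
from fractions import Fraction

# Sample space for one roll of a six-sided die (assumed fair for illustration).
sample_space = {1, 2, 3, 4, 5, 6}

# An event is a subset of the sample space, here "roll an even number".
even = {2, 4, 6}

def probability(event, space):
    """Probability of an event when all outcomes in the space are equally likely."""
    if not event <= space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(space))

p_even = probability(even, sample_space)
print(p_even)  # 1/2
```

Counting outcomes in this way only works when the sample space is finite and the outcomes are equally likely. It would not apply to the heart rate example.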

The probability of more than one event

Two events are said to be independent if the occurrence of one of them provides no information about whether or not the other event will occur. For example, if we know a person's gender we gain no extra information about whether they will be left or right handed. The same is not true for colour blindness since colour blindness is more common in males. In probability theory we say that two events, A and B, are independent if the probability that they both occur is equal to the product of the probabilities of the two individual events.

In statistical notation this is written as P(A ∩ B) = P(A).P(B). The symbol ∩ means 'and'. If two events, each with a non-zero probability, are independent then they cannot be mutually exclusive (disjoint), and vice versa.
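The definition of independence can be checked by counting. The sketch below is an illustration only: it assumes a standard 52-card pack, and the helper names (`prob`, `independent`, `is_king` and so on) are invented for this example. It compares the probability that both events occur with the product of the two individual probabilities.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def prob(predicate):
    """Probability that a card drawn at random from DECK satisfies predicate."""
    return Fraction(sum(1 for card in DECK if predicate(card)), len(DECK))

def independent(pred_a, pred_b):
    """True if the probability of both events equals the product of their probabilities."""
    p_both = prob(lambda card: pred_a(card) and pred_b(card))
    return p_both == prob(pred_a) * prob(pred_b)

def is_king(card):
    return card[0] == "K"

def is_ace(card):
    return card[0] == "A"

def is_diamond(card):
    return card[1] == "diamonds"

print(independent(is_king, is_diamond))  # True: 1/52 equals 1/13 * 1/4
print(independent(is_king, is_ace))      # False: 0 does not equal 1/13 * 1/13
```

Knowing that a card is a king says nothing about its suit, so those two events are independent. A king and an ace are mutually exclusive, so knowing that one occurred rules the other out, and they are not independent.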

The addition rule is used to determine the probability that event A or event B occurs or both occur. However, some events are mutually exclusive (or disjoint) and they cannot occur together. For example, a person cannot be both colour sighted and colour blind. If two events are mutually exclusive, they cannot be independent and vice versa.

The formal notation for calculating probabilities is shown below.

P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
where:

  • P(A) = probability that A occurs
  • P(B) = probability that B occurs
  • P(A ∪ B) = probability that A or B occurs
  • P(A ∩ B) = probability that A and B occur

For mutually exclusive events, P(A ∩ B) must equal 0, so the addition rule simplifies to P(A ∪ B) = P(A) + P(B); that is, the probability that A or B will occur is P(A) + P(B).

For independent events, P(A ∩ B) = P(A).P(B), so the addition rule reduces to P(A ∪ B) = P(A) + P(B) - P(A).P(B).

Example

Suppose we wish to find the probability of picking either a king or an ace from a pack of 52 playing cards. These are mutually exclusive events: a card cannot be both a king and an ace. We define the events A = 'pick a king' and B = 'pick an ace'. Since P(A ∩ B) = 0, the final term of the addition rule drops out and we find:

P(A ∪ B) = P(A) + P(B)

Hence, the probability of picking either a king or an ace is 1/13 + 1/13 = 0.154.

Suppose now we wish to find the probability of picking either a king or a diamond from a pack of 52 playing cards. This time one of the kings is a diamond so these are not mutually exclusive events. If we define the events as:

A = 'pick a king' and

B = 'pick a diamond'.

then

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

There are 4 kings and 13 diamonds in the pack, but 1 card is both a king and a diamond. So, the probability of drawing either a king or a diamond is 1/13 + 1/4 - 1/52 = 0.308.
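Both card examples can be reproduced with exact fractions. This is a minimal sketch and the function name `p_union` is invented for illustration. It applies the addition rule once with an overlap of zero and once with the king of diamonds counted in both events.

```python
from fractions import Fraction

def p_union(p_a, p_b, p_a_and_b):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

p_king = Fraction(4, 52)
p_ace = Fraction(4, 52)
p_diamond = Fraction(13, 52)

# King or ace: mutually exclusive, so P(A and B) = 0.
p_king_or_ace = p_union(p_king, p_ace, Fraction(0))

# King or diamond: the king of diamonds belongs to both events.
p_king_or_diamond = p_union(p_king, p_diamond, Fraction(1, 52))

print(p_king_or_ace, round(float(p_king_or_ace), 3))          # 2/13 0.154
print(p_king_or_diamond, round(float(p_king_or_diamond), 3))  # 4/13 0.308
```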

The multiplication rule is used to determine the probability that two events, A and B, both occur. For independent events, that is events which have no influence on one another, there is a simple rule: P(A ∩ B) = P(A).P(B).

Example

What is the probability that a person selected at random will be female and left-handed? For the purposes of this illustration we will assume that 20% of the population is left-handed and that gender and handedness are independent of each other.

Let p1 = probability of being female = 0.5

Let p2 = probability of being left-handed = 0.2

Therefore the probability that a person selected at random will be female and left-handed is p1 . p2 = 0.5 . 0.2 = 0.10 (10%, or 1 in 10).
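The multiplication rule can also be checked by simulation. The sketch below is illustrative only: the function name, the sample size and the random seed are arbitrary choices made here. It uses the same assumptions as the example, namely that 50% of people are female, 20% are left-handed and the two traits are independent.

```python
import random

def simulate_female_left_handed(n_people, p_female=0.5, p_left=0.2, seed=1):
    """Estimate P(female and left-handed) by drawing the two traits independently."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_people):
        female = rng.random() < p_female
        left_handed = rng.random() < p_left
        if female and left_handed:
            hits += 1
    return hits / n_people

estimate = simulate_female_left_handed(100_000)
print(f"simulated: {estimate:.3f}, multiplication rule: {0.5 * 0.2:.3f}")
```

With a sample this large the simulated proportion should lie close to 0.10. It will not match exactly because it is an estimate from a random sample.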

Conditional Probabilities (Bayes's Theorem)

Sometimes the probability of an event is conditional on some other event, e.g. the probability of developing lung cancer is influenced by the number of cigarettes smoked. This type of conditional probability is best dealt with using Bayes's theorem. Bayes was an English cleric who investigated the way in which probabilities change as more information becomes available. If A and B are two related (dependent) events, the fact that A has occurred will alter the probability that B occurs. The basic form of the theorem is:

P(A,B) = P(B|A) . P(A)

P(A) is the probability that A will occur.

P(B) (not used so far) is the probability that B occurs.

P(A,B) is the joint probability that both A and B occur.

P(B|A) is the conditional probability that B will occur if A occurs.

For example, if P(A) is the probability that somebody smokes and P(B) is the probability of developing lung cancer, then P(A,B) is the probability that a person smokes and develops lung cancer. However, P(B|A) is more useful since it gives us the probability that lung cancer will develop given the fact that someone smokes. How do we find P(B|A)?

Because A and B are related events we can also write Bayes's theorem as:

P(B|A). P(A)= P(A|B) . P(B)

which means that we can rearrange it to give:

P(B|A) = [P(A|B) . P(B)] / P(A)

This turns out to be a calculation that has great practical significance. For example, using the lung cancer example:

P(A|B) is the probability that a person with lung cancer is also a smoker. We can find this information from patient records.

P(B) and P(A) can be found from general population surveys and hospital records.

Hence, we can now estimate P(B|A), the probability that a smoker will develop lung cancer.
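The rearranged formula is easy to put into code. In the sketch below the function name `bayes` is invented for illustration, and the three input probabilities are made-up numbers chosen only to show the arithmetic. They are not real epidemiological figures.

```python
def bayes(p_a_given_b, p_b, p_a):
    """Return P(B|A) = P(A|B) * P(B) / P(A)."""
    if not 0 < p_a <= 1:
        raise ValueError("P(A) must be greater than 0 and at most 1")
    result = p_a_given_b * p_b / p_a
    if result > 1:
        raise ValueError("inputs are inconsistent: P(B|A) would exceed 1")
    return result

# Invented numbers for illustration only, NOT real data.
p_smoker_given_cancer = 0.9  # P(A|B): proportion of lung cancer patients who smoke
p_cancer = 0.001             # P(B): proportion of the population with lung cancer
p_smoker = 0.2               # P(A): proportion of the population who smoke

p_cancer_given_smoker = bayes(p_smoker_given_cancer, p_cancer, p_smoker)
print(f"P(B|A) = {p_cancer_given_smoker:.4f}")  # P(B|A) = 0.0045
```

With these invented inputs, P(B|A) is 4.5 times larger than P(B), because smokers are over-represented among the patients by the factor 0.9 / 0.2.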

One of the big advantages of Bayes's theorem is that we can use additional information to refine our estimate of P(B|A). Thus, as more data become available we can be more confident about the relationship.

The following simple example is taken from Gelman et al (1995).

Haemophilia is a sex-linked disorder. Thus most sufferers are male, having inherited the abnormal gene from their mother. Consider a mother whose brother is haemophilic: what is the probability that she is a carrier? For her brother to be a sufferer, their own mother must have been a carrier, so the unconditional probability that the sister is a carrier is 0.5, i.e. P(carrier) = 0.5.

If the mother has sons how can we use this information about the sons to refine our estimate that the mother is a carrier?

Even if only one son has haemophilia the mother must be a carrier, and P(carrier) = 1 (ignoring a very small probability of a mutation).

Suppose she has two sons, both normal. What does this tell us? Presumably we begin to think that perhaps she isn't a carrier, or P(carrier) < 0.5. Bayes's theorem allows us to refine our estimate of P(carrier) a little more precisely. Let P(C+) = P(carrier) and P(C-) = P(not carrier).

P(C+|2 normal sons) = [P(2 normal sons|C+) . P(C+)] / [P(2 normal sons|C+) . P(C+) + P(2 normal sons|C-) . P(C-)]

P(C+) and P(C-) are both 0.5, because the mother's own mother must have been a carrier.

P(2 normal sons|C+) is the probability of having 2 normal sons if the mother is a carrier. The probability that a carrier will have a normal son is 0.5, therefore the probability of having two normal sons (normal son AND normal son) is 0.5 . 0.5 = 0.25.

However, if the mother is not a carrier then P(2 normal sons|C-) must be 1.0, i.e. a non-carrier mother must have normal sons.

Substituting the above P values gives

P(C+|2 normal sons) = [(0.25) . (0.5)] / [(0.25) . (0.5) + (1.0) . (0.5)] = 0.125 / 0.625 = 0.20

Hence, we now estimate the probability that the mother is a carrier to be 0.2, given the information that she has had two normal sons.
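The same calculation can be written as a short function. This is a sketch under the assumptions stated in the text: a carrier has a normal son with probability 0.5, a non-carrier always has normal sons, and mutation is ignored. The function name `posterior_carrier` is invented for illustration.

```python
from fractions import Fraction

def posterior_carrier(prior_carrier, n_normal_sons):
    """P(carrier | n normal sons), ignoring the small probability of mutation."""
    p_sons_if_carrier = Fraction(1, 2) ** n_normal_sons  # each son normal with prob 0.5
    p_sons_if_not_carrier = Fraction(1)                  # non-carrier: sons always normal
    numerator = p_sons_if_carrier * prior_carrier
    evidence = numerator + p_sons_if_not_carrier * (1 - prior_carrier)
    return numerator / evidence

p = posterior_carrier(Fraction(1, 2), 2)
print(p, float(p))  # 1/5 0.2
```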

One of the biggest advantages of Bayes's theorem is that we can use new evidence or data to refine our estimates.

Suppose the mother has another normal son. In the previous calculation we began with the assumption that P(C+) = P(C-) = 0.5, but finished with estimates of P(C+) = 0.2 and P(C-) = 0.8. Therefore, we use these new estimates, P(C+) = 0.2 and P(C-) = 0.8, as the starting point for estimating P(C+|3 normal sons). The only new evidence is the third son: the probability that he is normal is 0.5 if the mother is a carrier and 1 if she is not.

P(C+|3 normal sons) = [(0.5) . (0.2)] / [(0.5) . (0.2) + (1) . (0.8)] = 0.1 / 0.9 = 0.111

As we might expect the birth of a 3rd normal son increases our confidence that she is not a carrier.
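Updating one son at a time can be sketched as a loop in which each posterior becomes the prior for the next birth. The assumptions are those of the example: each normal son has probability 0.5 if the mother is a carrier and probability 1 if she is not. The function name `update_for_normal_son` is invented for illustration.

```python
from fractions import Fraction

def update_for_normal_son(p_carrier):
    """One Bayesian update of P(carrier) after observing a single normal son."""
    p_son_if_carrier = Fraction(1, 2)
    p_son_if_not_carrier = Fraction(1)
    numerator = p_son_if_carrier * p_carrier
    return numerator / (numerator + p_son_if_not_carrier * (1 - p_carrier))

p_carrier = Fraction(1, 2)  # prior, before any sons are observed
history = []
for son in range(1, 4):
    p_carrier = update_for_normal_son(p_carrier)
    history.append(p_carrier)
    print(son, p_carrier, round(float(p_carrier), 3))
# 1 1/3 0.333
# 2 1/5 0.2
# 3 1/9 0.111
```

Updating sequentially gives the same result as treating all three sons as a single piece of evidence with likelihood 0.5 . 0.5 . 0.5 = 0.125 and a prior of 0.5.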


Possible answers to the weatherman problem

Milo Schield's reply

I'd say 68% (assuming base rate = actual rate = 50%).

Say the past history was as follows:

Suppose the forecaster says there is an 80% chance of rain. There is an 80% chance the forecaster should forecast "RAIN". There is a 20% chance the forecaster should forecast "NO RAIN". If rain is forecast, there is an 80% chance of actual rain. If no-rain is forecast, there is a 20% chance of actual rain.

Assuming independence, there is a 68% chance of rain: 64% + 4%. Note that 64% = .8 x .8 and 4% = .2 x .2.

Note that the 20% forecast of no-rain actually implies a 32% real chance of no rain. Obviously a different table could give different values.
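The arithmetic in this reply is a weighted sum over the two cases (forecast "RAIN" and forecast "NO RAIN"). The sketch below reproduces it under the reply's own assumptions. The function name `total_probability` is invented for illustration.

```python
from fractions import Fraction

def total_probability(branches):
    """Sum P(case) * P(event | case) over (p_case, p_event_given_case) pairs."""
    if sum(p_case for p_case, _ in branches) != 1:
        raise ValueError("case probabilities must sum to 1")
    return sum(p_case * p_event for p_case, p_event in branches)

p_rain = total_probability([
    (Fraction(8, 10), Fraction(8, 10)),  # should forecast RAIN, then 80% chance of rain
    (Fraction(2, 10), Fraction(2, 10)),  # should forecast NO RAIN, then 20% chance of rain
])
print(p_rain, float(p_rain))  # 17/25 0.68
```

The complement, 1 - 0.68 = 0.32, is the 32% real chance of no rain mentioned in the reply.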

Don Esslemont said:

I don't think the question as formulated can be answered. What can it mean to say that the forecasts have been "80% accurate" when they took the form of statements about the probability of rain?

Derek Christie had a different view!

I agree that the 80% weather problem is not well defined but here is a possible interpretation.

The god of weather has an adjustable spinner with two sections - rain and fine. Each evening she sets the size of the rain sector to whatever she fancies (say a uniform distribution from 0% to 100%). The forecaster has to try and guess what she has set it to. He observes her from afar through binoculars. If he has a record of 80% accuracy, this means that 80% of the time he gets a clear view of the spinner and knows where it is set. The remaining 20% of the time his view is obscured and he just has to guess. The weather god now spins the pointer and presses the rain or fine button on the weather machine, whichever is appropriate.

Take 100 occasions on which the forecaster has predicted 80%. On 80 of those occasions he got a good view and the spinner was indeed set at 80%. 64 of those 80 days will have rain. On the other 20 occasions he has no idea of the spinner's setting. Assuming a random setting, it will rain on 10 of those 20 days. Total: 74 days of rain out of 100, or a probability of rain of 74%. (If the actual distribution of settings is known this figure can be adjusted accordingly.)
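This interpretation can be sketched as a two-case calculation. The function name and parameter names below are invented for illustration. The `mean_setting` argument is the average spinner setting on days when the view is obscured, which is 0.5 under the uniform assumption made in the reply.

```python
from fractions import Fraction

def chance_of_rain(forecast, p_clear_view, mean_setting):
    """P(rain) when the forecast is right with a clear view and uninformative otherwise."""
    return p_clear_view * forecast + (1 - p_clear_view) * mean_setting

p_rain = chance_of_rain(
    forecast=Fraction(8, 10),      # he predicts an 80% chance of rain
    p_clear_view=Fraction(8, 10),  # he sees the spinner clearly 80% of the time
    mean_setting=Fraction(1, 2),   # average of a uniform setting from 0% to 100%
)
print(p_rain, float(p_rain))  # 37/50 0.74
```

Passing a different `mean_setting` gives the adjusted figure when the actual distribution of settings is known.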

This page (slightly modified for an Australian audience) is taken, with acknowledgement, from a web site on univariant statistics by Dr Alan Fielding BSc MSc PhD FLS FHEA, Senior Learning and Teaching Fellow, School of Biology, Chemistry and Health Science, Manchester Metropolitan University. Alan has a new site with information on monitoring and statistics. He may be contacted at alan@alanfielding.co.uk or via his web page.


   
 
