Contents
 What is measurement?
 Why should I care about measurement theory?
 What are permissible transformations?
 What are levels of measurement?
 What about binary (0/1) variables?
 Is measurement level a fixed, immutable property of the data?
 Isn't an ordinal scale just an interval scale with error?
 What does measurement level have to do with discrete vs. continuous?
 Don't the theorems in a statistics textbook prove the validity of statistical methods
without reference to measurement theory?
 Does measurement level determine what statistics are valid?
 But measurement level has been shown empirically to be irrelevant to statistical
results, hasn't it?
 What are some more examples of how measurement level relates to statistical
methodology?
 What's the bottom line?
Measurement theory is a branch of applied mathematics that is useful in
measurement and data analysis. The fundamental idea of measurement theory is
that measurements are not the same as the attribute being measured. Hence, if
you want to draw conclusions about the attribute, you must take into account the
nature of the correspondence between the attribute and the measurements.
The mathematical theory of measurement is elaborated in:


Krantz, D. H., Luce, R. D., Suppes, P., and Tversky, A. (1971). Foundations
of measurement. (Vol. I: Additive and polynomial representations.). New York:
Academic Press.


Suppes, P., Krantz, D. H., Luce, R. D., and Tversky, A. (1989). Foundations
of measurement. (Vol. II: Geometrical, threshold, and probabilistic
representations). New York: Academic Press.


Luce, R. D., Krantz, D. H., Suppes, P., and Tversky, A. (1990). Foundations
of measurement. (Vol. III: Representation, axiomatization, and invariance).
New York: Academic Press.
Measurement theory was popularized in psychology by S. S. Stevens, who
originated the idea of levels of measurement. His relevant articles include:


Stevens, S. S. (1946), On the theory of scales of measurement. Science,
103, 677-680.


Stevens, S. S. (1951), Mathematics, measurement, and psychophysics. In S.
S. Stevens (ed.), Handbook of experimental psychology, pp. 1-49. New York:
Wiley.


Stevens, S. S. (1959), Measurement. In C. W. Churchman, ed., Measurement:
Definitions and Theories, pp. 18-36. New York: Wiley. Reprinted in G. M.
Maranell, ed., (1974) Scaling: A Sourcebook for Behavioral Scientists, pp.
22-41. Chicago: Aldine.


Stevens, S. S. (1968), Measurement, statistics, and the schemapiric view.
Science, 161, 849-856.
Measurement of some attribute of a set of things is the process of
assigning numbers or other symbols to the things in such a way that
relationships of the numbers or symbols reflect relationships of the attribute
being measured. A particular way of assigning numbers or symbols to measure
something is called a scale of measurement.
Suppose we have a collection of straight sticks of various sizes and we
assign a number to each stick by measuring its length using a ruler. If the
number assigned to one stick is greater than the number assigned to another
stick, we can conclude that the first stick is longer than the second. Thus a
relationship among the numbers (greater than) corresponds to a relationship
among the sticks (longer than). If we lay two sticks end-to-end in a straight
line and measure their combined length, then the number we assign to the
concatenated sticks will equal the sum of the numbers assigned to the individual
sticks (within measurement error). Thus another relationship among the numbers
(addition) corresponds to a relationship among the sticks (concatenation).
Measurement theory helps us to avoid making meaningless statements. A typical
example of such a meaningless statement is the claim by the weatherman on the
local TV station that it was twice as warm today as yesterday because it was 40
degrees Fahrenheit today but only 20 degrees yesterday. This statement is
meaningless because one measurement (40) is twice the other measurement (20)
only in certain arbitrary scales of measurement, such as Fahrenheit. The
relationship 'twice-as' applies only to the numbers, not the attribute being
measured (temperature).
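A minimal sketch of the weatherman example in Python. The helper names are illustrative and not from any library. It shows that the ratio of the two readings equals 2 only in Fahrenheit; the same two temperatures give a different ratio in Celsius and in kelvins.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit reading to Celsius."""
    return (f - 32.0) * 5.0 / 9.0


def fahrenheit_to_kelvin(f):
    """Convert a Fahrenheit reading to kelvins."""
    return fahrenheit_to_celsius(f) + 273.15


today_f, yesterday_f = 40.0, 20.0

ratio_f = today_f / yesterday_f
ratio_c = fahrenheit_to_celsius(today_f) / fahrenheit_to_celsius(yesterday_f)
ratio_k = fahrenheit_to_kelvin(today_f) / fahrenheit_to_kelvin(yesterday_f)

print(f"Fahrenheit ratio: {ratio_f:.3f}")   # 2.000
print(f"Celsius ratio:    {ratio_c:.3f}")   # -0.667
print(f"Kelvin ratio:     {ratio_k:.3f}")   # 1.042
```

Because the ratio changes with the arbitrary choice of scale, 'twice as warm' describes the Fahrenheit numbers rather than the temperatures themselves.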
When we measure something, the resulting numbers are usually, to some degree,
arbitrary. We choose to use a 1 to 5 rating scale instead of a -2 to 2
scale. We choose to use Fahrenheit instead of Celsius. We choose to use miles
per gallon instead of gallons per mile. The conclusions of a statistical
analysis should not depend on these arbitrary decisions, because we could have
made the decisions differently. We want the statistical analysis to say
something about reality, not simply about our whims regarding meters or feet.
Suppose we have a rating scale where several judges rate the goodness of
flavor of several foods on a 1 to 5 scale. If we want to draw conclusions about
the measurements, i.e. the 1-to-5 ratings, then we need not be concerned about
measurement theory. For example, if we want to test the hypothesis that the
foods have equal mean ratings, we might do a two-way ANOVA on the ratings.
But if we want to draw conclusions about flavor, then we must
consider how flavor relates to the ratings, and that is where measurement theory
comes in. Ideally, we would want the ratings to be linear functions of the
flavors with the same slope for each judge; if so, the ANOVA can be used to make
inferences about mean goodness-of-flavor, provided we can justify all the
appropriate statistical assumptions. But if the judges have different slopes
relating ratings to flavor, or if these functions are not linear, then this
ANOVA will not allow us to make inferences about mean
goodness-of-flavor. Note that this issue is not about statistical interaction;
even if there is no evidence of interaction in the ratings, the judges may have
different functions relating ratings to flavor.
We need to consider what information we have about the functions relating
ratings to flavor for each judge. Perhaps the only thing we are sure of is that
the ratings are monotone increasing functions of flavor. In this case, we would
want to use a statistical analysis that is valid no matter what the particular
monotone increasing functions are. One way to do this is to choose an analysis
that yields invariant results no matter what monotone increasing functions the
judges happen to use, such as a Friedman test. The study of such invariances is
a major concern of measurement theory.
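The invariance of a rank-based analysis can be sketched as follows. This is an illustration under stated assumptions, not a full Friedman test: the ratings are hypothetical, the statistic is the textbook Friedman chi-square without a tie correction, and the per-judge transformations are arbitrary strictly increasing functions chosen for the example.

```python
import math


def ranks(values):
    """Rank values 1..k within one judge, averaging ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            result[order[pos]] = average_rank
        i = j + 1
    return result


def friedman_statistic(table):
    """Friedman chi-square (no tie correction); rows = judges, columns = foods."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for col, r in enumerate(ranks(row)):
            rank_sums[col] += r
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)


def column_means(table):
    return [sum(col) / len(col) for col in zip(*table)]


# Hypothetical 1-to-5 ratings: 4 judges (rows) by 3 foods (columns).
ratings = [[1, 3, 5], [2, 4, 3], [1, 2, 5], [3, 2, 4]]

# Each judge's ratings pass through a different strictly increasing function.
transforms = [lambda r: r ** 2, lambda r: 10 * r + 3, math.exp, math.log]
transformed = [[t(r) for r in row] for t, row in zip(transforms, ratings)]

print(friedman_statistic(ratings), friedman_statistic(transformed))  # 4.5 4.5
print(column_means(ratings))       # [1.75, 2.75, 4.25]
print(column_means(transformed))   # the food means change
```

The within-judge ranks, and hence the statistic, are unchanged, whereas the food means that an ANOVA would compare depend on which monotone functions the judges happen to use.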
However, no measurement theorist would claim that measurement theory provides
a complete solution to such problems. In particular, measurement theory
generally does not take random measurement error into account, and if such
errors are an important aspect of the measurement process, then additional
methods, such as latent variable models, are called for. There is no clear
boundary between measurement theory and statistical theory; for example, a Rasch
model is both a measurement model and a statistical model.
Permissible transformations are transformations of a scale of measurement
that preserve the relevant relationships of the measurement process.
'Permissible' is a technical term; use of this term does not imply that
other transformations are prohibited for data analysis any more than use of the
term 'normal' for probability distributions implies that other
distributions are pathological. If Stevens had used the term 'mandatory'
rather than 'permissible', a lot of confusion might have been avoided.
In the example of measuring sticks, changing the unit of measurement (say,
from centimeters to inches) multiplies the measurements by a constant factor.
This multiplication does not alter the correspondence of the relationships
'greater than' and 'longer than', nor the correspondence of addition and
concatenation. Hence, change of units is a permissible transformation with
respect to these relationships.
There are different levels of measurement that involve different properties
(relations and operations) of the numbers or symbols that constitute the
measurements. Associated with each level of measurement is a set of permissible
transformations. The most commonly discussed levels of measurement are as
follows:
 Nominal

 Two things are assigned the same symbol if they have the same value of
the attribute.
 Permissible transformations are any one-to-one or many-to-one
transformation, although a many-to-one transformation loses information.
 Examples: numbering of football players; numbers assigned to religions
in alphabetical order, e.g. atheist=1, Buddhist=2, Christian=3, etc.
 Ordinal

 Things are assigned numbers such that the order of the numbers reflects
an order relation defined on the attribute. Two things x and
y with attribute values a(x) and a(y) are
assigned numbers m(x) and m(y) such that if m(x) >
m(y), then a(x) > a(y).
 Permissible transformations are any monotone increasing transformation,
although a transformation that is not strictly increasing loses information.
 Examples: Mohs scale for hardness of minerals; grades for academic
performance (A, B, C, ...); blood sedimentation rate as a measure of
intensity of pathology.
 Interval

 Things are assigned numbers such that differences between the numbers
reflect differences of the attribute. If m(x) - m(y) > m(u) -
m(v), then a(x) - a(y) > a(u) - a(v).
 Permissible transformations are any affine transformation t(m) = c *
m + d, where c and d are constants; another way of
saying this is that the origin and unit of measurement are arbitrary.
 Examples: temperature in degrees Fahrenheit or Celsius; calendar date.
 Log-interval

 Things are assigned numbers such that ratios between the numbers reflect
ratios of the attribute. If m(x) / m(y) > m(u) / m(v), then
a(x) / a(y) > a(u) / a(v).
 Permissible transformations are any power transformation t(m) = c *
m ** d, where c and d are constants.
 Examples: density (mass/volume); fuel efficiency in mpg.
 Ratio

 Things are assigned numbers such that differences and ratios between the
numbers reflect differences and ratios of the attribute.
 Permissible transformations are any linear (similarity) transformation
t(m) = c * m, where c is a constant; another way of saying this is
that the unit of measurement is arbitrary.
 Examples: Length in centimeters; duration in seconds; temperature in
degrees Kelvin.
 Absolute

 Things are assigned numbers such that all properties of the numbers
reflect analogous properties of the attribute.
 The only permissible transformation is the identity transformation.
 Examples: number of children.
These measurement levels form a partial order based on the sets of
permissible transformations:

Weaker <-----------------------------------> Stronger

                   - Interval -
                  /            \
Nominal -- Ordinal <            > Ratio -- Absolute
                  \            /
                   Log-interval
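The following sketch illustrates why the interval and log-interval levels sit on separate branches. The four measurements and the transformation constants are hypothetical. A single set of numbers can only demonstrate a failure; that the 'True' entries hold for all inputs follows from the algebra of each transformation, not from this check.

```python
def similarity(m, c=2.54):
    """Permissible for ratio scales: t(m) = c * m, with c > 0."""
    return c * m


def affine(m, c=1.8, d=32.0):
    """Permissible for interval scales: t(m) = c * m + d, with c > 0."""
    return c * m + d


def power(m, c=2.0, d=0.5):
    """Permissible for log-interval scales: t(m) = c * m ** d, with c, d > 0."""
    return c * m ** d


# Hypothetical positive measurements of four things x, y, u, v.
x, y, u, v = 10.0, 4.0, 7.0, 2.0


def keeps_difference_order(t):
    """Is the comparison of (x - y) with (u - v) unchanged by t?"""
    return (t(x) - t(y) > t(u) - t(v)) == (x - y > u - v)


def keeps_ratio_order(t):
    """Is the comparison of (x / y) with (u / v) unchanged by t?"""
    return (t(x) / t(y) > t(u) / t(v)) == (x / y > u / v)


for name, t in [("similarity", similarity), ("affine", affine), ("power", power)]:
    print(f"{name:10s} differences: {keeps_difference_order(t)!s:5s} "
          f"ratios: {keeps_ratio_order(t)}")
# similarity keeps both, affine breaks the ratio comparison,
# and power breaks the difference comparison for these numbers.
```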
In real life, a scale of measurement may not correspond precisely to any of
these levels of measurement. For example, there can be a mixture of nominal and
ordinal information in a single scale, such as in questionnaires that have
several nonresponse categories. It is common to have scales that lie somewhere
between the ordinal and interval levels in that the measurements can be assumed
to be a smooth monotone function of the attribute. For many subjective rating
scales (such as the 'strongly agree,' 'agree,' ... 'strongly disagree' variety)
it cannot be shown that the intervals between successive ratings are exactly
equal, but with reasonable care and diagnostics it may be safe to say that no
interval represents a difference more than two or three times greater than
another interval.
Unfortunately, there are also many situations where the measurement process
is too ill-defined for measurement theory to apply. In such cases, it may still
be fruitful to consider what arbitrary choices were made in the course of
measurement, what effect these choices may have had on the measurements, and
whether some plausible class of permissible transformations can be determined.
For a binary variable, the classes of one-to-one transformations, monotone
increasing/decreasing transformations, and affine transformations are
identical: you can't do anything with a one-to-one transformation that you can't
do with an affine transformation. Hence binary variables are at least at the
interval level. If the variable connotes presence/absence or if there is some
other distinguishing feature of one category, a binary variable may be at the
ratio or absolute level.
Nominal variables are often analyzed in linear models by coding binary dummy
variables. This procedure is justified since binary variables are at the
interval level or higher.
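A small sketch of why this holds: any one-to-one recoding of a two-valued variable can be written as t(m) = c * m + d, because two points determine a line. The function name and the recoding below are hypothetical.

```python
def affine_through(old_values, new_values):
    """Return (c, d) such that c * old + d == new for a two-valued variable."""
    (old0, old1), (new0, new1) = old_values, new_values
    c = (new1 - new0) / (old1 - old0)
    d = new0 - c * old0
    return c, d


# Hypothetical recoding of a 0/1 variable: 0 -> 7 and 1 -> -3.
recode = {0: 7, 1: -3}
c, d = affine_through((0, 1), (recode[0], recode[1]))

data = [0, 1, 1, 0, 1]
via_lookup = [recode[m] for m in data]
via_affine = [c * m + d for m in data]

print(c, d)                      # -10.0 7.0
print(via_lookup == via_affine)  # True
```

The slope is negative here because the recoding reverses the order of the two codes, which the text above allows for binary variables.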
Measurement level depends on the correspondence between the measurements and
the attribute. Given a set of data, one cannot say what the measurement level is
without knowing what attribute is being measured. It is possible that a certain
data set might be treated as measuring different attributes at different times
for different purposes.
Consider a rat in a Skinner box who pushes a lever to get food pellets. The
number of pellets dispensed in the course of an experiment is obviously an
absolute-level measurement of the number of pellets dispensed. If number of
pellets is considered as a measure of some other attribute, the measurement
level may differ. As a measure of amount of food dispensed, the number of
pellets is at the ratio level under the assumption that the pellets are of equal
size; if the pellets are not of equal size, a more elaborate measurement model
is required, perhaps one involving random measurement error if the pellets are
dispensed in random order. As a measure of duration during the experiment, the
number of pellets is at an ordinal level. As a measure of response effort, the
number of pellets might be approximately ratio level, but we would need to
consider whether the rat's responses were executed in a consistent way, whether
the rat may miss the lever, and so forth. As a measure of amount of reward, the
number of pellets could only be justified by some very strong assumptions about
the nature of rewards; the measurement level would depend on the precise nature
of those assumptions. The main virtue of measurement theory is that it
encourages people to consider such issues.
Once a set of measurements have been made on a particular scale, it is
possible to transform the measurements to yield a new set of measurements at a
different level. It is always possible to transform from a stronger level to a
weaker level. For example, a temperature measurement in degrees Kelvin is at the
ratio level. If we convert the measurements to degrees Celsius, the level is
interval. If we rank the measurements, the level becomes ordinal. In some cases
it is possible to convert from a weaker scale to a stronger scale. For example,
correspondence analysis can convert nominal or ordinal measurements to an
interval scale under appropriate assumptions.
You can view an ordinal scale as an interval scale with error if you really
want to, but the errors are not independent, additive, or identically
distributed as required for many statistical methods. The errors would involve
complicated dependencies to maintain monotonicity with the interval scale. In
the example above with number of pellets as a measure of duration, the errors
would be cumulative, not additive, and the error variance would increase over
time. Hence for most statistical purposes, it is useless to consider an ordinal
scale as an interval scale with measurement error.
Measurement level has nothing to do with discrete vs. continuous variables.
The distinction between discrete and continuous random variables is commonly
used in statistical theory, but that distinction is rarely of importance in
practice. A continuous random variable has a continuous cumulative distribution
function. A discrete random variable has a stepwise-constant cumulative
distribution function. A discrete random variable can take only a finite number
of distinct values in any finite interval. There exist random variables that are
neither continuous nor discrete; for example, if Z is a standard normal random
variable and Y=max(0,Z), then Y is neither continuous nor discrete, but has
characteristics of both.
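The mixed nature of Y = max(0, Z) can be seen from its cumulative distribution function, sketched here with only the standard library. The function names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cdf_y(y):
    """CDF of Y = max(0, Z): zero below 0, equal to the normal CDF from 0 upward."""
    return 0.0 if y < 0 else normal_cdf(y)


# A jump of size 0.5 at y = 0 (the discrete part: P(Y = 0) = P(Z <= 0)) ...
jump_at_zero = cdf_y(0.0) - cdf_y(-1e-12)
print(jump_at_zero)  # 0.5

# ... followed by a smooth increase toward 1 (the continuous part).
for point in (0.5, 1.0, 2.0):
    print(point, round(cdf_y(point), 4))
```

The CDF is not continuous because of the jump at zero, and it is not stepwise-constant because it rises smoothly for y > 0.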
While measurements are always discrete due to finite precision, attributes
can be conceptually either discrete or continuous regardless of measurement
level. Temperature is usually regarded as a continuous attribute, so temperature
measurement to the nearest degree Kelvin is a ratio-level measurement of a
continuous attribute. However, quantum mechanics holds that the universe is
fundamentally discrete, so temperature may actually be a discrete attribute. In
ordinal scales for continuous attributes, ties are impossible (or have
probability zero). In ordinal scales for discrete attributes, ties are possible.
Nominal scales usually apply to discrete attributes. Nominal scales for
continuous attributes can be modeled but are rarely used.
Mathematical statistics is concerned with the connection between inference
and data. Measurement theory is concerned with the connection between data and
reality. Both statistical theory and measurement theory are necessary to make
inferences about reality.
Measurement theory cannot determine some single statistical method or model
as appropriate for data at a specific level of measurement. But measurement
theory does in fact show that some statistical methods are inappropriate
for certain levels of measurement if we want to make meaningful inferences about
the attribute being measured.
If we want to make statistical inferences regarding an attribute based on a
scale of measurement, the statistical method must yield invariant or equivariant
results under the permissible transformations for that scale of measurement. If
this invariance or equivariance does not hold, then the statistical inferences
apply only to the measurements, not to the attribute that was measured.
If we record the temperature in degrees Fahrenheit in Cary, NC, at various
times, we can compute statistics such as the mean, standard deviation, and
coefficient of variation. Since Fahrenheit is an interval scale, only statistics
that are invariant or equivariant under change of origin or unit of measurement
are meaningful. The mean is meaningful because it is equivariant under change of
origin or unit. The standard deviation is meaningful because it is invariant
under change of origin and equivariant under change of unit. But the coefficient
of variation is meaningless because it lacks such invariance or equivariance.
The mean and standard deviation can easily be converted back and forth from
Fahrenheit to Celsius, but we cannot compute the coefficient of variation in
degrees Celsius if we know only the coefficient of variation in degrees
Fahrenheit.
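A short check of these statements using Python's standard `statistics` module. The readings are hypothetical, not actual Cary data.

```python
import statistics

temps_f = [50.0, 59.0, 68.0, 77.0, 86.0]          # hypothetical readings
temps_c = [(f - 32.0) * 5.0 / 9.0 for f in temps_f]

mean_f, sd_f = statistics.mean(temps_f), statistics.stdev(temps_f)
mean_c, sd_c = statistics.mean(temps_c), statistics.stdev(temps_c)
cv_f, cv_c = sd_f / mean_f, sd_c / mean_c

# The mean converts like a single temperature reading (equivariant).
print(mean_c, (mean_f - 32.0) * 5.0 / 9.0)

# The standard deviation ignores the origin and rescales with the unit.
print(sd_c, sd_f * 5.0 / 9.0)

# The coefficient of variation differs between the two scales, and neither
# value can be recovered from the other without also knowing the mean.
print(cv_f, cv_c)
```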
Paul Thompson provides an example where interval and ratio levels are
confused:
> ... I recently published a
> paper in Psychiatric Research. We discuss the BPRS, a very common rating
> scale in psychiatry. Oddly enough, in the BPRS in the US, '1' means NO
> PATHOLOGY. However, a frequent statistic computed is percent improved:
>
> PI = (BPRS(Base) - BPRS(6-week)) / BPRS(Base) * 100
>
> If you use the '1 implies no pathology' model, you are not measuring
> according to a ratio scale, which requires a true 0. We show that this
> has very bad characteristics, which include a flat impossibility for
> a certain % improvement at certain points in the scale.
>
> This is pretty trivial, but should have an effect. As one reviewer said,
> 'This is so obvious that I am surprised that no one has ever thought of it
> before.' Nonetheless, the scale was being misused.
It is clear that if we are estimating a parameter that lacks invariance or
equivariance under permissible transformations, we are estimating a chimera. The
situation for hypothesis testing is more subtle. It is nonsense to test a null
hypothesis the truth of which is not invariant under permissible
transformations. For example, it would be meaningless to test the null
hypothesis that the mean temperature in Cary in July is twice the mean
temperature in December using a Fahrenheit or Celsius scale; we would need a
ratio scale for that hypothesis to be meaningful.
But it is possible for the null hypothesis to be meaningful even if the error
rates for a given test are not invariant. Suppose that we had an ordinal scale
of temperature, and the null hypothesis was that the distribution of
temperatures in July is identical to the distribution in December. The truth of
this hypothesis is invariant under strictly increasing monotone transformations
and is therefore meaningful under an ordinal scale. But if we do a t-test of
this hypothesis, the error rates will not be invariant under monotone
transformations. Hardcore measurement theorists would therefore consider a
t-test inappropriate. But given a null hypothesis, there are usually many
different tests that can be performed with accurate or conservative significance
levels but with different levels of power against different alternatives. The
fact that different tests have different error rates does not make any of them
correct or incorrect. Hence a softcore measurement theorist might argue that
invariance of error rates is not a prerequisite for a meaningful hypothesis
test; only invariance of the null hypothesis is required.
Nevertheless, the hardcore policy rules out certain tests that, while not
incorrect in a strict sense, are indisputably poor tests in terms of having
absurdly low power. Consider the null hypothesis that two random variables are
independent of each other. This hypothesis is invariant under one-to-one
transformations of either variable. Suppose we have two nominal variables, say,
religion and preferred statistical software product, to which we assign
arbitrary numbers. After verifying that at least one of the two variables is
approximately normally distributed, we could test the null hypothesis using a
Pearson product-moment correlation, and this would be a valid test. However, the
power of this test would be so low as to be useless unless we were lucky enough
to assign numbers to categories in such a way as to reveal the dependence as a
linear relationship. Measurement theory would suggest using a test that is
invariant under one-to-one transformations, such as a chi-squared test of
independence in a contingency table. Another possibility would be to use a
Pearson product-moment correlation after assigning numbers to categories in such
a way as to maximize the correlation (although the usual sampling distribution
of the correlation coefficient would not apply). In general, we can test for
independence by maximizing some measure of dependence over all permissible
transformations.
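The contrast between the two approaches can be sketched with hand-written statistics. The data are hypothetical (two nominal variables with three arbitrary codes each, perfectly dependent), and only the test statistics are computed, not p values.

```python
from collections import Counter


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


def chi_squared(xs, ys):
    """Pearson chi-squared statistic for independence in the cross-tabulation."""
    n = len(xs)
    joint, rows, cols = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    total = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            total += (joint[(r, c)] - expected) ** 2 / expected
    return total


# Hypothetical codes: category 1 of x always goes with 1 of y, 2 with 3, 3 with 2.
x = [1] * 4 + [2] * 4 + [3] * 4
y = [1] * 4 + [3] * 4 + [2] * 4

# A one-to-one relabeling of y's categories (permissible for a nominal scale).
relabel = {1: 1, 3: 2, 2: 3}
y_relabeled = [relabel[code] for code in y]

print(pearson_r(x, y), pearson_r(x, y_relabeled))      # 0.5 versus 1.0
print(chi_squared(x, y), chi_squared(x, y_relabeled))  # both about 24
```

The correlation depends on how the arbitrary codes happen to line up, while the chi-squared statistic depends only on the cell counts and is unaffected by relabeling.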
However, it must be emphasized that there is no need to restrict the
transformations in a statistical analysis to those that are permissible. That is
not what 'permissible transformation' means. The point is that
statistical methods should be used that give invariant results under the class
of permissible transformations, because those transformations do not alter the
meaning of the measurements. 'Permissible' was undoubtedly a poor choice
of words, but Stevens was quite clear about what he meant. For example (Stevens
1959):
In general, the more unrestricted the permissible transformations,
the more restricted the statistics. Thus, nearly all statistics are applicable
to measurements made on ratio scales, but only a very limited group of
statistics may be applied to measurements made on nominal scales.
The connection between measurement level and statistical analysis has been
hotly disputed in the psychometric and statistical literature by people who fail
to distinguish between inferences regarding the attribute and inferences
regarding the measurements. If one is interested only in making inferences about
the measurements without regard to their meaning, then measurement level is, of
course, irrelevant to choice of statistical method. The classic example is Lord
(1953), 'On the Statistical Treatment of Football Numbers', American
Psychologist, 8, 750-751, who argued that statistical methods could be applied
regardless of level of measurement, and concocted a silly example involving the
jersey numbers assigned to football players, which Lord claimed were
nominal-level measurements of the football players. Lord contrived a situation
in which freshmen claimed they were getting lower numbers than the sophomores,
so the purpose of the analysis was to make inferences about the numbers, not
about some attribute measured by the numbers. It was therefore quite reasonable
to treat the numbers as if they were on an absolute scale. However, this
argument completely misses the point by eliminating the measured attribute from
the scenario.
The confusion between measurements and attributes was perpetuated by Velleman
and Wilkinson (1993), 'Nominal, Ordinal, Interval, and Ratio Typologies Are
Misleading', The American Statistician, 47, 65-72. Velleman and
Wilkinson set up a series of straw men and knocked some of them down, while
consistently misunderstanding the meaning of 'meaning' and of
'permissible transformation'. For example, they claimed that the number
of cylinders in an automobile engine can be treated, depending on the
circumstances, as nominal, ordinal, interval, or ratio, and hence the concept of
measurement level 'simplifies the matter so far as to be false.' In fact, the
number of cylinders is at the absolute level of measurement. Thus, measurement
theory would dictate that any statistical analysis of number of cylinders must
be invariant under an identity transformation. Obviously, any analysis
is invariant under an identity transformation, so all of the analyses that
Velleman and Wilkinson claimed might be appropriate are acceptable
according to measurement theory. What is false is not measurement theory but
Velleman and Wilkinson's backwards interpretation of it.
It is important to understand that the level of measurement of a variable
does not mandate how that variable must appear in a statistical model. However,
the measurement level does suggest reasonable ways to use a variable by default.
Consider the analysis of fuel efficiency in automobiles. If we are interested in
the average distance that can be driven with a given amount of gas, we should
analyze miles per gallon. If we are interested in the average amount of gas
required to drive a given distance, we should analyze gallons per mile. Both
miles per gallon and gallons per mile are measurements of fuel efficiency, but
they may yield quite different results in a statistical analysis, and there may
be no clear reason to use one rather than the other. So how can we make
inferences regarding fuel efficiency that do not depend on the choice between
these two scales of measurement? We can do that by recognizing that both miles
per gallon and gallons per mile are measurements of the same attribute on a
log-interval scale, and hence that the logarithm of either can be treated as a
measurement on an interval scale. Thus, if we were doing a regression, it would
be reasonable to begin the analysis using log(mpg). If evidence of nonlinearity
were detected, then other transformations could still be considered.
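A small numerical sketch of the fuel-efficiency argument, using hypothetical cars. Averaging miles per gallon and averaging gallons per mile give conflicting summaries, whereas the logarithms of the two scales differ only in sign, so an analysis of the logs does not depend on which scale was chosen.

```python
import math
import statistics

mpg = [15.0, 20.0, 30.0, 60.0]           # hypothetical cars
gpm = [1.0 / m for m in mpg]             # the same cars in gallons per mile

mean_mpg = statistics.mean(mpg)
mpg_from_mean_gpm = 1.0 / statistics.mean(gpm)
print(mean_mpg, mpg_from_mean_gpm)       # 31.25 versus about 24: they disagree

log_mean_mpg = statistics.mean([math.log(m) for m in mpg])
log_mean_gpm = statistics.mean([math.log(g) for g in gpm])
print(log_mean_mpg, -log_mean_gpm)       # equal up to rounding

# Back-transforming either log mean gives the same typical efficiency in mpg.
print(math.exp(log_mean_mpg), 1.0 / math.exp(log_mean_gpm))
```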
What has been shown is that various statistical methods are more or less
robust to distortions that could arise from smooth monotone transformations; in
other words, there are cases where it makes little difference whether we treat a
measurement as ordinal or interval. But there can hardly be any doubt that it
often makes a huge difference whether we treat a measurement as nominal or
ordinal, and confusion between interval and ratio scales is a common source of
nonsense.
Suppose we are doing a two-sample t-test; we are sure that the assumptions of
ordinal measurement are satisfied, but we are not sure whether an equal-interval
assumption is justified. A smooth monotone transformation of the entire data set
will generally have little effect on the p value of the t-test. A robust variant
of a t-test will likely be affected even less (and, of course, a rank version of
a t-test will be affected not at all). It should come as no surprise then that a
decision between an ordinal or an interval level of measurement is of no great
importance in such a situation, but anyone with lingering doubts on the matter
may consult the simulations in Baker, B. O., Hardyck, C., and Petrinovich, L. F.
(1966) 'Weak measurement vs. strong statistics: An empirical critique of S.S.
Stevens' proscriptions on statistics,' Educational and Psychological
Measurement, 26, 291-309, for a demonstration of the obvious.
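As a rough illustration, not a substitute for the simulations cited above, the sketch below applies a square-root transformation to two small hypothetical samples. The unpooled (Welch) form of the t statistic and the Mann-Whitney count are choices made for this example; the text does not specify a particular variant.

```python
import math
import statistics


def welch_t(a, b):
    """Two-sample t statistic with unpooled variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def mann_whitney_u(a, b):
    """Number of (a, b) pairs with a > b, counting ties as one half."""
    return sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in a for q in b)


group1 = [3.1, 4.0, 4.4, 5.2, 6.3]       # hypothetical data
group2 = [2.0, 2.9, 3.5, 4.1, 4.8]
sqrt1 = [math.sqrt(v) for v in group1]   # a smooth, strictly increasing transform
sqrt2 = [math.sqrt(v) for v in group2]

t_raw, t_sqrt = welch_t(group1, group2), welch_t(sqrt1, sqrt2)
u_raw, u_sqrt = mann_whitney_u(group1, group2), mann_whitney_u(sqrt1, sqrt2)

print(t_raw, t_sqrt)   # close to each other, but not identical
print(u_raw, u_sqrt)   # 19.0 19.0: the rank-based count does not move
```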
On the other hand, suppose we were comparing the variability instead of the
location of the two samples. The F test for equality of variances is not robust,
and smooth monotone transformations of the data could have a large effect on the
p value. Even a more robust test could be highly sensitive to smooth monotone
transformations if the samples differed in location.
Measurement level is of greatest importance in situations where the meaning
of the null hypothesis depends on measurement assumptions. Suppose the data are
1-to-5 ratings obtained from two groups of people, say males and females,
regarding how often the subjects have sex: frequently, sometimes, rarely, etc.
Suppose that these two groups interpret the term 'frequently' differently as
applied to sex; perhaps males consider 'frequently' to mean twice a day, while
females consider it to mean once a week. Females may report having sex more
'frequently' than men on the 1-to-5 scale, even if men in fact have sex more
frequently as measured by sexual acts per unit of time. Hence measurement
considerations are crucial to the interpretation of the results.
As mentioned earlier, it is meaningless to claim that it was twice as warm
today as yesterday because it was 40 degrees Fahrenheit today but only 20
degrees yesterday. Fahrenheit is not a ratio scale, and there is no meaningful
sense in which 40 degrees is twice as warm as 20 degrees. It would be just as
meaningless to compute the geometric mean or coefficient of variation of a set
of temperatures in degrees Fahrenheit, since these statistics are not invariant
or equivariant under change of origin. There are many other statistics that can
be meaningfully applied only to data at a sufficiently strong level of
measurement.
Consider some measures of location: the mode requires a nominal or stronger
scale, the median requires an ordinal or stronger scale, the arithmetic mean
requires an interval or stronger scale, and the geometric mean or harmonic mean
requires a ratio or stronger scale.
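One way to see the difference between the median and the arithmetic mean is to check whether the statistic commutes with a strictly increasing transformation. The sample below is hypothetical and has odd length, so that the median is an actual data value rather than an average of two values.

```python
import math
import statistics

data = [1.0, 2.0, 4.0, 8.0, 100.0]   # hypothetical sample, odd length
t = math.log                          # strictly increasing, permissible for ordinal data

transformed = [t(v) for v in data]

# Median of the transformed data equals the transformed median.
median_commutes = statistics.median(transformed) == t(statistics.median(data))

# Mean of the transformed data is not the transformed mean.
mean_commutes = math.isclose(statistics.mean(transformed), t(statistics.mean(data)))

print(median_commutes, mean_commutes)  # True False
```

The median is equivariant under every strictly increasing transformation, so it is meaningful on an ordinal scale; the mean is equivariant only under affine transformations.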
Consider some measures of variation: entropy requires a nominal or stronger
scale, the standard deviation requires an interval or stronger scale, and the
coefficient of variation requires a ratio or stronger scale.
Simple linear regression with an intercept requires that both variables be on
an interval or stronger scale. Regression through the origin requires that both
variables be on a ratio or stronger scale.
A generalized linear model using a normal distribution requires the dependent
variable to be on an interval or stronger scale. A gamma distribution requires a
ratio or stronger scale. A Poisson distribution requires an absolute scale.
The general principle is that an appropriate statistical analysis must yield
invariant or equivariant results for all permissible transformations. Obviously,
we cannot actually conduct an infinite number of analyses of a real data set
corresponding to an infinite class of transformations. However, it is often
straightforward to verify or falsify the invariance mathematically. The
application of this idea to summary statistics such as means and coefficients of
variation is fairly widely understood.
Confusion arises when we come to linear or nonlinear models and consider
transformations of variables. Recall that Stevens did not say that
transformations that are not 'permissible' are prohibited. What Stevens said was
that we should consider all 'permissible' transformations and verify
that our conclusions are invariant.
Consider, for example, the problem of estimating the parameters of a
nonlinear model by maximum likelihood (ML), and comparing various models by
likelihood ratio (LR) tests. We would want the LR tests to be invariant under
the permissible transformations of the variables. One way to do this is to
parameterize the model so that any permissible transformation can be inverted by
a corresponding change in the parameter estimates. In other words, we can make
the ML and LR tests invariant by making the inverse-permissible transformations
mandatory (this is the same set of transformations as the permissible
transformations except for a degeneracy here and there which I won't worry
about).
To illustrate, suppose we are modeling a variable Y as a function
f() of variables N, O, I, L, R, and A at the nominal,
ordinal, etc. measurement levels, respectively. Then we can ensure the desired
invariance by setting up the model as:

    Y = f( arb(N), mon(O), a+bI, cL^d, eR, A, ...)

where arb() is any (estimated) function, mon() is any
(estimated) monotone function, and a, b, c, d, and e are
parameters. Then any permissible transformations of N, O, I, L, R, and
A can be absorbed by the estimation of the arb() and
mon() functions and the parameters. The function f() can
involve any other transformations such as sqrts or logs or whatever.
f() can be as complicated as you like; the presence of the permissible
transformations as part of the model to be estimated guarantees the desired
invariance.
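The simplest instance of this idea is the a+bI term. In the sketch below, which uses least squares in place of maximum likelihood and entirely hypothetical data, a model with an estimated intercept and slope gives the same fitted values whether an interval-level predictor is recorded in Celsius or Fahrenheit. A model without the intercept cannot absorb the change of origin, so its fit changes.

```python
def fit_with_intercept(xs, ys):
    """Least-squares estimates (a, b) for y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b * mean_x, b


def fit_through_origin(xs, ys):
    """Least-squares estimate b for y = b * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


temp_c = [0.0, 10.0, 20.0, 30.0]             # hypothetical interval-level predictor
response = [1.0, 2.5, 2.9, 4.2]              # hypothetical response
temp_f = [1.8 * c + 32.0 for c in temp_c]    # permissible (affine) transformation

a_c, b_c = fit_with_intercept(temp_c, response)
a_f, b_f = fit_with_intercept(temp_f, response)
fitted_c = [a_c + b_c * x for x in temp_c]
fitted_f = [a_f + b_f * x for x in temp_f]
print(fitted_c)
print(fitted_f)   # same fitted values; only the parameter estimates differ

origin_c = [fit_through_origin(temp_c, response) * x for x in temp_c]
origin_f = [fit_through_origin(temp_f, response) * x for x in temp_f]
print(origin_c)
print(origin_f)   # different fitted values: the fit is not invariant
```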
If we were designing software for fitting linear or nonlinear models, we
might want to provide these 'permissible' or 'mandatory' transformations in a
convenient way. This, in fact, was the motivation for numerous programs
developed by psychometricians that anticipated many of the features of ACE and
generalized additive models.
Measurement theory shows that strong assumptions are required for certain
statistics to provide meaningful information about reality. Measurement theory
encourages people to think about the meaning of their data. It encourages
critical assessment of the assumptions behind the analysis. It encourages
responsible real-world data analysis.
Measurement theory: Frequently asked questions
Warren S. Sarle
SAS Institute Inc.
SAS Campus Drive
Cary, NC 27513, USA
saswss@unx.sas.com
URL: ftp://ftp.sas.com/pub/neural/measurement.html
Originally published in the Disseminations of the International
Statistical Applications Institute, 4th edition, 1995, Wichita: ACG Press, pp.
61-66. Revised March 18, 1996.
Copyright (C) 1996 by Warren S. Sarle, Cary, NC, USA.
Permission is granted to reproduce this article for educational purposes
only, retaining the author's name and copyright notice. LBA for WSS, 18 Mar 1996

