Transformations: an introduction
-------------------------------------------------------------------------------
or relationship.
understanding are Emerson and Stoto (1983) and Emerson (1983). Behind
This help item covers the following topics. You can read in sequence or
skim directly to each section. Starred sections are likely to appear more
esoteric or more difficult than the others to those new to the subject.
* Transformations as a family
Typographical notes:
The Stata notation == for "is equal to" and != for "is not equal to"
There are many reasons for transformation. The list here is not
comprehensive.
1. Convenience
2. Reducing skewness
3. Equal spreads
4. Linear relationships
5. Additive relationships
If you are looking at just one variable, reasons 1, 2 and 3 are relevant,
while if you are looking at two or more variables, reasons 4 and 5 are more
important.
(value - level) / spread
Standardised values have level 0 and spread 1 and have no units: hence
units. Most commonly a standard score is calculated using the mean and
z = (x - mean of x) / (sd of x).
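
A minimal sketch of a standard score in Stata, assuming a hypothetical
numeric variable x is already in memory (the names x and z are purely
illustrative):

```stata
* x is a hypothetical numeric variable; summarize leaves its mean and sd
* behind in r(mean) and r(sd)
summarize x
generate z = (x - r(mean)) / r(sd)

* check: z should have mean 0 and sd 1, up to rounding
summarize z
```

The egen function std() should give the same result in one step, as in
egen z2 = std(x).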
statistical methods.
equal spreads, despite marked variations in level, which again makes data
easier to handle and interpret. Each data set or subset having about the
linear than about patterns that are highly curved. This is vitally
regression.)
y = a + bx
in which two terms a and bx are added is easier to deal with than
y = ax^b
in which two terms a and x^b are multiplied. Additivity is a vital issue
transformations are used only over ranges on which they yield (finite)
Reciprocal
can be applied to negative values, it is not useful unless all values are
population density (people per unit area) becomes area per person;
that are easy to manage, but that itself has no effect on skewness or
linearity.)
The reciprocal reverses order among values of the same sign: largest
Logarithm
or decline
y = a exp(bx)
is made linear by
ln y = ln a + bx
natural logarithms.)
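
The linearising effect of the logarithm can be sketched with made-up data.
The values a = 2 and b = 0.3, the noise level and the variable names are all
assumptions chosen only for illustration:

```stata
* hypothetical data: y = a exp(bx) with a = 2, b = 0.3 and multiplicative noise
clear
set obs 50
set seed 12345
generate x = _n / 5
generate y = 2 * exp(0.3 * x + rnormal(0, 0.1))

* fit the straight line ln y = ln a + bx
generate lny = ln(y)
regress lny x

* the slope estimates b; exponentiating the intercept estimates a
display "estimate of b: " _b[x]
display "estimate of a: " exp(_b[_cons])
```

Plotting lny against x should show a pattern much closer to a straight line
than a plot of y against x.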
y = a exp(0) = a,
y = ax^b = 0,
so the power function for positive b goes through the origin, which often
imply zero for y? This kind of power function is a shape that fits many
data sets rather well.
Examples are
males / females;
dependants / workers;
skewed data, because there is a clear lower limit and no clear upper
distributed.
Cube root
logarithm. It is also used for reducing right skewness, and has the
advantage that it can be applied to zero and negative values. Note that
(2)(2)(2) = 8 and (-2)(-2)(-2) = -8. These examples show that the cube
root of a negative number has negative sign and the same absolute value
This property is a little delicate. For example, change the power just a
smidgen from 1/3, and we can no longer define the result as a product of
useful.
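
One way to get cube roots in Stata that behave as described for negative
values is to work on the absolute value and put the sign back afterwards.
This is a minimal sketch with made-up values; the variable names are
illustrative only:

```stata
* hypothetical variable with negative, zero and positive values
clear
input x
-8
-1
0
1
8
end

* a plain power yields missing for negative x, as 1/3 is not treated
* as a special case
generate naive = x^(1/3)

* cube root of the absolute value, with the original sign restored
generate cuberoot = sign(x) * abs(x)^(1/3)

list
```

Because sign(0) is 0, zeros are mapped to zero without any special handling.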
Square root
and the cube root. It is also used for reducing right skewness, and also
has the advantage that it can be applied to zero values. Note that the
could be used to reduce left skewness. In practice, the main reason for
Otherwise quadratics are typically used solely because they can mimic a
relationship within the data region. Outside that region they may behave
very poorly, because they take on arbitrarily large values for extreme
Which transformation?
The main criterion in choosing a transformation is: what works with the
questions.
What makes physical (biological, economic, whatever) sense, for example
prefer measurement scales that are easy to think about. The cube root of
a volume and the square root of an area both have the dimensions of
has to be made.
all logarithmic.
analysts. Some use them routinely, others much less. Various views,
extreme or not so extreme, are slightly caricatured here to stimulate
"This seems like a kind of cheating. You don't like how the data are, so
"I see that this is a clever trick that works nicely. But how do I know
when this trick will work with some other data, or if another trick is
inverse transformation:
reciprocal        t = 1 / x        reciprocal        x = 1 / t
1. Draw a graph of the data to see how far patterns in data match the
2. See what range the data cover. Transformations will have little effect
Some transformations are not defined mathematically for some values, and
often they make little or no scientific sense. For example, I would never
(unless to Kelvin).
generate:
Cube roots of negative numbers require special care. Stata uses a general
routine to calculate powers and does not look for special cases of
powers. Whenever negative values are present, a more general recipe for
working with it. In particular, many graph commands allow the options
labelled using the original values, but it does not leave behind a
Other commands
of a variable with the aim of showing how far they produce a more nearly
they can suggest a transform at odds with what your scientific knowledge
would indicate. boxcox and lnskew0 are more advanced commands that should
be used only after studying textbook explanations of what they do. Box
function, but results are reported on the original scale of the response.
100) often benefit from special transformations. The most common is the
symmetrically, pulling out the tails and pulling in the middle around 0.5
spreading its increase becomes more rapid and then in turn slows; and
finally the last few percent may be very slow in converting to literacy,
as we are left with the isolated and the awkward, who are the slowest to
pick up any new thing. The resulting curve is thus a flattened S-shape
against time, which in turn is made more nearly linear by taking logits
contacts between those who do and those who do not, which will rise and
(100%). Using logits is one way of ensuring this: otherwise models may
can be rewritten
transformations
transform of p = something done to p - something done to (1 - p).
This way of writing it brings out the symmetrical way in which very high
and very low values are treated. (If p is small, 1 - p is large, and vice
versa.) The logit is occasionally called the folded log. The simplest
other such transformation is the folded root (that means square root)
As with square roots and logarithms generally, the folded root has the
and 1 (100%). The folded root is a weaker transformation than the logit.
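
The logit and the folded root can both be written as "something done to p
minus the same thing done to 1 - p". A minimal sketch, assuming a
hypothetical variable p holding proportions strictly between 0 and 1:

```stata
* hypothetical proportions, kept away from 0 and 1 so that the logit exists
clear
input p
0.05
0.25
0.5
0.75
0.95
end

* logit written as a difference of logarithms (the folded log)
generate flog = ln(p) - ln(1 - p)

* folded root: the same construction with square roots
generate froot = sqrt(p) - sqrt(1 - p)

list
```

Both new variables are 0 at p = 0.5 and are symmetric about it. Percents
would first need dividing by 100.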
Two other transformations for proportions and percents met in the older
literature (and still used occasionally) are the angular and the probit.
The angular is
arcsin(root of p)
very like
p^0.41 - (1 - p)^0.41,
which in turn is close to
p^0.5 - (1 - p)^0.5,
which is another way of writing the folded root (Tukey 1960). The probit
the logit, but also more awkward to work with. As a result, it is now
advantages.
logarithm, namely the reciprocal, cube root, square root and square, are
reciprocal           -1
square                2
sight, the logarithm really belongs in the family too. Knowing this is
reciprocal square    -2
reciprocal           -1
(yields one)          0
identity              1
square                2
cube                  3
fourth power          4
Powers less than 1 squeeze high values together and stretch low values
yields 1 as a result. However, we will now see that in a strong sense log
If you know calculus, you will know that the sequence of powers
t_p(x) = x^p if p != 0,
= ln x if p == 0.
reference here is Box and Cox (1964), although note also earlier work by
t_p(x) = (x^p - 1) / p if p != 0,
= ln x if p == 0.
resemblances.
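
The way in which (x^p - 1) / p approaches ln x as p approaches 0 can be
checked numerically. This is only a sketch: x = 5 is an arbitrary
illustrative value, and the local macro names are hypothetical.

```stata
* evaluate the scaled power (x^p - 1) / p for ever smaller p
local x = 5
foreach p in 1 0.5 0.1 0.01 0.001 {
    local t = (`x'^`p' - 1) / `p'
    display "p = `p'" _col(14) "t_p(x) = `t'"
}

* the limiting value as p approaches 0
display "ln x = " ln(`x')
```

The displayed values should settle towards ln x as p shrinks.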
1. If p = 1, t_p(x) = x.
extended more easily to variables that can be both positive and negative,
Transformations for variables that are both positive and negative (more advanced)
reciprocals); if the second situation does not hold, then some other
the most awkward property that may invite transformation is heavy (long
melts or freezes.)
-ln(-x + 1) if x <= 0,
ln(x + 1) if x > 0.
sign(x) ln(|x| + 1)
passes through the origin, behaves like x for small x, positive and
negative, and like sign(x) ln(|x|) for large |x|. The gradient is
relative to those near the origin. It has recently been dubbed the
or
sign(x) * ln(abs(x) + 1)
that are both positive and negative were discussed by Yeo and Johnson
(2000).
function arsinh (also known as arg sinh, sinh^-1 and arcsinh). This is
The sinh and arsinh functions can be computed in Mata as sinh(x) and
The arsinh function also passes through the origin and is steepest at
practice neglog(x) and arsinh(x) have loosely similar effects. See also
Johnson (1949).
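
The two transformations can be compared side by side. This is a minimal
sketch with made-up values. The arsinh is written out here through the
identity arsinh(x) = sign(x) ln(|x| + sqrt(x^2 + 1)), which is a choice made
for this illustration rather than the only way to compute it:

```stata
* hypothetical variable spanning large negative to large positive values
clear
input x
-1000
-10
-1
0
1
10
1000
end

* neglog: sign(x) * ln(|x| + 1)
generate neglog = sign(x) * ln(abs(x) + 1)

* arsinh, using the fact that it is an odd function of x
generate arsinh = sign(x) * ln(abs(x) + sqrt(x^2 + 1))

list
```

Both new variables keep the sign of x, map zero to zero and pull in the
tails.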
Acknowledgements
Austin Nichols pointed out that cube roots are well defined for negative
values.
Author
n.j.cox@durham.ac.uk
Postscript
which the independent variable is changed. The forms these take will
Modulo some small changes in terminology, this applies here too. Either
way, the advice that "experience and experiment must guide the student"
References
Press.
Brooks-Cole, 211-219.
Also see
On-line: generate, egen, graph