
INTELLIGENT DESIGNS

selected topics in physics

Assigned by
Juan Pablo Fernández
Department of Physics

University of Massachusetts
Amherst, Massachusetts
Selected Topics in Physics: Intelligent Designs
© University of Massachusetts, October 2008
AUTHORS

Sam Bingham sbingham@student.umass.edu

Samuel Boone sboone

Morgan-Elise Cervo mcervo

Jose Clemente jaclemen

Adam Cohen afcohen

Robert Deegan rdeegan

Matthew Drake mdrake

Christopher Emma cpemma

Sebastian Fischetti sfischet

Keith Fratus kfratus

Douglas Herbert dherbert

Paul Hughes phughes

Christopher Kerrigan crkerrig

Alexander Kiriakopoulos akiriako

Collin Lally clally

Amanda Lund alund

Christopher MacLellan cmaclell

Matthew Mirigian mmirigia

Tim Mortsolf tmortsol

Andrew O’Donnell anodonne

David Parker dparker

Robert Pierce rpierce

Richard Rines rrines

Daniel Rogers drrogers

Daniel Schmidt dschmidt

Jonah Zimmerman jzimmerm

CONTENTS

preface xi

i Seeing the Light 1


1 morgan-elise cervo: why is the sky blue? 3
1.1 Introduction 3
1.2 Waves 3
1.3 Electromagnetic Waves 3
1.4 Radiation 5
1.5 Blueness of the Sky 7
1.6 Problems 8
2 matthew mirigian: the physics of rainbows 9
2.1 Introduction 9
2.2 The Primary Bow 9
2.3 The Secondary Bow 10
2.4 Dispersion 11
2.5 Problems 12
3 matthew drake: the camera and how it works 13
3.1 Introduction 13
3.2 Lenses 14
3.3 The Camera Itself 15
3.4 Questions 17
3.5 Solutions 17
4 sebastian fischetti: holography: an introduction 19
4.1 Introduction 19
4.2 The Geometric Model 19
4.3 Types of Holograms 21
4.4 Making Holograms 22
4.5 Applications of Holography 23
4.6 Problems 24
5 colin lally: listening for the shape of a drum 27
5.1 Introduction 27
5.2 Eigenvalues 28
5.3 Problems 29

ii Mind Over Matter 31


6 christopher maclellan: glass 33
6.1 Introduction to Glass 33
6.2 Amorphous Solids 33
6.3 The Glass Transition 34
6.4 Simulating Amorphous Materials with Colloids 35
6.5 Practice Questions 37
7 robert pierce: the physics of splashing 39
7.1 Introduction 39
7.2 Pressure, Surface Tension, and other Concepts 39
7.3 Splashing 42
7.4 Summary 43
7.5 Chapter Problems 44
7.6 Multiple Choice Questions 45
8 sam bingham: freak waves 47
8.1 Introduction 47


8.2 Linear Model 48


8.3 Interference 49
8.4 Nonlinear Effects 52
8.5 Conclusion 53
8.6 Problems 54
9 paul hughes: friction 55
9.1 Overview 55
9.2 Amontons/Coulomb Friction 55
9.3 Toward A Conceptual Mesoscopic Model 56
9.4 Summary 58
9.5 Problems 58
10 keith fratus: neutrino oscillations in the standard
model 61
10.1 Introduction 61
10.2 A Review of Quantum Theory 61
10.3 The Standard Model 62
10.4 The Weak and Higgs Mechanisms 64
10.5 The Origin of Neutrino Oscillations 67
10.6 Implications of the Existence of Neutrino Oscillations 71
10.7 Problems 73
10.8 Multiple Choice Test Problems 75

iii Information is Power 77


11 andy o’donnell: fast fourier transform 79
11.1 Introduction 79
11.2 Fourier Transform 79
11.3 Discrete Transform 80
11.4 The Fast Fourier Transform 81
11.5 Multiple Choice 83
11.6 Homework Problems 83
12 tim mortsolf: the physics of data storage 85
12.1 Introduction 85
12.2 Bits and Bytes — The Units of Digital Storage 85
12.3 Storage Capacity is Everything 87
12.4 The Physics of a Hard Disk Drive 88
12.5 Summary 94
12.6 Exercises 94
12.7 Multiple Choice Questions 95
13 tim mortsolf: the physics of information theory 97
13.1 Introduction 97
13.2 Information Theory – The Physical Limits of Data 97
13.3 Shannon’s Formula 99
13.4 The Physical Limits of Data Storage 101
13.5 Summary 103
13.6 Exercises 103
13.7 Multiple Choice Questions 105
14 sam boone: analytical investigation of the optimal
traffic organization of social insects 107
14.1 Introduction 107
14.2 Ant Colony Optimization 109
14.3 Optimization By Hand 109
14.4 Applying ACO Meta-Heuristic to the Traveling Salesman
Problem 110

14.5 Solution to the Traveling Salesman Problem Using an


ACO Algorithm 112
15 christopher kerrigan: the physics of social insects 117
15.1 Introduction 117
15.2 The Electron and the Ant 117
15.3 Insect Current 118
15.4 Insect Diagram 118
15.5 Kirchoff’s Rules (for ants) 119
15.6 Differences 119
15.7 The Real Element 120
15.8 Problems 121

iv What’s Out There 123


16 robert deegan: the discovery of neptune 125
16.1 Introduction 125
16.2 Newton’s Law of Universal Gravitation 125
16.3 Adams and Le Verrier 126
16.4 Perturbation 127
16.5 Methods and Modern Approaches 128
16.6 Practice Questions 129
16.7 Answers to Practice Questions 129
17 alex kiriakopoulos: white dwarfs 131
17.1 Introduction 131
17.2 The Total Energy 132
17.3 Question 133
18 daniel rogers: supernovae and the progenitor the-
ory 135
18.1 Introduction 135
18.2 Creation of Heavy Elements 135
18.3 Dispersal of Heavy Elements 136
18.4 Progenitor Theory 138
18.5 A Mathematical Model 138
18.6 Conclusion 139
18.7 Problems 140
19 david parker: the equivalence principle 143
19.1 Introduction 143
19.2 Weak? Strong? Einstein? 143
19.3 Consequences 144
19.4 Example 144
19.5 Problems 145
20 richard rines: the fifth interaction 147
20.1 Introduction: The ‘Four’ forces 147
20.2 The Beginning: Testing Weak Equivalence 147
20.3 A New Force 148
20.4 The Death of the Force 150
20.5 Problems 151
21 douglas herbert: the science of the apocalypse 153
21.1 Introduction 153
21.2 Asteroid Impact 153
21.3 Errant Black Holes 155
21.4 Flood volcanism 155
21.5 Giant Solar Flares 156
21.6 Viral Epidemic 157
22 amanda lund: extraterrestrial intelligence 159

22.1 Introduction 159


22.2 The Possibility of Life in the Universe 159
22.3 The Search for Intelligent Life 160
22.4 Will We Find It? And When? 161
22.5 Problems 162
22.6 Multiple Choice Questions 163
22.7 Summary 164
a Bibliography 165
Index 173
Physics is to be regarded not so much as the study of something
a priori given, but rather as the development of methods
for ordering and surveying human experience.
— N. Bohr [59]

PREFACE

This book has been produced as an assignment for Physics 381,
Writing in Physics, taught by the Department of Physics of the
University of Massachusetts Amherst in the Fall 2008 semester.

instructor
Juan Pablo Fernández; 1034 LGRT
jpf@alumni.umass.edu or juanf@physics.umass.edu

teaching assistant
Benjamin Ett
bett@physics.umass.edu

class times and office hours


Most of this class will be taught long distance. I plan to come to
Amherst twice a month; on those days we will meet at the usual
class time and hold one-on-one conferences to discuss work in
progress. The best dates and times we will agree upon in class.
A good fraction of our communication will take place via email.
Feel free to email me at any time with any questions, comments,
requests for help, etc. There is one exception, though: Any ques-
tions or comments about end-of-semester grades must be submitted in
hard copy.
textbook
Nicholas J. Higham, Handbook of Writing for the Mathematical Sci-
ences, Second Edition. Philadelphia, SIAM, 1998.
description
As a professional physicist you will be expected to communicate
with four kinds of audiences, each of which has a direct bearing on
your livelihood: professionals—including you—who work in your
field, professionals who work in other fields of physics or science,
students of physics or other disciplines, and the general public—
i.e., the taxpayer. Most of this communication will be in writing,
which in physics includes not just prose but also mathematics and
displayed material. In this course you will acquaint yourself with
the many different elements that contribute to successful physics
writing and will put them to work in different contexts.
objectives
1. Articulate concepts, methods, and results of theoretical or
experimental physics to other physicists, other scientists,
students, and laypeople.
2. Be confident in the use of LaTeX and other public-domain
productivity tools for scientists.
3. Appreciate the amount of work that goes into correct, clear
writing.


4. Show proper respect to your readers, making sure not to


write above their heads nor “write down at them.”
5. Find the right combination of prose, mathematics, tables, and
graphics that will help you make your points most clearly
and economically.
6. Learn to deal with the limitations of a given medium. By the
same token, learn to appreciate (and not abuse) the marvels
that technology affords you nowadays.
7. Practice proper attribution when making use of other peo-
ple’s work.
8. Collaborate with your classmates in the development of
written materials.
9. Have a working knowledge of the peer-review system of
publication.

evaluation
The grade for this course will be based on five writing projects
assigned in the following order:
1. A journal paper
2. A grant proposal
3. A textbook
4. A science newspaper
5. A final project
In due time I will provide more details about each of the projects
and propose a few different topics from which to choose. If
you would rather write about something else you must tell me
promptly.
You will hand in two drafts of each project. The first one must
be submitted in “draft” form and will receive extensive feedback.
The second draft will be considered final in terms of content and
form and will be assessed as such.
You will have roughly three weeks to complete each project. The
first week you can devote to experimenting with the physics and
the technology, the second to producing the first draft, and the
third to producing the final draft.
At least the first two projects will be peer-reviewed: everybody
will (anonymously) evaluate two papers and have theirs evaluated
by two classmates. The evaluations will include a suggested grade,
usually an integer from 7 to 10, and will themselves be graded.
For the third and fourth projects you will have to collaborate with
your classmates.
The fifth project will be freestyle and may involve media other
than paper.
Part I

SEEING THE LIGHT


1 MORGAN-ELISE CERVO: WHY IS THE SKY BLUE?
1.1 introduction

Have you ever looked up at the sky and been amazed by its brilliant
shade of blue? Or wondered why the sun changes color at sunset? In this
chapter we will provide an answer to these questions through the study
of electromagnetic waves. We will uncover why the sky is blue and not
white. We will also investigate sunsets and learn why sunsets in some
locations are more beautiful than others.

1.2 waves

To understand what is happening up in the sky we first need to review


the general properties of waves. A wave is a disturbance of a continuous
medium that propagates with a fixed shape at a constant velocity [36].
A familiar wave equation is that of a sinusoidal wave. This wave can be
represented by the equation,

f (z, t) = A cos[k(z − vt) + δ]. (1.1)

In the above equation the variable A represents the amplitude. The


argument of cosine represents the phase of the wave and δ represents
a phase constant. We are familiar with the idea that the wavenumber
k = 2π/λ. If we know the velocity v of the wave then we can find the
period, T (the amount of time it takes for the wave to complete a cycle),
using the equation

T = 2π/(kv). (1.2)
Another useful property is the frequency of the wave, which represents
the number of oscillations that occur per unit time. The frequency is

f = 1/T = kv/2π = v/λ. (1.3)
The frequency can also be expressed in terms of the angular frequency,
which is given by

ω = 2π f = kv. (1.4)

Now that we understand the different properties of waves it is easy to


rewrite the equation for a sinusoidal wave as

f (z, t) = A cos(kz − ωt + δ). (1.5)
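As a quick numerical sketch of Eqs. (1.2)–(1.4), the short script below (with arbitrary example values for k and v, not taken from the text) computes the wavelength, period, frequency, and angular frequency of such a wave:

```python
import math

k = 2.0   # wavenumber in rad/m (assumed example value)
v = 3.0   # wave speed in m/s (assumed example value)

wavelength = 2 * math.pi / k        # lambda = 2*pi/k
period = 2 * math.pi / (k * v)      # T = 2*pi/(k*v), Eq. (1.2)
frequency = 1 / period              # f = 1/T = k*v/(2*pi) = v/lambda, Eq. (1.3)
omega = 2 * math.pi * frequency     # omega = 2*pi*f = k*v, Eq. (1.4)

print(f"lambda = {wavelength:.3f} m, T = {period:.3f} s, "
      f"f = {frequency:.3f} Hz, omega = {omega:.3f} rad/s")
assert abs(omega - k * v) < 1e-12   # consistency check: omega = k*v
```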

1.3 electromagnetic waves

Electromagnetic waves are formed when an electric field is combined


with a magnetic field. The electric and magnetic fields lie orthogonal


Figure 1.1: This chart provides wavelengths for the different types of electro-
magnetic waves [9].

to the direction of motion of the electromagnetic wave. Electromagnetic waves are


categorized by their wavelength. In Figure 1.1 we have a chart which
gives the wavelengths of familiar electromagnetic waves such as x-rays
and microwaves.
In order to understand how the equation for electromagnetic waves is
derived we will first review Maxwell's equations. Gauss's law relates the
electric field to the distribution of electric charge through the equation

∇ · E = ρ/ε0, (1.6)

where ρ represents the charge density and ε0 is the electric constant.
Gauss's law, in other words, is a mathematical statement of how charges
produce electric fields. A companion equation, Gauss's law for magnetism,
is

∇ · B = 0. (1.7)

The above equation states that the divergence of a magnetic field is zero;
in other words, there are no magnetic monopoles. The third
Maxwell equation is known as Faraday’s induction law and it forms the
basis of electrical inductors and transformers [62]. The equation reads,

∇ × E = −∂B/∂t. (1.8)

In other words, the line integral of the electric field around a closed
loop is equal to the negative of the rate of change of the magnetic flux
through the loop [8]. The final Maxwell equation is Ampère's law with
Maxwell's correction. Ampère's law states that the line integral of the
magnetic field around a closed loop is proportional to the current flowing
through the loop; Maxwell's correction adds a term for time-varying
electric fields. In mathematical terms,

∇ × B = µ0 J + µ0 ε0 ∂E/∂t. (1.9)
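Although the text does not spell it out, the standard route from these equations to the wave solutions quoted below is worth sketching. In vacuum (ρ = 0, J = 0), taking the curl of Faraday's law (1.8) and substituting (1.9) gives

∇ × (∇ × E) = −∂/∂t (∇ × B) = −µ0 ε0 ∂²E/∂t²,

and since ∇ × (∇ × E) = ∇(∇ · E) − ∇²E = −∇²E when ρ = 0, we obtain the wave equation

∇²E = µ0 ε0 ∂²E/∂t²,

whose propagation speed is c = 1/√(µ0 ε0). The plane waves (1.10) and (1.11) below are solutions of this equation provided ω = c|k|.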

Figure 1.2: Notice how, moving up from the horizon, the color grades from the
longest visible wavelength to the shortest.

Now we have the necessary tools to write an equation for an electro-


magnetic wave. The equations to describe an electromagnetic wave in a
vacuum are as follows:
E(r, t) = E0 cos(k · r − ωt + δ)n̂. (1.10)

B(r, t) = (1/c) E0 cos(k · r − ωt + δ)(k̂ × n̂), (1.11)

where k is the propagation vector, n̂ is the polarization, and ω is the
angular frequency.

1.4 radiation

Now that we know electromagnetic waves exist, you might ask: where do
electromagnetic waves come from? The answer is radiation. When a charge
accelerates, and currents therefore change, an electromagnetic wave is
produced.
Let us consider the case of a charge driven back and forth across a wire
connecting two metal spheres. The charge as a function of
time can be written as
q(t) = q0 cos(ωt). (1.12)
The electric dipole moment produced by the oscillating charge can
then be written as
p(t) = q0 d cos(ωt)ẑ, (1.13)
where q0 d = p0 is the maximum value of the dipole moment [36]. The
retarded potential of an electromagnetic wave describes the potential for
an electromagnetic field of a time-varying current or charge distribution.
The retarded potential of the dipole system, for a wave traveling through
a vacuum at the speed of light, can be derived using the equation,
V(r, t) = (1/4πε0) ∫ [ρ(r′, tr)/γ] dτ′, (1.14)

where tr represents the retarded time and γ is the distance from the source
point to the field point. Using spherical coordinates the equation for the
retarded potential can be rewritten as

V(r, Θ, t) = −(p0 ω/4πε0 c)(cos Θ/r) sin[ω(t − r/c)]. (1.15)

Figure 1.3: Problem geometry: the charges +q and −q at the ends of the wire, a
line element dz, and the distance r to the field point.

To find the vector potential A, we use the equation for the current in
the wire (the derivative of the charge with respect to time, directed
along ẑ),

I (t) = −q0 ω sin(ωt)ẑ. (1.16)

Making reference to Figure 1.3 we derive

A(r, t) = (µ0/4π) ∫_{−d/2}^{d/2} {−q0 ω sin[ω(t − γ/c)]/γ} dz ẑ. (1.17)

Since the wire is short compared with the distance r, we can replace the
integrand by its value at the center of the wire, leading to

A(r, Θ, t) = −(µ0 p0 ω/4πr) sin[ω(t − r/c)] ẑ. (1.18)
The electric field can then be derived by plugging the values we have
obtained into the equation

E = −∇V − ∂A/∂t. (1.19)

Using the values we obtained for the scalar and vector potentials we find

E = −(µ0 p0 ω²/4π)(sin Θ/r) cos[ω(t − r/c)] Θ̂. (1.20)
To find the magnetic field we use B = ∇ × A.

B = −(µ0 p0 ω²/4πc)(sin Θ/r) cos[ω(t − r/c)] Φ̂. (1.21)
So far we have described a wave moving radially outward at a frequency
of ω. We have also defined the electric and magnetic fields, which are
orthogonal to each other and in the same phase. We now find the
intensity of the wave. The energy radiated by an oscillating dipole can
be found from the Poynting vector [36],

⟨S⟩ = (1/µ0)⟨E × B⟩. (1.22)

When we use our values for the electric and magnetic fields and average
over a full cycle we find

⟨S⟩ = (µ0 p0² ω⁴/32π²c)(sin²Θ/r²) r̂. (1.23)

Figure 1.4: Plot of three color cones that shows which wavelengths they best
receive [9]

A visual interpretation of the intensity shows that the function takes


the shape of a donut with the hole along the axis of the dipole. In other
words there is no radiation along this axis. If we integrate hSi over a
sphere of radius r we can find the total power radiated.

⟨P⟩ = ∫ (µ0 p0² ω⁴/32π²c)(sin²Θ/r²) r² sinΘ dΘ dφ = µ0 p0² ω⁴/(12πc). (1.24)
Notice that the power of radiation is not dependent on the radius, and
therefore size, of the sphere. However, the power is highly dependent
on the frequency, ω.
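A quick numerical sketch of Eq. (1.24) is given below; the dipole moment and angular frequency are assumed, roughly atomic-scale, example values rather than numbers from the text. The point of the last two lines is the ω⁴ dependence emphasized above:

```python
import math

mu0 = 4e-7 * math.pi   # vacuum permeability, T*m/A
c = 2.998e8            # speed of light, m/s

p0 = 1.0e-29           # dipole moment amplitude, C*m (assumed example value)
omega = 3.0e15         # angular frequency, rad/s (assumed, visible-light scale)

# <P> = mu0 * p0^2 * omega^4 / (12*pi*c), Eq. (1.24)
P_avg = mu0 * p0**2 * omega**4 / (12 * math.pi * c)
print(f"<P> = {P_avg:.3e} W")

# Doubling the frequency multiplies the radiated power by 2^4 = 16.
ratio = (mu0 * p0**2 * (2 * omega)**4 / (12 * math.pi * c)) / P_avg
print(f"power ratio when omega is doubled: {ratio:.1f}")
```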

1.5 blueness of the sky

If we treat the scattering of the sun's light by particles in the atmosphere
as dipole radiation of the kind described above, then the strong dependence
of the power equation on frequency explains the blueness of the sky. We
know already that white light is composed of several wavelengths; each
wavelength represents a different color. Shorter wavelengths, that is,
light waves of higher frequency, are radiated far more strongly than longer
ones, since the power grows as ω⁴. Blue has a shorter wavelength than red,
for example, and consequently the sky appears blue to us.
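Since the scattered power grows as ω⁴, i.e. as 1/λ⁴, a two-line estimate (with round, assumed wavelengths for blue and red light) shows how lopsided the scattering is:

```python
blue = 450e-9   # m, representative blue wavelength (assumed round number)
red = 650e-9    # m, representative red wavelength (assumed round number)

ratio = (red / blue) ** 4   # scattering scales as 1/lambda^4
print(f"Blue light is scattered roughly {ratio:.1f} times more strongly than red.")
```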

example 1.1: why isn’t the sky violet?


We now know that light of greater frequency is radiated most strongly. If we
look at Figure 1.1 we would expect the sky to be violet instead of blue,
because violet has the shortest visible wavelength. The reason the sky looks
blue and not purple is that our eyes see some colors more easily than
others. Our eyes have three types of color receptors, or cones, called blue,
green, and red after the color to which each receptor responds most strongly.
The red receptor best sees red, orange, and yellow light; the green receptor
best sees green and yellow light; and the blue receptor, blue light. Even
though the sky appears blue, we know that there is a strong presence of
violet light because we see blue with red tints. If the violet light were
absent, the sky would appear blue with a green tint.

So why does the sun turn red when it is setting? This is especially
puzzling because we just explained that red light, with the lowest
frequency, is the least strongly scattered. When the sun is about to pass
below the horizon, its light has to travel a greater distance through the
atmosphere to reach the observer. Over this longer path the particles in
the earth's atmosphere scatter most of the shorter wavelengths out of the
line of sight, leaving mainly the longer red wavelengths. For this reason,
the sky is red when the sun is setting. Similarly, sunsets may be more
brilliant in coastal or very dusty areas because there are more particles
in the air to scatter light.

1.6 problems

1. Using the equation for combining waves, A3 e^{iδ3} = A1 e^{iδ1} + A2 e^{iδ2},
determine A3 and δ3 in terms of A1, A2, δ1, and δ2 [36].
Solution: If we use the fact that
e^{iΘ} = cos Θ + i sin Θ,
we get
A3 (cos δ3 + i sin δ3) = A2 (cos δ2 + i sin δ2) + A1 (cos δ1 + i sin δ1).
Separating the real and imaginary parts gives two equations,
A3 cos δ3 = A1 cos δ1 + A2 cos δ2,
A3 sin δ3 = A1 sin δ1 + A2 sin δ2.
Dividing the second equation by the first and taking the inverse tangent,
δ3 = tan⁻¹[(A1 sin δ1 + A2 sin δ2)/(A1 cos δ1 + A2 cos δ2)].
A3 is found by squaring and adding the two previous equations to get
A3 = [A1² + A2² + 2 A1 A2 cos(δ1 − δ2)]^{1/2}.
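The closed-form answer can be spot-checked numerically by adding the two waves as complex phasors; the amplitudes and phases below are arbitrary test values:

```python
import cmath
import math

A1, d1 = 2.0, 0.3   # arbitrary test amplitude and phase
A2, d2 = 1.5, 1.1

z = A1 * cmath.exp(1j * d1) + A2 * cmath.exp(1j * d2)   # A3 * e^{i*delta3}
A3_direct, d3_direct = abs(z), cmath.phase(z)

A3_formula = math.sqrt(A1**2 + A2**2 + 2 * A1 * A2 * math.cos(d1 - d2))
d3_formula = math.atan2(A1 * math.sin(d1) + A2 * math.sin(d2),
                        A1 * math.cos(d1) + A2 * math.cos(d2))

assert abs(A3_direct - A3_formula) < 1e-12
assert abs(d3_direct - d3_formula) < 1e-12
print(f"A3 = {A3_formula:.4f}, delta3 = {d3_formula:.4f} rad")
```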


2. Check that the retarded potentials of an oscillating dipole ((1.15) and
(1.18)) satisfy the Lorentz gauge condition.
Solution: The Lorentz gauge condition is ∇ · A = −µ0 ε0 ∂V/∂t, where A is the
vector potential of the oscillating dipole,
A = −(µ0 p0 ω/4πr) sin[ω(t − r/c)] ẑ.
Writing ẑ = cos Θ r̂ − sin Θ Θ̂ and taking the divergence in spherical
coordinates,
∇ · A = (1/r²) ∂(r² A_r)/∂r + [1/(r sin Θ)] ∂(sin Θ A_Θ)/∂Θ + [1/(r sin Θ)] ∂A_φ/∂φ
      = (µ0 p0 ω/4π)(cos Θ/r²) {sin[ω(t − r/c)] + (ωr/c) cos[ω(t − r/c)]}
      = µ0 ε0 {(p0 ω/4πε0 r²) [sin[ω(t − r/c)] + (ωr/c) cos[ω(t − r/c)]] cos Θ}.
For the scalar potential we use the full dipole potential, of which (1.15) is
the large-r limit,
V(r, Θ, t) = (p0 cos Θ/4πε0 r) {−(ω/c) sin[ω(t − r/c)] + (1/r) cos[ω(t − r/c)]},
so that
∂V/∂t = (p0 cos Θ/4πε0 r) {−(ω²/c) cos[ω(t − r/c)] − (ω/r) sin[ω(t − r/c)]}
      = −(p0 ω/4πε0 r²) {sin[ω(t − r/c)] + (ωr/c) cos[ω(t − r/c)]} cos Θ.
Plugging these values into the Lorentz gauge condition we indeed find
∇ · A = −µ0 ε0 ∂V/∂t.
2 MATTHEW MIRIGIAN: THE PHYSICS OF RAINBOWS
2.1 introduction

Rainbows are one of nature's most beautiful sights. Some might argue that
the physics of the rainbow, with its neat and compact packaging of optical
phenomena, is equally beautiful. The rainbow is formed by a combination of
physical properties of light, namely refraction, reflection, and dispersion,
that occur in a drop of rain. When conditions are right a bright inner bow
can be observed as well as a fainter outer bow. The inner, primary bow is
red on the outside and violet on the inside, whereas the outer, secondary
bow is red on the inside and violet on the outside. In this chapter we will
explore the physics responsible for producing the rainbow.

2.2 the primary bow

To understand how rainbows form we start by analyzing the geometri-


cal optics involved. At its most fundamental level, a rainbow is formed
by the refraction and reflection of sunlight in spherical raindrops. To
understand how the shape of the rainbow comes about it is useful
to think of the light as monochromatic, made up of one wavelength.
Secondly, it should be understood that the sunlight strikes raindrops as
rays that are parallel to one another.
Figure 2.1 shows a ray incident on the surface of a drop. The ray is
parallel to an axis that, if extended backward, would pass through the
sun. The ability of a medium, water in this case, to bend light is called
the refractive index, designated n. It is the ratio of the speed of light c in
a vacuum to the speed of light v in a medium.

c
n= (2.1)
v
Snell’s Law summarizes an important experimental observation relat-
ing the refractive index of media to the angle that light is refracted. It
says that if the incident angle i and the refracted angle r are measured
with respect to the normal, the ratio of the sines of the angles is equal
to the inverse ratio of the corresponding indexes of refraction. [60].

sin i / sin r = n2/n1. (2.2)

In the drop of rain the light rays that form the primary bow are twice
refracted and once reflected. The angle D1 between the incident light
ray and the ray that enters the eye to produce the image of a rainbow is
determined by the incident and reflected angles, i and r. More precisely
it is given by

D1 = 180◦ − 2(2r − i ). (2.3)


Figure 2.1: This is the path of a ray producing the primary bow. [93]

Figure 2.2: This is the path of a ray producing the secondary bow. [93]

It is clear that D1 is determined by the incident angle i. If the values
of D1 are plotted for varying i we see that D1 reaches a minimum
of approximately 138°. The supplementary angle, 42°, corresponds to the
angle above the horizon at which the peak of the primary rainbow is seen,
and accounts for the circular shape of the bow, shown in
Figure 2.3. This means that as the sun rises in elevation a smaller and
smaller portion of the bow is visible to the observer. It also means
that an increase in the elevation of the observer would allow a larger
part of the bow to be visible; it is even possible for a complete
circular rainbow to be observed [70]. Figure 2.3 demonstrates where
the rainbow is visible to the observer. It is also important to note that
the sun can therefore not be more than 42° above the horizon if the primary
bow is to be seen by an observer on the ground.
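The minimum of D1 claimed above is easy to verify numerically. The sketch below (assuming n = 1.333 for water) scans the incidence angle, applies Snell's law, and evaluates Eq. (2.3):

```python
import math

n = 1.333   # index of refraction of water (assumed average over visible light)

def D1(i_deg):
    """Deviation angle of Eq. (2.3) for incidence angle i, in degrees."""
    r = math.degrees(math.asin(math.sin(math.radians(i_deg)) / n))
    return 180.0 - 2.0 * (2.0 * r - i_deg)

angles = [0.01 * k for k in range(1, 9000)]   # 0.01 to 89.99 degrees
i_min = min(angles, key=D1)
print(f"D1 is minimized at i = {i_min:.1f} deg, where D1 = {D1(i_min):.1f} deg; "
      f"the supplementary angle is {180 - D1(i_min):.1f} deg")
```

Running it reproduces the roughly 138° minimum and the 42° angle quoted above.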

2.3 the secondary bow

The secondary bow, which can be visible above the primary bow, is
formed by two internal reflections in the drop of rain shown in figure 2.2.
Using the same treatment as for the primary bow we can see the angle
of deviation, D2 to be

D2 = 2(180◦) − 2(3r − i) = 360◦ − 2(3r − i). (2.4)

If we again plot D2 for values of i we see that a minimum occurs
where D2 = 231°. The corresponding angle of 51° above the horizon is where
the observer sees the peak of the secondary rainbow. Notice in Figure 2.3
that the ray is deviated in the opposite direction from that of the primary
bow. This explains the reversed color distribution compared to the primary
bow.
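The same numerical scan, adapted to the two-reflection deviation of Eq. (2.4), reproduces the 231° minimum and the 51° viewing angle (again assuming n = 1.333):

```python
import math

n = 1.333   # assumed index of refraction of water

def D2(i_deg):
    """Deviation angle of Eq. (2.4) for incidence angle i, in degrees."""
    r = math.degrees(math.asin(math.sin(math.radians(i_deg)) / n))
    return 360.0 - 2.0 * (3.0 * r - i_deg)

angles = [0.01 * k for k in range(1, 9000)]
i_min = min(angles, key=D2)
print(f"D2 is minimized at i = {i_min:.1f} deg, where D2 = {D2(i_min):.1f} deg "
      f"(D2 - 180 = {D2(i_min) - 180:.0f} deg above the horizon)")
```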

2.4 dispersion

What we neglected by treating the bow as formed from monochromatic light
we will now address in a discussion of dispersion, which is central to the
dramatic separation of the visible wavelengths that compose white light.
The previous sections merely explain how sunlight shining from behind the
viewer is refracted and internally reflected in raindrops so as to produce
a bow shape. Dispersion explains why the viewer sees the light separated
into the full visible spectrum.
Dispersion is simply the phenomenon in which the phase velocity of a wave
in a medium depends on the wave's frequency. A simplified consequence of
this effect is that the amount by which light is refracted depends on the
frequency of the light.
A light ray that passes through a vacuum will arrive at some detector,
like our eyes, unchanged with constant velocity. A correct assumption
would be that each photon that is detected originated from the light
source. However, this is not true for light that passes through any
medium, such as air, glass, or water. The light is transmitted through
the medium when the incoming photons are absorbed by the atoms of
the medium and then immediately reemitted by each of the atoms in the
ray’s path. The principle that light propagates by successive emission
of wavelets from particles in the beam’s path is known as Huygens’
Principle and is seemingly in opposition to Newton’s corpuscular theory
of light. This mechanism of transmission through media results in
the slowing in phase velocity, thus leading to refraction as it passes
between media of differing refractive properties, like from vacuum into
a medium such as glass. We saw a consequence of this earlier, in our
discussion of Snell’s Law.
Newton observed the spectrum of visible light by using a prism. He
determined that the index of refraction is related to the wavelength
of light, which is what we call dispersion. He observed “that the rays which differ
in refrangibility will have different limits of their angles of emergence,
and by consequence according to their different degrees of refrangibility
emerge most copiously in different angles, and being separated from
one another appear each in their proper colours. [93]" This is the
mechanism by which white light is split into the colors seen in rainbows.
Dispersion over a wider range of wavelengths, beyond the visible, displays
the consequences of complex interactions at the atomic level, such as
low-energy absorption into lattice vibrations and high-energy absorption
due to electronic excitations. However, as we consider the rainbow, it is
enough to know that in the region of visible wavelengths the index of
refraction of water increases with increasing frequency.
As sunlight enters a raindrop the violet end of the spectrum is refracted
the most and red light the least, so the full spectrum of visible
wavelengths is spread out and observed. We see that the refractive index
should be expressed as some function of the frequency of light, n(f).
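To see dispersion act on the bow itself, one can repeat the primary-bow calculation of Section 2.2 with slightly different refractive indices at the two ends of the visible spectrum. The values n ≈ 1.331 (red) and n ≈ 1.343 (violet) used below are rough, assumed figures for water:

```python
import math

def rainbow_angle(n):
    """Elevation of the primary bow (degrees) for refractive index n."""
    def D1(i_deg):
        r = math.degrees(math.asin(math.sin(math.radians(i_deg)) / n))
        return 180.0 - 2.0 * (2.0 * r - i_deg)   # Eq. (2.3)
    return 180.0 - min(D1(0.01 * k) for k in range(1, 9000))

n_red, n_violet = 1.331, 1.343   # rough values for ~700 nm and ~400 nm light
print(f"red bow at about {rainbow_angle(n_red):.1f} deg, "
      f"violet bow at about {rainbow_angle(n_violet):.1f} deg")
```

The couple of degrees of separation between the two is essentially the width of the colored band we see.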

Figure 2.3: The paths of light forming the primary and secondary bow. [89]

example 2.1: an example


Light traveling through air (n = 1) is incident on the surface of water (n = 1.33)
at an angle of 30°. To determine the direction of the refracted ray we can use
Snell's law, equation 2.2:

sin(r)/sin(30°) = 1/1.33,

r ≈ 22.1°.

2.5 problems

1. A red laser produces light with wavelength 630 nm in air and


475 nm in water. Determine the index of refraction water and the
speed of the light in water.
Solution: We know that in any material v = λ f. In vacuum this is c = λ0 f.
The frequency is constant in all materials and the wavelength changes in
correspondence with the change in velocity. So we can say f = c/λ0 = v/λw.
We can combine this with equation 2.1 and see that

λw = λ0/n,   n = λ0/λw = (630 nm)/(475 nm) = 1.33.

Then from n = c/v we can determine the velocity of the light in water:

v = c/n = (3.00 × 10⁸ m/s)/1.33 = 2.26 × 10⁸ m/s.

2. A secondary bow is fainter than the primary bow of a rainbow because


A. It is a partial reflection of the primary bow.
B. Only larger drops produce secondary bows.
C. The sunlight reaching the raindrops is less intense.
D. There is an extra partial reflection in the raindrops.
Answer: D
The secondary bow is produced by two internal reflections inside a
raindrop shown in figure 2.2.
3 MATTHEW DRAKE: THE CAMERA AND HOW IT WORKS
3.1 introduction

To examine the camera, we must first look back to earlier times. The
earliest precursor of the modern camera was the “camera obscura,” which
consisted of a large dark room with a pin-hole on one side. An image would
be formed on the wall opposite the pin-hole as shown in Fig. 3.1.
The camera obscura was first introduced as a model for the eye by
Abu Ali Al-Hasen ibn al-Hasan ibn Al-Haytham, or Alhazen for short.
Alhazen investigated the camera obscura by placing three candles
outside of the camera obscura and systematically obstructing each
candle and observing the effects on the produced image. Alhazen
observed that the image produced was reflected about the pin-hole
point, as seen in Fig. 3.1. To explain why the human eye sees images
right-side up, Alhazen interpreted these findings to mean that the
image was sensed on the outside of the eye, despite having the nerve
and retina structure in his model [43].
The camera obscura was later improved by Girolamo Cardano when
he suggested putting a convex lens in the pin-hole to help focus the im-
age. By inserting a convex lens of proper focal length, the image would
become far more clear. In his original description however, Cardano
failed to explain this mathematically [44].

Take a piece of paper, and place it opposite the lens as


much removed [from the lens], that you minutely see on
that paper all that is outside the house. This happens most
distinctly at a certain point, which you find by approaching
or withdrawing with respect to the lens, until you find the
convenient place.

Figure 3.1: The inverted image is projected on to the opposite wall from the
light coming in through the pin-hole. Image courtesy of [43]


Principles from the camera obscura are still used in modern cameras
which we will explore in detail later in this chapter. The camera obscura,
along with the lenses will be explained in further sections.

3.2 lenses

To begin our discussion of the camera, we must first understand the


role of the lens. Lenses come in two forms, converging and diverging.
In the simplest description, a converging lens is shaped like
the outside of a sphere, while a diverging lens is shaped like the inside
of a sphere. To understand a lens’ application to the camera, we only
need to understand the thin lens approximation as opposed to more
detailed equations. In general, the thin lens approximation is valid
when the thickness of the lens is small compared to the object and
image distances. The thin lens equation states that

1/f = 1/s + 1/s′, (3.1)

where f is the focal length, s is the distance of the object to the lens,
and s′ is the distance of the image that is created by the lens, where
all distances are measured along the optical axis. If the image has a
positive distance, then it is a real image. If the image is a negative
distance, then it is a virtual image. Two important properties emerge
from the form of Eq. 3.1.

1. If the object is at the same distance as the focal length, then for
Eq. 3.1 to hold, 1/s′ must go to zero and thus the image forms at
|∞|.
2. If the object is very far away (approaching |∞|), then for Eq. 3.1
to hold, the image forms at the focal length of the lens.

I specify that the distances are at absolute value of ∞ because they can
be towards the positive or negative end of the optical axis. Which end
of the optical axis an image is located on is dependent on whether the
lens is converging or diverging. By definition, if the lens is a converging
lens, than the focal length is positive. If the lens is a diverging lens, than
the focal length is negative. Figure 3.2 shows a typical image formation
from a converging lens. The optical axis has the origin located at the
center of the lens with the left side of the lens typically being the
negative direction.
Along with the placement of the image, we can also examine the size
of the image that is formed. Because the lens bends the light rays, the
size of the image will be different from the size of the object, except in a
special case. The magnification of the image comes from the equation

m = −s′/s. (3.2)
The magnification tells us the ratio of the size of the image to the size of
the object. A negative sign in the magnification tells us that the image
is inverted, as is the case in the camera obscura.

Figure 3.2: The rays from point Q are refracted by the thin lens to bend toward
the focal point. Because the lens is thin, the ray is considered to bend at the
middle of the lens. Image courtesy of [99]

example 3.1: forming an image


Suppose there is an object 1 m away from a converging lens. If the focal
length of the lens we use is 20 cm, where will the image be formed? What
will the magnification of the image be? What does the placement of the image
and the magnification tell us about the image? To do this, we apply Eq. 3.1
and solve for s′. Then we use Eq. 3.2 to find the magnification.

(1/100) cm⁻¹ + 1/s′ = (1/20) cm⁻¹, (3.3)

1/s′ = (5/100) cm⁻¹ − (1/100) cm⁻¹ = (4/100) cm⁻¹, (3.4)

s′ = 25 cm, (3.5)

m = −(25 cm)/(100 cm) = −0.25. (3.6)

The negative magnification tells us that the image is inverted and one fourth
of its original size. Since the image distance is positive, we also know that it
is a real image.
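A small helper function, sketched below with the numbers of Example 3.1, packages the thin-lens equation (3.1) and the magnification (3.2):

```python
def thin_lens(f_cm, s_cm):
    """Return the image distance and magnification for a thin lens."""
    s_image = 1.0 / (1.0 / f_cm - 1.0 / s_cm)   # from 1/f = 1/s + 1/s'
    m = -s_image / s_cm                          # magnification, Eq. (3.2)
    return s_image, m

s_image, m = thin_lens(f_cm=20.0, s_cm=100.0)
print(f"s' = {s_image:.1f} cm, m = {m:.2f}")    # expect s' = 25 cm, m = -0.25
```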

3.3 the camera itself

Now that we have an understanding of lenses, we may apply that


knowledge to understand how the camera works. The basic method
that the camera uses is to form a real, inverted image onto a small
screen using a converging lens. In general, only a real image may be
projected onto a screen, so a converging lens is the natural choice for
this application. The camera is made up of a fully enclosed box, a
converging lens, a shutter to open the lens for a small period of time,
and a recording medium. Figure 3.3 shows how an object forms on a
camera screen.
The screen of the camera is typically some photosensitive material or
an electronic detector, depending on if the camera is a digital camera
or not. When the picture is taken, the shutter opens to allow the light
in for a short period of time and the image is first recorded inverted.

Figure 3.3: The image of the key is real, inverted, and smaller than the original
object when it is seen through the camera lens. Image courtesy of [99]

The image will have to be inverted again in order to be viewable in the


correct orientation. On a digital camera, there is an electronic process
which will do this. On a film-style camera, this is done when printing
the pictures through a screen process.
Most objects that one would want to photograph would be at a
distance considerably larger than the distance of the lens to the screen,
and so the focal length of the lens must be accordingly small. To see this,
you can work out the Forming the Image example again, but instead
solve for the focal length with a small image distance. As the object
distance goes towards infinity, you will see that the image distance
becomes equal to the focal length.
For an image to be properly recorded on to the medium, the correct
amount of light intensity needs to reach the screen. If the screen gathers
too much light, the image will look white from being too bright. If the
screen gathers too little light, the image will be too dim to recognize.
The amount of light gathered is controlled by the time the shutter is
open, and also by a property known as the f-number. The f-number
is dependent on the focal length of the lens and on the diameter of
the aperture as controlled by the diaphragm. The aperture is a circular
area where the lens is placed and the diaphragm is an adjustable piece
which controls the size of the aperture. The f-number is defined as

f-number = f/D, (3.7)
where f is the focal length and D is the diameter of the aperture. The
light intensity that may reach the film is proportional to the square of
the inverse f-number, the time that the shutter is open, and also the
brightness of the actual object itself.
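The exposure bookkeeping described above can be made concrete with a short sketch; the focal length, aperture diameters, and shutter times are assumed example values, and the "exposure" computed is only a relative number:

```python
def relative_exposure(focal_length_mm, aperture_mm, shutter_s):
    """Relative light gathered: proportional to shutter time / f-number^2."""
    f_number = focal_length_mm / aperture_mm      # Eq. (3.7)
    return shutter_s / f_number**2, f_number

exp_a, n_a = relative_exposure(50.0, 12.5, 1 / 250)   # f/4 at 1/250 s
exp_b, n_b = relative_exposure(50.0, 8.9, 1 / 125)    # about f/5.6 at 1/125 s
print(f"f/{n_a:.1f} at 1/250 s -> {exp_a:.2e}; f/{n_b:.1f} at 1/125 s -> {exp_b:.2e}")
# The two settings gather nearly the same light: a one-stop smaller aperture
# is compensated by doubling the shutter time.
```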
Nearly all cameras contain options to zoom in or out on objects. One
method of doing this is to arrange two converging lenses as shown in
Fig. 3.4
The primary lens is a converging lens and brings the light rays
together near its focal point, forming an inverted image. The image is then seen
through the secondary lens, another converging lens. Because the image
is inverted a second time, it now has the correct orientation and can
be magnified. To obtain the magnification of the zoom lens, you must

Figure 3.4: The secondary lens projects the image of the primary lens to make a
magnified image. Image courtesy of [2]

simply combine the two magnifications of the lenses by multiplying


them.

mtotal = m1 m2 (3.8)

This type of zoom lens is one of the most simple lens and is often
referred to as the “astronomer’s lens.”
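The chaining of magnifications in Eq. (3.8) can be traced explicitly by sending an image through both lenses with the thin-lens equation. All of the distances and focal lengths below are hypothetical example values, not a prescription for a real zoom lens:

```python
def image(f, s):
    """Image distance and magnification of a thin lens (distances in cm)."""
    s_prime = 1.0 / (1.0 / f - 1.0 / s)
    return s_prime, -s_prime / s

f1, f2, separation = 30.0, 20.0, 160.0   # hypothetical lenses and spacing, cm
s1 = 40.0                                # object 40 cm from the primary lens

s1p, m1 = image(f1, s1)                  # image formed by the primary lens
s2 = separation - s1p                    # that image is the secondary's object
s2p, m2 = image(f2, s2)
print(f"m1 = {m1:.2f}, m2 = {m2:.2f}, m_total = {m1 * m2:.2f}")  # upright, magnified
```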

3.4 questions

1. An image produced by a converging lens of an object at a distance


greater than twice the focal length is
• a) real
• b) inverted
• c) smaller
• d) all of the above
2. If the focal length of the lens on a camera is 1cm, how far should
the screen be from the lens to take a picture of an object 10 meters
away? Explain why camera manufacturers would place the screen
a distance equal to the focal length.
3. Consider two converging lenses in an “astronomer’s lens” setup
with a primary focal length of 30cm and at a separation of 50cm
apart. What focal length should the secondary lens have if you
desire a magnification of 2 for an object at a distance of 20 m? You
may use the approximation that for s ≫ f, s′ ≈ f.

3.5 solutions

1. Answer: d. If you work out the magnification, it will be negative


and have absolute value less than 1. The image distance will be
positive
2. Answer: The image distance would be s′ = 1000/999 cm ≈ 1.001 cm. This is
very nearly equal to the focal length, and since most pictures will have an
object distance much greater than the focal length, the screen is placed a
focal length away from the lens.
3. Answer: You must use Eq. 3.8, express m1 and m2 in terms of the image
and object distances, and then fit those to the conditions. Since f2 enters
only through s2′, solve to get s2′ = 2s1s2/f1. Continuing from there, you will
eventually end up with

f2 = [f1/(2s1s2) + 1/s2]⁻¹. (3.9)
4 SEBASTIAN FISCHETTI: HOLOGRAPHY: AN INTRODUCTION
4.1 introduction

The art of holography, although a relatively recent discovery in itself,
is based on the same principles of optical interference that you have
undoubtedly studied in your introductory physics courses. A detailed
discussion of holography would be too lengthy to include here; rather, we
will only provide an introduction to the mechanisms, production, and
applications of holography. The interested student is welcome to read more
detailed books on the subject [47].
Furthermore, because this section assumes a thorough understanding
of optical interference, we encourage you to review the topic if you feel
the need; see, for example, any standard introductory physics text [100]
or, for a more enjoyable read, the Feynman Lectures on Physics [32].

4.2 the geometric model

Let us begin with a brief review of the mechanism behind classical


photography. An object is photographed by using an optical setup to
project the three-dimensional object into a two-dimensional image; this
image, which consists almost always of incoherent light, strikes a plate
covered in a chemical emulsion, triggering a reaction that essentially
stores the “brightness” of the light striking it; in this sense, a standard
photograph only records information about the amplitude of the light
waves striking each section of it, and can only record a two-dimensional
image (we assume, for simplicity, that the light and photograph are
monochromatic, so we needn’t worry about how the light’s wavelength
is recorded).
How, then, are we able to store a third dimension in a two-dimensional
plate? The answer lies in exploiting the interference effects of coherent
light. If coherent light is emitted from two point sources, it will interfere
with itself to yield constructive and destructive interference fringes:
you have studied this phenomenon in your lab as Young’s double-slit
experiment. If we were to trace the locations of constructive interference
throughout space, we would obtain a set of surfaces called interference
fringes. By placing a photographic plate in the setup, we can record
the locations and directions of these surfaces (see Figure 4.1(a)). This
will effectively create a set of “partially reflecting surfaces” within the
plate, which, when illuminated with the same type of light used to
create the image, will reflect light in such a way as to reproduce the
original image (see Figure 4.1(b)).

example 4.1: the shape of interference fringes


Assume coherent light is being emitted from two point sources A and B, as
in Figure 4.1(a)). What shape will the resulting interference fringes be?

The interference fringes occur at points of total constructive interfer-


ence. Imagine we find a single point P in space where the beams from the



Figure 4.1: Figure (a) shows the hyperbolic interference fringes produced by
two point sources of coherent light, and how these fringes are recorded in a
photographic plate; Figure (b) shows how the interference fringes recorded
by the holographic plate create a virtual image when illuminated by coherent
light [47]. This is an example of a simple transmission hologram.

two sources interfere constructively; this means that the difference in the
path lengths is an integral number of wavelengths: AP − BP = nλ. The
interference fringe passing through P must therefore consist of the locus of
points such that the difference of the distance to A and B is a constant. This is
none other than the geometric definition of a hyperbola; thus the interference
fringes produced by two point sources of coherent light will be hyperbolic
surfaces. In particular, the total number of interference fringes is limited to
the number of wavelengths that can fit into the distance between A and B.
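The last remark of the example is easy to quantify: the fringe count is just how many wavelengths fit between the sources. The separation and wavelength below are assumed example values (633 nm is a typical HeNe laser line):

```python
separation = 5.0e-6    # distance AB in meters (assumed example value)
wavelength = 633e-9    # wavelength in meters (typical HeNe laser line)

n_fringes = int(separation // wavelength)   # whole wavelengths that fit in AB
print(f"At most {n_fringes} constructive-interference surfaces fit between A and B.")
```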

In order to take holograms of extended objects, we exploit this same


principle: we split a beam of coherent light, using one beam (the object
beam) to illuminate the object to be photographed, and using the
other one to illuminate the plate directly (the reference beam). The
interference between the two beams will be far more complex than in
the case of two point sources mentioned above, but can nonetheless be
recorded within a photographic plate. By illuminating the processed
plate from the same side as the reference beam, the interference patterns
within the plate will reproduce the image.
Note that as a result, holograms are redundant: every section of the
plate contains an image of the entire object. This can be conceptualized
by imagining the plate as a “window” through which we look to
see the holographic image: if all of the plate is covered except for a
small opening, we can still discern the entire image through the the
opening, indicating that the uncovered portion of the plate still contains
information about the entire image (this ceases to be true, however, as
we approach the length scale of the interference fringes themselves).
In this sense, holograms store far more information about an image
than ordinary photographs, where each part of the plate only contains
information about a small portion of the image.

example 4.2: diffraction


Based on your knowledge of the propagation of light, what important
phenomenon are we neglecting to take into account in the above geometrical
model?

We are neglecting to take into account the diffraction of light as it


passes through the various “slits” formed by the interference fringes within
the plate. For our purposes, the geometric model does give a good enough sense

Figure 4.2: Here is how interference fringes are recorded in a reflection holo-
gram; notice that the fringes are approximately parallel to the surface of the
plate [47].

of what’s happening, but there are cases where it fails, at which point we
need to take into account diffraction effects with the zone plate model,
which will be discussed further in Example 4.3.

4.3 types of holograms

There are two broad types of holograms: transmission and reflection


holograms. A transmission hologram is one in which the reference
beam and the object beam strike the plate from the same side, as in
the simple model shown in Figure 4.1(a). In this case, the plane of the
interference fringes recorded in the plate is more or less perpendicular
to the plane of the plate, as shown in Figure 4.1(a), which allows us
to explain the image as nothing but the reflection of light off of the
various “partially reflecting surfaces” produced by the interference
fringes. In fact, two images are produced simultaneously: a virtual
image is produced when the interference patterns within the plate cause
the reference beam to diverge, making the image appear behind the
plate; this is the image we generally view when looking at holograms,
and is the one illustrated in Figure 4.1(b). However, the interference
patterns can also cause some of the reference beam to converge into
the real image, in which case the image can be projected onto a screen
(or, with more difficulty, be viewed directly by placing the eye at its
location). However, attempting to bring all of it into focus at once is
impossible because of the image's depth; this is a result of the hologram's
inherent three-dimensional information content.
In contrast, a reflection hologram is produced by illuminating the
plate with the object and reference beams from opposite sides, as shown
in Figure 4.2. Now, the plane of the interference fringes is approximately
parallel to the plate, so that when the plate is illuminated to produce a
hologram, the light undergoes Bragg reflection as it penetrates through
the various interference fringes. As a result, the reflected light, and
hence the image, is only visible from a relatively small range of angles.
Furthermore, the virtual and real images are not produced simultane-
ously; if the plate is illuminated from one side, the reflected light will
diverge, creating a virtual image; if the plate is illuminated from the
other side, the reflected light will converge, producing a real image.

example 4.3: thick vs. thin holograms


Based on the above discussion of how light is reflected from a holographic
plate to produce images, explain why thicker holographic plates tend to
yield higher-quality images than thinner plates.

The thicker the emulsion used to store the interference fringes, the
more well-defined the shape of the fringes, and the more effectively light can
reflect off of them to produce an image. In the case of reflection holograms,
the thickness of the emulsion is especially crucial, because it places an upper
limit to how many Bragg-reflecting surfaces can fit within the emulsion. If the
thickness of the emulsion is significantly greater than the separation between
successive interference fringes, then the hologram is considered thick. If
instead, the emulsion is thinner than the separation between interference
fringes, the hologram is considered thin. A reflection hologram ceases to
exist when it becomes too thin because the Bragg interference is lost. A thin
transmission hologram can still exist, but in this case is called a surface
hologram, since it essentially consists only of surface striations and yields a
lower-quality image than its thick counterpart. A surface hologram cannot be
explained using the geometric model we described, and instead requires
use of the zone plate model described above. In general, thick holograms
are better than thin, as they are capable of containing more information
throughout their depth.

4.4 making holograms

Because the quality of holograms relies so heavily on the formation of


interference fringes within the emulsion, the components of the optical
setup cannot move relative to each other by more than roughly the
wavelength of the light being used. For creating visible-light
holograms, this necessitates the use of an optical workbench dampened
to external vibrations. Furthermore, in order to maintain coherence
of the light, a single source of light must be used, and the scale of
the image is dictated by the coherence length of the light source. The
typical light source is a laser, modern versions of which can have very
long coherence length, so there is essentially no limit to the scale of
the hologram (except, perhaps, for budgetary concerns). Finally, the
possible emulsions to use vary greatly, depending on the wavelength
and intensity of the light used, the type of hologram being produced,
budget, and desired exposure time.
Also of importance is the reduction of noise in the image; generally,
this is done by adjusting the beam ratio. The beam ratio is defined as
the ratio of the amplitude of the reference beam to that of the object
beam, and is crucial for filtering out intermodulation noise, which
is caused by the object beam interfering with itself (while we want
it to interfere only with the reference beam). By changing the beam
ratio, we can change the relative amplitudes of the object and reference
beams, and therefore change the relative amplitudes of the various
possible interference effects between them. Generally, the ideal beam
ratio depends on the particular geometry of a setup, and is best found
by trial and error.
To produce a high-quality transmission hologram, we set up an ar-
rangement similar to that shown in Figure 4.3. Notice first of all that
only a single laser is used; all three beams originate from the laser.
The reason for this is mentioned above. The particular arrangement
illustrated is convenient because it actually uses two object beams to il-
luminate the object more uniformly. Generally, transmission holograms
use a beam ratio greater than 1:1.

Figure 4.3: A typical setup for producing a transmission hologram [47].

Figure 4.4: A typical setup for producing a reflection hologram [47].

To produce reflection holograms, the apparatus used looks more


like Figure 4.4. This particular setup works best with a beam ratio of
approximately 1:1.

4.5 applications of holography

The applications of holography are varied and complex; here we can


only mention them in passing, but we encourage you to research them
on your own if any seem particularly interesting.
The most promising application of holography is data storage. Un-
like conventional storage devices (optical disks, magnetic disk drives,
etc.), which store information on a two-dimensional surface, holo-
grams are capable of storing information throughout their entire three-
dimensional volume. As a result, holograms can (theoretically) store
information much more densely and efficiently than current means.
The current challenges that holographic data storage faces are the lack of
read-write holographic media and the complexity involved in reading
holograms via computerized means. As of yet, progress in this
field has been limited, but the potential for significant advancement
exists.

Figure 4.5: An example of pattern recognition using Fourier holograms. To


the left, the transparency of the page in which a certain letter is to be found
is illuminated by a beam of laser light; the leftmost lens creates a Fourier
transform of the transparency, which is projected onto the Fourier hologram of
the letter to be identified. The lens at right reverses the Fourier transform and
projects an array of dots onto the screen wherever the letter was found.

The “ordinary” holograms discussed so far are lensless – they do not


require focusing light onto the holographic plate, as conventional pho-
tography does. However, it is possible to make a hologram using lenses,
by placing a converging lens between the illuminated object and
the holographic plate such that the object is in the focal plane of the
lens. The resulting hologram cannot be viewed via conventional means
because the lens destroys the crisp image of the object. Nonetheless,
optical information about the object is stored in the hologram. In fact,
this configuration produces the Fourier transform of the object at the
plate; the resulting hologram is called a Fourier hologram. On its own,
a Fourier hologram is not of much use, but can be very useful in pattern
recognition. Imagine, for instance, we wish to find all instances of a
given pattern (say, a particular letter) in a page of text. We can do so
with the arrangement like the one illustrated in Figure 4.5. We first
create a Fourier hologram of the desired pattern. Then we create an
optical Fourier transform of the page of text and project it onto the
Fourier hologram of the desired pattern. Finally, we reverse-Fourier
transform the combined beams. The result will be an array of dots
indicating the location of every instance of the desired pattern in the
original text. You might have heard of this process of combining two
Fourier transforms in your math classes: it is called convolution.
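A purely digital analogue of this pattern-matching scheme can be sketched in a few lines: the "page" and the "letter" become small arrays, and the matching is carried out by multiplying one Fourier transform by the conjugate of the other, i.e. by cross-correlation. This is only a numerical illustration of the convolution step, not a model of the optical apparatus itself:

```python
import numpy as np

page = np.zeros((64, 64))
letter = np.zeros((8, 8))
letter[1:7, 3:5] = 1.0                            # a crude "letter": a vertical bar

for row, col in [(10, 10), (40, 25), (20, 50)]:   # three copies placed on the page
    page[row:row + 8, col:col + 8] += letter

# Cross-correlation via the convolution theorem: multiply one transform by the
# conjugate of the other and transform back.
corr = np.real(np.fft.ifft2(np.fft.fft2(page) *
                            np.conj(np.fft.fft2(letter, s=page.shape))))

peaks = np.argwhere(corr > 0.9 * corr.max())      # the bright "dots" of the analogy
print("matches found near:", peaks.tolist())      # close to the three placements
```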

4.6 problems

1. Transmission holograms are visible over a virtually 180◦ range (as


long as the plate remains in the line of sight between yourself and
the virtual image), while the angular range over which reflection
holograms are visible is very limited. Explain.
Solution: This difference can be understood from the geometric model. In
a transmission hologram, the plane of the interference fringes is essen-
tially perpendicular to the plane of the plate; this geometry allows the
reference beam illuminating the plate from behind to reflect in virtually
any direction as it passes through the plate. On the other hand, the
plane of the interference fringes in a reflection hologram is parallel to
the plane of the plate, and the image is formed via Bragg reflection. In
Bragg reflection, a light wave passing through multiple reflective layers
is reflected multiple times, interfering constructively for some angles of
incidence and destructively at others. Therefore, a reflection hologram
can only be viewed over a narrow angular range in the vicinity of the

angle of incidence (or reflection) at which the constructive interference


is maximum.

2. Why would a higher beam ratio be preferable to a low one in the making
of a hologram?
Solution: The beam ratio is meant to reduce intermodulation noise, due
to the interference of the object beam with itself. If the beam ratio is high,
the reference beam’s amplitude is stronger than the object beam’s, and
so the interference of the object beam with itself gives a low amplitude
compared to the high amplitude of the interference of the object beam
and the reference beam, drowning out intermodulation noise.

3. Imagine that in the apparatus of Figure 4.5 we replace the screen to the
right with a transparency containing an array of dots and we replace the
transparency to the left with a screen; then we project a laser beam from
the right. What might we expect to happen?
a) Nothing - the apparatus only works in one direction.
b) An array of letters will be projected on the screen at left, mirroring
the array of dots.
c) It is impossible to tell, since the answer depends on what exactly
the reference hologram is of.
Solution: The correct answer is choice (b). This is simple symmetry: the
Fourier transform of the array of dots, when convolved with the Fourier
hologram of the letter, will yield an array of letters.
COLIN LALLY: LISTENING FOR THE SHAPE OF A DRUM 5
5.1 introduction

A classic problem in analysis is that posed by Mark Kac in his landmark
paper “Can One Hear the Shape of a Drum?” [45]. We shall
follow his exposition for a while. Then, we shall become absorbed in
the simpler question “Can one hear the size of a drum?” Along the
way, we shall develop the important tool of normal-mode analysis, and
have a quick introduction to asymptotic analysis. For now, consider the
following:
A membrane Ω, such as that depicted in Figure 5.1, held stationary along
its boundary Γ, is set in motion. Its displacement F(x, y; t) ≡ F(r; t) in
the direction perpendicular to its original plane is known to obey the
wave equation

∂²F/∂t² = c²∇²F,    (5.1)

where c is some constant that we shall normalize to be c = 1/√2.
There exist special solutions to the wave equation, of the form

F (r; t) = U (r)eiωt , (5.2)

which are called normal modes. Each of these solutions corresponds


to a fundamental frequency at which the membrane can vibrate. By
substituting U (r)eiωt into (5.1), we find the corresponding equation for
U:
(1/2)∇²U + ω²U = 0,    U = 0 on Γ    (5.3)
An illustration of the solution of (5.3) follows as part of the example
below.
example 5.1: normal modes of a string

This is the one-dimensional limiting case of the general problem set up above.
Consider a string of length ` stretched taut between two walls (Figure 5.2).
Its end points are fixed; that is, it obeys the boundary conditions

F (0, t) = 0 = F (`, t),

Figure 5.1: A membrane Ω; from [45]


where F is the displacement of the string, as given before. We want to find


the normal modes of this string.
Let F = U ( x )eiωt be one of the normal modes we seek. Then U ( x ) is a
solution of the equation

d²U/dx² + k²U = 0,    (k² = 2ω²)
which is just (5.3) in one dimension (minus the boundary conditions included
in 5.3). The general solution to this equation is

U = A cos kx + B sin kx.

Note that since the boundary conditions on F involve only x, they are effec-
tively boundary conditions on U. Applying them, we have

U (0) = 0 =⇒ A=0

U(`) = 0 =⇒ B sin k` = 0 =⇒ k = nπ/`,
where n is a positive integer. Thus,

ω = nπ/(√2 `),
and our normal mode (now labeled by the integer n) is
F_n(x, t) = B sin(nπx/`) e^(inπt/(√2 `)),
where B is a normalization constant.

A very interesting phenomenon appears in this example: there is a


discrete sequence of normal-mode frequencies ω1 ≤ ω2 ≤ ω3 ≤ . . . .
Each of these frequencies corresponds to exactly one Un through the
relation between k and ω.
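The discreteness of this spectrum is easy to confirm numerically. The Python
sketch below is an illustration only (the grid size is an arbitrary choice): it
replaces the second derivative in the string equation by a finite difference with
fixed ends, diagonalizes the resulting matrix, and compares the lowest
frequencies with ω_n = nπ/(√2 `) for ` = 1.

    import numpy as np

    # Finite-difference check of the string spectrum: -(1/2) U'' = w^2 U,
    # with U(0) = U(l) = 0, discretized on N interior grid points.
    l, N = 1.0, 500
    h = l / (N + 1)
    A = (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / (2 * h**2)

    omega = np.sqrt(np.linalg.eigvalsh(A)[:5])        # lowest five frequencies
    exact = np.arange(1, 6) * np.pi / (np.sqrt(2) * l)
    print(np.round(omega, 4))
    print(np.round(exact, 4))     # n*pi/(sqrt(2)*l); the two lists agree closely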

5.2 eigenvalues

It turns out that this result holds generally, regardless of problem


dimensionality or geometry. Thus, for any region Ω bounded by a
smooth (i.e., differentiable) curve Γ there exists a sequence of eigenvalues
λ1 ≤ λ2 ≤ . . . such that there corresponds to each an eigenfunction
ψn (r), which satisfies

(1/2)∇²ψ_n + λ_n ψ_n = 0.
Naturally, ψn (r) = 0 for any point r that lies on the boundary Γ. Note
that the eigenfunctions are normalized such that
∫ ψ_n²(r) d²r = 1.

We are now in a position to formulate the problem to which we


alluded earlier. We wish to consider two separate regions (Ω1 and


Figure 5.2: The string considered in 5.1



Ω2 ) with distinct boundaries (Γ1 and Γ2 ). Let us then consider these


membranes’ respective eigenvalue problems:
Ω1 :   (1/2)∇²U + λU = 0,    U = 0 on Γ1    (5.4)
Ω2 :   (1/2)∇²V + µV = 0,    V = 0 on Γ2    (5.5)
Suppose that, for each n, λn = µn . We want to determine whether, if the
eigenvalue spectra are identical, the two regions Ω1 and Ω2 necessarily
“have the same shape.” Kac more correctly (but less intuitively) couches
the question in terms of Euclidean congruence [45].
Before we continue, we should note that this particular problem has
recently been answered in the negative: it is possible to have differently-
shaped membranes that possess the same spectrum [35]. This result
was discovered only in 1992, and is highly mathematical in nature. We
shall therefore do no more than note it, and proceed to examine some
interesting things that can be deduced from eigenvalue spectra. Hence,
we shall first see whether one can “hear” the size (really the area) of a
drum.
In order to answer this question, we shall use the methods of asymp-
totic analysis [5]. Quite briefly, this analysis deals with finding the lim-
iting behavior of some expression. If we have two functions f ( x ) and
g( x ), then we call them approximately equivalent and write
f ∼ g    (x → ∞)

if

lim_{x→∞} f(x)/g(x) = 1.
We can thus find the qualitative behavior of f at large values of x.
Proceeding, we would like to know how many eigenvalues (from our
eigenproblem) exist that are less than a given number λ (keep in mind
that we are considering large λ). A result posited by H. A. Lorentz (in
different form than is used here) and proved by Hermann Weyl is that,
for any eigenproblem, the number N (λ) of eigenvalues less than some
given λ is
N(λ) = ∑_{λ_n<λ} 1 ∼ (|Ω|/2π) λ,    (5.6)

where |Ω| is just the area of the membrane. It follows that


|Ω| ∼ 2π N(λ)/λ.    (5.7)
That is, the area |Ω| of a “drum” (the membrane of Figure 5.1) can
be inferred from a knowledge of the number of small normal-mode
frequencies. We can “hear” the area of a drum.
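Relation (5.6) can be checked against a geometry whose spectrum is known in
closed form. For a rectangular membrane with sides a and b and the normalization
used above, the eigenvalues of (1/2)∇²ψ + λψ = 0 with ψ = 0 on the boundary are
λ_mn = (π²/2)(m²/a² + n²/b²). The Python sketch below (purely illustrative; the
side lengths and the cutoff are arbitrary) counts how many of these lie below a
cutoff and compares the count with |Ω|λ/(2π).

    import numpy as np

    # Weyl's law check for a rectangular "drum" with sides a and b.
    a, b = 1.0, 1.7
    cutoff = 4000.0

    m = np.arange(1, 200)
    lam = (np.pi**2 / 2) * (m[:, None]**2 / a**2 + m[None, :]**2 / b**2)
    N_count = np.count_nonzero(lam < cutoff)          # exact count of eigenvalues
    N_weyl = a * b * cutoff / (2 * np.pi)             # |Omega| * lambda / (2 pi)
    print(N_count, round(N_weyl, 1))    # the two agree to within a few percent

The remaining few-percent discrepancy comes from boundary corrections, and it
shrinks relative to N(λ) as the cutoff grows, which is exactly the sense of the
asymptotic relation.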

5.3 problems

1. Carry out the analysis done in the example, but use the boundary
conditions
F(0, t) = 0,    ∂F/∂x |_(x=`) = 0.
These boundary conditions correspond to the physical scenario
of a string with one fixed end at x = 0, and one end free to move
in the plane at x = `.

2. Prove the result (5.6)

3. Exam question: Can one hear the shape of a drum?


a) Yes
b) No
Part II

MIND OVER MATTER
CHRISTOPHER MACLELLAN: GLASS 6
6.1 introduction to glass

Glass is a substance that people encounter frequently in their everyday
lives. The average person may assume there is nothing
particularly interesting about glass and would describe it as a brittle
clear solid formed from molten sand. Many others believe the common
misconception that glass is a liquid that flows extremely slowly and
that proof can be found in medieval windowpanes that are thicker
at the bottom than at the top. As we will see, this misunderstanding
is not entirely rooted in fiction. Glasses are actually a diverse group
of substances with a unique set of properties. Despite the fact that
humans have been making use of glass for thousands of years and its
importance in our everyday lives, the details that underlie the formation
of glasses are still a hotly debated topic in the fields of chemistry and
physics. [16]
In the common sense glasses are often considered to be a group of
substances like the ones we encounter in our everyday lives. These
glasses are usually hard, brittle, and sometimes transparent. In the
scientific sense glasses are a much broader group of substances. For
example, many organic polymers such as polystyrene and polyvinyl
chloride are technically glasses. Although the exact definition of glasses
may vary slightly from publication to publication, glasses can be de-
scribed as amorphous solids that undergo a glass transition. To further
understand the properties of glasses one must understand what it
means to be an amorphous solid and what a glass transition is. [27]

6.2 amorphous solids

On the macroscopic scale the differences between a solid and liquid are
obvious. What is less commonly known is that solids come in two
fundamental types: crystalline and amorphous. The distinction between
these types can only be seen on the atomic scale. Crystalline solids are
made up of molecules that form a well defined lattice structure that
is repeated throughout the substance. This is called long-range order,
or translational periodicity. On the other hand amorphous solids have
no long-range order. Atoms in amorphous solids do show connectivity
that resembles the structural units of crystalline states but it is not
regular and does not repeat. This quasi-periodicity is indicative of what
is called short-range order [31]. The difference between the molecular
structures of crystalline solids and amorphous solids is shown in Figure
1.
On the molecular level liquids look exactly like amorphous solids
and as one may expect the molecules in an amorphous solid have a
significant degree of freedom. What, then, is the difference between an
amorphous solid and a liquid, and why don't amorphous solids actually flow? There
are a few different ways to draw the line between amorphous solids
and liquids. One way is to say that a substance is a solid when the time
required for it to undergo a structural change is much longer than the


Figure 6.1: Atomic Makeup of Crystalline Solids vs. Amorphous Solids [7]

time of the observation [11]. In other words, a substance is a solid when


it doesn’t flow in a reasonable amount of time. In fact, some amorphous
solids would take billions of years, more time than the universe has
existed, to show a noticeable structural change due to
flow. Another way to describe the difference is to define a liquid as
a substance with a viscosity below an arbitrary value. For example,
in his book Physics of Amorphous Materials, S. R. Elliott defines a solid
as a material with a viscosity above 10^14.6 poise. Although these two
definitions appear different at first glance, they both attempt to quantify
the same obvious difference between liquids and amorphous solids. If
we return to the common misconception that glass is a slow flowing
liquid we will find that under these definitions what we commonly
consider to be glass cannot be a liquid; it simply does not flow fast
enough. So why are medieval windowpanes thicker at the bottom than
at the top? The answer lies in the method used to create the glass, not
in its atomic structure. Now that we have defined amorphous solids
and how they are different from liquids and crystalline solids we must
examine the glass transition, which is the property that distinguishes
glasses from other amorphous solids. [27] [16]

example 6.1: glass in our everyday lives


We have defined glass in a scientific sense, but what about the glass we use
in our everyday lives? Glass in the common sense is nothing more than an
amorphous solid made up of silica (SiO2). Silica is found in many forms,
the most abundant of which is sand. This is not the whole story though.
Almost all of what we call glass is not made up of pure silicon dioxide for
practical purposes. Silicon dioxide (in its crystalline form) has a very high melting
point of around 1700°C [27]. Because of this high melting point, soda ash
(Na2CO3) is often added to the silicon dioxide. This lowers the melting point
to about 1000°C, making the molten glass much easier to handle. To provide
extra hardness and chemical stability lime (CaCO3 ) or dolomite ( MgCO3 ) is
added to the mixture, which is called soda-lime glass. The concentrations of
these materials can vary, but the makeup is commonly 60-75% silicon dioxide,
12-18% soda and 5-12% lime. Other impurities can be added to change the
glass properties for aesthetic or functional purposes. Soda-lime glass makes
up most of the glass we use everyday and can be found in everything from
drinking glasses to windows.

6.3 the glass transition

When a liquid is cooled it may undergo a first order phase transition


and crystallize at the melting point of the substance. In our everyday
experience the transition from water to ice is a perfect example of this.

Figure 6.2: A Thermodynamical View of Liquid-Solid Transitions [7]. The path


AD represents the transition from liquid to glass (for an arbitrary cooling rate),
while the path ABC represents the transition from liquid to crystalline solid.
Notice the discontinuous change in volume at the melting point during the
crystalline transition. Compare this to the glass transition, where there is a
continuous change in volume at the glass transition temperature.

This is not the only thing that can happen when a liquid is cooled. If
cooled fast enough it will not crystallize and will remain in a liquid
state below its melting point. This is called a supercooled liquid, and it is
marked by a viscosity that increases as the temperature is reduced. Even-
tually the supercooled liquid cools to the point that it forms a glass.
This transition is marked by a glass transition temperature. This tem-
perature is defined as the range over which thermodynamic variables
(e.g. volume, enthalpy, entropy) change their behavior. It is a range because
the values of these variables change in a continuous fashion, unlike the
discontinuous change that occurs at the melting point. It is important to note that
a substance does not have a specific glass transition temperature. As
mentioned earlier, supercooled liquids only result from sufficiently fast
cooling rates. The cooling rates that will allow a supercooled liquid
to form depend on the substance being cooled. The cooling rate also
changes the region over which a substance can be supercooled. If one
reduces the rate of cooling, the range over which the liquid can
be supercooled increases, which effectively lowers the glass transition
temperature. In fact, the glass transition temperature can vary as much
as 10-20% for very different cooling rates. An odd consequence of these
properties is that the measurement of the glass transition temperature
varies by how the glass is prepared. Despite this knowledge of the glass
transition, there are still many questions left to be answered. Physicists
and chemists still debate the details of the glass transition on a regu-
lar basis and new methods are constantly being formulated to try to
analyze glass structure and transition. [27]

6.4 simulating amorphous materials with colloids

Some of the most promising ways to study the structure of glasses


are being developed using colloids to simulate the molecular structure
of amorphous materials. A colloid is a mixture in which a system

Figure 6.3: A diagram of the experiment. A binary colloid is allowed to settle in


a capillary tube and is imaged with a confocal microscope. The larger particles
are found in higher concentrations at the bottom, while the smaller particles
are found in higher concentration at the top [69]

of particles between the sizes of 10−7 and 10−5 cm is dispersed in a


continuous medium [17]. In our case we will consider colloids in which
the medium is a liquid and the particles are micron sized hard spheres.
When colloids are allowed to settle the particles pack together and
resemble a continuous solid. In fact, the arrangement of particles is
directly analogous to the arrangement of molecules in a solid. Because
of the relatively large size of colloid particles we can measure each one’s
position individually. We exploit this property of colloids to analyze
their structure and extend our findings to actual solids.
In the laboratory we create a colloid made up of a binary system of
the aforementioned hard spheres. We use two different sizes because it
is well established that monodisperse systems only settle in crystalline
arrangements. In contrast, binary systems show areas of amorphous
arrangement. In our case we place our colloid in a capillary tube
so gravity determines the distribution of large and small particles. As
you would expect the larger particles are more prevalent at the bottom
while the smaller particles are found in higher frequency at the top.
In the middle there is a continuum of different compositions with
respect to sphere size. It is this middle area where a truly binary colloid
exists with amorphous structure. A confocal microscope can be used to
measure the position of each particle in the capillary tube. A diagram
of this setup is shown in Figure 6.3. [69]
The particle arrangement is characterized by a radial distribution
function g(r), defined as

g(r) = [L²/(2π ∆r N(N−1))] ∑_{i≠k} δ(r − |~r_ik|),    (6.1)

where g(r) is the ratio of the ensemble-averaged number of particles in a shell
between r and r + ∆r to the average number density N/L², where L is the square
image length and δ is the Dirac delta function. This can be thought of
as the relative number of particles present at a distance r from a central
particle. This single parameter can give a quantitative measurement of
the degree of long-range order present [69].
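A bare-bones numerical version of this measurement helps make the definition
concrete. The Python sketch below is an illustration only: it treats a set of 2D
particle positions in a square box of side L, uses a histogram bin of width ∆r in
place of the delta function in Eq. 6.1, and ignores edge corrections, which pull
g(r) slightly below its ideal value near the box walls.

    import numpy as np

    def radial_distribution(pos, L, dr):
        """Estimate g(r) for 2D positions in an L x L box (no edge correction);
        the histogram bin of width dr plays the role of the delta in Eq. 6.1."""
        N = len(pos)
        diff = pos[:, None, :] - pos[None, :, :]
        dist = np.sqrt((diff**2).sum(-1))[~np.eye(N, dtype=bool)]   # drop i == k
        edges = np.arange(dr, L / 2, dr)
        counts, edges = np.histogram(dist, bins=edges)
        r = 0.5 * (edges[1:] + edges[:-1])
        # Normalize by the number expected for a uniform ("ideal gas") system:
        # (N-1)/L^2 particles per unit area in an annulus of area 2*pi*r*dr
        # around each of the N central particles.
        return r, counts * L**2 / (N * (N - 1) * 2 * np.pi * r * dr)

    # For uniformly random points, g(r) should sit near 1 at small r.
    rng = np.random.default_rng(1)
    r, g = radial_distribution(rng.random((400, 2)), L=1.0, dr=0.02)
    print(np.round(g[:5], 2))

For a colloidal glass or crystal one would instead feed in the measured particle
coordinates; peaks and oscillations in the resulting g(r) then signal short- or
long-range order, as in Figure 6.4.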
The radial distribution function at different heights is shown in
Figure 6.4. You can see that at z = 1.8 mm and z = 6.5 mm, g(r) shows that some

Figure 6.4: The radial distribution function g (r ) at different heights where σ is


the mean particle diameter. Notice how the oscillatory behavior first decreases
and then increases as the height z increases, indicating a transition from a crystalline
state to a glassy state and back to a crystalline state. Note that the graphs are offset
for clarity. [69]

oscillatory nature exists. This indicates that there is some regular change
in density as you get further away from the central particle, which is
what we would expect from a structure with long-range order. This
also agrees with our assertion that crystallinity exists where there is
a relatively monodisperse composition of particles. When z = 3.3mm
we would expect more or less equal mixing between the different size
spheres and an amorphous structure. g (r ) at this height confirms this,
as there is no pattern in how the density of particles changes as r
increases [69].
Using this analysis we have found a way to measure the long range
order in a substance. Although the above treatment is mainly quali-
tative, more analysis techniques are being developed to describe the
positioning of the spheres in both 2D and 3D. [69] Nevertheless, this
method has proven to be an efficient way of creating a system analogous
to a solid that can be experimentally measured and easily
manipulated. Analyzing colloidal systems like the one above provides
one of the best chances for physicists and chemists to unlock the secrets
of how and why glasses form.

6.5 practice questions

1. In a technical sense glasses are identified by their lack of


• a) long range order
• b) short range order
• c) flexibility
• d) color

2. When cooling a liquid to form a glass, raising the cooling rate ___
the glass transition temperature.
• a) raises
• b) lowers
• c) does not change

3. Colloids are used to simulate the formation of solids because


they:
• a) are the same thing
• b) colloid particles are large enough to have their individual
positions measured
• c) never show crystalline arrangements
• d) all of the above
ROBERT PIERCE: THE PHYSICS OF SPLASHING 7
7.1 introduction

Have you ever noticed how the water droplets that come from
your faucet splash at the bottom of the sink? Have you ever
watched rain droplets splash as they fall and hit your car windshield
while driving? The phenomenon known as splashing occurs in many
situations such as these. Splashing is also important in technological
and industrial situations such as the ignition of fuel in your car, or when
your printer puts ink onto a piece of paper. This chapter will help you
understand situations such as these by developing an understanding of
the physics of splashing.
Scientists have studied splashing for over a hundred years. Because
of the beauty of the motion involved with splashing, it has been one of
the most highly praised phenomena studied via the use of high speed
photography. Through famous sketches and photographs by scientists
such as Harold Edgerton and A. M. Worthington [96], the beauty of
splashing has been made available to mainstream society. Yet the physics of
this tremendous phenomenon is still not fully understood.
The goal of this chapter is to develop an understanding of some of
the physics involved when a drop of liquid falls and strikes a smooth
surface. We will see that there are many factors (characteristics of
our system) that determine how the system will evolve in time. We will
review fundamental concepts such as pressure, surface tension, viscos-
ity, and others to develop a groundwork for understanding splashing
as well.

7.2 pressure, surface tension, and other concepts

Recently, through the research of Dr. Wendy Zhang [97], Sidney
Nagel [97][98], and others, it has been shown that air pressure has a

Figure 7.1: A photograph taken by Martin Waugh[90] as part of his liquid


sculpture images. This is an image of milk splashing on a smooth surface. This is
just another example of the spectacular beauty of splashing, and why splashing
has grabbed the attention of many photographers around the world.


tremendous influence on whether or not a splash will occur when a


liquid drop falls onto a smooth surface. The research has shown that
the lower the gas pressure surrounding the splash zone, the smaller
the splashes become, until they disappear all together. You can see
through high speed imaging that at a certain critical air pressure, the
corona (a layer of liquid spreading out from the center of impact with
the smooth surface) disappears all together, and there is no splashing.
Instead, the liquid just spreads out along the surface in the same way
that one would see spilled milk spread out across the kitchen table. This
area of the physics of splashing is a bit too complicated to be addressed
here, and we will focus on something less difficult. We will focus our
attention on what characteristics of the system produce, not only
a corona, but splashing (when the corona breaks up into thousands
of individual droplets). In order to understand these characteristics of
splashing to an appropriate degree, we need to review some of the
physics of pressure, viscosity, and surface tension.

7.2.1 Pressure

Pressure is defined as the force per unit area of a surface directed


perpendicularly to the surface. In our context of falling liquid drops,
we will be focusing on atmospheric pressure due to the force of the
gases in the atmosphere above the surface. We have that
P = F_N / A,    (7.1)
where P is pressure, FN is the normal force on the surface, and A is the
area of the surface. In our system, FN will be equivalent to the weight
of the gas particles above the surface, as well as the force these gas
particles exert onto the surface due to their speed. For ideal gases, we
may express the pressure as
PV = N k_b T  =⇒  P = (N/V) k_b T  =⇒  P = (ρ_A/M_g) k_b T,    (7.2)

Figure 7.2: Photographs taken by Lei Xu, Wendy W. Zhang, and Sidney R. Nagel
in their experiment involving falling alcohol drops onto a smooth surface. The
top three frames represent alcohol under an air pressure of 100 kPa, and the
bottom three frames represent alcohol under an air pressure of 30 kPa. The left
frames (top and bottom) are the drop just above the surface, at time t = 0 ms,
the middle frames are the drop at time t = .276 ms, and the right frames are the
drop at time t = .552 ms. We see that in the middle frames a corona or liquid
layer is spreading out from the center, and by the time expressed in the right
frames, the stress due to air pressure has won, and we see splashing.

where Mg is the molecular mass of the gas g, V is volume, N is the


number of particles, T is the temperature, k b is the Boltzmann constant,
and ρ_A = N M_g /V is the density of the gas in the atmosphere. Equation 7.2 is the
Ideal Gas Law. From 7.2 we readily find an expression for the gas
density in the atmosphere to be

ρ_A = P M_g/(k_b T).    (7.3)

We will see that after a drop strikes the surface, a liquid layer will
spread outwards (the corona), and interact with the atmosphere. There
will be stress on the liquid layer due to the atmosphere that will be
proportional to, among other things, the density of the gas in the
atmosphere. We will learn what the other things are later on, but
first make sure that you are capable of using the above equations to
analyze this system in terms of the atmospherically applied stress.
Make sure that you understand pressure in the context of our system
by performing the following example.

example 7.1: using the ideal gas law

1. When observing a liquid drop spread out after coming into contact
with a smooth surface, we notice that the liquid spreads out very
rapidly. Under these circumstances the stress applied to the spreading
liquid layer due to the atmosphere will be proportional to the density
of the gas in the atmosphere. If our atmosphere is made of air, and
if air has a molecular mass of M_g = 29u (u = 1.66 × 10^−27 kg), what is
the pressure that the atmosphere may exert on a liquid layer that is
spreading outwards? Assume that we observe this at room temperature
(T = 295K). The density of air is measured to be ρ A = 1.2kg/m3 ,
and Boltzmann’s constant is k b = 1.381x10−23 m2 kgs−2 K −1 . Does this
pressure look familiar?
Solution: Here we may use 7.2 to solve the problem. We are given values
for the variables ρ A , Mg , and T. Solving for pressure, we find

P = (ρ_A/M_g) k_b T → P = [1.2 kg/m³ / (29 × (1.66 × 10^−27 kg))] (1.381 × 10^−23 m² kg s⁻² K⁻¹)(295 K).    (7.4)

So we find that pressure P is,

P = 101,500 kg m⁻¹ s⁻² → P = 101.5 kPa    (7.5)

This pressure is known as atmospheric pressure. It is the pressure


measured due to the atmosphere at sea level. We get this value because
the density given in the problem is the density of air molecules at sea
level.
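The arithmetic of this example is easy to verify with a few lines of Python; the
sketch below simply evaluates Eq. 7.2 with the numbers quoted above, using the
standard value u = 1.66 × 10^−27 kg for the atomic mass unit.

    # Check of Example 7.1: P = (rho_A / M_g) k_b T for air at room temperature.
    k_b = 1.381e-23      # Boltzmann constant, J/K
    u = 1.66e-27         # atomic mass unit, kg
    M_g = 29 * u         # molecular mass of air, kg
    rho_A = 1.2          # density of air at sea level, kg/m^3
    T = 295.0            # room temperature, K

    P = rho_A * k_b * T / M_g
    print(f"P = {P / 1000:.1f} kPa")   # ~101.6 kPa, essentially atmospheric pressure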

7.2.2 Viscosity and Surface Tension

Viscosity may be defined as the resistance to flow in a liquid. A general


way to view a system involving viscosity is to imagine a flowing liquid
section that is divided into many infinitesimally thick layers from
top to bottom. Individual layers will move at different velocities from
eachother. This is because the layers at the bottom will interact with
the surface, layers at the top will move with the initial speed of the

liquid section itself, and the middle layers will move with velocities in
between the two extremes. Let’s make shear stress more clear: stress
is defined as the average force applied to a surface per unit area (see
equation 7.6), and shear means that the stress is being applied parallel
to the surface. Shear stress will result as layers move towards or away
from other layers. Stress τ is related to the force F by

τ = F/A,    (7.6)

where A is the area. Shear stress is directed parallel to the surface of the liquid. Ultimately
we will see that the stress in the liquid is acting to hold the liquid
together. We will see that this stress will interact with the stress on the
liquid from the atmosphere (due to pressure as described in earlier
sections), and when the atmospheric stress is stronger, splashing will
occur.
First, let us devote some time to viscosity. In our study of viscosities
we will be interested in what is known as kinematic viscosity. This is
when we look at the relationship between the resistive force in a liquid
due to viscosity, and the inertial force of the liquid. Kinematic viscosity
νL of a liquid is defined as
ν_L = µ/ρ,    (7.7)

where µ is the dynamic viscosity of the liquid, and ρ is the density of


the liquid. In our system of a liquid drop falling onto a smooth surface,
layers of thickness d will advance away from the center (where the drop
initially made contact with the surface). We can estimate the thickness
of the boundary layer, the first layer closest to the surface, to be

d = √(ν_L t),    (7.8)

where t is the time elapsed since the drop struck the surface.
Now we are in a position to consider the stress on the expanding
liquid layer due to the liquid layer itself. The stress on the boundary
liquid layer will be due to the surface tension of the liquid striving to
keep the layer intact. With d defined as above, we have that the surface
tensional stress (shear stress) is
Σ_L = σ/d → Σ_L = σ/√(ν_L t),    (7.9)

where σ is the surface tension of the liquid.

And since we have this knowledge of the stress on the expanding


liquid layer due to the liquid layer itself (ultimately the force trying to
hold the liquid layer together), we can use it in combination with the
stress on the expanding liquid layer due to the atmosphere (ultimately
the force trying to break the liquid layer apart), and observe under
what combinations we get splashing.

7.3 splashing

Splashing in our system will occur when the stress on the liquid layer
due to the atmosphere is stronger than the stress on the liquid layer
due to the liquid. This is because the stress due to the atmosphere is
trying to break the liquid layer apart, and the stress due to the liquid is
trying to keep the liquid layer together. We’ll have two equations for the

two different stresses. Earlier we discussed how the stress due to the
atmosphere should be dependent, among other things, on the density
of the air. We also know that the stress is proportional to the rate of
expansion of the liquid layer, and the speed of sound in the atmosphere.
If Σ A is the stress due to the atmosphere, then our equation for the
stress is
Σ_A = ρ_A C_A v_l −→ Σ_A = (P M_g/k_b T) √(γ k_b T/M_g) √(R V_0/2t).    (7.10)

Here C A is the speed of sound in the atmosphere, and vl is the velocity


of the boundary layer that has undergone shear stress due to the
surface.1 Also in the second form of the equation, R is the initial radius
of the liquid drop, V0 is the velocity of the liquid drop as it lands on the
surface, γ is the adiabatic constant of the gas in the atmosphere,
and t is the amount of time elapsed after the drop impact.
We also saw in equation 7.9 that
Σ_L = σ/√(ν_L t).    (7.11)

If we take a ratio of Σ A with Σ L , we’ll get

Σ_A/Σ_L = [(P M_g/k_b T) √(γ k_b T/M_g) √(R V_0/2t)] / [σ/√(ν_L t)] = √(γ M_g/2k_b T) · P √(R V_0 ν_L)/σ.    (7.12)

From this ratio we can see that a more viscous liquid will splash
more easily than a less viscous liquid. This is counterintuitive because
one would think that a more viscous liquid would stay together more
easily; however, a more viscous liquid makes the
value of the ratio in (7.12) larger, which implies that the liquid layer
is more likely to break apart and splash. So splashing is clearly not a
straightforward phenomenon at all.
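The dependence on viscosity (and on everything else in the ratio) is easy to
explore by coding up Eq. 7.12 directly. The Python sketch below is illustrative
only: the input values are placeholder numbers of roughly the right order of
magnitude for an alcohol drop falling through air, not measured data, and
splashing is flagged using the simple threshold Σ_A/Σ_L > 1 used in this chapter.

    import numpy as np

    def splash_ratio(P, M_g, gamma, T, R, V0, nu_L, sigma):
        """Sigma_A / Sigma_L from Eq. 7.12; splashing is expected when > 1."""
        k_b = 1.381e-23
        return np.sqrt(gamma * M_g / (2 * k_b * T)) * P * np.sqrt(R * V0 * nu_L) / sigma

    # Placeholder numbers of the right order for an alcohol drop in air:
    ratio = splash_ratio(P=101.3e3,          # Pa, atmospheric pressure
                         M_g=29 * 1.66e-27,  # kg, molecular mass of air
                         gamma=1.4,          # adiabatic constant of air
                         T=295.0,            # K
                         R=1.7e-3,           # m, initial drop radius
                         V0=4.0,             # m/s, impact speed
                         nu_L=4e-6,          # m^2/s, kinematic viscosity
                         sigma=0.022)        # N/m, surface tension
    print(round(ratio, 2), ratio > 1.0)      # about 2.2 > 1: splashing expected

Note that the ratio is proportional to the gas pressure P, so halving the pressure
halves the ratio; this is the qualitative content of the low-pressure experiments
described at the start of the chapter.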

7.4 summary

Ultimately we see that splashing is an extremely interesting phe-


nomenon that involves some physics that even runs contrary to in-
tuition. We see that atmospheric pressure determines whether or not a
liquid will splash, and we see that when the pressure is low enough
to allow for splashing, we can quantify when we will and will not see
splashing. We will see splashing when the stress due to the atmosphere
is greater than the stress due to the liquid layer. The stress due to the
atmosphere is working to pull the liquid layer apart, and the stress
within the liquid layer is working to hold the layer together, if the
atmospheric stress is stronger, the liquid will break apart. Ultimately a
strong knowledge of splashing may be used to control when we want
to have splashing with some kind of liquid, and when we don’t want
to have splashing. This, in turn, can be used to improve many forms of
industry and technology, and may prove to be very important towards
a more efficient future.

1 For a derivation of this equation, see the paper Drop Splashing on a Dry Smooth Surface by
Lei Xu, Wendy W. Zhang, and Sidney R. Nagel, 2005.

7.5 chapter problems

1. A liquid drop of alcohol with initial radius R = 1.77 mm falls onto


a smooth surface. After the drop strikes the surface, a corona or a
liquid layer moves outward with an initial velocity V0 = 5 m/s,
and we observe the moving corona for t = .5 ms. Let’s suppose
that our experiment is performed at room temperature (T =
295 K), but our atmosphere is made of helium. The pressure due
to the helium atmosphere near our experiment is still equivalent
to atmospheric pressure, and the molecular mass of helium is
M_g = 4u, where u = 1.66 × 10^−27 kg. If the speed of sound in our
helium atmosphere is 1000 m/s, what will be the stress on the
corona due to the atmosphere Σ A ?
Solution: Here we may use the given conditions to solve for Σ A using the
two parts of equation 7.13.
Σ_A = ρ_A C_A v_l −→ Σ_A = (P M_g/k_b T) √(γ k_b T/M_g) √(R V_0/2t).    (7.13)
Looking at the left part of the equation, we are given the speed of sound
C_A, and we can readily find ρ_A and v_l from their expressions in the
right part of the equation, namely
ρ_A −→ P M_g/(k_b T),    (7.14)

v_l −→ √(R V_0/2t).    (7.15)
Plugging in all of the values given, we find:
Σ_A = [ (101.3 kPa)(4u) / ((1.381 × 10^−23 m² kg s⁻² K⁻¹)(295 K)) ] × (1000 m/s) × √( (1.77 mm)(5 m/s) / (2(.5 ms)) ),    (7.16)

and we arrive at the answer,

Σ_A = (0.165 kg/m³)(1000 m/s)(2.97 m/s) ≈ 490 Pa ≈ 0.49 kPa.    (7.17)

2. Consider the answer found in the previous question. If the shear stress in
the expanding liquid due to the liquid is Σ_L = 0.45 kPa when observing
for time t = .5 ms, will there be any splashing? Explain why or why
not.
Solution: From the first question we have that the stress on the expanding
liquid layer due to the atmosphere is
Σ_A ≈ 0.49 kPa.    (7.18)
We know that if the stress due to the atmosphere is greater than the
stress due to the liquid, then the expanding liquid layer will break apart,
and we will see splashing. So the ratio from equation 7.12 will be greater
than 1 when we have splashing, and less than 1 when we don’t have
splashing. In our case,
Σ_A / Σ_L = 0.49 kPa / 0.45 kPa = 1.09 > 1.    (7.19)
We do see splashing. Again, this is because of the fact that the stress due
to the atmosphere (the stress that is trying to break the expanding liquid
layer apart), is stronger than the stress due to the liquid itself (the stress
that is trying to hold the liquid together). Ultimately the atmosphere
wins, the liquid breaks apart, and we see splashing.

7.6 multiple choice questions

1. Which of the following does the atmospheric stress Σ A on the


expanding liquid layer NOT depend on?
(a) The speed of sound in the atmosphere.
(b) The density of the atmosphere.
(c) The density of the expanding liquid.
(d) The speed of the expanding liquid boundary layer.
(e) None of the above.
Solution: (c) The density of the expanding liquid. See equation 7.10

2. How thick is the boundary layer of an expanding liquid on a smooth


surface if it has been expanding for time t = .4 ms with a kinematic
viscosity of νL = 1 × 10−6 m2 /s?
(a) 2x10−5 m
(b) 2x10−4 m
(c) 2x10−3 m
(d) 2 m
(e) None of the above
Solution: (a) 2x10−5 m
Here we may use equation 7.8 to solve the problem:
d = √(ν_L t) −→ d = √((10^−6 m²/s)(0.0004 s)) = 2 × 10^−5 m.    (7.20)
SAM BINGHAM: FREAK WAVES 8
8.1 introduction

In the deep ocean it is common for mariners to see wave heights that
are in the 5-meter to 12-meter range. In the worst storm conditions
waves generally peak at 15 meters (49 ft). Commercial ships have been
built to accommodate waves of such heights. About once a week a ship
at sea sinks to the bottom of the ocean [4]. Over the last two decades
more than two dozen cargo ships measuring over 200 meters in length
have been lost at sea [38]. Most of the time these ships will sink without
a mayday signal or any trace left behind of their existence.
For centuries mariners have described waves that appear out of
nowhere that approach 35 meters (115 ft.) in height. However there was
no scientific evidence to back up this anecdotal evidence. It is important
to distinguish that these types of waves are described to appear out of
nowhere in the deep sea. This freakish wave is drastically different than
that of the tsunami. A tsunami is created by a disturbance in the ocean.
The most basic example of this is the example of a stone being dropped
in water, which creates waves that are generated radially outward.
Likewise, a quick shift of plates making up the sea floor creates a series
of low waves with long periods. Disturbances known to create tsunamis
range from seismic activity, explosive volcanism, submarine landslides,
or even meteorite impacts with the ocean [13]. Tsunamis occur near
land as their name suggests; tsu translates from Japanese into harbor
and nami into wave. (The phenomenon was originally known as a
tidal wave; however, this name incorrectly implies that it is related
to the tides.)
While tsunamis are much more deadly than the freak waves de-
scribed by mariners (The megatsunami of December 26, 2004, killed
over 250,000 with waves in the 10-to-30 meter range along the coast of
Indonesia, India, Thailand, Sri Lanka, and elsewhere), their behavior is
much better understood, as about three-fourths of tsunamis originate
from submarine seismic disturbances. These disturbances produce a
great amount of energy that translates into the waves as they travel at
speeds approaching 200m/s with wavelengths on the order of hundreds
of kilometers [67]. Tsunamis are not felt at the deep sea as they only
produce wave heights around a meter or two, which is not detectable
with the typical ocean swell. However, once the waves created by the
disturbance approach areas of shallow water, more energy is packed
into the waves, creating waves in the tens of meters along the shore. In
the 20th century over 1000 tsunamis were recorded, with 138 being
labeled as “damaging” tsunamis [12]. Since the last half of the 20th
century oceanographers have been able to identify the instigating dis-
turbance for the occurrence of a particular tsunami. This is completely
different from the freak waves out of nowhere described by mariners, which
carried the same type of lore, and about as much comprehension, as
mythical monsters like the Loch Ness monster.
For a long time oceanographers dismissed the possibility of the exis-
tence of these types of waves with such a high rate of occurrence. What


Figure 8.1: The crest heights during a 20-minute interval at the Draupner Oil
platform at 15:20 GMT. The peak is 18.6 meters from the surface.

is known as the linear model has been used to predict wave heights out
at the deep sea with great accuracy. The commercial shipping industry
has based most of the designs for its ships on this model because of
its accuracy. The freak waves seen by the mariners
are part of the natural wave spectrum but should only occur about
once every ten thousand years according to the accepted linear model.
Clearly something was wrong.
While oceanographers had started to put more faith and energy into
the belief of such freak waves in the 1970’s the major breakthrough
came with the Draupner wave on January 1st, 1995 at the Draupner
oil platform in the North Sea. The Draupner oil platform, which is
located directly west of Norway, had a laser-based wave sensor at
an unmanned platform that would record waves in a fashion that
would be uninterrupted by the platform’s legs [41]. It should be noted
that oceanographers base much of their analysis on what is called
the significant wave height, or swh, which is the average height from
trough to crest of one third of the largest waves during a given period.
On January 1st at the Draupner Oil platform the swh was in the 11-to-
12-meter range. The maximum wave height measured that day was 26
meters and the peak elevation was 18.5 meters, as shown in figure 8.1.
The annual probability for a wave of that height in a 12-meter swh is
10^−2, or once in a hundred years. While the causes for such waves are
understood with much more confidence since the Draupner wave, it is
important to determine what waves are freak waves by nature.

8.2 linear model

Now that we understand the Draupner wave and a bit about these
freak waves let’s take a look at just how odd these waves are when
using the linear model. In the most simple model of ocean waves,
the sea elevation is taken to be a summation of sinusoidal waves of
different frequencies with random phases and amplitudes. In the linear
approximation the wave field is taken as a stationary random Gaussian
process. The probability density distribution is described as

f(η) = [1/(√(2π) σ)] e^(−η²/2σ²).    (8.1)

In Eq. 8.1, η is the elevation of the sea surface with a zero mean level,
⟨η⟩ = 0, and σ² is the variance computed from the frequency spectrum,
S(ω):

σ² = ⟨η²⟩ = ∫ S(ω) dω.    (8.2)

If we take the wind wave spectrum to be narrow, the cumulative prob-


ability function of the wave heights will be described by the Rayleigh
distribution such that

P(H) = e^(−H²/8σ²).    (8.3)

Equation 8.3 gives the probability that the wave heights will exceed a
certain height, H. We can now introduce the significant wave height,
Hs , into the equations and begin to see its relevance. Extensive work by
Stanislaw Massel has shown that Hs ≈ 4σ[58]. From this we can find a
direct relation into the probability for a certain wave height to occur
under the conditions that are present by rewriting Eq. 8.3 as

P(H) = e^(−2H²/H_s²).    (8.4)

At this point a mathematical definition for what constitutes a wave


being a freak wave is necessary. Looking back at the Draupner wave, the
maximum wave height would not be completely out of the ordinary for
severe storm conditions that created 15 meter significant wave heights.
The fact that the Draupner wave occurred with a significant wave height
of 12 meters is what makes it a freak wave. From this point on we will
refer to these freak waves as rogue waves, which by definition are waves
that have a wave height that is more than double the significant wave
height.
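Equation 8.4 makes such statements easy to quantify. The Python sketch below
(the numbers are purely illustrative) evaluates the Rayleigh exceedance
probability for a few wave heights at a fixed significant wave height.

    import numpy as np

    def exceedance_probability(H, Hs):
        """Rayleigh probability (Eq. 8.4) that the wave height exceeds H, given swh Hs."""
        return np.exp(-2 * H**2 / Hs**2)

    Hs = 12.0                                  # significant wave height, meters
    for H in (12, 24, 26, 30):
        print(H, exceedance_probability(H, Hs))
    # A 24 m wave (twice the swh, i.e. a rogue wave by our definition) already
    # has probability e^-8, about 3.4e-4; a 30 m wave is down near 3.7e-6.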

example 8.1: shallow water rogue waves

Could a rogue wave occur in shallow water of only 20 meters in depth with
the top third average waves only around 2 meters in height? If one could
what would be the most probable rogue wave and with what probability
would it occur?
Solution: This problem is quite simple but requires careful thought on the
definition we established for a rogue wave. We stated that a rogue wave is a
wave with a wave height of twice that of the significant wave height, thus a
5-meter wave would be a rogue wave if the swh was 2 but not if it was 4. So if
we return to the question we can surely have a rogue wave in shallow water
with a swh of 2 meters if the wave height is greater than 4. This 4-meter wave
would also be the most probable rogue wave with a probability that is found
from Eq. 8.4. With our conditions we get P(4) = e^(−2(4)²/2²) = e^(−8) ≈ 3.4 × 10^−4.
These types of waves are referred to as shallow water rogue waves and are
just as freakish as the giant ones in the deep sea. The highest recorded ratio
H/HS was a wave height of 6m in conditions with a significant wave height
of 2m.

8.3 interference

While the Draupner wave was a tremendous breakthrough for the field
of oceanography, as it proved that such waves do exist and that there

were inaccuracies in the standard linear model, it did not show how
or why such waves form. In order to examine the possible causes for
these waves researchers looked at the areas in which they occurred. By
plotting areas in which freak wave phenomena were described in the
past a pattern was found. A great number of the waves occur in areas
where there are fast currents that run counter to the wave direction. The
most extreme example of this is the Agulhas current off of the Cape
of Good Hope and the tip of South Africa. The Agulhas current is the
fastest current on the planet. Figure 8.3 shows 14 reported freak waves
found in a study by Captain J. K. Mallory prior to 1975. His study
found similar results in the Gulfstream current, the Kuroshio current
and any others in the Pacific Ocean.

Figure 8.2: These are the results of a study by Captain Mallory done in 1976.
The circles represent cases where abnormal waves were observed, the dotted
line is the continental shelf, and the other depths show the continental slope.
The abrupt change in depth is what creates the Agulhas current and gives it its
jet like function parallel to the shoreline [49].

Traveling around the tip of South Africa has always been a rough
stretch for mariners, but the area of the Agulhas current created waves that
mariners said could reach 30 meters in wave height. In this region the
waves are generated off Antarctica and move northward towards South
Africa. They run unopposed until they meet the Agulhas current, which
moves in a southwest direction parallel to the east coast of South Africa.
Because of this area of interaction the wind waves are made up of two
different systems: the long swell waves coming from Antarctica
and the short, steep waves generated by the sea wind off South Africa. The
superposition of these two systems will undoubtedly contribute to an
increase in wave height due to possible constructive interference, but
not enough to account for the rogue waves reported in the area.
The solution to these types of waves relates to geometric optics. If
we think about a strong variational current as a lens then it could
focus opposing waves and create points where enough energy would
converge to explain the freak waves. This would be possible because
the areas of the strong flow of the current would retard the wave
advancement much more than the weaker flow, which would cause the
wave crests to bend. This refraction allows for the focusing of the wave
energy and allows the waves to grow in height, as shown in Figure 8.3.

The most commonly accepted research on this subject, variational
current focusing, was done by I. V. Lavrenov in the late 1990s. His work
showed that the focusing by currents would shorten the wavelength
of storm-forced waves that ran against the current and the wave trains
would press together possibly into a rogue wave. A major concept that
Lavrenov derived was that H/H0 would reach a maximum value of
2.19 for the conditions of the Agulhas current. In this formula H0 is
the mean height of the swell propagating on still water. From this we
can easily see that the maximum height for a wave caused by current
refraction would be 2.19 times the mean height of the swell.
Since rogue waves are not limited to occurring in areas where there
is a strong counter current, there has to be some explanation for these
other extreme waves. The solution is actually very similar to the one
just discussed. The thought was to look for other possible ways that
wave energy could be focused into one spot. The easiest way to do
this is through the ocean basin. It is not too hard to see that geometric
focusing can occur with the help of the underwater topography. The
behavior of the rays in the basin, however, is rather complicated as the
real topography creates many caustics.


Figure 8.3: a shows the interaction of the two systems that are present off the tip
of South Africa [49]. b shows the refraction that waves undergo when flowing
against a current that has a variable surface speed. This refraction can lead
to the build up of rogue waves when the wave orthogonals converge [Figure
made in Inkscape].

8.4 nonlinear effects

With the understanding of refraction applied to water waves oceanog-


raphers could generally predict where the rogue waves would occur.
Since the ships in the shipping industry are built to handle 15-meter
waves there was no need to travel through areas in which they could
encounter the rogue waves. The ships could simply travel another route
and there was no need to redesign millions of ships across the world.
This apparent success turned out to be rather short lived. In February
of 2001 two ships, the Caledonian Star and the Bremen, were nearly
sunk when they came into contact with 30-meter waves in a region of
the South Atlantic. In this region there is no current strong enough to
create the waves and the basin cannot focus enough energy to account
for the waves. Once again, something was missing from the deep ocean
wave models.
The breakthrough came from the world of quantum mechanics this
time, and specifically the non-linear Schrödinger equation. There is a
modified version of the non-linear Schrödinger equation that can be
used to describe motion in deep water. The simplified nonlinear model
of 2D deep water wave trains is

i(∂A/∂t + c_gr ∂A/∂x) = (ω_0/8k_0²) ∂²A/∂x² + (ω_0 k_0²/2) |A|²A,    (8.5)
and the surface elevation, η ( x, t) is given by

η(x, t) = (1/2)(A(x, t) e^(i(k_0 x−ω_0 t)) + c.c. + ...).    (8.6)
Here k0 and ω0 are the wave number and frequency of the carrier
wave, c.c. denotes the complex conjugate, (...) denotes the weak
higher harmonics of the carrier wave, and A is the complex wave
amplitude, which is a function of x and t. At this point what is known as
the Benjamin-Feir instability comes into play. It is very well known that a
uniform wave train with an amplitude of A0 is unstable to the Benjamin-
Feir modulational instability corresponding to long disturbances of
wave number, ∆k, of the wave envelope satisfying the relation

∆k/k_0 < 2√2 k_0 A_0.    (8.7)
The highest instability will occur at ∆k/k_0 = 2k_0 A_0, which gives a
maximum growth rate of ω_0 (k_0 A_0)²/2 [48]. There has been a great
deal of research on the nonlinear stage of the Benjamin Feir instability
analytically, numerically, and experimentally (see Grue and Trulsen
(2006) and Kharif (2003)). Through this research it is apparent that wave
groups can appear or disappear on a timescale of the order of
1/[ω_0 (k_0 A_0)²]. This behavior can be explained by breather solutions of the
non-linear Schrödinger equation. Figure 8.4 shows the formation of
waves that have very high energies and are created by the Benjamin
Feir instability. There are many different breather solutions that can
create various large waves. What is referred to as the algebraic solution
has a maximal wave height of 3 times the waves around it and some
solutions such as the Ma-breather and the Akhmediev breather are
periodic in time and space.

Figure 8.4: The evolution of a weakly modulated wave train (the numbers give the time
normalized by the fundamental wave period). A highly energetic wave group is
formed due to the Benjamin-Feir instability [48].
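The instability criterion (8.7) and the growth-rate estimate quoted above are
simple enough to evaluate directly. The Python sketch below is illustrative only:
the wavelength and amplitude are made-up numbers for a long ocean swell, and the
deep-water dispersion relation ω_0 = √(g k_0), which is not derived in this
chapter, is assumed in order to convert wavelength to frequency.

    import numpy as np

    g = 9.81                          # m/s^2
    wavelength = 200.0                # m, illustrative long swell
    A0 = 4.0                          # m, illustrative carrier amplitude

    k0 = 2 * np.pi / wavelength
    omega0 = np.sqrt(g * k0)          # deep-water dispersion relation (assumed)
    steepness = k0 * A0

    dk_max = 2 * np.sqrt(2) * steepness        # Eq. 8.7: unstable if dk/k0 < this
    growth_rate = 0.5 * omega0 * steepness**2  # maximum growth rate, per the text
    timescale = 1 / (omega0 * steepness**2)    # order of time for a group to appear
    period = 2 * np.pi / omega0

    print(f"k0*A0 = {steepness:.3f}, unstable for dk/k0 < {dk_max:.3f}")
    print(f"group forms over ~{timescale:.0f} s, about {timescale / period:.0f} wave periods")

For these (made-up) numbers a wave group can grow out of the background in a
matter of minutes, only a handful of wave periods, which is consistent with the
“out of nowhere” character of the reports described earlier.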

In short, these breather solutions are considered as simple analytical


models of freak waves. To put the analysis more qualitatively, these
models say that there are two types of waves in the deep sea. There
are waves that move around comprised of sines and cosines that can
become focused to create the special rogue waves, and then there are
these non-linear beasts that can really come out of nowhere. These
non-linear waves come from an almost rogue sea where as you watch
the sea state it is random yet tame and controlled. Then one of these
waves comes up by sucking energy from the waves around it in a very
bizarre non-linear fashion. This non-linear fashion also tends to help
describe the verticality of the waves described by the mariners. The
waves become so steep that they can actually break, and they have
been likened to the white cliffs of Dover [4]. These breaking waves can
exert pressures of as much as 100 tons per square meter, much greater than the
15 t/m² that the shipping industry designs its ships for. It would
be practically impossible to design ships to withstand these monsters
of the deep, and because of that they will remain untamed monsters.

8.5 conclusion

One area that has received the least amount of research to date is
the focusing potential of wind. Experimental research in this
area has shown that without wind, waves will focus at a fixed area
and create a rather high-amplitude wave, but with strong wind the
high-amplitude wave appears further downstream and with a higher
amplitude. Experimental results have also shown that rogue waves can
occur under the conditions of a strong wind when they typically would
not if the wind were not present. This line of research has led most to
believe that strong winds can increase the rate of occurrence of rogue
waves. The effect of the wind is to weaken the defocusing process,
which leads to a greater chance for a rogue wave. As a result of this
finding, it makes sense that rogue waves are more likely to occur when
storms with vicious winds are present.

8.6 problems

1. Under the simplified linear model what is the probability of


finding a 30-meter wave in a storm off the coast of Iceland in
which the average of the highest third of the waves is 12 meters?
Solution: In this problem the height of the wave and the significant wave
height are given, so the solution is rather simple and only requires the
use of Eq. 8.4. Since H is 30 and H_s is 12, we simply plug those into the
equation and get

P(H) = e^(−2H²/H_s²) ⇒ P(30) = e^(−2(30)²/12²) = 3.73 × 10^−6.

2. Which of the following is not a possible instigating factor for a tsunami?

a Meteorites landing in the ocean


b Underwater earthquakes
c Constructive interference of waves caused by wave trains in the deep
ocean
d Underwater volcanoes
Answer: c.

3. Which of the following are possible causes for rogue waves?

a Focusing of wave energy by ocean basins


b Focusing of wave energy by variational currents that run counter to the
swell
c Sudden changes in the depth of the ocean caused by a steep continental
slope
d Non-linear effects that lead to a giant wave from a random background
of smaller waves
Answer: a, b, and d.
PAUL HUGHES: FRICTION 9
9.1 overview

Friction represents one of the most basic and yet impenetrable
phenomena within the purview of classical physics. While its role
in an idealized system can appear deceptively simple, its mechanisms
and intrusions into realistic systems are of a complexity well beyond
the scope of any undergraduate course, much less a single textbook
chapter.

9.2 amontons/coulomb friction

The quintessential description of friction is that of Amontons at the


end of the 17th century, according to which friction is a force linearly
proportional to the normal force and independent of the macroscopic
surface area of contact. Nearly a century later, Coulomb (of electrostatic
fame) appended to this the observation that the proportionality of friction to
the normal force is independent of the relative velocity of the surfaces
in question. This coefficient of friction µ in the Coulomb/Amontons
formulation does vary in relation to the material compositions of the
interfacing surfaces, however.
In fact, every set of material interfaces has its own coefficient of friction.
This is where the matter begins to grow more complicated: applying
this simple Amontons-Coulomb description of friction by itself would
require a standardized table of friction coefficients for every possible
combination of materials, as well as sub-tables to account for any
lubricants or adhesives applied to the surface of interface. Velocity
and temperature also play complicated roles in determining the actual
friction force between two real surfaces.

example 9.1: simple friction

Figure 9.1: The basic Amontons-Coulomb model of friction, F_f = µF_n.


(Figure: a block of mass m on a conveyor belt inclined at angle θ; the belt moves with velocity v, and n and w denote the normal force and the weight.)

A block of mass m sits on a conveyor belt inclined at an angle θ from the


horizontal. The conveyor belt surface moves at a velocity ~v such that the block
remains stationary with respect to the surroundings. The coefficient of kinetic
friction between the block and the belt surface is µ_k = 0.3. Using only simple
Amontons/Coulomb friction, we can find θ(v).
First, we know that the normal force has magnitude n = w cos(θ): clearly,
n should equal w when θ = 0 and vanish when θ = π/2, and ~n points away
from the surface rather than downward like the weight ~w = m~g. From
the block's perspective, it appears to be sliding with velocity ~v_bl = −~v down
along the slope of the conveyor belt, driven by the component of gravity along
the incline, F_eff = mg sin(θ). It is opposite this velocity that a friction force
arises which, for the equilibrium condition stated above, satisfies

F_f = F_eff = mg sin(θ) = µ_k n    (9.1)

mg sin(θ) = µ_k mg cos(θ)    (9.2)

tan(θ) = µ_k    (9.3)

So we see that θ is in fact independent of the velocity ~v, at least in the Coulomb
representation of kinetic friction. The critical angle is θ_c = arctan(µ_k); with
µ_k = 0.3, this gives θ_c ≈ 17°.
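As a quick numerical footnote to the example (and nothing more), the Python
sketch below tabulates the critical angle for a few representative coefficients of
friction.

    import numpy as np

    # Critical angle theta_c = arctan(mu) from Example 9.1.
    for mu in (0.1, 0.3, 0.6, 1.0):
        print(f"mu = {mu:3.1f}  ->  theta_c = {np.degrees(np.arctan(mu)):4.1f} degrees")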

This direct relationship between incline slope and coefficient of friction is one method of measuring µ values for different materials. Another is the arrangement of a mass m1 on a string connected via a pulley to a mass m2 on a flat plane: m2, resting on the upper test surface (the lower test surface being fixed to the level plane), provides the normal force, and m1 provides the motive force which the friction force will resist [20]. The drawback, of course, is that all µ values must be acquired experimentally, and apply only to the specific materials tested.

9.3 toward a conceptual mesoscopic model

In actuality, friction is the result of a number of independent microscopic surface phenomena appearing in tangential interactions. These phenomena depend on the properties of the surface materials as well as the substrate materials, the geometries and small-scale topologies of the interacting surfaces, the viscosity and other properties of any interstitial substance such as a lubricant, the relative velocity of the interacting surfaces, and other influences. Each of these individual sources of friction may be described by a range of models, and any one model may address one or more of the above contributory phenomena [29].
Figure 9.2: a. Interface surfaces near contact, showing asperities. b. Interface surfaces under normal force Fn, showing some deformation of asperities. c. Interface surface with lubrication, showing fluid space-filling and "coasting" ability.

Amontons's familiar Ff = µFn law evolves on the microscopic scale from the assumption that the microscopic contact area A = αFn increases as the product of the normal force (the applied load deforms the surfaces and reduces the areas of separation) and the plastic deformability α characteristic of the weakest major constituent of the surfaces. At the same time, until the characteristic shear strength τ is exceeded, the deformation remains elastic, so that Ff = ταFn. Thus we see that µ = τα. Taking into account the direction of motion, the result
is the familiar friction law,

Ff = −µ~n · sgn(v), (9.4)

where sgn(v) = |v|/v. This model assumes that what we call “rough-
ness” on the macroscopic scale appears, microscopically, as “asperities”
or projections from the mean plane of each surface [29].
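As a compact restatement of Eq. 9.4, the Coulomb friction force can be encoded as a function of the normal force and sliding velocity; the minimal MATLAB sketch below uses an arbitrary example value of µ.

% Coulomb kinetic friction, Eq. 9.4: friction opposes the sliding direction
% and is proportional to the normal force.
mu = 0.3;                             % example coefficient of friction
Ff = @(n, v) -mu .* n .* sign(v);     % friction force for normal force n, velocity v
Ff(40,  2)                            % sliding in +v: returns -12 N
Ff(40, -2)                            % sliding in -v: returns +12 N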
One way in which this model can break down is in the case of
microscopically smooth, geometrically complementary surfaces: the
microscopic contact area reaches its maximum with very little surface
deformation, meaning that an increased normal force will no longer
increase shearing stress or friction. In such a case, various atomic forces
(such as covalent bonds, electrostatic attraction, and “Van der Waals”
forces between dipoles) are also particularly likely to play a role: as
a noteworthy example, the energy required to separate the interface
surfaces is equal to the free energy associated with their bonding, minus
the elastic potential energy made available as elastic deformation is
relaxed [28]. Obviously, then, a limit on elastic deformation implies
greater influence of adhesion forces as surfaces become smoother, and
when adhesive forces become involved, the friction quickly becomes less
Coulombic–that is, the relationship between friction force and normal
force becomes less linear–as a dependence on the area of interface is
introduced.
In the same way, lubricants serve to fill gaps and minimize the defor-
mation necessary for asperities to bypass one another by functioning
somewhat like ball bearings: asperities are permitted to “coast” past
one another, rather than striking, pressing, and producing a deforma-
tion on the scale of their geometric overlap—the friction gains, instead,
a dependence on the lubricant's viscosity. Unsurprisingly, given this conceptual model, the highest coefficients of friction are seen in materials which are either porous and highly elastic (e.g., rubber), or which otherwise exhibit a highly elastic "clinging" action (e.g., Velcro). Interfaces such as a highly porous, elastic rubber on hard, rough concrete can yield coefficients of friction well above µ = 1; others involving such materials as Teflon or ice (especially near the melting point, where the contact surface can melt and self-lubricate) can drop below µ = 0.04.

9.4 summary

Friction is one of the most familiar, everyday forces, experienced when-


ever we rub our hands together to warm them or wrestle with a jammed
door, but the mechanisms from which it evolves are very complicated
and remain fertile ground for the development of new descriptions.
With today’s faster and more powerful computer modeling, highly
sophisticated models of friction can be tested more completely than
ever before, and the advancement of nanotechnology may play a signif-
icant role in the experimental methods employed to verify new models’
predictions. The study of friction encompasses not only physics, but
also mechanical engineering, chemical engineering, and even consumer
products: for example, many cleaning products are specially engineered
so that their residues have frictional properties we associate with a
“clean” feeling.

9.5 problems

1. A model elevator of mass m = 1.5kg is allowed to slide freely


down a shaft, starting with v0 = 0 at height h0 = 4m. At h1 = 3m
a brake is activated, consisting of a brake pad with coefficient of
friction µ = 0.42 on a spring with spring constant k = 1000N/m.
The spring, whose natural length `n = 5cm is compressed to
` = 1cm. Where will the model elevator stop? If it strikes the
bottom of the shaft, what will its velocity be?
Fsp = −∆ℓ · k = −(ℓ − ℓn) · k = (0.04 m)(1000 N/m) = 40 N
v1² = v0² + 2g|∆h| = 2(9.8 m/s²)(1 m) = 19.6 m²/s² ⇒ v1 = −√19.6 m/s = −4.43 m/s (downward)
|~n| = |~Fsp|, ergo Ff = µ|~n| = µ · 40 N = 16.8 N, directed upward to oppose the motion
Ftotal = ~Ff + ~w = (16.8 N + (1.5 kg)(−9.8 m/s²)) ẑ = 2.1 ẑ N
atotal = Ftotal/m = (16.8 N / 1.5 kg − 9.8 m/s²) ẑ = 1.4 ẑ m/s²
The distance required to come to rest is d = v1²/(2 atotal) = 19.6/(2 · 1.4) = 7.0 m, which is more than the 3 m of shaft remaining, so the model elevator does not stop before reaching the bottom.
Its speed there follows from v2² = v1² − 2 atotal h1 = 19.6 − 2(1.4)(3) = 11.2 m²/s², so the elevator strikes the bottom of the shaft moving downward at about 3.35 m/s (roughly 0.77 s after the brake engages).
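A short MATLAB sketch (using the same numbers as above) confirms that the required stopping distance exceeds the remaining height:

% Model-elevator brake: does it stop before the bottom of the shaft?
m = 1.5; g = 9.8; mu = 0.42; k = 1000; dl = 0.04; h1 = 3;
v1 = sqrt(2*g*1);               % speed after 1 m of free fall
Ff = mu*k*dl;                   % friction force from the spring-loaded pad
a  = Ff/m - g;                  % net deceleration while moving downward
d_stop = v1^2/(2*a);            % distance needed to come to rest
v_bottom = sqrt(v1^2 - 2*a*h1); % impact speed, since d_stop > h1
fprintf('d_stop = %.1f m, v_bottom = %.2f m/s\n', d_stop, v_bottom)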

2. A block of mass m is dropped onto an inclined conveyor belt, with


which it has a coefficient of kinetic friction µ. The conveyor belt’s
upper surface is running at a rate v from the lower end toward
the higher end, a total distance of `, and its angle of inclination
from the horizontal is θ.
a. Assuming that θ is below θc, how long will the box slide on the conveyor belt?
We define the initial velocity of the block relative to the conveyor belt as vblock(t = 0) = −v.
Thus Ff = µn = µmg cos(θ), opposed by mg sin(θ), so
Ftotal = mg[µ cos(θ) − sin(θ)] and a = g[µ cos(θ) − sin(θ)].
The box may accelerate to match the conveyor belt's velocity, in which case
v1 = v0 + at, where v1 = v and v0 = 0, so tstop = v / (g[µ cos(θ) − sin(θ)]).
Conversely, the box may still be sliding relative to the conveyor belt when it reaches the end:
s1 = s0 + v0 t + (1/2)at², where s0 = v0 = 0 and s1 = ℓ, so tedge = √( 2ℓ / (g[µ cos(θ) − sin(θ)]) ).
Whichever happens first determines how long the box slides on the belt and when it reaches its peak velocity.

b. If m = 1 kg, µ = 0.75, v = 20 m/s, θ = π/12, and ℓ = 2 m, at what horizontal distance d from the edge of the conveyor belt will the block land?
First, tedge = √( 2(2 m) / (9.8[0.75 cos(π/12) − sin(π/12)]) ) = 0.936 s,
and tstop = 20 / (9.8[0.75 cos(π/12) − sin(π/12)]) = 4.38 s.
Therefore, we see that the block leaves the upper edge of the conveyor belt at t = 0.936 s, moving up the incline with velocity vedge = at = g[µ cos(θ) − sin(θ)]t = 4.27 m/s, at a height hedge = ℓ sin(θ) = 0.518 m.
From here, the problem is simply one of projectile motion:
hgnd = 0 = hedge + vedge sin(θ)t − (1/2)gt², so tland = 0.457 s and
d = vedge cos(θ) tland = 1.89 m.
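The numbers above can be checked with a short MATLAB sketch using the same parameters as in part b:

% Block on an inclined conveyor belt: time on the belt and landing distance.
g = 9.8; mu = 0.75; v = 20; th = pi/12; L = 2;
a = g*(mu*cos(th) - sin(th));            % acceleration up the incline
t_edge = sqrt(2*L/a); t_stop = v/a;      % block leaves the belt if t_edge < t_stop
v_edge = a*t_edge; h_edge = L*sin(th);   % launch speed and height at the upper edge
t_land = (v_edge*sin(th) + sqrt((v_edge*sin(th))^2 + 2*g*h_edge))/g;  % fall time
d = v_edge*cos(th)*t_land;               % horizontal distance from the edge
fprintf('t_edge = %.3f s, t_land = %.3f s, d = %.2f m\n', t_edge, t_land, d)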

3. A very soft, microscopically smooth surface is in contact with


a hard, smooth surface. The foremost contributor to the friction
force is likely to be:
a. A very strong normal force.
b. Lack of lubrication.
c. Elastic deformation.
d. Interatomic forces.
e. Particulate contamination between the surfaces.

d: Since the surfaces are smooth, increasing normal force will


play relatively little role in increasing the friction force. There
is little asperity interaction to be mitigated by lubrication or to
produce significant elastic deformations. Because the first surface
is soft, contamination would tend to deform it and press into the
deformed recess, reducing the contaminating particles’ influence.
10 KEITH FRATUS: NEUTRINO OSCILLATIONS IN THE STANDARD MODEL
10.1 introduction

Arguably one of the most peculiar aspects of quantum mechanics that most undergraduates are familiar with is the principle of superposition. That is, because a quantum system is treated
in ways very similar to wave-like phenomena, it is possible to take
two quantum states and add them together in a linear combination,
generating another valid quantum state. But as it turns out, this “weird”
property of quantum mechanics can be extended to some even more
counter-intuitive realms. It just so happens that a fundamental particle
in and of itself can exist as a linear combination of two other particles. This
amazing property of fundamental particles can lead to some very inter-
esting phenomena, such as the one we will discuss in this chapter - that
is, neutrino oscillations. But before we get into just exactly what these
neutrino oscillations are, let’s start with a quick review of quantum
theory, and also touch upon what it is we currently know about the
fundamental particles of nature.

10.2 a review of quantum theory

As we all have been taught in our introductory quantum mechanics


classes, the theory of quantum mechanics deals with the idea of a
“state,” which a particle (or system of particles) may happen to find
itself in. Every observable property of that particle is represented by an
“operator,” or a transformation that acts on the state of the particle. We
learn that the eigenfunctions (or eigenstates, as they are called) of this
operator represent a set of possible states the particle can attain, and
the corresponding eigenvalues represent the values of the observable
property associated with that state. If a particle happens to find itself
in a particular eigenstate of an observable’s operator, every time that
observable is measured, it will return the corresponding eigenvalue
associated with that state [37].
For example, we can have the “spin” operator,
Sz = ( h̄/2     0
        0    −h̄/2 ),    (10.1)

which is the operator corresponding to the z-component of intrinsic


angular momentum for a particle with a spin of 1/2, in matrix form.
Because our operator is represented by a two by two matrix, this
observable can only acquire two eigenvalues, which correspond to
the spin pointing along or opposite to the z-axis. Actually, different
components of angular momentum never commute in quantum theory,
which means their values can never be simultaneously known. Thus, if
the spin were to be measured to be completely along the z direction, this
would imply that we know the x and y components to be zero, which
is not possible. So, in reality, these two states represent the projection


of spin along the z axis being either along the positive or negative z
axis [37].
We can represent the two eigenstates as vectors in a complex-valued
Hilbert Space:
( 1 )      ( 0 )
( 0 )  ;   ( 1 ).    (10.2)

The theory makes a restriction on the form of the operators that can
represent observables, which is that they must be hermitian. Because
of this fact, all possible eigenvalues must be real, and the eigenstates
of the operator must form a basis of all possible states that the particle
can attain. In our example of spin, this means that the particle’s z-
component of intrinsic angular momentum can in general be a linear
combination of the two basis states, or,
( a )        ( 1 )        ( 0 )
( b )  =  a  ( 0 )  +  b  ( 1 ),    (10.3)

where a and b are some given complex numbers. But the question now
arises, what value will we get when we measure the z-component of
spin, for the particle in this generalized state? The answer is that we
could get either of the two eigenvalues, with some given probability.
The probability of returning each eigenvalue is given by the square of
the norm of the coefficient on the associated eigenstate. This, in essence,
is the principle of superposition in quantum mechanics [37].
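As a minimal numerical sketch of this rule (the state below is an arbitrary example, not one from the text), the measurement probabilities are simply the squared norms of the coefficients:

% Measurement probabilities for the spin state a|up> + b|down>:
% each probability is the squared norm of the corresponding coefficient.
a = 1/sqrt(3); b = 1i*sqrt(2/3);    % an example normalized state
P_up   = abs(a)^2                   % 1/3
P_down = abs(b)^2                   % 2/3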
Because there are often many physical properties that a particle can
possess, there will generally be many different hermitian operators
that can act on the state of a particle. Each of these operators will thus
define a new basis in which we can define the state of our particle,
corresponding to the eigenstates of this operator.

10.3 the standard model

The world as we know it today, at least from the perspective of physics,


can generally be summed up in a theory dubbed the “Standard Model.”
This theory asserts that all physical phenomena can be traced to
a few fundamental particles, interacting through a few fundamental
interactions. This is often described pictorially, as in Figure 10.1.
There are two types of fundamental particles, fermions, which have
half-integer spin, and bosons, which have integer spin. The fermions
are generally what we would consider particles that make up matter,
and the bosons are the particles that transmit the fundamental “forces”
that act on the fermions. The fermions tend to be arranged in three
different “families,” or “generations,” with the particles in one family
being identical to the particles in the next family, except for mass (in
the figure, the families are delineated by the first three columns). For
example, in the figure, an “up quark” is a fermion, and it interacts with
other quarks partly through interaction with what is called the “strong
force.” The strong force is mediated by the gluon, which is a boson,
and we can think of it sort of like the two quarks interacting with each
other by passing gluons back and forth. Quarks can combine together
to make composite particles, such as protons (two up quarks and a
down quark) or neutrons (two down quarks and an up quark) [88].

Figure 10.1: The Standard Model of elementary particle physics. Image courtesy
Fermilab.

The fermions come in two types, the quarks, which are in the first
two rows, and the leptons, which are in the last two rows, with one
of the biggest differences among them being that the leptons do not
interact with the strong force. In reality, matter particles can form
composites which have integer spin and are thus bosons, and the
particular bosons represented in the standard model are specifically
referred to as gauge bosons. But in terms of fundamental particles, we
can safely make the distinction between fermions and bosons without
too much consequence. This general analogy of passing bosons back
and forth to mediate a force can be extended to the rest of the standard
model. For each interaction in the standard model, there is a set of
mathematical rules describing how it acts on certain particles, and one
or more bosons that mediate it. In addition to the strong force, there is
the familiar electromagnetic force, mediated by the photon, and also
the weak force, something we will discuss in more detail later. One of
the more peculiar aspects of the standard model is that the strong force
actually grows stronger with distance, as if the quarks were attached by
springs. The result of this is that a free quark can never be observed, and
that the existence of quarks must be inferred by indirect means. There
are some unresolved issues in the standard model, some of which hint
at physics beyond the standard model, but aside from this, the standard
model, for the most part, still represents our current understanding of
fundamental physics [88].
The origin of these particles is described by something called “Quan-
tum Field Theory.” Quantum Field Theory (or QFT) generally states
that each of the fundamental particles is sort of a “bump” in something
called a quantum field, that can move through space. For example,
there is an electron field that extends throughout space, and actual
electrons are like ripples moving across the “surface” of that field (of
course, this visual analogy can only be pushed so far). Speaking some-

what more mathematically, the fields that exist throughout space have
certain normal modes of oscillation, and when energy is input into
these fields, the resulting excitations of the field are what we experience
as fundamental particles [14, 78].
Furthermore, QFT associates with each of these fields something
called a “creation operator” (there are also annihilation operators, but
never mind that for now). Many students have encountered these sorts
of operators without ever having been aware of it. A common problem
in introductory quantum mechanics is to solve for the possible energy
eigenstates of the simple harmonic oscillator. One method for solving
this problem is to use lowering and raising “ladder” operators, which
create lower and higher energy states from an existing one. When we
take an eigenstate of the simple harmonic oscillator, and apply the
lowering operator to it, we attain the eigenstate with one less unit
of energy. What we are actually modeling when we do this is the
creation of a photon that is emitted from the system, as it carries
away the difference of the energies of the two states. In QFT, these
operators “create” all of the possible particles. A given operator acts
on the vacuum of space, and “creates” an instance of the particle
associated with that operator. For example, the operator associated with
the electron field “creates” instances of electrons. When a photon turns
into an electron-positron pair, the creation operators for the electron
and positron model this process (along with the annihilation operator
for the photon) [37, 14].

10.4 the weak and higgs mechanisms

While all of this may seem like useless information, seeing as how we
do not intend to pursue all of the associated mathematical details of
Quantum Field Theory, there is an important caveat to all of this. As we
mentioned before, there are often many different operators associated
with a particle. This is also true in QFT. Looking at Figure 10.1, we
see a type of boson called a W particle, which is responsible for medi-
ating an interaction called the weak force. In essence, the weak force
is responsible for transforming one type of particle into another. The
W boson actually represents two different particles, the W − and W +
particles, which are named according to their electric charge. Because
of this, a particle must lose or gain a unit of charge when emitting a W
particle (remember that forces are generally mediated by the exchange
of bosons). Since total electric charge must always be conserved, and
because every particle has an intrinsic value of electric charge, this
means that one type of particle must transform into another type of
particle when emitting a W boson. There is actually another type of
particle associated with the weak force, the Z0 boson, but it is involved
in a slightly different interaction. It is responsible for the scattering
of one particle off of another, similar to how charged particles can
scatter off of one another via the electromagnetic interaction. This re-
semblance to electromagnetic phenomena is actually due to a very deep
connection between the electromagnetic and weak forces. Electroweak
theory states that above a certain energy threshold, electromagnetic and
weak interactions become indistinguishable, and the two interactions
are “unified.” Of course, this subject could easily form the basis of an
entirely separate chapter, but for now we can safely concentrate on just

the W bosons (though, not to discourage the reader from investigating


this subject on his or her own!) [76].
As we can see in Figure 10.1, the fermions appear to come in pairs.
Each particle in a pair is capable of turning into the other particle in the
pair via the weak force. As a matter of fact, the weak interaction actually
predicted the existence of the charm, top, and bottom quarks, partly
due to this pair behavior. For example, it is possible for a down quark
to turn into an up quark, and for a muon to turn into a muon neutrino.
Of course, there is a (somewhat complex) mathematical formalism
that describes how all of this occurs, which involves a set of quantum
mechanical operators. So in some sense, remembering our previous
discussion, it is possible to create a “basis” in which we describe
the action of the weak force, represented by the eigenstates of these
quantum mechanical operators. These eigenstates are often referred to
as the “weak states” of the given fundamental particles. In short, we
can describe the behavior of the fundamental particles of nature by
discussing them in the context of how the weak force acts on them [76].
As a matter of fact, the weak force represents a sort of tarnished
beauty in the standard model. It turns out that the weak force is
capable of predicting the existence of the fundamental fermions, but, by
itself, it predicts that they should all be massless. Of course, we know
that particles in the real world have mass, so something else must be
responsible for giving mass to the fundamental particles of nature. This
mechanism is referred to as the Higgs mechanism. It creates “Higgs
bosons” out of the vacuum, and these particles interact with the other
particles in the standard model, creating a “drag” on them which we
observe as inertia. Of course, it also has its own associated mathematical
operators, which constitute another “basis” in which to describe the
different particles of nature. The Higgs boson is not included in the
standard model currently, because, at the time of publication, it has still
not been definitively detected. However, so much of physical theory
depends on its existence that it is essentially assumed to exist [76, 88].
But here’s the crucial point to all of this: it turns out that the manner in
which the two mechanisms describe the fundamental particles of nature
are actually different. In other words, when we approach the theory from
the standpoint of the weak force, and describe the interactions of the
particles from this perspective, we find that the interactions behave
differently than if we were talking about them from the standpoint of
the Higgs mechanism! For example, an up quark defined by the weak
force is not the same particle as an up quark defined by the Higgs mechanism.
So we in essence have two different descriptions of the natural world.
How are we to reconcile this? [76]
There is actually a simple solution to this dilemma. Quantum mechan-
ics postulates that it is indeed possible to describe a physical system
from the vantage points of two different mechanisms (or operators, or
whatever you care to call your mathematical devices), and that these
two representations can be related via a change of basis, which is very
straightforward from a mathematical perspective, using the language
of linear algebra. To talk about how we would do this, let’s consider
the simplified case where we only consider the first two families of
fermions (which, as it turns out, is actually a fairly reasonable simplifi-
cation). For now, let’s look at the first two pairs of quarks. Because the
phenomenon we are dealing with is related to the weak force, which
acts on particles in pairs, we need only consider one particle from each

pair of quarks, since the weak interaction will take care of how the
second particle from each pair is affected by the phenomenon. Let’s
work with the down and strange quarks, and call them d’ and s’ when
referring to their representation by the weak force, and d and s when
discussing them in terms of the Higgs mechanism. Quantum theory
says that these two representations should be related to each other by a
sort of “change of basis” matrix in some given Hilbert space in which
we describe the state of our physical system, which will look something
like
( d )     ( cos θ   −sin θ ) ( d′ )
( s )  =  ( sin θ    cos θ ) ( s′ ).    (10.4)
The angle θ is a measure of how much of a “rotation” there is between
the two bases. If we expand this relation, we get
d = cos θ d′ − sin θ s′    (10.5a)

s = sin θ d′ + cos θ s′.    (10.5b)


But look at what we are saying: it is possible for one particle to
actually be represented as a linear combination of other particles. This
may seem incredibly counter-intuitive, and perhaps non-sensical or
even impossible, but the truth of the matter is that in a mathematical
sense, this explains a whole host of phenomena that would appear
as inexplicable anomalies otherwise. One example that illustrates this
quite simply has to do with the rate of neutron and lambda beta decay
(see the first homework problem) [76].
Actually, there are other scenarios in which we can speak of linear
combinations of particles. It turns out that if we have two particles
whose interactions are identical, it should be possible to interchange
them, and the physics of the situation should be the same overall. As
a matter of fact, we can even take a generalized linear combination of
these particles, and the physics will be the same. Suppose we have two
particles in a physical system, called m and n. We can represent this as
a vector:
( m )
( n ).    (10.6)
It is then actually possible to apply a “rotation” to this set of particles,
just as for the weak and mass states of the down and strange quarks. If
we have a unitary matrix M that is two by two and has a determinant
equal to one, then applying this matrix to the above vector actually
represents a new set of particles, ones which do not change the overall
physics of the system (matrices of this form are members of a group
of matrices called SU(2), or the special unitary group of degree two,
which has tremendous implications for high-energy physics). If you
have objections to the fact that two particles can be added to each other
like mathematical objects, welcome to the world of quantum mechanics.
While it may seem questionable to speak of fundamental particles in
this way, it fits quite naturally into the quantum mechanical description
of the universe, in which a given physical system can lend itself to a
variety of outcomes upon measurement, each outcome being weighted
with a certain probability. Only now, we are talking about the very type
of particle that we will detect upon measurement [14].

10.5 the origin of neutrino oscillations

But back to the issue of the weak and Higgs mechanisms. It is often
customary, as mentioned previously, to call d and s “mass states,” and
d’ and s’ “weak states.” Each of them can be represented as linear
combinations of the others, and are related by the equation given
previously. This analogy actually extends to other particles. In particular,
we will consider the neutrinos, which are in the bottom row of Figure
10.1. Recent evidence suggests that they behave in a manner similar
to quarks, in that their mass states and weak states are different (for
a while it was believed that neutrinos were massless, and hence were
not subject to this behavior, but this has been shown to not be true).
Neutrinos are fermions, and there are three known types of them. They
each come paired with another fermion, one that is either the electron
(in the case of the electron neutrino), or one that is similar to the electron
(either the muon or tau particle, being identical to the electron, except
for mass) [14].
So what is the significance of this property of neutrinos? It actually
just so happens that this property of neutrinos is responsible for some-
thing called neutrino oscillations. That is, it is possible for one type
of neutrino to turn into another type of neutrino as it travels through
space. This can also occur for quarks, but because of the fact that a
free quark can never be observed, it involves the composite particles
made up of quarks, so the details are somewhat simpler for the case of
neutrinos. The case of neutrinos is also particularly relevant, not only
because this phenomenon occurs to a greater extent among neutrinos
than it does for quarks, but because the existence of neutrino oscilla-
tions implies that they interact with the Higgs mechanism, and thus
indeed have mass, something that was not believed to be true for several
decades. Neutrino oscillations are also significant in the sense that they
are of great experimental interest. Our current models that describe
the nuclear activity in the sun predict that we should be detecting a
greater number of electron neutrinos being emitted from it than we
actually are. It is believed that the explanation for this discrepancy is
that of neutrino oscillations; that is, electron neutrinos emitted from the
sun have the ability to change into other types as they travel between
the sun and earth. There are of course many other reasons why the
phenomenon is of great interest, but suffice it to say, it is a behavior
that beautifully demonstrates some of the basic properties of quantum
mechanics, without too much mathematical complexity [14].
Having discussed the significance of these “oscillations,” let’s see if
we can get a basic mathematical derivation of them. For the sake of
simplicity, we’ll consider the case of neutrino oscillations among the
first two families. That is, oscillation among the electron and muon
neutrinos. Once again, this is a somewhat reasonable approximation to
the more complex case of three-family mixing. Because each neutrino
comes paired with an electron-type particle by the weak force, we can
consider the mixing matrix with respect to just one particle in each of
the pairs. In other words, we can write the mixing equation with respect
to just the neutrinos. If we denote the electron and muon neutrinos with
respect to the Higgs mechanism as e and µ, respectively, and denote

them as e’ and µ’ with respect to the weak mechanism, the general


relation between the two representations will be given by:
( e′ )     (  cos θ   sin θ ) ( e )
( µ′ )  =  ( −sin θ   cos θ ) ( µ ),    (10.7)

where θ is the angle that describes the “rotation” among the repre-
sentations (note that we are now referring to the weak states as the
result of the mass states being rotated through some angle, instead
of the other way around; this is possible because rotations are always
invertible). Let’s consider a muon neutrino that is created through the
weak force. Because it is created through the weak force, and the weak
force operates with respect to weak states, the neutrino can definitely
be said to be a muon neutrino with respect to the weak mechanism. In
other words, it is in a pure weak state. This means that it can be given
by a linear combination of mass states, or,

|µ′⟩ = − sin θ |e⟩ + cos θ |µ⟩,    (10.8)

in traditional ket notation.


We now want to determine how this state evolves with time. A
particle that is moving freely through empty space of course has no
time-dependent influences acting on it, so we can treat this as a typical
time-independent problem in quantum mechanics. If our weak state
is treated as a superposition of mass states, then to describe the time-
dependence of our neutrino, we simply add the time-dependent factor
to each mass state. So we can write

|ν(t)⟩ = − sin θ |e⟩ e^{−iEe t/h̄} + cos θ |µ⟩ e^{−iEµ t/h̄},    (10.9)

where Ee and Eµ are the energies of the electron and muon neutrino
mass states, respectively. We use the letter ν now, because as our state
evolves over time, it will not necessarily be a muon neutrino, or even
an electron neutrino; in general it will be some linear combination of
the two.
What we are interested in, of course, is the probability that we will
detect the original neutrino as being an electron neutrino some time
after it is created. To do this, we follow the standard prescription of
quantum mechanics, which says that the probability of measuring an
electron neutrino is the square of the norm of the projection of our
state at a given time along the basis vector representing the electron
neutrino. Now, when we detect a neutrino, we do so by studying the
way it interacts with matter via the weak force. So any neutrino we
detect will be in a definite weak state, since the weak force always acts
on weak states. This means we need to calculate the projection of our
state onto the weak state of the electron neutrino. Symbolically, we can
write this as

P(µ′ → e′) = |⟨e′|ν(t)⟩|².    (10.10)

If we take this inner product in the basis of the mass states, then
mathematically, our expression for the inner product becomes
⟨e′|ν(t)⟩ = ( cos θ,  sin θ ) ( −sin θ e^{−iEe t/h̄}
                                 cos θ e^{−iEµ t/h̄} ).    (10.11)

The computation of the dot product is straightforward, and leads to

⟨e′|ν(t)⟩ = − cos θ sin θ e^{−iEe t/h̄} + sin θ cos θ e^{−iEµ t/h̄},    (10.12)

which allows us to write

P(µ′ → e′) = (cos θ sin θ)² | −e^{−iEe t/h̄} + e^{−iEµ t/h̄} |².    (10.13)

The square of the norm of a complex number of course is that number


multiplied by its complex conjugate, so, using this fact, along with a
trigonometric identity for the sinusoidal factor, we have

P(µ′ → e′) = ((1/2) sin 2θ)² ( −e^{−iEe t/h̄} + e^{−iEµ t/h̄} )( −e^{iEe t/h̄} + e^{iEµ t/h̄} ).    (10.14)

If we expand this expression, we come to

P(µ′ → e′) = (1/4) sin² 2θ ( e⁰ + e⁰ − (e^{i(Ee−Eµ)t/h̄} + e^{−i(Ee−Eµ)t/h̄}) ),    (10.15)

which of course we can simplify to

P(µ′ → e′) = (1/4) sin² 2θ ( 2 − 2 cos(∆Et/h̄) ),    (10.16)

where ∆E is the difference between the two energies, and we have used
the definition of the cosine function in terms of complex exponentials.
If we factor out the two, and use the trigonometric identity
2 sin²(χ/2) = 1 − cos χ,    (10.17)
then this ultimately simplifies to

P(µ′ → e′) = sin² 2θ sin²(∆Et/2h̄)    (10.18)

for the probability that we will measure an electron neutrino instead of


the original muon neutrino. Amazingly, this real world problem with ex-
perimental relevance indeed lends itself to such a simple mathematical
treatment.
One of the most immediately obvious things about this expression
is the factor determined by the rotation angle. Note that this factor
can be any value between zero and one, determined entirely by the
rotation angle. If the rotation angle happens to be equal to π/4, then
the factor will be equal to one, and for any angle between zero and π/4,
the factor will be less than one. If we consider the second sinusoidal
term, the one that oscillates with time, we can see that the factor
determined by the rotation angle becomes the amplitude of these
oscillations in time. So the probability of measuring an electron will
reach a maximum when it is equal to the factor determined by the
rotation angle, since the maximum value of the oscillation term is of
course one, being a sinusoidal function. So immediately we see that the
amount of rotation, or “mixing” between the two families of neutrinos
is what determines the maximum probability of measuring a switch in
the type of neutrino. Anything less than 45 degrees of rotation implies
that the muon neutrino will never have a one hundred percent chance

of being measured as an electron neutrino, implying that the state of


the neutrino will never be exactly that of an electron neutrino (speaking
in terms of weak states, of course).
It is also immediately apparent that the oscillations depend on the
energy difference of the two mass states. This is a feature common
to most quantum mechanical systems, since the Hamiltonian, or total
energy, is what determines the time evolution of a state. The best way
to explore this phenomenon is with an example, similar to what might
be encountered when working in an actual neutrino experiment.
example 10.1: a relativistic neutrino experiment
Suppose we have a scenario in which muon neutrinos are created with a
given energy at a known location, an energy large enough that they are
highly relativistic. Experimentally, we would like to know the probability of
measuring an electron neutrino at a given distance from the source of muon
neutrinos. Let’s see how we could go about doing this.
First, we’ll make an approximation when it comes to the energy of the
neutrinos. We know from special relativity that the energy of an object can
be given by
E = (p²c² + m²c⁴)^{1/2},    (10.19)

where p is the momentum of the object, c is the speed of light, and m is the
mass of the object. If we factor out the first term in the square root, we can
write
E = pc ( 1 + m²c²/p² )^{1/2}.    (10.20)
Using a Taylor expansion, the approximation
(1 + x)^{1/2} ≈ 1 + x/2    (10.21)
allows us to write the energy as being approximately

E ≈ pc ( 1 + m²c²/2p² ),    (10.22)
or,
E ≈ pc + m²c³/2p.    (10.23)
This approximation is reasonable, because our neutrinos will generally be
moving relativistically, and this implies that the energy attributable to the
mass of the particle is small compared to the kinetic energy of the particle,
and the second term in equation 10.20 is small enough for the approximation
to be valid.
If we now consider the energy difference of the two mass states, we can
use the above approximation to write

∆E = Ee − Eµ ≈ pc + me²c³/2p − pc − mµ²c³/2p,    (10.24)
which simplifies to

∆E ≈ (c³/2p)( me² − mµ² ),    (10.25)
or,
∆E ≈ (c³/2p) ∆m²,    (10.26)


Figure 10.2: The probability to measure an electron neutrino as a function of


distance from the source. See the third homework problem for the parameters
used. Image created in MATLAB.

where ∆m2 is the difference of the squares of the two different masses.
We have assumed that the momentum of the two mass states is the same,
which is a somewhat reasonable approximation. Because the neutrinos move
relativistically, we can assume that the distance x traveled by a neutrino is
related to the time after creation by
t ≈ x/c.    (10.27)
If we now use the approximations given in equations 10.26 and 10.27, our
expression for the probability of measuring an electron neutrino is

P(µ′ → e′) ≈ sin² 2θ sin²( c²∆m² x / 4ph̄ ).    (10.28)

Once again using a relativistic approximation, we can say that the overall
energy of the neutrino, Eν , is roughly given by

Eν ≈ pc. (10.29)

Also, because the masses of elementary particles are typically measured in


units of eV/c2 , it is convenient to introduce a factor of c to the numerator
and denominator of the argument to the time dependent sinusoidal term. So
if we make these last two changes to our expression, our final result is
P(µ′ → e′) ≈ sin² 2θ sin²( c⁴∆m² x / 4h̄cEν ).    (10.30)

Figure 10.2 shows the probability to measure an electron neutrino as


a function of the distance from the source, using the relation derived
in the example. The parameters used are those given in the third homework problem.
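A short MATLAB sketch in the same spirit (using the experimentally motivated values quoted in the homework problems: θ = 0.59 rad, ∆m² = 8 · 10⁻⁵ eV²/c⁴, and Eν = 1 MeV) reproduces a curve of this kind:

% Two-flavor oscillation probability, Eq. 10.30, as a function of distance.
hbarc = 197.327e-9;                 % hbar*c in eV*m
theta = 0.59;                       % mixing angle (rad)
dm2   = 8e-5;                       % c^4*(mass-squared difference) in eV^2
E     = 1e6;                        % neutrino energy in eV
x = linspace(0, 15e4, 1000);        % distance from the source in meters
P = sin(2*theta)^2 * sin(dm2*x/(4*hbarc*E)).^2;
plot(x, P); xlabel('Distance (m)'); ylabel('Probability to detect an electron neutrino')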

10.6 implications of the existence of neutrino oscillations

There are several reasons why our final expression in example 10.1 is
of great interest to experimentalists. First, notice that the oscillations
explicitly depend on the difference of the squares of the two masses. If
the difference were to be zero, then our expression would be identically

zero for all time, implying that there would be a zero probability to mea-
sure an electron neutrino. In other words, there would be no oscillation.
Because neutrino oscillations have now been experimentally verified,
we know that neutrinos must indeed have mass (since the masses can-
not all be zero and different from each other at the same time), which
has tremendous implications for several branches of physics, from high
energy physics to cosmology.
The fact that neutrinos have mass shows a similarity between quarks
and leptons. Because quarks and (some) leptons were known to have
mass, it was known that quarks and (some) leptons must interact with
the Higgs mechanism in some way. Thus, it was surprising that only
quarks exhibited oscillation, and not leptons. Now that leptons have
been shown to exhibit oscillation, it shows a similarity between the
two groups of fermions. This is crucial for “Grand Unified Theories,”
theories which hypothesize that above some threshold energy, all of
the interactions in the standard model unify into one fundamental
interaction that acts amongst all of the fermions in the standard model
on an equal footing. The fact that we see slight differences amongst the
different fermions, but in general they have the same behavior, suggests
a sort of “broken symmetry” that would presumably be restored above
this energy threshold. This energy threshold (referred to as the Planck
energy) however is believed to be around 10²⁸ eV, which is much higher
than the energies that can be reached in modern accelerators (the Large
Hadron Collider at CERN will only be able to reach energies of 14
TeV) [76, 14].
The fact that neutrinos have mass is also of interest to cosmologists,
since it could have an influence on the rate of expansion of the universe.
Neutrinos are generally created quite frequently in nuclear processes, so
a large number of them flood the universe. Because of this, the fact that
they have mass has serious implications for the energy-mass density of
the universe, which, according to the theory of general relativity, affects
the manner in which the fabric of space-time deforms and evolves with
time [76].
The first real evidence for neutrino oscillations was provided by the
Sudbury Neutrino Observatory in Sudbury, Ontario, Canada, which
detected neutrinos from the sun. While previous experiments had
strongly suggested the existence of neutrino oscillations, the experi-
ment in Canada was the first one to definitively identify the number
of neutrinos of each type, and the experimentally determined num-
bers agreed with theoretical predictions. Previous experiments had
only shown a lack of electron neutrinos being detected from the sun,
and gave no information regarding the number of neutrinos of other
types [14].
Amazingly, without resorting to any of the more advanced mathemat-
ical tools of Quantum Field Theory, it is possible to get a quantitative
understanding of a quantum mechanical effect that has enormous
implications for modern research in physics. Some of the homework
problems investigate the subject with specific numerical examples, to
give a better idea of some of the numerical values actually involved
in neutrino oscillations. Of course, any reader who is interested is
encouraged to research the subject further on his or her own, since
Quantum Field Theory is a very complex, intellectually rewarding sub-
ject that should begin to be accessible to students towards the end of
their undergraduate study.

10.7 problems

1. A proton, as mentioned in the text, is made up of two up quarks


and a down quark, while a neutron is composed of two down
quarks and an up quark. Neutron beta decay is the process in
which a down quark inside of a neutron turns into an up quark
by emitting a W − boson, thus turning the neutron into a proton.
There is a certain characteristic rate at which the weak force acts,
and we might at first expect the rate of neutron decay to be equal
to this characteristic rate. However, the observed rate of neutron
decay is slightly less than this. This can actually be explained by
the fact that the weak states of quarks are not the same as the
mass states of quarks. The quarks that form composites are mass
states, so the quarks inside of the neutron and proton are linear
combinations of weak states. If we simplify the mixing to the first
two families, we can write

d = cos θ d′ − sin θ s′    (10.31a)

s = sin θ d′ + cos θ s′,    (10.31b)

where, as usual, d and s are the mass states, and d’ and s’ are the
weak states. Explain why this might account for the rate of
neutron decay being less than expected. The mismatch between
these two types of quarks also accounts for the existence of an-
other type of decay. A lambda particle is one that is comprised
of an up, down, and strange quark. One of the ways it can decay
is to a proton, where the strange quark turns into an up quark.
This is called lambda beta decay. One might expect this decay
type to be prohibited, since the up and strange quark are in two
different families. Explain how this decay might be possible, and
how it is related to the previously mentioned issue of neutron
beta decay [76].
Solution: The key to understanding this problem is realizing that, as
mentioned before, the weak force only acts on weak states, and that the
weak force is the mechanism that drives the two decay types. Each of the
mass states in the quark composites is a linear combination of two weak
states, so each has some probability of being measured as one of the two
weak states. In the neutron, the down quark that we expect to decay into
the up quark is actually a linear combination of the two weak states, and
the probability that it will be a down quark weak state is proportional to
the square of the coefficient in the linear combination that is applied to
the down quark weak state. This coefficient is the cosine of the rotation
angle, which implies that squaring this term will give us the probability
that we will select the down quark weak state. Of course, this implies
that there is some probability that with respect to the weak force, we
will have a strange weak state, which is not allowed to decay into an
up quark. So the probability that the neutron will not be able to decay
into a proton is given by the square of the sine of the rotation angle. So
the decreased rate is essentially explained by the fact that in some sense,
the down quark mass state in the neutron is not always a down quark
with respect to the weak mechanism, and so the neutron will not always
decay to the proton.
This also explains how a lambda particle can decay into a proton. The
strange quark in the lambda particle likewise has some probability of

being a down quark with respect to the weak force, given by the square
of the sine of the rotation angle. So there is a finite probability that the
down quark weak state component of the strange quark mass state will
interact with the weak force, allowing the lambda particle to decay into a
proton.
The rate of neutron beta decay will be proportional to the probability
that the down quark mass state in the neutron is in the weak down quark
state, which is given by the square of the cosine of the rotation angle.
The rate of lambda beta decay will be proportional to the probability
that the strange quark mass state in the lambda particle is in the weak
down quark state, which is given by the square of the sine of the rotation
angle. The sum of these two probabilities of course gives unity, so in
some sense, the rate of decay of the lambda particle “makes up for” the
missing rate of neutron beta decay.
Experimentally, the rotation angle between the first two families is found
to be about 0.22 radians, for a decreased rate of neutron decay of about
four percent [76].
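A two-line MATLAB sketch of these rate factors (using the 0.22 rad mixing angle quoted above) makes the bookkeeping explicit:

% Two-family quark mixing with a rotation angle of about 0.22 rad:
% neutron beta decay is suppressed by cos^2(theta), and lambda beta decay
% proceeds at a rate proportional to sin^2(theta).
theta_C = 0.22;
neutron_factor = cos(theta_C)^2     % ~0.95 of the naively expected neutron rate
lambda_factor  = sin(theta_C)^2     % ~0.05, the portion appearing in lambda decay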

2. For the final expression derived in example 10.1, find the spatial wave-
length of oscillation.
Solution: The wavelength of the square of the sine function will be the
first value of the distance that returns the sinusoidal function to zero,
since the square of the sine function oscillates back and forth between
zero and one. The sine of any integer multiple of π will be equal to zero,
so if λ is the wavelength of oscillation, and x is the distance from the
source, then the argument that will be equal to π when the distance x
from the source is equal to the wavelength is simply πx/λ. Equating this
to the expression for the argument found in the example, we have

πx/λ = c⁴∆m² x / 4h̄cEν.    (10.32)
Rearranging this, we have

λ = 4πh̄cEν / c⁴∆m²    (10.33)
for the spatial wavelength of oscillation. Note that a larger energy im-
plies a longer oscillation wavelength, since the neutrino will travel a
longer distance in the amount of time it takes to oscillate from one type
of neutrino to another. A larger difference in the masses will cause a
smaller wavelength, since the rate of oscillation increases with larger
mass difference. Note that the wavelength of oscillation is not affected by
the rotation angle among the two families of neutrinos.
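For concreteness, this wavelength can be evaluated in MATLAB for the parameters used in the next problem (Eν = 1 MeV, ∆m² = 8 · 10⁻⁵ eV²/c⁴):

% Spatial oscillation wavelength, Eq. 10.33, with c^4*dm^2 expressed in eV^2.
hbarc = 197.327e-9;                 % hbar*c in eV*m
E     = 1e6;                        % neutrino energy in eV
dm2   = 8e-5;                       % c^4 * (mass-squared difference) in eV^2
lambda = 4*pi*hbarc*E/dm2           % ~3.1e4 m, consistent with Figure 10.2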

3. Using the result derived in the example, find the probability that the
emitted neutrino will be measured as an electron neutrino one hundred
kilometers from the source if it has an energy of one million electron volts.
The experimentally determined value for the mixing among the first two
families is 0.59 radians, and the value found for the difference between
the squares of the masses is 8 · 10⁻⁵ eV²/c⁴ (see Figure 10.2) [14].
Solution: If we square the sine of twice the rotation angle, then the value
we attain is

sin² 2θ = sin²(1.18) ≈ 0.8549.    (10.34)

If we substitute the value of the constants into the expression for the
argument to the second sinusoidal function, we have
c⁴∆m² x / 4h̄cEν = [ (c⁴) · (8 · 10⁻⁵ eV²/c⁴) · (10⁵ m) ] / [ (4) · (197.327 · 10⁻⁹ eV · m) · (10⁶ eV) ],    (10.35)

where we look up the value of h̄c. All of the other constants cancel, and
the value of the argument becomes roughly 10.1355. Thus, the probability
is approximately

P(µ′ → e′) ≈ sin²(1.18) sin²(10.1355) ≈ 0.364.    (10.36)

Note that because of the value of the first sinusoidal term, the maximum
probability to measure an electron neutrino at any given time is roughly
eighty-five percent.

10.8 multiple choice test problems

1. Which of the following categories would a neutrino fall under?


a) Gauge bosons
b) Quarks
c) Leptons
d) Massless particles

2. Which of the following particles have half-integer spin?


a) Only gauge bosons
b) Only quarks
c) All bosons
d) All fermions

3. What is the quark content of a neutron?


a) u, u, d
b) u, d, d
c) u, d, s
d) u, u, s
Part III

INFORMATION IS POWER
11 ANDY O'DONNELL: FAST FOURIER TRANSFORM
11.1 introduction

Fourier Transforms can be found all throughout science and engineering. In Physics they are often used in Quantum Mechanics
to convert the wave function in position space into the wavefunction
in momentum space. In Electrical Engineering it is used to convert
a signal from the time domain into the frequency domain. The tool
of the Fourier Transform is essential for analyzing the properties of
sampled signals. However, many times scientists and engineers are
not dealing with known analytic functions but are rather looking at
discrete data points and therefore they must use the discrete form of
the Fourier Transform. The Discrete Fourier Transform (DFT) has one major drawback: it is computationally slow. It requires N² computations, where N is the number of samples, and N is often large because high sampling rates are used to lower the error. Therefore we will derive and explore the radix-2 Fast Fourier Transform (FFT) for the reader. We will also show that the number of operations we have to perform is (N/2) log₂(N). This chapter is divided into three sections.
The first section will give a quick introduction into the continuous
Fourier Transform and its inverse. The second section will introduce
the Discrete Fourier Transform and talk about its inverse. The third
section will talk about the Fast Fourier Transform and a derivation of
the Radix-2 FFT algorithm will be given.

11.2 fourier transform

The Fourier Transform is given by


X(f) = ∫_{−∞}^{∞} x(t) e^{−j2πft} dt.    (11.1)

Where x(t) is some function of time, j is the imaginary unit √−1,
and f is the frequency. The Fourier Transform uses two mathematical
properties to extract the frequency. In the above, we can think of x (t)
as a superposition of sines and cosines. The second term in the above
equation, e− j2π f t , can be expanded using Euler’s Formula

e^{jθ} = cos(θ) + j sin(θ)    (11.2)

Due to the orthogonality of the trig functions, it can be seen that only at certain values of 2πf will the integral survive.
The equation for the inverse continuous Fourier Transform is given
by
x(t) = ∫_{−∞}^{∞} X(f) e^{j2πft} df.    (11.3)

Given the information on the frequency of a signal, we can then find


the time domain of that signal. [54]


11.3 discrete transform

On the computer we do not deal with continuous functions but rather with discrete points. The discrete Fourier Transform and its inverse are

X(m) = ∑_{n=0}^{N−1} x(n) e^{−j2πnm/N}.    (11.4)

x(n) = (1/N) ∑_{m=0}^{N−1} X(m) e^{j2πnm/N}.    (11.5)

Where m is the frequency domain index, n is the time domain index,


and N is the number of input samples. According to the Nyquist criterion,
if the sampling rate is greater than twice the highest frequency, then
the discrete form is exactly the same as the continuous form. So with
that condition in mind, we can easily make the transition from the
continuous form of Fourier Transform to the discrete Fourier Transform.
Here is an example of a discrete Fourier Transform. [54]

example 11.1: simple sine function in matlab


Let us look at the Fourier Transform of the function sin(t). If the above
statement about orthogonality is true, then we expect that only one frequency will survive when we apply the Fourier Transform to sin(t). Here is some simple code for evaluating the Fourier Transform of sin(t) in MATLAB. Here we use the built-in function 'fft' in MATLAB. We can effectively think of it as a very good approximation to the continuous Fourier Transform.

%MATLAB Code to compute the Fourier Transform of sin(t)


clear all
max=360;
for ii=1:max
x(ii,1)=ii*(pi/180);
y(ii,1)=sin(x(ii,1));
index(ii,1)=ii/360;
end;
Y=fft(y);

figure(1)
hold on;
clf;
title('Fourier Transform of sin(t)')
xlabel('Frequency Hz')
ylabel('Magnitude of X(m)');
plot(index,abs(Y),'.-') %plot the magnitude of the transform

figure(2)
hold on;
plot(index,y)
title('sin(t)')
xlabel('Time in seconds');
ylabel('x(n)')

As seen from the figures of the signal and its Fourier Transform, there is only one position in frequency space where we find it. This is backed up by our expectations.

Figure 11.1: One period of a sine function.

11.4 the fast fourier transform

As you can see above, for the Discrete Fourier Transform to be useful, there needs to be a faster way to compute it. For example, for a value of N = 2,097,152, the Fast Fourier Transform algorithm would take your computer about 10 seconds, while the Discrete Fourier Transform described above would take over three weeks. [3] This next section will get into the details and
derivation of the radix-2 FFT algorithm. First, we start off with our
original definition of the discrete Fourier Transform. [46]

X(m) = ∑_{n=0}^{N−1} x(n) e^{−j2πnm/N}.    (11.6)


Figure 11.2: The FFT of a sine function. Notice how it only has discrete components.

For the next part of this, we split the sums into evens and odds and
get

X(m) = ∑_{n=0}^{(N/2)−1} x(2n) e^{−j2π(2n)m/N} + ∑_{n=0}^{(N/2)−1} x(2n+1) e^{−j2π(2n+1)m/N}.    (11.7)
In the second term, we can easily pull out the phase angle. Doing that
we find
X(m) = ∑_{n=0}^{(N/2)−1} x(2n) e^{−j2π(2n)m/N} + e^{−j2πm/N} ∑_{n=0}^{(N/2)−1} x(2n+1) e^{−j2π(2n)m/N}.    (11.8)
Next we need to define some notation to simplify the exponential
terms. We shall now use the following notation.

W_N = e^{−j2π/N},   W_N^n = e^{−j2πn/N},   W_N^2 = e^{−j2π·2/N},   W_N^{nm} = e^{−j2πnm/N}    (11.9)
Using this notation, we can then replace the above equation with

X(m) = ∑_{n=0}^{(N/2)−1} x(2n) W_N^{2nm} + W_N^m ∑_{n=0}^{(N/2)−1} x(2n+1) W_N^{2nm}.    (11.10)

Then, through algebraic manipulation we know that W_N^2 = e^{−j2π·2/N} = e^{−j2π/(N/2)} = W_{N/2}. This allows us to change the equations to

X(m) = ∑_{n=0}^{(N/2)−1} x(2n) W_{N/2}^{nm} + W_N^m ∑_{n=0}^{(N/2)−1} x(2n+1) W_{N/2}^{nm}.    (11.11)

Next let’s consider the X (m + N/2) case and we find that

X(m+N/2) = \sum_{n=0}^{(N/2)-1} x(2n) W_{N/2}^{n(m+N/2)} + W_N^{m+N/2} \sum_{n=0}^{(N/2)-1} x(2n+1) W_{N/2}^{n(m+N/2)}.   (11.12)
Next we can use the fact that

W_{N/2}^{n(m+N/2)} = W_{N/2}^{nm} W_{N/2}^{nN/2} = W_{N/2}^{nm} \left(e^{-j2\pi n}\right) = W_{N/2}^{nm}(1) = W_{N/2}^{nm}.   (11.13)
We then call the expression in front of the second summation the twiddle factor, which we can simplify as

W_N^{m+N/2} = W_N^{m} W_N^{N/2} = W_N^{m} \left(e^{-j2\pi N/(2N)}\right) = W_N^{m}(-1) = -W_N^{m}.   (11.14)
We can then plug this in to find that

X(m+N/2) = \sum_{n=0}^{(N/2)-1} x(2n) W_{N/2}^{nm} - W_N^{m} \sum_{n=0}^{(N/2)-1} x(2n+1) W_{N/2}^{nm}.   (11.15)

The only difference between the X(m+N/2) and X(m) equations is the minus sign on the second summation. This means we only need to evaluate the two half-length sums for the first N/2 values of m and can then reuse them to find the final N/2 outputs. This is the Fast Fourier Transform, because it greatly simplifies the computational work that has to be done. [54]
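To make the decimation concrete, here is a minimal recursive radix-2 FFT sketch in MATLAB. It is only a sketch under the assumption that the input length is a power of two, and the function name radix2fft is our own; it computes the two half-length DFTs of the even- and odd-indexed samples and then combines them with the twiddle factors exactly as in equations (11.11) and (11.15).

function X = radix2fft(x)
% Minimal radix-2 decimation-in-time FFT sketch (save as radix2fft.m).
% Assumes x is a column vector whose length is a power of two.
N = length(x);
if N == 1
    X = x;                      % a single sample is its own DFT
    return;
end
Xe = radix2fft(x(1:2:end));     % DFT of the even-indexed samples
Xo = radix2fft(x(2:2:end));     % DFT of the odd-indexed samples
m  = (0:N/2-1).';
W  = exp(-1j*2*pi*m/N);         % twiddle factors W_N^m
X  = [Xe + W.*Xo;               % X(m), as in equation (11.11)
      Xe - W.*Xo];              % X(m+N/2), as in equation (11.15)
end

Its output can be checked against MATLAB's built-in fft; for instance, radix2fft(sin((0:7)'*pi/4)) agrees with fft(sin((0:7)'*pi/4)) to within round-off. For the N = 2,097,152 example quoted earlier, this decimation needs roughly (N/2) log_2 N, about 2.2 × 10^7 multiplications, instead of the N^2, about 4.4 × 10^12, required by the direct sum.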

11.5 multiple choice

Question: The number of multiplication steps in the FFT is

A) N^2
B) N
C) N^3
D) (N/2) \log_2(N)

Answer: D. See the text above.

11.6 homework problems

1) If N = 4, what are the four terms of the summation for each output of the Discrete Fourier Transform?
Answer: Let N = 4 and write out the four terms of each of the four Discrete Fourier Transform outputs.

X(0) = x(0)cos(2π·0·0/4) − j x(0)sin(2π·0·0/4)
     + x(1)cos(2π·1·0/4) − j x(1)sin(2π·1·0/4)
     + x(2)cos(2π·2·0/4) − j x(2)sin(2π·2·0/4)
     + x(3)cos(2π·3·0/4) − j x(3)sin(2π·3·0/4)

X(1) = x(0)cos(2π·0·1/4) − j x(0)sin(2π·0·1/4)
     + x(1)cos(2π·1·1/4) − j x(1)sin(2π·1·1/4)
     + x(2)cos(2π·2·1/4) − j x(2)sin(2π·2·1/4)
     + x(3)cos(2π·3·1/4) − j x(3)sin(2π·3·1/4)

X(2) = x(0)cos(2π·0·2/4) − j x(0)sin(2π·0·2/4)
     + x(1)cos(2π·1·2/4) − j x(1)sin(2π·1·2/4)
     + x(2)cos(2π·2·2/4) − j x(2)sin(2π·2·2/4)
     + x(3)cos(2π·3·2/4) − j x(3)sin(2π·3·2/4)

X(3) = x(0)cos(2π·0·3/4) − j x(0)sin(2π·0·3/4)
     + x(1)cos(2π·1·3/4) − j x(1)sin(2π·1·3/4)
     + x(2)cos(2π·2·3/4) − j x(2)sin(2π·2·3/4)
     + x(3)cos(2π·3·3/4) − j x(3)sin(2π·3·3/4)

As you can clearly see, the amount of multiplication that needs to be done grows very quickly as N increases. The number of multiplications required by the direct discrete Fourier Transform is N^2, as can be seen above. [54]
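As a hedged cross-check of the expansion above (the variable names are ours, and any length-4 signal will do), the direct N^2 summation can be evaluated in MATLAB and compared with the built-in fft:

% Direct O(N^2) evaluation of X(m) = sum_n x(n) exp(-j*2*pi*n*m/N)
x = [1; 2; 3; 4];               % an arbitrary length-4 test signal
N = length(x);
X = zeros(N,1);
for m = 0:N-1
    for n = 0:N-1
        X(m+1) = X(m+1) + x(n+1)*exp(-1j*2*pi*n*m/N);   % one multiplication per (m,n) pair
    end
end
disp([X fft(x)])                % the two columns agree to within round-off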
2) Prove linearity of the discrete Fourier Transform; that is, show that the transform of the sum x_1(n) + x_2(n) is X_1(m) + X_2(m).
Proof:

X_{sum}(m) = X_1(m) + X_2(m)   (11.16)

X_{sum}(m) = \sum_{n=0}^{N-1} \left(x_1(n) + x_2(n)\right) e^{-j2\pi nm/N}   (11.17)

= \sum_{n=0}^{N-1} x_1(n) e^{-j2\pi nm/N} + \sum_{n=0}^{N-1} x_2(n) e^{-j2\pi nm/N} = X_1(m) + X_2(m).   (11.18)

[46]
12 tim mortsolf: the physics of data storage
12.1 introduction

In this chapter we are going to apply some physical principles to modern data storage technology. Digital technology exploded in the last quarter of the twentieth century with the invention of the microprocessor. However, without the data storage devices that were developed in parallel, the computer revolution would not have been pervasive to all areas of society. What good would a computer be to the banking industry if it could not store your account information? More importantly, what good would your desktop computer be if it had no hard disk drive to store your operating system or desktop applications? The first several sections of this chapter introduce
some basic concepts of digital storage, such as binary number systems,
that are required to understand the remaining parts of the chapter. A
reader experienced with computer science concepts may wish to skip
over these sections and start with section 12.4. Hard disk drives are
the most common storage device used by modern computers and the
next section is devoted to explaining the physical principle behind
these devices. In particular, we explain some of the physics behind
giant magnetoresistance, a technology which has enabled disk drives
to exceed storage capacities of up to 1 Terabyte.

12.2 bits and bytes — the units of digital storage

Before we can begin to discuss how data is physically stored, we need


to understand the numerical units that are used by computers and
data storage devices. People count using a base-10 number system.
This means that each digit of our number system takes on one of ten
different values, which are the digits 0 through 9. Most scientists believe
that we use a base-10 number system because we have 10 fingers and
counting is something that we first learn to do with these 10 fingers.
Digital computers use a base-2 number system. The digits used by a
computer are called binary numbers and these digits can take on only
two values — either 0 or 1. The reason that computers use a base-2
number system instead of a base-10 number system is because it is
much easier to design physical devices that have only two states, on
and off, as opposed to a device that has ten different states. A single binary digit is called a "bit". Both digital computers and digital data
storage devices operate at the physical level on these binary bits of data.
Larger numbers are represented by stringing several bits together. Table
12.1 shows how binary numbers with three bits are used to represent
eight different numeric values.
When N binary bits are combined to form a number, the number of
unique numbers M that they can represent is determined by

M = 2^N.   (12.1)


Table 12.1: Numeric values represented by 3-bit binary numbers

Binary/Base-2 Number    Numeric Value
000                     0
001                     1
010                     2
011                     3
100                     4
101                     5
110                     6
111                     7

If we invert this equation by taking the base-2 logarithm of both sides, then we get a formula for the minimum number of binary digits required to represent M different values,

N = \log_2 M.   (12.2)

A "byte" is a a number formed by collecting eight binary bits to


form a single number that can take on 256 different values. The earliest
common microprocessors operated on 8-bit quantities and had eight
separate wires to signals of these bits between the microprocessor and
the memory. Today’s microprocessors operate on 64-bit numbers but
we still use bytes as the most unit to express the size of a digital storage
device. Prefixes are used to represent large numbers of bytes like the
amount used in data storage systems. Commonly used prefixes are: a
kilobyte for 210 or 1,024 bytes, a megabyte for 220 or 1,048,576 bytes,
and a gigabyte for 230 or 1,073,741,824 bytes.
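As a small sketch of equations (12.1) and (12.2) in MATLAB (the variable names are ours), we can count how many values a given number of bits can represent, and how many bits a given number of values requires:

% Equation (12.1): N bits represent 2^N values
N = 8;
fprintf('%d bits can represent %d different values\n', N, 2^N);   % a byte: 256 values

% Equation (12.2): M values need log2(M) bits, rounded up to a whole bit
M = 256;
fprintf('%d values need at least %d bits\n', M, ceil(log2(M)));   % back to 8 bits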

example 12.1: number of bits on a cd-rom


A standard 120 mm CD-ROM contains 703.1 megabytes of data. How many
binary bits of data does a CD hold?

Solution A megabyte is 1,048,576 bytes and each byte consists of 8 bits, so we can compute the number of bits by simply multiplying these numbers together,

1 CD = 703.1 megabytes × (1,048,576 bytes / 1 megabyte) × (8 bits / 1 byte) = 5,898,030,285 bits.

example 12.2: using binary digits to store dna


DNA sequences are represented by biochemists with a string of alphabetic
letters that represent the primary sequence of a DNA strand. For example,
the string AGCTCGAT is a DNA sequence made of eight DNA bases. For
almost all situations there are only four DNA bases in a DNA sequence
that we represent by 4 letters: A for adenine, G for guanine, C for cytosine,
and T for thymine. How many bits of data does it take to store the value of a DNA base? If we are instead required to use bytes to store these values, what is the greatest number of DNA bases that we could store in a byte?

Solution Since there are only four different values of the DNA base,

we can simply use formula (12.2) to solve for the number of bits that are required,

number of bits = \log_2(4) = 2.

There are several different schemes that we could use to encode these bases. Here is one that encodes the bases in alphabetical order of their symbols.
Binary/Base-2 Number DNA base
00 A (adenine)
01 C (cytosine)
10 G (guanine)
11 T (thymine)
A byte has eight bits of information. Since each DNA base requires
2 bits for encoding, then a byte can store the value of 8/2 = 4 DNA bases in
a sequence. To represent sequences longer than four DNA bases we would
simply use additional bytes.
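As an illustrative sketch only (the alphabetical 2-bit codes come from the table above, while the packing loop and names are ours), four bases can be packed into one byte in MATLAB:

% Pack the 4-base sequence 'GATC' into a single byte using 2 bits per base
seq   = 'GATC';
codes = containers.Map({'A','C','G','T'}, {0, 1, 2, 3});   % alphabetical 2-bit codes
byteval = 0;
for k = 1:length(seq)
    byteval = byteval*4 + codes(seq(k));    % shift left by two bits, then add the next code
end
fprintf('%s packs into byte value %d (binary %s)\n', seq, byteval, dec2bin(byteval, 8));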

12.3 storage capacity is everything

When customers are purchasing new computer hardware, the most


important considerations are performance and price. The manufacturers
of computer components are under competitive pressure to regularly
come out with new devices with better performance metrics. When it
comes to data storage devices, and hard disk drives in particular, the
metrics that matter are: storage capacity, transfer rate, and access time;
of these, storage capacity is by far the most important to the average
consumer.
Ever since the digital computer arrived, there have been two important trends in storage capacity: the storage capacity and the transfer rates have increased simultaneously. This is because the areal density of the hard disk drive surfaces has increased ever since they were invented. Areal density, or bit density, is the number of bits that can be stored in a certain amount of "real estate" of the hard drive platter's surface. The units for areal density are BPSI (bits per square inch). Although we are probably far away from the peak of this trend [25], in the long term the trend appears to run up against the quantum mechanical uncertainty principle. When we consider the future of data storage, one limit that scientists envision is the ability to store a bit of information using a single atom. It might be possible or even practical to store more than one bit of information in an atom, but for the sake of this discussion let's assume that the one-bit-per-atom limit will one day be achieved. Quantum computers have the ability to store one bit of information per degree of freedom and provide a reasonable hope that this limit can be reached. As the amount of matter used to store a single bit of information shrinks down to the size of an atom, the energy of the matter becomes very small. The Heisenberg uncertainty principle relates the spread in energy ∆E of a quantum state to the amount of time ∆t it takes for the state to evolve to an orthogonal state with the relation

∆E∆t ≥ h̄.

Figure 12.1: Transfer rates of magnetic disk drives and microscopic materials. The transfer rate for single bits of information on a magnetic drive has continued to improve as the density has increased, but we may be approaching a limit where a trade-off is required. The data point for the silicon atom shows the transfer rate of gold atoms bonded to a silicon surface that were read with a scanning tunneling microscope (STM). The data rate depicted for DNA is presumably how fast the DNA is able to be transcribed into an RNA molecule. The rate of RNA transcription is typically greater than 50 RNA bases per second. (Courtesy of [42])

This has been extended to relate the evolution time to a particle with average energy E, which we can rearrange to determine the uncertainty in time as [56]

\Delta t = \frac{\pi \hbar}{2 \Delta E}.
This physical principle implies that in the future, when the densities
used to store bits of information approach the quantum mechanical
limits, there will be a trade off between storage density and transfer
rates. In fact, this is already a limit that we see in microscopic systems
such as gold atoms on a silicon surface and the decoding of DNA.
DNA is very dense, much denser than any man-made storage device by several orders of magnitude. DNA uses approximately 32 atoms for each nucleotide of a DNA sequence; these are the letters A, G, C, and T that one learns about in an introductory biology class. Figure
12.1 shows the trend of how storage density and transfer rates have
increased throughout the history of HDD development but drop off
tremendously for atomic systems.

12.4 the physics of a hard disk drive

Hard disk drives (HDDs) are the major data storage devices used by today's desktop computers for permanent storage of data. Although HDDs were introduced into the marketplace in 1956 by IBM, the

Figure 12.2: Diagram of the major internal components of a hard disk drive. (Courtesy of [63])

earlier desktop PCs used removable floppy disk drives (FDDs) because
of the high cost and relatively low data storage of HDDs available
at the time. During the early 1990s, HDDs quickly supplanted FDDs
in desktop PCs as technology permitted HDDs to be made at lower
cost and with higher storage capacity. In this section we will explain
how hard disk drives work and the physics of giant magnetoresistance
(GMR), which is a newer technology that has enabled modern HDDs
to have data storage capacities of 1 Terabyte.

12.4.1 Hard Disk Drive Components

Figure 12.2 shows a basic structure of the internal components of a


HDD. We are going to focus on the three main components that deal
with how the binary data is magnetically stored inside the hard drive:
the platter, the read/write heads, and the magnetic surface. This section
serves to introduce the reader to how modern disk drives work and cannot begin to fully describe the inner workings and technologies that these devices encompass. The references at the end of this chapter
contain suggested books for an interested reader who wishes to learn
more details about hard disk drive technology.

Hard Disk Platters


An HDD is a sealed unit that stores digital bits of data on hard disk platters that are coated with a magnetic surface. These platters are composed of two types of material — a substrate that forms the mechanical structure of the platter, and a magnetic surface that coats the substrate and stores the encoded data. The platters used in modern disk drives are made of ceramic and glass and use both surfaces of the platter to store data. Modern HDDs contain more than one platter for two reasons: first, this increases the data capacity of the drive, and second, this increases the data transfer rate since each read/write head

can transfer data simultaneously. The HDD platters are attached to a


spindle that rotates very rapidly and at a constant speed. The rotational
velocity of HDDs is specified in revolutions per minute (rpm) and is an
important parameter for the consumer. As you can guess, the higher
the rpm, the higher the data transfer rate and the lower the data access
times. Premium desktop HDDs that are available in November of 2008
rotate at 10,000 rpm and have data transfer rates of about 1 Gbit/s.
The information on a hard disk platter is organized at the highest level into "tracks". A track contains a ring of data on a concentric circle of the platter surface. Each platter contains thousands of tracks and each track stores thousands of bytes of data. Each track is divided into smaller segments called sectors. A sector holds 512 bytes of data plus some additional bytes that contain error correction codes used by the drive to verify that the data in the sector is accurate.

Read/write heads
HDD read/write heads read data from and write data onto a hard disk
platter. Each platter surface has its own read/write head. A disk drive with five platters has ten heads, since each side of a platter has a magnetic surface. The heads are attached to a single actuator shaft
that positions the heads over a specific location of the platter. The drive
controller is the electronic circuitry that controls the rotation of the
drive and the position of the heads. The computer interfaces with the
drive controller to read and write data to the drive. The computer refers
to a specific location on a hard drive by using a head number, cylinder
number, and sector number. This drive controller uses the head number
to determine which platter surface the data is on, the cylinder number
to identify what track it is on, and the sector number to identify a
specific 512 byte section of data within the track. To access the data, the
drive controller instructs the actuator to move to the requested cylinder.
The actuator moves all of the heads in unison to the cylinder, placing
each head directly over the track identified by the cylinder number.
The drive controller then requests the head to read or write data as the
sector spins past the read/write head.
When the hard disk drive is turned off, the heads rest on the platter surface. When the drive is powered up and begins to spin, the air pressure from the spinning platters lifts the heads slightly to create a very tiny gap between the heads and the platter surface. A hard drive crash occurs when a head hits the surface while the platter is spinning and scratches the magnetic media.

Magnetic surface
The magnetic surface, or media layer, of a hard disk platter is only a few millionths of an inch thick. The earlier hard drive models used iron oxide media on the platter surfaces because of its low cost, ease of manufacturing, and ability to maintain a strong magnetic field. However, iron oxide materials are no longer used in today's HDDs because of their low storage density. Increases in storage density require smaller magnetic fields so that the heads only pick up the magnetic signal of the surface directly beneath them.
Modern disk drives have a thin-film media surface. First a base
surface called the underlayer is placed onto the platter using hard
metal alloys such as NiCr. The magnetic media layer is deposited on

top of the underlayer. This surface is formed by depositing a cobalt-alloy magnetic material using a continuous vacuum deposition process called sputtering [68]. The magnetic media layer is finally covered with a carbon layer that protects the magnetic layer from dust and from head scratches.

12.4.2 Hard Disk Magnetic Decoding and Recording

The read/write heads in modern HDDs contain separate heads for


the reading and writing functions. Although the technology for both
heads has improved, most of the increases in data density have come
from technology improvements of the read head. Over the last decade,
the technology for read write heads has evolved through several re-
lated by different physical processes. The original heads used in HDDs
were ferrite heads that consisted of an iron core wrapped with wind-
ings, similar to that used to form a solenoid. These heads were able
to perform both a reading and writing function, but the density and
data rates were terrible by today’s standards. One of the technologies
that was applied in the 1990s to led to the widespread acceptance of
HDD technology was the use of the magnetoresistance effect. AMR
(anisotropic magnetoresistance) read heads utilize the magnetoresis-
tance effect to read data from a platter. The next breakthrough was the
incorporation of GMR (giant magnetoresistance) read heads that utilize
the giant magnetoresistance effect. We will cover the physical processes
of magnetoresistance and GMR technology in the next section; this is
an effect that leads to much higher areal densities than AMR and is led
to the explosion of HDD capacities after the turn of the century.
Most HDDs in use today orient the magnetic fields parallel to the
magnetic surface along the direction of the track. This magnetization
scheme is referred to as LMR (longitudinal magnetic recording). Recently, the growth rate of HDD capacities has slowed because of limitations in how densely the thin-film media can reliably store magnetic bits of information. The fine structure of the magnetic cobalt alloys
in the magnetic media consists of randomly shaped grains that come in
a variety of different sizes. Each bit that is written onto the surface must
be stored in nearly 100 grains in order for the information to be reliably
stored [95]. One problem that arises as HDDs encode magnetic infor-
mation at higher densities is that thermal energy can excite these grains
and reverse the magnetization of regions on the surface. The amount of
thermal energy required to reverse the magnetization is proportional
to the number of grains used to store the magnetic information. PMR
(perpendicular magnetic recording) is a new technology that is being used to push the density envelope of information stored on the magnetic surface even further. PMR, as its name suggests, encodes the magnetic
bits in up and down orientations that are perpendicular to the media
surface. These bits are more resilient to thermal fluctuations, but re-
quire a stronger magnetic field to write the information into the grains.
Bits encoded with PMR also produce a sharper magnetic response, so
not only can they be more densely encoded into the surface, but they
are also easier for the disk read head to interpret. Figure 12.3 shows
a picture of LMR and PMR magnetically recorded bits on a media
surface. Notice how the waveform produced at the read head is much
sharper for PMR encoding, even though the regions used to store the

Figure 12.3: Depiction of the magnetic bit orientations and read head signals of media surfaces encoded using LMR (longitudinal magnetic recording) and PMR (perpendicular magnetic recording) technology. (Courtesy of [95])

information on the surface are smaller. In the future, all HDDs will
probably use PMR until a newer technology comes along to replace it.

12.4.3 Giant Magnetoresistance

Magnetoresistance is a physical process that causes the resistance of a


material to change when it is in the presence of a magnetic field. Let’s
explain how a HDD uses the physical process of magnetoresistance to
detect the magnetic signals on a HDD. The drive controller applies a
voltage to the read heads and detects the current that passes through
them; in this way the drive controller acts as an ammeter. As the magnetic surface spins beneath a read head, the head sits in the magnetic field induced by the cobalt-alloy magnetic grains on the thin-film media surface directly beneath it. The precise electrical resistance of the read head depends on the magnitude and direction of the magnetization of these grains. Because the HDD platter is spinning, the magnetic field at the head, and hence its electrical resistance, changes in direct
detects these changes in resistance by measuring the current that passes
through the read head. Thus, the drive controller is able to interpret the
magnetic field information that is encoded on the surface of the disk.
The drive controller has very accurate knowledge of which portion of
the disk is under the read head at any given time and is thus able to
determine the values of all the magnetic bits stored in a 512-byte sector.

As we saw in the last section, the density of HDD information is intrinsically related to how small an area the HDD can use to reliably store the magnetic information so that it cannot be flipped by thermal energy. Shrinking this area is one important way that HDD density is increased, but it is not all that is required. The HDD read heads must also be able to reliably read back the information that is encoded ever more densely onto the surface. Just before the turn of the century, one major barrier to improved HDD storage capacity was that HDD read heads were unable to keep pace at reliably reading magnetic bits at the increased storage densities that were required. As the size of the region used to store each bit of magnetic information shrinks, the magnetic fields become smaller and the resistance changes induced by magnetoresistance become weaker. At a small enough size, the signal becomes so weak that the signal-to-noise ratio is too low to reliably decode the information. The breakthrough that addressed this problem was the incorporation of GMR (giant magnetoresistance) technology into the HDD read heads. The physics of magnetoresistance has been understood for over a century, so it is not surprising that AMR heads were quickly replaced with technologies that work at much higher areal densities.
The GMR effect was independently discovered in 1988 by Albert Fert
and Peter Grünberg for which they shared a Nobel prize in physics
in 2007 [64]. GMR is a technology that can be used to create larger
changes in electrical resistance with weaker magnetic fields. The term
"giant" comes not from the size of the magnets, but from the size of the
effect that is produced. In the explanation of GMR that follows, the thin magnetic layers in which the GMR effect occurs are those used in the read/write heads, not the magnetic materials on the thin-media layer. GMR occurs in thin layers of magnetic material, such as cobalt alloys, that are separated by a nonmagnetic spacer that is just a few nanometers thick. This configuration of magnetic materials produces a tremendous reduction of electrical resistance when a magnetic field is applied. The thin magnetic layers that are used in the read/write heads do not have a permanent magnetic dipole associated with them and are quite easily able to orient themselves in response to an applied magnetic field. In the absence of an external magnetic field, the magnetizations of the thin magnetic layers orient themselves in opposite, or antiparallel, directions. In this orientation, the electrical resistance is at its highest. When an external magnetic field is applied, the thin magnetic layers become aligned, which is accompanied by a tremendous drop in electrical resistance. A graph of this effect is shown in Figure 12.4.
GMR heads are designed so that one layer is always in a fixed orientation and the other layer is free to reorient itself. A fourth layer that is a strong antiferromagnet is used to "pin" down the fixed layer's orientation. As the magnetic bits pass under the GMR head, the magnetic field from the grains on the thin-media layer directly beneath the GMR head reorients the "unpinned" layer, which produces the strong changes in electrical resistance [68]. Since the "pinned" layer is always forced into one orientation, grains that are oriented in the same direction produce little change in resistance, while grains that are oriented in the opposite direction produce large changes in resistance. Thus, a reliable signal can be encoded onto the thin-media layer at higher densities and with weaker magnetic fields.

Figure 12.4: Graph of the GMR effect on the resistance of thin magnetic alloy layers separated by a thin nonmagnetic spacer. (Courtesy of [85])

12.5 summary

Digital computers and data storage systems store numbers using binary bits because it is practical to build devices that have two states, on and off. A byte is made by combining the values of eight bits into a single quantity and can represent 2^8 = 256 different values. Data storage capacity and transfer rates have been increasing since the beginning of the digital era, but as we scale the devices down to the atomic level, there will be a trade-off between these two metrics. This limit is explained by the Heisenberg relation, which relates the amount of time it takes to force a physical system into a state to the energy of that state, ∆E∆t ≥ h̄. Hard disk drives (HDDs) are the major data storage devices in use today. They store information by magnetically encoding physical bits of information onto a hard disk platter that is coated with a thin-layer media surface composed of magnetic alloys. HDDs read information from the magnetic surface through a change in electrical resistance that arises from magnetoresistance. Giant magnetoresistance (GMR) is a recent technology that achieves a greater response to changes in magnetic fields by building the read heads from thin layers of magnetic alloys separated by a thin nonmagnetic spacer.

12.6 exercises

1. The human genome contains 3 billion nucleotide base sequences.


If we stored each nucleotide in 1 byte, could we store the human
genome on a single 703.1 Megabyte CD?
Solution Using one byte per nucleotide base would require 3 × 10^9 bytes of storage. A 703.1 megabyte CD holds roughly 0.7 × 10^9 bytes. A CD would not be able to store the human genome unless a more efficient storage scheme was used. In fact,

the human genome has many repeated nucleotide sequences and


can be compressed to fit onto a single CD.

12.7 multiple choice questions

1. Which physical form of magnetic storage has the highest areal


density?
(a) AMR stored with longitudinal encoding
(b) AMR stored with perpendicular encoding
(c) GMR stored with longitudinal encoding
(d) GMR stored with perpendicular encoding
Solution (d)

2. The first generation of hard disk drives relied on the magnetoresistance effect to decode the magnetically stored bits of data.
(a) True
(b) False
Solution (b); the earliest drives used inductive ferrite heads, and magnetoresistive (AMR) read heads only arrived in the 1990s (see Section 12.4.2).

3. A hard disk drive uses a single read/write head for:


(a) All of the platter surfaces in the drive
(b) Each platter in the drive
(c) Each platter surface in the drive
Solution (c)

4. The GMR effect that has been used to increase hard drive density
is primarily used to:
(a) Decode bits of information on the magnetic thin-layer media
(b) Encode bits of information on the magnetic thin-layer media
(c) Decode and encode bits of information on the magnetic thin-
layer media
Solution (a)
13 tim mortsolf: the physics of information theory
13.1 introduction

In this chapter we are going to apply information theory to the data storage concepts presented in the last chapter. The theory of information is a modern science that is used for the numerical computation of data encoding, compression, and transmission properties. The chapter is an introduction to information theory and the physical principles behind "information entropy", a concept which in data storage is analogous to the role of entropy in thermodynamics.

13.2 information theory – the physical limits of data

We have seen that the technological improvements in data storage


devices over the last 50 years follow a pattern of increased data density
that roughly doubles every 18 months. At the time this textbook was
written (2008), researchers announced a new hard drive density record
of 803 Gb/in^2 on a hard drive platter surface. This record was achieved using TMR (Tunneling Magneto-Resistance) read/write heads. The rate law of data storage described by Moore's law is not a law that comes from physical principles, but rather one that has been accurate in estimating the engineering advances of computational technology that have occurred over the past half century.
There are some laws of data storage and computation that can be derived from physical principles. "Information theory" is the branch of science that applies the rules of physics and mathematics to obtain physical laws that describe how information is quantified. These laws
are primarily derived from the thermodynamic equations of state for a
system. Entropy is a key player in this arena since entropy is a quantity
that defines the number of available configurations (or states) of a
closed system. In this section, we will introduce some of the laws
of information theory and show how they can be applied to data
storage devices. We will also use these laws to show that there are
indeed physical limits to computation that determine the maximum
information density and computational speed at which theoretical data
storage devices can operate.

13.2.1 Information Theory and the Scientific Definition of Information

Information theory can be informally defined as the mathematical for-


mulation of the methods that we use to measure, encode, and transform
information. "Information" is an abstract quantity that is hard to pre-
cisely define. In this chapter we will use a narrow technical term for
information to mean the encoding of a message into binary bits that
can be stored on a data storage device. One of the earliest applications
of information theory occurred when the Samuel Morse designed an
efficient Morse code that was used to encode messages for transmission
over a telegraph line. His code used a restricted alphabet of only "dash


(long)" and "dot (short)" electrical signals to transmit text messages


over long distances. For example, the letter C is encoded by the Morse
sequence "dash, dot, dash, dot". The principles of information theory
can be used to show that the Morse code is not an optimal encoding
using an alphabet of just two characters and the encoding can be further
improved by a factor of 15% [3].
The scientific treatment of information began with Hartley’s "Trans-
mission of Information" in 1928 [40]. In this paper, he introduced a
technical definition for information .

The answer to a question that can assume the two values ’yes’ or ’no’
(without taking into account the meaning of the question) contains one unit
of information.

Claude Shannon's publication "A Mathematical Theory of Communication" in 1948 formed a solid foundation for the scientific basis of information theory [72]. Shannon introduced the concept of "information entropy", also called the Shannon entropy, which is the minimum message length in bits that must be used to encode the true value of a random variable. Today, information theory exists as a branch of mathematics that has its main applications in encryption, digital signal processing, and even the biological sciences.

13.2.2 Information Entropy and Randomness

The best way to understand the basics of information entropy is to


look at an example. For our experiment, we want to flip a coin a large
number of times (let’s say 1,000,000) and encode the results of this
experiment into a message that we can store on a digital storage device.
If the coin used in our experiment is not biased then the two outcomes
"heads" and "tails" should occur with equal probability of 1/2. We will
encode each coin toss with a bit — let 1 indicate "heads" and 0 indicate
"tails". Each coin toss requires exactly one bit of information so for
1,000,000 coin tosses, therefore we would need to store 1,000,000 bits of
data to record the results of our experiment.
Let’s repeat this experiment, but instead let’s assume the coin is now
biased — let "heads" occur 3/4 of the time and "tails" occur only 1/4
of the time. It might surprise you that the results of this experiment
can be recorded on average with less than 1,000,000 bits of information.
Since the distribution of coin flips is biased, we can design an efficient
coding scheme that takes advantage of the biased nature to encode the
results of each coin flip in less than one bit on average. It turns out that we are able to record each coin flip using only 0.8113 bits of information and can thus use a total of about 811,300 bits of data to record our results. This
same process explains why we are able to compress text documents
and computer images into a data file that is much smaller than the raw
data that we recover when we decompress the information.
A better example for explaining how we can encode the results of
a coin flip in less than one bit occurs when we let the coin become
even more biased — let "heads" occur 99/100 of the time and "tails"
occur only 1/100 of the time. If we encoded each coin flip using a
single bit as before, we would still need exactly 1,000,000 bits to record
the results of our experiment. But for this experiment we can design
a variable length encoding algorithm that does much better than this.

Since almost all of our coin flips are "heads", the sequence of coin flips
will be a long series of "heads" with an occasional "tail" interspersed to
break the series. Our encoding scheme takes advantage of the biased
nature by recording only those places in the sequence where a "tail" flip
has occurred. Since there are 1,000,000 coin flips, each "tail" requires a
20-bit number to record its position in the sequence. Out of 1,000,000
coin flips, the average number of tails will be 1, 000, 000/100 = 10, 000.
On average it will take 10,000 × 20 = 200,000 bits to record the results of each experiment. Thus, by using a very simple encoding scheme we have been able to reduce the data storage of our experiment to 1/5 of its original size.
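A quick simulation makes this average concrete (a sketch only; the use of rand and the variable names are our choices, and any single run will scatter around the 10,000-tails average):

% Simulate 1,000,000 flips of the 99/100-biased coin and count the "tails"
nflips = 1000000;
tails  = rand(1, nflips) < 1/100;       % logical 1 marks a "tails" flip
ntails = sum(tails);                    % close to 10,000 on average
fprintf('%d tails, so the 20-bit scheme needs about %d bits\n', ntails, 20*ntails);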
For our initial recording scheme that recorded the values of each coin
flip as a single bit, the amount of data required to store the message
was always 1,000,000 bits, no more and no less. With our new encoding scheme we can only discuss the statistical distribution of the amount of data required for each experiment. If we look at the almost impossibly unlucky result where every one of the 1,000,000 coin flips is "tails", which occurs in a fraction (1/100)^{1,000,000} of our experiments, the encoding scheme would use 1,000,000 × 20 = 20,000,000 bits of data. This is 100 times larger than the average value of 200,000 bits that we expect to use for each trial. These improbable unlucky scenarios are balanced by scenarios where we get lucky, have far fewer "tails" flips than the average result, and can encode our experiment in much less than 200,000 bits of data. The important thing to note is that almost all of the experiments will require nearly 200,000 bits of data storage, but
the exact value will be different for each result.
We can design an even better encoding scheme than this one. Instead of recording the absolute sequence number of each coin flip with a "tails" result, we could record just the difference between the positions of successive "tails" flips. For example, if the "tails" flips occurred at the positions

{99, 203, 298, 302, 390}, (13.1)

we could encode these results as the differences in the positions,

{99 - 0 = 99, 203 - 99 = 104, 298 - 203 = 95, 302 - 298 = 4, 390 - 302 = 88}.   (13.2)

For this encoding scheme, we use 8-bit numbers to encode differences less than 256, while our original compression scheme used 20-bit numbers to record each "tails" result's absolute position in the sequence. With our latest encoding scheme, on average it will take 10,000 × 8 = 80,000 bits to record the results of each experiment. But
can we do better than this? As you have probably guessed, we can
indeed. The limit to how well we can do can be calculated by Shannon’s
general formula for information uncertainty that we will develop in the
next section.
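As a one-line check of the difference encoding (the positions are the ones in equation (13.1); the variable names are ours):

% Difference-encode the "tails" positions from equation (13.1)
pos  = [99 203 298 302 390];            % absolute positions of the "tails" flips
gaps = diff([0 pos])                    % reproduces the differences in equation (13.2)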

13.3 shannon’s formula

We begin our development of Shannon’s formula by introducing the


principle of information content, also known as information uncertainty.
We want to create a definition of information content that represents
the minimum number of bits required to encode a message with M

possible values. Let’s go back to the first experiment of 1,000,000 coin


flips of an unbiased coin. Before we start this experiment, we have no
idea what the result is going to be. Since each coin flip can be one of two results, namely "heads" or "tails", there are M = 2^n = 2^{1,000,000} possible outcomes for this experiment, and each of these outcomes is equally likely since our coin flips are unbiased. It takes n = 1,000,000 bits of information to record each of these possible outcomes. We define the uncertainty of the information content I contained in a message with M possible values as

I = \log_2(M).   (13.3)
We also require that our definition of information content be additive.
For example, if we perform an experiment using 500,000 coins followed
by a second experiment again of 500,000 coins, then the sum of the
information content for each experiment should be the same as the
information content of the experiment for 1,000,000 coins. We can
see that our definition of information content does indeed satisfy our
requirement. The sum of the uncertainties of two different experiments performed with n_1 and n_2 coin flips,

I_1 = \log_2 2^{n_1} = n_1, \quad I_2 = \log_2 2^{n_2} = n_2,   (13.4)

equals the uncertainty of an experiment performed with n_1 + n_2 coin flips,

I = \log_2 2^{n_1+n_2} = \log_2\!\left(2^{n_1} 2^{n_2}\right) = \log_2 2^{n_1} + \log_2 2^{n_2} = I_1 + I_2.   (13.5)
The quantity I is unitless. The unit value of I represents the number of bits required to encode a message with two possible outcomes,

I = \log_2 2 = 1.   (13.6)
We now need to apply the definition for information content to
situations where the values encoded by the message are not equally
likely, such as is the case for our biased coin. Let P_i = 1/M_i be the probability of getting any message M_i. We define the surprisal u_i as the "surprise" that we get when we encounter the ith type of symbol,

u_i = -\log_2(P_i).   (13.7)
If the symbol Mi occurs rarely, then Pi is almost zero and ui becomes
very large, indicating that we would be quite surprised to see Mi .
However, if the symbol Mi occurs almost all the time, then Pi is almost
one and ui becomes very small, and we would not be surprised to see
Mi at all.
Shannon’s definition of uncertainty is the average surprisal of the
values contained in a message of infinite size. This is a limit that a
finite message of length N converges to when the size of the message
becomes infinite. First let's determine the average surprise H_N for a message of length N that is encoded by a set of M different symbols, with each symbol M_i appearing N_i times,

H_N = \frac{N_1 u_1 + N_2 u_2 + N_3 u_3 + \ldots + N_M u_M}{N} = \sum_{i=1}^{M} \frac{N_i}{N} u_i.   (13.8)

If we make this calculation in the limit of messages with an infinite


number of symbols, then the frequency N_i/N of symbol M_i converges to its probability P_i,

P_i = \lim_{N \to \infty} \frac{N_i}{N},   (13.9)

which we can substitute into (13.8) to get

H = \sum_{i=1}^{M} P_i u_i.   (13.10)

We finally arrive at Shannon's formula for information uncertainty by substituting (13.7) into (13.10),

H = -\sum_{i=1}^{M} P_i \log_2 P_i \qquad \text{Shannon's formula (bits per symbol)}.   (13.11)

example 13.1: using shannon’s formula


In this section, we indicated that trials from a coin biased with 3/4 "heads" and 1/4 "tails" can be recorded using 0.8113 bits of information per flip. Let's use Shannon's formula to prove this.

Solution Use Shannon's formula (13.11), setting P_1 = 3/4 and P_2 = 1/4:

H = -\sum_{i=1}^{M} P_i \log_2 P_i
  = -(3/4 \times \log_2 3/4) - (1/4 \times \log_2 1/4)
  = -\left(3/4 \times \frac{\log_{10} 3/4}{\log_{10} 2}\right) - \left(1/4 \times \frac{\log_{10} 1/4}{\log_{10} 2}\right)
  = 0.8113
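As a numerical check of this example (a sketch only; the variable names are ours), Shannon's formula can be evaluated directly in MATLAB for both of the biased coins discussed in this chapter:

% Shannon entropy H = -sum(P .* log2(P)), in bits per symbol
P = [3/4, 1/4];                         % the 3/4 "heads", 1/4 "tails" coin
fprintf('H = %.4f bits per flip\n', -sum(P .* log2(P)));   % prints 0.8113

P = [99/100, 1/100];                    % the 99/100 "heads" coin of section 13.2.2
fprintf('H = %.4f bits per flip\n', -sum(P .* log2(P)));   % about 0.0808 bits per flip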

13.4 the physical limits of data storage

In the closing section of our chapter, we combine Shannon's formula, which we used to compute the information entropy, with the laws of thermodynamics. These will be used in a non-rigorous fashion to determine the physical limits on the data storage density and data operations of a device called the "perfect disk drive". The "perfect disk drive", or PDD, is not a real device that exists today, but one we use to determine the ultimate boundaries of data storage capacity and data storage rates. The PDD stores information using the least amount of matter permitted by physics. Compared to the PDD, today's computers are extremely inefficient. Most of the atoms of a highly dense hard drive that we use today do not store any information, but are instead required to make the disk drive function. Even if we peel back the layers of a single surface of a hard disk drive platter, there are millions (if not more) of atoms required to store the state of just a single bit of information. Let's consider this problem from a statistical mechanics point of view. A disk drive has an enormous number of available energy states that come from the number of particles that the components of the disk drive are constructed from. If we try to use quantum mechanics to
determine the number of available energy states for these particles then
we are attempting to solve something that is impossible with today’s
scientific methods. Fortunately, statistical mechanics provides us with
a method to approximate the number of available states of a system
from the thermodynamic equations of state. These approximations are
very accurate as temperatures increase and the quantum distribution of

energies becomes nearly continuous and can be treated classically. The


equation

S = k_B \ln W   (13.12)

relates the thermodynamic state variable for entropy S to the quantity


W for the number of accessible states of a closed system, where k_B is the value of Boltzmann's constant. The highest efficiency that our PDD could achieve for data density is one in which it is able to precisely encode its information into the accessible thermodynamic states. That is, we assume the PDD can encode each distinct message using just one of these accessible states. We are not claiming this is something we will actually be able to achieve, but we are claiming this is a physical limit that the device can't exceed. A closed system with W accessible states is then able to hold a message with exactly W possible values. In the previous section we defined the information content as the minimum number of bits required to encode a message with M possible values. The physical relation that defines our PDD is

M = W,   (13.13)

which means it can encode a message with M possible values using exactly the number of states W that are accessible to the system. Now we can substitute this into (13.12) and use (13.3) to get

S = k_B \ln M = k_B \ln 2^{I} = k_B I \ln 2,   (13.14)

which we rearrange to solve for I,

I = \frac{S}{k_B \ln 2}.   (13.15)

This formula (13.15) relates the information content we can store in the
device to the thermodynamic entropy of the physical system. Unfor-
tunately, there is not a simple analytical formula that we can use to
compute the entropy of an arbitrary device from knowledge of physi-
cal properties such as mass, volume, temperature, and pressure. The
computation of entropy requires complete knowledge of the quantum
energy levels available to each particle in the device. One author used
a method to approximate a lower bound for the maximum amount of
entropy in a 1 kg device that occupies a volume of 1 L and assumes
that most of the energy of the system arises from blackbody photon
radiation [52]. Using these simplifications, he arrives at a value of T = 5.872 × 10^8 K for the temperature at which the maximum entropy occurs, and S = 2.042 × 10^8 J K^{-1} for the total entropy. Substituting
these values into our equation for the information content,

I = \frac{2.042 \times 10^{8}\ \mathrm{J\,K^{-1}}}{(1.381 \times 10^{-23}\ \mathrm{J\,K^{-1}}) \ln 2} = 2.132 \times 10^{31}\ \text{bits}.   (13.16)

How does this compare to today's hard disk drive technology? At the time of this book (November 2008), disk drives holding a little over 1.5 terabytes = 1.2 × 10^{13} bits of data can be purchased for desktop computer systems.

example 13.2: applying moore’s law


Use Moore’s law to see how far away we are from this theoretical limit

Solution Current technology permits us to mass produce desktop hard drives with 1.2 × 10^{13} bits of data. We need to find out how long it will take to achieve 2.132 × 10^{31} bits of data in roughly the same size device.

increased storage ratio = \frac{2.132 \times 10^{31}\ \text{bits}}{1.2 \times 10^{13}\ \text{bits}} = 1.78 \times 10^{18}

To find out how many times we have to double the storage capacity to increase it by this much, we take the base-2 logarithm:

number of times to double = \log_2\!\left(1.78 \times 10^{18}\right) = \frac{\log_{10}\left(1.78 \times 10^{18}\right)}{\log_{10} 2} = 60.6 \text{ times}

Now using Moore's law of 18 months per doubling of storage capacity, we estimate the amount of time required to be

time required = 60.6 \text{ doublings} \times \frac{18\ \text{months}}{1\ \text{doubling}} \times \frac{1\ \text{year}}{12\ \text{months}} \approx 91\ \text{years}

Thus it will take about 90 years to develop the technology to store data at our
arrived value of the maximum theoretical limit for data storage. Although
this calculation does have several simplifying assumptions, it does show that
there is a lot of room left to increase the data density relative to a device that
precisely encodes information content into the entropic states of a physical
system.
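The arithmetic of this example is easy to reproduce in MATLAB (a sketch only; the constants are the ones quoted above and the variable names are ours):

% Entropy-limited capacity, equations (13.15)-(13.16), and the Moore's-law estimate
kB = 1.381e-23;                         % Boltzmann's constant, J/K
S  = 2.042e8;                           % entropy estimate from the text, J/K
limit_bits   = S/(kB*log(2));           % about 2.13e31 bits
current_bits = 1.2e13;                  % a 1.5 terabyte drive (2008)
doublings    = log2(limit_bits/current_bits);   % about 60.6 doublings
years        = doublings*18/12;                 % 18 months per doubling
fprintf('%.3e bits at the limit, %.1f doublings, about %.0f years\n', limit_bits, doublings, years);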

13.5 summary

Information theory is a mathematical formulation of the methods that


we use to measure, encode, and transform information. Information
entropy is a quantity that measures the randomness of data sequences.
Shannon’s formula for information uncertainty is used to calculate this
value. The lower the entropy, the less the randomness, and the more we can compress the data. The principles of thermodynamics and
information entropy can be combined to estimate the maximum amount
of information that can be stored at the atomic level per quantum degree
of freedom.

13.6 exercises

1. Your assistant Bob rolls a six-sided die and you don't know the result. What is the minimum number of yes/no questions that you could ask Bob to get the answer? Try to use information theory to answer this question.
Solution Before posing a set of questions, let's first use information theory to try to encode the result. The die has six possible results, and using a binary coding scheme we need three bits to encode them. There are several different schemes that we could use to encode the die result. Here is one that encodes them in order of lower to higher values.

Encoding Dice Throw


000 1
001 2
010 3
011 4
100 5
101 6
110 Not used
111 Not used
Our encoding scheme is not perfect since it has two unused values.
But to record the results of a single dice throw, we cannot design
a code more efficient than this. If the results of many dice throws were recorded, we would be able to record the results in less than three bits per throw by using a clever compression scheme. From the encoding scheme, we can design a series of questions that ascertain the value of the bits used to encode our results. The first question tests the value of the first bit — "Is the value of the dice roll a five or a six?". The second question tests the value of the second bit — "Is the value of the dice roll a three or a four?". And the third question tests the value of the last bit — "Is the value of the dice roll an odd number?".

2. Repeat the same problem but this time Bob rolls two six-sided
dice and you only want to determine the value of the sum.
Solution The sum of two six-sided dice rolls will have values from
2 (when two 1’s are rolled) through 12 (when two 6’s are rolled).
Our encoding scheme needs to encode eleven values which takes
four bits. However, since in this case some results are more likely
than others, let’s see if we can design a more clever encoding
scheme.
Binary/Base-2 Number    Sum of dice    Probability
0000                    2              1/36
0001                    3              2/36
0010                    4              3/36
0011                    5              4/36
0100                    6              5/36
0101                    6
0110                    7              6/36
0111                    7
1000                    7
1001                    7
1010                    8              5/36
1011                    8
1100                    9              4/36
1101                    10             3/36
1110                    11             2/36
1111                    12             1/36

This encoding scheme does not have any unused values. We


already require four bits to encode all of the possible values of the sum. However, instead of having encodings that are unused, the extra codes are assigned to those values of the dice sum that are more likely to occur, which in this case are the values 6, 7, and 8. This encoding requires a maximum of four questions to determine the value of the dice sum, but for some values it will take fewer than this, namely the ones for which we have degenerate encoding values. From the encoding scheme, we can design a series of questions that ascertain the value of the bits used to encode our results. But now we can test for a value of six using just two questions. The first question is "Is the value of the sum a six or a seven?". For an answer of yes, the second question is "Is the value of the sum a six?". Thus, in just two questions we know if the sum has a value of six. Similarly, with just three questions we will know if the value of the sum was a seven or an eight. All the other values would require the full four questions. However, on average our encoding scheme and associated questions require fewer than four questions to determine the result of each set of dice rolls.
Challenge: Use Shannon’s formula to determine the minimum
number of bits required to encode the results. Then use our
encoding scheme and compare its efficiency to this theoretical
limit.

13.7 multiple choice questions

1. How many bits of information would be required on average to


record the results of an unbiased coin that is flipped 1,000,000
times?
(a) 811,300
(b) 500,000
(c) 1,000,000
(d) None of the above
Solution (c)

2. How many bits of information would be required on average to


record the results of 1,000,000 coin flips of a coin that is biased so
that 3/4 of the flips are "heads" and 1/4 are "tails"?
(a) 811,300
(b) 500,000
(c) 1,000,000
(d) None of the above
Solution (a)
14 sam boone: analytical investigation of the optimal traffic organization of social insects
Swarm intelligence is the property of a system in which a collection of unsophisticated beings or agents becomes functionally coherent through local interactions with their environment, allowing large-scale patterns to occur. This organizing principle is named stigmergy. While the term stigmergy was coined in the 1950s by Pierre-Paul Grasse to give a name to the relationship between social insects and the structures they create, such as ant hills, beehives, and termite mounds, the phenomenon itself has actually existed for billions of years. The term stigmergy literally means "driven by the mark" [86]. The bodies of all multi-cellular organisms are stigmergic formations. This phenomenon is apparent throughout nature in multiple species of social insects such as ants, wasps, termites, and others. While stigmergy has been proven to be a powerful and immensely useful principle of self-organized optimization, it has only been researched in recent years [66]. The research that has been conducted has uncovered simple algorithms for traffic congestion control and resource usage based on local interactions, as opposed to centralized systems of control. These powerful algorithms have already begun to be applied to artificial intelligence and computer programming.

14.1 introduction

14.1.1 Stigmergy in Nature

Ant trail formation and general large scale organized travel is made pos-
sible by the indirect communication between individual ants through
the environment. The individual ants deposit pheromones while walk-
ing. As an ant travels, it will probabilistically follow the path richest in
pheromones [22]. This allows ants to adapt to changes or sudden ob-
stacles in their path, and find the new shortest path to their destination.
This incredibly powerful social system allows entire colonies of ants to make adaptive decisions based solely on local information, permitting them to transport food over long distances back to their nest.
The organizing process, stigmergy, is also used in a similar way by termites. When termites build mounds, they start with one termite retrieving a grain of soil and setting it in place with a sticky, glue-like saliva that contains pheromones. In this case the mark referred to in the literal meaning of stigmergy is the grain of soil. This pheromone-filled saliva then attracts other termites, who glue another grain of soil on top of the last. As this process continues, the attractive signal grows as the amount of pheromone increases [86]. This process is called stigmergic building. Stigmergic building allows the termites to construct pillars and roofed galleries. The simplest form of stigmergic building is pillar construction. As the pile of soil grains accumulates upwards it naturally forms a pillar. Once the pillar reaches a height approximately equal to that of a termite standing on its hind legs, the termites begin


to add soil grains laterally. These lateral additions collectively form a


shelf extending from the top of the pillar. Eventually, the shelf of one
pillar meets with the shelf of another nearby pillar. This connection of
shelves creates a roof.

Figure 14.1: Simulated Stages of Termite Roofed Gallery Construction [86]

Figure 14.2: Common Example of a Termite Roofed Gallery Construction [86]

14.1.2 Applying Stigmergy in Human Life

The stigmergy concept in general has led to simple algorithms that have
been applied to the field of combinatorial optimization problems, which
includes routing in communication networks, shortest path problems
and material flow on factory floors [66]. While these applications can aid efforts to make transportation routes more efficient, other considerations must be made, such as congestion and bottlenecking.
This problem applies to user traffic on the internet as well as motor
vehicle traffic.
Ant Colony Optimization (ACO) based algorithms have also begun to
be used in multi-objective optimization problems. In recent years video
game designers have started to use ACO based algorithms to manage
Artificial Intelligence in their games. In particular, these algorithms
are used to manage functions like multi-agent patrolling systems. In
order for this function to run properly, all agents must coordinate their
actions so that they can most efficiently visit the areas of most relevance
as frequently as possible. Similar algorithms have been used in robotics,
computer network management and vehicle routing. Their has also
been recent break-throughs in using ACO to create artificial 3D terrain
objects for video games, as well as engineering, simulations, training
environments, movies and artistic applications.
ACO algorithms can also be very useful for site layout optimization. This proves to be very valuable, whether applied to the construction of highways and parking garages or to manufacturing plants and ports. Any construction site layout that is concerned with the positioning and timing of the temporary objects, machines, and facilities used to carry out the construction process, as well as with the permanent objects being created and their ultimate purpose, will greatly benefit from the solutions of these problems. Ultimately these ACO-based algorithms can greatly enhance the efficiency of such sites while decreasing their cost.
While ACO research is still very young and fairly scarce, the number of possible applications seems to be limitless. Almost any field of science, mathematics, engineering, computer design, finance, economics, or electronics can be greatly improved and expedited with the application of ACO.

14.2 ant colony optimization

In 1992, Marco Dorigo proposed a metaheuristic approach called Ant Colony Optimization (ACO). Dorigo was inspired by the foraging behavior of ants, which enables them to find the shortest paths from food sources to their nest: the decision of which path to travel is made probabilistically, favoring the path of highest pheromone concentration. The ACO algorithms that Dorigo proposed are based on a probabilistic model, the pheromone model, which is used to represent the chemical pheromone trails [22]. The pheromone model uses artificial ants that construct solutions in
increments by adding solution components to a partial solution that
is under consideration. The artificial ants perform randomized walks
on a graph G = (C, L), called a construction graph, whose vertices are
the solution components C and the set L are the connections [22]. This
model is applied to a particular combinatorial optimization problem by
having the constraints of that particular problem built into the artificial
ants’ constructive procedure so that in every incremental step of the
solution construction only feasible solution components can be added.
Depending on the problem, the use of the pheromone trail parameters τ_i is beneficial. The set of all pheromone trail parameters is labeled T.
This allows the artificial ants to make probabilistic decisions on which
direction to move on the graph [10].

14.3 optimization by hand

The difficulty of an optimization problem directly correlates with the complexity of the problem itself: the more variables the solution depends on, the more difficult (often exponentially so) the problem becomes. To drive home this notion we will attempt a simple two-variable optimization problem whose solution can easily be found by hand.
example 14.1:
Problem 1: Find two positive numbers whose sum is 12 and for which the product of one number and the square of the other is at a maximum.
Solution: Strictly based on the problem's design we are left with the fact that:

x + y = 12. (14.1)

By rearranging for y we are left with,

y = 12 − x. (14.2)

Our goal is to maximize the product:

P = xy2 , (14.3)

which can be rewritten by substituting our value for y to look like,

P = x (12 − x )2 . (14.4)

We then differentiate both sides with respect to x,

P′ = d[x(12 − x)²]/dx = (12 − x)² − 2x(12 − x),     (14.5)

which simplifies to become

P′ = (12 − x)(−3)(x − 4).     (14.6)

Therefore, P′ equals 0 when x = 4 or x = 12. We then calculate the values of our


product for these values.

P( x = 0, y = 12) = 0, (14.7)

P( x = 4, y = 8) = 256, (14.8)

P( x = 8, y = 4) = 128, (14.9)

P( x = 12, y = 0) = 0. (14.10)

Finally, we are left with our solution:

x = 4, y = 8, (14.11)

P = 256. (14.12)

While this simple optimization problem can easily be solved by hand, a more complex problem with hundreds of variables would be extremely difficult, and the time needed to calculate the solution would be tremendous. It is for these complex optimization problems that ACO-based algorithms are enormously helpful.
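As a sanity check, the hand calculation of Example 14.1 can be reproduced numerically. The short Python snippet below (an illustration, not part of the original solution) brute-forces the maximum of P = x(12 − x)² on a fine grid; the grid resolution is an arbitrary choice. The same brute-force strategy becomes hopeless as the number of variables grows, which is exactly the regime where heuristics such as ACO earn their keep.

def product(x):
    # P = x * y**2 with the constraint y = 12 - x
    y = 12.0 - x
    return x * y * y

best_x, best_p = None, float("-inf")
steps = 120000                      # grid resolution over 0 < x < 12
for i in range(1, steps):
    x = 12.0 * i / steps
    p = product(x)
    if p > best_p:
        best_x, best_p = x, p

print(best_x, 12.0 - best_x, best_p)    # approximately 4.0, 8.0, 256.0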

14.4 applying aco meta-heuristic to the traveling salesman problem

One of the famous optimization problems to which the Ant Colony Optimization meta-heuristic has been applied is the Traveling Salesman Problem: the problem of finding the shortest closed path that visits every city in a given set. While this algorithm, or others of the same nature, could be tweaked to handle the problem with different constraints, we will concern ourselves with a Traveling Salesman Problem in which there exists a path (edge) between every pair of cities.

14.4.1 Applicable Ant Behaviors Observed in Nature

When a colony of ants forages for food, the ants will initially leave their nest and head for the foraging area in which the food that they are looking for lies. Let's assume that there is a natural path from the nest to the foraging area which twice forks and reconnects. There are therefore four possible paths along which the ants can travel to the food. At first, there is no information on these paths that could give the ants prior knowledge of which path is the shortest. As a result the ants will randomly choose a path to take to the food. Naturally the ants
that arrive at the food first will be the ones that chose the shortest path.
These ants will then take the food that they gather and head back to
their nest backtracking on their path that they took to the foraging area.
Since these ants will be the first to the food and the first to head back to
the nest their chosen paths will build up higher levels of pheromones
quicker than the paths chosen by their counterparts. Very quickly, the
shortest path will by far possess the highest level of pheromone and as
a result the vast majority of the colony will begin to use this path. The
amazing aspect of this natural problem-solving displayed by these ants is that the physical properties of the paths and obstacles in question intrinsically hold the solution to the problem.

Figure 14.3: Double bridge experiment. (a) The ants initially choose paths to the foraging area at random. (b) The majority of the ants eventually choose the shortest path. (Graph) The distribution of the percentage of ants that chose the shortest path in the double bridge experiment [24]

14.4.2 Approach to the Traveling Salesman Problem

While we will use Marco Dorigo’s artificial ants to solve our optimiza-
tion problem hands-on we must recall that all of our artificial ants will
be acting based upon natural ant behaviors that have been witnessed
in nature. For this problem we will use three ideas that have been
witnessed in nature by real ants [23]:

1. Preference to choose paths rich in pheromone levels.

2. Increased rate of growth of pheromone levels on shorter paths.

3. Communication between ants mediated by alterations in the environment or the trail.

Our artificial ants will move from city to city on the Traveling Salesman Problem (TSP) graph based on a probabilistic function that incorporates both heuristic values, which depend on the length of the edges (the trails between cities), and the pheromone trail accumulated on those edges [23]. The artificial ants will prefer cities that are connected by edges rich in pheromone trail (in our case every pair of cities is connected). At every time step, when an ant travels to another city it modifies the pheromone trail on the edge being used; Dorigo calls this local trail updating. Initially we place m artificial ants on the TSP graph at randomly selected cities. Once all of the ants have completed a tour, the ant that made the shortest journey modifies the edges used in its tour by adding an amount of pheromone that is inversely proportional to the tour length. This step is dubbed global trail updating.

14.4.3 Comparison Between Artificial Ants and Real Ants

As we stated earlier, the artificial ants that we use act based upon observed behaviors of real ants in nature; however, these artificial ants are given some capabilities that real ants do not possess. All of these differences are necessary adaptations that allow the artificial ants to discover solutions to optimization problems in their new environment (the TSP graph).
The fundamental difference between the artificial ants used in the ACO
algorithm and the biological ants their design is based upon is in the
way that they modify the trails they travel. Like the ants observed in
nature who deposit pheromones on the surface of their environment,
the artificial ants modify their surroundings by changing some numeri-
cal information that is stored locally in the problem’s state [24]. This
information includes the ant’s current history and performance which
can be read and altered by ants who later visit that region. This informa-
tion is called artificial pheromone trail. This artificial pheromone trail is the
only means of communication between each individual member of the
colony. In most ACO algorithms an evaporation mechanism is installed,
much like real pheromone evaporation, that weakens pheromone infor-
mation over time allowing the ant colony to gradually forget its past
history so that it can modify its direction without being hindered too
much by its past decisions [24].
There are a few traits that the artificial ants possess that the real ants
do not [24].
1. The artificial ants possess an internal state that contains the memory of their past actions.
2. The artificial ants deposit an amount of pheromone that correlates
to the quality of the solution.
3. The artificial ants' timing of when they lay their pheromone does not reflect any pattern of actual ants. For example, in our case of the Traveling Salesman Problem the artificial ants update the pheromone trails after they have found the best solution.
4. The artificial ants in the ACO algorithms can be given extra capa-
bilities by design like lookahead, backtracking, local optimization
and others, that make the overall system much more efficient.

14.5 solution to the traveling salesman problem using an aco algorithm

In the Ant Colony System we have described, an artificial ant k in a city r chooses a city s to travel to. This city s will be chosen among those that are not a part of that ant's working memory Mk. The ant chooses the next city s using the probabilistic formula:

s = argmax_{u ∉ Mk} { [τ(r, u)] · [η(r, u)]^β },     (14.13)

where q ≤ qo. The probabilistic function, "argmax," stands for the "argument of the maximum." In other words, this is the value of u where [τ(r, u)] · [η(r, u)]^β is at a maximum. Otherwise,

s = S,     (14.14)

where τ(r, u) is the amount of pheromone on an edge (r, u), η(r, u) is a heuristic function designed to be the inverse of the distance between cities r and u, β is a parameter that weighs the relative importance of the pheromone level and of how close the nearest cities are, q is a random value drawn uniformly from [0, 1], qo is a parameter with 0 ≤ qo ≤ 1, and S is a random variable selected according to a probability distribution which favors edges that are shorter and have higher build-ups of pheromone [23]. The random variable S is chosen based on the probability distribution:

pk(r, s) = [τ(r, s)] · [η(r, s)]^β / Σ_{u ∉ Mk} [τ(r, u)] · [η(r, u)]^β,     (14.15)

if s ∉ Mk. Otherwise,

pk(r, s) = 0,     (14.16)

where pk (r, s) is the probability of an ant k choosing to move from a


city r to a city s [23].
Once all of the artificial ants have completed their respective tours, the ant with the shortest solution goes back and lays pheromone along the edges it used in its solution. The amount of pheromone Δτ(r, s) that the ant deposits is inversely proportional to the length of the tour. The global trail updating formula is:

τ(r, s) ← (1 − α) · τ(r, s) + α · Δτ(r, s),     (14.17)

where Δτ(r, s) = 1/(length of the shortest tour). This global trail updating ensures that the better solutions receive a higher reinforcement.
Local trail updating, on the other hand, ensures that not all of the
ants choose a very strong edge. This is achieved by applying the local
trail updating formula:

τ (r, s) ← (1 − α) · τ (r, s) + α · τo , (14.18)

where τo is simply a parameter.


The beauty of this reinforcement learning system is that the colony's behavior depends on two pairs of formulas: (14.13) and (14.14), as well as (14.15) and (14.16). The first pair allows an ant to exploit the colony's accumulated experience, with probability qo, in the form of the pheromone levels that have built up on edges belonging to short tours. The second pair allows the ant to make decisions based upon exploration, with probability (1 − qo), that is biased towards short and high-trail edges. This exploration is one of new cities that are chosen randomly with a probability distribution that is a function of the heuristic function, the accumulated pheromone levels, and the working memory Mk.

1. Homework Problem 1: An open rectangular box with a square base is to be made from 96 ft² of material. Find the dimensions that will allow the box to have the maximum possible volume.
Solution: The surface area of the box will be equal to the sum of the area of the base and the four sides. This must all be equal to 96 ft²,

A = 96 = x² + 4xy.     (14.19)

By rearranging and isolating y we are left with

y = (96 − x²)/(4x).     (14.20)

Simplifying this further,

y = 24/x − x/4.     (14.21)

The expression for the volume of our box is as follows,

V = x²y.     (14.22)

Substituting in our expression for y we are left with

V = 24x − x³/4.     (14.23)

Now we differentiate both sides,

V′ = 24 − (3/4)x²,     (14.24)

V′ = (3/4)(32 − x²).     (14.25)

Therefore, when V′ = 0, x must be equal to ±√32. Since x is a measurement it must be positive. Therefore,

x = √32 ft ≈ 5.66 ft,     (14.26)

and

y = 2.83 ft.     (14.27)

The volume of the box is then

V ≈ 90.56 ft³.     (14.28)

2. Homework Problem 2: In this chapter the only specific ACO algo-


rithm that we discussed was the one designed for the Traveling
Salesman Problem. We did, however, mention that ACO algo-
rithms have been used for many different optimization problems
in many different fields. In general, what aspects of the ACO algorithm would need to be adjusted for the algorithm to be able to find a solution to a different problem?
Solution: The major aspects of an ACO algorithm that determine its problem-solving abilities are the probabilistic biases that the artificial ants' movements abide by, the way in which they lay pheromone and reinforce quality solutions, the movement constraints imposed on the ants by the problem's intrinsic design, and whether or not the artificial ants possess extra, problem-specific abilities that assist them in their search for the solution.

3. Exam Question 1: The process in which an artificial ant in an


ACO algorithm for the Traveling Salesman Problem retraces his
steps after every ant has completed its respective tour and adds
pheromone to the edges he used is called:
a = local trail updating
b = the pheromone model
c = stigmergic building
d = global trail updating
Solution: d = global trail updating

4. Exam Question 2: Which of the following traits is not a difference


between the artificial ants of the ACO algorithm and real ants?
a = The amount of pheromone that is deposited by a single ant
correlates to the quality of the solution
b = A preference to choose trails that are rich in pheromone
c = The possession of an internal state that contains the memory
of the owner’s past actions
d = The way in which the ant alters their environment to commu-
nicate with the rest of the colony
Solution: b = A preference to choose trails that are rich in pheromone
CHRISTOPHER KERRIGAN: THE PHYSICS OF
SOCIAL INSECTS
15
15.1 introduction

Physics as a whole is an area in which precise measurement and observation is integral to the expansion of the field. It is perhaps
for this reason that the study of living systems under the framework of
physics is an oft overlooked application. This chapter therefore concerns
itself with attempting to approach one of the groups of living systems
most closely correlated to preexisting physical concepts and theories:
the social insect.
But just what are social insects? This question is best answered by
taking into account the concept of eusociality, but since this involves at
least some digression into the world of sociology, we will leave it to be
explored at will by the reader. For our purposes, the physical aspects
of the social insects are most important. We have a group of hundreds
to thousands of insects of the same species who, individually, show
almost no signs of intelligence or survival skills but are a functioning
group as a whole. The physical dynamics of this group in its entirety
shall be our focus.
It is the author’s experience that nature is a thing of patterns, and
this observation will be the basis for the structure of this chapter.
We shall apply to the dynamics of social insects the concepts and
theories derived from a more well-explored avenue of physics, namely
electromagnetism.
Note: This chapter serves a dual purpose as not only a method of studying the dynamics of social insects, but also as an example of how to approach a concept with "the physicist's mind." That is, the chapter explains how to expand the ideas of physics to cover new areas.

15.2 the electron and the ant

In comparing social insect systems with electromagnetic ones, the first logical correlation to draw is of the fundamental constituents of the system. In social insect systems, this is the individual insect (which shall hereafter be referred to by the synecdoche "ant" to save time). In electromagnetism, this is the electron.
The second logical correlation is of the system as a whole. For social
insects, this is the colony. For electromagnetism, our analogy will be
the circuit.
So what does a circuit do? We know that individual electrons moving
through a conducting material create a current. This current is used
to power various components in the circuit which achieve some goal.
Likewise, ants move in a constrained group in some sort of cycle to
achieve some goal, namely the survival of the group. We can say with
confidence that the movement of the ants is indeed cyclical simply
because they are members of the colony. If they leave the center of the
colony (to fulfill some goal for the group), they will return at some time
(probably having fulfilled said goal). We can define the center of the


colony as the place where more ants are produced. This has an obvious
analogy to the battery (or EMF) in an electric circuit. Also, we can safely
say that the constrained group in which the ants move is probably a
single-file line because of the width of ant tunnels (and from general
observation.) Knowing that the processes of both systems occur in a
similar fashion, perhaps we can use the equations of electric circuits to
describe the motion of the ants.

15.3 insect current

The current in a circuit is any motion of charge from one region to


another, and can be described as the net charge flowing through some
cross-sectional area per unit time. We should be able to define the insect
current in the same manner, replacing the charge with the ant. This
current can be described as the number of ants crossing some width
(assuming the ants move along a roughly two-dimensional surface (the
ground) and do not walk on top of each other) per unit time,

I = N/t,     (15.1)
where N is the number of ants. With this simple and obvious compar-
ison, we can draw many conclusions about the motion of a group of
social insects.

example 15.1: the ant hill


A group of one hundred ants is moving in a single-file line into the ant hill,
whose diameter can fit one ant. If ants can travel at a rate of .3 meters per
second and an ant’s size is 10 mm, what is their current through the ant hill?

Answer:

Our line of ants is 10 mm × 100 = 1 m long. A line of ants of this length moving at the given rate will take (1 m)/(0.3 m/s) ≈ 3.3 seconds to pass through the entrance. Knowing both the time and the number of ants, we can easily calculate the current, I = 100/3.3 ≈ 30 ants per second.
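The arithmetic can be checked in a few lines of Python (an illustration, not part of the original example):

n_ants = 100
ant_length = 10e-3        # m (10 mm per ant)
speed = 0.3               # m/s

line_length = n_ants * ant_length      # 1.0 m of single-file ants
t = line_length / speed                # time for the whole line to pass: about 3.3 s
current = n_ants / t                   # I = N / t

print(round(current), "ants per second")    # about 30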

15.4 insect diagram

The destinations of social insect groups can often be defined in terms


of the provisions that the group needs to survive. The most important
factor for the survival of our ant colony is its food, so we can interpret
the possible destinations of the ants as locations with food to bring
back to the center.
We are beginning to make it possible to chart the movement of the
colony. We know that the ants come from a source and move in lines
toward a source of sustenance which they collect and return to the
colony center with. This is analogous to the circuit, whose electrons
come from a battery and travel through a conductor through various
pathways which, by the definition of a complete electric circuit, must
lead them back to the battery. Since this process can be charted by way
of the circuit diagram, we can chart the progression of the ants in a
similar manner.
Let us first examine the well-known structure of the circuit diagram
by referring to Fig. 1. We can see that the diagram depicts a battery and
15.5 kirchoff’s rules (for ants) 119

Figure 15.1: Circuit

two resistors in parallel. If we imagine the battery as being the center


of the ant colony and the resistors as being sources of food, we have
what we shall refer to as an insect diagram. Ants leave the center from
the "positive side," travel to one of two (for example) food sources, and return to the center on the "negative side."

15.5 kirchoff’s rules (for ants)

We will now apply two well-known theorems to the notions that we


have already established, namely those of current and diagram. The
theorems to which we refer are Kirchoff’s junction rule and his loop
rule. The former states that the algeraic sum of the currents into any
junction is zero. This would mean, in our insect terms, that the number
of ants going into any point at which they decide which food source
to travel to is equal to the number of ants at all the subsequent food
sources. This should be not only true, but obvious (assuming no ant
dies along the way). The loop rule states that the algebraic sum of
voltages in a closed circuit must be zero. In our terms, this may mean
that there must be both an ant source and a food source in order for
a path to exist. This makes logical sense, since ants do not simply go
out wandering. We have therefore successfully applied an important
theorem of electrodynamics to a macroscopic and very real system.

15.6 differences

When applying a theory that works to describe some system to a totally


new system, perhaps the most important relationship between the two
is that of their differences. A difference that seems particularly suited to analysis is the fact that the flow of charge is set deterministically by the potential difference across the circuit, while ants seem to choose an initial path rather randomly and then perfect this path based on the movement of the returning ants and by sensing the pheromones left behind by ants who have travelled the better path. We shall explore this behavior as an exercise in making the
differences clear.
Assume that two ants start a journey towards food with equal proba-
bilities of going on either of two paths (Fig. 2). We will say that one of
the paths (the "lower" path in the diagram) is shorter, and thus it takes
longer for the ant not on this path to get to the food (Fig. 3). Following

Figure 15.2: Ants

Figure 15.3: Ants

this logic, the ant who has taken the shorter path will return to the nest
faster (Fig. 4). This means that just outside the nest, the pheromone
density is twice as high on the shorter path (because it has been passed
twice, whereas the longer path has only been passed once until the ant
makes it back to the nest). Other ants leaving the nest, then, will opt for
the shorter path because of the higher pheromone density. Over many
iterations, the ants further reinforce the shorter path and after a while
this will be the only path used (Fig. 5).
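A minimal simulation makes this reinforcement concrete. The Python sketch below assumes a simple rule, namely that each ant picks a path with probability proportional to that path's pheromone level and then deposits an amount inversely proportional to the path length, with a small evaporation rate; the specific numbers are arbitrary choices for illustration, not a model taken from the text.

import random

def simulate(n_ants=1000, len_short=1.0, len_long=2.0, evaporation=0.01):
    # Two competing paths; each starts with the same pheromone level.
    pher = {"short": 1.0, "long": 1.0}
    counts = {"short": 0, "long": 0}
    for _ in range(n_ants):
        total = pher["short"] + pher["long"]
        # choose a path with probability proportional to its pheromone level
        path = "short" if random.random() < pher["short"] / total else "long"
        counts[path] += 1
        # deposit pheromone inversely proportional to the path length
        pher[path] += 1.0 / (len_short if path == "short" else len_long)
        # gradual evaporation lets the colony forget old choices
        for p in pher:
            pher[p] *= 1.0 - evaporation
    return counts

random.seed(1)
print(simulate())    # the short path ends up carrying most of the traffic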

15.7 the real element

We have managed to describe at least some of the motion of social


insects in a quantitative manner, which was our initial goal. This method
can be used to study the social insects in their environment, but it is
important to remember the constraints and assumptions that we may
have taken for granted. The behavior of living beings is an extremely
complicated concept to analyze, and it is therefore vital that we take
our limitations into account.
We can consider, for example, experiments in which ants in an induced state of panic, given two doors to escape the "room" they are in, will not leave in a symmetric manner, as a physical analysis might predict. They instead show a tendency to "follow the leader," which is a complex phenomenon far beyond the reach of this text and

Figure 15.4: Ants

Figure 15.5: Ants

perhaps even biophysics as a whole. This situation is intended to il-


lustrate the fact that physical representations of systems are just those:
representations, and should be applied as such.

15.8 problems

1. Calculate the probability that one of many ants is at one of n food


sources. The source of the ants and the food sources are all on a
line with a distance l between each.
Solution:
We know that a given food source is a distance n × l from the ant source. If we assume that the probability that an ant goes either direction at a food source is 1/2, we see that for a food source F(n),

P[F(1)] = 1/2,     (15.2)

P[F(2)] = (1/2)(1/2),     (15.3)

P[F(n)] = (1/2)^n.     (15.4)

2. Suppose the distance between two points a and b is d ab and Cab is the
pheromone density along this line. What is the probability for a single ant

to go from one point a to another point b?


Solution:
We can imagine that the probability of an ant moving in a direction is
directly proportional to the pheromone density in that direction. Since
density along a line is inversely proportional to the length of the line
(whereas in three dimensions it is inversely proportional to the volume),
we can say that

P = C_ab / d_ab     (15.5)
Part IV

WHAT'S OUT THERE
ROBERT DEEGAN: THE DISCOVERY OF NEPTUNE 16
16.1 introduction

Neptune was first officially discovered on September 23, 1846. While the discovery of any new planet is normally viewed by society as a rather important event, Neptune's discovery stands out as notable in the scientific community as well because of the manner in which it was found. Unlike the other seven planets in our solar system, Neptune was not first directly observed; rather, its existence was theorized from the effects of its gravity, in accordance with Newton's Laws.

16.2 newton’s law of universal gravitation

Before we can look into exactly how it was that astronomers found
Neptune, we must first understand the laws that govern celestial bodies.
The interaction of any two masses was described by Newton with his
Law of Universal Gravitation which states that every point mass acts on
every other point mass with an attractive force along the line connecting
the points. This force is proportional to the product of the two masses
and inversely proportional to the square of the distance between the
points
F = G m1 m2 / r²,     (16.1)

where the force is in newtons and G is the gravitational constant, equal to 6.674 × 10⁻¹¹ N m² kg⁻² [101].

example 16.1: celestial gravitational attraction


Using Newton’s Laws we can find the forces acting upon Uranus due to both
the Sun and Neptune, when Neptune and Uranus are at their closest:
Orbital Radius of Neptune (Rn) = 4.50 × 10¹² m
Orbital Radius of Uranus (Ru) = 2.87 × 10¹² m
Rnu = Rn − Ru = 1.63 × 10¹² m
Mass of Sun (Ms) = 1.99 × 10³⁰ kg
Mass of Neptune (Mn) = 1.02 × 10²⁶ kg
Mass of Uranus (Mu) = 8.68 × 10²⁵ kg

F_s−u = G Ms Mu / (Ru)² = 1.399 × 10²¹ N,     (16.2)

F_n−u = G Mn Mu / (Rnu)² = 2.224 × 10¹⁷ N.     (16.3)
So the force acting on Uranus due to the Sun’s gravity is 4 orders of magnitude
larger than the force due to Neptune’s gravity.


Figure 16.1: Predicted and Actual Orbits of Uranus and Neptune. Grey =
Uranus’ predicted position, light blue = Uranus’ actual position, dark blue =
Neptune’s actual position, yellow = Le Verrier’s prediction, green = Adams’
prediction [94]

Given that one object is much more massive than the other, this attractive force will cause the less massive object to accelerate towards the more massive object while having little effect on the latter. However, if the less massive object is not at rest but rather has some initial velocity vector, and thus an initial angular momentum, then this force will not cause the lesser mass to accelerate directly towards the larger mass but rather cause it to orbit about the larger object, due to the necessity of conserving the angular momentum of the system. This is of course the reason the planets in our solar system orbit about the Sun, with its relatively large mass. However, the orbits of the planets in our solar system are not affected by the Sun alone; in most cases the gravitational fields of adjacent planets are also strong enough to influence each other's orbits, as we shall see momentarily.

16.3 adams and le verrier

Shortly after Uranus was discovered in 1781, its predicted orbit was calculated using Newton's laws of motion and gravitation. However, by looking at data for the position of Uranus from 1754-1830 (some observations of the planet were made prior to the discovery that it was in fact a planet), and specifically over the period from 1818-1826, it was noticed that there were discrepancies between Uranus' predicted and observed orbit. Tables predicting Uranus' orbit had been made shortly after its discovery, and comparison to these tables soon revealed discrepancies too large to be accounted for by observational error or by the effects of Saturn and Jupiter on Uranus. As these discrepancies grew larger and larger in the early 1800's they became more and more troubling, and numerous theories to account for them were postulated, one being that there was some unknown body perturbing the orbit. This theory appealed to two men, Urbain Le Verrier and John Couch Adams. Both Adams and Le Verrier set out independently to investigate it.

Observations had shown that not only was Uranus not at the correct position in its orbit at a given time, but also that its distance from
the sun was substantially larger than it was predicted to be at some
points. This evidence to some further supported the hypothesis of the
unknown planet, and the increased radius vector clearly showed that
this mysterious planet must be outside of Uranus’ orbit. This is the
assumption Adams made as he began his investigation, but he quickly
ran into trouble. The typical way of determining a perturbing bodies
effect on an orbit was to calculate the gravitational effects of a body of
known mass and position on another planet, and subtract these values
from the observed orbit. After this had been done one could see the
"true" orbit of the planet, as it would be without the perturbing body,
and then when adding back in the effects of the perturbing planet
one gets the actual elliptical orbit followed by the planet and can thus
predict its location with extreme precision.
In this case however, the characteristics of the perturbing body were
unknown which made it impossible to calculate the "true" elliptical or-
bit, as this required knowing the perturbative effects. And these effects
in turn could only be found once one already knew the "true" orbit, so
Adams’ approach was to solve both problems simultaneously. Using
this approach Adams soon ran into another problem, that being that
the perturbation caused by a more massive planet further away from
Uranus would be indistinguishable from the effect of a less massive
planet closer to Uranus. To avoid this issue Adams simply guessed that the average distance from the sun of the perturbing planet was twice that of Uranus, planning to come back and adjust this assumption after he had completed his calculations were it necessary to do so. From this point on Adams simply needed to write out a series of equations
found at. Adams wrote 21 such equations, one for every third year for
which he had observational data, and solved each of these equations
one at a time. Alone these were not enough to give the characteristics
of the perturbing planet, but each narrowed down the possibilities
further and further until after all 21 equations had been solved Adams
had a very accurate model describing the characteristics of this new
planet. Adams then calculated the effect this new planet would have on
Uranus’ orbit and found that it accounted for all of the discrepancies
between Uranus’ actual and predicted orbits. Finally, Adams calculated
the longitudinal position at which he expected to find the new planet and gave this data to the Astronomer Royal in order to confirm the existence of this
new planet there, and indeed the new planet was observed to be within
a few arcseconds of Adams’ prediction[77].
At the same time in France, Le Verrier was going through the same processes and calculations and came to an almost identical result, from which astronomers at the Berlin Observatory, working from Le Verrier's prediction, located the new planet at almost the exact same time that this work was being done in England.

16.4 perturbation

It is important to examine how exactly Neptune's gravitational field perturbed the orbit of Uranus. First of all, the gravitational interaction between Neptune and Uranus pulled Uranus further from the Sun and Neptune closer to it. Also, as a result of this increased radius at certain points in Uranus' orbit, the average distance of Uranus from

Figure 16.2: Gravitational Perturbation of Uranus by Neptune[18]

the Sun was increased. By Kepler’s third law, P2 = 4π 2 a3 /G ( M + m),


we know that as a result of this period of revolution of Uranus was
increased slightly, and obviously the opposite was true for Neptune[19].
However, neither of these was the most drastic or noticeable effect;
since Uranus is closer to the Sun than Neptune it obviously traversed
its orbit faster than Neptune and so there were points where Neptune
was slightly in front of Uranus and points where it was slightly behind
it. In the case of the first situation Neptune’s gravitational attraction
would pull Uranus towards it effectively speeding up Uranus’ motion
through its orbit and pulling it ahead of the predicted location. The
opposite was true when Neptune was behind Uranus, it then pulled
it back and slowed its orbit. This effect is what caused the discrepan-
cies in Uranus’ longitude that astronomers noticed, as Neptune and
Uranus got closer together in the early 1800’s the effect became more
pronounced since the gravitational force between them was increas-
ing, and this is what caused astronomers to finally throw out their
current predictions of Uranus’ orbit and try to determine what what
was causing these discrepancies.

16.5 methods and modern approaches

The method Adams used to predict the characteristics of Neptune was


an incredibly tedious one and at the time was thought to be the only
possible way to solve the problem. Since then however, another possible
way to solve this problem has been postulated. This process involves
looking at the problem as a three-body system, in which we examine the
mutual gravitational interactions of the three bodies involved, the Sun,
Neptune, and Uranus. This problem is certainly not a trivial one, and
though numerous particular solutions to this problem have been found
there is still no general solution, and many believe such a solution
is actually impossible as this problem involves chaotic behavior[21].
An attempt at a particular solution to this problem for the case of
the Neptune-Uranus-Sun system is beyond the level of this book and
so we shall not attempt it here. It is significant to note though that
despite numerous advances in the field of celestial mechanics since the
time of Adams and Le Verrier, the method they used is still the only

practical one for solving this problem. As mentioned in this section, there are other possible ways to solve this problem, but they are far more complicated and no more accurate; so if in the present day there were a need to find the cause of some unexplained perturbations, the same method that Adams and Le Verrier followed would probably still be used (though it would be much faster and more accurate thanks to the use of modern computers).

16.6 practice questions

1. Astronomers noted that Uranus was not following its predicted


elliptical orbit, but rather traversing a different ellipse. How much
data is required to determine this, i.e. what is required to charac-
terize an ellipse?

2. Assume that the gravitational pull of Neptune increased Uranus’


average distance from the sun by 3 × 1012 m, how much would
this increase Uranus’ period by (in years)?

16.7 answers to practice questions

1. An elliptical orbit is mathematically characterized by six numbers: the average distance between the planet and the sun, the eccentricity of the ellipse, three angles to determine the orientation of the ellipse in space, and a point in time at which the planet is at a particular point in the orbit. The average distance of Uranus was already known to astronomers, and the eccentricity of the orbit is determined from the lengths of the semi-major and semi-minor axes. It requires two reasonably separated data points to determine these axes. Determining the orientation of the orbit in space requires three data points, as one would expect, and again some separation is required between these points. Since the points used to determine the eccentricity and the orientation of the orbit need not be separate, we can thus conclude that, along with the knowledge of the average distance to the sun (the semi-major axis), it requires only three reasonably separated observations to determine the orbit of a planet.

2. We can apply Kepler’s 3rd Law here to find what effect this
increase in orbital distance will have on the period:
4π 2 ∗ a3
P2 = G ( M+m)
a3 = (3 × 1012 )3
= 2.7 × 1037 m3
G = 6.67 × 10 s−2 m3 kg−1
− 11

M = 1.99 × 1030 kg
m = 1.02 × 1026 kg
1.066×1039 m3
P2 = 1.327×1020 s−2 ∗m3
= 8.03 × 1018 s2
P= 2.83 × 109 s = 89.8 years
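The same computation takes a few lines of Python (an illustrative check; the Sun and Uranus masses are the standard values quoted above):

import math

G = 6.674e-11            # m^3 kg^-1 s^-2
M = 1.99e30              # kg, Sun
m = 8.68e25              # kg, Uranus (negligible next to M)
a = 3.0e12               # m

P = math.sqrt(4 * math.pi**2 * a**3 / (G * (M + m)))
print(P, "s =", P / 3.156e7, "years")    # about 2.8e9 s, i.e. roughly 90 years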
ALEX KIRIAKOPOULOS: WHITE DWARFS 17
17.1 introduction

White Dwarfs are class D stars based on the Morgan-Keenan spectral classification. They fall under extended spectral types and can further be differentiated into DA (H rich), DB (He I rich), DO (He II rich), DQ (C rich), DX (indeterminate), DZ (metal rich), and DC (no strong spectral lines). The additional letter indicates the presence of different spectral lines, which indicate what elements are present in the atmosphere, and does not correspond to other star classes.
They are characterized by low masses of about one solar mass [73], small sizes with characteristic radii of 5000 km [73], and no nuclear fusion reactions in the core. White dwarfs, neutron stars, and black holes all fall under the category of compact objects. These compact objects do not burn nuclear fuel and have small radii. White dwarfs are thought to be in the late evolutionary stages of less massive stars, and for lack of nuclear reactions in the core they radiate their residual thermal energy slowly over time.
Sirius B, a typical white dwarf star, has a mass of about 0.75 M☉ to 0.95 M☉ and a radius of about 4700 km [73]. This corresponds to a planetary-sized star with a mass nearing that of the Sun. Because of their small radii, white dwarfs have higher effective temperatures than other stars of comparable luminosity and hence appear whiter; the luminosity varies as R²T⁴ for a black body [73].
This makes white dwarfs extremely dense stars, with mean densities of 10⁶ g/cm³, 1,000,000 times greater than the Sun's. All four fundamental forces are actively involved in the dynamics of these stars [73]. The tremendous density values create immense gravitational forces that pull the star together; the resulting compression creates an electron degeneracy pressure that supports the star against gravitational collapse. According to the uncertainty principle, if the electrons' positions are all well defined then they have a correspondingly high uncertainty in their momentum. Even if the temperature were zero, there would still be electrons moving about.
When the pressure due to the confinement of the matter exceeds the thermally contributed pressure, the electrons are referred to as degenerate. Black holes, however, are completely collapsed stars that had no means of creating a pressure great enough to support against gravitational collapse.
Simplifying this problem to that of a particle in a box with dimensions Lx, Ly, and Lz, and setting the potential to be zero inside the box and infinite outside the box, the Schrödinger equation reads −(ħ²/2m)∇²ψ = Eψ. ψ factors into three functions X(x), Y(y), and Z(z), yielding three second-order ordinary differential equations, one for each of the three functions. Solving them and setting

k_x,y,z = √(2mE_x,y,z) / ħ,     (17.1)


and taking into consideration the boundary conditions

k_x,y,z L_x,y,z = n_x,y,z π.     (17.2)

The allowed energies of this wave function are

E_n_x,y,z = ħ²k² / 2m,     (17.3)

where k is the magnitude of the wave vector k, and each state in this system occupies a volume π³/V of k-space. Since the electrons behave as identical fermions and hence are subject to the Pauli exclusion principle, only two can occupy any given state. Therefore they fill only one octant of a sphere of k-space [37]:

(1/8)(4/3)πk_F³ = (Nq/2)(π³/V),     (17.4)

where k_F = (3ρπ²)^(1/3) is the radius of that octant, determined by the volume π³/V that each state must occupy, N is the total number of nucleons, and q is the number of free electrons each nucleon contributes. Then ρ ≡ Nq/V is the free electron density. The Fermi surface here is the boundary separating the occupied and unoccupied states in k-space. The corresponding energy is called the Fermi energy, the energy of the highest occupied quantum state in this system of electrons. The Fermi energy is E_F = (ħ²/2m)(3ρπ²)^(2/3), and this is the energy for a free electron gas.

17.2 the total energy

To calculate the total energy of the gas, consider a shell of thickness dk, which contains a k-space volume (1/8)(4πk²)dk; the number of electrons occupying that shell is (V/π²)k² dk, and the energy they carry is dE = (ħ²k²/2m)(V/π²)k² dk. Thus the total energy must be

E_total,electron = [ħ²(3π²Nq)^(5/3) / (10π²m)] V^(−2/3).     (17.5)

This energy is analogous to the internal thermal energy of an ordinary gas and exerts a pressure on the boundaries: if the gas expands by a small amount dV, then dE = −(2/3) E dV/V resembles the work done by this electron pressure [37],

P = 2E/(3V) = [(3π²)^(2/3) ħ² / (5m)] ρ^(5/3).     (17.6)
Evidently the pressure depends on the free electron density. It is when this pressure balances the gravitational pressure from the mutual attraction of the ensemble that white dwarf conditions occur. This is the fundamental difference between white dwarfs and "normal" stars. Stars usually have cores that sustain nuclear fusion of hydrogen into helium, which counteracts the gravitational pressure. The white dwarf

sustains itself, however, through the electron degeneracy pressure. This is a quantum mechanical result; it is why a cold object does not simply collapse after being continuously cooled, and it is what allows the white dwarf to exist.
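To get a feel for the size of this pressure, the short Python estimate below evaluates Eq. (17.6) for a mean mass density of 10⁶ g/cm³, assuming roughly 0.5 free electrons per nucleon (appropriate for a carbon/oxygen composition); both that ratio and the chosen density are representative assumptions made for the estimate, not values prescribed in the text.

import math

hbar = 1.055e-34          # J s
m_e  = 9.109e-31          # kg, electron mass
m_N  = 1.675e-27          # kg, nucleon mass

rho_mass = 1.0e9                   # kg/m^3  (= 1e6 g/cm^3)
rho = 0.5 * rho_mass / m_N         # free electron number density, m^-3

# Eq. (17.6): P = (3 pi^2)^(2/3) hbar^2 rho^(5/3) / (5 m_e)
P = (3 * math.pi**2) ** (2 / 3) * hbar**2 * rho ** (5 / 3) / (5 * m_e)
print(P, "Pa")                     # of order 1e21-1e22 Pa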
Expressing the total energy as the sum of the total electron energy and the gravitational energy of a uniformly dense sphere, U = −(3/5)GM²/r, and minimizing it with respect to the radius,

d(E_total)/dr = d(E_total,electron + U)/dr = 0,     (17.7)

reveals the radius at which the total energy is minimized:

R = (9π/4)^(2/3) ħ² q^(5/3) / (G m M_nuclei² N^(1/3)).     (17.8)

The equation reveals a fundamental feature of these compact objects: as the mass increases (that is, as we increase N, the number of nucleons), the radius decreases. This is how objects such as black holes or neutron stars, which are other compact objects like white dwarfs, can pack so much mass into a smaller and smaller radius, yielding higher densities.
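Plugging representative numbers into Eq. (17.8) gives a radius of the right order. In the sketch below, N is taken to be the number of nucleons, M_nuclei the nucleon mass, and q about 0.5 free electrons per nucleon; these identifications, and the one-solar-mass input, are assumptions made only for this estimate.

import math

hbar = 1.055e-34     # J s
G    = 6.674e-11     # m^3 kg^-1 s^-2
m_e  = 9.109e-31     # kg, electron mass
m_N  = 1.675e-27     # kg, nucleon mass (assumed for M_nuclei)
M_star = 2.0e30      # kg, roughly one solar mass
q = 0.5              # assumed free electrons per nucleon

N = M_star / m_N     # number of nucleons
R = ((9 * math.pi / 4) ** (2 / 3) * hbar**2 * q ** (5 / 3)
     / (G * m_e * m_N**2 * N ** (1 / 3)))
print(R / 1e3, "km")     # a few thousand kilometers, comparable to Sirius B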
Energy transport within the star occurs via conduction, because the particles' mean free paths are increased at these densities: particles find it difficult to collide since all the lower states are filled. The coefficient of thermal conductivity becomes very large and the interiors are not far from being isothermal, making the core much hotter than the surface. A surface temperature of 8000 K corresponds to a core temperature of 5,000,000 K [80]. The star itself is composed mainly of carbon and oxygen, but the high gravitational forces separate these elements from the lighter elements, which are found on the surface. Spectroscopy techniques reveal the atmosphere to be either helium- or hydrogen-dominated; in some cases the atmosphere may, however, contain other elements such as carbon and metals.
Magnetic fields in white dwarfs are thought to be due to conservation of total surface magnetic flux. A larger progenitor star generating a magnetic field at one radius will produce a much stronger magnetic field once its radius has decreased, according to conservation of magnetic flux. This explains the magnetic fields on the order of millions of gauss in white dwarf stars. The strength of the field is calculated by observing the emission of circularly polarized light.
White dwarfs, once formed, are stable and cool continuously until they can no longer emit heat or light. Once this happens the white dwarf is referred to as a black dwarf. No black dwarfs are thought to exist yet, however, since the time required for a white dwarf to become a black dwarf is longer than the age of the universe [80].

17.3 question

dE = (ħ²k²V / 2mπ²) k² dk     (17.9)

and

E_total,electron = [ħ²(3π²Nq)^(5/3) / (10π²m)] V^(−2/3).     (17.10)

What is the integral of the above derivative? What are the limits of integration? What does the upper limit represent?
Answer:

E_total,electron = (ħ²V / 2π²m) ∫₀^(k_F) k⁴ dk.     (17.11)

The limits are 0 to k_F, the radius of a sphere in k-space. The k-space radius k_x,y,z depends on the energy levels n_x,y,z.
DANIEL ROGERS: SUPERNOVAE AND THE PROGENITOR THEORY
18
18.1 introduction

The universe started with a big bang. The temperatures were so high in the minutes after this event that fusion reactions occurred. This resulted in the formation of elements such as hydrogen, deuterium, helium, lithium, and even small amounts of beryllium.[6] This is known as Big Bang Nucleosynthesis, and nicely explains the presence of these lighter elements in the universe. However, the brevity of this process is believed to have prevented elements heavier than beryllium from forming.[53] So what is the origin of oxygen, carbon, nitrogen, and the many other heavy elements known to man? And how do we explain the significant abundances of these elements in our solar system?

18.2 creation of heavy elements

Nuclear fusion is the process by which multiple atomic nuclei join


together to form a heavier nucleus.[61] As explained before, this was
widespread just after the Big Bang. The result of this process is the
release of considerable amounts of energy; the resultant nucleus is
smaller in mass than the sum of the original nuclei, and the difference
in mass is converted into energy by Einstein’s equation, E = mc2 .[53]
Nuclear fusion also occurs in the cores of stars and is the source of
their thermal energy. In general, large stars have higher core tempera-
tures than small stars because they experience higher internal pressures due to the effects of gravity.[53] Thus, a star's mass determines what type of nucleosynthesis can occur in its core.
In stars less massive than our sun, the dominant fusion process is
proton-proton fusion. This converts hydrogen to helium. In stars with
masses between one and eight solar masses (we define our sun as one
solar mass), the carbon (CNO) cycle dominates hydrogen fusion.[53] Once hydrogen is depleted within the star, helium burning then converts helium into oxygen and carbon. In very massive stars (greater than eight solar masses), carbon

Figure 18.1: Elements are produced at different depths within a star. This
illustrates the elements that are produced in massive stars (not to scale). Notice
the iron core.[39]


and oxygen can be further fused into neon, sodium, magnesium, sulfur
and silicon. Later reactions transform these elements into calcium, iron,
nickel, chromium, copper, etc. In a supernova event, neutron capture
reactions lead to the formation of elements heavier than iron.[53] Thus,
we see that all heavy elements are formed in the cores of stars at various
points in their lives as they burn through their thermonuclear fuel. In
general, the mass of a star can be used to determine what elements are
formed and the abundances that are produced.

example 18.1: conversion in the sun


How much hydrogen is converted to helium each second in the sun? Use the fact that the sun's luminosity is 3.8 × 10²⁶ W, and that 0.7% of the hydrogen mass becomes energy during the fusion process.
Solution: We know that the sun produces 3.8 × 10²⁶ W, and so 3.8 × 10²⁶ J are emitted each second. Now, simply use Einstein's equation:

E = mc²  ⇒  m = E/c² = (3.8 × 10²⁶ J) / (3 × 10⁸ m/s)² = 4.2 × 10⁹ kg.     (18.1)

This is the mass converted to energy in the sun each second. We know that this mass is only 0.7% of the mass of the hydrogen that goes into the fusion process:

M_H = (4.2 × 10⁹ kg) / 0.007 = 6.0 × 10¹¹ kg.     (18.2)
So we see that the sun fuses about 600 billion kg of hydrogen each second,
though about 4 billion kg are converted into energy. The remaining 596 billion
kg becomes helium.[6]
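The arithmetic of Example 18.1 in a few lines of Python (an illustrative check):

L = 3.8e26              # W, the solar luminosity quoted above
c = 3.0e8               # m/s

m_converted = L / c**2              # mass turned into energy each second
m_hydrogen  = m_converted / 0.007   # only 0.7% of the fused hydrogen mass becomes energy
print(m_converted, m_hydrogen)      # about 4.2e9 kg/s and 6.0e11 kg/s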

18.3 dispersal of heavy elements

A star experiences a constant struggle against collapse due to the


gravitational force of its own mass. Throughout the main sequence of
its life, it is able to resist gravity with thermal pressure, as the fusion of
elements in its core heats up the star’s interior gas. The hot gas expands,
exerting an outward pressure that balances the inward force of gravity.
The life of a star ends when it is completely depleted of thermonuclear
fuel, and gravity is able to overcome this outward thermal pressure.[53]
The life span of a star and its final state are determined by the
mass of the star. Large stars generally live shorter lives than small
stars; although they have more fuel for nuclear reactions, their rate of
consumption is much greater. When a relatively small star runs out of
fuel, it collapses because of gravity and becomes a white dwarf. At this
point, the only outward pressure is due to electron degeneracy. Since
white dwarfs succumb entirely to gravity and never explode outward,
the elements formed in their cores are never ejected into space.[6]
Stars that are large, however, experience different effects due to
greater gravitational forces during collapse. As a massive star begins
to run low on nuclear fuel, the iron it produces piles up in its core.
Iron has the lowest mass per nuclear particle of all nuclei and therefore
cannot release energy by fusion. Once all the matter in the core turns to
iron, it can no longer generate any energy.[6] This marks the beginning
of collapse. As massive stars collapse, reactions take place in which
electrons and protons are forced together with such great amounts of

Figure 18.2: (a) The layered shells of elements in a massive star. (b) Collapse
begins when fuel is depleted. (c) As gravity takes over, the star shrinks signifi-
cantly. (d) The red area experiences enormous outward forces as hot gases pile
up on the degenerate core. (e) The gases are ejected outward at high speeds. (f)
All that remains is the degenerate neutron core.[39]

force that they merge to become neutrons. Quantum mechanics restricts


the number of neutrons that can have low energy, as each neutron
must occupy its own energy state. When neutrons are tightly packed
together, as they are in this case, the number of available low energy
states is small and many neutrons are forced into high energy states.
The resulting neutron degeneracy pressure quickly stops gravitational
collapse and the matter in the star is subjected to an enormous outward
force. With gravitational collapse halted suddenly, the outer layers of
gas bounce back upon hitting the degenerate core like a large wave
hitting a sea wall.[53] The violent explosion that follows is known as a
supernova event.
Most of the energy of a supernova explosion is released in the form
of energetic neutrinos. It is this energy that initiates the formation of
elements heavier than iron, as described before. The remaining energy
is released as kinetic energy in the ejected matter. The shock wave sends
the ejected material outward at speeds of over 10,000 km/s.[50] All that
remains is the sphere of tightly packed neutrons, called a neutron star.
If the original star was massive enough, the remaining neutron star
may be so large that gravity also overcomes the neutron degeneracy
pressure and the core continues to collapse into a black hole. Otherwise,
it becomes nothing more than the corpse of a star that has depleted its
fuel supply.[53]
The expanding cloud of debris from the supernova explosion is
known as a supernova remnant. The ejected gases slowly cool and
fade in brightness, but they continue to move outward at high speeds.
Carried with this debris is the variety of heavy elements produced in
the core of the star, as well as those created by the collision of high
energy neutrons during the supernova event.
example 18.2: stellar equilibrium
To maintain equilibrium, a star’s outward thermal pressure must balance
inward gravitational forces. This results in enormous pressure at its core.
How does the gas pressure in the core of the sun compare to the pressure of
Earth’s atmosphere at sea level? The sun’s core contains about 1026 particles
per cubic centimeter at a temperature of 15 million K. At sea level on Earth, the
atmosphere contains 2.4 × 1019 particles per cubic centimeter at a temperature
of about 300 K.
Solution: All we need to do here is apply the ideal gas law.
part
Psun nsun kTsun (1 × 1026 cm3 )(1.5 × 107 K)
= = part = 2 × 1011 (18.3)
PEarth nEarth kTEarth (2.4 × 1019 cm3 )(300 K)
The sun’s core pressure is about 200 billion times greater than the atmospheric
pressure on Earth at sea level.[6]
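The ratio in Example 18.2 can be checked the same way; Boltzmann's constant cancels, so only the number densities and temperatures matter (an illustrative check):

n_sun,   T_sun   = 1.0e26, 1.5e7     # particles per cm^3, K
n_earth, T_earth = 2.4e19, 300.0     # particles per cm^3, K

ratio = (n_sun * T_sun) / (n_earth * T_earth)
print(ratio)     # about 2e11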

Figure 18.3: An artist’s depiction of a solar nebula receiving supernova


ejecta.[81]

18.4 progenitor theory

Now consider if part of this expelled cloud of isotopes (the ejecta) were
to fall into the gravitational field of a neighboring star that is in the
early stages of its formation. At this point the young star is surrounded
by what is known as a protoplanetary disk, which is the disk of gas
and dust from which planets and asteroids form. If the ejecta from
the supernova event collided and blended with this disk, it would
contribute to its chemical composition.[83]
This is one of the leading theories explaining the presence of
heavier elements in our own solar system. It cannot be proved directly
by experiment, but there are ways to gauge the probability that this
process occurred by studying the chemical properties of our solar
system.
One such property is the concentration of the stable isotope 60 Ni.
This has been detected and measured in meteorites that have been
untouched since the formation of our solar system. There is also a
correlation between the amounts of 60 Ni and another stable isotope,
56 Fe, in these meteorites.[65] Nuclear physics predicts that the
unstable isotope 60 Fe will decay into 60 Ni with a half-life of 1.5 million
years. Since our solar system is 4.6 billion years old, it is theorized that
it was initially 60 Fe that was present in the meteorites.[50] It is known
that 60 Fe is one of the isotopes that massive stars form in their cores.[51]
Therefore, 60 Ni might serve as an indicator of how much supernova
ejecta our solar system collected during its formation.

18.5 a mathematical model

This section will work through a simplified mathematical model of the


progenitor theory. In the end, we will find an expression relating the
radius of a young solar system with some of its chemical properties
and distance from a likely supernova. Remember that there are more
factors that real astronomers take into account, but this is a general
idea of how they hope to prove the validity of the progenitor theory.
We start with defining several variables. Let R be the radius of our
solar system as it was during its formation 4.6 billion years ago. We
will then call the area of our solar system AR = πR2 . Now we define r
as the radius of the supernova remnant; here, we want r equal to the
distance between the supernova and our solar system. We will assume
that the material ejected from the supernova is uniformly distributed

over a sphere that expands as the ejecta moves outward. Thus we will
define Ar = 4πr2 . Now let Mr60 be the total mass of 60 Fe ejected from
the supernova, and MR60 be the total mass of 60 Fe injected into our solar
system. The amount of ejecta that a solar nebula can receive is inversely
proportional to its square distance from a supernova explosion.[50]

M_R^60 = M_r^60 A_R / A_r        (18.4)
Now let M be the total mass of our solar system, and MR56 be the
amount of 56 Fe in our solar system per unit mass. Multiplying them
together gives the total amount of 56 Fe in our solar system. Then we
can define a relationship for the ratio of 60 Fe to 56 Fe injected into our
solar system.[50]
60Fe / 56Fe = M_R^60 / (M_R^56 M)        (18.5)

Since it is impossible to know exactly how much 60 Fe was initially


present in our solar system, we can use an estimated ratio between 60 Fe
and 56 Fe based on the 60 Ni meteoritic evidence mentioned before. This
ratio has been determined to be on the order of 10−7 based on studies
of the meteorites, though its precise value is still being investigated.[79]
It must be noted that there is a relatively small window of time during
the formation of a solar system in which supernova ejecta is optimally
received. This window is on the scale of a few million years for a star
the size of our sun. Thus, we need to assume that the progenitor star
has a lifetime on this scale to increase the probability of it becoming
a supernova during the necessary time period. Stars that are around
60 solar masses have short lifetimes of about 3.8 million years, making
a star of this size a good candidate.[50] As discussed previously, the
amount of iron produced in the interior of a star throughout its lifetime
can be determined from its mass. For a 60 solar mass progenitor star,
we can estimate the amount of 60 Fe produced to be about 0.0002512
solar masses. This was taken from research by Marco Limongi and
Alessandro Chieffi in 2006.[51]
We can use estimates for the other values in the equations above
as well. For instance, we know that the mass of a protoplanetary disk
around a solar mass sized star is about 0.01 solar masses.[50] Also, the
iron in our solar system is thought to comprise roughly 0.014% of the
mass of the entire system. An estimated 91.57% of this iron consists of
the isotope 56 Fe. This means that the amount of 56 Fe in the solar system
comprises 0.01282% of its mass.[50]
By combining the two equations above, it is possible to find the
minimum radius of a solar nebula for it to have received the appropriate
amount of the isotopes from a supernova.[50]
R = sqrt[ (60Fe/56Fe) · 4r^2 (M_R^56 M) / M_r^60 ]        (18.6)
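To get a feel for the scale that equation (18.6) implies, the sketch below plugs in the rough values quoted above. The supernova distances are illustrative, and it is assumed that M is the 0.01-solar-mass disk and that the quoted 56 Fe fraction applies to that disk material; neither assumption is spelled out explicitly in the text.

```python
import math

# Rough values quoted in the text (masses in solar masses)
ratio_60_56 = 1e-7      # inferred 60Fe/56Fe ratio from the meteoritic 60Ni evidence
M_disk      = 0.01      # protoplanetary disk mass around a solar-mass star
f_56Fe      = 1.282e-4  # 56Fe mass fraction (0.01282%), assumed to apply to the disk
M_r60       = 2.512e-4  # 60Fe ejected by a 60-solar-mass progenitor

M_56 = f_56Fe * M_disk  # total 56Fe available to the young solar system

def min_radius_au(r_pc):
    """Minimum disk radius from eq. (18.6) for a supernova at distance r_pc (parsecs)."""
    r_au = r_pc * 206265.0  # 1 parsec = 206,265 AU
    return math.sqrt(ratio_60_56 * 4.0 * r_au**2 * M_56 / M_r60)

for r in (1, 5, 10):  # illustrative supernova distances
    print(f"supernova at {r:2d} pc -> minimum disk radius ~ {min_radius_au(r):5.1f} AU")
```

Under these assumptions a supernova a few parsecs away requires a disk of roughly ten to a hundred AU, the same scale as the protoplanetary disks discussed in the conclusion below.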

18.6 conclusion

Calculating the necessary radius for the protoplanetary disk of a solar


mass star using known properties of our solar system suggests how
probable it is that a supernova event played a role in its formation.

Protoplanetary disks rarely exceed a few hundred astronomical units
(AU) and have never been known to stretch beyond 1,000 AU. The
farthest we have observed a body orbiting our sun is roughly 47 AU,
indicating that our own disk may have been reduced to a few tens of AU
before planet formation.[50] Thus, if these calculations yield radii on this scale
across a wide range of values for r (again, the distance between the
supernova event and our sun), then this will provide support for the
progenitor theory of supernova injected isotopes.
There are, of course, many factors that determined our sun’s distance
from supernova events. These include the size of the star cluster in
which it formed and the ratio of the total potential and kinetic energies
in that star cluster (known as the virial ratio). As these properties
change, the average distances between solar mass stars and likely
supernovae do as well. This is significant since stars that are further
from supernova events will receive fewer ejecta.[50]
There is still one very important assumption that has been made as
well. By calculating the radius we assume that the protoplanetary disk
was face-on relative to the supernova event (to maximize the flux of the
ejecta through the disk). Very little ejecta would be received by a disk if
it were facing a progenitor edge-on.[50]
These shortcomings are not grounds for abandoning this study, how-
ever, as this is still a likely explanation for the presence of heavy
elements in our solar system. The results of these calculations are sure
to bring us closer to the truth as mathematical techniques and computer
programs evolve over the years to come.

18.7 problems

1. Homework Problem 1
Question: Use the result of Example Problem 1. How many times
does the proton-proton fusion reaction occur each second in the
sun? The mass of a proton is 1.6726 × 10−27 kg; hydrogen is
composed of one proton, and helium is composed of four. The
mass of a helium nucleus is 6.643 × 10−27 kg.
Solution: Here, fusion converts four hydrogen nuclei (protons)
into one helium nucleus. Four protons have a mass of 6.690 ×
10−27 kg. When four protons fuse to make one helium nucleus, the
amount of mass that disappears and becomes energy is as follows.

6.690 × 10−27 kg − 6.643 × 10−27 kg = 0.047 × 10−27 kg (18.7)

From Example Problem 1 we know that the sun converts a total


of 4.2 × 10^9 kg of mass into energy each second.
(mass lost per second) / (mass lost in each reaction)
    = (4.2 × 10^9 kg/s) / (0.047 × 10^-27 kg)
    = 8.9 × 10^37 reactions/s        (18.8)

Thus, nearly 10^38 fusion reactions occur in the sun each second.[6]

2. Homework Problem 2

Question: Explain why the ratio of 60 Fe to 56 Fe is used in the


progenitor theory, rather than the amount of 60 Fe.
Solution: Since 60 Fe has a half-life of 1.5 million years, nearly all
of it that was present during the formation of our solar system
has decayed into 60 Ni. Scientists have been able to determine a
ratio of 60 Ni to 56 Fe from samples of meteorites. These meteorites
are indicative of the chemical composition of the early solar sys-
tem. Scientists have also been able to roughly determine the total
amount of 56 Fe present today, since it has not decayed. Multiply-
ing the ratio of 60 Ni to 56 Fe by the total amount of 56 Fe gives an
estimate of the total amount of 60 Fe injected into our solar system
during its formation.

3. Test Problem 1
Question: Which of these elements had to be made in a supernova
explosion? (a) calcium (b) oxygen (c) uranium
Solution: (c) uranium

4. Test Problem 2
Question: Suppose there is a supernova explosion at some dis-
tance from a young solar system. If that distance is doubled, by
what factor would the radius of the solar system need to be mul-
tiplied in order to receive the same amount of ejecta? (a) 0.5 (b) 2
(c) 4
Solution: (b) 2
DAVID PARKER: THE EQUIVALENCE PRINCIPLE 19
19.1 introduction

The basic idea of the equivalence principle is to “assume the com-
plete physical equivalence of a gravitational field and a correspond-
ing acceleration of the reference system.”[26] This means that being
at rest on the surface of the Earth is exactly equivalent to being in an
accelerating reference frame free of any gravitational fields. This idea,
when originally developed by Einstein in 1907, was the beginning of
Einstein’s search for a relativistic theory of gravity.
There are three different ways to interpret the equivalence principle,
each of which allows or disallows different theories of gravity. Currently
the only theory of gravity to satisfy all three is general relativity; this
is part of what makes GR peerlessly elegant.
Bear in mind that most discussion of the equivalence principle comes in
attempts to disprove it or to place limitations on it in order to support
an alternate theory of gravity. Because of this, much of the discussion
requires you to check your intuitive understanding of the universe even
more so than GR does.

19.2 weak? strong? einstein?

The weak equivalence principle states that “All test particles at the
same spacetime point in a given gravitational field will undergo the
same acceleration, independent of their properties, including their
rest mass.”[92] This form of the equivalence principle is very similar
to Einstein’s original statement on the subject, but is referred to as
“weak” because of the extent to which Einstein’s conceptualization of
the equivalence principle matured. The other two interpretations of the
equivalence principle use the weak equivalence principle as a starting
point, assuming its truth. Very few, if any, theories of gravity contradict
the weak equivalence principle.
The Einstein equivalence principle states that the result of a local non-
gravitational experiment in an inertial reference frame is independent
of the velocity or location of the experiment. This variation is basically
an extension of the Copernican idea that masses will behave exactly
the same anywhere in the universe. It grows out of the postulates of
special relativity and requires that all dimensionless physical values
remain constant everywhere in the universe.
The strong equivalence principle states that the results of any local
experiment in an inertial reference frame are independent of where and
when it is conducted. The important difference between this variation
and the first two, weaker, variations is that this is the only variation
that accounts for self-gravitating objects. That is, objects that are so
massive as to have internal gravitational interactions. Therefore, this
is an extremely important idea because of the extreme importance of
self-gravitating bodies, e.g. stars, to our understanding of the universe.


Figure 19.1: Bending starlight in a space-time diagram [87]

19.3 consequences

The easiest and most dramatic consequence of the equivalence principle
is that light will bend in a gravitational field. Imagine an elevator freely
falling in a gravitational field with a laser on one wall. If the laser shines,
then, according to the laws of relativity, the light will hit the point on
the wall that is directly opposite the laser. An observer in the elevator
will correctly assert that the light traveled a straight line. However, from
the perspective of an observer on the surface of the Earth, or whatever
body is causing the gravitational field, the light will still hit the point
on the wall directly opposite the laser, but because of the elevator’s
downward motion the light will have followed a parabolic path. This
was how Einstein originally described the phenomenon of light bending
in a gravitational field, and it is still the best way to describe it.
A sharp student may point out that if you continued this thought
experiment by having countless elevators falling side by side, with
windows to allow the laser to shine through them all, the light
would actually bend at a rate twice that of a normal object falling in
a gravitational field, because the elevators are all accelerating radially
rather than in the same direction. With this student I would have
to agree, but Einstein got there first.

19.4 example

1. Imagine that an elevator is falling freely in Earth’s gravitational
field with a laser mounted in the ceiling pointing directly at the
floor. When the laser shines, would an observer at the floor of the
elevator see the light as Doppler shifted? Would an observer on
the surface of the Earth?
Answer: No, the light would not appear shifted to the observer
inside the elevator; from his perspective he is in a completely
motionless box that is free from any external fields. However, for
the observer on the surface of the Earth the light from the laser
would of course be blue-shifted because of its descent into the
gravitational field, or conversely, because of the acceleration of
the observer up toward it.

2. Are there any constraints on the equivalence principle, and if so,


what are they?
Answer: There are constraints, and they actually change the nature
of thought about the equivalence principle greatly. The equiva-
lence principle is only valid in completely flat space or a homoge-
neous gravitational field. However, since there is no place in the
universe where we will find either of those things, the equivalence
principle can only actually be applied in an infinitesimally small
section of space-time.

19.5 problems

1. An elevator on Earth is accelerating upwards at 6.6 m/s^2. How
long will it take a rock of 0.2 kilograms to hit the floor of the
elevator if it is dropped from a height of 1.5 meters?
Answer: 1.5 m = (1/2)(6.6 m/s^2 + 9.8 m/s^2) t^2, giving t = 0.43 s. (A
short numerical check appears after this problem set.)

2. If granite were found to fall at a different rate than water, what
consequences would this have for the principle of equivalence?
Answer: This would be a direct violation of the WEP and therefore would
invalidate all of GR and most other theories of gravity in one fell
swoop.

3. If the speed of light were measured in an area of flat space, and
then measured again on the surface of the Earth, discounting
atmosphere, in which reference frame would the speed of light be
faster?
Answer: They would both be the same. The speed of light
in a vacuum is a constant in all reference frames. The equivalence
principle, combined with this fact, allows us to predict the bending
of light in a gravitational field.
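As referenced in Problem 1, here is a minimal numerical check of that answer in Python. Note that the rock’s 0.2 kg mass never enters the calculation, which is itself an expression of the equivalence principle.

```python
import math

a_elevator = 6.6   # upward acceleration of the elevator, m/s^2
g = 9.8            # gravitational acceleration, m/s^2
h = 1.5            # height the rock is dropped from, m

# In the elevator frame the rock falls with effective acceleration g + a_elevator,
# so h = (1/2)(g + a_elevator) t^2.
t = math.sqrt(2.0 * h / (g + a_elevator))
print(f"t = {t:.2f} s")   # ~0.43 s, matching the answer to Problem 1
```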
RICHARD RINES: THE FIFTH INTERACTION 20
20.1 introduction: the ‘four’ forces

Current models of this universe explain all interaction in terms of
four fundamental forces. These forces (gravity, electromagnetism,
and the strong and weak nuclear forces) are all defined by unique
“sources,” or properties of individual particles that determine their
attraction or repulsion under the specific force. For example,
observe the known electric potential for a pair of particles:
U_e(r) = (1 / (4πε_0)) q_1 q_2 / r        (20.1)
Here it can be seen that the only properties of the matter that determine
the electrical potential are q1 and q2 , the respective charges of the
particles. Thus, the “source" of the electric portion of the electromagnetic
force is the value of charge. In the case of (non-relativistic) gravity,
only the gravitational mass of each particle acts as the source for the
potential:
U_g(r) = −G m_1 m_2 / r,        (20.2)
where G is Newton’s gravitational constant.1 Notice that in both of
these cases, the potential falls off as the separation r increases, but has
an infinite range. In the case of the nuclear forces, this is not the case:
their ranges are finite.2

1 Currently accepted to be 6.673 · 10^-11 m^3/(kg · s^2)
2 Roughly 10^-15 m for the strong force and 10^-18 m for the weak force
example 20.1: the resultant force
Forces are often described by their potential. Once such a potential is de-
scribed, however, it is simple to determine the force experienced by a system
of two particles:

d
F12 = − U (r ), (20.3)
dr
where F12 is the force experienced by one particle away from the other. In the
case of gravity:

d  m m  m m
F12 = − −G 1 2 = −G 12 2 , (20.4)
dr r r
the negative sign implying the particles move toward one another.
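As an aside, the differentiation in Example 20.1 is easy to check symbolically; the following minimal sketch (not part of the original text) uses the sympy library:

```python
import sympy as sp

r, G, m1, m2 = sp.symbols('r G m1 m2', positive=True)

# Newtonian gravitational potential, eq. (20.2)
U = -G * m1 * m2 / r

# Force from the potential, F12 = -dU/dr, eq. (20.3)
F12 = sp.simplify(-sp.diff(U, r))
print(F12)   # prints -G*m1*m2/r**2, i.e. eq. (20.4): attractive and falling off as 1/r^2
```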

In the early 1980s, some irregularities in experimental data prompted
many to ponder the existence of another, fifth fundamental force, with a
potential determined by particle properties distinct from those of the four
known forces.

20.2 the beginning: testing weak equivalence

A basis of both Sir Isaac Newton’s Law of Universal Gravitation and
Albert Einstein’s General Theory of Relativity, the weak equivalence
principle is both elegant in its simplicity and profound in its implications.
Put simply, the principle states that:

“All test particles at the same spacetime point in a given gravitational field
will undergo the same acceleration, independent of their properties, including
their rest mass." [91]

More subtly, this implies that the inertial mass of an object (the propor-
tionality between the force acting on the object and its acceleration) is
exactly equal to its gravitational mass (the proportionality between the
force an object experiences in a gravitational field and the strength of
that field). This point is readily seen by comparing Newton’s second law
with the gravitational force law (equation (20.4)).
Late in the 19th and early in the 20th century, the physicist Loránd Eötvös
set out to verify this equivalence. Between 1885 and 1909, he devised,
implemented, and refined an experiment which demonstrated the equiva-
lence to a much higher degree of accuracy than had been previously
shown. Put simply, his procedure involved hanging different types of
masses in a balance along a solid rod. Torque would then be applied
to the rod from two sources: differences in gravitational force on the
two masses (a measurement of the gravitational mass of the masses),
and differences in the centrifugal force each mass experiences (a mea-
surement of the inertial mass). Only if these two masses are equivalent
would the rod remain perfectly stationary.
As with any physical procedure, experimental uncertainty and imper-
fections plagued Eötvös’s results. These were originally averaged out to
provide fairly precise results in favor of weak equivalence. As subsequent
tests of the weak equivalence principle, using slightly different exper-
imental procedures, did not observe such scatter, the variations
in Eötvös’s results were attributed to imprecise experimental
technique.
More recently, however, physicists such as Ephraim Fischbach began
to reexamine these irregularities in terms of a possible fifth interaction
between elementary particles.

20.3 a new force

Well after the time of Eötvös, physicists were struggling to explain
two seemingly unconnected phenomena. The first involved certain
violations of CP-symmetry seen in the decay of K_L^0 mesons [34]. Such
decays had been observed, but were not expected by any current
theory or model of elementary particles.
The second was an uncertainty in the gravitational potential: the
value of the gravitational constant G has much higher experimental
uncertainty than any other physical constant. Recently, experiments
have been conducted providing a range of inconsistent results from
−0.1% to +0.6% [1]. A group based in Russia has measured values
with a 0.7% fluctuation based on time and position [1]. Furthermore,
measurements taken in mineshafts and submarines, though containing
a very large range of uncertainty, have consistently provided values that
are greater than the accepted number [34]. This inability to determine
precisely the value of G opened the door to possible modifications
to the gravitational potential, which could in turn, some physicists
argued, explain the aforementioned CP-violations. Such a modification

on the gravitational potential would then imply a possible fifth particle


interaction.
In this light, Fischbach explained the dissimilarities between Eötvös’s
results and those of later experimenters not as experimental imprecision,
but as the result of certain differences between Eötvös’s experiment
and the later ones. Eötvös’s experiment involved
relatively small acting distances for gravity (namely that between objects
on the surface of the Earth and Earth itself). However, many experi-
ments and measurements conducted by later experimentalists in order
to verify the weak equivalence principle relied only on much larger
distances (such as those between celestial bodies). Most relevantly, the
two subsequent repetitions of the experiment that were found to be
most in conflict with Eötvös relied on the attraction of Earthly objects
toward the sun [33].
To explain the relevance of this discrepancy, Fischbach postulated
that a new force could be modeled by the Yukawa potential, which had
been very effective in modeling the finite-range nuclear forces. In a 1986
paper, he proposed an additive term, in the form of a Yukawa potential,
to the gravitational potential to account for the effect of a fifth particle
interaction:
U(r) = −(G m_1 m_2 / r) (1 + α e^(−r/λ)),        (20.5)
where α and λ were to be determined experimentally. Clearly, the
correction factor very quickly approaches unity as the separation increases,
making it relevant only at closer distances. At these distances, the correction
term acts equivalently to a new, rescaled value of the gravitational constant
G. The factor by which the value is scaled is very strongly dependent
upon the distance at which the gravity is acting, seemingly explaining
the existence of such a large range of measured values. Furthermore,
Fischbach found that with the correct physical parameters3 , this model
accurately predicted the varying published geophysical measurements
of G [33].
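A minimal numerical sketch of this rescaling, using the rough parameters quoted in footnote 3 (the sampled distances below are illustrative choices, not values from Fischbach):

```python
import math

alpha = -7.2e-3   # rough coupling, footnote 3
lam   = 200.0     # rough range in meters, footnote 3

def rescaling_factor(r):
    """Factor multiplying the Newtonian potential in eq. (20.5): 1 + alpha*exp(-r/lambda)."""
    return 1.0 + alpha * math.exp(-r / lam)

for r in (1.0, 200.0, 2000.0, 6.38e6):   # lab bench, r = lambda, ~2 km, Earth's radius
    print(f"r = {r:10.0f} m   effective G / G = {rescaling_factor(r):.6f}")
```

At laboratory and geophysical distances the factor differs from unity at the level of a few tenths of a percent, while at planetary distances it is indistinguishable from 1, which is the qualitative behavior appealed to above.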
Motivated by the potential for modifications on the law of gravity
to explain known violations of CP-symmetries [34]4 , Fischbach plotted
variations in results between different masses against various elemen-
tary properties of the material used (properties which were unknown
at the time of Eötvös’s experiments). Such a pattern would imply a
compositionally-dependent deviation from the weak equivalence prin-
ciple, and therefore the existence of a new, yet unexplained force.
His results were promising: he found evidence of a linear relationship
between the error in Eötvös’s results and the difference in a function of
the baryon number of the particles in balance. The baryon number (B)
of a particle is defined as:

B = (N_q − N_q̄) / 3,        (20.6)
where Nq is the number of quarks and Nq̄ the number of antiquarks.
Specifically, Fischbach observed the fractional difference (∆κ) between
the acceleration of the object and the acceleration of gravity and found
the following linear relationship:

3 Very roughly, α = −(7.2 ± 3.6) · 10−3 , λ = (200 ± 50) m


4 For a more technical discussion of these violations and of Fischbach’s (and others’) motivations,
see Franklin [34]

Figure 20.1: The linear relationship observed by Fischbach

∆κ = ∆(a/g) = α (B/µ) + β        (20.7)

Here, a is the acceleration of the object, g is that of gravity, µ is the
atomic mass of the object, and α and β are constants.5 This relationship,
according to Fischbach, proved Eötvös’s results to be compositionally
dependent, and this dependence was associated with the fifth funda-
mental force. This force would have a short-range Yukawa potential:

U_5(r) = −G m_1 m_2 α e^(−r/λ) / r,        (20.8)
where α and λ are functions of the baryon numbers of m1 and m2.

20.4 the death of the force

Though Fischbach’s findings were initially quite persuasive, very little
else came of the fifth force. Of the fourteen subsequent published
experiments searching for a force coupled to baryon number, only two
had positive results, and these had not been effectively reproduced
[34]. Some of these results, such as those from the University of Washington’s
Eöt-Wash group (see figure 20.2), directly contradicted the linear baryon-
coupling results shown by Fischbach. Though some research in the area
still exists, most have accepted that no evidence remains that such a fifth
particle interaction physically exists. At the 1990 Moriond workshop,
fewer than ten years after the beginning of the search, it was established
that no further experimentation was necessary to disregard the fifth-
interaction hypothesis [34].

5 Fischbach found that, very roughly, α = (5.65 ± 0.71) · 10−6 and β = (4.83 ± 6.44) · 10−10 .

Figure 20.2: New experiments by the Eöt-Wash group provide results contra-
dicting Fischbach’s assumption of linearity [34]

20.5 problems

1. Write an expression for the force experienced between two objects
as a result of the fifth interaction.
Answer: F_12 = −G m_1 m_2 α [e^(−r/λ)/r^2 + e^(−r/λ)/(λr)]

2. Assume the additive potential (equation (20.8)) accurately mod-
els the gravitational potential. Using the rough geophysical data
provided by Fischbach (α ≈ −7.2 · 10^-3, λ ≈ 200 m), write an
expression for the fraction by which a measured value of the
gravitational constant G on the surface of the Earth (r = 6.38 · 10^6
m) would differ from that at infinity.
Answer: 1 / (1 − 7.2 · 10^-3 e^(−3.19 · 10^4)), which is numerically
indistinguishable from 1 because r ≫ λ.

3. Two objects of mass 1 kg, with the same composition as Earth, are
separated by a distance of 1 cm. What force do they experience as
a result of the fifth interaction? In what direction is this force?

Answer: F = 4.805 · 10^-9 N, away from one another. (A quick
numerical check appears after this problem set.)

4. Though the existence of a fifth force was eventually shown to lack
sufficient evidence, how could its investigation have been useful
to the physicists involved? See Franklin [34] for a fascinating
discussion of this question.
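As referenced in Problem 3, here is a quick numerical check of that force in Python, using the expression from Problem 1 together with the rough parameters from footnote 3:

```python
import math

G     = 6.673e-11   # m^3 kg^-1 s^-2
alpha = -7.2e-3     # rough coupling, footnote 3
lam   = 200.0       # rough range, m
m1 = m2 = 1.0       # kg
r  = 0.01           # separation, m (1 cm)

# Fifth-interaction force, using the expression from Problem 1
F5 = -G * m1 * m2 * alpha * math.exp(-r / lam) * (1.0 / r**2 + 1.0 / (lam * r))
print(f"F5 = {F5:.3e} N")   # ~4.8e-9 N; the positive sign means the force is directed away
```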
DOUGLAS HERBERT: THE SCIENCE OF THE APOCALYPSE 21
21.1 introduction

For 500,000 years Homo sapiens has roamed the Earth, building cities
and creating languages. We’ve gone from stagecoaches to space
travel in the span of one human lifetime, and we’ve sent robotic scouts
to other planets. It’s difficult to imagine it all coming to an end. Yet 99
percent of all species that have ever lived have gone extinct, including
every one of our hominid ancestors. If humans survive for a long
enough time and spread throughout the galaxy, the total number of
people who will ever live might number in the trillions. It’s unlikely
that we would be among the first generation to see our descendants
colonize planets, but what are the odds that we would be the last
generation of Homo sapiens? By some estimates, the current rate of
extinctions is 10,000 times the average in the fossil record. We may be
worried about spotted owls and red squirrels, but the next statistic on
the list could be us.

21.2 asteroid impact

Space is filled with asteroids and comets which can pose a threat to
life on Earth. Fortunately Earth’s atmosphere protects us from the
thousands of pebble-sized and smaller asteroids - only weighing a few
grams - which strike Earth every day at speeds of tens of kilometers per
second1 . At these high velocities, friction with the upper atmosphere
heats the meteoroids white-hot and causes immense deceleration forces.
These small meteoroids get destroyed by the heat and deceleration, and
are seen from Earth as shooting stars. However, some of the fragments,
especially those from iron meteoroids, will reach Earth’s surface. It is
estimated that 20 tons of meteorites reach Earth’s surface each day.
So, what if an asteroid hit Earth? Not just the dust that Earth collects
as it sweeps through space, but something serious, like “Armageddon”,
or “Deep Impact”? Earth and the moon are heavily cratered from
previous impacts, the most famous of which happened on Earth at the
end of the Cretaceous period, about 65 million years ago. Scientists
hypothesize that this impact was the cause of the End-Cretaceous
(K-T) extinction, in which eighty-five percent of all species on Earth
disappeared, making it the second largest mass extinction event in
geological history2 . Dr. Richard Muller in his book “Nemesis: The
Death Star” describes the event:

At the end of the Cretaceous period, the golden age of


dinosaurs, an asteroid or comet about 5 miles in diameter
(about the size of Mt. Everest) headed directly toward the
1 A meteor is an asteroid which breaches Earth’s atmosphere; a meteorite is one that strikes
Earth’s surface.
2 The Permian mass extinction occurred about 248 million years ago and was the greatest
mass extinction ever recorded in earth history; even larger than the better known K-T
extinction that felled the dinosaurs.

153
154 douglas herbert: the science of the apocalypse

Earth with a velocity of about 20 miles per second, more


than 10 times faster than our fastest bullets. Many such
large objects may have come close to the Earth, but this
was the one that finally hit. It hardly noticed the air as it
plunged through the atmosphere in a fraction of a second,
momentarily leaving a trail of vacuum behind it. It hit the
Earth with such force that it and the rock near it were
suddenly heated to a temperature of over a million degrees
Celsius, several hundred times hotter than the surface of the
sun. Asteroid, rock, and water (if it hit in the ocean) were
instantly vaporized. The energy released was greater than
that of 100 million megatons of TNT, 100 teratons, more than
10,000 times greater than the total U.S. and Soviet nuclear
arsenals.
Before a minute had passed, the expanding crater was 60
miles across and 20 miles deep. (It would soon grow even
larger.) Hot vaporized material from the impact had already
blasted its way out through most of the atmosphere to an
altitude of 15 miles. Material that a moment earlier had been
glowing plasma was beginning to cool and condense into
dust and rock that would be spread worldwide. The entire
Earth recoiled from the impact, but only a few hundred feet.
The length of the year changed by a few hundredths of a
second3 .

In 2028, the asteroid 1997 XF11 will come close to Earth but will miss
our planet by about two and a half lunar distances; that’s extremely
close, considering how big outer space is. If something were to change
its course, and it did hit Earth, what you would have is a 1.6 km (1 mile)
wide meteorite striking the planet’s surface at about 48,000 kph (30,000
mph). The energy released during the impact is related to the kinetic
energy of the asteroid before atmospheric entry begins. At typical solar
system impact speeds of 12 to 20 km/s, the energy E is approximately
given as one half times the asteroid mass m times the square of the
asteroid velocity v, which can be rewritten in terms of the asteroid’s
density ρ and diameter l, assuming that the asteroid is approximately
spherical:

E = (1/2) m v^2 = (π/12) ρ l^3 v^2        (21.1)
A kilometer and a half wide asteroid traveling at 20 km/s has an
E roughly equal to a 1 million megaton bomb, about 10 million times
greater than the bomb that fell on Hiroshima. A land strike would
produce a fireball several miles wide which would briefly be as hot as
the surface of the sun, igniting anything in sight, and it would blast tons
of sulfur-rich rock and dust high into the atmosphere, encircling the
globe. As the burning debris rained back down to Earth, the soot and
dust would blacken the skies for months, if not years, to come. An ocean
landing would be no better, instantly vaporizing 700 cubic kilometers
(about 170 cubic miles) of water, and blasting a tower of steam several miles
into the atmosphere, again benighting the sky. The meteor itself would
3 The impact believed to have caused the extinction of the dinosaurs left a 300 km (186
mile) wide crater on the coast of Yucatán. The impactor had to have been at least 30 km
(19 miles) across.

likely crack Earth’s crust at the ocean floor, triggering massive global
earthquakes, sprouting volcanoes at weak spots in Earth’s crust, and
creating a tsunami as high as the ocean is deep, moving at hundreds of
kilometers an hour in every direction. Humans would likely survive,
but civilization would not.
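As a rough check of the figures above, equation (21.1) can be evaluated directly. The density of 3000 kg/m^3 below is an assumed value for rocky material, not one given in the text:

```python
import math

rho = 3000.0    # assumed density of a rocky asteroid, kg/m^3
l   = 1600.0    # diameter, m (roughly 1 mile)
v   = 20000.0   # impact speed, m/s

E = math.pi / 12.0 * rho * l**3 * v**2   # eq. (21.1), in joules
megatons = E / 4.184e15                  # 1 megaton of TNT is about 4.184e15 J

print(f"E = {E:.2e} J, or about {megatons:.1e} megatons of TNT")
```

With these assumptions the result comes out to a few hundred thousand megatons, the same order of magnitude as the million-megaton figure quoted above.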
The Kuiper belt is an asteroid zone just beyond the orbit of Neptune,
and it contains roughly 100,000 ice balls that are each more than 80 and
a half kilometers (50 miles) in diameter. As Neptune’s orbit perturbs
the Kuiper belt, a steady rain of small comets is sent earthward. If one
of the big ones heads right for us, that would certainly be the end for
just about all higher forms of life on Earth.

21.3 errant black holes

The Milky Way galaxy is full of black holes, collapsed stars about 20 km
(12 miles) across. Just how full is hard to say: a black hole’s gravity is so
strong that anything approaching within a certain radius will no longer
be able to escape, including any light which would betray its presence.
This critical radius is called the “Schwarzschild radius”4. The interior of
a sphere whose radius is the Schwarzschild radius is completely cut off
from the rest of the Universe, so the only way to “see” a black hole is to
spot its gravitational lensing - the distortion of background light by
foreground matter, i.e., the black hole. Beyond the Schwarzschild radius,
the gravitational attraction of a black hole is indistinguishable from
that of an ordinary star of equal mass. Based on such observations
and theoretical arguments, researchers estimate that there are about 10
million black holes in the Milky Way, including one at the core of our
galaxy5, whose mass is as much as 2 million solar masses.
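The rule of thumb in footnote 4 (roughly 3 km of Schwarzschild radius per solar mass) makes it easy to attach numbers to the black holes mentioned above; the particular masses chosen here are illustrative:

```python
def schwarzschild_radius_km(solar_masses):
    """Rule of thumb from footnote 4: roughly 3 km of radius per solar mass."""
    return 3.0 * solar_masses

# A few-solar-mass collapsed star: radius ~10 km, i.e. about 20 km across, as quoted above.
print(schwarzschild_radius_km(3.3), "km")

# The galactic-center black hole quoted above, roughly 2 million solar masses:
r_core_km = schwarzschild_radius_km(2e6)
print(r_core_km, "km, or about", round(r_core_km / 1.496e8, 2), "AU")
```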
A black hole has an orbit just like any other star, so it’s not likely
that one is heading toward us. If any other star were approaching us,
we would know; with a black hole there would be little warning. A few
decades before a close approach to Earth, astronomers would probably
notice a strange perturbation in the orbits of the outer planets; as the
effects grew larger, it would be possible to make increasingly precise
estimates of the location and mass of the interloper. A black hole doesn’t
have to come very close to Earth to bring ruin; it simply needs to pass
through the solar system. Just as Neptune’s gravity disturbs the Kuiper
belt, a rogue black hole could pull Earth’s orbit into an exaggerated
ellipse, either too close to the sun or too far from the sun. It could even
eject Earth from its orbit and send the planet off into deep space.

21.4 flood volcanism

In 1783, the Laki volcano in Iceland erupted, pouring out 5 cubic


kilometers (about 1.2 cubic miles) of lava. Floods, ash, and fumes killed 9,000
people and 80 percent of Iceland’s livestock, and the ensuing starvation
killed 25 percent of Iceland’s remaining population. Atmospheric dust
lowered winter temperatures by 9 degrees in the newly independent
United States, and that was minor compared to what Earth is capable

4 In 1916, German astronomer Karl Schwarzschild established the theoretical existence of
black holes as a solution to Einstein’s gravitational equations. The Schwarzschild radius
is equal to 3 km times the number of solar masses of the black hole; the solar masses are
determined via Kepler’s laws of planetary motion.
5 Researchers also believe that there is a black hole at the center of every galaxy

of. Sixty-five million years ago, a plume of hot rock from Earth’s mantle
burst through the crust in what is now India, when continental drift
moved that area over a “hot spot” in the Indian Ocean. Eruptions
raged century after century, producing lava flows that exceeded 100,000
square kilometers (about 39,000 square miles) in area and 150 meters (500
feet) in thickness: the Laki eruption 3,000 times over. Some scientists believe
that this Indian outburst, and not an asteroid, was responsible for
the fall of the dinosaurs (the K-T extinction), since such lava flows
would have produced enormous amounts of ash, altering global climatic
conditions and changing ocean chemistry. An earlier, even larger event
in Siberia occurred at the time of the Permian mass extinction 248
million years ago, the most thorough extermination known to
paleontology. At that time 95 percent of all species were wiped out.
Sulfurous volcanic gases produce acid rains. Chlorine-bearing com-
pounds break down the ozone layer6. While they cause short-term
destruction, volcanoes also release carbon dioxide, which yields long-
term global warming through the greenhouse effect. The last big pulse of flood
volcanism built the Columbia River plateau about 17 million years ago.
If the idea of cataclysmic volcanism sounds too unlikely, Tom Bissell
notes in a 2003 Harper’s Magazine article (A Comet’s Tale) that:
...73,500 years ago what is known as a volcanic supere-
ruption occurred in modern-day Sumatra. The resultant
volcanic winter blocked photosynthesis for years and very
nearly wiped out the human race. DNA studies have sug-
gested that the sum total of human characteristics can be
traced back to a few thousand survivors of this catastrophe.

21.5 giant solar flares

More properly known as coronal mass ejections, solar flares are the
outbursts caused by enormous magnetic storms on the sun, and they
bombard Earth with high speed subatomic particles. A typical solar
flare releases the equivalent energy of a billion hydrogen bombs, and
ejects a hundred billion tons of high energy particles into space. Earth’s
magnetic field and atmosphere negate the potentially lethal effects of
ordinary flares, knocking the particles back into space, and steering
some of the particles over the poles, creating Earth’s auroras. But while
examining old astronomical records, Yale University’s Bradley Schaefer
found evidence that some sunlike stars can brighten briefly by up
to a factor of 20. Schaefer believes that these increases are caused by
superflares millions of times more powerful than those commonly
experienced by Earth. Scientists don’t know why superflares happen at
all, or whether our sun could exhibit milder but still disruptive behavior.
A superflare on the sun only a hundred times larger than typical would
overwhelm Earth’s magnetosphere and begin disintegrating the ozone
layer (see footnote 6). Such a burst would certainly kill anything basking
in its glow, and according to Bruce Tsurutani of NASA’s Jet Propulsion
Laboratory, “it would leave no trace in history.”
While too much solar activity could be deadly, too little of it is
problematic as well. Sallie Baliunas of the Harvard-Smithsonian Center
for Astrophysics says that many sunlike stars pass through extended
6 Without the ozone layer, ultraviolet rays from the sun would reach the surface of the Earth
at nearly full force, causing skin cancer, and killing off the photosynthetic plankton in the
ocean that provide oxygen to the atmosphere and support the bottom of the food chain.

periods of lessened activity, during which they become nearly 1 percent


dimmer. A similar downturn in our own sun could send us into another
ice age. According to Baliunas, there is evidence that decreased solar
activity contributed to 17 of the 19 major cold episodes on Earth in the
last 10,000 years.

21.6 viral epidemic

Viruses prosper by hijacking the machinery of a healthy cell to pro-
duce more virus. They multiply within the healthy cell, burst out, and
attack more healthy cells. They also have the capacity to burst forth in
some startling new form, and then to disappear just as quickly. In 1916,
people in Europe and America began to come down with a strange
sleeping sickness, which became known as encephalitis lethargica. Vic-
tims would go to sleep and not wake up; they could be roused with
great difficulty, but once they were allowed to rest they would fall back
into deep sleep. Some victims continued like this for months before
dying. In ten years encephalitis lethargica killed five million people and
then simply disappeared. The only reason this disease didn’t get much
lasting attention is that the worst epidemic in history was sweeping
across the world at the same time.
The Great Swine Flu, sometimes called the Great Spanish Flu, killed
twenty-one million people in its first four months. Between autumn of
1918 and spring of 1919, 548,452 people in the United States died of the
flu. In Britain, France, and Germany, the toll was over 220,000 dead in
each country, the global toll was approximately 50 million, with some
estimates as high as 100 million. Much about the 1918 flu is understood
poorly, if at all. One mystery is how it managed to break out seemingly
everywhere all at once, in countries separated by oceans and mountain
ranges. A virus can only survive outside a host body for a few hours,
so how did this virus appear in Bombay, Madrid, and Philadelphia, all
within the same week?
Some consider it a miracle that other diseases have not gone rampant.
Lassa fever, first detected in West Africa in 1969, is extremely virulent. A
doctor at Yale University came down with Lassa fever when he was
studying it in 1969. He survived, but a technician in a different lab-
oratory, with no direct exposure to the virus, contracted it and died.
Fortunately the outbreak stopped there, but in 1990 a Nigerian living
in New York contracted Lassa fever on a visit home and didn’t develop
any symptoms until his return to the United States. He died undiag-
nosed in a Chicago hospital, without anyone having taken any special
precautions, or knowing that he had contracted one of the most infec-
tious and lethal diseases on the planet. Our lifestyles invite epidemics;
air travel makes it possible to spread disease with alarming ease. An
Ebola outbreak could begin in Boston, jump to Paris, and then to Tokyo
before anyone ever became aware of it.
We’re accustomed to being Earth’s (and maybe the galaxy’s) domi-
nant species, which makes it difficult to consider that we may only be
here because of various chance events, or that our continued presence
may be due to the absence of other chance events.

Humans are here today because our particular line never frac-
tured - never once at any of the billion points that could have
erased us from history.

Stephen Jay Gould


AMANDA LUND: EXTRATERRESTRIAL INTELLIGENCE 22
22.1 introduction

Life, as far as we know, exists only on our own planet Earth. There
is no evidence of intelligence (or life of any form) in our solar
system, galaxy, or beyond. But are we really alone? The idea that our
small planet possesses the only biological beings in the universe seems
egotistical and, in light of our recent advances in astronomy and the
sciences, unlikely. Many efforts are currently underway in the search for
extraterrestrial intelligence, with the hope of finding it still undamped
by the lack of success. If intelligent life does indeed exist elsewhere, it
remains to be seen whether it will dwell in a region close enough to
make contact with, and even whether or not it will want to be found.

22.2 the possibility of life in the universe

Exploration of our solar system has provided no evidence for the


existence of life anywhere within it except Earth. While ancient Mars
may once have had a life-supporting environment and planetary moons
such as Europa and Titan might be capable of sustaining life, no life,
and certainly no intelligent life, has been observed. Despite this, modern
astronomy suggests that extrasolar planets (planet-star systems like our
Earth and sun) might not be as rare in galaxies as was once believed [55].
Many Jovian planets have been detected due to their large mass, but
smaller, rocky planets may even outnumber these gas giants three to
one.
The quantity of possible life-supporting planets, along with several
other factors which would determine the amount of intelligent life in
the universe, is expressed concisely in the famous Drake Equation. This
equation was created by Dr. Frank Drake of the University of California,
Santa Cruz in 1960 [55], and defines the total number of civilizations in
the universe with which we might be able to communicate (NC ) as

NC = RS f P n f L f I f C L, (22.1)

where

• RS is the formation rate of stars in a galaxy,

• f P is the fraction of stars with planetary systems,

• n is the average number of habitable planets in a planetary system,

• f L is the fraction of habitable planets which develop life,

• f I is the fraction of habitable planets which develop intelligent


life,

• f C is the fraction of planets with life that develop intelligent


beings interested in communication, and


Figure 22.1: One way of detecting extrasolar planets is by observing a decrease


in the brightness of a star as the planet passes in front of it [30].

• L is the average lifetime of a civilization.

As the only term in this equation known with any certainty is the rate
of star formation RS (about 2 or 3 per year), the projected values of NC
vary greatly [75]. The fraction of stars with solar systems ( f P ) has been
estimated between 20% and 50% [55], but the remaining terms depend
greatly on the optimism of the person assigning them. For instance,
the average lifetime of an intelligent civilization might be anywhere
between 100 and 10^10 years. A belief in the tendency of civilizations to
self-destruct would set L at the lower limit, while the opposing view
that societies can overcome this inclination to eradicate themselves
would let L be as large as the lifetime of their star [75].
As the only known example of f L , f I , and f C is the Earth, there is
really no way to obtain an accurate estimate of these probabilities [57].
Based on the time it took life to evolve on Earth, a rough estimate of
f L > 0.13 could be assigned. Drake has estimated both f I and f C to be
0.01, though there is not much basis for these assumptions.
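A minimal sketch of the bookkeeping in equation (22.1), using the ranges quoted above; the specific optimistic and pessimistic picks are illustrative, not values endorsed by the text:

```python
def drake_nc(Rs, fp, n, fl, fi, fc, L):
    """Number of communicating civilizations, eq. (22.1)."""
    return Rs * fp * n * fl * fi * fc * L

optimistic  = drake_nc(Rs=2.5, fp=0.50, n=1, fl=1.00, fi=0.01, fc=0.01, L=1e10)
pessimistic = drake_nc(Rs=2.5, fp=0.20, n=1, fl=0.13, fi=0.01, fc=0.01, L=100)

print(f"optimistic:  {optimistic:.2e}")   # ~1.3e6 civilizations
print(f"pessimistic: {pessimistic:.2e}")  # ~6.5e-4, i.e. effectively none
```

Essentially all of the uncertainty lives in the last four factors, which is why estimates of N_C span roughly ten orders of magnitude.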

22.3 the search for intelligent life

The current quest for intelligent life mainly involves filtering through
multitudes of electromagnetic emissions in an attempt to "eavesdrop"
on possible sentient societies [82]. Since World War II, radio astronomy
has advanced significantly, and most of the searches are being done in
the radio and microwave regions of the spectrum. SETI (the Search for
ExtraTerrestrial Intelligence), the main organization conducting these
searches, has been using large radio telescopes in an attempt to pick
up alien transmissions. Their receivers are intended to identify narrow-
band broadcasts, which are easy to discern even at large distances and
can only be created by transmitters [84].
A particularly important signal for astronomy and SETI is the hy-

drogen 21-centimeter line. This special wavelength of radiation is a


quantum effect of the hyperfine structure of hydrogen, which provides
a very small correction to the hydrogen energy levels due to the inter-
action of the proton and electron spins. A transition from the triplet
to the singlet state in the ground state of hydrogen corresponds to the
spin of these two particles flipping from parallel to antiparallel. This is
a very rare transition, but when it occurs a small amount of energy is
released at a wavelength of 21 centimeters [15].
example 22.1: the 21 centimeter line
Using Planck’s relation E = hν, show that if the energy of the photon
emitted in the transition from the triplet state to singlet state of the hydrogen
hyperfine structure described above is 5.9 × 10−6 eV, the corresponding
wavelength is 21 centimeters.

Solution:
First we must convert the energy in eV to joules, using the conversion factor

1 eV = 1.6 × 10−19 J. (22.2)

This gives us:

5.9 × 10−6 eV = 9.44 × 10−25 J. (22.3)

Using both this value of the change in energy in joules and Planck’s constant,

h = 6.626 × 10−34 J s, (22.4)

we can determine the frequency of the photon emitted using Planck’s relation.

E = hν        (22.5)

ν = ∆E/h = (9.44 × 10^-25 J) / (6.626 × 10^-34 J s) = 1420 MHz        (22.6)
Now, by plugging this frequency into the equation relating frequency and
wavelength (where c denotes the speed of light in a vacuum), we can find the
corresponding wavelength.

λ = c/ν = (3 × 10^8 m/s) / (1420 MHz) = 0.21 m = 21 cm        (22.7)
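The arithmetic in Example 22.1 can also be verified in a couple of lines; a minimal sketch:

```python
E_eV = 5.9e-6              # hyperfine transition energy, eV
E_J  = E_eV * 1.6e-19      # convert to joules
h    = 6.626e-34           # Planck's constant, J s
c    = 3e8                 # speed of light, m/s

nu  = E_J / h              # Planck's relation, E = h * nu
lam = c / nu               # corresponding wavelength

print(f"nu = {nu:.3e} Hz, lambda = {lam * 100:.1f} cm")   # ~1.42e9 Hz and ~21 cm
```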

Though this type of transition is extremely uncommon, about 90%


of the interstellar medium consists of atomic and molecular hydrogen;
this greatly increases the probability of observing 21-centimeter radi-
ation. This radiation is able to penetrate dust clouds in space, and as
frequencies around it are protected for radio astronomy, there is limited
interference from Earth [15]. Because it is such an important frequency,
some scientists believe it is a likely signal for extraterrestrial intelligence
to send out in a communication attempt. As a result, SETI has radio-
telescopes searching for the 21-centimeter line around sun-like stars
and is even broadcasting its own 21-centimeter wavelength signals.

22.4 will we find it? and when?

Over the past 50 years, great advances in technology and in our un-
derstanding of physics and astronomy have transformed the search for
extraterrestrial intelligence from science fiction and Hollywood horror
films into an actual science. Despite this, we may still be far from find-
ing other life in the universe. If it does indeed exist, it may be nearly

Figure 22.2: SETI uses radio telescopes to try to pick up signals from extrater-
restrial life [71].

impossible to locate life among the billions of stars and galaxies and
light years of space that surround us, especially if any intelligent life out
there does not want to be found. The search has been likened to trying
to find not a needle but a single atom of a needle in a haystack [74].
Yet there are still optimists who believe we might not be so far off
after all. Notable astronomers including Carl Sagan and Frank Drake,
who are searching through this cosmic haystack themselves, have pre-
dicted that extraterrestrial intelligence will be found between 2015 and
2026 [74]. SETI even offers a program called SETI@home, where anyone
with a computer can donate disk space to help analyze radio telescope
data and speed up the search. It seems reasonable to believe we are not
alone in the universe; the real question, then, must be whether or not
we will see evidence of other intelligence–and if so, when.

22.5 problems

1. Using the Drake Equation (Eq. (22.1)) and given values


a) RS = 2.5,
b) f P = 0.35,
c) n = 1,
d) 100 < L < 1010 ,
e) 0.13 < f L < 1,
f) f I = 0.01,
g) f C = 0.01,
determine the most optimistic and most pessimistic values of NC .

Solution: Using

NC = RS f P n f L f I f C L, (22.8)

the most optimistic value would use the upper limits in the ranges, and
the result would be

NC = 2.5 × 0.35 × 1 × 1 × 0.01 × 0.01 × 10^10 = 8.75 × 10^5 . (22.9)

The most pessimistic value, using the lower limits, would be

NC = 2.5 × 0.35 × 1 × 0.13 × 0.01 × 0.01 × 100 = 1.14 × 10^-3 . (22.10)

However, this pessimistic estimate is much less than 1, indicating a lack


of intelligent civilizations in the universe.

2. Some astronomers have suggested that electromagnetic waves


outside the radio and microwave regions might be as good a
way or even a more promising way to succeed in the search for
extraterrestrial intelligence [84]. If we were to search for signals
with wavelengths between 750 nm and 10 µm, in what region of
the electromagnetic spectrum would this be? What would be the
corresponding frequencies for these wavelengths?

Solution: These wavelengths are in the infrared region. To find their


frequencies, we use
ν = c / λ.        (22.11)

The upper frequency is

ν = (3 × 10^8 m/s) / (7.5 × 10^-7 m) = 4 × 10^14 s^-1,        (22.12)

and the lower frequency is

ν = (3 × 10^8 m/s) / (1 × 10^-5 m) = 3 × 10^13 s^-1,        (22.13)

so 3 × 10^13 s^-1 < ν < 4 × 10^14 s^-1.

22.6 multiple choice questions

1. What is an extrasolar planet?


a) A planet in our solar system that is outside the frost line
b) A planet outside our solar system orbiting around another
star
c) A massive body that is not a true planet but exists in free
space and does not orbit a star
d) A planet on which life exists
Answer: b)

2. How does SETI hope to find extraterrestrial life?


a) By intercepting radio signals from intelligent civilizations
b) By building a spaceship that can travel at close to the speed
of light and sending it toward extrasolar planets in the An-
dromeda galaxy
c) By following up on all reports of UFO sightings in hopes
that one of them is valid
d) By using radio telescopes to intercept government intelli-
gence transmissions and expose the evidence of extraterres-
trials that the government has been concealing
Answer: a)

22.7 summary

So far, the only life we know of in the universe exists on Earth. However,
given the vastness of the universe and the huge number of galaxies,
stars, and planets within it, it is unlikely that we are alone. With our
current technology there is no way to tell how much intelligent life
exists outside our planet or where it is located; the Drake Equation is
one way to estimate the number of intelligent civilizations, but it is
based mainly on guesswork. The SETI institute has set out to find life by
attempting to eavesdrop on radio signals from other civilizations–one
important wavelength in this search is the 21-centimeter line, a rare
transition between two hyperfine states of hydrogen. There is no way
to know when or if we will find anything, though some optimistic
astronomers predict as soon as 2015.
BIBLIOGRAPHY A
[1] The Eöt-Wash Group, University of Washington, 2008.
http://www.npl.washington.edu/eotwash/. (Cited on
page 148.)

[2] Geoff Andersen. The Telescope. Princeton University Press, 2007.


(Cited on page 17.)

[3] Christoph Arndt. Information Measures. Springer, 2001. (Cited on


page 98.)

[4] BBC. Freak wave. BBC 2, November 14 2002. (Cited on pages 47


and 53.)

[5] Carl M. Bender and Steven A. Orszag. Advanced Mathematical


Methods for Scientists and Engineers. Springer, 1999. (Cited on
page 29.)

[6] Bennet, Donahue, Schneider, and Voit. The Cosmic Perspective:


Fourth Edition. Pearson Education Inc., San Francisco, CA, 2007.
(Cited on pages 135, 136, 137, and 140.)

[7] Ben Best. Lessons for cryonics from metallurgy and ceram-
ics. benbest.com/cryonics/lessons.html. (Cited on pages 34
and 35.)

[8] B.I. Bleaney. Electricity and Magnetism, volume 1. Oxford Univer-


sity Press, Great Britain, 3rd edition, 1976. (Cited on page 4.)

[9] Laura Bloom. The Electromagnetic Spectrum. University of Col-


orado, 2008. http://lasp.colorado.edu/cassini/education/
Electromagnetic%20Spectrum.htm. (Cited on pages 4 and 7.)

[10] Christian Blum. Ant colony optimization. http://iridia.


ulb.ac.be/~meta/newsite/index.php?main=3&sub=31. (Cited
on page 109.)

[11] Stephen Brawer. Relaxation in Viscous Liquids and Glasses. The


American Ceramics Society, 1985. (Cited on page 34.)

[12] Evelyn Brown. Ocean Circulation. Butterworth Heinemann, Milton


Keynes, UK, 2nd edition, 2002. (Cited on page 47.)

[13] Edward Bryant. Tsunami-The Underrated Hazard. Cambridge Uni-


versity Press, Cambridge, UK, 2001. (Cited on page 47.)

[14] Laura Cadonati. private communication, 2008. (Cited on pages 64,


66, 67, 72, and 74.)

[15] Laura Cadonati. Hyperfine structure. Quantum Mechanics class


notes, November 2008. (Cited on page 161.)

[16] Kenneth Chang. The nature of glass remains anything but clear,
July 2008. New York Times. (Cited on pages 33 and 34.)


[17] Colloid. Colloid. http://dictionary.reference.com/browse/colloid. (Cited on page 36.)

[18] Wikimedia Commons. Gravitational perturbation. http://upload.wikimedia.org/wikipedia/commons/0/01/Gravitational_perturbation.svg. (Cited on page 128.)

[19] Dennis Joseph Cowles. Kepler’s third law. http://www.craigmont.org/kepler.htm. (Cited on page 128.)

[20] Rod Cross. Increase in friction with sliding speed. American Journal of Physics, 73(9), 2005. (Cited on page 56.)
[21] J.M.A. Danby. Fundamentals of Celestial Mechanics. MacMillan,
New York City, NY, first edition, 1962. (Cited on page 128.)
[22] Marco Dorigo. Ant colony optimization. http://iridia.ulb.ac.
be/~mdorigo/ACO/ACO.html, 2008. (Cited on pages 107 and 109.)

[23] Marco Dorigo. Ant colonies for the traveling salesman problem.
Technical report, Université Libre de Bruxelles, 1996. (Cited on
pages 111 and 113.)
[24] Marco Dorigo. Ant algorithms for discrete optimization. Technical
report, Université Libre de Bruxelles, 1996. (Cited on pages 111
and 112.)
[25] Belle Dumé. Magnetic recording has a speed limit. http:
//physicsworld.com/cws/article/news/19401, Apr 21, 2004.
(Cited on page 87.)
[26] Albert Einstein. Relativity: The special and general theory. Henry
Holt and Company, New York, 1920. Translated by Robert W.
Lawson. (Cited on page 143.)
[27] SR Elliot. Physics of Amorphous Materials. Longman, 1984. (Cited
on pages 33, 34, and 35.)
[28] B. N. J. Persson et. al. On the origin of amonton’s friction law.
Journal of Physics: Condensed Matter, 20, 2008. (Cited on page 57.)
[29] H. Olsson et. al. Friction models and friction compensation.
European Journal of Control, 4(9), 1998. (Cited on pages 56 and 57.)
[30] Extrasolar Planets. Extrasolar planets. http://www.astro.keele.
ac.uk/workx/extra-solar-planets/extrasolar.html. Image of
an artist’s representation of an extrasolar planet taken from this
site. (Cited on page 160.)
[31] Adalbert Feltz. Amorphous Inorganic Materials and Glasses. VCH,
1993. (Cited on page 33.)
[32] Richard Feynman, Robert Leighton, and Matthew Sands. The Feynman Lectures on Physics. Addison-Wesley, San Francisco, 1964. (Cited on page 19.)
[33] Ephraim Fischbach. Reanalysis of the Eötvös Experiment. 1986.
(Cited on page 149.)
[34] Allan Franklin. The Rise and Fall of the Fifth Force. American Insti-
tute of Physics, New York, New York, 1993. (Cited on pages 148,
149, 150, and 151.)

[35] Carolyn Gordon, David L. Webb, and Scott Wolpert. One cannot
hear the shape of a drum. Bulletin of the American Mathematical
Society, 27:134–138, 1992. (Cited on page 29.)
[36] David J. Griffiths. Introduction to Electrodynamics. Prentice Hall,
Upper Saddle River, New Jersey, 3rd edition, 1999. (Cited on
pages 3, 5, 6, and 8.)
[37] David J. Griffiths. Introduction to Quantum Mechanics. Benjamin
Cummings, San Francisco, second edition, 2004. (Cited on
pages 61, 62, 64, and 132.)
[38] John Grue and Karsten Trulsen. Waves in Geophysical Fluids.
SpringerWienNewYork, Italy, 2006. (Cited on page 47.)
[39] R. J. Hall. Interior structure of star prior to core collapse.
http://www.newwest.net/topic/article/cosmicfireworks, De-
cember 2006. (Cited on pages 135 and 137.)
[40] R.V.L. Hartley. Transmission of information. Bell System Technical
Journal, July:535–563, 1928. (Cited on page 98.)
[41] Sverre Haver. Freak wave event at draupner jacket. Statoil, 2005.
(Cited on page 48.)
[42] Franz Himpsel. Nanoscale memory. http://uw.physics.wisc.
edu/~himpsel/memory.html, 2008. (Cited on page 88.)

[43] Keigo Iizuka. Springer Series in Optical Sciences: Engineering Optics. Springer Science+Business Media, 2008. (Cited on page 13.)
[44] Vincent Ilardi. Renaissance Vision from Spectacles to Telescopes.
American Philosophical Society, 2007. (Cited on page 13.)
[45] Mark Kac. Can one hear the shape of a drum? American Mathematical Monthly, 73:1–23, 1966. (Cited on pages 27 and 29.)
[46] Edward W. Kamen and Bonnie S. Heck. Fundamentals of Signals and Systems. Prentice Hall, Stanford University, 2007. (Cited on pages 81 and 84.)
[47] Joseph E Kasper and Steven A Feller. The Complete Book of Holo-
grams: How They Work and How to Make Them. Dover, New York,
1987. (Cited on pages 19, 20, 21, and 23.)
[48] Christian Kharif and Efim Pelinovsky. Physical mechanisms of
the rogue wave phenomenon. European Journal of Mechanics, 22:
603–634. (Cited on pages 52 and 53.)
[49] I. V. Lavrenov. The wave energy concentration at the agulhas
current off south africa. Natural Hazards, 17:117–127, March 1998.
(Cited on pages 50 and 51.)
[50] L. A. Leshin, N. Ouellette, S. J. Desch, and J. J. Hester. A nearby
supernova injected short-lived radionuclides into our protoplan-
etary disk. Chondrites and the Protoplanetary Disk: ASP Conference
Series, 341, 2005. (Cited on pages 137, 138, 139, and 140.)
[51] M. Limongi and A. Chieffi. The nucleosynthesis of 26 Al and 60 Fe
in solar metallicity stars extending in mass from 11 to 120 solar
masses. The Astrophysical Journal, 647:483–500, 2006. (Cited on
pages 138 and 139.)

[52] Seth Lloyd. Ultimate physical limits to computation. Nature, 406:1047–1054, 2000. (Cited on page 102.)

[53] Jim Lochner and Meredith Gibb. Introduction to supernova remnants. http://heasarc.gsfc.nasa.gov/docs/objects/snrs/snrstext.html, October 2007. (Cited on pages 135, 136, and 137.)

[54] Richard G. Lyons. Understanding Digital Signal Processing. Prentice-Hall, Univ. of California Santa Cruz, 2004. (Cited on pages 79, 80, and 83.)

[55] D. J. Des Marais and M. R. Walter. Astrobiology: Exploring the origins, evolution, and distribution of life in the universe. Annual Review of Ecology and Systematics, 30:397–420, 2008. (Cited on pages 159 and 160.)

[56] N. Margolus and L. B. Levitin. The maximum speed of dynamical evolution. Physica D, 120:188–195, 1998. (Cited on page 88.)

[57] Roy Mash. Big numbers and induction in the case for extraterres-
trial intelligence. Philosophy of Science, 60:204–222, 1993. (Cited
on page 160.)

[58] S.R. Massel. Ocean Surface Waves; their Physics and Prediction.
World Scientific Publ., New Jersey, 1996. (Cited on page 49.)

[59] N. David Mermin. Boojums All the Way Through: Communicating Science in a Prosaic Age. Cambridge University Press, Cambridge, 1990. (Cited on page xi.)

[60] Kurt Nassau. The Physics and Chemistry of Color. John Wiley and
Sons, Inc., New York, second edition, 2001. (Cited on page 9.)

[61] R. Nave. Nuclear reactions in stars. http://hyperphysics.phy-astr.gsu.edu/Hbase/hph.html, 2005. (Cited on page 135.)

[62] Rod Nave. Maxwell’s Equations. Georgia State University, 2005. http://hyperphysics.phy-astr.gsu.edu/HBASE/electric/maxeq2.html#c3. (Cited on page 4.)

[63] BBC News. Factfile: Hard disk drive. http://news.bbc.co.uk/2/hi/technology/6677545.stm, 21 May 2007. (Cited on page 89.)

[64] The Royal Swedish Academy of Sciences. The nobel prize in physics 2007. http://nobelprize.org/nobel_prizes/physics/laureates/2007/info.pdf, 2007. (Cited on page 93.)

[65] H. Palme and A. Jones. Solar system abundances of the elements. Treatise on Geochemistry, 1:41–61, 2003. (Cited on page 138.)

[66] Karsten Peters, Anders Johansson, Audrey Dussutour, and Dirk Helbing. Analytical and numerical investigation of ant behavior under crowded conditions. World Scientific Publishing Company, Institute for Transport & Economics, Dresden University of Technology, Andreas-Schubert-Str. 23, 01062 Dresden, Germany, 2008. (Cited on pages 107 and 108.)

[67] Paul R. Pinet. Invitation to Oceanography. Jones and Bartlett Publishers, Sudbury, MA, 4th edition, 2006. (Cited on page 47.)

[68] IBM Research. The giant magnetoresistive head: A giant leap for ibm research. http://www.research.ibm.com/research/gmr.html, 2008. (Cited on pages 91 and 93.)

[69] C. Patrick Royall, Esther C. M. Vermolen, Alfons van Blaaderen, and Hajime Tanaka. Controlling competition between crystallization and glass formation in binary colloids with an external field. Journal of Physics: Condensed Matter, 20(40), October 8, 2008. ISSN 0953-8984. doi: 10.1088/0953-8984/20/40/404225. 2nd Conference on Colloidal Dispersions in External Fields, Bonn Bad Godesberg, Germany, March 31–April 2, 2008. (Cited on pages 36 and 37.)

[70] Francis W. Sears and Mark W. Zemansky. University Physics. Addison-Wesley Publishing Company, Reading, Massachusetts, second edition, 1955. (Cited on page 10.)

[71] SETI. Seti radio telescopes. http://blog.wired.com/photos/uncategorized/2008/06/09/setiscopes.jpg. Image of SETI radio telescopes taken from this site. (Cited on page 162.)

[72] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 623–656, 1948. (Cited on page 98.)

[73] S. L. Shapiro and Saul A. Teukolsky. Black Holes, White Dwarfs, and Neutron Stars. Wiley-Interscience, 1983. (Cited on page 131.)

[74] Seth Shostak. When will we find the extraterrestrials? (And What
Will We Do When We Find Them?). lecture at the University of
Massachusetts Amherst, October 2008. (Cited on page 162.)

[75] Frank H. Shu. The Physical Universe: An Introduction to Astronomy. University Science Books, 1982. (Cited on page 160.)

[76] Richard Slansky, Stuart Raby, Terry Goldman, and Gerry Garvey.
The oscillating neutrino: An introduction to neutrino masses and
mixings. Los Alamos Science, 25:28–63, 1997. (Cited on pages 65,
66, 72, 73, and 74.)

[77] Tom Standage. The Neptune File: A Story of Astronomical Rivalry and the Pioneers of Planet Hunting. Walker Publishing Company, Inc., New York, NY, 2000. (Cited on page 127.)

[78] Boris Svistunov. private communication, 2008. (Cited on page 64.)

[79] S. Tachibana, G. R. Huss, N. T. Kita, H. Shimoda, and Y. Morishita. The abundances of iron-60 in pyroxene chondrules from unequilibrated ordinary chondrites. Lunar and Planetary Science, XXXVI:1529–1530, 2005. (Cited on page 139.)

[80] R.J. Tayler. The Stars: their structure and evolution. Cambridge,
1994. (Cited on page 133.)

[81] Ker Than. Monster busts theory. http://www.burzycki.org/category/science, October 2007. (Cited on page 138.)

[82] D. E. Thomsen. Sifting the cosmic haystack for aliens. Science News, 121:374, 1982. (Cited on page 160.)

[83] J. J. Tobin, L. W. Looney, and B. D. Fields. Radioactive Probes of the Supernova-Contaminated Solar Nebula: Evidence that the Sun was Born in a Cluster. Faculty Publication, Univ. of Illinois at Urbana-Champaign, 2006. (Cited on page 138.)

[84] C. H. Townes. At what wavelengths should we search for signals from extraterrestrial intelligence? Proceedings of the National Academy of Sciences of the United States of America, 80:1147–1151, 1983. (Cited on pages 160 and 163.)

[85] Evgeny Y. Tsymbal. Giant magnetoresistance. http://physics.unl.edu/~tsymbal/tsymbal_files/GMR/gmr.html, 2008. (Cited on page 94.)

[86] Scott Turner. Stigmergic building. http://www.esf.edu/efb/turner/termite/stigmergicbuilding.html. (Cited on pages 107 and 108.)

[87] Syracuse University. Introducing the Einstein principle of equivalence. http://physics.syr.edu/courses/modules/LIGHTCONE/equivalence.html, November 2008. (Cited on page 144.)

[88] Martinus Veltman. Facts and Mysteries in Elementary Particle Physics. World Scientific Publishing Company, Sassari, 2003. (Cited on pages 62, 63, and 65.)

[89] Nelson Wallace. Rainbows: How does a rainbow work? http://web.archive.org/web/20020126143121/http://home.earthlink.net/~nwwallace/rainbows.htm. (Cited on page 12.)

[90] Martin Waugh. Liquid sculpture. http://www.liquidsculpture.com/fine_art/image.htm?title=untitled_001, 2006. (Cited on page 39.)

[91] Paul S. Wesson. Five-dimensional Physics: Classical and Quantum Consequences of Kaluza-Klein Cosmology. World Scientific, 2006. (Cited on page 148.)

[92] Paul S. Wesson. Five-dimensional Physics: Classical and Quantum Consequences of Kaluza-Klein Cosmology. World Scientific, 2006. (Cited on page 143.)

[93] Robert J. Whitaker. Physics of the rainbow. Physics Teacher, 17:283–286, May 1974. (Cited on pages 10 and 11.)

[94] William Sheehan, Nicholas Kollerstrom, and Craig B. Waff. The case of the pilfered planet. Scientific American, page 98, 2004. (Cited on page 126.)

[95] R. Wood, Y. Hsu, and M. Schultz. Perpendicular magnetic recording technology. Hitachi Global Storage Technologies White Paper, November 2007. (Cited on pages 91 and 92.)

[96] A. M. Worthington. A Study of Splashes. MacMillan, New York, US, 1963. (Cited on page 39.)

[97] Lei Xu, Wendy W. Zhang, and Sidney R. Nagel. Drop splashing
on a dry smooth surface. Phys. Rev. Lett., 94, 2005. (Cited on
page 39.)

[98] Lei Xu, Loreto Barcos, and Sidney R. Nagel. Splashing of liquids:
Interplay of surface roughness with surrounding gas. Phys. Rev.,
76, 2007. (Cited on page 39.)

[99] Hugh D. Young and Roger A. Freedman. University Physics. Addison-Wesley, 2004. (Cited on pages 15 and 16.)

[100] Hugh D. Young and Roger A. Freedman. University Physics, 11th Edition. Addison-Wesley, San Francisco, 2005. (Cited on page 19.)

[101] Hugh D. Young and Roger A. Freedman. University Physics. Pearson Education, Inc., San Francisco, CA, 11th edition, 2004. (Cited on page 125.)
INDEX

Andersen [2], 17, 165 Limongi and Chieffi [51], 138, 139,
Arndt [3], 98, 165 167
BBC [4], 47, 53, 165 Lloyd [52], 102, 167
Bender and Orszag [5], 29, 165 Lochner and Gibb [53], 135–137,
Bennet et al. [6], 135–137, 140, 165 168
Best [7], 34, 35, 165 Lyons [54], 79, 80, 83, 168
Bleaney [8], 4, 165 Marais and Walter [55], 159, 160,
Bloom [9], 4, 7, 165 168
Blum [10], 109, 165 Margolus and Levitin [56], 88, 168
Brawer [11], 34, 165 Mash [57], 160, 168
Brown [12], 47, 165 Massel [58], 49, 168
Bryant [13], 47, 165 Mermin [59], xi, 168
Cadonati [14], 64, 66, 67, 72, 74, 165 Nassau [60], 9, 168
Cadonati [15], 161, 165 Nave [61], 135, 168
Chang [16], 33, 34, 165 Nave [62], 4, 168
Colloid [17], 36, 165 News [63], 89, 168
Commons [18], 128, 166 Palme and Jones [65], 138, 168
Cowles [19], 128, 166 Peters et al. [66], 107, 108, 168
Cross [20], 56, 166 Pinet [67], 47, 168
Danby [21], 128, 166 Research [68], 91, 93, 168
Dorigo [22], 107, 109, 166 Royall et al. [69], 36, 37, 169
Dorigo [23], 111, 113, 166 SETI [71], 162, 169
Dorigo [24], 111, 112, 166 Sears and Zemansky [70], 10, 169
Dumé [25], 87, 166 Shannon [72], 98, 169
Einstein [26], 143, 166 Shapiro and Teukolsky [73], 131,
Elliot [27], 33–35, 166 169
Extrasolar Planets [30], 160, 166 Shostak [74], 162, 169
Feltz [31], 33, 166 Shu [75], 160, 169
Feynman et al. [32], 19, 166 Slansky et al. [76], 65, 66, 72–74,
Fischbach [33], 149, 166 169
Franklin [34], 148–151, 166 Standage [77], 127, 169
Gordon et al. [35], 29, 166 Svistunov [78], 64, 169
Griffiths [36], 3, 5, 6, 8, 167 Tachibana et al. [79], 139, 169
Griffiths [37], 61, 62, 64, 132, 167 Tayler [80], 133, 169
Grue and Trulsen [38], 47, 167 Than [81], 138, 169
Hall [39], 135, 137, 167 Thomsen [82], 160, 169
Hartley [40], 98, 167 Tobin et al. [83], 138, 169
Haver [41], 48, 167 Townes [84], 160, 163, 170
Himpsel [42], 88, 167 Tsymbal [85], 94, 170
Iizuka [43], 13, 167 Turner [86], 107, 108, 170
Ilardi [44], 13, 167 University [87], 144, 170
Kac [45], 27, 29, 167 Veltman [88], 62, 63, 65, 170
Kamen and Heck [46], 81, 84, 167 Wallace [89], 12, 170
Kasper and Feller [47], 19–21, 23, Waugh [90], 39, 170
167 Wesson [91], 148, 170
Kharif and Pelinovsky [48], 52, 53, Wesson [92], 143, 170
167 Whitaker [93], 10, 11, 170
Lavrenov [49], 50, 51, 167 William Sheehan and Waff [94], 126,
Leshin et al. [50], 137–140, 167 170


Wood et al. [95], 91, 92, 170 electromagnetic waves, 3


Worthington [96], 39, 170 Electromagnetic force, 63
Xu et al. [97], 39, 170 Electroweak theory, 64
Xu et al. [98], 39, 170 Entropy, 102
Young and Freedman [100], 19, 171 equivalence principle, 143
Young and Freedman [101], 125, extrasolar planets, 159
171
Young and Freedman [99], 15, 16, f-number, 16
171 Fermions, 62
eot [1], 148, 165 fifth interaction, 148
et. al. [28], 57, 166 Fischbach, 148
et. al. [29], 56, 57, 166 focal length, 14
of Sciences [64], 93, 168 Fourier hologram, 24
21-centimeter line, 161 Frank Drake, 159, 162
fundamental forces, 147
Alhazen, 13
amorphous Solid, 33 Gauge Boson, 63
Antarctica, 50 gauss’s law, 4
Areal density, 87 general relativity, 143
astronomer’s lens, 17 Generations of matter, 62
Giant magnetoresistance (GMR),
baryon number, 149 93
Basis, 62 Girolamo Cardano, 13
beam ratio, 22 glass, 33
Benjamin-Feir instability, 52 Glass Transition, 34
Binary bit, 85 Glass Transition Temperature, 35
Binary byte, 86 Gluon, 62
Bosons, 62 Grand unified theory, 72
Bragg reflection, 21 gravitation, 147
breather solutions, 52 Gulfstream current, 50
Bremen, 52
Hamiltonian, 70
Caledonian Star, 52 Hard disk platter, 89
camera obscura, 13 Hard disk sector, 90
camera screen, 15 Hard disk track, 90
Cape of Good Hope, 50 Hartley, 98
Carl Sagan, 162 Heisenberg uncertainty principle,
CERN, 72 87
Change of basis, 66 Hermitian operator, 62
classical photography, 19 Higgs boson, 65
Colloids, 35 Higgs mechanism, 65
converging lens, 14 holographic data storage, 23
Creation operator, 64 holographic convolution, 24
crystalline Solid, 33 holography
current refraction, 51 geometric model, 19

diverging lens, 14 Inertia, 65


DNA, 86 Information Entropy, 98
double-slit experiment, 19 Information Theory, 97
Drake Equation, 159 Information uncertainty, 100
draupner wave, 48, 49 Information, formal definition of,
98
Eötvös, 148 interference fringes, 19
Eigenstates, 61 intermodulation noise, 22
Einstein equivalence principle, 143
John Couch Adams, 127

Kepler’s Third Law, 128 strong equivalence principle, 143


Kuroshio current, 50 Strong force, 62
Sudbury Neutrino Observatory, 72
Ladder operator, 64 Supercooled Liquid, 35
Lambda particle, 73 surface hologram, 22
Large hadron collider, 72
Lepton, 63 theory of gravity, 143
liquid, 33 thick hologram, 22
long-range order, 33 thin hologram, 22
Longitudinal magnetic recording tidal wave, 47
(LMR), 91 transmission hologram, 21
tsunami, 47
Magnetoresistance, 92
magnification, 14 Uranus, 126
Mass states, 67
virtual image, 21
Neptune, 125
Neutrino, 67 weak equivalence principle, 143,
Neutrino oscillations, 67 147
Newton’s Law of Universal Gravi- Weak eigenstate, 65
tation, 125 Weak force, 63
W boson, 64
object beam, 20
Yukawa potential, 149
Pacific Ocean, 50
pattern recognition, 24 zone plate model, 21, 22
Perpendicular magnetic recording zoom lens, 16
(PMR), 91 Z boson, 64
Photon, 63
Planck energy, 72
Principle of superposition, 61

Quantum field theory, 63


Quantum mechanical operator, 61
Quantum mechanics, 61
Quark, 63

Radial Distribution Function, 36


Rayleigh distribution, 49
real image, 21
redundancy, 20
reference beam, 20
reflection holograms, 21
rogue waves, 49–54

SETI, 160, 162


Shannon, 98
Shannon entropy, 98
Shannon’s formula, 101
short-range order, 33
sine waves, 3
Soda-Lime Glass, 34
South Africa, 50
Special relativity, 70
Standard Model, 62
