A theoretical physics FAQ



The up-to-date version of the theoretical physics FAQ is at

in a reorganized and clickable version.

The present document is an old ASCII version,

in which the state of the FAQ on January 9, 2010 is frozen.


Consider everything, and keep the good.

(St. Paul, 1 Thess. 5:21)

This document (a simple ASCII file) contains answers to some more or

less frequently asked questions from theoretical physics. Currently,

the FAQ contains 148 topics, grouped into 20 chapters, and filling over

11000 lines of text (about half a megabyte), corresponding to a book

of about 220 pages. Starting in 2004, the topics were edited from my

answers to postings to the moderated newsgroup sci.physics.research

(or, for some, translated from postings to the unmoderated German

newsgroup de.sci.physik).

If you like the FAQ and/or found it useful, please link to it from

your home page to make it more widely known.

If you spot errors or have suggestions for improvements,

please write me (at

If you have questions, please post them to the moderated newsgroup

sci.physics.research (!
If you found this FAQ useful you are likely to benefit also from

reading our book

Arnold Neumaier and Dennis Westra,

Classical and Quantum Mechanics via Lie algebras,


Of course, the FAQ refers only to a tiny part of theoretical physics,

namely to what I happened to discuss on sci.physics.research.

The answers are only as good as my understanding of the matter.

This doesn't mean that they are poor but probably that they are

not perfect. Many topics are discussed quite in detail, but this is

not a book, so don't expect completeness or comprehensiveness in any respect.

On topics where the physics community has not yet reached a consensus,

my point of view is of course only one of the possibilities, and not

always the mainstream view, although I tend to discuss that view, too.

In any case, I try to be accurate, consistent, and intelligible.

Happy Reading!

Arnold Neumaier

University of Vienna

I like to see people grow


Table of Contents


The 21 topics in the initial version, posted there on April 28, 2004,

have grown to 88 by January 1, 2005, to 116 by January 4, 2006,

to 128 by January 3, 2007, to 140 by January 3, 2008, to 147 by

January 30, 2009, and are likely to grow further.

(A * indicates addition of a new topic, or large modification of

an old one, since January 30, 2009. Minor changes or additions to

old topics are not indicated.)

The various topics can usually be read independently of each other;

they are arranged into groups of loosely related topics.

To read a particular entry, grep for its label, e.g., S2e.

The labels may change with time as answers to further questions

will be added and old answers regrouped. So, to quote part of the FAQ,

refer to the title of a section and not only to its label.


QM = quantum mechanics, QFT = quantum field theory,

QED = quantum electrodynamics, CCR = canonical commutation relations,

s.p.r. = sci.physics.research (newsgroup).

Strings like quant-ph/0303047 or arXiv:0810.1019 refer to electronic

documents in the e-Print archive at and mirror sites.

p_0 and \p are the time and space part of a 4-vector p;

the Minkowski inner product is always taken to be p^2=p_0^2-\p^2.

Chapter 1 (20 sections)

S1a. What are bras and kets?

S1b. Projective geometry and quantum mechanics

S1c. What is the meaning of the entries of a density matrix?

S1d. Postulates for the formal core of quantum mechanics

S1e. Open quantum systems

S1f. Interaction with a heat bath

S1g. Quantum-classical mechanics

S1h. Can all quantum states be realized in nature?

S1i. Modes and wave functions of laser beams

S1j. Classical and quantum tunneling

S1k. Quantization in non-Cartesian coordinates

S1l. Second quantization

S1m. When is an object macroscopic?

S1n. The role of the ergodic hypothesis

S1o. Does quantum mechanics apply to single systems?

*S1p. Dissipative dynamics and Lagrangians

*S1q. How can QM be stochastic while the Schroedinger equation is deterministic?

*S1r. Measurement theory for real numbers

*S1s. The classical limit of quantum mechanics

*S1t. The classical limit via coherent states

Chapter 2 (10 sections)

S2a. Lie groups and Lie algebras

S2b. The Galilei group as contraction of the Poincare group

S2c. Representations of the Poincare group, spin and gauge invariance

S2d. Forms of relativistic dynamics

S2e. Is there a multiparticle relativistic quantum mechanics?

S2f. What is a photon?

S2g. Particle positions and the position operator

S2h. Localization and position operators

*S2i. Position operators in relativistic quantum field theory

S2j. Coherent states of light as ensembles

Chapter 3 (6 sections)

S3a. What are 'bare' and 'dressed' particles?

S3b. How meaningful are single Feynman diagrams?

S3c. How real are 'virtual particles'?

S3d. What is the meaning of 'on-shell' and 'off-shell'?

S3e. Virtual particles and Coulomb interaction

S3f. Are virtual particles and decaying particles the same?

Chapter 4 (10 sections)

S4a. How do atoms and molecules look like?

S4b. Why are observable densities state-dependent?

S4c. Are electrons pointlike/structureless?

S4d. How much information is in a particle?

S4e. Entropy and missing information

S4f. How real is the wave function?

S4g. How real are Feynman's paths?

S4h. Can particles go backward in time?

S4i. What about particles faster than light (tachyons)?

S4j. Do free particles exist?

Chapter 5 (9 sections)

S5a. QM pictures and representations

S5b. Inequivalent representations of the CCR/CAR

S5c. Why does QFT look so different from QM?

S5d. Why is QFT based on a classical action?

S5e. Why does the action only contain first derivatives?

S5f. Why normal ordering?

S5g. Why locality and causal commutation relations?

S5h. Creation operators and rigged Hilbert space

S5i. Why Feynman diagrams?

Chapter 6 (8 sections)

S6a. Nonperturbative computations in quantum field theory

S6b. The formal functional integral approach to QFT

S6c. Functional integrals, Wightman functions, and rigorous QFT

S6d. Is there a rigorous interacting QFT in 4 dimensions?

S6e. Constructive field theory

S6f. The classical limit of relativistic QFT

S6g. What are interpolating fields?

S6h. Hilbert space and Hamiltonian in relativistic quantum field theory

*S6i. 2-dimensional quantum field theory

Chapter 7 (3 sections)

S7a. What is the mass gap?

S7b. Why can a bound state of massless quarks be heavy?

S7c. Bound states in relativistic quantum field theory

Chapter 8 (9 sections)

S8a. Why renormalization?

S8b. Renormalization without infinities I

S8c. Renormalization without infinities II

S8d. Renormalization and coarse graining

S8e. Renormalization scale and experimental energy scale

S8f. Dimensional regularization

S8g. Nonrelativistic quantum field theory

S8h. Nonrenormalizable theories as effective theories

S8i. What about infrared divergences?

Chapter 9 (6 sections)

S9a. Summing divergent series

S9b. Is QED consistent?

S9c. What about relativistic QFT at finite times?

S9d. Perturbation theory and instantaneous forces

S9e. QED and relativistic quantum chemistry

S9f. Are protons described by QED?

Chapter 10 (13 sections)

S10a. How are matrices and tensors related?

S10b. Is quantum mechanics compatible with general relativity?

S10c. Difficulties in quantizing gravity

S10d. Renormalization in quantum gravity

S10e. Hadamard states and their Hilbert spaces

S10f. Why do gravitons have spin 2?

S10g. What is the tetrad formalism?

S10h. Energy in general relativity

S10i. What happened to the aether?

S10j. What is time?

S10k. Time in quantum mechanics

S10l. Diffeomorphism invariant classical mechanics

S10m. The concept of ''Now''

Chapter 11 (7 sections)

S11a. A concise formulation of the measurement problem of QM

S11b. The double slit experiment

S11c. The Stern-Gerlach experiment

S11d. The minimal interpretation

S11e. The preferred basis problem

S11f. Master equation and pointer variables

S11g. Does decoherence solve the measurement problem?

Chapter 12 (6 sections)

S12a. Which interpretation of quantum mechanics is most consistent?

S12b. Which textbook of quantum mechanics is best for foundations?

S12c. What is the role of quantum logic?

S12d. Stochastic quantum mechanics

S12e. Is there a relativistic measurement theory?

S12f. Quantum mechanics and dice

Chapter 13 (10 sections)

S13a. Random numbers and other random objects

S13b. What is the meaning of probabilities?

S13c. What about the subjective interpretation of probabilities?

S13d. Are probabilities limits of relative frequencies?

S13e. How meaningful are probabilities of single events?

S13f. Objective probabilities

S13g. How probable are realizations of stochastic processes?

S13h. How do probabilities apply in practice?

S13i. Incomplete knowledge and statistics

S13j. Priors and entropy in probability theory

Chapter 14 (4 sections)

S14a. Theoretical challenges close to experimental data

S14b. Does the standard model predict chemistry?

S14c. Is the result of a measurement a real number?

S14d. Why use complex numbers in physics?

Chapter 15 (5 sections)

S15a. How precise can physical language be?

S15b. Why bother about rigor in physics?

S15c. Justifying the foundations of a theory

S15d. Foundations, theory and experiment

S15e. Theoretical physics as a formal model of reality

Chapter 16 (12 sections)

S16a. On progress in science

S16b. How different are physical sciences and social sciences

S16c. Can good theories be falsified?

S16d. What, then, distinguishes a good theory?

S16e. When is a theory preferred to another one?

S16f. What is a fact?

S16g. Physics and experience

S16h. Modeling reality

S16i. What is a system (e.g., an ideal gas)?

S16j. When is a theory confirmed?

S16k. What is real?

S16l. How many angels fit onto the tip of a needle?

Chapter 17 (8 sections)

S17a. How to get information from sci.physics.research

S17b. How to get your work published

S17c. How to respond to critical referee's reports

S17d. How to sell your revolutionary idea

S17e. Useful background, online lecture notes, etc.

S17f. Stories about physicists

S17g. Other physics FAQs

*S17h. Naming in science

Chapter 18 (5 sections)

S18a. What is the meaning of 'self-consistent'?

S18b. What is a vector?

S18c. Learning quantum mechanics at age 14

S18d. Research at age 16

S18e. Are there indefinite Hilbert spaces?

Chapter 19 (1 section)

S19a. God and physics

Chapter 20 (1 section)

S20a. Acknowledgments

Since March 1, 2005, there is also a related FAQ in German language,

Ein Theoretische Physik FAQ
where I describe some more topics which I have not translated.

(Among other topics, it discusses a new interpretation of quantum

mechanics, which I call the 'consistent experiment interpretation'.

It gives a new meaning to the foundations of physics, less paradoxical

than the conventional interpretations. I expect to have an English

version of it soon.)


S1a. What are bras and kets?


In the language of linear algebra, kets |psi> are just column vectors

psi (for systems with finitely many levels only; each component gives

the amplitude for the corresponding level), and the corresponding

bras <psi| are the complex conjugated transposed row vectors psi^*.

The inner product <phi|psi>, the bra(c)ket, is therefore

<phi|psi> = phi^*psi = sum_k phi_k^* psi_k.

For the basis bra <k|, the unit vector with a single entry 1 at

position k, we find as special case

<k|psi> = psi_k.
In infinite dimensions, the sum becomes an integral, and we get

<phi|psi> = integral dx phi(x)^* psi(x)

and for the basis bra <x|, which is a delta distribution centered at x,

we have

psi(x) = <x|psi>.
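
For finitely many levels, the identification of kets with column vectors
and of bras with conjugated row vectors can be checked directly. The
following is only a minimal sketch in plain Python; the helper names
bra, braket, basis_ket and the sample amplitudes are illustrative
assumptions, not part of the FAQ, and positions are counted from 0.

```python
# Kets are lists of complex amplitudes (column vectors);
# bras are their complex-conjugated transposes (row vectors).

def bra(ket):
    """Return the bra <psi| as the list of conjugated components psi_k^*."""
    return [complex(c).conjugate() for c in ket]

def braket(phi, psi):
    """Inner product <phi|psi> = sum_k phi_k^* psi_k."""
    return sum(a * b for a, b in zip(bra(phi), psi))

def basis_ket(k, n):
    """Unit vector of length n with a single entry 1 at position k."""
    return [1.0 if j == k else 0.0 for j in range(n)]

psi = [0.6, 0.8j]      # assumed sample amplitudes
phi = [1j, 1.0]        # assumed sample amplitudes (not normalized)

inner = braket(phi, psi)                   # (-1j)*0.6 + 1.0*0.8j
component = braket(basis_ket(1, 2), psi)   # <k|psi> picks out psi_k
norm_sq = braket(psi, psi)                 # <psi|psi> = sum_k |psi_k|^2
print(inner, component, norm_sq)
```

Note that braket conjugates its first argument, so braket(psi, phi) is
the complex conjugate of braket(phi, psi).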

Actually, in infinite dimensions, one needs functional analysis

in place of linear algebra to get a precise definition; kets are smooth

functions from some nice function space, and bras are linear

functionals on this space, i.e., elements of its dual space. The dual

space is larger and also contains distributions.

(For those who want to be fully rigorous: kets belong to a

so-called nuclear space H_inf, for example the space of Schwartz

functions; its closure H under the Euclidean norm

gives the conventional Hilbert space, and together with the dual

H_inf^* = H_-inf, these define a Gelfand triple or rigged Hilbert

space, two names for the same concept).

Physicists are less picky, however, and allow kets also to be

less smooth functions and even distributions, so that every bra has

a corresponding ket. Thus they use the ket |x> although this is not a

function but a delta distribution centered at x.

This allows them to write not only psi(x) = <x|psi>, but also

psi(x)^* = <x|psi>^* = <psi|x>.

The price to be paid is that inner products are no longer well-defined

in general; for example, <x|x> is infinite. They say, |x> is not

normalizable and mean that it is not in the Hilbert space of

well-behaved pure states.

Caution: Physicists often use different bases which may cause ambiguous

notation. For example <p| is a momentum basis state, while <x| is a

position basis state. But while <x|y> = 0 if x and y are distinct

positions, and <p|q> = 0 if p and q are distinct momenta,

the inner product of a momentum bra <p| and a position ket |x>

(or vice versa) is never zero. (Exercise: Verify this by computing

explicit formulas for <p|x> and <x|p>!) Thus, unlike in mathematics,

the formulas are not invariant under substitution of letters for

the variables!

About the pitfalls when not using the required care, I recommend

F. Gieres,

Mathematical surprises and Dirac's formalism in quantum mechanics,

Rep. Prog. Phys. 63 (2000) 1893-1931.



G. Bonneau, J. Faraut, G. Valent,

Self-adjoint extensions of operators and the teaching of quantum mechanics,

Amer. J. Phys. 69 (2001) 322-331.



S1b. Projective geometry and quantum mechanics


Projective geometry means that one works with rays instead of vectors

to designate points in a geometry.

Think of the 2-dimensional affine plane. The points are represented by

vectors in R^2. On the other hand, by moving an affine plane lying on

the floor a little upwards into the air (the same amount at every

point), one may think of each point as being represented by the ray

from an origin on the floor to the point on the plane.

(Actually, instead of the ray one should consider the whole line;

strictly speaking, a ray is only a half-line. But in quantum physics,

one customarily calls the 1-dimensional subspaces rays. Since the

coefficient field is complex, the rays are actually rotated complex

number planes.)
Similarly, lines are now 2-spaces through the origin. This gives

projective geometry (or homogeneous coordinates, which is the same in

more algebraic terms).

But now one also has some additional points, corresponding to rays

parallel to the affine plane. These points form the 'line at infinity'

= the 2-space through the origin parallel to the affine plane.

A slightly closer look reveals that the geometry has become more

complete: Now not only every two points have a unique connecting line,

but also any two lines have a unique intersection - what were before

parallels are now lines intersecting 'at infinity'. Imagine two long,

straight rails of a railway track...

This can be extended to higher dimensions. n-dimensional affine spaces

can be represented by rays through 0 in (n+1)-dimensional space, and can

be completed there to a projective geometry, in which the vector

subspaces are the geometrical objects. In Hilbert space one can no

longer count dimensions, but otherwise everything is similar.

Since, in quantum mechanics, state vectors are only defined up to a

phase (even when normalized), they correspond uniquely to rays

= 1-dimensional subspaces in Hilbert space. Hence quantum mechanics is

intrinsically projective.


S1c. What is the meaning of the entries of a density matrix?


Density matrices are a convenient way of describing states of quantum

systems in contact with an environment. (State vectors = wave functions

are appropriate only for isolated systems at zero absolute temperature,

though they can be used in an approximate way in thermally isolated

contexts. But contact with an environment means positive temperature.)

If the quantum system has only a finite number n of levels,

the density matrix is an n x n matrix; otherwise it is

a linear operator on Hilbert space (but nevertheless called a matrix).

The real use for density matrices is to compute expectations

<f> = trace (rho f)

for quantities f of interest. Indeed, rho is just a collection of

numbers enabling one to calculate these expectations.

The fact that the constant 1 must have expectation 1 leads to the

restriction that

sum_k rho_kk = trace rho = 1.

Apart from that, rho must be a Hermitian, positive semidefinite matrix,

to satisfy the requirements of statistics. (See quant-ph/0303047 for

details.) For small systems, all such density matrices can indeed be

approximately realized in practice.

Since the diagonal entries of a positive semidefinite matrix are always

nonnegative, the p_k:=rho_kk are nonnegative numbers summing to 1 and thus

look like probabilities. What the components mean depends on the basis used.

In particular, if the basis consists of eigenstates of a Hamiltonian,

and the eigenvalues E_k are all nondegenerate, a diagonal element

rho_kk can be interpreted as the probability that upon measuring the

energy of the system one will find the value E_k.

If f is a function of the Hamiltonian H, and the basis used consists of

eigenstates |k> of H, with H|k>=E_k|k> then the density matrix rho

has entries rho_jk = <j|rho|k>. If one now calculates the expectation

of a function f(H), the equation f(H)|k>=f(E_k)|k> implies that

<f(H)> = trace (rho f(H)) = sum_k <k|rho f(H)|k>

= sum_k <k|rho f(E_k)|k> = sum_k <k|rho|k> f(E_k)

= sum_k rho_kk f(E_k).

If we average the results f(E) of a number of measurements of the

energy, where the energy E_k is measured with probability p_k,

we get

<f(H)> = sum_k p_k f(E_k).

Thus, to match the expectations no matter which function we are

averaging, we need to take p_k=rho_kk. This gives the claimed

probability interpretation of the diagonal entries.
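
The matching of trace (rho f(H)) with sum_k p_k f(E_k) can be checked
numerically for a small example. This is a minimal sketch in plain
Python, assuming a 3-level system written in the eigenbasis of H; the
helper names, the eigenvalues, the function f and the two vectors
mixed into rho are all illustrative assumptions.

```python
def outer(psi):
    """Rank 1 matrix psi psi^* from a list of complex amplitudes."""
    return [[x * complex(y).conjugate() for y in psi] for x in psi]

def mixture(weights, states):
    """Density matrix sum_i w_i psi_i psi_i^* (Hermitian, positive semidefinite)."""
    n = len(states[0])
    rho = [[0j] * n for _ in range(n)]
    for w, psi in zip(weights, states):
        for j, row in enumerate(outer(psi)):
            for k, entry in enumerate(row):
                rho[j][k] += w * entry
    return rho

def trace_product(a, b):
    """trace(a b) for square matrices stored as nested lists."""
    n = len(a)
    return sum(a[j][k] * b[k][j] for j in range(n) for k in range(n))

def f(energy):
    return energy ** 2 + 1.0      # any function of the energy will do

energies = [0.0, 1.0, 3.0]        # assumed nondegenerate eigenvalues E_k

# Assumed state: equal mixture of two normalized vectors, given in the
# eigenbasis of H; it has nonzero off-diagonal entries.
rho = mixture([0.5, 0.5], [[0.6, 0.8, 0.0], [0.0, 0.6j, 0.8]])

# In the eigenbasis of H, f(H) is diagonal with entries f(E_k).
f_of_H = [[f(e) if j == k else 0.0 for k in range(3)]
          for j, e in enumerate(energies)]

p = [rho[k][k].real for k in range(3)]                # p_k = rho_kk
lhs = trace_product(rho, f_of_H).real                 # trace(rho f(H))
rhs = sum(pk * f(e) for pk, e in zip(p, energies))    # sum_k p_k f(E_k)
print(p, lhs, rhs)
```

The off-diagonal entries of rho drop out of the trace because f(H) is
diagonal in this basis; they would matter for quantities that do not
commute with H.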

Off-diagonal elements have no simple interpretation.

Usually one does not look at off-diagonal elements at all, but they

are important in intermediate steps of calculations.

Close to absolute zero temperature, and assuming the absence of

degeneracy, (but also in certain other, well prepared nearly

isolated systems), quantum states have the property that all columns

of the density matrix are nearly parallel to a wave function psi

that is conventionally normalized to have norm 1,

psi^*psi = 1.

(In Dirac language, this says <psi|psi>=1; see the FAQ entry for bras

and kets.) This vector psi, which is clearly determined only up to a

complex number of absolute value 1, is called the wave vector

(or, in infinite dimensions, the wave function) of the state.

Idealizing this situation, one describes such quantum systems by states

in which all columns of the density matrix are exactly parallel to some

nonzero wave vector psi. (Such matrices are called rank 1 matrices;

the wave vector, also referred to as a wave function, is defined

only up to a phase factor.)

Then the k-th column is a multiple c_k psi of psi. The fact that rho

is Hermitian forces each row to be a multiple of psi^*. But this implies

that c_k is a multiple of psi^*_k, so that rho is a multiple of

psi psi^*. Since psi is normalized, the multiplication factor is just

the trace, and since the trace is 1 we find

rho = psi psi^* for any rank 1 density matrix.

If we now calculate the probability of measuring the energy E_k, we find

p_k = rho_kk = <k|rho|k> = <k|psi psi^*|k> = <k|psi> <psi|k>,

and since <psi|k> is just the complex conjugate of <k|psi>,

we end up with

p_k = |<k|psi>|^2.

This is Born's squared amplitude formula for calculating probabilities.
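
The rank 1 case can also be checked numerically. The following is a
minimal sketch in plain Python, assuming a 3-level system and sample
amplitudes of my choosing (the names outer, born, max_change are
illustrative only). It compares the diagonal of rho = psi psi^* with
the squared amplitudes and confirms that a phase factor in psi does
not change rho.

```python
import cmath

def outer(psi):
    """Rank 1 matrix psi psi^* from a list of complex amplitudes."""
    return [[x * complex(y).conjugate() for y in psi] for x in psi]

psi = [0.5, 0.5j, 0.5 ** 0.5]    # assumed amplitudes <k|psi>, norm 1
rho = outer(psi)                  # rank 1 density matrix psi psi^*

diagonal = [rho[k][k].real for k in range(3)]    # rho_kk
born = [abs(c) ** 2 for c in psi]                # |<k|psi>|^2
trace = sum(diagonal)

# Multiplying psi by a phase factor leaves rho, hence all p_k, unchanged.
phase = cmath.exp(0.7j)
rho_rotated = outer([phase * c for c in psi])
max_change = max(abs(rho_rotated[j][k] - rho[j][k])
                 for j in range(3) for k in range(3))
print(diagonal, born, trace, max_change)
```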

Thus one sees that the traditional wave vector calculus is just a

special case of the density matrix calculus, appropriate (only) for

the study of tiny, well-prepared nearly isolated systems and for

systems close to zero absolute temperature. For the study of ordinary

matter under ordinary conditions, one needs to represent states

by density matrices.

Everything that is done with wave vectors can also be done with

density matrices, or equivalently with the associated expectation

mapping. Indeed, everything becomes simpler that way, much closer

to classical mechanics, and much less weird-looking.

See quant-ph/0303047 for an exposition of the foundations of quantum

mechanics (including the probability interpretation, uncertainty

relations, nonlocality, and Bell's theorem) in terms of expectations.


S1d. Postulates for the formal core of quantum mechanics


Quantum mechanics consists of a formal core that is

universally agreed upon (basically being a piece of mathematics

with a few meager pointers on how to match it with experimental

reality) and an interpretational halo that remains highly disputed

even after 80 years of modern quantum mechanics. The latter is the

subject of the foundations of quantum mechanics; it is addressed

elsewhere in this FAQ. Here I focus on the formal side.

As in any axiomatic setting (necessary for a formal discipline),

there are a number of different but equivalent sets of axioms

or postulates that can be used to define formal quantum mechanics.

Since they are equivalent, their choice is a matter of convenience.

My choice presented here is the formulation which gives most

direct access to statistical mechanics, which is the main tool for

real life applications of quantum mechanics. The relativistic case

is outside the scope of the present axioms. Thus the following

describes nonrelativistic quantum statistical mechanics in the

Schroedinger picture. (The traditional starting point is instead

the special case of this setting where all states are assumed to be pure.)

There are six basic axioms:

A1. A generic system (e.g., a 'hydrogen molecule')

is defined by specifying a Hilbert space K whose elements

are called state vectors and a (densely defined, self-adjoint)

Hermitian linear operator H called the _Hamiltonian_ or the _energy_.

A2. A particular system (e.g., 'the ion in the ion trap on this

particular desk') is characterized by its _state_ rho(t)

at every time t in R (the set of real numbers). Here rho(t) is a

Hermitian, positive semidefinite (trace class) linear operator on K

satisfying at all times the conditions

trace rho(t) = 1. (normalization)

A state is called _pure_ at time t if rho(t) maps K to a 1-dimensional

subspace, and _mixed_ otherwise.

A3. A system is called _closed_ in a time interval [t1,t2]

if it satisfies the evolution equation

d/dt rho(t) = i/hbar [rho(t),H] for t in [t1,t2],

and _open_ otherwise. (hbar is Planck's constant, and is often set

to 1.) If nothing else is apparent from the context,

a system is assumed to be closed.

A4. Besides the energy H, certain other (densely defined, self-adjoint)

Hermitian operators (or vectors of such operators) are distinguished

as _observables_.

(E.g., the observables for an N-particle system conventionally include

for each particle a involved several 3-dimensional vectors:

the _position_ x^a, _momentum_ p^a, _orbital_angular_momentum_ L^a,

and the _spin_vector_ (or Bloch vector) sigma^a of the particle with
label a. If u is a 3-vector of unit length then u dot p^a, u dot L^a

and u dot sigma^a define the momentum, orbital angular momentum,

and spin of particle a in direction u.)

A5. For any particular system, one associates to every vector X

of observables with commuting components a time-dependent


linear functional <dot>_t defining the _expectation_

<f(X)>_t:=trace rho(t) f(X)

of bounded continuous functions f(X) at time t.

This is equivalent to a multivariate probability measure dmu_t(X)

(on a suitable sigma algebra over the spectrum spec(X) of X)

defined by

integral dmu_t(X) f(X) := trace rho(t) f(X) =<f(X)>_t.

A6. Quantum mechanical predictions amount to predicting properties

(typically expectations or conditional probabilities)

of the measures defined in axiom A5 given reasonable assumptions

about the states (e.g., ground state, equilibrium state, etc.)

Axiom A6 specifies that the formal content of the theory is covered

exactly by what can be deduced from axioms A1-A5 without

anything else added (except for restrictions defining the specific

nature of the state), and hence says that Axioms A1-A5 are complete.
The description of a particular closed system is therefore given by

the specification of a particular Hilbert space in A1, the

specification of the observable quantities in A4, and the

specification of conditions singling out a particular class of

states (in A6). Everything else is determined by the theory and

hence is (in principle) predicted by the theory.

The description of an open system involves, in addition, the

specification of the details of the dynamical law. (For the basics,

see the entry 'Open quantum systems' in this FAQ.)

In addition to these formal axioms one needs a rudimentary

interpretation relating the formal part to experiments.

The following _minimal_interpretation_ seems to be universally accepted:

MI. Upon measuring at times t_l (l=1,...,n) a vector X of observables

with commuting components, for a large collection of independent identical

(particular) systems closed for times t<t_l, all in the same state

rho_0 = lim_{t to t_l from below} rho(t)

(one calls such systems _identically_prepared_), the measurement

results are statistically consistent with independent realizations

of a random vector X with measure as defined in axiom A5.

Note that MI is no longer a formal statement since it neither defines

what 'measuring' is, nor what 'measurement results' are and what

'statistically consistent' or 'independent identical system' means.

Thus Axiom MI has no mathematical meaning. That's why it is

part of the interpretation of formal quantum mechanics.

However, the terms 'measuring', 'measurement results', 'statistically

consistent', and 'independent' already have informal meaning

in the reality as perceived by a physicist. Everything stated is

understandable by every trained physicist. Thus statement MI is not

for formal logical reasoning but for informal reasoning in the

traditional cultural setting that defines what a trained physicist

understands by reality.

The lack of precision in statement MI is on purpose, since it allows

the statement to be agreeable to everyone in its vagueness; different

philosophical schools can easily fill it with their own understanding

of the terms in a way consistent with the rest.

Interpretational axioms necessarily have this form, since they must

assume some unexplained common cultural background for perceiving

reality. (This is even true in pure mathematics, since the language

stating the axioms must be assumed to be common cultural background.)

Everything beyond MI seems to be controversial. In particular,

already what constitutes a measurement of X is controversial.

(E.g., reading a pointer, different readers may get marginally

different results. What is the true pointer reading?)

On the other hand there is an informal consensus on how to

perform measurements in practice. Good foundations including a

good measurement theory should be able to properly justify this

informal consensus by defining additional formal concepts that

behave within the theory just as their informal relatives with

the same name behave in reality.

In complete foundations, there would be formal objects in the

mathematical theory corresponding to all informal objects discussed

by physicists, such that talking about the formal objects

and talking about the real objects is essentially isomorphic.

We are currently far from such complete foundations.

Although much of traditional quantum mechanics is phrased in terms of

pure states, this is a very special case; in most actual experiments

the systems are open and the states are mixed states. Pure states

are relevant only if they come from the ground state of a

Hamiltonian in which the first excited state has a large energy gap.

Indeed, assume for simplicity that H has discrete spectrum. In an

orthonormal basis of eigenstates psi_k,

f(H) = sum_k f(E_k) psi_k psi_k^*

for every function f defined on the spectrum. Setting the Boltzmann

constant to 1 to simplify the formulas, the equilibrium density is

the canonical ensemble,

rho(T) = 1/Z(T) exp(-H/T) = sum_k exp(-E_k/T)/Z(T) psi_k psi_k^*.

(Of course, equating this ensemble with equilibrium in a closed system

is an additional step beyond our axiom system, which would require

jusficaon.) Taking the trace (which must be 1) gives

Z(T) = sum_k exp(-E_k/T),

and in the limit T -> 0, all terms exp(-E_k/T)/Z(T) become 0 or 1,

with 1 only for the k corresponding to the states with least energy.

Thus, if the ground state psi_1 is unique,

lim_{T->0} rho(T) = psi_1 psi_1^*.

This implies that for low enough temperatures, the equilibrium state
is approximately pure. The larger the gap to the second smallest

energy level, the better is the approximation at a given nonzero

temperature. In particular (reinstating the Boltzmann constant kbar),

if the energy gap exceeds a small multiple of E^* := kbar T the

approximation is good.
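
To get a feeling for how quickly the canonical ensemble becomes pure,
one can tabulate the weights exp(-E_k/T)/Z(T) for a few temperatures.
This is a minimal sketch in plain Python, with the Boltzmann constant
set to 1; the spectrum, the temperatures and the helper names
boltzmann_weights and purity are illustrative assumptions.

```python
import math

def boltzmann_weights(energies, temperature):
    """Weights exp(-E_k/T)/Z(T) of the canonical ensemble."""
    e_min = min(energies)
    # Shifting all energies by the least one avoids overflow at small T;
    # the common factor exp(e_min/T) cancels in the ratio.
    factors = [math.exp(-(e - e_min) / temperature) for e in energies]
    total = sum(factors)
    return [x / total for x in factors]

def purity(weights):
    """trace(rho^2) for rho diagonal in the energy eigenbasis; 1 iff pure."""
    return sum(w * w for w in weights)

energies = [0.0, 1.0, 1.5]    # assumed spectrum; gap 1 above the ground state

table = {}
for temperature in (2.0, 0.5, 0.1):
    weights = boltzmann_weights(energies, temperature)
    table[temperature] = (weights[0], purity(weights))
    print(temperature, weights[0], purity(weights))
```

With this spectrum, the ground state weight is below one half at T=2
(twice the gap), but differs from 1 by less than 1e-4 at T=0.1
(one tenth of the gap).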

States of simple enough systems with a few levels only

can often be prepared in nearly pure states, by realizing a source

governed by a Hamiltonian in which the first excited state has a much

larger energy than the ground state. Dissipation then brings the

system into equilibrium, and as seen above, the resulting equilibrium

state is nearly pure.

To see how the more traditional setting in terms of the

Schroedinger equation arises, we consider the case of a closed

system in a pure state rho(t) at some time t.

If psi(t) is a unit vector in the range of the pure state rho(t)

then psi(t), called the _state_vector_ of the system is determined

up to a phase, and one easily verifies that

rho(t) = psi(t)psi(t)^*.

Remarkably, under the dynamics for a closed system specified in the

above axioms, this property persists with time (only) if the system
is closed, and the state vector satisfies the Schroedinger equation

i hbar d/dt psi(t) = H psi(t).

Thus the state remains pure at all times.

Moreover, if X is a vector of observables with commuting components

and the spectrum of X is discrete, then the measure from axiom A5

is discrete,

integral dmu(X) f(X) = sum_k p_k f(X_k)

with nonnegative numbers p_k summing to 1, commonly called probabilities.

Moreover, associated with the p_k are eigenspaces K_k such that

X psi = X_k psi for psi in K_k,

and K is the direct sum of the K_k. Therefore, every state vector psi

can be uniquely decomposed into a sum

psi = sum_k psi_k with psi_k in K_k.

psi_k is called the _projection_ of psi to the eigenspace K_k.

A short calculation using axiom A5 now reveals that for a pure state

rho(t)=psi(t)psi(t)^*, the probabilities p_k are given by the

so-called _Born_rule_

p_k = |psi_k(t)|^2, (*)

where psi_k(t) is the projection of psi(t) to the eigenspace K_k.

Deriving the Born rule (*) from axioms A1-A5 makes it completely

natural, while the traditional approach starting with (*)

makes it an irreducible rule full of mystery and only justifiable

by its agreement with experiment.


S1e. Open quantum systems


Open quantum systems are usually modelled in a stochastic way

to account for the unpredictability of the measurement process.

(Note that a measurement is any non-negligible interaction with the

environment, whether or not it is observed by something deserving

the name 'detector' or 'observer').

In the simplest setting in which states can be assumed to

be pure and measurements occur at definite, a priori known times

and have a negligible duration, an open quantum system is a discrete

stochastic process with values psi(t) in the Hilbert space of state

vectors, normalized to norm 1. Between two consecutive measurements,

the system is assumed to be closed.

Thus between two consecutive measurements at times t' and t''>t',

the normalized state psi(t) evolves according to the Schroedinger equation

i hbar psidot = H psi,

so that

psi(t''-0)= P psi(t'+0), P = exp (i/hbar (t'-t'')H). (1)

(In the interaction picture, H=0 and psi remains constant between measurements.)


A measurement at time t is assumed to happen in infinitesimal time

and replaces psi(t-0) independent of other measurements with

probability p_s by

psi(t+0)= P_s psi(t-0)/sqrt(p_s) if p_s>0, (2)

where the P_s are linear operators determined by the experimental

arrangement, satisfying the relation

sum_s P_s^*P_s = 1, (3)

and

p_s=|P_s psi(t-0)|^2 (4)

guarantees that psi(t+0) remains normalized. Clearly the p_s are

nonnegative and by (3), they sum up to 1 (since psi(t-0) is normalized).

(For measurements with more than countably many possible outcomes,


one must replace the probabilities by probability densities and the

sums by integrals.)

Thus this is a well-defined stochastic process.
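A single measurement step of this process can be sketched as follows. This is an illustration under stated assumptions: the two operators P_0, P_1 form a hypothetical unsharp qubit measurement with an invented parameter eta, and the state is renormalized by sqrt(p_s), which is what keeps the norm equal to 1 when p_s = |P_s psi(t-0)|^2.

```python
import numpy as np

rng = np.random.default_rng(0)


def measure(psi, ops, rng):
    """One step: draw outcome s with p_s = |P_s psi|^2 and renormalize the state."""
    candidates = [P @ psi for P in ops]
    probs = np.array([np.vdot(c, c).real for c in candidates])
    s = rng.choice(len(ops), p=probs / probs.sum())
    return s, candidates[s] / np.sqrt(probs[s]), probs


# Hypothetical two-outcome arrangement on a qubit; P_0, P_1 are not projectors
# but satisfy the completeness relation sum_s P_s^* P_s = 1.
eta = 0.8
P0 = np.diag([np.sqrt(eta), np.sqrt(1 - eta)])
P1 = np.diag([np.sqrt(1 - eta), np.sqrt(eta)])
ops = [P0, P1]
completeness = sum(P.conj().T @ P for P in ops)

psi = np.array([1.0, 1.0]) / np.sqrt(2.0)
history = []
for _ in range(5):
    s, psi, probs = measure(psi, ops, rng)
    # Record outcome, total probability, and norm of the new state.
    history.append((int(s), float(probs.sum()), float(np.linalg.norm(psi))))
print(history)
```

Between calls of measure one would apply the unitary propagator from (1); it is omitted here, which corresponds to the interaction picture with H=0.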

A von-Neumann measurement of a self-adjoint linear operator A

corresponds to the special case where P_s is an orthogonal projector

to the eigenspace corresponding to the eigenvalue a_s of A

(respectively, to the set of spectral values in the s-th

interval of a partition of the continuous spectrum of A).

If the measurement at different times has the same (or different)

nature, the P_s at these times are the same (or different).

It is possible to introduce 'empty measurements' at arbitrary

intermediate times with a trivial sum over a singleton s, where P_s=1.

For continuous measurements (where the open system cannot be


closed at all but a discrete number of times), one needs to take

a continuum limit of the above description. Depending on how one takes

the limit, one gets quantum diffusion processes or quantum jump

processes. In this case, the density matrix for the associated

deterministic expectation evolves according to a Lindblad dynamics.

Realistic measurements (i.e. those taking into account the unavoidable

uncertainty) are not modelled by von-Neumann measurements, but

by positive operator valued measures, short POVMs. These are well

explained in

For more on real measurement processes (as opposed to the

von-Neumann measurement caricature treated in typical textbooks

of quantum mechanics), see, e.g.,

V.B. Braginsky and F.Ya. Khalili,

Quantum measurement,

Cambridge Univ. Press, Cambridge 1992


S1f. Interaction with a heat bath


Quantum mechanics in the presence of a heat bath requires the use

of density matrices. Instead of the usual von-Neumann equation

rhodot = rho \lp H

(for \lp see the section on 'Quantum-classical correspondence'),

the dynamics of the density matrix is given by a dissipative version

of it,

rhodot = rho \lp H + L(rho)

usually associated with the name of Lindblad. Here L(rho)

is a linear operator responsible for dissipation of energy to

the heat bath; it is not a simple commutator but can have

a rather complex form.
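As an illustration of such a dissipative term, the following sketch integrates rhodot = rho \lp H + L(rho) for a two-level system. The specific form of L(rho) used here (a single jump operator A in the standard Lindblad form A rho A^* - (A^*A rho + rho A^*A)/2), the rate gamma, and the crude Euler integration are assumptions made for the example, not taken from the text.

```python
import numpy as np

hbar = 1.0  # units with hbar = 1 (assumption)


def lie(f, g):
    """Quantum Lie product f lp g = (i/hbar) [f, g]."""
    return 1j / hbar * (f @ g - g @ f)


def dissipator(rho, jump_ops):
    """Standard Lindblad form: sum over A of A rho A^* - (A^*A rho + rho A^*A)/2."""
    out = np.zeros_like(rho)
    for A in jump_ops:
        AdA = A.conj().T @ A
        out = out + A @ rho @ A.conj().T - 0.5 * (AdA @ rho + rho @ AdA)
    return out


def rhodot(rho, H, jump_ops):
    return lie(rho, H) + dissipator(rho, jump_ops)


# Hypothetical two-level system losing energy to the bath at rate gamma.
gamma = 0.5
H = np.diag([0.0, 1.0]).astype(complex)  # index 0: ground state, 1: excited state
A = np.sqrt(gamma) * np.array([[0, 1], [0, 0]], dtype=complex)  # lowering operator

rho = np.diag([0.0, 1.0]).astype(complex)  # start in the excited state
dt, steps = 0.001, 4000
for _ in range(steps):  # explicit Euler steps up to t = 4
    rho = rho + dt * rhodot(rho, H, [A])

excited_population = rho[1, 1].real
energy = np.trace(rho @ H).real
print(excited_population, energy, np.trace(rho).real)
```

The trace of rho stays equal to 1 while the energy decreases, which is the sense in which L(rho) describes dissipation rather than unitary evolution.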

To get the Lindblad dynamics from a Hamiltonian description of

system plus bath, one uses the projection operator formalism.

The clearest treatment I know of is in

H Grabert,

Projection Operator Techniques in Nonequilibrium

Statistical Mechanics,

Springer Tracts in Modern Physics, 1982.

The final equations for the Lindblad dynamics are (5.4.48/49)

in Grabert's book.


S1g. Quantum-classical correspondence

Quantum mechanics and classical mechanics are very close relatives.

There are analogous objects for everything of relevance in

classical and quantum statistical mechanics.

Observable f:

classical - real phase space function f(x,p)

quantum - Hermitian linear operator or sesquilinear form f

Lie product f \lp g:

read \lp as 'Lie', and visualize it as inverted, stylized L;

Macro for LaTeX:


classical: f \lp g = {g,f} in terms of the Poisson bracket

quantum: f \lp g = i/hbar [f,g] in terms of the commutator

The Lie product is bilinear in the arguments and satisfies

f \lp g = - g \lp f

f \lp gh = (f \lp g)h + g(f \lp h) (Leibniz)

f \lp (g \lp h) = (f \lp g) \lp h + g \lp (f \lp h) (Jacobi)

Invariant measure:

classical - integral f := integral dxdp f(x,p)

quantum - integral f := trace f

Integrability: integral |f| finite

quantum integrable <==> f trace class

Partial integration formula:

integral f \lp g = 0.

Dynamics: df/dt = X_H f := H \lp f with Hermitian H

canonical transformations = mappings exp(tX_H) with Hermitian H

Liouville's theorem says that

integral f = integral exp(tX_H)f

The infinitesimal form of this is the partial integration formula.

State rho:

classical - real integrable phase space function rho(x,p)>=0

quantum - Hermitian positive semidefinite trace class operator rho

both normalized to integral rho = 1.

expectation of f in state rho:

<f> = integral rho f
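The quantum column of this dictionary can be verified numerically in finite dimensions. The sketch below assumes random 4x4 Hermitian matrices as stand-ins for observables (an assumption made only for the check) and tests antisymmetry, the Leibniz and Jacobi identities, and the partial integration formula trace(f \lp g) = 0.

```python
import numpy as np

rng = np.random.default_rng(1)
hbar = 1.0  # units with hbar = 1 (assumption)


def random_hermitian(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2


def lie(f, g):
    """Quantum Lie product f lp g = (i/hbar) [f, g]."""
    return 1j / hbar * (f @ g - g @ f)


f, g, h = (random_hermitian(4) for _ in range(3))

# Each residual below should vanish up to rounding errors.
antisymmetry = lie(f, g) + lie(g, f)
leibniz = lie(f, g @ h) - (lie(f, g) @ h + g @ lie(f, h))
jacobi = lie(f, lie(g, h)) - (lie(lie(f, g), h) + lie(g, lie(f, h)))
partial_integration = np.trace(lie(f, g))

# The Lie product of two Hermitian operators is again Hermitian.
hermiticity = lie(f, g) - lie(f, g).conj().T
print(abs(partial_integration))
```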


S1h. Can all quantum states be realized in Nature?


No. Many mathematically conceivable states do not exist in Nature,

for example, that of water at an absolute temperature of zero.

Quantum mechanics does not demand that all states are realizable.

For a number of tiny systems with a few levels, all states are
realizable with reasonable precision. However, the larger the system

the fewer states are realized.

The number of states realized at a given time of very large systems

such as human beings or galaxy clusters is even so small that it

can be approximately counted!


S1i. Modes and wave functions of laser beams


The physical state described by a typical laser beam is a state with

an indeterminate number of photons, since it is usually not an

eigenstate of the photon number operator. This essentially means that

a definite number of photons cannot be meaningfully assigned to a beam;

instead, one has a meaningful photon density, referred to as the beam intensity.


Thus the traditional N-particle picture does not apply.

Instead one has to work in a suitable Fock space.
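The indeterminate photon number can be made concrete in the textbook single-mode idealization. The sketch below assumes a coherent state with a hypothetical amplitude alpha = 2 in a Fock space truncated at 60 photons, written in the usual photon-number basis (which need not coincide with the labelling conventions used further down in this entry). It shows that the photon number has a nonzero variance, equal to its mean.

```python
import math

import numpy as np

n_max = 60   # truncation of the Fock space (assumption; ample for |alpha|^2 = 4)
alpha = 2.0  # hypothetical coherent amplitude

number_op = np.diag(np.arange(n_max, dtype=float))

# Coefficients in the photon-number basis:
# c_n = exp(-|alpha|^2 / 2) * alpha^n / sqrt(n!)
coeffs = np.array(
    [alpha**k / math.sqrt(math.factorial(k)) for k in range(n_max)]
) * math.exp(-abs(alpha) ** 2 / 2)

mean_n = float(coeffs @ number_op @ coeffs)
var_n = float(coeffs @ number_op @ number_op @ coeffs) - mean_n**2
print(mean_n, var_n)
```

An eigenstate of the photon number operator would have zero variance; here only the mean (the intensity) is sharp enough to be a useful description.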

The Maxwell-Fock space is obtained by 'second quantization' of the

space H_photon, consisting of all mode functions, i.e., solutions A(x)

of the free Maxwell equations, describing a classical background

electromagnetic field in vacuum. H_photon may be thought of as the

single photon Hilbert space, in analogy to the single electron Hilbert

space of solutions of the Dirac equation. (However, following up on

this analogy and calling A(x) a wave function leads to confusion later

on, and is best avoided.)

Actually, because of gauge invariance, the situation is slightly more

complicated, and best described in momentum space. The Maxwell

equations reduce in Lorentz gauge, partial dot A(x) = 0, to

partial^2 A(x)=0, whence the Fourier transform of A(x) has the form

delta(p^2) Ahat(p), and Ahat(p) must satisfy the transversality condition


p dot Ahat(p) = 0.

By gauge invariance, only the coset of Ahat(p) obtained by adding

arbitrary multiples of p has a physical meaning, reflecting the

transversal nature of the free electromagnetic field.

This coset construction is needed to turn the space of modes

into a Hilbert space H_photon with invariant inner product

<A|B>= integral Ahat(p) dot Bhat(p) Dp,

where

Dp = d\p/p_0 = dp_1 dp_2 dp_3/p_0,

is the Lorentz invariant measure on the photon mass shell,

0 < p_0 = |\p| = sqrt(p_1^2+p_2^2+p_3^2)

(negative frequencies are discarded to get an irreducible

representation of the Poincare group).

Indeed, without the coset construction, the inner product is only

positive semidefinite, hence gives only a pre-Hilbert space.

Each (sufficiently nice) mode function A(x) gives rise to a coherent

state ||A>> in the Maxwell-Fock space, to an associated annihilation operator


a(A) = integral Ahat(p) a(p) Dp,

where a(p) is the QED annihilation operator for a photon with

momentum p, and to the corresponding creation operator a^*(A) =


The annihilation and creation operators a(A) and a^*(A) produce a

single-mode Fock subspace consisting of all |A,psi>, where psi is the

unnormalized wave function of a harmonic oscillator; |psi|^2 is the

intensity of the beam.

The coherent state itself corresponds to the normalized vacuum state

of the harmonic oscillator, ||A>> = |A,vac>. If psi is a Hermite

polynomial H_k, |A,psi> is an eigenstate of the photon number

with eigenvalue k, and one has a k-photon state.

The Maxwell-Fock space is the closure of the space spanned by all

the |A,psi> together (and indeed, already the closure of the space

spanned by all ||A>>). This space is the pure electromagnetic field

sector of QED, describing a physical vacuum, i.e., a region of the

universe where matter is absent though radiation may be present.

In optics experiments, laser beams are often idealized by ignoring

their extension perpendicular to the transmission direction. Then each

beam can be described by some |A,psi>. In particular, for a

monochromatic beam, A is a plane wave, A(x)=A_0 exp(-i p dot x).

Of course, this matches the original approximation that we have a

beam only with a grain of salt, since a plane wave is not normalized.

A coherent pair of laser beams obtained by splitting is described by

a superposition |A_1,psi_1> + |A_2,psi_2> of the two beams.

Beams of thermal light (such as that from the sun) and pairs of

beams created by independent sources, cannot be described by wave

functions alone, but need a density formulation. A single light beam

is then described (in the same idealization) by a mode A and a density

matrix rho in a single-mode Fock space, while k light beams are

described by k modes A and a density matrix rho in a k-mode Fock space.


In many treatments, the modes are left implicit, so that one works

only in the k-mode Fock space. This simplifies the presentation, but

hides the connection to the more fundamental QED picture.

For a thorough study of the latter, see the bible on quantum optics,

L. Mandel and E. Wolf,

Optical Coherence and Quantum Optics,

Cambridge University Press, 1995.


S1j. Classical and quantum tunneling


Consider a particle in an external potential.

Assume the potential is everywhere finite, locally constant and positive

near the origin, and decays to zero far away.

Near the origin there is no force, so in a deterministic, classical description a particle at rest there stays at rest.

In practice, however, the classical, deterministic setting is an

approximation only, and the particle makes random motions.

Thus it moves away from the origin and will sooner or later reach

the nonconstant part of the potential. With low probability p,

it will even escape over any barrier; roughly, log p is proportional

to the negative barrier height. For details, you might

wish to consult my paper

A. Neumaier,

Molecular modeling of proteins and mathematical prediction of

protein structure,

SIAM Rev. 39 (1997), 407-460.

and the references there.

Quantum mechanically, there is always a probability of escaping to

infinity, without assuming any approximations. This is called tunneling.


In both cases, once the particle is in the infinite region,

the probability that it returns is zero.

Thus a positive potential drives a particle in the long run off to

infinity (though, in case of a high barrier, one has to wait a long

time). In particular, in the classical case one also has a form of

(stochastic) tunneling.
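The two escape mechanisms can be contrasted through their standard leading-order estimates. These are textbook approximations (an Arrhenius-type law for noise-driven escape, the WKB exponent for quantum tunneling) and are not derived in the text above; the temperature T, the barrier height \Delta V, the energy E, and the turning points a, b are symbols introduced here for illustration.

```latex
% classical, noise-driven escape over a barrier of height \Delta V
\log p_{\mathrm{cl}} \approx -\frac{\Delta V}{k_B T} + \mathrm{const},
\qquad
% quantum tunneling through the region a < x < b where V(x) > E
\log p_{\mathrm{qu}} \approx -\frac{2}{\hbar}\int_a^b \sqrt{2m\,\bigl(V(x)-E\bigr)}\,dx .
```

In both cases the escape probability is exponentially small in a measure of the barrier, which is why a high barrier means a long waiting time.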

Thus it is justified to refer to a potential such as the above as

repelling. However, no one would object if you call a potential

repelling _only_ in the neighborhood of a strict local minimizer, i.e.,

close to a metastable state.

Of course, a golf ball sitting on top of a flat hill will not move

down the hill; because of friction it remains in a metastable state.

Thus the above is an idealization. But most of physics is idealized,

and the language is also somewhat idealized (and, as actually used by

people, not even completely precise).


S1k. Quantization in non-Cartesian coordinates

Textbook quantization rules assume (often silently, without warning)

Cartesian coordinates. The rules derived there are based on

canonical commutation rules and are invalid for systems

described in other coordinate systems.

In particular, a Hamiltonian alone does not have a physical meaning

since it can be quite arbitrarily transformed by coordinate

transformations. The Hamiltonian needs to be combined with the

correct Poisson bracket to yield the correct dynamical equations.

Only if the classical Poisson bracket satisfies the canonical

commutation rules is the quantum mechanics obtained by imposing

canonical commutation rules on the commutators.

The standard quantization procedure assumes that the symplectic form

underlying the Hamiltonian description has the standard form

dp wedge dq. Under a general coordinate transformation, the symplectic form

changes into something nonstandard, and naive quantization gives

wrong results.
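A standard illustration, not taken from the text above, is the free particle in the plane described in polar coordinates (r, \theta). The sketch below compares the naive substitution p_r -> -i\hbar\partial_r, p_\theta -> -i\hbar\partial_\theta in the classical Hamiltonian with the operator obtained by transforming the Cartesian Laplacian; the symbols H_naive and H_correct are names introduced here.

```latex
H_{\mathrm{cl}} = \frac{p_r^2}{2m} + \frac{p_\theta^2}{2m r^2},
\qquad
H_{\mathrm{naive}} = -\frac{\hbar^2}{2m}\left(\partial_r^2
   + \frac{1}{r^2}\,\partial_\theta^2\right),
\qquad
H_{\mathrm{correct}} = -\frac{\hbar^2}{2m}\left(\partial_r^2
   + \frac{1}{r}\,\partial_r + \frac{1}{r^2}\,\partial_\theta^2\right).
```

The two operators differ by the term -(\hbar^2/2m) r^{-1}\partial_r, so the naive rule gives a different spectrum and different dynamics from the Cartesian quantization it was meant to reproduce.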

To get correct results, one has to take account of the correct

symplectic structure, more precisely of the Poisson bracket defined

by it. This is most naturally done in a differential geometric

setting, in terms of symplectic manifolds and Poisson manifolds.

To proceed, one must quantize a symplectic (or a Poisson) manifold

together with a Hamiltonian defined on it.

This combination is invariant under coordinate transformations

and hence has a coordinate-independent geometric meaning.

How to quantize Hamiltonians on a symplectic (or a Poisson) manifold

is the subject of geometric quantization, about which there is a

significant literature.


S1l. Second quantization


Second quantization is a way of writing the quantum mechanics of

indistinguishable particles in such a way that it makes statistical

mechanics calculations easy and makes everything look like field theory.


One starts with a distinguished vacuum state |vac> and a family of

annihilation operators a(x) with their adjoints, the creation

operators a^*(x), satisfying the canonical commutation relations (CCR)

[a(x),a(y)] = 0, [a(x),a^*(y)] = delta(x-y).


(This is for Bosons; for Fermions one has instead canonical

anticommutation relations, CAR, and everything below gets additional

minus signs in certain places.)

A pure (permutation symmetric) N-particle state with wave function

psi(x_1:N) is written in second quantization as

psi = integral dx_1:N psi(x_1:N) a^*(x_1:N) |vac>,

hence the corresponding density matrix

rho = psi psi^*

takes the form

rho = integral dx_1:N dy_1:N rho(x_1:N,y_1:N),

where rho(x_1:N,y_1:N) is the rank one operator

rho(x_1:N,y_1:N) = psi(x_1:N) a^*(x_1:N) |vac><vac| a(y_1:N) psi(y_1:N)^*.

Using this correspondence, one can do in second quantization everything


one can do in first quantization (i.e., wave mechanics),

and match the results.

If f is a 1-particle operator given by an integral operator with

kernel f(x,y) (the general case follows by taking limits), so that

(f psi)(x_1:N)

= sum_a integral dx f(x_a,x) psi(x_{1:a-1},x,x_{a+1:N}),

the formula

<f> = integral dx dy <x|Rho|y> f(y,x)

defines the 1-particle density matrix Rho. The form of f in second

quantization is

f = integral dx dy f(x,y) a^*(x) a(y)

(exercise: check that it has indeed the desired action on an

N-particle state!), hence, after exchanging the names of x and y, one has

<f> = integral dx dy f(y,x) <a^*(y)a(x)>,

and comparison with the definition of Rho gives the formula

<x|Rho|y> = <a^*(y)a(x)> = trace a(x) rho a^*(y),

which can therefore be viewed as the definition of the 1-particle

density matrix in second quantization.

Authors who fear integrals write instead similar formulas with

sums in place of integrals and discrete indices in place of the x,y.

Also, one can do the same in momentum space rather than position space,

which amounts to a change of basis but generally leads to

computationally more tractable formulations.
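The discrete-index version of these formulas is easy to check numerically. The sketch below is an illustration under stated assumptions: two discrete modes replace the continuous label x, each mode is truncated at occupation number 3, and the coefficients psi_coeff and the kernel f_kernel are invented for the example. It builds a 2-particle state, forms Rho_ij = <a_j^* a_i>, and compares the expectation of f = sum_ij f_ij a_i^* a_j with trace(Rho f).

```python
import numpy as np

d = 4  # per-mode truncation: occupation numbers 0..3 (enough for 2 particles)
a = np.diag(np.sqrt(np.arange(1, d)), k=1)  # single-mode annihilation operator
I = np.eye(d)

# Two discrete modes; all matrices are real, so the adjoint is the transpose.
a_ops = [np.kron(a, I), np.kron(I, a)]
vac = np.zeros(d * d)
vac[0] = 1.0

# Symmetric 2-particle state sum_ij psi_ij a_i^* a_j^* |vac>, then normalized.
psi_coeff = np.array([[1.0, 0.5], [0.5, 0.0]])
state = sum(
    psi_coeff[i, j] * (a_ops[i].T @ (a_ops[j].T @ vac))
    for i in range(2)
    for j in range(2)
)
state = state / np.linalg.norm(state)

# 1-particle density matrix Rho_ij = <a_j^* a_i>.
Rho = np.array(
    [[state @ a_ops[j].T @ a_ops[i] @ state for j in range(2)] for i in range(2)]
)

# 1-particle operator with kernel f_ij in second quantization.
f_kernel = np.array([[1.0, 0.3], [0.3, -2.0]])
f_second = sum(
    f_kernel[i, j] * a_ops[i].T @ a_ops[j] for i in range(2) for j in range(2)
)

expect_second = float(state @ f_second @ state)
expect_first = float(np.trace(Rho @ f_kernel))
print(np.trace(Rho), expect_second, expect_first)
```

The trace of Rho equals the particle number (here 2), not 1, which is the usual normalization of the 1-particle density matrix.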


S1m. When is an object macroscopic?


One says that thermodynamics and statistical mechanics apply to

macroscopic objects. But when is an object macroscopic?

Thermodynamics and statistical mechanics are approximate,


descriptions valid for 'sufficiently large' objects.

The approximations made are better and better the larger the object.

One can place the barrier anywhere; if one puts it too low, the

approximate description will be poor, if one puts it too high it

won't apply to the system of interest.

Thus the loose language accommodates the freedom in modeling the

user has when choosing the description level and the accuracy level.

It is only in the same sense subjective as is the choice of a

system of interest. What is interesting for one person or investigation

may be different from what is interesting for another person or

investigation; nevertheless, both may employ objective tools.

The mathematical meaning underlying this loose language is called the

thermodynamic limit. It makes the term 'macroscopic'

precise in a similar way as the mathematical notion of a limit N->inf

makes the term 'N sufficiently large' precise.

If one accepts the vague terminology to avoid talking always about

limits, one can give the following definition (which reflects the

subjectivity in the qualification about the modeling accuracy):

In statistical mechanics, all macroscopic observables are ensemble

averages. Thus, formally, a "macroscopic observable" is the ensemble average


of a space-time dependent field operator which remains constant

within the modeling accuracy under changes in space and time

smaller than the modeling accuracy.


S1n. The role of the ergodic hypothesis


Statistical mechanics textbooks often invoke the so-called ergodic

hypothesis (assuming that every phase space trajectory comes

arbitrarily close to every phase space point with the same values of

all conserved variables as the initial point of the trajectory)

to derive thermodynamics from the foundations. However, textbook

statistical mechanics gives only a gross simplification of the

power of thermodynamics. The ergodic hypothesis is not needed to make


thermodynamics valid. Indeed, the ergodic hypothesis is invalid in

many cases - namely always when the system needs additional variables


to be thermodynamically described.

This is the case for fluids near the critical point, for finite objects

at their surfaces, for systems with interfaces, for metastable states,

for molecular systems in the absence of chemical reactions (here the

number of molecules of each species is conserved), etc.

But this does not invalidate thermodynamics - the latter only requires

that a sufficiently large set of macroscopic variables (in the above

sense) is included in the list of thermodynamic variables.

Indeed, traditional thermodynamics accounts for molecules, surface

tension, metastability, etc., without any change to the formalism.

Probably the ergodic hypothesis, restricted to a limited piece of a

submanifold of the phase space with fixed values of the macroscopic

variables (whether conserved or not) is ''roughly'' equivalent to the

completeness of the set of distinguished macroscopic observables,

in the sense that every other macroscopic observable can be defined

in terms of the distinguished ones. But ...

1. It is the latter property (only) which can be checked experimentally:

Completeness holds if and only if the properties of the system under

study are indeed predictable by the thermodynamics of the distinguished


observables. Experiment (or experience), together with simplicity of

the description, decides in _all_ practical situations what is the set

of distinguished observables.

Indeed, we refine a model whenever we discover significant deviations

from the thermodynamical behavior of a previous simpler model.

Thus thermodynamics takes the form of a setting for describing

material properties to which any successful description has to conform

by axiomatic decree.

2. The ergodic hypothesis can be proved only for extremely simple

systems. In particular, these systems must conform to classical

mechanics - there is no simple quantum version of ergodic dynamics.

Moreover, there are many classical systems which are chaotic only in
part of their phase space - they are probably not ergodic, as the

number of conserved quantities depends on where in the phase space the system is.



3. Thermodynamics applies also for nearly conserved quantities, where

the ergodic argument becomes vague; conversely, near ergodicity (up to


the model accuracy) is enough to make a thermodynamic description

valid. In particular, thermodynamics applies near a critical point

where there cannot be an ergodic argument since there is no extra

conserved quantity but an order parameter is needed to give a correct

description. (At which distance from the critical point should one

ignore the order parameter? Ergodic arguments have nothing to say about this.)


4. There are studies about the nonergodic behavior of supercooled

liquids, e.g., Phys. Rev. A 43, 1103 - 1106 (1991).

Thus I think it is best to ignore the ergodic hypothesis as a means for

explaining statistical mechanics, except in some simple model cases.

It should have no deeper relevance than the hard sphere model of a

monatomic gas (which has been shown to be ergodic, I believe).


S1o. Does quantum mechanics apply to single systems?


It is clear phenomenologically that statistical mechanics (and hence

quantum mechanics) applies to single systems like a particular cup of

tea, irrespective of what the discussions about the foundations of

physics say (see many other entries in this FAQ). Thus statistical

mechanics and quantum mechanics do not only apply - as is often

claimed - to large ensembles of independently and identically prepared

systems; when the system is large enough (i.e., macroscopic),

a _single_ system is enough.

(For smaller single systems, see the entry

''How do atoms and molecules look like?'' in the present FAQ.)

In classical statistical mechanics, the traditional bridge between

the ensemble view and thermodynamics (which clearly applies to single


systems) is the ergodic hypothesis. But there is not enough time

in the universe to explore more than an extremely tiny region of the

about 10^25-dimensional phase space of the cup of tea to explain the

success of the thermodynamical description by ergodicity.

In quantum mechanics, the situation is even worse - usually it is not

even attempted here to bridge the gap.

The best treatment I know of the foundational problems

involved in classical statistical mechanics is in the book

L. Sklar,

Physics and Chance,

Cambridge Univ. Press, Cambridge 1993.

but it does not present a solution. Other sources are not better in

this respect.

My own solution is the ''thermal interpretation'' of

physics, discussed to some extent in Chapter 7 of the book

Arnold Neumaier and Dennis Westra,

Classical and Quantum Mechanics via Lie algebras,

Cambridge University Press, to appear (2009?).


and in my recent slides

A. Neumaier,

Classical and quantum field aspects of light,


A. Neumaier,

Optical models for quantum mechanics,

and explored in more detail in my German

Ein Theoretische Physik FAQ

under the name ''consistent experiment interpretation''

The key idea is that mathematical expectation has two different

interpretations in physics, one as average over a large number of

cases, and the other as a means of defining observables. That the

two interpretations have the same mathematical properties is the

reason they have been confused in the past. The thermal interpretation


separates them neatly and thus gets rid of most of the confusing

aspects of the foundations of physics.


S1p. Dissipative dynamics and Lagrangians


Any system of ordinary differential equations can be brought

into an artificial Lagrangian form, by first rewriting it in first

order form

F(q,q')=0,

doubling the degrees of freedom by introducing conjugate variables p,

and then considering the Lagrangian

L(p,q)= p^T F(q,q').

In particular, this provides a Lagrangian formulation of dissipative

systems, such as the damped harmonic oscillator

m q'' + c q' + k q = 0 (m,c,k >0)

Unfortunately, the Hamiltonian in such a formulation has

nothing to do with the physical energy

E = (m q'^2 + k q^2)/2

The same holds for various other representations for the damped

harmonic oscillator found in the literature.
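For the damped harmonic oscillator the construction can be written out explicitly. The following is a sketch under the assumption that the first-order variables are q = (x, v) with v = x'; the conjugate variables are called p_1, p_2 here.

```latex
% first-order form F(q,q') = 0 with q = (x, v)
F(q,q') = \begin{pmatrix} x' - v \\ m v' + c v + k x \end{pmatrix},
\qquad
L(p,q) = p_1\,(x' - v) + p_2\,(m v' + c v + k x).

% variation with respect to p_1, p_2 returns F(q,q') = 0;
% variation with respect to x and v gives the equations for p:
p_1' = k\,p_2, \qquad m\,p_2' = c\,p_2 - p_1
\quad\Longrightarrow\quad
m\,p_2'' - c\,p_2' + k\,p_2 = 0 .
```

The conjugate variable p_2 thus obeys the oscillator equation with the sign of the damping reversed, so it grows where x decays. This is the doubling of variables that makes the formal Lagrangian possible, and it shows why the associated Hamiltonian cannot be the physical energy E.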

Lagrangians for the damped harmonic oscillator go back to

H. Bateman, Phys. Rev. 38, 815-819 (1931); the treatise

P.M. Morse and H. Feshbach,

Methods of Theoretical Physics

McGraw-Hill, Boston 1953

discusses the procedure in Chapter 3 in terms of 'mirror images'

= additional dynamical variables needed to absorb the missing energy,

and remarks on p 313:

''The introduction of the mirror image ... is probably too artificial

a procedure to expect to obtain much of physical significance from


And indeed, the book doesn't make use of it anywhere.

Having a formal Lagrangian or Hamiltonian is no virtue in itself.

In particular, for a _quantum_ system, the Hamiltonian _must_ be the

energy. Playing around with alternative Lagrangians and Hamiltonians

may be amusing, but does not produce relevant physics.

Since dissipative equations (like the diffusion equation or the damped

harmonic oscillator) describe open systems (where energy is lost to an

unspecified environment), they cannot be described by a Schroedinger equation.


Classically, dissipative systems are described by stochastic

differential equations (and their equivalent deterministic

Fokker-Planck equations) or master equations;

the diffusion equation is the particular case of a Fokker-Planck

equation for Brownian motion.

Quantum mechanically, dissipative systems are described by stochastic

Schroedinger equations or, corresponding to the Fokker-Planck level,

by quantum Liouville equations with Lindblad terms. This gives correct

physics in a dissipative environment. Many quantum optical systems

are directly modeled on the Lindblad level, where the terms have an

understandable and experimentally verifiable meaning independent of

any underlying more microscopic model.

An important recent example is that of photons on demand,

M. Keller, B Lange, K Hayasaka, W Lange and H Walther,

A calcium ion in a cavity as a controlled single-photon source,

New Journal of Physics 6 (2004), 95.

There is no trace of a Lagrangian in the modeling, and indeed, a

useful Lagrangian formulation does not exist - unless one extends the

dynamics and explicitly includes the environment.

Of course, in theory, a dissipative system is thought to be a

contracted version of a bigger conservative system which includes

the environment, and in simple situations, this theoretical view can

indeed be substantiated.

If one models the dissipative environment explicitly, one gets a

bigger conservative system, not a dissipative system. Of course,

this conservative system has a Hamiltonian or Lagrangian description,

but it does not describe the dissipative system alone. When one

contracts it to the degrees of freedom of the original system,

one gets an integro-differential equation with memory, which is no

longer described by a physically meaningful Hamiltonian or Lagrangian.


The reduced dynamics takes the exact form

m x''(t) + k x(t) = int_0^t G(s) x(t-s) ds + F(t).

with functions F(t) (the noise caused by the environment) and G(s)

(the memory kernel) that depend on the state of the environment.

If the interaction is of the usual, dissipative nature then both F(t)

and G(s) are extremely oscillating, even for intervals short compared

to the inverse frequency T of the oscillator. But the short time

averages of the memory kernel have an exponentially decaying bound on


their size and become negligible after some relaxation time tau << T.

Thus it suffices in a good approximation to take the integral

from s=0 to s=tau only. This allows us to expand x(t-s) in a second

order Taylor expansion (valid since s<=tau<<T) and to express the

integral in closed form as

int_0^t G(s) x(t-s) ds approx = dk x(t) - c x'(t) + dm x''(t)

with renormalization constants

dk = int_0^tau G(s) ds,

c = int_0^tau G(s) s ds,

dm = 1/2 int_0^tau G(s) s^2 ds,

leading to the memory-free renormalized reduced dynamics

(m-dm) x''(t) + c x'(t) + (k-dk) x(t) = F(t).

Microscopic models of the environment lead in simple cases to explicit

expressions for G(s) from which one can deduce that c>0, recovering the

traditional equation for the damped harmonic oscillator, including a

stochastic force term. (Its size can be related to the damping

coefficient and the temperature of the environment, a relation known

as the fluctuation-dissipation theorem.)
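The quality of the memory-free approximation can be checked numerically. The sketch below assumes a smooth, exponentially decaying kernel G(s) = g exp(-s/tau0) and the test motion x(t) = cos(w t); both are illustrative choices, since realistic kernels oscillate rapidly and only their short-time averages decay. The coefficients follow from the Taylor expansion x(t-s) = x(t) - s x'(t) + s^2 x''(t)/2 + ..., so the second-order coefficient carries a factor 1/2.

```python
import numpy as np

# Assumed kernel parameters and oscillator frequency (illustration only).
g, tau0, w = 3.0, 0.01, 1.0
tau = 20 * tau0  # cut-off: the kernel is negligible beyond tau


def trapezoid(y, x):
    """Composite trapezoidal rule, written out to avoid NumPy version differences."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)


s = np.linspace(0.0, tau, 20001)
G = g * np.exp(-s / tau0)

dk = trapezoid(G, s)
c = trapezoid(G * s, s)
dm = trapezoid(G * s**2, s) / 2  # factor 1/2 from the Taylor term s^2 x''/2

t = 5.0
x = np.cos(w * t)
xdot = -w * np.sin(w * t)
xddot = -(w**2) * np.cos(w * t)

memory_integral = trapezoid(G * np.cos(w * (t - s)), s)
local_approximation = dk * x - c * xdot + dm * xddot

error_second_order = abs(memory_integral - local_approximation)
error_zeroth_order = abs(memory_integral - dk * x)
print(c, error_zeroth_order, error_second_order)
```

For this kernel c is positive, and keeping the x' and x'' terms reduces the error by several orders of magnitude compared to keeping only dk x(t), as long as tau0 is much smaller than the oscillation period.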

A thorough discussion of the reduction of microscopic conservative

large systems to dissipative subsystems of interest is given in

H Grabert,

Projection Operator Techniques in Nonequilibrium

Statistical Mechanics,

Springer Tracts in Modern Physics, 1982

at a much more general level that also applies for

many other dissipative systems.

There are cases where one needs to model the memory to capture the

essence of the reduced dynamics. But in many cases, a simpler,

memory-free description is possible and adequate. One can remove the

memory by employing a Markov approximation, and gets again a

differential equation, which defines the Lindblad (or, classically,

the Fokker-Planck) dynamics. Again, this is no longer described by a

Hamiltonian or Lagrangian framework.

In the extended formulation with explicit environment or with memory,

already a simple damped harmonic oscillator becomes a huge and

unwieldy dynamical system which is no longer equivalent to the damped

harmonic oscillator, but includes unwanted environment terms or memory

terms. In cases where one really needs to model the memory, the system

therefore is no longer a damped harmonic oscillator. The latter is

described by a simple linear constant coefficient second order

differential equation for a single function, and has no memory.

Its analysis is very simple, and compared to that any more detailed

description is unwieldy.
In practice, the dissipative formulation therefore stands by itself

(apart from lip service paid to a hypothesized more fundamental

conservative description).

The situation is similar to that in fluid dynamics. In theory, the

Navier-Stokes equations (which are dissipative) should be derivable from


a Lagrangian. Indeed, such derivations have been given, but only for

very simple model problems such as an ideal gas. However, there is no

microscopic derivation of the Navier-Stokes equations in the practically

interesting case of water at room temperature...


S1q. How can QM be stochastic while the Schroedinger equation is deterministic?



The Schroedinger equation is a deterministic wave equation.

But when we set up an experiment to measure either position or

momentum, we get uncertain, stochastic outcomes.

So - is quantum mechanics deterministic or stochastic?

One has to be careful in the interpretation of the foundations...

Fortunately, the same apparent paradox already occurs in classical

physics; hence the paradox cannot have anything to do with the

peculiarities of quantum mechanics.

Indeed, a Fokker-Planck equation is a deterministic partial

differential equation. But when measuring a process modelled by it

- such as the position of a grain of pollen in Brownian motion -,

we get only probabilistic results. Now Fokker-Planck equations are

essentially equivalent to classical stochastic differential equations.

So - do they describe a deterministic or a stochastic process?

The point resolving the issue is that, both in stochastic differential

equations and in quantum mechanics, probabilities satisfy deterministic


equations, while the quantities observed to deduce the probabilities

do not.
Thus, in both cases, probabilities are deterministic ''observables''

while the position of a grain of pollen in classical mechanics, or

position and momentum in quantum mechanics, are not.
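The classical half of this comparison can be simulated directly. The sketch below assumes free Brownian motion with an illustrative diffusion constant D, simulated by the Euler-Maruyama scheme for dx = sqrt(2D) dW. Individual paths are random, while the variance of the ensemble follows the deterministic prediction 2 D t of the diffusion equation.

```python
import numpy as np

rng = np.random.default_rng(42)

D = 0.5  # diffusion constant (illustrative value)
dt, n_steps = 0.01, 200
n_paths = 20000

# Stochastic level: each path x(t) is random.
increments = np.sqrt(2 * D * dt) * rng.normal(size=(n_paths, n_steps))
paths = np.cumsum(increments, axis=1)

# Deterministic level: the diffusion equation rho_t = D rho_xx with a point
# source at x = 0 gives a Gaussian density with variance 2 D t.
t_final = n_steps * dt
predicted_variance = 2 * D * t_final
empirical_variance = float(paths[:, -1].var())

two_endpoints = paths[:2, -1]  # single outcomes differ from run to run
print(two_endpoints, empirical_variance, predicted_variance)
```

Changing the seed changes every individual endpoint but leaves the variance unchanged up to sampling error; the probabilities, not the positions, are what the deterministic equation predicts.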


S1r. Measurement theory for real numbers


The standard textbook measurement theory says that the possible

measurement results in measuring an observable given by a Hermitian

operator A are its possible eigenvalues, with a probability density

depending on the state of the system. This is part of the content of

Born's rule, and counts as one of the cornerstones of the

interpretation of quantum mechanics.

But Born's rule gives only a very idealized account of measurement

theory, and gives no sufficient explanation for what is going on in

many nontrivial measurements.

The spectrum of the Hamiltonian of the electron of a hydrogen atom

has a discrete part, catering for its bound states. According to the
idealized textbook measurement theory, a measurement of the energy

of a bound state should produce an infinitely accurate value agreeing

with one of the values in the (QED-corrected) Balmer (etc.) series.

But this is ridiculous. Repeated preparation and measurement of the

position of the ``same'' spectral lines (which provide these energy

measurements, relative to an appropriate zero of the energy) yields

different results, from which the energies themselves can be obtained

only to a certain accuracy.

Thus Born's rule does not account for the interpretation of a

measurement of the energy of an electron. For similar reasons,

measurements of particle masses or resonance energies do not reveal

the exact values (which they should according to Born's rule) but only

approximations whose quality depends a lot on the way the measurement


is done (an aspect that does not figure at all in Born's rule).

Measurements such as that of a particle lifetime or the integral cross

section of a particular reaction do not even have a natural associated

operator of which the measurement result would be an eigenvalue.

The idealized textbook measurement theory based on Born's rule is

appropriate only for the measurement of spin and related variables

that result in recording decisions of finite information content.

Thus the measurement process as described by von Neumann (and copied


from there to numerous textbooks) is an unrealistic idealization

compared with many (and probably most) real measurements.

The latter are usually much better described by suitable POVMs

(positive operator valued measures) rather than by Born's rule,

which corresponds to PVMs (projection-valued measures), a special case


of POVMs in which the positive operators are in fact projections.
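
To make the difference between a PVM and a POVM concrete, here is a small
sketch for a single spin. It assumes a hypothetical detector whose
efficiency is described by a parameter eta between 0 and 1; this
parametrization and the function names are illustrative assumptions, not
something taken from the book cited below. For eta = 1 the two effects are
projections (Born's rule); for eta < 1 they are still positive and sum to
the identity but are no longer projections.

```python
import numpy as np

I2 = np.eye(2)
sz = np.array([[1.0, 0.0], [0.0, -1.0]])  # Pauli matrix sigma_z

def unsharp_spin_povm(eta):
    """Two-outcome POVM for an imperfect spin-z measurement (assumed model)."""
    return [(I2 + eta * sz) / 2.0, (I2 - eta * sz) / 2.0]

def outcome_probabilities(rho, effects):
    """Probabilities p_k = trace(rho E_k) for a density matrix rho."""
    return np.array([np.trace(rho @ E).real for E in effects])

def is_projection(E, tol=1e-12):
    """True if E is idempotent, E E = E."""
    return np.allclose(E @ E, E, atol=tol)

rho_up = np.array([[1.0, 0.0], [0.0, 0.0]])  # spin-up state

sharp = unsharp_spin_povm(1.0)  # PVM, the textbook case
noisy = unsharp_spin_povm(0.8)  # POVM with finite resolution

p_sharp = outcome_probabilities(rho_up, sharp)
p_noisy = outcome_probabilities(rho_up, noisy)
print(p_sharp, p_noisy, is_projection(sharp[0]), is_projection(noisy[0]))
```

In the noisy case the spin-up state no longer gives a certain outcome,
although no eigenvalue of sigma_z has changed.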

See Sections 7.3-7.5 of the book

A. Neumaier and D. Westra,

Classical and Quantum Mechanics via Lie algebras,


for a realistic account of measurement theory not dependent on

Born's rule. The latter is derived there as a special case, together

with the conditions under which it is applicable.


S1s. The classical limit of quantum mechanics


Classical mechanics is often seen as the formal limit hbar-->0 of

quantum mechanics. Strictly speaking, this cannot be true since hbar

is a constant of nature, which is often even set to one to have

convenient units. The classical limit really is the limit of large

quantum numbers M (typically of mass, number of particles, or size of

angular momentum), when attention is limited to quantities whose

uncertainties are small compared to their expectations.

In these situations, the effect is similar to taking the limit

hbar --> 0. In these cases the relative uncertainties scale with

sqrt(hbar/M), which becomes small if either hbar is made formally

tiny or if M is large.
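
As an illustration of this scaling, one may look at a coherent state of a
harmonic oscillator with mean excitation number M, in units where hbar = 1.
Its number distribution is Poisson with mean M, so the relative
uncertainty of the excitation number is 1/sqrt(M). The sketch below is
only meant to show the trend; the function name and the truncation of the
sum are ad hoc choices.

```python
import math
import numpy as np

def relative_uncertainty(M):
    """Relative spread of the excitation number n in a coherent state
    with mean M, whose number distribution is Poisson(M)."""
    # Truncate the sum far out in the tail (ad hoc but generous cutoff).
    n = np.arange(0, int(M + 20 * math.sqrt(M) + 50))
    log_p = -M + n * math.log(M) - np.array([math.lgamma(k + 1) for k in n])
    p = np.exp(log_p)
    mean = np.sum(n * p)
    var = np.sum((n - mean) ** 2 * p)
    return math.sqrt(var) / mean

for M in (1, 100, 10000):
    print(M, relative_uncertainty(M), 1 / math.sqrt(M))
```

The two printed columns should agree closely, and the relative
uncertainty becomes small as M grows.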

Indeed, a quantum system is essentially classical if its relevant

quantities have uncertainties that are small compared to their

expectations.


The relation between classical and quantum mechanics is most easily seen if --

as in statistical mechanics -- quantum mechanics is presented in terms

of mixed states, which correspond to density matrices.

(Almost all quantum mechanics applied to real systems not in

the ground state needs density matrices, since pure states are very
difficult to create and propagate unless a system is in the ground

state. Pure states describe only an idealized version of quantum

reality, which in statistical mechanics appears as the approximation

in the cold limit T-->0.)

Density matrices are intrinsically quantum mechanical.

Nevertheless they exhibit very close analogies to classical densities.

Therefore everyone interested in the relations between classical and

quantum mechanics is well-advised to look at both theories in the

statistical mechanics version, where the analogies are obvious, and

the transition from quantum to classical takes the form of a simple


QM in the statistical mechanics version is almost as intuitive as

classical statistical mechanics. The only somewhat nonintuitive part

is in both cases how to interpret probability. (This is already a

severe problem in classical statistical mechanics, as the book by

Lawrence Sklar, Physics and Chance, explains in detail.)

A density matrix describes the stochastic behavior of a quantum system


in the same way as a density function describes the stochastic behavior

of a classical system. In both cases, if the system is nice enough that

the stochastic uncertainties (square roots of variances) in the

quantities of interest are much smaller than the quantities themselves,

one can form a deterministic approximation.

This deterministic approximation is given by a classical dynamical

system for the (expectations of the) quantities of interest.

Thus, in a sense, classical variables are simply expectations of

relevant quantum variables with small uncertainty. Then (and only then)


is a deterministic approximation adequate. The small uncertainty

makes these variables approximately predictable in each individual

event, and hence classical.

Classicality therefore develops whenever the uncertainties of the

quantities of interest become small compared to their expectations.

Of course, there is significant interest in quantum systems where this

does not happen, since these are decidedly non-classical, but quantum

theory gets its strange, counterintuitive features only when one

concentrates on these systems.

For more details, see, e.g., Sections 7.3-7.5 of

A. Neumaier and D. Westra,

Classical and Quantum Mechanics via Lie algebras



S1t. The classical limit via coherent states


One method for producing classical mechanics from a quantum theory is


by looking at coherent states of the quantum theory. The standard

(Glauber) coherent states have a localized probability distribution in

classical phase space, whose center follows the classical equations

of motion when the Hamiltonian is quadratic in positions and momenta.


(For nonquadratic Hamiltonians, this only holds approximately over

short times. For example, for the 2-body problem with a 1/r^2

interaction, Glauber coherent states are not preserved by the dynamics.

In this particular case, there are, however, alternative SO(2,4)-based

coherent states that are preserved by the dynamics, smeared over

Kepler-like orbits. The reason is that the Kepler 2-body problem --

and its quantum version, the hydrogen atom -- are superintegrable

systems with the large dynamical symmetry group SO(2,4).)

In general, roughly, coherent states form a nice orbit of unit vectors

of a Hilbert space H under a dynamical symmetry group G with a

triangular decomposition, such that the linear combinations of

coherent states are dense in H, and the inner product phi^*psi of

coherent states phi and psi can be calculated explicitly in terms of

the highest weight representation theory of G. The diagonal of the

N-th tensor power of H (coding systems with N-fold quantum numbers)

has coherent states phi_N (labelled by the same classical phase space

as the original coherent states, and corresponding to the N-fold highest

weight) with inner product

phi_N^*psi_N = (phi^*psi)^N

and for N --> inf, one gets a good classical limit. For the Heisenberg

group, phi^*psi is a 1/hbar-th power, and the N-th power corresponds

to replacing hbar by hbar/N. Thus one gets the standard classical limit.
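
For the Heisenberg group this can be verified directly. The sketch below
uses the standard overlap formula for Glauber coherent states, written
with the label z = (q + i p)/sqrt(2 hbar); the function name and the
sample phase space points are arbitrary choices for illustration. Taking
the N-th power of the overlap gives the same number as replacing hbar by
hbar/N.

```python
import numpy as np

def overlap(q1, p1, q2, p2, hbar=1.0):
    """Inner product of Glauber coherent states centered at the
    phase space points (q1, p1) and (q2, p2)."""
    z1 = (q1 + 1j * p1) / np.sqrt(2.0 * hbar)
    z2 = (q2 + 1j * p2) / np.sqrt(2.0 * hbar)
    return np.exp(-abs(z1) ** 2 / 2 - abs(z2) ** 2 / 2 + np.conj(z1) * z2)

N = 7
points = (0.3, -0.2, 1.1, 0.5)              # arbitrary sample points
a = overlap(*points, hbar=1.0)
lhs = a ** N                                 # (phi^* psi)^N
rhs = overlap(*points, hbar=1.0 / N)         # hbar replaced by hbar/N
print(lhs, rhs, abs(a))
```

Since the modulus of the overlap of distinct states is less than 1, its
N-th power tends to zero for large N; coherent states at different phase
space points become orthogonal, as sharp classical points should be.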

Basic literature on relations between coherent states and the classical

limit, based on irreducible unitary representations of Lie groups

includes the book

A. M. Perelomov,

Generalized Coherent States and Their Applications,

Springer-Verlag, Berlin, 1986.

and the paper

L. Yaffe,

Large N limits as classical mechanics,

Rev. Mod. Phys. 54, 407--435 (1982)

Both references assume that the Lie group is finite-dimensional and

semisimple. This excludes the Heisenberg group, in terms of which the

standard (Glauber) coherent states are usually defined. However, the

Heisenberg group has a triangular decomposition, and this suffices to

apply Perelomov's theory in spirit. The online book

Arnold Neumaier, Dennis Westra,

Classical and Quantum Mechanics via Lie algebras,


contains a general discussion of the relations between classical

mechanics and quantum mechanics, and discusses in Chapter 16 the

concept of a triangular decomposition of Lie algebras and a summary of


the associated representation theory (though in its present version

not the general relation to coherent states).

For other relevant approaches to a rigorous classical limit, see the

online sources



S2a. Lie groups and Lie algebras


Lie groups can be illustrated by continuous rigid motion of a ball

with painted patterns on it in 3-dimensional space. The Lie group ISO(3)


consists of all rigid transformations.

A rigid transformation is essentially the act of picking the ball and

placing it somewhere else, ignoring the detailed motion in between and


the location one started.

Special transformations are for example a translation in northern

direction by 1 meter, or a rotation by a quarter turn around the vertical

axis at some particular point (think of a ball with a string attached).

'Rigid' means that the distances between marked points on the ball

remain the same; the mathematician talks about 'preserving distances',


and the distances are therefore labeled 'invariants'.

One can repeat the same transformation several times, or combine two

transformations and get another one. This is called the product of

these transformations. For example, the product of a translation

by 1 meter and another one by 2 meters in the same direction gives a

translation of 1+2=3 meters in the same direction. In this case, the

distances add, but if one combines rotations about different axes the

result is no longer intuitive. To make this more tractable for calculations,

one needs to take some kind of logarithms of transformations - these

behave again additively and make up the corresponding Lie algebra

iso(3) [same letters but in lower case]. The elements of the Lie algebra

can be visualized as very small, or 'infinitesimal', motions.
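
The two examples above (translations add, rotations about different axes
do not simply combine) can be tried out with matrices. The following
sketch represents a rigid motion x -> R x + t by a 4x4 matrix and builds
rotations as exponentials of Lie algebra elements via Rodrigues' formula.
The helper names hat, rotation and rigid are my own choices for this
illustration.

```python
import numpy as np

def hat(w):
    """Map a rotation vector w to the antisymmetric matrix in so(3)."""
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]])

def rotation(w):
    """Exponential of hat(w), computed with Rodrigues' formula."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = hat(np.asarray(w, dtype=float) / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def rigid(w, t):
    """4x4 homogeneous matrix of the rigid motion x -> R x + t."""
    g = np.eye(4)
    g[:3, :3] = rotation(w)
    g[:3, 3] = t
    return g

# Translations in the same direction: the product adds the distances.
north1 = rigid([0, 0, 0], [0, 1, 0])
north2 = rigid([0, 0, 0], [0, 2, 0])
north3 = north1 @ north2

# Quarter turns about different axes: the order matters.
rx = rigid([np.pi / 2, 0, 0], [0, 0, 0])
rz = rigid([0, 0, np.pi / 2], [0, 0, 0])
commute = np.allclose(rx @ rz, rz @ rx)
print(north3[:3, 3], commute)
```

The rotation vector w plays the role of the 'logarithm' mentioned above:
it is an element of the Lie algebra, and small w give small motions.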

General Lie groups and Lie algebras extend these notions to more

general manifolds. A manifold is just a higher-dimensional version

of space, and transformations are generalized motions preserving

invariants that are important in the manifold. The transformations

preserving these invariants are also called 'symmetries', and the

Lie group consisting of all symmetries is called a 'symmetry group'.

The elements of the corresponding Lie algebra are 'infinitesimal

symmetries'.

For example, physical laws are invariant under rotations and

translations, and hence under all rigid motions. But not only these:

If one includes time explicitly, the resulting 4-dimensional space

has more invariant motions or ''symmetries''.

The Lie group of all these symmetry transformations is called the

Poincar'e group, and plays a basic role in the theory of relativity.

The transformations are now about space-time frames in uniform motion.


Apart from translations and rotations there are symmetries called

'boosts' that accelerate a frame in a certain direction, and

combinations obtained by taking products. All infinitesimal symmetries

together make up a Lie algebra, called the Poincar'e algebra.

Much more on Lie groups and Lie algebras from the perspective of

classical and quantum physics can be found in:

Arnold Neumaier and Dennis Westra,

Classical and Quantum Mechanics via Lie algebras,

Cambridge University Press, to appear (2009?).


S2b. The Galilei group as contraction of the Poincare group


The group of symmetries of special relativity is the Poincare group.

However, before Einstein invented the theory of relativity,

physics was believed to follow Newton's laws, and these have a

different group of symmetries - the Galilei group, and its

infinitesimal symmetries form the Galilei algebra.

Now Newton's physics is just a special case of the theory of relativity

in which all motions are very slow compared to the speed of light.

Physicists speak of the 'nonrelativistic limit'.

Thus one would expect that the Galilei group is a kind of

nonrelativistic limit of the Poincar'e group.

This notion has been made precise by Inonu and Wigner. They looked at the

Poincar'e algebra and 'contracted' it in an ingenious way

to the Galilei algebra. The construction could then be lifted to

the corresponding groups. Not only that, it turned out to be a

general machinery applicable to all Lie algebras and Lie groups,

and therefore has found many applications far beyond that for which
it was originally developed.
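
The idea behind the contraction can be seen in the simplest case of
boosts in one space dimension. The sketch below is not the contraction
construction itself, only a numerical check of the limit it formalizes:
written in the coordinates (t, x), a Lorentz boost with fixed velocity v
tends to the Galilean boost t' = t, x' = x - v t as c grows. The function
names and the sample values are illustrative.

```python
import numpy as np

def lorentz_boost(v, c):
    """Boost with velocity v acting on the coordinates (t, x)."""
    gamma = 1.0 / np.sqrt(1.0 - (v / c) ** 2)
    return np.array([[gamma, -gamma * v / c**2],
                     [-gamma * v, gamma]])

def galilei_boost(v):
    """Galilean boost: t' = t, x' = x - v t."""
    return np.array([[1.0, 0.0], [-v, 1.0]])

v = 3.0
errors = [np.abs(lorentz_boost(v, c) - galilei_boost(v)).max()
          for c in (1e1, 1e2, 1e3, 1e4)]
print(errors)
```

The largest matrix entry of the difference shrinks roughly like 1/c^2.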


S2c. Representations of the Poincare group, spin and gauge invariance


Whatever deserves the name ''particle'' must move like a single,

indivisible object. The Poincare group must act on the description of

this single object; so the state space of the object carries a

unitary representation of the Poincare group. This splits into a direct

sum or direct integral of irreducible reps. But splitting means

divisibility; so in the indivisible case, we have an irreducible

representation. Thus particles are described by irreducible unitary

reps of the Poincare group. Additional parameters characterize the

irreducible representation of an internal symmetry group = gauge group.

On the other hand, not all irreducible unitary reps of the Poincare

group qualify. Associated with the rep must be a consistent and causal

free field theory. As explained in Volume 1 of Weinberg's book on

quantum field theory, this restricts the rep further to those with

positive mass, or massless reps with quantized helicity.

Weinberg's book on QFT argues for gauge invariance from

causality + masslessness. He discusses massless fields in

Chapter 5, and observes (probably there, or in the beginning

of Chapter 8 on quantum electrodynamics) roughly the following:

Since massless spin 1 fields have only two degrees of freedom,

the 4-vector one can make from them does not transform correctly

but only up to a gauge transformation making up for the missing

longitudinal degree of freedom. Since sufficiently long range

elementary fields (less than exponential decay) are necessarily

massless, they must either have spin <=1/2 or have gauge behavior.
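
The statement that the 4-vector transforms correctly only up to a gauge
transformation can be illustrated with a small computation. This is a
sketch under my own conventions (metric signature +,-,-,-, momentum along
the z axis, a particular 'null rotation' from the little group of k), not
a reproduction of Weinberg's derivation. A Lorentz transformation that
leaves the lightlike momentum k fixed maps a transverse polarization
vector eps to eps plus a multiple of k.

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])   # metric, signature (+,-,-,-)
k = np.array([1.0, 0.0, 0.0, 1.0])       # lightlike momentum along z
eps = np.array([0.0, 1.0, 1.0j, 0.0]) / np.sqrt(2.0)  # transverse polarization

# Generator of a null rotation: eta @ N is antisymmetric and N @ k = 0.
N = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, -1.0],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])

def null_rotation(a):
    """exp(a N); the series terminates because N @ N @ N = 0."""
    return np.eye(4) + a * N + 0.5 * a**2 * (N @ N)

a = 0.7                                   # arbitrary group parameter
Lam = null_rotation(a)
shift = Lam @ eps - eps                   # change of the polarization vector
print(shift, a / np.sqrt(2.0) * k)
```

The shift is proportional to k, which in position space is a gradient,
i.e., a gauge transformation of the potential.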

To couple such gauge fields to matter currents, the latter

must be conserved, which means (given the known conservation laws)

that the gauge fields either have spin 1 (coupling to a conserved

vector current), or spin 2 (coupling to the energy-momentum tensor).

[Actually, he does not discuss this for Fermion fields,

so spin 3/2 (gravitinos) is perhaps another special case.]

Spin 1 leads to standard gauge theories, while spin 2 leads

to general covariance (and gravitons) which, in this context,

is best viewed also as a kind of gauge invariance.

There are some assumptions in the derivation, which one can find

out by reading Weinberg's papers

Phys.Rev. 133 (1964), B1318-B1322 any spin (massive)

Phys.Rev. 134 (1964), B882-B896 any spin II (massless)

Phys.Rev. 135 (1964), B1049-B1056 grav. mass = inertial mass

Phys.Rev. 138 (1965), B988-B1002 derivation of Einstein

Phys.Rev. 140 (1965), B516-B524 infrared gravitons

Phys.Rev. 181 (1969), 1893-1899 any spin III (general reps.)

on 'Feynman rules for any spin' and some related questions, which

contain a lot of important information about applying the irreducible

representations of the Poincare group for higher spin to field

theories, and their relation to gauge theories and general relativity.

A perhaps more understandable version of part of the material is in

D.N. Williams,

The Dirac Algebra for Any Spin,

Unpublished Manuscript (2003)

Note that there are plenty of interactions that can be constructed

using the representation theory of the Lorentz group (and Weinberg's

constructions), and there are plenty of (compound) particles with

spin >2. See the tables of the particle data group, e.g., Delta(2950)

(randomly chosen from hp:// ).

R.L. Ingraham,

Prog. Theor. Phys. 51 (1974), 249-261,


constructs covariant propagators and complete vertices for spin J

bosons with conserved currents for all J. See also

H Shi-Zhong et al.,

Eur. Phys. J. C 42 (2005), 375-389



S2d. Forms of relativistic dynamics


Relativistic multiparticle mechanics is an intricate subject,

and there are no-go theorems that imply that the most plausible

possibilities cannot be realized. However, these no-go theorems

depend on assumptions that, when questioned, allow meaningful

solutions. The no-go theorems thus show that one needs to be careful

not to introduce plausible but inappropriate intuition into the

formal framework.
To pose the problem, one needs to distinguish between kinematical

and dynamical quantities in the theory. Kinematics answers the

question "What are the general form and properties of objects that

are subject to the dynamics?" Thus it tells one about conceivable

solutions, mapping out the properties of the considered representation

of the phase space (or what remains of it in the quantum case).

Thus kinematics is geometric in nature. But kinematics does not know

of equations of motion, and hence can only tell general (kinematical)

features of solutions.

In contrast, dynamics is based on an equation of motion (or an

associated variational principle) and answers the question 'What

characterizes the actual solution?', given appropriate initial or

boundary conditions. Although the actual solution may not be available

in closed form, one can discuss its detailed properties and devise

numerical approximation schemes.

The difference between kinematical and dynamical is one of


and has nothing to do with the physics. By choosing the


i.e., the geometric setting, one chooses what is kinematical;

everything else is dynamical.

Since something which is up to the choice of the person describing

an experiment can never be distinguished experimentally, the physics

is unaffected. However, the formulas look very different in different

descriptions, and - just as in choosing coordinate systems - choosing

a form adapted to a problem may make a huge difference for actual computations.


Dirac distinguishes in his seminal paper

Rev. Mod. Phys. 21 (1949), 392-399

three natural forms of relativistic dynamics, the instant form,

the point form, and the front form. They are distinguished by

what they consider to be kinematical quantities and what are the

dynamical quantities.

The familiar form of dynamics is the instant form,

which treats space (hence spatial translations and rotations)

as kinematical and time (and hence time translation and Lorentz boosts)


as dynamical. This is the dynamics from the point of view of a

hypothetical observer (let us call it an 'instant observer')

who has knowledge about all information at some time t (the present),

and asks how this information changes as time proceeds.

Because of causality (the finite bound of c on the speed of material

motion and communication), the resulting differential equations

should be symmetric hyperbolic differential equations for which the

initial-value problem is well-posed.

Because of Lorentz invariance, the time axis can be

any axis along a timelike 4-vector, and (in special relativity)

space is the 3-space orthogonal to it. For a real observer,

the natural timelike vector is the momentum 4-vector of the material

system defining its reference frame (e.g., the solar system).

While very close to the Newtonian view of reality, it involves

an element of fiction in that no real observer can get all the

information needed as initial data. Indeed, causality implies that

it is impossible for a physical observer to know the present anywhere

except at its own position.

A second, natural form of relativistic dynamics is, according to Dirac,

the point form. This is the form of dynamics in which a particular

space-time point x=0 (the here and now) in Minkowski space is

distinguished, and the kinematical object replacing space is,

for fixed L, a hyperboloid x^2=L^2 (and x_0<0) in the past

of the here and now.

The Lorentz transformations, as symmetries of the hyperboloid,

are now kinematical and take the role that space translations and

rotations had in the instant form. On the other hand, _all_ space and

time translations are now dynamical, since they affect the position

of the here-and-now.
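
This split into kinematical and dynamical transformations is easy to
check on a sample point. The sketch below assumes the metric signature
(+,-,-,-) and arbitrary sample numbers; it shows that a boost maps a
point of the past hyperboloid x^2=L^2, x_0<0 to another point of the
same hyperboloid, while a time translation moves it off.

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])   # metric, signature (+,-,-,-)

def minkowski_square(x):
    """Invariant square x^2 = x_0^2 - |space part|^2."""
    return x @ eta @ x

def boost_x(rapidity):
    """Lorentz boost along the first space axis."""
    ch, sh = np.cosh(rapidity), np.sinh(rapidity)
    B = np.eye(4)
    B[0, 0] = B[1, 1] = ch
    B[0, 1] = B[1, 0] = sh
    return B

L = 2.0
# A point with x^2 = L^2 and x_0 < 0 (sample values).
x = np.array([-np.sqrt(L**2 + 0.3**2 + 0.4**2), 0.3, 0.4, 0.0])

boosted = boost_x(0.9) @ x                        # kinematical: stays on it
shifted = x + np.array([0.5, 0.0, 0.0, 0.0])      # dynamical: leaves it
print(minkowski_square(boosted), minkowski_square(shifted), L**2)
```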

This is the form of dynamics which is manifestly

Lorentz invariant, and in which space and time appear on equal footing.


An observer in the here and now (let us call it a 'point observer')

can - in principle, classically - have arbitrarily accurate

information about the particles and/or fields on the past

hyperboloid; thus causality is naturally accounted for.

Information given on the past hyperboloid of a point can be


to information on any other past hyperboloid using the dynamical

equations that are defined via the momentum 4-vector P, which is a

4-dimensional analogue of the nonrelativistic Hamiltonian.

The Hamiltonian corresponding to motion in a fixed timelike

direction u is given by H=u dot P. The commutativity of the components

of P is the condition for the uniqueness of the resulting state

at a different point x independent of the path by which x is reached from 0.

In principle, there are many other forms of relativistic dynamics:

As Dirac mentions on p. 396 of his paper, any 3-dimensional surface

in Minkowski space works as kinematical space if it meets

every world line with timelike tangents exactly once.

In general, those transformations are kinematical which

are also symmetries of the surface one treats as kinematical reference

surface. By choosing a surface without symmetries _all_

transformations become dynamical. For reasons of economy, one


however, a large kinematical symmetry group. The full Poincare group

is possible only for free dynamics.

This leaves as interesting large subgroups two with 6 linearly

independent generators, the Euclidean group ISO(3), leading to the

instant form, and the Lorentz group SO(1,3), leading to the point form,

and one with 7 linearly independent generators, the stabilizer of

a front (or infinite momentum plane), a 3-space with lightlike normal,

leading to the front form. This third natural form of relativistic

dynamics according to Dirac, has many uses in quantum field theory,

but here I won't discuss it further.
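
The generator counts 6 and 7 for the instant form and the front form can
be reproduced by linear algebra. The sketch below is restricted to
hyperplanes through the origin, so it does not cover the hyperboloid of
the point form. It counts the infinitesimal Poincare transformations
x -> x + omega x + a whose flow stays tangent to the hyperplane n.x = 0.
The function names and the choice of basis are assumptions made for this
illustration.

```python
import itertools
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])   # metric, signature (+,-,-,-)

def poincare_basis():
    """Basis (omega, a) of the 10 infinitesimal Poincare transformations
    x -> x + omega x + a, with eta @ omega antisymmetric."""
    basis = []
    for i, j in itertools.combinations(range(4), 2):   # 6 Lorentz generators
        A = np.zeros((4, 4))
        A[i, j], A[j, i] = 1.0, -1.0
        basis.append((eta @ A, np.zeros(4)))           # omega = eta^(-1) A
    for i in range(4):                                 # 4 translations
        basis.append((np.zeros((4, 4)), np.eye(4)[i]))
    return basis

def kinematical_dimension(n, tangents):
    """Number of independent generators mapping the hyperplane n.x = 0
    into itself; `tangents` must span that hyperplane."""
    rows = []
    for omega, a in poincare_basis():
        # Tangency conditions: n.(omega b) = 0 for all tangents b, n.a = 0.
        rows.append([n @ eta @ omega @ b for b in tangents] + [n @ eta @ a])
    return 10 - np.linalg.matrix_rank(np.array(rows))

e = np.eye(4)
instant = kinematical_dimension(e[0], [e[1], e[2], e[3]])        # t = 0
front = kinematical_dimension(e[0] + e[3], [e[0] + e[3], e[1], e[2]])
print(instant, front)
```

The extra generator of the front form arises because the lightlike
normal is itself tangent to the front, which removes one tangency
condition.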

All forms are equivalent, related classically by canonical

transformations preserving algebraic operations and the Poisson bracket,

and quantum mechanically by unitary transformations preserving

algebraic operations and hence the commutator. This means that any

statement about a system in one of the forms can be translated into

an equivalent statement of an equivalent system in any of the other forms.


Preferences are therefore given to one form over the other depending

solely on the relative simplicity of the computations one wants to do.

This is completely analogous to the choice of coordinate systems

(cartesian, polar, cylindric, etc.) in classical mechanics.

For a multiparticle theory, however, the different forms and the

need to pick a particular one seem to give different pictures of

reality. This invites paradoxes if one is not careful.

This can be seen by considering trajectories of classical relativistic

many-particle systems. There is a famous theorem by

Currie, Jordan and Sudarshan

Rev. Mod. Phys. 35 (1963), 350-375

which asserts that interacting two-particle systems cannot have

Lorentz invariant trajectories in Minkowski space. Traditionally,

this was taken by mainstream physics as an indication that the

multiparticle view of relativistic mechanics is inadequate,

and a field theoretical formulation is essential.

However, as time proceeded, several approaches to valid relativistic

multi-particle (quantum) dynamics were found (see the FAQ entry on

'Is there a multiparticle relativistic quantum mechanics?'),

and the theorem had the same fate as von Neumann's proof that

hidden-variable theories are impossible. Both results are now simply

taken as an indication that the assumptions under which they were

made are too strong.

In particular, once the assumption by Currie, Jordan and Sudarshan

that all observers see the same trajectories of a system of interacting

particles is rejected, their no-go theorem no longer applies.

The question then is how to find a consistent and covariant description

without this at first sight very intuitive property. But once it is

admitted that different observers see the same world but represented

in different personal spaces, the formerly intuitive property becomes

meaningless. For objectivity, it is enough that one can consistently

translate the views of any observer into that of any other observer.

Precisely this is the role of the dynamical Poincare transformations.

Thus nothing forbids an instant observer to observe

particle trajectories in its present space, or a

point observer to observe particle trajectories in its past hyperboloid.

However, the present space (or the past hyperboloid) of two different

observers is related not by kinematical transforms but dynamically,

with the result that trajectories seen by different observers on

their different kinematical 3-surfaces look different.

Classically, this looks strange on first sight, although

the Poincare group provides well-defined recipes for translating

the trajectories seen by one observer into those seen by another observer.


Quantum mechanically, trajectories are fuzzy anyway, due to the

uncertainty principle, and as various successful multiparticle

theories show, there is no mathematical obstacle for such a


The mathematical reason of this superficially paradoxical situation

lies in the fact that there is no observer-independent definition

of the center of mass of relativistic particles, and the related fact

that there is no observer-independent definition of space-time

coordinates for a multiparticle system.

The best one can do is to define either a covariant position operator

whose components do not commute (thus defining a noncommutative

space-time), or a spatial position operator, the so-called

Newton-Wigner position operator, which has three commuting components

but is observer-dependent.

(See the FAQ entry on 'Localization and position operators'.)


S2e. Is there a multiparticle relativistic quantum mechanics?


In his QFT book, Weinberg says no, arguing that there is no way to

implement the cluster separation property. But in fact there is:

There is a big survey by Keister and Polyzou on the subject

B.D. Keister and W.N. Polyzou,

Relativistic Hamiltonian Dynamics in Nuclear and Particle Physics,

in: Advances in Nuclear Physics, Volume 20,

(J. W. Negele and E.W. Vogt, eds.)

Plenum Press 1991.

that covered everything known at that time. This survey was quoted

at least 116 times, see

looking these up will bring you close to the state of the art

on this.

They survey the construction of effective few-particle models.

There are no singular interactions, hence there is no need for renormalization.


The models are _not_ field theories, only Poincare-invariant few-body

dynamics with cluster decomposition and phenomenological terms

which can be matched to approximate form factors from experiment or


some field theory. (Actually many-body dynamics also works, but the

many particle case is extremely messy.)

They are useful phenomenological models, but somewhat limited;

for example, it is not clear how to incorporate external fields.

The papers by Klink at

and work by Polyzou at

contain lots of multiparticle relativistic quantum mechanics,

applied to real particles. See also the Ph.D. thesis by Krassnigg at
(Other work in this direction includes Dirac's many-time quantum

theory, with a separate time coordinate for each particle; see, e.g.,

Marian Guenther, Phys Rev 94, 1347-1357 (1954)

and references there. Related multi-time work was done under the

name of 'proper time quantum mechanics' or 'manifestly covariant

quantum mechanics', see, e.g.,

L.P. Horwitz and C. Piron, Helv. Phys. Acta 46 (1973) 316,

but it does not reproduce standard physics, and apparently never

reached a stage useful to phenomenology.)

Note that in the working single-time approaches, covariance is always

achieved through a representation of the Poincare group on a

Hilbert space corresponding to a fixed time (or another 3D manifold in

space-time), rather than through multiple times.

Thus the whole theory has a single time only, whose dynamics is

generated by the Hamiltonian, the generator H=P_0 of the Poincare group.


(This is completely analogous to the nonrelativistic case,

where multiparticle systems also have a single time only.)

The natural manifestly covariant picture is that of a vector bundle

on Minkowski space-time, with a standard Fock space attached to each

point. An observer (i.e., formally, an orthonormal frame attached at

some space-time point) moves in space-time via the Poincare group,

and this action extends to the bundle by means of the representation

defining the Fock space.


S2f. What is a photon?


According to quantum electrodynamics, the most accurately verified

theory in physics, a photon is a single-particle excitation of the

free quantum electromagnetic field. More formally, it is a state of

the free electromagnetic field which is an eigenstate of the photon

number operator with eigenvalue 1.

The pure states of the free quantum electromagnetic field

are elements of a Fock space constructed from 1-photon states.

A general n-photon state vector is an arbitrary linear combination

of tensor products of n 1-photon state vectors; and a general pure

state of the free quantum electromagnetic field is a sum of n-photon

state vectors, one for each n. If only the 0-photon term contributes,

we have the dark state, usually called the vacuum; if only the

1-photon term contributes, we have a single photon.

A single photon has the same degrees of freedom as a classical vacuum

radiation field. Its shape is characterized by an arbitrary nonzero

real 4-potential A(x) satisfying the free Maxwell equations, which in

the Lorentz gauge take the form

nabla dot nabla A(x) = 0,

nabla dot A(x) = 0,

expressing the zero mass and the transversality of photons. Thus for

every such A there is a corresponding pure photon state |A>.
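
As a quick sanity check of these two conditions, one can take the
simplest amplitude, a monochromatic plane wave A(x) = eps cos(k.x) with
lightlike k and k.eps = 0, and test both equations by finite differences.
The sketch below assumes the metric signature (+,-,-,-), units with c = 1
and arbitrary sample values for k, eps and the test point; the function
names are illustrative.

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])   # metric, signature (+,-,-,-)
k = np.array([2.0, 0.0, 0.0, 2.0])       # lightlike wave 4-vector, k.k = 0
eps = np.array([0.0, 1.0, 0.0, 0.0])     # polarization with k.eps = 0

def A(x):
    """Real 4-potential of a monochromatic wave, A_mu(x) = eps_mu cos(k.x)."""
    return eps * np.cos(k @ eta @ x)

def partial(f, x, mu, h=1e-3):
    """Central difference for the partial derivative along x^mu."""
    e = np.zeros(4)
    e[mu] = h
    return (f(x + e) - f(x - e)) / (2 * h)

def box_A(x, h=1e-3):
    """Wave operator d_0^2 - d_1^2 - d_2^2 - d_3^2 applied to A."""
    total = np.zeros(4)
    for mu in range(4):
        e = np.zeros(4)
        e[mu] = h
        total += eta[mu, mu] * (A(x + e) - 2 * A(x) + A(x - e)) / h**2
    return total

def divergence_A(x):
    """4-divergence d_mu A^mu, with the index raised by eta."""
    return sum(partial(lambda y: (eta @ A(y))[mu], x, mu) for mu in range(4))

x0 = np.array([0.3, -1.2, 0.5, 2.0])     # arbitrary test point
residual_wave = box_A(x0)
residual_gauge = divergence_A(x0)
print(residual_wave, residual_gauge)
```

Both residuals vanish up to discretization error. A general amplitude is
a superposition of such waves.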

Here A(x) is _not_ a field operator but a photon amplitude;

photons whose amplitudes differ by an x-independent phase factor are

the same. For a photon in the normalized state |A>, the observable

electromagnetic field expectations are given by the usual formulas

relating the 4-potential and the fields,

<\E(x)> = <A|\E(x)|A>

= - partial \A(x)/partial x_0 - c nabla_\x A_0(x),


<\B(x)> = <A|\B(x)|A> = nabla_\x x \A(x)

[hmmm. check if this really is the case...]

Here \x (fat x) and x_0 are the space part and the time part of a

relativistic 4-vector, \E(x), \B(x) are the electromagnetic

field operators (related to the operator 4-potential by analogous

formulas), and c is the speed of light. Amplitudes A(x) producing

the same \E(x) and \B(x) are equivalent and related by a gauge

transformation, and describe the same photon.

In momentum space (frequently but not always the appropriate


single photon states have the form

|A> = integral d\p^3/p_0 A(\p)|\p>,

where |\p> is a single particle state with definite 3-momentum

\p (fat p), p_0=|\p| is the corresponding photon energy divided by c,

and the photon amplitude A(\p) is a polarization 4-vector.

Thus a general photon is a superposition of monochromatic waves with

arbitrary polarizations, frequencies and directions.

(The Fourier transform of A(\p) is the so-called analytic signal

A^(+)(x), and by adding its complex conjugate one gets the real

4-potential A(x) in the Lorentz gauge.)

The photon amplitude A(\p) can be regarded as the photon's

wave function in momentum space. Since photons are not localizable

(though they are localizable approximately), there is no

meaningful photon wave function in coordinate space; see the

next entry in this FAQ. One could regard the 4-potential A(x) as

coordinate space wave function, but because of its gauge dependence,

this is not really useful.


This is second quantized notation, as appropriate for quantum fields.

This is how things always look in second quantization, even for a

harmonic oscillator. The wave function psi(x) or psi(p) in standard

(first quantized) quantum mechanics becomes the state vector

psi = integral dx psi(x) |x> or integral dp psi(p) |p>

in Fock space; the wave function at x or p turns into the coefficient

of |x> or |p>. In quantum field theory, x, A (the photon amplitude), and

E(x) (the electric field operator) correspond to k (a component of the

momentum), x, and p_k. Thus the coordinate index k is inflated to the

spacetime position x, the argument of the wave function is inflated to

a solution of the free Maxwell equations, the momentum operator is

inflated to a field operator, and the integral over x becomes a

functional integral over photon amplitudes,

psi = integral dA psi(A) |A>.

Here psi(A) is the most general state vector in Fock space; for a

single photon, psi depends linearly on A,

psi(A) = integral d\p^3/p_0 A(\p)|\p> = |A>.

Observable electromagnetic fields are obtained as expectation values

of the field operators \E(x) and \B(x) constructed by differentiation of

the textbook field operator A(x). As the observed components of

the mean momentum, say, in ordinary quantum mechanics are

<p_k> = integral dx psi(x)^* p_k psi(x),

so the observed values of the electromagnetic field are

<\E(x)> = <psi|\E(x)|psi> = integral dA psi(A)^* \E(x) psi(A).

<\B(x)> = <psi|\B(x)|psi> = integral dA psi(A)^* \B(x) psi(A).

In a frequently used interpretation (valid only approximately),

the term A(\p)|\p> represents the one-photon part of a


beam with frequency nu=cp_0/h, direction \n(\p)=\p/p_0, and

polarization determined by A(\p). Here h = 2 pi hbar, where hbar is the

reduced Planck constant; omega=cp_0/hbar is the angular frequency.
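As a small numerical illustration of these relations (a sketch only: the
wavelength of 500 nm is an arbitrary assumption made for this example, and
the variable names are not part of the FAQ), one can compute p_0, nu and
omega for a single monochromatic mode:

```python
import math

H = 6.62607015e-34         # Planck constant h in J s (exact SI value)
C = 299792458.0            # speed of light c in m/s (exact SI value)
HBAR = H / (2 * math.pi)   # hbar = h / (2 pi)

wavelength = 500e-9        # assumed example wavelength in m (not from the FAQ)

p0 = H / wavelength        # p_0 = |p| = h / wavelength, i.e. energy divided by c
nu = C * p0 / H            # frequency nu = c p_0 / h
omega = C * p0 / HBAR      # angular frequency omega = c p_0 / hbar
energy = C * p0            # photon energy c p_0 = h nu = hbar omega

print(f"p_0    = {p0:.4e} kg m/s")
print(f"nu     = {nu:.4e} Hz")
print(f"omega  = {omega:.4e} rad/s")
print(f"energy = {energy:.4e} J")
```

The only inputs are h, c and the assumed wavelength; everything else
follows from p_0 = |p| and the two formulas quoted above.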

The polarization 4-vector A(\p) is orthogonal to the 4-momentum p

composed of p_0 and \p, obtained by a Fourier transform of the

4-potential A(x) in the Lorentz gauge. (The wave equation translates

into the condition p_0^2=\p^2, causality requires p_0>0, hence

p_0=|\p|, and orthogonality p dot A(\p) = 0 expresses the Lorentz

gauge condition. For massless particles, there remains the additional

gauge freedom to shift A(\p) by a multiple of the 4-momentum p, which

can be used to fix A_0=0.)

A(\p) is usually written (in the gauge with vanishing time component)

as a linear combination of two specific polarization vectors eps^+(p) and

eps^-(p) for circularly polarized light (corresponding to helicities +1

and -1), forming together with the direction vector \n(\p) an

orthonormal basis of complex 3-space. In particular,

eps^+(p) eps^+(p)^* + eps^-(p)eps^-(p)^* + \n(\p)\n(\p)^* = 1

is the 3x3 identity matrix. (This is used in sums over helicities for

Feynman rules.) Specifically, eps^+(p) and eps^-(p) can be obtained by

finding normalized eigenvectors for the eigenvalue problem

[check. The original eigenvalue problem is p dot J eps = lambda eps.]

p x eps = lambda eps

with lambda = +-i|p|. For example, if p is in z-direction then

eps^+(p) = (1, -i, 0)/sqrt(2),

eps^-(p) = (i, -1, 0)/sqrt(2),

and the general case can be obtained by a suitable rotation.
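The special case of p in z-direction can be checked numerically. The
following is a minimal sketch (the helper cross_matrix and the value
|p| = 2 are assumptions made for this illustration); it only tests the
eigenvalue problem p x eps = lambda eps with lambda = +-i|p| and the
completeness relation stated above, and does not address the general
formulas.

```python
import numpy as np

def cross_matrix(p):
    """Return the 3x3 matrix M with M @ v == np.cross(p, v)."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])

p = np.array([0.0, 0.0, 2.0])   # momentum in z-direction; |p| = 2 is arbitrary
p_abs = np.linalg.norm(p)
n = p / p_abs                   # direction vector

eps_plus = np.array([1, -1j, 0]) / np.sqrt(2)
eps_minus = np.array([1j, -1, 0]) / np.sqrt(2)

M = cross_matrix(p)

# Eigenvalue problem p x eps = lambda eps with lambda = +i|p| and -i|p|
ok_plus = np.allclose(M @ eps_plus, 1j * p_abs * eps_plus)
ok_minus = np.allclose(M @ eps_minus, -1j * p_abs * eps_minus)

# Completeness: eps^+ eps^+* + eps^- eps^-* + n n* = 3x3 identity
completeness = (np.outer(eps_plus, eps_plus.conj())
                + np.outer(eps_minus, eps_minus.conj())
                + np.outer(n, n))
ok_complete = np.allclose(completeness, np.eye(3))

print(ok_plus, ok_minus, ok_complete)
```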

An explicit calculation gives almost everywhere

eps^+(p) = u(p)/p_0

where p_0=|p| and

u_1(p) = p_3 - i p_2 p'/p'',

u_2(p) = -i p_3 - i p_1 p'/p''

u_3(p) = p'

with

p' = p_1+ip_2,

p''= p_3+p_0.

[what is eps^-(p)?]
These formulas become singular along the negative p_3-axis,

so several charts are needed to cover

For experiments one usually uses nearly monochromatic light bundled

into narrow beams. If one also ignores the directions (which are

usually fixed by the experimental setting, hence carry no extra

information), then only the helicity degrees of freedom remain,

and the 1-photon part of the beam behaves like a 2-level quantum

system ('a single spin').

A general monochromatic beam with fixed direction in a pure state is

given by a second-quantized state vector, which is a superposition of

arbitrary multiphoton states in the Bosonic Fock space generated by

the two helicity degrees of freedom. This is the basis for most

quantum optics experiments probing the foundations of quantum

mechanics.

The simplest state of light (generated for example by

lasers) is a coherent state, with state vector proportional to

e(A) = |vac> + |A> + 1/sqrt(2!) |A> tensor |A>

+ 1/sqrt(3!) |A> tensor |A> tensor |A> + ...

where |A> is a one-photon state. Thus coherent states also have the

same degrees of freedom as classical electromagnetic radiation.

Indeed, light in coherent states behaves classically in most respects.

At low intensity, the higher order terms in the expansion are

negligible, and since the vacuum part is not directly observable,

a low intensity coherent state resembles a single photon state.

On the other hand, true single photon states are very hard to produce

to good accuracy, and were created experimentally only recently:

B.T.H. Varcoe, S. Brattke, M. Weidinger and H. Walther,

Preparing pure photon number states of the radiation field,

Nature 403, 743--746 (2000).

see also

Ordinary light is essentially never, and high-tech light almost never,

describable by single photons.

A good informal discussion of what a photon is from a more practical

perspective was given by Paul Kinsler in


But this does not tell the whole story. An interesting collection of

articles explaining different current views is in

The Nature of Light: What Is a Photon?

Optics and Photonics News, October 2003

Further discussion is given in the section ''Coherent states of light

as ensembles'' of the present FAQ.

The standard reference for quantum optics is

L. Mandel and E. Wolf,

Optical Coherence and Quantum Optics,

Cambridge University Press, 1995.

Mandel and Wolf write (in the context of localizing photons),

about the temptation to associate with the clicks of a photodetector

a concept of photon particles. [If there is interest, I can try to

recover the details.] The wording suggests that one should resist the

temptation, although this advice is usually not heeded. However,

the advice is sound since a photodetector clicks even when it

detects only classical light! This follows from the standard analysis

of a photodetector, which treats the light classically and only

quantizes the detector. Thus the clicks are an artifact of

photodetection caused by the quantum nature of matter, rather than

a proof of photons arriving!!!

A coherent light source (laser) produces a coherent state of light,

which is a superposition of the vacuum state, a 1-photon state,

a 2-photon state, etc., with squared amplitudes given by a Poisson

distribution. At low intensity, this is misinterpreted in practice

as random single photons arriving at the end of the beam in a

random Poisson process, because the photodetector produces clicks

according to this distribution.
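To get a feeling for the numbers, here is a minimal sketch of the Poisson
weights of a coherent state. The amplitude |z| = 0.1 and the function name
photon_number_probs are assumptions chosen for this illustration only.

```python
import math

def photon_number_probs(z_abs, n_max):
    """Poisson weights exp(-|z|^2) |z|^(2N) / N! for N = 0, ..., n_max."""
    mean = z_abs ** 2
    return [math.exp(-mean) * mean ** n / math.factorial(n)
            for n in range(n_max + 1)]

z_abs = 0.1                      # assumed weak amplitude
probs = photon_number_probs(z_abs, 10)

p_vac = probs[0]                 # weight of the vacuum component
p_one = probs[1]                 # weight of the 1-photon component
p_multi = sum(probs[2:])         # weight of all multiphoton components

# Weight of the 1-photon term once the vacuum part is ignored
p_one_given_nonvac = p_one / (1.0 - p_vac)

print(f"vacuum: {p_vac:.6f}, one photon: {p_one:.6f}, more: {p_multi:.2e}")
print(f"one-photon share of the non-vacuum part: {p_one_given_nonvac:.4f}")
```

For such a weak amplitude nearly all weight sits in the vacuum, and the
non-vacuum remainder is dominated by the 1-photon term.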

Incoherent light sources usually consist of thermal mixtures and

produce other distributions, but otherwise the description (and

misinterpretation) is the same.

Nevertheless, one must understand this misinterpretation in order

to follow much of the literature on quantum optics.

Thus the talk about photons is usually done inconsistently;

almost everything said in the literature about photons should be taken

with a grain of salt.

There are even people like the Nobel prize winner Willis E. Lamb

(the discoverer of the Lamb shift) who maintain that photons don't

exist. See towards the end of


The reference mentioned there at the end appeared as

W.E. Lamb, Jr.,

Applied Physics B 60 (1995), 77--84

This, together with the other reference mentioned by Lamb, is



W.E. Lamb, Jr.,

The interpretation of quantum mechanics,

Rinton Press, Princeton 2001.

I think the most apt interpretation of an 'observed' photon as used

in practice (in contrast to the photon formally defined as above) is

as a low intensity coherent state, cut arbitrarily into time slices

carrying an energy of h*nu = hbar*omega, the energy of a photon at

frequency nu and angular frequency omega.
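An order-of-magnitude sketch of this picture: for a beam of given power,
the length of a time slice carrying the energy h*nu is h*nu divided by the
power. The power of 1 fW, the wavelength of 633 nm and the assumption of
an ideal detector are choices made for this example only.

```python
H = 6.62607015e-34    # Planck constant h in J s (exact SI value)
C = 299792458.0       # speed of light c in m/s (exact SI value)

def quanta_per_second(power_watt, wavelength_m):
    """Mean number of energy quanta h*nu carried by the beam per second."""
    nu = C / wavelength_m
    return power_watt / (H * nu)

power = 1e-15          # assumed beam power: 1 femtowatt
wavelength = 633e-9    # assumed wavelength: 633 nm

rate = quanta_per_second(power, wavelength)
slice_duration = 1.0 / rate   # duration of a slice carrying the energy h*nu

print(f"about {rate:.0f} slices per second, each lasting {slice_duration:.2e} s")
```

An ideal detector would click at roughly this rate on average; a real
detector clicks less often, in proportion to its efficiency.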

Such a state consists mostly of the vacuum (which is not directly

observable hence can usually be neglected), and the contributions of

the multiphoton states are negligible compared to the single photon

contribution.

With such a notion of photon, most of the actual experiments done

make sense, though it does not explain the quantum randomness of the

detection process (which comes from the quantized electrons in the

detector).

A nonclassical description of the electromagnetic field where states of

light other than coherent states are required is necessary mainly for

special experiments involving recombining split beams, squeezed

state amplification, parametric down-conversion, and similar

arrangements where entangled photons make their appearance.

There is a nice booklet on this kind of optics:

U. Leonhardt,

Measuring the Quantum State of Light,

Cambridge, 1997.

Nonclassical electromagnetic fields are also relevant in the

scattering of light, where there are quantum corrections

due to multiphoton scattering. These give rise to important effects

such as the Lamb shift, which very accurately confirm the quantum

nature of the electromagnetic field. They involve no observable

photon states, but only virtual photon states, hence they are unrelated

to experiments involving photons. Indeed, there is no way to observe

virtual particles, and their name was chosen to reflect this.

(Observed particles are always onshell, hence massless for photons,

whereas it is an easy exercise that the virtual photon mediating

electromagnetic interaction of two electrons in the tree approximation

is never onshell.)

S2g. Particle positions and the position operator


The standard probability interpretation for quantum particles

is based on the Schr"odinger wave function psi(x), a square integrable

single- or multicomponent function of position x in R^3.

Indeed, with ^* denoting the conjugate transpose,

rho(x) := psi(x)^*psi(x)

is generally interpreted as the probability density to find (upon

measurement) the particle at position x. Consequently,

Pr(Z) := integral_Z dx |psi(x)|^2

is interpreted as the probability of the particle being in the open

subset Z of position space. Particles in highly localized states

are then given by wave packets which have no appreciable size

|psi(x)| outside some tiny region Z.
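As a concrete illustration, here is a minimal sketch that evaluates Pr(Z)
for a Gaussian wave packet. It works in one space dimension instead of
R^3 and uses a simple Riemann sum; the width sigma, the grid and the
helper name prob are assumptions of this example.

```python
import numpy as np

sigma = 1.0                                   # assumed width of the packet
x = np.linspace(-20.0, 20.0, 400001)          # position grid
dx = x[1] - x[0]

# Normalized Gaussian wave function; rho = |psi|^2 is a normal density
psi = (2 * np.pi * sigma**2) ** (-0.25) * np.exp(-x**2 / (4 * sigma**2))
rho = np.abs(psi) ** 2

def prob(a, b):
    """Riemann-sum approximation of Pr(Z) for the interval Z = (a, b)."""
    mask = (x >= a) & (x <= b)
    return float(np.sum(rho[mask]) * dx)

total = prob(-20.0, 20.0)                     # should be close to 1
inside = prob(-3 * sigma, 3 * sigma)          # weight within three widths
outside = total - inside

print(f"total = {total:.6f}, inside = {inside:.6f}, outside = {outside:.2e}")
```

Almost all probability lies within a few widths of the center, which is
the sense in which such a packet is localized in a small region Z.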

If the position representation in the Schr"odinger picture exists,

there is also a vector-valued position operator x, whose components

act on psi(x) by multiplication with x_j (j=1,2,3). In particular,

the components of x commute, satisfy canonical commutation relations

with the conjugate momentum

p = -i hbar partial_x,

and transform under rotations like a 3-vector, so that the commutation

relations with the angular momentum J take the form

[J_j,x_k] = i eps_{jkl} x_l.

Moreover, in terms of the (unnormalizable) eigenstates |x,m> of the

position operator corresponding to the spectral value x (and a label m

to distinguish multiple eigenstates) we can recover the position

representation from an arbitrary representation by defining psi(x)

to be the vector with components

psi_m(x) := <x,m|psi>.

Therefore, if we have a quantum system defined in an arbitrary

Hilbert space in which a momentum operator is defined, the necessary

and sufficient condition for the existence of a spatial probability

interpretation of the system is the existence of a position operator

with commuting components which satisfy standard commutation

relations with the components of the momentum operator and the

angular momentum operator.

Thus we have reduced the existence of a probability interpretation

for particles in a bounded region of space to the question of the

existence of a position operator with the right properties.

We now investigate this existence problem for elementary particles,

i.e., objects represented by an irreducible representation of the

full Poincare group. We consider first the case of particles of

mass m>0, since the massless case needs additional considerations.

A. Massive case, m>0:

Let M := R^3 be the manifold of 3-momenta p. On the Hilbert space

H_m^d obtained by completion of the space of all C^infty functions

with compact support from M to the space C^d of d-component vectors

with complex entries, with inner product defined by

<phi|psi> := integral d\p/sqrt(p^2+m^2) phi(p)^*psi(p),

we define the position operator

q := i hbar partial_p,

which satisfies the standard commutation relations, the momentum in

time direction,

p_0 := sqrt(m^2+|p|^2),

where m>0 is a fixed mass, and the operators

J := q x p + S,

K := (p_0 q + q p_0)/2 + p x S/(m+p_0),

where S is the spin vector in a unitary representation of so(3) on

the vector space C^d of complex vectors of length d, with the same

commutation relations as J.
This is a unitary representation of the Poincare algebra;

verification of the standard commutation relations (given,

e.g., in Weinberg's Volume 1, p.61) is straightforward.

It is not difficult to show that this representation is irreducible

and extends to a representation of the full Poincare group.
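The spin vector S entering this construction can be made concrete. The
following minimal sketch builds the standard spin matrices on C^d (in
units with hbar = 1, using the usual ladder operator construction, which
is an assumption of this example and not spelled out in the FAQ) and
checks the so(3) commutation relations and the value of S^2.

```python
import numpy as np

def spin_matrices(d):
    """Spin matrices S_1, S_2, S_3 on C^d for spin s = (d-1)/2, hbar = 1."""
    s = (d - 1) / 2.0
    m = s - np.arange(d)                  # magnetic quantum numbers s, ..., -s
    s3 = np.diag(m).astype(complex)
    s_plus = np.zeros((d, d), dtype=complex)
    for k in range(1, d):
        # Raising operator: <m+1| S_+ |m> = sqrt(s(s+1) - m(m+1))
        s_plus[k - 1, k] = np.sqrt(s * (s + 1) - m[k] * (m[k] + 1))
    s_minus = s_plus.conj().T
    s1 = (s_plus + s_minus) / 2
    s2 = (s_plus - s_minus) / (2j)
    return s1, s2, s3

def comm(a, b):
    """Commutator [a, b]."""
    return a @ b - b @ a

def check_so3(d):
    """True if [S_j, S_k] = i eps_{jkl} S_l and S^2 = s(s+1) hold on C^d."""
    s1, s2, s3 = spin_matrices(d)
    s = (d - 1) / 2.0
    casimir = s1 @ s1 + s2 @ s2 + s3 @ s3
    return (np.allclose(comm(s1, s2), 1j * s3)
            and np.allclose(comm(s2, s3), 1j * s1)
            and np.allclose(comm(s3, s1), 1j * s2)
            and np.allclose(casimir, s * (s + 1) * np.eye(d)))

results = {d: check_so3(d) for d in (2, 3, 5)}
print(results)
```

Here d = 2, 3, 5 correspond to spin 1/2, 1 and 2.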

Obviously, this representation carries a position operator.

Since the physical irreducible representations of the Poincare group

are uniquely determined by mass and spin, we see that in the massive

case, a position operator must always exist. An explicit formula in

terms of the Poincare generators is obtained through division by m

in the formula

mq = K - ((K dot p) p/p_0 + p x J)/(m+p_0),

which is straightforward, though a bit tedious to verify from the above.

That there is no other possibility follows from

T.F. Jordan

Simple derivation of the Newton-Wigner position operator

J. Math. Phys. 21 (1980), 2028-2032.
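The explicit formula for mq can be tested quickly in the classical limit,
where q, p and S are treated as ordinary commuting 3-vectors so that
operator ordering plays no role. This is only a sketch under that
assumption (random test vectors, arbitrary mass value), not a
verification of the operator identity. In this limit, with J and K as
defined above, it is the cross product in the order p x J that
reproduces mq.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 1.3                              # arbitrary positive mass for the test
q = rng.normal(size=3)               # classical position vector
p = rng.normal(size=3)               # classical momentum vector
S = rng.normal(size=3)               # classical spin vector

p0 = np.sqrt(m**2 + p @ p)           # p_0 = sqrt(m^2 + |p|^2)
J = np.cross(q, p) + S               # J = q x p + S
K = p0 * q + np.cross(p, S) / (m + p0)   # classical limit of K

# Candidate formula: m q = K - ((K.p) p / p_0 + p x J) / (m + p_0)
mq = K - ((K @ p) * p / p0 + np.cross(p, J)) / (m + p0)

residual = np.linalg.norm(mq - m * q)
print(f"residual = {residual:.2e}")
```

The residual vanishes up to rounding errors because K.p = p_0 (q.p) and
p x (q x p) = |p|^2 q - (q.p) p, with |p|^2 = (p_0-m)(p_0+m).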

Note that the position operator is always observer-dependent, in the

sense that one must choose a timelike unit vector to distinguish

space and time coordinates in the momentum operator. This is due to

the fact that the above construction is not invariant under Lorentz
boosts (which give rise to equivalent but different representations).

Note also that in case of the Dirac equation, the position operator is

_not_ the operator multiplying a solution psi(x) of the Dirac equation

by the spacelike part of x (which would mix electron and positron

states), but a related operator obtained by first applying a so-called

Foldy-Wouthuysen transformation.

L.L. Foldy and S.A. Wouthuysen,

On the Dirac Theory of Spin 1/2 Particles and Its Non-Relativistic

Limit,

Phys. Rev. 78 (1950), 29-36.

B. Massless case, m=0:

Let M_0 := R^3\{0} be the manifold of nonzero 3-momenta p, and let

p_0 := |p|, n := p/p_0.

The Hilbert space H_0^d (defined as before but now with m=0 and with

M_0 in place of M),

obtained by completion of the space of all C^infty functions

with compact support from M_0 to the space C^d of d-component vectors

with complex entries, with inner product defined by

<phi|psi> := integral d\p/|p| phi(p)^*psi(p),

carries a natural massless representation of the Poincare algebra,

defined by

J := q x p + S,

K := (p_0 q + q p_0)/2 + n x S,

where q = i hbar partial_p is the position operator, and S is the

spin vector in a unitary representation of so(3) on C^d, with the

same commutation relations as J.

Again, verification of the standard commutation relations is

straightforward. (Indeed, this representation is the limit of the

above massive representation for m --> 0.)

It is easily seen that the helicity

lambda := n dot S

is central in the (suitably completed) universal envelope of the

Lie algebra, and that the possible eigenvalues

of the helicity are s,s-1,...,-s, where s=(d-1)/2. Therefore, the

eigenspaces of the helicity operator carry by restriction unitary

representations of the Poincare algebra, which are easily seen to be

irreducible. They extend to a representation of the connected

Poincare group. Moreover, the invariant subspace H_s formed by the

direct sum of the eigenspaces for helicity s and -s form a massless

irreducible spin s representation of the full Poincare group.
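For spin 1 (d=3) the helicity operator can be written down explicitly.
The sketch below assumes the vector representation of the spin matrices,
(S_j)_{kl} = -i eps_{jkl}, which is one possible choice of basis and not
fixed by the FAQ; in it n dot S acts on a vector v as i n x v. The
momentum value is arbitrary.

```python
import numpy as np

def helicity_matrix(p):
    """Matrix of n.S for spin 1, using (S_j)_{kl} = -i eps_{jkl}."""
    n = p / np.linalg.norm(p)
    cross = np.array([[0.0, -n[2], n[1]],
                      [n[2], 0.0, -n[0]],
                      [-n[1], n[0], 0.0]])   # cross @ v == n x v
    return 1j * cross                        # Hermitian 3x3 matrix

p = np.array([0.3, -1.2, 0.5])               # arbitrary nonzero momentum
lam, vecs = np.linalg.eigh(helicity_matrix(p))

# Eigenvalues come out in ascending order: -1, 0, +1
v_minus, v_zero, v_plus = vecs[:, 0], vecs[:, 1], vecs[:, 2]

print("helicities:", np.round(lam, 12))
print("p . v for helicity -1, +1:", abs(p @ v_minus), abs(p @ v_plus))
```

The eigenvector for helicity 0 is parallel to p, while the eigenvectors
for helicity +1 and -1 are orthogonal to p. The subspace H_s with s=1
therefore consists of the transverse amplitudes.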

(It is easy to see that changing K to K-t(p_0)p for an arbitrary

differentiable function t of p_0 preserves all commutation relations,

hence gives another representation of the Poincare algebra.

Since the massless irreducible representations of the Poincare group

are uniquely determined by their spin, the resulting representations

are equivalent. This corresponds to the freedom below in choosing a

position operator.)

Now suppose that a Poincare invariant subspace H of L^2(M_0)^d has a

position operator x satisfying the canonical commutation relations

with p and the above commutator relations with J. Then F=q-x commutes

with p, hence its components must be a (possibly matrix-valued)

function F(p) of p. Commutativity of the components of x implies that

partial_p x F = 0,

and, since M_0 is simply connected, that F is the gradient of a scalar

function f. Rotation invariance then implies that this function

depends only on p_0=|p|. Thus

F = partial_p f(p_0) = f'(p_0) n.

Thus the position operator takes the form

x = q - f'(p_0) n.

In particular,

x x p = q x p.
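The two facts used in this step can be checked numerically: the gradient
of a function of p_0=|p| points along n, and a multiple of n drops out of
the cross product with p. In the following sketch the test function f,
the finite-difference step and the sample momentum are arbitrary
assumptions.

```python
import numpy as np

def f(p0):
    """Arbitrary smooth test function of p_0 (an assumption of this sketch)."""
    return np.sin(p0) + p0**2

def f_prime(p0):
    """Derivative of the test function f."""
    return np.cos(p0) + 2 * p0

def grad_f_of_norm(p, h=1e-6):
    """Central finite-difference gradient of the map p -> f(|p|)."""
    g = np.zeros(3)
    for k in range(3):
        e = np.zeros(3)
        e[k] = h
        g[k] = (f(np.linalg.norm(p + e)) - f(np.linalg.norm(p - e))) / (2 * h)
    return g

p = np.array([0.4, -0.7, 1.1])       # arbitrary nonzero momentum
p0 = np.linalg.norm(p)
n = p / p0

F = grad_f_of_norm(p)                # numerical partial_p f(p_0)
grad_error = np.linalg.norm(F - f_prime(p0) * n)

# (f'(p_0) n) x p vanishes, hence x x p = q x p
cross_term = np.linalg.norm(np.cross(f_prime(p0) * n, p))

print(f"gradient error = {grad_error:.2e}, cross term = {cross_term:.2e}")
```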

Now the algebra of linear operators on the dense subspace of C^infty

functions in H contains the components of p, J, K and x, hence those of

J - x x p = J - q x p = S.
Thus the (p-independent) operators from the spin so(3) act on H.

But this implies that either H=0 (no helicity) or H = L^2(M_0)^d

(all helicities between s and -s).

Since the physical irreducible representations of the Poincare group

are uniquely determined by mass and spin, and for s>1/2, the spin s

Hilbert space H_s is a proper, nontrivial subspace of L^2(M_0)^d,

we proved the following theorem:


An irreducible representation of the full Poincare group with

mass m>=0 and finite spin has a position operator transforming

like a 3-vector and satisfying the canonical commutation relations

if and only if either m>0 or m=0 and s<=1/2 (but s=0 if only

the connected Poincare group is considered).

This theorem was announced without giving details in

T.D. Newton and E.P. Wigner,

Localized states for elementary systems,

Rev. Mod. Phys. 21 (1949), 400-406.

A mathematically rigorous proof was given in

A. S. Wightman,

On the Localizability of Quantum Mechanical Systems,

Rev. Mod. Phys. 34 (1962), 845-872.

See also

T.F. Jordan

Simple proof of no position operator for quanta with zero mass

and nonzero helicity

J. Math. Phys. 19 (1978), 1382-1385.

who also considers the massless representations of continuous spin, and

D Rosewarne and S Sarkar,

Rigorous theory of photon localizability,

Quantum Opt. 4 (1992), 405-413.

For spin 1, the case relevant for photons, we have d=3, and the

subspace of interest is the space H obtained by completion of the

space of all vector-valued C^infty functions A(p) of a nonzero

3-momentum p with compact support satisfying the transversality

condition p dot A(p)=0,

with inner product defined by

<A|A'> := integral dp/|p| A(p)^* A'(p).

It is not difficult to see that one can identify the wave functions

A(p) with the Fourier transform of the vector potential in the

radiation gauge where its 0-component vanishes. This relates the

present discussion to that given in the FAQ entry ''What is a photon?''.

As a consequence of our discussion, photons (m=0, s=1) and gravitons

(m=0, s=2) cannot be given natural probabilities for being in any given

bounded region of space. Chiral spin 1/2 particles also do not have

a position operator and hence have no such probabilities, by the same

argument, applied to the connected Poincare group.

(Note that only frequencies, intensities and S-matrix elements

are measured; these don't need a well-defined position concept

but only a well-defined momentum concept, from which frequencies

can be found via omega=p_0/hbar - since c=1 in the present setting,

and directions via n = p/p_0.)

However, assuming there are scalar massless Higgs particles (s=0),

one could combine such a Higgs, a photon, and a graviton into

a single reducible representation on L^2(M_0)^5, using the above

construction. By our derivation, one can find position eigenstates

which are superpositions of Higgs, photon, and graviton. Thus to

be able to regard photons and gravitons as particles with a proper

probability interpretation, one must consider Higgs, photons, and

gravitons as aspects of the same localizable particle, which we

might call a graphoton. (Without gravity, a phiggs particle would

also do.)

If the concept of an observable is not tied to that of a Hermitian

operator but rather to that of a POVM (positive operator-valued

measure), there is more flexibility, and covariant POVMs for position

measurements can be meaningfully defined, even for photons. See, e.g.,


A. Peres and D.R. Terno,

Quantum Information and Relativity Theory,

Rev. Mod. Phys. 76 (2004), 93.

[see, in particular, (52)]

K. Kraus, Position observable of the photon, in:

The Uncertainty Principle and Foundations of Quantum Mechanics,

Eds. W. C. Price and S. S. Chissick,

John Wiley & Sons, New York, pp. 293-320, 1976.

M. Toller,

Localization of events in space-time,

Phys. Rev. A 59, 960 (1999).

P. Busch, M. Grabowski, P. J. Lahti,

Operational Quantum Physics,

Springer-Verlag, Berlin Heidelberg 1995, pp.92-94.

Note that a POVM describes the statistics of a particular measurement

process rather than some underlying reality. This is reflected in the

fact that there are many possible nonequivalent definitions

of POVMs, pertaining to the possible different ways to get a measured

position.


Therefore, the concept of a photon position is necessarily subjective,

since it depends on the POVM used, hence on the way the

measurement is performed. It does not describe something objective.

The POVM does not allow one to talk about the position of a photon

- which could exist only if the corresponding operator existed -,

but only about the measured position: The photon is somewhere near

the range of values established by the measurement, without any more

definite statement being possible. On the other hand, for observables

corresponding to Hermitian operators, there are states in which

a definite statement is (at least theoretically) possible that the

observable has a value in a given range.

Papers related to position operators:

M.H.L. Pryce,

Commuting Co-ordinates in the new field theory,

Proc. Roy. Soc. London Ser. A 150 (1935), 166-172.

(first construction of position operators in the massive case)

B. Bakamjian and L.H. Thomas,

Relativistic Particle Dynamics. II,

Phys. Rev. 92 (1953), 1300-1310.

(first construction of massive representations along the above lines)


L.L. Foldy,

Synthesis of Covariant Particle Equations,

Physical Review 102 (1956), 568-581.

(nice and readable version of the Bakamjian-Thomas construction

for massive representations of the Poincare group)

R. Acharya and E. C. G. Sudarshan,

''Front'' Description in Relativistic Quantum Mechanics,

J. Math. Phys. 1 (1960), 532-536.

(a ''most local'' description of the photon by wave fronts)

I. Bialynicki-Birula,

Photon wave function,

(A 53 page recent review article, covering various possibilities

to define photon wave functions without a position operator

acting on them. The best is (3.5), with a nonstandard inner

product (5.8). What is left of the probability interpretation is

(5.28) and its subsequent discussion.)

See also the entry ''Localization and position operators'' in this FAQ.

There are a few papers by M. Hawton, e.g.


on a nonstandard position operator which does not transform like a

3-vector. This is unphysical since it does not give orientation

independent probabilities for observing a photon in a given region of

space. Claims to the contrary in


supposedly constructing a Lorentz invariant photon number density,

are erroneous; see

Other nonstandard position operators violating the conditions

necessary for a probability interpretation were discussed earlier,

starting with

M.H.L. Pryce,

The Mass-Centre in the Restricted Theory of Relativity and Its

Connexion with the Quantum Theory of Elementary Particles,

Proc. Roy. Soc. London, Ser. A, 195 (1948), 62-81.


S2h. Localization and position operators


Position operators are part of the toolkit of relativistic quantum mechanics.


In a relativistic setting, one always has a representation of the

Poincare algebra. From the generators of the Poincare algebra

(namely the 4-momentum p, the angular momentum \J, and the

boost generators \K) one can make up (in massive representations)

a nonlinear expression for a 3-dimensional \x (the position operator)

that together with the space part \p of the 4-momentum has canonical

commutation rules and hence gives a Heisenberg algebra.

(The backslash is a convenient ascii notation to indicate bold face

letters, corresponding to 3-vectors.)

The position operator so constructed is unique, once the time coordinate

is fixed, and is usually called the Newton-Wigner position operator,

although it appears already in earlier work of Pryce. Relevant

applications are related to the names Foldy and Wouthuysen

(for their transform of the Dirac equation, widely used in relativistic

quantum chemistry) and Bakamjian and Thomas (for their relativistic

multi-particle theories); both groups rediscovered the Newton-Wigner

results independently, not being aware of their work.

That the time coordinate has to be fixed means that the position

operator is observer-dependent. Each observer splits space-time

into its personal time (in direction of its total 4-momentum) and

personal 3-space (orthogonal to it), and the position operator

relates to this 3-space. By a Lorentz transformation, one can

transform the 4-momentum to the vector (E_obs 0 0 0), which makes

the 0-component. Most papers on the subject work in the latter

For massless representations of spin >1/2, the construction breaks down.

This is related to the fact that massless particles with spin >1/2

don't have modes of all helicities allowed by the spin

(e.g., photons have spin 1 but no longitudinal modes),

which makes them always spread out, and hence not completely

localizable. For details, see the FAQ entry

''Particle positions and the position operator''

Here are a few references:

J.P. Costella and B.H.J. McKellar,

The Foldy-Wouthuysen transformation,


* This paper discusses the physical relevance of the Newton-Wigner

representation, and its relation to the Foldy-Wouthuysen


T. D. Newton, E. P. Wigner,

Localized States for Elementary Systems,

Rev. Mod. Phys. 21 (1949) 400-406

* The original paper on localization

L. L. Foldy and S. A. Wouthuysen,

On the Dirac Theory of Spin 1/2 Particles and Its Non-Relativistic

Limit,

Phys. Rev. 78 (1950), 29-36.

* On the transform of the Dirac equation now carrying the authors' names


B. Bakamjian and L. H. Thomas

Relativistic Particle Dynamics. II

Phys. Rev. 92 (1953), 1300-1310.

and related papers in

Phys. Rev. 85 (1952), 868-872.

Phys. Rev. 121 (1961), 1849-1851.

* First constructive papers on relativistic multiparticle dynamics,

based on a 3D position operator

L. L. Foldy,

Synthesis of Covariant Particle Equations,

Phys. Rev. 102 (1956), 568-581

* A lucid exposition of Poincare representations which start with

a 3D position operator, and a discussion of electron localization

Before eq. (189), he notes that an observer-independent localization

of a Dirac electron (which generally is considered to be a pointlike

particle since it can be exactly localized in a given frame)

necessarily leaves a fuzziness of the order of the Compton wavelength

of the particle. (This is also related to the so-called Zitterbewegung,

see, e.g., the discussion in Chapter 7 of Paul Strange's

"Relativistic Quantum Mechanics".)

A. S. Wightman,

On the Localizability of Quantum Mechanical Systems,

Rev. Mod. Phys. 34 (1962) 845-872

* A group theoretic view in terms of systems of imprimitivity

T. O. Philips,

Lorentz invariant localized states,

Phys. Rev. 136 (1964), B893-B896.

* A covariant coherent state alternative which does not require

to single out a time coordinate

V. S. Varadarajan,

Geometry of Quantum Theory

(second edition), Springer, 1985

* A book discussing some of this stuff

L. Mandel and E. Wolf,

Optical Coherence and Quantum Optics,

Cambridge University Press, 1995.

* The bible on quantum optics, a thick but very useful book.

Relevant here since it contains a good discussion of the

localizability of photons (which can be done only approximately,

in view of the above) from a reasonably practical point of view.

G.N. Fleming,

Reeh-Schlieder meets Newton-Wigner

* This paper gives some relations to quantum field theory


S2i. Position operators in relativistic quantum field theory


In relativistic quantum field theory in its usually given form,

position is promoted to the same status as time, and hence becomes a

parameter in the quantum field, while in quantum mechanics it is an

operator vector.
This poses the question of whether there is a position operator in

relativistic quantum field theory. Many people think that there is none.

But even though there is a parameter called x and referred to as

4-dimensional position, there is also a vector defining a

3-dimensional position operator, provided the relativistic system

under consideration is not massless.

Indeed, any relativistic theory possesses the Poincare group as a

symmetry group, whose infinitesimal generators satisfy the standard

commutation rules of the Poincare algebra. But given these, the

standard construction by Newton and Wigner gives (in each Lorentz

frame) a 3-dimensional position operator with commuting components

and the associated conjugate momentum operators. (See Section S2g

''Particle positions and the position operator'' of this FAQ.)

These play exactly the same role as the position and momentum

operators in nonrelativistic quantum mechanics.


S2j. Coherent states of light as ensembles


Let us look in some detail at the setting of a weak laser switched on

at time t_0 and switched off again at time t_1. The time T:=t_1-t_0

that the laser is switched on is a variable that we can choose at will.

Conventionally one models the light produced by a laser by coherent

states. If one tests the photon contents at the end of the beam by a

photodetector, one measures a series of clicks indicating (according to

tradition) the presence of single photons. Each click is conventionally

regarded as the measurement of a single photon; hence one measures an

ensemble of photons. Without this interpretation, much of the talk

about photons in quantum optics would not make sense.

Technically, and completely precisely, one has an ensemble of photons

in an indefinite photon number state. (Even a superposition of states

describes an ensemble, in the conventional interpretation.)

In a weak coherent state, the multiparticle content is negligible;

one has essentially a superposition of the vacuum and the single

particle state. Conventionally (as for all somewhat rare events),

the vacuum part is ignored - one just restricts attention to the

times where a particle is present. This leaves a single particle state.

Thus, at least for weak coherent states, it is a good approximation

to say that a coherent state of definite frequency is an ensemble

of single-particle systems.

More formally, in the usual abbreviated form, a weak coherent state of

a stationary monochromatic beam has the form

|psi> = |0> + eps|1> + O(eps^2),   (*)

with eps<<1, and

<n> = <a^*a> = <psi|a^*a|psi> = eps^2 + O(eps^3)

is not a mean photon number, but a mean rate - the mean intensity.

More precisely, each coherent state has a mode A=A(p); the modes are

in 1-1 correspondence with creation operators a^*(A). They create,

in field theory language, one photon in this mode. So far, these

photons are only constructs on paper, used to be able to write down

multiparticle states, and have not yet an observable meaning.

An N-particle state of mode A is defined recursively from the vacuum

state by

|1,A> := a^*(A)|vac>, |N,A> := a^*(A)|N-1,A> for N>0,

and coherent states with mode A have the form

|z,A>> := const* sum_N z^N/sqrt{N!} |N,A>

with a complex amplitude z, and satisfy

a(A)|z,A>> = z|z,A>>.

The mean photon number associated with the coherent state is

Nbar := <N> = <a^*(A)a(A)> = <<z,A|a^*(A)a(A)|z,A>>

= <<z,A|z^*z|z,A>> = z^*z <<z,A|z,A>> = z^*z,


Nbar = |z|^2,

independent of the time T.
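
As an illustration (a minimal sketch, not part of the original argument), the photon number statistics of a coherent state can be tabulated numerically. The sketch assumes normalized number states, so that const = exp(-|z|^2/2) and the weight of |N,A> is exp(-|z|^2)|z|^(2N)/N!; the amplitude z = 0.1 is a hypothetical value chosen to represent a weak beam.

```python
import math

def photon_number_probs(z, nmax=20):
    """Weights exp(-|z|^2) |z|^(2N) / N! of the N-photon components, N = 0..nmax.

    Assumes normalized number states and a normalized coherent state.
    """
    nbar = abs(z) ** 2
    return [math.exp(-nbar) * nbar ** n / math.factorial(n) for n in range(nmax + 1)]

z = 0.1 + 0.0j                         # hypothetical weak amplitude, |z| << 1
probs = photon_number_probs(z)
mean_n = sum(n * p for n, p in enumerate(probs))   # Nbar, truncated at nmax
p_multi = 1.0 - probs[0] - probs[1]    # total weight of all N >= 2 components

print(f"Nbar      = {mean_n:.6f}  (|z|^2 = {abs(z)**2:.6f})")
print(f"P(N = 1)  = {probs[1]:.6f}")
print(f"P(N >= 2) = {p_multi:.2e}")
```

For |z| << 1 the weight of all components with N>1 is of order |z|^4/2, which is the sense in which the multiparticle content of a weak coherent state is negligible.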

The events are the clicks, and there is exactly one click per event

in a weak signal (for strong signals, one cannot separate the events).

But the events happen randomly in time, with a rate proportional to the intensity.


It is conventional to regard each click as evidence for the presence

of a single photon - this more or less defines the experimental notion

of a photon. (See also the discussion in the section

''What is a photon?'' of this FAQ.)

Note that two photons arriving at different times cannot be considered

as being part of an N-particle state with N>1, since states are

considered at a fixed time! Also, the fact that the weak coherent

state has a negligible contribution of doubly excited states

means that N-particle states with N>1 are here completely irrelevant.

Thus one has an ensemble of single photons.

Clearly, the number of observable photons (in the sense

of detector clicks) is proportional to T. This shows that the

formal photon number operator in Fock space, N = a^*(A)a(A), has

nothing to do with the photon number as defined by the number of clicks;


instead its expectation is proportional to the mean rate of clicks

per unit time.

Thus (*) describes an ensemble of O(T*eps^2) single photons, where

T is the duration of the experiment.
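
The proportionality to T can be illustrated by a small simulation. This is only a sketch under stated assumptions: the clicks are modelled as a Poisson process with constant rate kappa*eps^2, where the detector constant kappa and the value of eps are hypothetical and not taken from the text.

```python
import random

def click_times(rate, T, rng):
    """Click times of a Poisson process with constant rate on the interval [0, T)."""
    times = []
    t = rng.expovariate(rate)          # waiting time until the first click
    while t < T:
        times.append(t)
        t += rng.expovariate(rate)     # independent exponential waiting times
    return times

rng = random.Random(0)                 # fixed seed, for reproducibility only
eps = 0.1                              # hypothetical weak amplitude
kappa = 1000.0                         # hypothetical detector constant
rate = kappa * eps ** 2                # mean click rate, proportional to eps^2

counts = {T: len(click_times(rate, T, rng)) for T in (100.0, 200.0, 400.0)}
for T, n in counts.items():
    print(f"T = {T:6.1f}: {n} clicks, clicks/T = {n / T:.2f}")
```

The number of clicks grows linearly with T, while the ratio clicks/T fluctuates around the fixed rate; the state itself, and hence <a^*a>, does not depend on T.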

In particular, plane monochromatic light in the form of a coherent state


(three mathematical idealizations involved here) is an endless stream

of infinitely many photons passing with the speed of light through a

particular position on the beam. The rate of emission of photons is

proportional to the intensity of the incident beam. But the fact that

the model is an approximation only and that for real preparations,

observations are bounded in space and time does not change the essence

of this analysis.

On the other hand, it is clear that a coherent state is not a 1-photon

state but a state with an indefinite number of photons (i.e., not an

eigenstate of the number operator). Thus there seems to be a conflict

in terminology - weak laser light is described by a coherent state

without definite number contents, but it behaves experimentally as

an ensemble of single photons.

This shows that the concept of a photon is somewhat ambiguous.

Different people mean different and often quite vague things by

''photon'', if they bother to spell out the meaning in some detail

(which is usually not done). This can be seen from the diverging

explanations given in a recent special issue on this topic:

The Nature of Light: What Is a Photon?

Optics and Photonics News, October 2003

which presents five mutually incompatible views,

* Light reconsidered (Arthur Zajonc)

* What is a photon? (Rodney Loudon)

* What is a photon? (David Finkelstein)

* The concept of the photon - revisited (Ashok Muthukrishnan,

Marlan O. Scully, and M. Suhail Zubairy)

* A photon viewed from Wigner phase space (Holger Mack and

Wolfgang P. Schleich)
In QED, a ''one-photon state'' is a well-defined object, but ''one

photon'' in an experiment is not (unless one identifies it with a

detector click - which leaves unsaid what an undetected photon would

be). The relation between the two is quite indirect, and there is no

agreement in the literature on the precise relation.

My own views (not mainstream, but consistent with experiment) are:

1. that clicks have nothing at all to do with photons, they are just

a stochastic measure of intensity, and arise also if the incident

field is modelled completely classically;

2. that what is typically called a photon is not an arbitrary single

particle state of the electromagnetic field (in particular, never

an approximately plane wave) but a state of the electromagnetic field

that at each time is localized in space, whose energy contents is

that of hbar*omega. Otherwise, the idea of producing photons on

demand makes no sense.

3. It is the field of the incident beam that counts; the talk about

photons in the incoming beam is not very meaningful and only blurs

the picture; the right language is that of field theory.

Indeed, a theoretical model of a photo-detector excited by an external

classical monochromatic e/m field contains no photons, but in this

model the detector responds by clicking randomly according to

Poisson statistics; see Chapter 9 of the book

L. Mandel and E. Wolf,

Optical Coherence and Quantum Optics,

Cambridge University Press, 1995.

Thus a precise meaning of ''photon'' is not needed to defend

statement 1.

No matter which view one takes with regard to statement 1., the

question is how one relates a 1-photon state to what one actually

prepares in a beam of light. What does it mean in experimental terms

to have prepared _one_ photon in this state?

Reading the details of preparation schemes for photons on demand

as discussed (with references to the original literature) in

one finds that no clear answer can be given to this question, but

that the evidence points to statement 2. of my view presented above.

In this view, the difference between the preparation of a coherent

state and that of a single photon is that a weak coherent state

generates an infinitely long random sequence of Poisson-distributed

clicks, while a single photon (in the above sense of a space-localized

field) generates (in an ideal detector) a single click only.

The practice seems to be that one silently ignores the vacuum

contribution in (*) and obtains after rescaling to a normalized state

a state

psi' = |1> + O(eps) (*')

which, with perfect right, can be considered to be an approximate

1-photon state. Indeed, most photon states produced in the laboratory

are superpositions with the vacuum, and still people speak of photons.

This also holds for other systems than simple laser light. For example,

entanglement studies are typically made with squeezed states,

which differ from coherent states only in that they have instead

of (*) a representation

psi = (1-eps)|0> + eps|2> + O(eps^2), (**)

and everyone refers to (**) as an ensemble of 2-photon states.

Indeed, parametric down conversion is well-known to produce an

ensemble of 2-photon states, but if one looks closer at the models

one finds that they actually produce states of the form (**)

that produce endless streams of photon pairs.

While photons on demand are based on exciting single atoms,

the only way of reliably creating single photons was for a long time

to use a source in a state of the form (**), where the photon pairs

are entangled pairs of photons with different momentum vectors

(located on different beams). Then one observes photons (clicks) on the

left beam with a detector, and knows from general principles that at

the same time a photon is underway in the other beam. Thus one can

know about the presence of single photons without having them observed.


This interpretation again explains away the vacuum part of the state

in (**). One restricts attention to the 2-photon sector of (**) by

ignoring the times where nothing but the vacuum part is observed, and

focuses on the times when something - and then by the form of (**)

the 2-photon part - is observed. This is the sense in which one

interprets (**) as an ensemble of 2-photon states.

Then one observes the part of the 2-photon system in one beam, to

know when a photon is present in the other beam. But of course, although

this is the way the situation is talked about, in reality one still has

the superposition with the vacuum, except that one chooses to ignore

the times where nothing happens to get rid of the vacuum.


S3a. What are 'bare' and 'dressed' particles?


A bare electron is the formal entity discussed in textbooks

when they do perturbative quantum electrodynamics. The intuitive

picture generally given is that a bare electron is surrounded

by a cloud of virtual photons and virtual electron-positron pairs

to make up a physical, 'dressed' electron. Only the latter is real

and observable. The former is a formal caricature of the latter,

with paradoxical properties (infinite mass, etc.).

On a more substantial level, the observable electrons are produced

from the bare electrons by a process called renormalization,

which modifies the propagators by self-energy terms

and the currents by form factors. As the name says, the latter define

the 'form' of a particle. (In the above picture, it would correspond

to the shape of the virtual cloud, though it is better to avoid

giving the virtual particles too much meaning.)

The dressed object is the renormalized, physical object,

described perturbatively as the bare object 'clothed' by the

cloud of virtual particles. The dressed interaction is the 'screened'

physical interaction between these dressed objects.

To draw an analogy in nonrelativistic quantum mechanics,

think of nuclei as bare atoms, electrons as virtual particles,

atoms as dressed nuclei and the residual interaction between atoms,

computed in the Born-Oppenheimer approximation, as the dressed

interaction. Thus, for Argon atoms, the dressed interaction is

something close to a Lennard-Jones potential, while the bare

interaction is Coulomb repulsion. This is the situation physicists

had in mind when they invented the notions of bare and dressed particles.


Of course, it is only an analogy, and should not be taken very

seriously. It just explains the intuition about the terminology used.

(For the serious version of renormalization, see Chapter 8.)

The electrons in QM are real, physical electrons that can be isolated.

The reason is that they are good eigenstates of the Hamiltonian.

On the other hand, virtual particles don't have this nice attribute
since the relativistic Hamiltonian H from field theory contains

creation and annihilation operators which mess things up.

The bare particles correspond to 1-particle states in the Hilbert

space (though that is not quite true since there is no good Hilbert

space picture in conventional interacting QFT). Multiplying them

with H introduces terms with other particle numbers, hence a bare

particle can never be an eigenstate of H, and thus never be

observable in the way a nonrelativistic particle is.

The eigenstates of the relativistic Hamiltonian are, instead,

complicated multibody states consisting

of a superposition of states with any number of particles and

antiparticles, just subject to the restriction that the total quantum

numbers come out right. These are the dressed particles.
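
The mechanism can be mimicked by a toy model. The following is only an illustrative sketch under strong assumptions, not a field-theoretic computation: the state space is truncated to two hypothetical basis states (a 'bare' 1-particle state and a state with one additional quantum), and the numbers e_bare, e_pair and the coupling g are made up. A nonzero coupling g between sectors of different particle number is what prevents the bare state from being an eigenstate.

```python
import math

def dressed_ground_state(e_bare, e_pair, g):
    """Lowest eigenpair of the symmetric 2x2 matrix [[e_bare, g], [g, e_pair]].

    Returns (energy, (c_bare, c_pair)) with a normalized eigenvector.
    Requires g != 0 (for g == 0 the bare state is already an eigenstate).
    """
    mean = 0.5 * (e_bare + e_pair)
    half_gap = 0.5 * (e_pair - e_bare)
    energy = mean - math.hypot(half_gap, g)
    # First row of (H - energy) v = 0:  (e_bare - energy) c_bare + g c_pair = 0
    c_bare, c_pair = g, energy - e_bare
    norm = math.hypot(c_bare, c_pair)
    return energy, (c_bare / norm, c_pair / norm)

# Hypothetical toy numbers (arbitrary units)
energy, (c_bare, c_pair) = dressed_ground_state(e_bare=1.0, e_pair=3.0, g=0.5)
print(f"lowest eigenvalue  = {energy:.4f}  (bare value 1.0)")
print(f"bare component     = {c_bare:.4f}")
print(f"admixed component  = {c_pair:.4f}")
```

In this caricature the lowest eigenvalue is shifted away from the bare value and the eigenvector has a nonzero admixture of the second basis state; the real situation involves superpositions of states with arbitrarily many particles.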

For the computational side of dressing, see, e.g., nucl-th/0102037,



S3b. How meaningful are single Feynman diagrams?

The standard model is a theory defined in terms of a Lagrangian.

To get computable output, Feynman graph techniques are used.

But individual Feynman graphs are meaningless (often infinite);

only the sum of all terms of a given order can be given - after

a process called renormalization - a well-defined (finite) meaning.

This is well-known; so no-one treats the Feynman graphs as real.

What is taken as real is the final outcome of the calculations,

which can be compared with measurements.


S3c. How real are 'virtual particles'?


Virtual particles are used in perturbation theory with

Feynman diagrams. (See the FAQ entry ''Why Feynman diagrams''

for an explanation of their meaning. They do _not_ describe

processes in space and time, but certain multiple integrals...)

Feynman diagrams change their nature depending on the way

one does perturbation theory and what is resummed.

In their treatise on QED, Landau and Lifshitz discuss virtual particles

in Section 79. They start at the outset with the remark that things

depend on which kind of perturbation theory is used, and contrast

'virtual' explicitly with 'real'. Virtual particles are called that

in contrast to 'real particles' which are observable and hence real.

Unlike the latter, virtual particles occurring in computations _must_

have disappeared from the formulas by the time the calculations lead

to something that can be compared with experiment.

Hence their 'reality', if there is any, is like the reality of

characters in a dream. For example, just as we can fly in a dream,

virtual particles can be faster than light (since they may have

imaginary mass)...

The following is a more detailed discussion of the question how

meaningful it is to ascribe some sort of reality to virtual particles.

All language is only an approximation to reality, which simply is.

But to do science we need to classify the aspects of reality

that appear to have more permanence, and consider them as real.

Nevertheless, all concepts, including 'real', have a fuzziness

about them, unless they are phrased in terms of rigorous mathematical

models (in which case they don't apply to reality itself but only to
a model of reality).

In the informal way I use the notion, 'real' in theoretical physics

means a concept or object that

- is independent of the computational scheme used to

extract information from a theory,

- has a reasonably well-defined and consistent formal basis,

- does not give rise to misleading intuition.

This does not give a clear definition of real, of course.

But it makes for example charge distributions, inputs and outputs of

(theoretical models of) scattering experiments, and quarks something

real, while making bare particles and virtual particles artifacts of

perturbation theory.

Quarks must be considered real because one cannot dispense with them


in any coherent explanation of high energy physics.

Virtual particles must not be considered real since they arise only in

a particular approach to high energy physics - perturbation theory

before renormalization - that does not even survive the modifications

needed to remove the infinities. Moreover, the virtual particle content

of a real state depends so much on the details of the computational

scheme (canonical or light front quantization, standard or

renormalization group enhanced perturbation theory, etc.) that

calling virtual particles real would produce a very weird picture of reality.


Whenever we observe a system we make a number of idealizations

that serve to identify the objects in reality with the

mathematical concepts we are using to describe them. Then we compute

something, and at the end we retranslate it into reality. If our initial

idealization was good enough and our theory is good enough, the final

result will match reality well. Because of this idealization,

'real' real particles (moving in the universe) are slightly different

from 'mathematical' real particles (figuring in equations).

Modern quantum electrodynamics and other field theories are based on


the theory developed for modeling scattering events.

Scattering events take a very short time compared to the

lifetime of the objects involved before and after the event. Therefore,

we represent a prepared beam of particles hitting a target as a single

particle hitting another single particle, and whenever this in fact

happens, we observe end products, e.g. in a wire chamber.

Strictly speaking (i.e., in a fuller model of reality), we'd have to

use a multiparticle (statistical mechanics) setting, but this is never

done since it does not give better information and the added

complications are formidable.

As long as we prepare the particles long (compared to the scattering

time) before they scatter and observe them long enough afterwards,

they behave essentially as in and out states, respectively.

(They are not quite free, because of the electromagnetic self-field

they generate; this gives rise to the infrared problem in quantum

electrodynamics and can be corrected by using coherent states.)

The preparation and detection of the particles is outside this model,

since it would produce only minute corrections to the scattering event.

But treating it would require enlarging the system to include source

and detector, which makes the problem completely different.

Therefore at the level appropriate to a scattering event, the 'real'

real particles are modeled by 'mathematical' in/out states, which

therefore are also called 'real'. On the other hand, 'mathematical'

virtual particles have nothing to do with observations, hence have no

counterpart in reality; therefore they are called 'virtual'.

The figurative virtual objects in QFT are there only because of the

well-known limitations of the foundations of QFT. In a nonperturbative

setting they wouldn't occur at all. This can be seen by comparing with
QM. One could also do nonrelativistic QM with virtual objects but

no one does so (except sometimes in motivations for QFT),

because it does not add value to a well-understood theory.

Virtual particles are an artifact of perturbation theory that

gives an intuitive (but if taken too far, misleading) interpretation

for Feynman diagrams. More precisely, a virtual photon, say,

is an internal photon line in one of the Feynman diagrams. But there

is nothing real associated with it. Detectable photons are always

real, 'dressed' photons.

Virtual particles, and the Feynman diagrams they appear in,

are just a visual tool for keeping track of the different terms

in a formal expansion of scattering amplitudes into multi-dimensional

integrals involving multiple propagators - the momenta of the virtual

particles represent the integration variables.

They have no meaning at all outside these integrals.

They get out of mathematical existence once one changes the

formula for computing a scattering amplitude.

Therefore virtual particles are essentially analogous to virtual

integers k obtained by computing

log(1-x) = - sum_k x^k/k

by expansion into a Taylor series. Since we can compute the

logarithm in many other ways, it is ridiculous to attach to

k any intrinsic meaning. But ...

... in QFT, we have no good ways to compute scattering amplitudes

without at least some form of expansion (unless we only use the

lowest order of some approximation method), which makes

virtual particles look a little more real. But the analogy

to the Taylor series shows that it's best not to look at them

that way. (For a very informal view of quantum electrodynamics in

terms of clouds of virtual particles see


and the later mails in this thread.)
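
The Taylor series analogy can be made concrete with a short sketch. It uses the standard expansion log(1-x) = - sum_{k>=1} x^k/k for |x|<1 and compares it with the library routine math.log1p; the function names and the value x = 0.3 are chosen here for illustration only.

```python
import math

def log1m_taylor(x, terms):
    """Partial sum of log(1-x) = -sum_{k>=1} x^k / k, valid for |x| < 1."""
    return -sum(x ** k / k for k in range(1, terms + 1))

def log1m_direct(x):
    """The same quantity from a library routine; no summation index k appears."""
    return math.log1p(-x)

x = 0.3
for terms in (2, 5, 20):
    print(f"{terms:2d} terms: {log1m_taylor(x, terms):.12f}   direct: {log1m_direct(x):.12f}")
```

Both routes give the same number, but the index k exists only in the first one; it is bookkeeping for a particular expansion, just as the momenta of virtual particles are bookkeeping for a particular expansion of a scattering amplitude.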

A sign of the irreality of virtual particles is the fact that

when one does partial resummations of diagrams (which is essential for


renormalization), many of the virtual particles disappear.

A fully nonperturbative theory would sum everything, and no virtual

particles would be present anymore. Thus virtual particles are

entirely a consequence of looking at QFT in a perturbative way

rather than nonperturbatively.

In the standard covariant Feynman approach, energy (cp_0) and

momentum (\p; the backslash indicates 'boldface') are conserved,

and virtual particles are typically off-shell (i.e., they

do not satisfy the equation p^2 = p_0^2 - \p^2 = m^2 for physical

particles). To see this, try to model a vertex in which an electron

(mass m_e) absorbs a photon (mass 0). One cannot keep the incoming

electron and photon and the outgoing electron on-shell (satisfying

p^2 = m^2) without violating the energy-momentum balance.
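
The kinematic obstruction at such a vertex can be checked numerically. In the following sketch (units with c = 1, energies in MeV) the 3-momenta are hypothetical values; the point is that if an on-shell electron absorbs an on-shell photon and energy-momentum is conserved, the resulting momentum p+k satisfies (p+k)^2 = m_e^2 + 2 p.k with p.k > 0, so it cannot lie on the electron mass shell.

```python
import math

def minkowski_sq(p):
    """p^2 = p_0^2 - p_1^2 - p_2^2 - p_3^2 for a 4-vector p (units with c = 1)."""
    return p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2

def on_shell(three_momentum, mass):
    """Positive-energy on-shell 4-momentum for the given 3-momentum and mass."""
    energy = math.sqrt(mass ** 2 + sum(c * c for c in three_momentum))
    return (energy, *three_momentum)

m_e = 0.511                                     # electron mass in MeV
electron_in = on_shell((0.3, -0.2, 0.1), m_e)   # hypothetical incoming electron
photon_in = on_shell((0.0, 0.4, 0.25), 0.0)     # hypothetical incoming photon

# Energy-momentum conservation at the vertex fixes the outgoing 4-momentum.
electron_out = tuple(a + b for a, b in zip(electron_in, photon_in))

excess = minkowski_sq(electron_out) - m_e ** 2  # equals 2 p.k, strictly positive
print(f"(p+k)^2 - m_e^2 = {excess:.6f} MeV^2")
```

Whatever 3-momenta are inserted, the excess stays positive, since p.k = |k| (E - |p| cos(theta)) > 0 for a massive electron; hence at least one of the three lines at the vertex must be off-shell.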

However, many physicists work in light front quantization.

There one keeps all particles on-shell, and instead has energy and

momentum nonconservation (removed formally by adding an



The effect of this is that the virtual particle structure of the

theory is changed completely: For example, the physical vacuum and

the bare vacuum now agree, while in the standard approach,

the vacuum looks like a highly complicated

medium made up from infinitely many bare particles....

But bare particles must still be dressed to become physical,

though less heavily than in the traditional Feynman approach.

Another group of physicists calculate consequences of the standard

model using quantization on a lattice.

Here virtual particles are completely absent.

Clearly concepts such as virtual particles that depend so much

on the method of quantization cannot be regarded as being real.

Of course, physicists would not talk of virtual particles if the concept

had no relevance at all. One can argue with virtual particles to get an

intuitive idea of 'dressing', and to gain in this way some

understanding of phenomena such as the Casimir effect, Rabi

oscillations, the Lamb shift, anomalous magnetic moments, etc.

From a nonperturbative point of view, these effects all show up as

a consequence of renormalized, effective interactions between

physical (dressed, on-shell) particles.

See also earlier discussions on s.p.r. such as




and followups; maybe


is also of interest.

[For a longwinded alternative view of virtual particles

that I do _not_ share but rather find misleading, see


S3d. What is the meaning of 'on-shell' and 'off-shell'?


This applies only to relativistic particles.

A particle of mass m is on-shell if its momentum p satisfies

p^2 (= p_0^2-p_1^2-p_2^2-p_3^2) = m^2,

and off-shell otherwise. The 'mass shell' is the manifold of

momenta p with p^2=m^2.

Observable (i.e., physical) particles are asymptotic states

(scattering states) described (modulo unresolved mathematical

difficulties) by free fields based on the dispersion relation p^2=m^2,

and hence are necessarily on-shell. Off-shell particles only

arise in intermediate perturbative calculations; they are necessarily virtual.


The situation is muddled by the fact that one has to distinguish

(formal) bare mass and (physical) dressed mass; the above is valid

only for the dressed mass. Moreover, the mass shell loses its meaning

in external fields, where, instead, a so-called 'gap equation' appears.



S3e. Virtual particles and Coulomb interaction


Virtual objects have strange properties. For example,

the Coulomb interaction between two electrons is mediated by

virtual photons faster than the speed of light, with imaginary masses.

(This is often made palatable by invoking a time-energy uncertainty

relation, which would allow particles to go off-shell.

But there is no time operator in QFT, so the analogy to Heisenberg's

uncertainty relation for position and momentum is highly dubious.)

Strictly speaking,

the Coulomb interaction is simply the Fourier transform of the

photon propagator 1/q^2, followed by a nonrelativistic approximation.

It has nothing at all to do with virtual particle exchanges ---

except if one does perturbation theory. But then there is no surprise

that it must influence already the tree level. By a hand waving

argument (equate the Born approximations) this gives the

nonrelativistic correspondence.
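
For reference, the Fourier transform mentioned above can be worked out explicitly. The following is the standard computation in the static limit, where q^2 reduces to minus the squared 3-momentum; overall signs and coupling constants, which depend on conventions, are omitted, and r > 0 is assumed.

```latex
\int \frac{d^3q}{(2\pi)^3}\,\frac{e^{i\mathbf{q}\cdot\mathbf{r}}}{\mathbf{q}^2}
  = \frac{1}{(2\pi)^3}\int_0^\infty dq \; 2\pi \int_{-1}^{1} d(\cos\theta)\, e^{iqr\cos\theta}
  = \frac{1}{2\pi^2 r}\int_0^\infty \frac{\sin(qr)}{q}\,dq
  = \frac{1}{2\pi^2 r}\cdot\frac{\pi}{2}
  = \frac{1}{4\pi r}.
```

The factor q^2 from the volume element cancels the propagator, and the remaining Dirichlet integral gives pi/2, which yields the 1/r law of the Coulomb potential.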

But to get the Coulomb interaction as part of the Schroedinger equation,


one needs to sum all ladder diagrams with 0,1,2,3,...,n,... exchanged

photons arranged in form of a ladder. Then one needs to approximate

the resulting Bethe-Salpeter equation. These are nonperturbative

techniques. (The computations are still done at few loops only,

which means that questions of convergence never enter.)

Virtual photons mediating the Coulomb repulsion between electrons

have spacelike momenta and hence would proceed faster than light

if there were any reality to them. But there cannot be; one would need

infinitely many of them, and infinitely many virtual electron-positron

pairs (and then superpositions of any numbers of these) to match


a real, dressed object or interaction.
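
That the exchanged momentum is spacelike can be seen in a small numerical sketch. It assumes elastic scattering in the center-of-mass frame, where each electron keeps the magnitude of its 3-momentum and only changes direction; the momenta below are hypothetical values in MeV.

```python
import math

def energy(three_momentum, mass):
    """On-shell energy sqrt(m^2 + |p|^2) in units with c = 1."""
    return math.sqrt(mass ** 2 + sum(c * c for c in three_momentum))

m_e = 0.511                  # electron mass in MeV
p = (0.2, 0.0, 0.0)          # hypothetical 3-momentum before scattering
p_prime = (0.0, 0.2, 0.0)    # same magnitude after scattering, deflected by 90 degrees

# Momentum transfer q = p - p' carried by the exchanged (internal) photon line
q0 = energy(p, m_e) - energy(p_prime, m_e)
q_vec = tuple(a - b for a, b in zip(p, p_prime))
q_squared = q0 ** 2 - sum(c * c for c in q_vec)

print(f"q_0 = {q0:.6f} MeV, q^2 = {q_squared:.6f} MeV^2")
```

Here q_0 vanishes while the spatial part does not, so q^2 < 0, in contrast to the on-shell condition q^2 = 0 for a physical photon.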


S3f. Are virtual particles and decaying particles the same?


Decaying particles and resonances are used synonymously in the

literature; they are complementary views of the same unstable state.

A very sharp resonance has a long lifetime relative to a scattering

event, hence behaves like a particle in scattering. It is regarded

as a real object if it lives long enough that its trace in a wire

chamber is detectable, or if its decay products are detectable at

places significantly different from the place where it was created.

On the other hand, a very broad resonance has a very short lifetime

and cannot be differentiated well from the scattering event producing

it; so the idealization defining the scattering event is no longer

valid, and one would not regard the resonance as a particle.

Of course, there is an intermediate grey regime where different people

apply different judgment. This can be seen, e.g., in discussions

concerning the tables of the Particle Data Group.

The only difference between a short-living particle and a stable

particle is the fact that the stable particle has a real rest mass,

while the mass m of the resonance has a small imaginary part.
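
The role of the imaginary part can be illustrated as follows. The sketch assumes the common convention in which the complex mass is written as M - i*Gamma/2 (in energy units), so that the time dependence exp(-i(M - i*Gamma/2)t/hbar) gives a survival probability exp(-Gamma*t/hbar) and a lifetime hbar/Gamma; the two widths used are hypothetical examples, not data for specific particles.

```python
import math

HBAR_MEV_S = 6.582119569e-22      # reduced Planck constant in MeV*s

def complex_mass(mass_mev, width_mev):
    """Complex mass M - i*Gamma/2 (convention assumed in this sketch)."""
    return complex(mass_mev, -0.5 * width_mev)

def lifetime_seconds(width_mev):
    """Mean lifetime tau = hbar / Gamma."""
    return HBAR_MEV_S / width_mev

def survival_probability(t_seconds, width_mev):
    """|exp(-i (M - i Gamma/2) t / hbar)|^2 = exp(-Gamma t / hbar)."""
    return math.exp(-width_mev * t_seconds / HBAR_MEV_S)

for label, width in (("sharp resonance", 1.0e-3), ("broad resonance", 150.0)):
    m = complex_mass(1000.0, width)           # hypothetical mass of 1000 MeV
    tau = lifetime_seconds(width)
    print(f"{label}: m = {m} MeV, lifetime = {tau:.3e} s")
```

A small imaginary part corresponds to a long lifetime and a sharp resonance; a large one corresponds to a lifetime too short to separate the state from the scattering event that produced it.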

Note that states with complex masses can be handled well in a rigged
Hilbert space (= Gelfand triple) formulation of quantum mechanics.

Resonances appear as so-called Siegert (or Gamow) states.

A good reference on resonances (not well covered in textbooks) is

V.I. Kukulin et al.,

Theory of Resonances,

Kluwer, Dordrecht 1989.

For rigged Hilbert spaces (treated in Appendix A of Kukulin), see also

quant-ph/9805063 and for its functional analysis ramifications,

K. Maurin,

General Eigenfunction Expansions and Unitary Representations of

Topological Groups,

PWN Polish Sci. Publ., Warsaw 1968.

But a very short-living particle is not the same as a virtual

particle. Often it is a complicated, nearly bound state of other

particles. On the other hand, virtual particles are essentially always

elementary. (There are exceptions when deriving Bethe-Salpeter equations


and the like for the approximate calculations of bound states and

resonances, where one creates an effective theory in which the latter

are treated as elementary.)

Even an unstable elementary particle can be distinguished from

a virtual particle. In perturbation theory, unstable elementary

particles are modelled exactly like stable particles,

namely as external lines in a Feynman diagram.

Virtual particles in Feynman diagrams are exactly those parts

of the diagram which are not given by external lines.

In particular, what is real and what is virtual is not affected

by a diagram rotation - this only affects what is input

and what is output.

The difference can also be seen in the mathematical representation.

In an effective theory where the resonance (e.g., the neutron or a

meson) is regarded as an elementary object, the resonance appears

in in/out states as a real particle, with complex on shell momentum

satisfying p^2=m^2, but in internal Feynman diagrams as a virtual

particle with real mass, almost always off-shell, i.e., violating

this equation.

There are also some unstable elementary particles like the weak

gauge bosons. Usually, one observes a 4-fermion interaction and the

gauge bosons are virtual. But at high energy = very short scales,

one can in principle observe the gauge bosons and make them real.

This means that they now appear as external lines in the corresponding

perturbative calculations, which displays their nonvirtual nature.

In any case, from a mathematical point of view, one must choose the

framework. Either one works in a Hilbert space, then masses are real

and there are no unstable particles (since these 'are' poles on the

so-called 'unphysical' sheet); in this case, there are no asymptotic

gauge bosons and all are therefore virtual.

Or one works in a rigged Hilbert space and deforms the inner product;

this makes part of the 'unphysical' sheet visible; then the gauge

bosons have complex masses and there exist unstable particles

corresponding to in/out gauge bosons which are real.

The modeling framework therefore decides which language is appropriate.



S4a. How do atoms and molecules look like?

Today, images of single atoms and molecules can be routinely produced.

M. Herz, F.J. Giessibl and J. Mannhart

Probing the shape of atoms in real space

Phys. Rev. B 68, 045301 (2003)

write in the introduction:

''quantum mechanics specifies the probability of finding an electron

at position x relative to the nucleus. This probability is

determined by |psi(x)|^2, where psi(x) is the wave function of the

electron given by Schroedinger's equation. The product of -e and

|psi(x)|^2 is usually interpreted as charge density, because the

electrons in an atom move so fast that the forces they exert on

other charges are essentially equal to the forces caused by a

static charge distribution -e|psi(x)|^2.''

One of the authors, Jochen Mannhart, is one of the 10 winners of the

Leibniz prize 2008,


among others for the achievement that, for the first time, he made

pictures of atoms with subatomic resolution possible.

The Leibniz prize is the highest German academic prize, endowed with

a research grant of up to 2.5 Million Euro for each winner,

awarded each year to a few excellent younger scientists from all fields.

The orbitals one can look at in physics and chemistry books

are the pictures of the squared absolute values of basis functions

used for representing single electron wave functions.

The actual shape of the wave function of each electron is some linear

combination of such basis functions. These are calculated (in the

simplest realistic approximation) by Hartree-Fock calculations.

The atom shape is the shape of all electrons together, forming

in the Hartree-Fock approximation a Slater determinant formed from


single-particle wave functions, and in general a linear combination

of such Slater determinants. These live in a multidimensional space

with 3n dimensions for an atom with n electrons.

The shape one can measure is actually a 3-dimensional charge density

rho(x) (x in R^3) formed by integrating the square of the absolute

value of the 3n-dimensional wave function psi over 3n-3 dimensions.

More precisely, it is defined (nonrelativistically) such that

(apart from a constant factor and the charge contribution of the nucleus)


integral dx rho(x) f(x) = psi^* O_1(f) psi (1)

for all nice 3-dimensional functions f(x) of the space coordinate

vector x, where
O_1(f) = integral dx f(x) a^*(x) a(x)

is the 1-particle operator corresponding to f. Here a^* and a denote

creation and annihilation operators. Since rho(x) decays quickly as x

differs more and more from the atom center, the atom looks like a

charge cloud with slightly fuzzy boundary.
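
The reduction from the many-particle wave function to a density can be illustrated in a toy setting. The sketch below is not an atomic calculation: it assumes a one-dimensional model with n=2 particles (so the integration is over n-1 = 1 coordinate instead of 3n-3), a single Slater determinant built from the two lowest harmonic oscillator eigenfunctions, and it omits the constant charge factor. For orthonormal orbitals the density should come out as the sum of the squared orbitals.

```python
import math

def phi0(x):
    """Lowest harmonic oscillator eigenfunction (units with hbar = m = omega = 1)."""
    return math.pi ** -0.25 * math.exp(-0.5 * x * x)

def phi1(x):
    """First excited harmonic oscillator eigenfunction."""
    return math.sqrt(2.0) * math.pi ** -0.25 * x * math.exp(-0.5 * x * x)

def psi(x1, x2):
    """Slater determinant of phi0 and phi1: antisymmetric 2-particle wave function."""
    return (phi0(x1) * phi1(x2) - phi1(x1) * phi0(x2)) / math.sqrt(2.0)

h = 0.05                                        # grid spacing
grid = [-8.0 + h * i for i in range(321)]       # grid on [-8, 8]

def density(x, n=2):
    """rho(x) = n * integral dx2 |psi(x, x2)|^2, approximated by a Riemann sum."""
    return n * h * sum(psi(x, x2) ** 2 for x2 in grid)

rho = [density(x) for x in grid]
total = h * sum(rho)                            # should equal the particle number n = 2
max_dev = max(abs(r - (phi0(x) ** 2 + phi1(x) ** 2)) for r, x in zip(rho, grid))

print(f"integral of rho = {total:.8f}")
print(f"max |rho - (phi0^2 + phi1^2)| = {max_dev:.2e}")
print(f"rho(0) = {rho[160]:.4f}, rho(4) = {rho[240]:.2e}")
```

The density integrates to the number of particles and falls off rapidly away from the center, which is the one-dimensional counterpart of the fuzzy charge cloud described above.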

For isolated atoms in the absence of external fields,

rho is typically spherically symmetric, giving symmetric shapes.

(In case of particles of nonzero spin, this assumes

that we are in a thermal setting where the spin directions average out.

In this case, we have instead of (1) the formula

integral dx rho(x) f(x) = tr O_1(f) rhohat,

where rhohat is the density matrix of the mixed state.)

For molecules, rho is in fact also a function of the coordinates of

all nuclei involved, and there is no longer any reason to have more

symmetry than the symmetry of the configuration of nuclei,

which is very little and often none.

The shape of molecules is therefore mainly determined by the

positions of the nuclei. In equilibrium, these arrange

themselves such that the potential energy, i.e., the smallest

eigenvalue of the Hamiltonian operator for the electrons, is minimal

among all possible positions (or at least a local minimum from which a

deeper lying state is very difficult to reach). The charge density

of molecules can be identified by means of X-ray crystallography or

nuclear magnetic resonance (NMR) spectroscopy; however, for large


molecules, doing this reliably from the available indirect information

is a highly nontrivial art.

A few years ago,

I wrote a survey of molecular modeling of proteins, the largest

molecules in nature (apart from crystals, which are essentially

molecules of macroscopic size):

A. Neumaier,

Molecular modeling of proteins and mathematical prediction of

protein structure,

SIAM Review 39 (1997), 407-460.

Viewing atoms or molecules with a scanning tunneling microscope (STM)


or an atomic force microscope (AFM)
amounts to scanning the response of the 3-dimensional charge density

to (or, more precisely, the current or force induced by it on)

the scanning device, from which a computer generates a picture.

Thus rho(x) is actually observable, with a resolution of currently

up to 0.6 Angstrom = 0.6 10^{-10}m.


For a discussion of the charge density of molecules and the resulting

operative interpretation of atoms in molecules see, e.g., the

encyclopedic article

R.F.W. Bader

Atoms in Molecules


or Bader's web site


On the other hand, whether atomic or molecular substructures such as

orbitals are observable is controversial. See, e.g.,

J.M. Zuo et al.,

Direct Observation of d-orbital holes and Cu-Cu bonding in Cu_2O,

Nature 401 (1999), 49-52.




for discussions in 1999-2001, a discussion presenting a positive

majority vote among 22 textbooks:

and from 2007:


Also, see the nice pictures in

M. Herz, F.J. Giessibl and J. Mannhart

Probing the shape of atoms in real space

Phys. Rev. B 68, 045301 (2003)


Apparently, it is a matter of terminology. Those who use the term

orbital to refer to a charge distribution corresponding to a particular

electronic state (and the ball-, dumbbell-, or ring-shaped pictures of

orbitals in textbooks show just that) find orbitals observable, while

purists restricting the usage of orbitals to denoting particular

single-electron wave functions find them unobservable.

Note that Scerri, who in


defends the unobservability of orbitals, writes explicitly:

''What can be observed, and frequently is observed in experiments, is

electron density. In fact, the observation of electron density is a

major field of research in which several monographs and review

articles have been written.''

and then cites two books and a review article. A more recent review

article of some aspects is

J.M. Zuo

Measurements of electron densities in solids: a real-space view of

electronic structure and bonding in inorganic crystals

Rep. Progr. Phys. 67 (2004), 2053-2103.


S4b. Why are observable densities state-dependent?


In the preceding, the mass and charge density of an n-particle system

(or of a single particle) depends on its quantum state. This is

sometimes regarded as a reason for denying the 'reality' of the

mass and charge density. However, such a reasoning is misguided.

Indeed, the phenomenon is already present in classical mechanics.

That mass and charge density depends on the state is no more

surprising than that the trajectory of a classical particle depends

on its classical state (its position and momentum), or that the

density of a cloud in the sky depends on its classical state

(the position and momentum of all its particles, or, in the customary

fluid mechanics approximation, its mass density field and its velocity field).


Of course it has to, to match a particular real life situation.

What seems strange at first sight is that the above applies already to

a single, indivisible particle. But this is really strange only if one

assumes that the particle is pointlike - which we know is the case only

for unphysical, bare particles, but not for the physical, renormalized

ones. (See the entry ''Are electrons pointlike/structureless?''

elsewhere in this FAQ.) Once one realizes that physical particles are
extended (although they are indivisible), there is enough room to

accommodate the internal structure described by densities.

Thus the only quantum paradox that remains is that particles with

nontrivial internal structure (and shape) can nevertheless be

indivisible, a fact coming from the representation theory of the

fundamental symmetry group of our universe: Indivisibility of an

object just means that this object is described by an irreducible

representation which cannot be decomposed further without violating

a fundamental symmetry.


S4c. Are electrons pointlike/structureless?


Both electrons and neutrinos are considered to be pointlike

as bare particles, because of the way they appear in the standard model.


But physical, relativistic particles are not pointlike.

A pointlike electron would be described exactly by the 1-particle

Dirac equation, which has a degenerate spectrum. But the real electron
is described by a modified Dirac equation, resulting in an anomalous

magnetic moment and a nonzero Lamb shift resolving the degeneracy of


the spectrum. Both are measurable to high accuracy.

The relations between form factors for spin 1/2 particles and

terms in a modified Dirac equation describing the covariant dynamics

of a particle deviating from a point particle are given in

L. L. Foldy

The Electromagnetic Properties of Dirac Particles

Phys. Rev. 87 (1952), 688 - 693.

An intuitive argument for the lack of pointlikeness is the fact that

their localization

to a region significantly smaller than the Compton wavelength

would need energies larger than that needed to create

particle-antiparticle pairs, which changes the nature of the system.

(See also this FAQ about localization, and Foldy's papers quoted there.)

On a more formal, quantitative level, the physical, dressed particles

have nontrivial form factors, due to the renormalization necessary to

give finite results in QFT. The form factor measures the deviation

from the behavior of an ideal point particle, i.e., a particle obeying

exactly the Dirac equation. The form factor can be measured
indirectly, through the anomalous magnetic moment and the Lamb shift.

(A point particle has no anomalous magnetic moment and no Lamb shift,


since it satisfies the Dirac equation exactly.)

Nontrivial form factors give rise to a positive charge radius.

In his book

S. Weinberg,

The quantum theory of fields, Vol. I,

Cambridge University Press, 1995,

Weinberg defines and explicitly computes in (11.3.33) a formula for the

'charge radius' of a physical electron. But his formula is not

fully satisfying since it is not fully renormalized (infrared

divergence: the expression contains a fictitious photon mass,

and diverges if this goes to zero).

For electron form factors in light atoms, see

hep-ph/0002158 = Physics Reports 342, 63-126 (2001):

Equation (28) uses a binding energy dependent cutoff,

which makes the electron charge radius depend on its surrounding.

Of course, other particles also have form factors and associated

charge radii. For proton and neutron form factors, see hep-ph/0204239
and hep-ph/030305. Neutrons have a negative mean squared charge radius.

This looks strange but is not since the measure for the mean is

not positive; but it means that a classical interpretation of the

charge radius of neutrons is dubious. In the introduction of

S. Kopecky et al

Phys. Rev. C 56, 2229-2237 (1997)

one can read:

''The charge radius of the neutron <r_n^2> or the mean squared charge

radius is described by the volume integral over the neutron

integral rho(r)r^2dr, where r is the distance to the center of

the neutron and rho(r) is the charge density.

Positive as well as negative values of rho(r) will occur coming

from the distributions of valence quarks and the negative p-meson

cloud outside.

Since rho(r) is negative for larger r values, caused by the meson

cloud, the r^2 dependence of the integral will lead to a negative

value of <r_n^2>.''
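
How a signed density produces a negative mean squared radius can be
seen in a toy model. The density below is purely an assumption for
illustration (a narrow positive Gaussian core minus a wider negative
Gaussian cloud of equal total weight); it is not a model of the real
neutron, and the widths are arbitrary. Since the total charge
vanishes, the second moment is not normalized by the charge.

```python
import numpy as np

def gaussian3d(r, width):
    """Normalized spherically symmetric Gaussian density in 3 dimensions."""
    return np.exp(-r**2 / (2 * width**2)) / (2 * np.pi * width**2) ** 1.5

r = np.linspace(0.0, 20.0, 20001)
dr = r[1] - r[0]
core, cloud = 0.5, 1.0          # hypothetical widths, arbitrary units

# Positive at small r, negative at large r, zero total charge.
rho = gaussian3d(r, core) - gaussian3d(r, cloud)

shell = 4 * np.pi * r**2        # volume element of a spherical shell
total_charge = np.sum(shell * rho) * dr
mean_r2 = np.sum(shell * r**2 * rho) * dr

print("total charge:", total_charge)
print("integral of rho r^2:", mean_r2)
```

Each Gaussian contributes 3*width^2 to the second moment, so the
result is 3*(core^2 - cloud^2), which is negative whenever the
negative cloud extends further out than the positive core.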

The paper

L.L. Foldy,

Neutron-electron interaction,

Rev. Mod. Phys. 30, 471-481 (1958).

discusses the extendedness of the electron in a phenomenological way.

On the numerical side, I only found values for the charge radius

of the neutrinos, computed from the standard model to 1 loop order.

The values are about 4-6 10^-14 cm for the three neutrino species.

See (7.12) in Phys. Rev. D 62, 113012 (2000)

gives in an abstract of a 1982 thesis of Anzhi Lai

an electron charge radius of ~ 10^{-16} cm

(But I haven't seen the thesis.)

The "form" of an elementary particle (considered as a free particle

at rest) is described by its form factor,

which is a well-defined physical function

(though at present computable only in perturbation theory)

describing how the (spin 0, 1/2, or 1) particle's response to an

external classical electromagnetic field deviates from the

Klein-Gordon, Dirac, or Maxwell equations, respectively.

The form factor contains the complete state-independent information

about a free particle, since it determines the (single-particle)

Hamilton operator of the free particle and everything else can be

computed from it.

In Foldy's paper, the form factors are encoded in the infinite sum

in (16). The sum is usually considered in the momentum domain;

then one simply gets two k-dependent form factors, where k is


the 4-momentum transferred in the interaction. These form factors

can be calculated in a good approximation perturbatively from QFT,

see for example Peskin and Schroeder's book.

An extensive discussion of form factors of Dirac particles

and their relation to the radial density function is in

D. R. Yennie, M. M. Levy and D. G. Ravenhall,

Electromagnetic Structure of Nucleons,

Rev. Mod. Phys. 29, 144-157 (1957).


R. G. Sachs

High-Energy Behavior of Nucleon Electromagnetic Form Factors

Phys. Rev. 126, 2256-2260 (1962)

Yennie et al. write:

''Information about the internal structure of the individual nucleons

is contained in the results of a variety of experiments performed in

recent years. [...] The Lamb shift and the hyperfine splitting also

give such information, [...] The charge-current density of the nucleon

(proton or neutron) includes all of the effects of the internal

structure. [...] The nucleon charge-current density must have the form

<formula involving two form factors F_1 and F_2>

The functions F_1 and F_2 are relativistic generalizations of the form

factors characteristic of finite extension occurring in other

experiments, [...]''

However, the form factor contains nothing at all about

interaction- or state-dependent information since the

interaction-dependent information is coded in an external potential

or a multiparticle formulation, and the state-dependent information

is coded in the wave function or density matrix, which (at any given

time) is independent of the Hamiltonian.

Also, the information contained in the form factor is only about

the free particle in the rest system, defined by a pure state

in which momentum and orbital angular momentum vanish identically.

In an external potential, or in a state where momentum (or orbital

angular momentum) doesn't vanish, the charge density (and the

resulting charge radius) can differ arbitrarily much from the

charge density (and charge radius) at rest.

For example, for a hydrogen electron in the ground state,

the charge density is significant in a region of diameter about

10^-8 cm (a small multiple of the Bohr radius), while the

charge radius at rest is probably (in view of the above partial

results) << 10^-12 cm.
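
The size of the state-dependent charge cloud can be made concrete
for the hydrogen ground state. The sketch below assumes the
nonrelativistic 1s density with a pointlike nucleus, for which the
fraction of charge inside a sphere of radius R (in units of the Bohr
radius a) is 1 - exp(-2R/a) (1 + 2R/a + 2(R/a)^2). The function names
and the choice of the 90% and 99% levels are illustrative assumptions.

```python
import math

def enclosed_fraction(radius):
    """Fraction of the 1s charge inside a sphere; radius in Bohr radii."""
    t = 2.0 * radius
    return 1.0 - math.exp(-t) * (1.0 + t + 0.5 * t * t)

def radius_enclosing(fraction, lo=0.0, hi=50.0, tol=1e-10):
    """Bisection for the radius (in Bohr radii) enclosing a given fraction."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if enclosed_fraction(mid) < fraction:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

r90 = radius_enclosing(0.90)
r99 = radius_enclosing(0.99)
print(f"90% of the charge lies within {r90:.2f} Bohr radii")
print(f"99% of the charge lies within {r99:.2f} Bohr radii")
```

The region holding nearly all of the charge is thus a few Bohr radii
across, a scale set entirely by the state and not by the form factor.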

In all cases, the charge distribution is defined as the

expectation of the charge density operator of the corresponding

quantum field. For molecules, this charge distribution is the

computational target of much of quantum chemistry, and defines the

shape of a molecule. The shape of a particle determined by the form

factor therefore corresponds to the equilibrium shape a molecule

takes in its rest frame in the absence of forces, i.e., in its

ground state, while the state-dependent shape corresponds to the

much less predictable shape of a molecule interacting with its environment.



S4d. How much information is in a particle?


Knowing a particular electron intimately is infinitely precious.

A pure state of an electron is defined by its wave function

(up to a phase). Thus knowing all about an electron requires in the

traditional interpretation to know all about this wave function -

an infinite amount of information.

The information humans are interested in is however always finite,

since they can hardly remember even 20 decimal digits seen only once.

And the amount of information humans are capable of retrieving

by experiment is still limited, since each experiment has only a finite


Thus they simplify things to the point that all they want to know about

an electron is its mass, charge and its state to a small number

of decimal places.

This is only a few bits. But if you want to tell someone else exactly

where the electron is that you are referring to, you have an

infinitely more difficult task. Of course, any human 'else' will not

be patient enough to hear the whole (infinite) story but will be

satisfied with a crude position and momentum

estimate consistent with the uncertainty relation. But this is not the

best possible statement about the electron, which would be telling

its complete wave function. You can do it only if you force the

electron into a prison where it has to behave in a dull (and hence

completely describable) way, being

restricted in its freedom to at most a few bits of change.

This is indeed done when studying qubits for quantum information

For an N-state system, one needs N^2-1 independent pieces of

information to reconstruct (by quantum tomography) the density matrix


of a finite mixed quantum system, and a fortiori the wave function of

a finite pure quantum system. Most natural systems, unlike those

systems carefully prepared by modern technology, have infinitely many


states, and therefore need an infinite amount of information for their

reconstruction to full accuracy.
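
For the smallest case N=2 (a qubit) the count N^2-1 = 3 can be
checked directly: the three expectation values of the Pauli matrices
determine the density matrix. The sketch below is an idealization,
assuming exact expectation values; in a real tomography experiment
they are estimated from many measurements on identically prepared
systems. The function names and the sample state are hypothetical.

```python
import numpy as np

PAULI = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def bloch_vector(rho):
    """The 3 = N^2 - 1 real numbers tr(rho sigma_k) for N = 2."""
    return np.array([np.trace(rho @ s).real for s in PAULI])

def reconstruct(bloch):
    """Rebuild rho = (1 + sum_k b_k sigma_k) / 2 from the three numbers."""
    rho = np.eye(2, dtype=complex)
    for b, s in zip(bloch, PAULI):
        rho = rho + b * s
    return rho / 2

# A sample mixed state: Hermitian, trace one, positive definite.
rho_true = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
b = bloch_vector(rho_true)
rho_back = reconstruct(b)

print("Bloch vector:", b)
print("reconstruction exact:", np.allclose(rho_back, rho_true))
```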


S4e. Entropy and missing information


[This continues the preceding entry.]

How is this notion of information related to information in terms of entropy?


Informally, entropy is often equated with information, but this is not

correct - entropy is _missing_ information!

More precisely, in the statistical interpretation, the state belongs

not to a single particle but to an ensemble of particles.

Entropy measures the amount of information missing for a complete

probabilistic description of a system.

Entropy is the mean number of binary questions that must be asked in

an optimal decision strategy to determine the state of a particular

realization given the state of the ensemble to which it belongs.

See Appendix A of my paper

A. Neumaier,

On the foundations of thermodynamics,
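
The characterization of entropy as a mean number of binary questions
can be checked on a small classical example. The sketch below uses
Huffman's construction as the optimal questioning strategy and
measures entropy in bits (logarithm to base 2, without the factor
kbar); these choices, the function names and the sample distribution
are assumptions for illustration. The agreement is exact only when
all probabilities are powers of 1/2; in general the Huffman average
lies between the entropy and the entropy plus one bit.

```python
import heapq
import math

def mean_questions(probs):
    """Average number of yes/no questions in a Huffman strategy."""
    # Heap entries: (probability of subtree, tie-breaker, accumulated cost).
    heap = [(p, i, 0.0) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        # Merging two subtrees adds one question for every case below them.
        heapq.heappush(heap, (p1 + p2, counter, c1 + c2 + p1 + p2))
        counter += 1
    return heap[0][2]

def entropy_bits(probs):
    """Shannon entropy -sum p log2 p of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]
print("mean number of questions:", mean_questions(probs))
print("entropy in bits:        ", entropy_bits(probs))
```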



The formula for the entropy S found in every

statistical mechanics textbook is, for a system in a mixed state

described by the density matrix rho,

S = <-kbar log rho> where <f> = Tr rho f

and kbar is Boltzmann's constant. (I use the bar to be free to use k

as an index.) In any representation where rho is diagonal,

rho = sum_k p_k |k><k|,

this gives

S = - kbar sum_k p_k log p_k;

also, since <1>=1 and rho is positive semidefinite,

sum_k p_k = 1 , all p_k >= 0.

Thus p_k can be consistently interpreted as the probability of the

system to occupy state |k>. This probability interpretation

depends on the orthonormal basis used to represent rho; which basis

to use is a famous and not really solved problem in the foundations of

quantum mechanics.

For a pure state psi, rho has rank 1, and the sum extends only over

the single index k with |k> = psi. Thus in this case, p_k = 1 and

S = - kbar 1 log 1 = 0, as it should be for a state of maximal

information. The amount of missing information is zero.
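
A small numerical check of these statements, using the sign
convention in which the entropy is nonnegative, S = - kbar Tr rho
log rho. The sketch assumes a two-state system and units with
kbar = 1; the function name and the sample states are illustrative.

```python
import numpy as np

def von_neumann_entropy(rho, kbar=1.0):
    """S = -kbar tr(rho log rho), computed from the eigenvalues p_k."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]            # the limit p log p -> 0 is used for p = 0
    return float(-kbar * np.sum(p * np.log(p)))

psi = np.array([1.0, 1.0j]) / np.sqrt(2)
rho_pure = np.outer(psi, psi.conj())    # rank 1 projector on psi
rho_mixed = np.eye(2) / 2               # maximally mixed state

s_pure = von_neumann_entropy(rho_pure)
s_mixed = von_neumann_entropy(rho_mixed)
print("pure state: ", s_pure)
print("mixed state:", s_mixed, "(log 2 =", np.log(2), ")")
```

The pure state gives zero up to rounding, while the maximally mixed
state of a two-state system gives log 2, the largest possible value.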

For more along these lines, and in particular for a way to avoid

the probabilistic issues indicated above, see Sections 6 and 12

and Appendix A of my paper

A. Neumaier,

On the foundations of thermodynamics,



But how does the infinite amount of information in a pure state (wave
function) square with the finiteness of entropy?

Specifying a mixed state _exactly_ provides already an infinite amount

of information, since the density matrix rho must be specified to

infinite precision.

Defining the eigenstates that are of interest in measurement

amounts to specifying a Hamiltonian operator H _exactly_, which again

provides already an infinite amount of information, since the

coefficients of H in an explicit description must be specified to

infinite precision.

Then only a finite amount of information is missing to determine

in which of the eigenstates a particular particle is.

Of course in practice one just _postulates_ rho and H based on a

finite number of measurements, and _pretends_ (i.e., proceeds as if)

they are known exactly, while knowing well that one knows them only

approximately.


In practice, a number of approximations are made. Frequently,

one postulates exact equilibrium, hence a grand canonical ensemble,

which of course is not exactly valid. Deviations from equilibrium are

handled by means of a hydrodynamical approximation, in which entropy

is no longer a number but a field - and specifying the entropy density

again requires an infinite amount of information. Of course, one

also represents this only to some limited accuracy, to keep things tractable.


Thus finiteness of the entropy in a particular model is enforced by

making simplifying assumptions which are valid only if one doesn't

look too closely.

Indeed, as the Gibbs paradox (discussed, e.g., as Example 9.1 in

my above thermodynamics paper) shows, the amount of entropy depends


on the level of modeling.

An analogy contributed by Gerard Westendorp:

To describe a classical, slightly biased die exactly by a

probability distribution also requires an infinite amount of

information, namely the specification of 5 infinite decimal expansions

of the probabilities p_k for getting k eyes. (The sixth is then

determined since probabilities sum up to 1.) This is much more than

the finite amount of information in saying which particular value k

was obtained in a specific throw. On the other hand, _given_ the

distribution, the entropy S = - sum p_k log p_k is finite.

In general, describing the probabilistic state of an ensemble exactly

requires much more information than the exact description of a

particular realization.


S4f. How real is the wave function?


In thought experiments one often assigns a state to a single particle.

How defendable is this, and what is the meaning of the state?

In a statistical interpretation - see the section on measurements -,

this would make no sense, since there the state is a property

of the ensemble of particles generated by a given source. But then

it is difficult to visualize what happens in each single case.

Thus many people prefer the 'realistic' language of particles having

definite states. So let us discuss some of its implications.

Suppose that the particle is in the pure state represented by the wave

function psi. It is possible to give the wave function, or rather its

absolute value squared, a geometric interpretation:

m(x) = m |psi(x)|^2

is the mass density and

e(x) = e |psi(x)|^2

the charge density.

Thus while the wave function itself has no tangible interpretation,

certain fields computable from it have.

This extends - but not quite in the obvious way - to multiparticle systems.


For a system of several, say n particles, the wave function is

3n-dimensional psi(x_1,...,x_n), each x_i being an ordinary

3-dimensional position vector, but the correct densities are

still 3-dimensional, obtained by integration:

m(x) = sum_a m_a integral dx_1...dx_n delta(x-x_a)|psi(x_1:n)|^2,

e(x) = sum_a e_a integral dx_1...dx_n delta(x-x_a)|psi(x_1:n)|^2.

This reduces for n=1 to the above, and is consistent with the

definition of mass and charge density in quantum field theory as

m(x) = <Psi_0(x)^* m Psi_0(x)>,

e(x) = <Psi_0(x)^* e Psi_0(x)>,

where Psi_0(x) is the time component of the relevant matter field.

These formulas are the common starting point for the derivation from
first principles of the semiconductor equations in solid state physics.

It is also what chemists draw as molecular shapes, using a cutoff where

m(x) and e(x) are negligible to delineate the boundary. Indeed,

chemists use such an interpretation all the time when visualizing

molecules in terms of orbitals, and with great success. The charge

distribution of the electron cloud of a molecule is one of the

important outputs of quantum chemistry packages such as

GAUSSIAN (commercial)

MOLPRO (commercial)

GAMESS (free after registration)

In the ground state (but also in definite excited states),

the mass or charge distribution is spread out over an infinite region,

although it becomes negligibly small outside a tiny core region

(or, sometimes, such as in Stern-Gerlach experiments, where the

wave function is multimodal, outside a few disconnected core regions).

The infinite extension invites an apparent paradox in that upon

collapse (e.g., due to hitting a detector screen), the particle

contracts from its infinite extension to a single spot. This seems

to violate the central tenet of relativity that information cannot

flow faster than the speed of light.

However, special relativity only restricts the observational

consequences of theory. Since most of the wave function of an

individual particle is unobservable, there is no contradiction.

(It is like the nonlocality in tests of Bell's inequalities.

Nonlocality is unavoidable in QM, but the observable consequences

respect the bound relativity puts on the speed of information flow.)

For example, on a TV set, one observes just 3 position degrees of

freedom of each electron reaching the screen, while - in contrast

to the case of a classical particle - the wave function

characterizing a pure state of the electron sits

in a space of functions of 3 variables, which has infinitely many

degrees of freedom. Thus one observes only a tiny little bit about

the electron's state. It is like knowing the velocity of the wind

(a 3-dimensional vector field) in the earth's atmosphere

at a single point (giving a velocity vector with 3 coordinates)!

This unobservability of most of the state causes a problem for

those who require that everything a theory is talking about is

observable. But this requirement is not satisfied anyway in current

microphysics - no one ever observed a quark, but it is generally

believed that they make up most of the matter in our universe.

Thus, while it is reasonable to require that theory has observable

consequences in agreement with Nature, it is not reasonable to

require that everything the theory talks about is observable.

Then the unobservability of most of the state of a single particle

is harmless.

On the other hand, one can probe the state of particles in detail

if one has a large ensemble of identically prepared particles

(to make sure that they have the same state). These are usually produced

by a carefully calibrated source, such as a laser. Then one can

subject them to different kinds of measurements from which one can

reconstruct a reasonable approximation of the state by quantum

tomography. In theory, one can make the approximation arbitrarily accurate.


Similarly a particle bound to a surface in a stationary state will

be measurable repeatedly if after the measurement the particle returns


to its state (which is natural if the bound system is in equilibrium).

Therefore one can measure equilibrium properties quite accurately.

In this sense one can say that the state of a single particle is

indeed real, and objective.

Note that single particles can nowadays be routinely prepared and

studied; see, e.g.,

D. Leibfried, R. Blatt, C. Monroe, and D. Wineland,

Quantum dynamics of single trapped ions,

Reviews of Modern Physics 75 (2003), 281-324.

S.M. Reimann and M. Manninen,

Electronic structure of quantum dots,

Reviews of Modern Physics 74 (2002), 1283-1342.


S4g. How real are Feynman's paths?


In Feynman's version of quantum mechanics, amplitudes are calculated

as a sum over all possible classical paths a particle (or a system)

can take in a classical phase space.

The paths in the Feynman picture of QM should not be regarded as real.

All possible paths are about as real as all possible books that can

be written, or - closer to physics - all possible items in a

statistical ensemble modeling a classical ideal gas. Of course only one

state is realized, not all conceivable ones; all others are just there

to compare to and compute probabilities.

In QM things are slightly more complicated, however, since the 'true'

path is smeared by the uncertainty principle. (Even in the many-worlds

interpretation, quantum objects have no sharp paths, while the paths

integrated over in a path integral must be perfectly accurate.)

The paths are just calculational devices that cease to exist once a

different approach to computations is taken. This is why I don't

ascribe any reality to them. The real objects remain present in

_any_ sensible description; the unreal ones don't.


S4h. Can particles go backward in time?

In the old relativistic QM (e.g., in Volume 1 of Bjorken and Drell)

antiparticles are viewed as particles traveling backward in time.

This is based on a consideration of the solutions of the Dirac equation

and the idea of a filled sea of negative-energy solutions in which

antiparticles appear as holes (though this picture only works for

fermions since it requires an exclusion principle). One can go some way

with this view, but more sophisticated stuff requires the QFT picture

(as in Volume 2 of Bjorken and Drell and most modern treatments).

In relativistic QFT, all particles (and antiparticles) travel forward

in time, corresponding to timelike or lightlike momenta.

(Only 'virtual' particles may have unrestricted momenta; but these are

unobservable artifacts of perturbation theory.)

The need for antiparticles is in QFT instead revealed by the fact that

they are necessary to construct operators with causal commutation

relations, in connection with the spin-statistics theorem. See, e.g.,

Volume 1 of Weinberg's quantum field theory book.

Thus talking about particles traveling backward in time, the Dirac sea,

and holes as positrons is outdated; today it does more harm

than good.


S4i. What about particles faster than light (tachyons)?


Tachyons are hypothetical particles with speed exceeding the speed of

light. Special relativity demands that such particles have imaginary

rest mass (negative m^2), and hence can never be brought to rest

(or below the speed of light); unlike ordinary particles, they speed

up as they lose energy.
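
That tachyons speed up as they lose energy follows from the usual
relativistic formulas continued to imaginary rest mass m = i mu and
v > c, which give E = mu c^2 / sqrt(v^2/c^2 - 1) and
p = mu v / sqrt(v^2/c^2 - 1). The sketch below tabulates them in
units with mu = c = 1; the function names and the sample speeds are
illustrative assumptions.

```python
import math

def tachyon_energy(beta, mu=1.0):
    """Energy in units of mu*c^2 for speed beta = v/c > 1 (m = i*mu)."""
    if beta <= 1.0:
        raise ValueError("a tachyon requires v > c")
    return mu / math.sqrt(beta**2 - 1.0)

def tachyon_momentum(beta, mu=1.0):
    """Momentum times c, in the same units."""
    if beta <= 1.0:
        raise ValueError("a tachyon requires v > c")
    return mu * beta / math.sqrt(beta**2 - 1.0)

for beta in (1.01, 1.5, 3.0, 10.0):
    e = tachyon_energy(beta)
    pc = tachyon_momentum(beta)
    # E^2 - (pc)^2 equals m^2 c^4, which is negative for a tachyon.
    print(f"v/c = {beta:5.2f}  E = {e:7.4f}  E^2 - (pc)^2 = {e*e - pc*pc:+.4f}")
```

The energy grows without bound as v approaches c from above and
tends to zero as v becomes large, while E^2 - (pc)^2 stays at the
negative value -mu^2 c^4.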

Charged tachyons would produce Cerenkov radiation in vacuum, which has


never been observed. However, Cerenkov radiation is indeed observed

when fast particles enter a dense medium in which the speed of light

is smaller than the particle's speed. This is not a problem since

relativity only demands that no particle with real mass is faster

than the speed of light in vacuum.

(Unfortunately, this no longer allows one to discriminate between

massless particles having the vacuum speed of light, and tachyons.)

Neutrinos are uncharged and have a squared mass of zero or very close

to zero, and hence could possibly be tachyons.

Recently observed neutrino oscillations confirmed a small

squared mass difference between at least two species of neutrinos.

This does not yet settle the sign of m^2 for any species.

Direct measurements of m^2 have experimental errors still compatible

with m^2=0. For data see http://

The initial interest in tachyons stopped around 1980, when it was

clear that the QFT of tachyons would be very different from standard

QFT, and that experiment didn't demand their existence. The tables


of the Particle Data Group, which contain the biennially revised

consensus of the particle physics community, do not even include the

search for tachyons in their reviews of hypothetical particles:


In fact, the theory of symmetry breaking demands that tachyons do

_not_ exist: When a relativistic field theory is deformed in a way

that the square of the mass (pole of the S-matrix) of some physical

particle would cross zero, the old physical vacuum becomes unstable and

induces a phase transition to a new physical vacuum in which all

particles have real nonnegative mass. This would happen already at

tiny negative m^2,

and is believed to be the cause of inflation in the early universe.

(Of course, the exact mechanism is not known since it would require a

nonperturbative definition of QFT. But classical and semiclassical

computations strongly suggest the correctness of this picture.)

Expanding a theory (such as the standard model) around an unstable state

(e.g., the Higgs with a local maximum at vanishing vacuum expectation

value)


formally produces a bare tachyon. This does not contradict the above

assertion, but only indicates the instability of the bare vacuum.

Asymptotic power series expansions around maxima

(especially those with tiny or vanishing convergence radius)

make meaningless assertions about the behavior of a function near any


of its minima. Since physical particles arise from field excitations

near the global minimum of the effective energy, perturbations around

the maximum are unphysical.

An expansion around an unstable state gives no significant information

unless one has a system that actually _is_ close to such an unstable state

(as perhaps the very early universe). But in that case there are no

relevant excitations (tachyons), since the whole process of motion

(inflation) towards a more stable state proceeds so rapidly that

excitations do not form and everything can be analyzed semiclassically.

The physical Higgs field is far away from the unstable maximum, and

particle excitations have a positive real mass, hence are not tachyons.

Below are some references about tachyons.

The more important papers are marked by an asterisk.

* G. Feinberg,

Possibility of Faster-Than-Light Particles,

Phys. Rev. 159, 1089 (1967).

J. Dhar and E. C. G. Sudarshan,

Quantum Field Theory of Interacting Tachyons,

Phys. Rev. 174, 1808-1815 (1968)

M. Glueck,

Note on Causal Tachyon Fields,

Phys. Rev. 183, 1514 (1969).

D. G. Boulware,

Unitarity and Interacting Tachyons,

Phys. Rev. D 1, 2426 (1970).

* B. Schroer,

Quantization of m^2<0 Field Equations,

Phys. Rev. D 3, 1764 (1971).

G. Feinberg

Lorentz invariance of tachyon theories

Phys. Rev. D 17, 1651 (1978)

C. Schwartz

Some improvements in the theory of faster-than-light particles

Phys. Rev. D 25, 356 (1982)

M. B. Davis, M. N. Kreisler, and T. Alvaeger

Search for Faster-Than-Light Particles

Phys. Rev. 183, 1132 (1969)

* L. W. Jones

A review of quark search experiments

Rev. Mod. Phys. 49, 717 (1977)

[Section IIIG reviews the vain search for tachyons.]

The Wikipedia entry for tachyons,
gives some more explanations.

although mainly speculating about connections between tachyons and

inflation, has some links with further useful information.


S4j. Do free particles exist?


Free particles are a convenient mathematical abstraction.

In Nature, there are - strictly speaking - no free particles,

only interacting ones. This holds both for photons and for other

more tangible particles like electrons. However, in sufficiently

localized (and nearly empty) regions of space, particles can be

approximately free. Again, this holds for both photons and other particles.


It is very convenient to approximate such states by free states.

For example, this allows one to explain much of quantum mechanics

in terms of particle scattering. The S-matrix interpretation

depends crucially on the fact that the ingoing and outgoing

asymptotic states of photons, electrons, quarks, etc. are free.

Thus, in this sense, free photons exist just as much (or just as

little) as free electrons.


S5a. QM pictures and representations


QM exists in different pictures, of which the Schroedinger picture,

the Heisenberg picture, the interaction picture, and Feynman's

path integral representation are frequently invoked. There is also

the algebraic approach using unitary representations of canonical

commutation rules (CCR).

The Schroedinger picture, the Heisenberg picture, and the interaction

picture are equivalent because there are unitary transformations

between them. They all provide different representations of the

same canonical commutation rules

i[p_j,q_k]= hbar delta_jk

between components p_j of momentum p and q_k of position q.

The Stone-von Neumann theorem guarantees that the canonical

commutation relations (or their unitary version, the Weyl relations)

have a unique unitary representation apart from unitary

transformations, and hence suffice to specify the QM of finitely many

degrees of freedom uniquely, no matter which picture is used.
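
That the canonical commutation rules need an infinite-dimensional
representation space can be seen from the trace: for finite matrices
the trace of a commutator vanishes, while the trace of hbar times the
identity does not. The following is a minimal sketch, assuming the
usual harmonic oscillator matrices cut off at size n (the function name
and the cutoff are illustrative choices, not taken from the references);
it shows that i[p,q] = hbar holds except in the last diagonal entry.

```python
import numpy as np

def truncated_ccr_defect(n, hbar=1.0):
    """Return the matrix i[p, q] for n x n truncated oscillator matrices."""
    # Truncated annihilation matrix: a|m> = sqrt(m)|m-1>, m = 0..n-1.
    a = np.diag(np.sqrt(np.arange(1, n)), k=1)
    q = np.sqrt(hbar / 2) * (a + a.T)
    p = -1j * np.sqrt(hbar / 2) * (a - a.T)
    return 1j * (p @ q - q @ p)

defect = truncated_ccr_defect(5)
diagonal = np.real(np.diag(defect))
print(diagonal)           # hbar on the diagonal, except for the last entry
print(np.trace(defect))   # vanishes, as for every commutator of finite matrices
```

The bad last entry moves to infinity as n grows, which is why the CCR
can hold exactly only in infinite dimensions.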

The Stone-von Neumann theorem fails for systems of infinitely many

degrees of freedom (see the FAQ entry on 'Inequivalent

representations of CCR/CAR'), which in a sense 'causes' the

difficulties in quantum field theory.

Nevertheless, QFT still has a Schroedinger picture

and a Heisenberg picture, and these are still equivalent:

The Heisenberg picture can be immediately constructed from the


fields. Then the canonical procedure - fixing the Heisenberg operators

at time t=0 and instead defining dynamical states

psi(t) := exp(-itH)psi

- produces the Schroedinger picture from it.
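
For finitely many degrees of freedom, the passage between the two
pictures can be checked numerically. The following is a minimal sketch,
assuming a two-level system with hbar = 1; the matrices H and A, the
initial state and the time t are arbitrary illustrative choices.

```python
import numpy as np

def evolution_operator(H, t):
    """Return U(t) = exp(-i t H) for a Hermitian matrix H (hbar = 1)."""
    energies, vectors = np.linalg.eigh(H)
    return vectors @ np.diag(np.exp(-1j * t * energies)) @ vectors.conj().T

# Hypothetical Hamiltonian, observable and initial state.
H = np.array([[1.0, 0.5], [0.5, -1.0]])
A = np.array([[0.0, 1.0], [1.0, 0.0]])
psi0 = np.array([1.0, 0.0], dtype=complex)
t = 0.7

U = evolution_operator(H, t)

# Schroedinger picture: the state moves, psi(t) = exp(-itH) psi.
psi_t = U @ psi0
schroedinger_value = np.vdot(psi_t, A @ psi_t).real

# Heisenberg picture: the operator moves, A(t) = exp(itH) A exp(-itH).
A_t = U.conj().T @ A @ U
heisenberg_value = np.vdot(psi0, A_t @ psi0).real

print(schroedinger_value, heisenberg_value)
```

Both numbers agree up to rounding, since they are the same expression
bracketed in two different ways.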

The Feynman path integral is related to the other pictures via the

Feynman-Kac formula, which makes the often only formally stated

equivalence precise, after analytically continuing the time to purely

imaginary times. The Osterwalder-Schrader theory

[see, e.g., math-ph/0001010 or the book by Glimm and Jaffe]

shows how to go back in case of relativistic quantum field theory.

The Feynman path integral only gives time-ordered expectation values;

this suffices to compute S-matrix elements, but is inadequate for

dynamical investigations needed for nonequilibrium quantum


The latter can be treated with the so-called closed time path (CTP)

integral within the Schwinger-Keldysh formalism.


S5b. Inequivalent representations of the CCR/CAR


Ordinary quantum mechanics of N particles can be written in terms of

creation and annihilation operators for the 3N modes of an associated

reference harmonic oscillator. The field case, on the other hand,

is characterized by the fact that there are infinitely many modes.

If the creation and annihilation operators are those in the action

or Hamiltonian defining the QFT, the different modes are traditionally

referred to as 'bare particles', though this is not recommended for

reasons discussed elsewhere in this FAQ. If the creation and

annihilation operators are properly renormalized so that they

create and annihilate physical particles from the physical vacuum,

the modes are referred to as 'dressed particles'; only these have

physical relevance.

A state in which k modes are excited is called a k-particle state.

In many states of interest, however, (the most prominent ones being

the coherent states) infinitely many modes are excited (although the

notion of infinitely many particles is strained in this case). Thus one

needs to cater in the formalism for states with arbitrarily many or

even infinitely many modes. This has subtle consequences, which

account for the big difference between quantum field theory and

ordinary quantum mechanics.

The canonical commutation rules (CCR) for creation and annihilation

operators in field theory take in the simplest case (countably many

modes, corresponding to fields confined to a bounded region) the form

[a(k),a^*(l)] = delta_kl, k,l=0,1,2,... (1)

The Stone-von Neumann theorem, which guarantees that the

commutation relations of quantum mechanics (or their unitary version,

the Weyl relations) have a unique unitary representation apart from

unitary transformations, fails for systems of infinitely many degrees

of freedom.

The reason for this is that the natural representation space for

creation and annihilation operators is the vector space consisting

of all formal linear combinations

sum psi(n1,n2,n3,...) |n1,n2,n3,...>

with _arbitrary_ complex coefficients psi(n1,n2,n3,...), on which

a(k) and a^*(l) act as

a(k)|n1,....,n_k,...> = sqrt(n_k)|n1,....,n_k - 1,...>,

a*(l)|n1,....,n_l,...> = sqrt(1+n_l)|n1,....,1+n_l,...>.
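
The action of a(k) and a^*(l) on occupation number states can be spelled
out for finitely many modes. The following is a minimal sketch, assuming
a basis state with coefficient is stored as a pair (coefficient, tuple
of occupation numbers) and that modes are counted from 0; these
conventions and the function names are illustrative only.

```python
import math

def annihilate(k, state):
    """Apply a(k) to coeff*|occ>, stored as (coeff, occupation tuple)."""
    coeff, occ = state
    if occ[k] == 0:
        return (0.0, occ)                      # a(k) kills an empty mode
    new_occ = occ[:k] + (occ[k] - 1,) + occ[k + 1:]
    return (coeff * math.sqrt(occ[k]), new_occ)

def create(l, state):
    """Apply a^*(l) to coeff*|occ>."""
    coeff, occ = state
    new_occ = occ[:l] + (occ[l] + 1,) + occ[l + 1:]
    return (coeff * math.sqrt(1 + occ[l]), new_occ)

def commutator_coefficient(k, l, occ):
    """Return c with [a(k), a^*(l)]|occ> = c|occ'> on a basis state."""
    c1, _ = annihilate(k, create(l, (1.0, occ)))
    c2, _ = create(l, annihilate(k, (1.0, occ)))
    return c1 - c2

print(commutator_coefficient(1, 1, (2, 3, 0)))   # equal modes
print(commutator_coefficient(0, 2, (2, 3, 0)))   # different modes
```

On every basis state the commutator reproduces delta_kl, in agreement
with (1).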

This vector space V has no natural Hilbert space structure.

To provide a definite inner product, one must select a suitable

subspace where this inner product can be defined.

This allows many choices; the choice usually discussed in QFT treatises

is Fock space, where only basis vectors |n1,....,n_k,0,0,...>

with finitely many particles are allowed, and these basis vectors are

declared orthonormal. As a result, Fock space contains only

the linear combinations

sum psi(n1,n2,n3,...,n_k) |n1,n2,n3,...,n_k>

where k is variable and

sum |psi(n1,n2,n3,...,n_k)|^2 is finite.

Unfortunately, if this choice is made for the representation of the

bare creation and annihilation operators, it excludes the states

relevant for the physical, interacting situation. This is the

essential message of Haag's no interaction theorem.

Indeed, the physical states lie in a different, inequivalent unitary

representation, characterized by a different subspace of V. This

subspace is generated by applying to the physical (= renormalized)

vacuum state the dressed (= renormalized) creation operators

an arbitrary number of times, then taking all finite linear

combinations, and finally taking the closure with respect to the

inner product in which all a^*(n_1)...a^*(n_k)|vac> are orthonormal.

In general, this Hilbert space has only the null vector (_not_ the

vacuum) in common with the Fock space, even for the simplest

(i.e., quadratic) Hamiltonians and actions. This case is well understood,

giving rise to the theory of quasiparticles and in particular of

superconductivity. For example (counting modes by signed nonzero

integers for simplicity - they become momenta in the infinite volume

limit), if the bare a(k) and b(k) satisfy the CCR then so do the dressed
annihilation operators

alp(k) = A(k) a(k) - B(-k) b*(-k),

bet(k) = A(k) b(k) - B(-k) a*(-k),

and their formal adjoints

alp^*(k) = A(-k) a^*(k) - B(k) b(-k),

bet^*(k) = A(-k) b^*(k) - B(k) a(-k),

provided that A(k), B(k) are real numbers satisfying

A(k)^2 - B(k)^2 = 1,

or, equivalently, that

A(k) = cosh(theta(k)), B(k) = sinh(theta(k)).
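
Since the dressed operators are linear in the bare ones, their
commutators follow from the bare CCR by bilinearity and can be checked
mechanically. The following is a minimal sketch; the representation of
operators as dictionaries and the particular theta(k) are illustrative
assumptions. I assume theta(-k) = theta(k), so that the displayed
alp^*(k) is indeed the adjoint of alp(k).

```python
import math

def basic_commutator(x, y):
    """[x, y] for elementary operators (species, is_creation, mode)."""
    (sx, cx, kx), (sy, cy, ky) = x, y
    if sx != sy or kx != ky or cx == cy:
        return 0.0
    return -1.0 if cx else 1.0        # [a, a^*] = 1, [a^*, a] = -1

def commutator(X, Y):
    """Commutator of two linear combinations, extended bilinearly."""
    return sum(cx * cy * basic_commutator(x, y)
               for x, cx in X.items() for y, cy in Y.items())

def theta(k):
    return 0.3 / (1 + k * k)          # hypothetical even function of k

def A(k):
    return math.cosh(theta(k))

def B(k):
    return math.sinh(theta(k))

def alp(k):
    return {("a", False, k): A(k), ("b", True, -k): -B(-k)}

def bet(k):
    return {("b", False, k): A(k), ("a", True, -k): -B(-k)}

def alp_star(k):
    return {("a", True, k): A(-k), ("b", False, -k): -B(k)}

modes = [-2, -1, 1, 2]
print([[round(commutator(alp(k), alp_star(l)), 12) for l in modes]
       for k in modes])
```

The printed matrix is the identity, and the mixed commutators such as
[alp(k),bet(l)] vanish; both facts rest on A(k)^2 - B(k)^2 = 1.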

If there were only finitely many modes, we could define

in Fock space the unitary operator

G = exp [- sum_k theta(k) (a(k)b(-k) - b*(-k)a*(k))],

and verify that

alp(k) = G a(k) G^{-1},

bet(k) = G b(k) G^{-1},

showing that we get an equivalent representation of the CCR.

We could deduce that

|vac> := G|>,

where |> is the bare vacuum, is the dressed vacuum on which

alp and bet act naturally. The dressed states would simply be

the images of the bare states under the Bogoliubov operator G.

Unfortunately, if there are infinitely many modes, G can no

longer be consistently defined as an operator in Fock space,

and the infinite-dimensional version of this scenario breaks

down. Ignoring this, one would find all sorts of infinities.

Mathematically, however, one simply changed the unitary

representation - G does not exist although the dressed

representation exists.
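
The breakdown can be made quantitative. For a single pair of modes,
G is a two-mode squeezing operator, and the standard result for the
squeezed vacuum gives the overlap 1/cosh(theta) between the bare and
the dressed vacuum; for several independent pairs these factors
multiply. The following is a minimal sketch, assuming for illustration
the same value theta = 0.5 for every pair (the function name is
hypothetical).

```python
import math

def vacuum_overlap(thetas):
    """Overlap of bare and dressed vacuum for independent mode pairs.

    Each pair contributes a factor 1/cosh(theta), the standard overlap
    for a two-mode squeezed vacuum.
    """
    overlap = 1.0
    for th in thetas:
        overlap /= math.cosh(th)
    return overlap

mode_counts = (1, 10, 100, 1000)
overlaps = [vacuum_overlap([0.5] * n) for n in mode_counts]
for n, value in zip(mode_counts, overlaps):
    print(n, value)
```

The overlap decays exponentially with the number of modes, so in the
limit of infinitely many modes the would-be dressed vacuum G|> has no
component along the bare vacuum, and similarly along any other Fock
state.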

Physicists say that the above computations hold 'formally',

and mean (if a mathematician tries to give this a precise meaning)

that they hold in finite mode approximations but do not survive

the limit, although they are usually formulated in the meaningless

limit form.

The canonical anticommutation rules (CAR) also have the form (1),

except that the commutator is replaced by an anticommutator.

All statements above are valid with appropriate modifications;

the most important one being that occupation numbers are now

restricted to 0 and 1, and the definition of a^*(l) has 1-n_l in

place of 1+n_l.
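
For a single fermionic mode everything is finite-dimensional and can be
written down explicitly. The following is a minimal sketch, assuming the
basis ordering (|0>, |1>); the variable names are illustrative.

```python
import numpy as np

# Single fermionic mode in the basis (|0>, |1>).
a = np.array([[0.0, 1.0],
              [0.0, 0.0]])      # a|1> = |0>, a|0> = 0
a_star = a.T                    # a^*|0> = |1>, a^*|1> = sqrt(1-1)|2> = 0

anticommutator = a @ a_star + a_star @ a
print(anticommutator)           # identity matrix: the CAR for one mode
print(a_star @ a_star)          # zero matrix: no double occupation
```

The factor sqrt(1-n_l) is what makes a^* vanish on an occupied mode.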

For more details see the book

H. Umezawa, H. Matsumoto, and M. Tachiki,

Thermo Field Dynamics and Condensed States,

North Holland 1982.


S5c. Why does QFT look so different from QM?


This is only because of technical reasons and the power of tradition.

In ordinary quantum mechanics, pure states are described by

wave functions (more precisely by rays) in a Hilbert space,

there is a Hamiltonian H and an associated Schroedinger equation

i hbar psidot = H psi, the time evolution is described by a unitary

operator, the bound states are normalized eigenstates of the

Hamiltonian, etc.

This is also done in traditional quantum field theory, though it

is not directly apparent. But one can see it when studying

constructive field theory. It gives everything in case of 2D quantum

fields. There is a well-defined Hilbert space, a well-defined

Hamiltonian constructed without any use of perturbation theory,

a well-defined unitary dynamics, well-defined bound states that

are eigenstates of the Hamiltonian, and everything is invariant under

the 2D Poincare group ISO(1,1). See the book

J. Glimm and A. Jaffe,

Quantum Physics: A Functional Integral Point of View,

Springer, Berlin 1987.

The only thing wanting is an explicit formula for H in the traditional

nonrelativistic form H=H_0+V. Instead, H is constructed in a more

abstract way, as analytic continuation of an operator in Euclidean

field theory.

That the 4D case is more difficult has to do with obstacles in getting

tight enough bounds for the analytic estimates needed. These are

mathematical difficulties, but not inconsistencies - no one proved that

there are contradictions, and the practice of QFT suggests that there

are indeed none (at least for asymptotically free theories).

On the perturbative level, there is no difficulty at all - see, e.g.

the book

M. Salmhofer,

Renormalization: An Introduction,

Texts and Monographs in Physics,

Springer, Berlin 1999.

which constructs the Euclidean theory for Phi^4 theory in 4 dimensions

perturbatively, i.e., in the formal power series topology, with full

mathematical rigor. If this construction worked nonperturbatively

(i.e., give functions instead of formal power series),

analytic continuation using Osterwalder-Schrader theory would do

the rest. The latter is described, e.g., in Chapter 6 of the above

book by Glimm and Jaffe.


S5d. Why is QFT based on a classical action?


The path integral approach to QFT begins with classical fields

that are varied to produce quantum amplitudes as a 'sum over all

possible paths'. But, with the exception of the electromagnetic field,

the classical fields one meets there are not fields occurring

in classical physics. Nevertheless they are rightfully labelled 'classical'.


Classical physics is the physics of processes slowly varying in space

and time; of course, elementary particles do not belong there.

But classical mechanics can also be considered as an abstract

mathematical framework for dynamics in a general phase space

(described by a Poisson manifold), which has much wider applicability.

The classical fields that figure in the path

integral belong in this sense to classical mechanics.

In QFT, one needs a classical action to be able to implement

unitarity of the S-matrix and the cluster decomposition.

The first is essential for a correct probabilistic interpretation of

QFT, since it amounts to preservation of probability, and the second is

necessary to account for the fact that all our experiments are done

locally, and what is far away does not contribute significantly

except through effectively classical far fields. (What happens with

the stars should be irrelevant to experiments on the earth, except for

the experiments of astronomers. This is the basis of all physics.)

In terms of microphysics, cluster decomposition means that one cannot

scatter particles (clusters of elementary particles) at very distant

particles (clusters).

The arguments why this requires a classical action expressed in terms

of creation and annihilation operators are explained in detail in

Weinberg's quantum field theory book, Volume I, Chapters 3-7.

We need cluster decomposition because it is observed. We need

local fields and microcausality, mainly because it implies

(modulo fine print involving contact terms) at least perturbatively

cluster decomposition, and there is no other known way in QFT to

ensure the latter. But there are covariant N-particle models with

cluster decomposition, discussed, e.g., in

B.D. Keister and W.N. Polyzou,

Relativistic Hamiltonian Dynamics in Nuclear and Particle Physics,

in: Advances in Nuclear Physics, Volume 20,

(J. W. Negele and E.W. Vogt, eds.)

Plenum Press 1991.

(The constructions are quite messy; they have, however, the

advantage that they do not need renormalization, and are useful

phenomenological models.)

The lack of references to cluster decomposition in standard textbooks

of QFT is explained by the fact that local QFT automatically satisfies

cluster decomposition. Most people take QFT as their starting

point, without asking why. Weinberg's treatise is about the only book

that asks this question and answers it in some depth.

But when you look at the literature on phenomenological covariant

multiparticle models, cluster decomposition plays an essential role

in that it is the main hurdle to overcome to get realistic models for

systems made of more than two unconfined particles. For details see
the survey by Keister and Polyzou mentioned above,

and the references there.

Cluster decomposition for field theory is also discussed from a

rigorous point of view in the book by Glimm and Jaffe, where

connections are made to multiparticle scattering.

Indeed, books on (nonrelativistic) scattering theory are the ones

where the cluster decomposition is discussed in detail, since it is

needed to describe the result of the most general multiparticle

scattering experiments, and an understanding of it is essential for

proving the asymptotic completeness of scattering states.

Nonrelativistic theory also shows that the 'correct'

cluster decomposition is always one for bound states,

as can be seen from a more detailed nonrelativistic analysis.

(This is not apparent from Weinberg's argument,

since perturbation theory breaks down in the presence of

bound states. This explains why QCD has no cluster

decomposition for isolated quarks.)

Unfortunately, most physicists tend to work in isolated fragments of

the whole edifice of physics, thus losing connections that may be

to understanding. Cluster decomposition would perhaps be more


in QFT if it were easier to calculate properties of bound states and

their scattering or breaking up, since that is where one can see the

principle at work. But such calculations are presently out of reach

without severe approximations.


S5e. Why does the action only contain first derivatives?


On the classical level, higher derivatives cause no formal problems,

one can form the variational equations as always. There might be

problems with causality (= symmetric hyperbolicity), however.

These problems become worse (and apparently intractable) in the

quantum case.

In a k-derivative theory with k>1, one can always introduce new fields

for the first k-1 derivatives, and add terms to the action that give

as variation their defining equations. Thus one can reduce any theory
to an equivalent one with only first derivatives in the action.
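
The reduction can be written out for the simplest case. The following
is a sketch for k=2, assuming a Lagrangian L(q, \dot q, \ddot q) of a
single coordinate q; the auxiliary field v and the multiplier \lambda
are names introduced here for illustration.

```latex
% New field v for the first derivative, enforced by a multiplier \lambda:
L'(q, v, \lambda, \dot q, \dot v) = L(q, v, \dot v) + \lambda\,(\dot q - v).

% Variation with respect to \lambda, v, and q, respectively:
\dot q = v, \qquad
\frac{\partial L}{\partial v} - \frac{d}{dt}\frac{\partial L}{\partial \dot v} = \lambda, \qquad
\frac{\partial L}{\partial q} = \dot\lambda .

% Eliminating \lambda and v reproduces the original fourth-order equation:
\frac{\partial L}{\partial q}
  - \frac{d}{dt}\frac{\partial L}{\partial \dot q}
  + \frac{d^2}{dt^2}\frac{\partial L}{\partial \ddot q} = 0 .
```

The new action L' contains only first derivatives, at the price of the
additional variables v and \lambda.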

The problems appear when trying to go from the Lagrangian picture to

the Hamiltonian - then one gets similar difficulties as for gauge theories.



S5f. Why normal ordering?


Field theory often deals with polynomial expressions in annihilation

operators a(p) and their adjoint creation operators a^*(p).

While a(p) is a linear operator on a dense subspace H of the

corresponding Fock space, its adjoint isn't. But both are densely

defined sesquilinear forms on Fock space.

A sesquilinear form is a linear mapping f from a space H (the domain;

a dense subspace of the Hilbert space, in the present case of Fock

space) to its dual space H^* (which properly contains H), while

an operator maps H into H. Thus the latter can be iterated

while the former usually cannot.

<phi|f|psi> is always defined when phi,psi in H (since f|psi> is in H^*,

the inner product is defined). Thus Hermitian sesquilinear forms are

satisfying candidates for 'observables'. However, matrix elements

<phi|fg|psi> of products fg make sense only for

operators f,g, since fg|psi> is not defined if g|psi> is outside H.

In particular, a(p)a^*(p) is a meaningless construct, while

:a(p)a^*(p): = +-a^*(p)a(p)

makes sense as a Hermitian sesquilinear form. But f(p)=a^*(p)a(p) is

no longer an operator in any sense (though good 1-particle

operators can be made by integration with suitable test functions).

That's why f(p)f(q) is meaningless while the permuted form

:f(p)f(q): = +-a^*(p)a^*(q)a(p)a(q)

(+ for Bosons, - for Fermions) is well defined (again as sesquilinear

form only).

More generally, any product O of creation and annihilation operators

which has all its creation terms to the left of all its annihilation

terms (these are called normally ordered products) defines a

sesquilinear form. The reason is that such an O can be written as

O=A^*B where A and B are products of annihilation operators only,

hence <phi|O|psi> = <phi|A^*B|psi> can be interpreted as the inner

product of the two vectors A|phi> and B|psi> obtained from phi and psi

by applying annihilation operators only, which produces vectors in H

for which the inner product is always defined.

Normal ordering just permutes arbitrary products to put them into the

normally ordered and hence well-defined form (and adds a minus sign

if an odd number of transpositions of Fermion operators is needed

to order the product). This is extended by linearity to polynomials

and infinite series in power products. Note that normal ordering is

defined for formal expressions (i.e. strings of letters),

not for operators or forms; only _after_ normal ordering an

expression O one gets a sesquilinear form :O:.
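
Because normal ordering acts on strings of symbols, it is easy to
program. The following is a minimal sketch, assuming a formal product
is given as a list of pairs ('a', p) for annihilation and ('a*', p) for
creation symbols; this encoding and the function name are illustrative
choices.

```python
def normal_order(factors, fermionic=False):
    """Normal-order a formal product of creation/annihilation symbols.

    factors: list of ('a', label) or ('a*', label) pairs.
    Returns (sign, reordered list). The relative order among creation
    symbols and among annihilation symbols is preserved.
    """
    factors = list(factors)
    sign = 1
    swapped = True
    while swapped:                      # bubble sort by adjacent swaps
        swapped = False
        for i in range(len(factors) - 1):
            if factors[i][0] == 'a' and factors[i + 1][0] == 'a*':
                factors[i], factors[i + 1] = factors[i + 1], factors[i]
                if fermionic:
                    sign = -sign        # one transposition, one sign flip
                swapped = True
    return sign, factors

print(normal_order([('a', 'p'), ('a*', 'p')]))
print(normal_order([('a', 'p'), ('a*', 'p')], fermionic=True))
```

No commutator terms are generated: the symbols are only permuted, which
is exactly the difference between normal ordering a formal expression
and reordering operators.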

In Fock spaces over finite-dimensional Hilbert spaces, the situation is

different; there a(p) and a^*(p) are indeed operators on Fock space

(and the index p ranges over finitely many items only). Thus all

products make sense, and the normally ordered version of a product

differs from the original product by terms involving fewer operators.

Normal ordering is usually motivated by starting with a

finite-dimensional discretization where integrals become finite sums;

then one can do all the formal manipulations rigorously. Upon passing

to the continuum limit, most expressions become infinite and hence

meaningless, but the normally ordered expressions happen to have a

well-defined limit and hence are meaningful. So these are the relevant

'operators' or rather sesquilinear forms. Presenting things as above

avoids any infinities.


S5g. Why locality and causal commutation relations?


In measurement terms, locality is the idea that a measurement here

and a simultaneous measurement there can be performed independently,


and in particular don't limit each other in precision. This is encoded

in the requirement that 'local' quantities described by fields

Phi_a(\x,t) here (at \x) and fields Phi_b(\y,t) there (at \y)

commute if the positions \x and \y are distinct.

The covariant form of this locality requirement is that,

with x=(ct,\x) and the +--- norm defined by x^2=x_0^2-\x^2,

[Phi_a(x),Phi_b(y)]=0 if (x-y)^2<0 (*)

Indeed, if x_0=y_0=ct then (x-y)^2=(x_0-y_0)^2-(\x-\y)^2=-(\x-\y)^2<0,

so this commutation relation holds at equal time. But then Lorentz

covariance implies that it must hold whenever (x-y)^2<0, since any

pair (x,y) with (x-y)^2<0 can be transformed into an equal time pair.
Thus locality is a property of distinguished fields satisfying (*),

called local fields. This property is completely independent of states,

since it is understood that the property holds independent of the

coincidental properties of the state.
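
The step from equal times to arbitrary spacelike separations can be made
explicit. The following is a minimal sketch in 1+1 dimensions, assuming
the separation x-y = (d_ct, d_x) has already been rotated to lie along
one spatial axis; the function names are illustrative.

```python
import math

def boost(ct, x, beta):
    """Lorentz boost along x with velocity beta = v/c, acting on (ct, x)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return gamma * (ct - beta * x), gamma * (x - beta * ct)

def equal_time_boost(d_ct, d_x):
    """Velocity beta that makes a spacelike separation simultaneous."""
    if d_ct * d_ct - d_x * d_x >= 0:
        raise ValueError("separation is not spacelike")
    return d_ct / d_x          # |beta| < 1 because |d_ct| < |d_x|

d_ct, d_x = 1.0, 3.0           # (x-y)^2 = 1 - 9 = -8 < 0
beta = equal_time_boost(d_ct, d_x)
new_ct, new_x = boost(d_ct, d_x, beta)
print(beta, new_ct, new_x)
```

In the boosted frame the time difference vanishes while (x-y)^2 keeps
its value, so the equal-time commutation relation applies. For a
timelike or lightlike separation no such boost exists, since it would
need |beta| >= 1.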

Quantum field theory is physics in the Heisenberg picture, with

states fixed once and for all, and all spacetime dependence in

the fields. The universe is in a definite though largely unknown state,

and apart from the Lagrangian of the standard model plus gravitation,

all the history, present and future of the universe is encoded in

this universal state.

Lacking knowledge of this state, physicists are usually

content with describing tiny portions of this state, namely the

restriction of the state to a subalgebra of accessible quantities

within the lab (or at least close to the solar system).

Since there are many such subsystems of interest, and all these

are in different states even if described by the same algebra

(more precisely by isomorphic ones), all generic properties of

physical systems must be independent of the states.


S5h. Creation operators and rigged Hilbert space


Physicists regard Fock space as the Hilbert space containing the

basis states

|x_1:N> = |x_1,...,x_N>

and their linear combinations. However, there is no Hilbert space

containing these states. The state |x_1:N> = |x_1,...,x_N>

is not in the Hilbert Fock space, for the same reason for which

|x> is not in the 1-particle Hilbert space. It is only a


The Hilbert Fock space is made instead of all wave functions

psi = sum_N integral dx_1:N psi_N(x_1:N) |x_1,...,x_N>

with finite

<psi|psi> = sum_N |psi_N|^2/N!

Physicists also define annihilation operators a(x) and

their adjoints, creation operators a^*(x). However, these are

not operators, but operator-valued distributions. For example,

a^*(x) maps the vacuum state |vac> (with psi_0=1, other psi_N=0)
into a^*(x)|vac> = |x>, which is not in the Hilbert Fock space.

More generally, for every nonzero Hilbert Fock space vector psi,

the vector

psi' = a^*(x) psi

lies outside the Hilbert Fock space.

Thus the domain of a^*(x) is just {0}.

However, the states |x_1:N> = |x_1,...,x_N> lie in the top

layer H^* of the right Gelfand triple = rigged Hilbert space.

This is the name for a triple H in Hbar in H^* of vector spaces,

where Hbar is a Hilbert space, H a dense 'nuclear' subspace

(containing very smooth states with very good behavior at infinity)

and H^* its dual space (containing among others very singular states

and states with very poor behavior at infinity). Observables (in the

weak sense) are bilinear forms, or, which is the same, linear mappings

from H to H^*. The adjoint of such a linear mapping is again an

observable in the weak sense. Annihilation operators a(x) (and their

adjoints a^*(x)) are observables in this weak sense, although they are

not Hermitian (and a fortiori not self-adjoint).

Most physicists have taken this lightly since the time of Dirac.

They don't bother about self-adjointness or any other functional

analytic concept, unless ignoring it brings them into trouble.

Almost everything they do in the nonrelativistic regime

can be made rigorous in the rigged Hilbert space, so they fare well

even when they imagine wrongly that they work in a Hilbert

space. Thus they get away with their bad practices.

What they call 'Hilbert space' _is_ in fact always a

rigged Hilbert space; although most of them just don't know and

don't care.


S5i. Why Feynman diagrams?


Feynman diagrams resemble processes with particles moving in space and

time, and are often figuratively treated as such. But in fact they

do _not_ describe such processes, but certain multiple integrals.

(To emphasize this, the particles involved in Feynman diagrams are

called 'virtual particles'. Still, many people mistakenly think

that virtual particles are somehow also real. See the entries about

virtual particles elsewhere in this FAQ.)

Although it is nowhere said explicitly, Feynman diagrams are just

a mnemonic for nicely picturing the composition of higher order


Create for each tensor of a theory a different vertex type, draw a

vertex of this type for each occurrence of this tensor in a product

expression in Einstein summation convention, and draw a line between

two such vertices whenever they share an index to be summed over.

The form of the lines defines the value of the coefficient function

in such a product, and the sum over Feynman diagrams simply means

one considers a linear combination of these products, integrated over

the arguments. Thus this defines a generic representation of an

expansion of a function of the tensors of the theory.
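
The recipe just described is purely combinatorial and can be turned
into a few lines of code. The following is a minimal sketch, assuming a
product expression is given as a list of (tensor name, index string)
pairs in Einstein summation convention; the tensors V and D in the
example are hypothetical.

```python
from collections import defaultdict

def contraction_graph(factors):
    """Turn a tensor product into vertices, internal lines, external legs.

    factors: list of (tensor_name, index_string), e.g. ("V", "ijk").
    An index shared by two slots is summed over and becomes a line;
    an index occurring once stays free and becomes an external leg.
    """
    owners = defaultdict(list)
    for vertex, (_, indices) in enumerate(factors):
        for index in indices:
            owners[index].append(vertex)
    vertices = [name for name, _ in factors]
    lines = [(i, tuple(v)) for i, v in sorted(owners.items()) if len(v) == 2]
    legs = [(i, v[0]) for i, v in sorted(owners.items()) if len(v) == 1]
    return vertices, lines, legs

# Hypothetical product V_ijk D_kl V_lmn.
vertices, lines, legs = contraction_graph(
    [("V", "ijk"), ("D", "kl"), ("V", "lmn")])
print(vertices)
print(lines)
print(legs)
```

The output describes a chain of three vertices joined by the summed
indices k and l, with four external legs i, j, m, n. Nothing in the
construction refers to particles or to motion in space and time.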

Thus Feynman diagrams can be used whenever one expands a function of one

or more tensors into a linear combination of products of components of

these tensors.

Indeed, for this reason, they are also used in classical statistical

mechanics and in the analysis of stochastic differential equations

by functional integration techniques.


S6a. Nonperturbative computations in quantum field theory


There is a well-defined theory for computing contributions to the

S-matrix in quantum electrodynamics (and other renormalizable field

theories) by perturbation theory.

There is also much more which uses handwaving arguments and


to analogy to compute approximations to nonperturbative effects.

Examples are:

- relating the Coulomb interaction and corrections to scattering

amplitudes and then using the nonrelativistic Schroedinger equation,


- computing Lamb shift contributions (now usually done in what is

called the NRQED expansion),

- Bethe-Salpeter and Schwinger-Dyson equations obtained by summing


infinitely many diagrams.

The use of 'nonperturbative' and 'expansion' together sounds

paradoxical, but is common terminology in QFT. The term 'perturbative'


refers to results obtained directly from renormalized Feynman graph

evaluations. From such calculations, one can obtain certain


(tree level interactions, form factors, self energies) that can be

used together with standard QM techniques to study nonperturbative

effects - generally assuming without clear demonstrations that this

transition to quantum mechanics is allowed.

Of course, although usually called 'nonperturbative', these techniques

also use approximations and expansions. The most conspicuous

high accuracy applications (e.g. the Lamb shift) are highly

nonperturbative. But on a rigorous level, so far only the perturbative

results (coefficients of the expansion in coupling constants) have any


Although the perturbation series in QED are believed to be asymptotic

only, one can get highly accurate approximations for quantities like the
Lamb shift. However, the Lamb shift is a nonperturbative

effect of QED. One uses an expansion in the fine structure

constant, in the ratio electron mass/proton mass, and in 1/c

(well, different methods differ somewhat). Starting e.g., with

Phys. Rev. Lett. 91, 113005 (2003)

one should be able to track the literature.
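
How a series that is only asymptotic can still give accurate numbers is
seen in a standard textbook toy example (not a QED quantity): the
integral of exp(-t)/(1+g t) over t from 0 to infinity has the divergent
expansion sum_n (-1)^n n! g^n. The following is a minimal sketch,
assuming g = 0.1 as 'coupling constant' and a simple Simpson rule for
the reference value; the function names are illustrative.

```python
import math

def exact_value(g, t_max=60.0, steps=60000):
    """Simpson approximation of the integral of exp(-t)/(1+g*t), t >= 0.

    The tail beyond t_max is neglected (it is of order exp(-t_max)).
    """
    h = t_max / steps
    def f(t):
        return math.exp(-t) / (1.0 + g * t)
    total = f(0.0) + f(t_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

def truncated_series(g, order):
    """Partial sum of the divergent series sum_n (-1)^n n! g^n."""
    return sum((-1) ** n * math.factorial(n) * g ** n
               for n in range(order + 1))

g = 0.1
exact = exact_value(g)
errors = {order: abs(truncated_series(g, order) - exact)
          for order in (2, 5, 10, 20, 30)}
for order, err in errors.items():
    print(order, err)
```

The error first decreases, is smallest for an order near 1/g, and then
grows without bound. For small coupling the best attainable accuracy is
very high, although the series has no sum in the usual sense.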

Perturbative results are also often improved by partial summation of

infinite classes of related diagrams. This is a standard approach to

go some way towards a nonperturbative description. Of course, the

series diverges (in case of a bound state it _must_ diverge, already in

the simplest, nonrelativistic examples!), but the summation is done

on a formal level (as everything in QFT) and only the result

reinterpreted in a numerical way. In this way one can get

in the ladder approximation Schroedinger's equation, and in other

approximations Bethe-Salpeter equations, etc.

See Volume 1 of Weinberg's quantum field theory book.
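
The logic of formal summation can be illustrated by the geometric
series, used here only as a toy model for a chain of identical diagram
pieces. The following is a minimal sketch, assuming each piece
contributes a factor x; the function names are illustrative.

```python
def partial_sum(x, order):
    """Sum of x^n for n = 0..order (the series term by term)."""
    return sum(x ** n for n in range(order + 1))

def resummed(x):
    """Formal sum 1/(1-x), meaningful for every x except the pole x = 1."""
    return 1.0 / (1.0 - x)

print(partial_sum(0.5, 60), resummed(0.5))   # convergent case: both agree
print(partial_sum(2.0, 20), resummed(2.0))   # divergent series, finite formal sum
```

For |x|<1 the formal sum agrees with the numerical one. For |x|>1 the
partial sums are useless, but the formal result remains a well-defined
number. The formal sum also has a pole at x=1, which no finite partial
sum can reproduce; this is the toy version of the statement that a
bound state forces the series to diverge.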


S6b. The formal functional integral approach to QFT

On a purely formal level (i.e., with power series in place of actual

numbers), 4D QFT is very alive and useful. It is now almost

always based upon functional integrals.

The path integral is discussed e.g., in Weinberg I, Chapter 9, or

Peskin/Schroeder, also Chapter 9. As one can see there, the

path integral formalism involves no operators at all, only classical

(commuting or anticommuting) fields.

The quantities obtained in the expansion of the path integral in

powers of hbar are time-ordered vacuum expectation values.

Since the original ordering in a time-ordered vacuum expectation value

is immaterial (apart from a sign for fermions), the same must be the

case for the path integral itself, which explains why the fields

in the path integral are classical (i.e., commute or anticommute

at all arguments).

The main strength of the path integral approach is precisely that

it avoids quantum operators and replaces all operator arguments by

averages over classical paths. (The main weakness is that this

averaging process is logically ill-defined.

There exists no prescription how the limit in the ``definition'' of

the path integral is to be taken to yield (in theory - independent

of the difficulty of computing them) numbers that have the properties

commonly ascribed to the path integral.)

The older canonical quantization approach was fraught with difficulties

because of inconsistencies in the operator approach.

For example, the canonical commutation rules (CCR) are

valid only in the free case, and no one knows how they should

be in the interacting case - though one knows that (anti)commutators

must still vanish at spacelike related arguments.

Moreover, the renormalization program plays havoc with operators.

Unfortunately, this means that dynamical issues and bound states

questions, which are comparatively easy to handle in an operator

framework, become almost intractable in the path integral approach.

However, as Weinberg stresses in his QFT book, an understanding of

the relation between path integral and canonical quantization is

essential to get the properties of the latter correct in cases like

the nonlinear sigma model.

S6c. Functional integrals, Wightman functions, and rigorous QFT


QFT assumes the existence of interacting (operator

distribution valued) fields Phi(x) with certain properties, which

imply the existence of distributions

W(x_1,...,x_n) = <0|Phi(x_1)...Phi(x_n)|0>.

But the right hand side makes no rigorous sense in traditional QFT

as found in most text books, except for free fields. Axiomatic QFT

therefore tries to construct the W's - called the Wightman functions -

directly such that they have the properties needed to get an S-matrix

(Haag-Ruelle theory), whose perturbative expansion

can be compared with the nonrigorous mainstream computations.

This can be done successfully for many 2D theories and for some 3D

theories, but not, so far, in the physically relevant case of 4D.

To construct something means to prove its existence as a


well-defined object. Usually this is done by giving a construction

as a sort of limit, and proving that the limit is well-defined.

(This is different from solving a theory, which means computing

numerical properties, often approximately, occasionally

- for simple problems - in closed analytic form.)

To compare it to something simpler: In mathematics one constructs the

Riemann integral of a continuous function over a finite interval by

some kind of limit, and later the solution of an initial value problem for

ordinary differential equations by using this and a fixed point

theorem. This shows that each (nice enough) initial value problem is

uniquely solvable. But it tells very little of its properties, and

in practice no one uses this construction to calculate anything.

But it is important as a mathematical tool since it shows that

calculus is logically consistent.
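
The fixed point construction mentioned here can be followed step by
step in the simplest case. The following is a minimal sketch, assuming
the initial value problem y' = y, y(0) = 1 and the Picard iteration
y_new(t) = 1 + integral_0^t y(s) ds as the contraction map; polynomials
are stored as exact coefficient lists, and the function names are
illustrative.

```python
from fractions import Fraction

def picard_step(poly):
    """One Picard step for y' = y, y(0) = 1.

    poly = [c0, c1, ...] represents c0 + c1*t + ...; integrating from 0
    to t shifts the coefficients, and y(0) = 1 fixes the constant term.
    """
    return [Fraction(1)] + [c / (n + 1) for n, c in enumerate(poly)]

def picard_iterates(steps):
    """Start from the constant function 1 and iterate."""
    poly = [Fraction(1)]
    for _ in range(steps):
        poly = picard_step(poly)
    return poly

def evaluate(poly, t):
    return sum(float(c) * t ** n for n, c in enumerate(poly))

print(picard_iterates(3))              # coefficients 1, 1, 1/2, 1/6
print(evaluate(picard_iterates(15), 1.0))
```

The iterates are the Taylor polynomials of exp(t). The construction
proves that the solution exists and is unique, but for actual
computations one would use quite different numerical methods.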

Such a logical consistency proof for any 4D interacting QFT is presently

still missing. Since logical consistency of a theory is important,

the first person who finds such a proof will become famous - it means

inventing new conceptual tools that can handle this currently

intractable problem.

Wightman functions are the moments of a linear functional on

some algebra generated by field operators, and just as linear

functionals on ordinary function spaces are treated in terms of

Lebesgue integration theory (and its generalization), so Wightman

linear functionals are naturally treated by functional integration.

The 'only' problem is that the latter behaves much more poorly from
a rigorous point of view than ordinary integration.

Wightman functions are the moments <Phi(x_1)...Phi(x_n)> of a


state < . > on noncommutative polynomials in the quantum field Phi,

while time-ordered correlation functions are the moments

<Phi(x_1)...Phi(x_n)> of a complex measure < . > on commutative

polynomials in the classical field Phi.

In both cases, we have a linear functional, and the linearity gives

rise to an interpretation in terms of a functional integral.

The exponential kernel in Feynman's path integral formula for the

time-ordered correlation functions comes from the analogy between

(analytically continued) QFT and statistical mechanics,

and the Wightman functions can also be described in a similar analogy,

though noncommutativity complicates matters. The main formal reason

for this is that a Wick theorem holds both in the commutative and the

noncommutative case.

For rigorous quantum field theory one essentially avoids the

path integral, because it is difficult to give it a rigorous

meaning when the action is not quadratic. Instead, one only keeps
the notion that an integral is a linear functional, and

constructs rigorously useful linear functionals on the relevant

algebras of functions or operators. In particular, one can define

Gaussian functionals (e.g., using the Wick theorem as

a definition, or via coherent states); these correspond exactly

to path integrals with a quadratic action.
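
As an illustration of "using the Wick theorem as a definition", the
following Python sketch defines the moments of a centered Gaussian
functional on finitely many commuting variables purely combinatorially:
a moment is the sum over all complete pairings of the product of the
pair covariances. The finite-dimensional setting, the covariance matrix
and the function names are assumptions made for the example; the field
theoretic case replaces the matrix by a propagator.

```python
def pairings(items):
    """Yield all ways to split the list `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def gaussian_moment(indices, cov):
    """Moment <x_i1 ... x_in> of a centered Gaussian functional.

    Defined by Wick's theorem: odd moments vanish, even moments are the
    sum over all pairings of products of covariances cov[i][j].
    """
    if len(indices) % 2 == 1:
        return 0.0
    total = 0.0
    for pairing in pairings(list(indices)):
        term = 1.0
        for i, j in pairing:
            term *= cov[i][j]
        total += term
    return total


cov = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical covariance of (x_0, x_1)
print(gaussian_moment([0, 0, 0, 0], cov))  # fourth moment of x_0
print(gaussian_moment([0, 0, 1, 1], cov))  # mixed moment <x_0^2 x_1^2>
```

No integral appears anywhere; linearity and the pairing rule are all
that is needed to define the functional on polynomials.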

If one looks at a Gaussian functional as a functional on the

algebra of fields appearing in the action (without derivatives

of fields), one gets - after time-ordering the fields - the

traditional path integral view and the time-ordered correlation

functions.

If one looks at it as a functional on the bigger algebra of

fields and their derivatives, one gets - after rewriting the

fields in terms of creation and annihilation operators - the

canonical quantum field theory view with Wightman functions.

The algebra is generated by the operators a(f) and a^*(f),

where f has compact support, but normally ordered

expressions of the form

S = integral dx : L(Phi(x), Nabla Phi(x)) :

make sense weakly (i.e., as quadratic forms).

The art and difficulty is to find well-defined functionals

that formally match the properties of the functionals 'defined'

loosely in terms of path integrals.

This requires a lot of functional analysis,

and has been successfully done only in dimensions d<4.

For an overview, see:

A.S. Wightman,

Hilbert's sixth problem:

Mathematical treatment of the axioms of physics,

in: Mathematical Developments Arising From Hilbert Problems,

edited by F. Browder,

(American Mathematical Society, Providence, R.I.) 1976, pp.147-240.


S6d. Is there a rigorous interacting QFT in 4 dimensions?


The Wightman axioms and the Osterwalder-Schrader axioms

[see, e.g., math-ph/0001010 or the book by Glimm and Jaffe]

are currently the basis on which rigorous quantum field theory

(at least for massive particles) is discussed.

In spite of many attempts (and though numerous uncontrolled

approximations are routinely computed), no one has so far succeeded

in rigorously constructing a single QFT in 4D which

has nontrivial scattering. Not even QED is a mathematical object,

although it is the theory that was able to reproduce experiments

(anomalous magnetic moment of the electron; see the entry

''Is QED consistent'' in this FAQ) with an accuracy of 1 in 10^12.

But till today no one knows how to formulate the

theory in such a way that the relevant objects whose approximations

are calculated and compared with experiment are logically well-defined.


See, e.g., the S.P.R. threads

This probably explains the high price tag of 1,000,000 US dollars

promised for a solution to one of the Clay millennium problems,

which asks for a valid

construction for d=4 quantum Yang-Mills theories that is strong

enough to prove correlation inequalities corresponding to the

existence of a mass gap. The problem is to explain rigorously

why the mass spectrum for compact Yang Mills QFT begins at a positive

mass, while the classical version has a continuous spectrum

beginning at 0.

The mass gap is a property of the theory, not of a wave function.

Intuitively, it means that, in the rest frame of the total system,

the ground state (=vacuum) is an isolated eigenstate of the

Hamiltonian H, i.e., that the spectrum of H is a subset of

{0} union [E_1,inf). The largest E_1 with this property defines

the mass gap m_1=E_1/c^2.

This would make proper sense for a nonrelativistic theory.

For a relativistic theory one has to read between the lines and

interpret everything in terms of suitable analogies,

for lack of a consistent mathematical theory.

The millennium problem essentially asks for a rigorous mathematical

setting in which the above can be made precise and proved.

The real problem is the rigorous construction of a Hilbert space with

a unitary representation of the Poincare group, such that a

perturbation argument recovers the traditional renormalized order by

order approximation of quantum field theory.

The state of the art at the time the problem was crowned by

a prize is given in

and the references quoted there. See also

I don't think significant progress has been published since then.

(The paper hep-th/0511173 which claims to have solved the problem

only consists of a bunch of heuristic arguments. That the author calls

it a proof doesn't turn it into a mathematical proof.)

Yang-Mills theories are (perhaps erroneously) believed

to be the simplest (hopefully) tractable case,

being asymptotically complete while not having the

extra difficulties associated with matter fields.

(There are only gluons, no quarks or leptons.)

Of course, one would like to show rigorously that QED is consistent.

But QED has certain problems (the Landau pole, see below) that are

absent in so-called asymptotically free theories, of which

Yang-Mills is the simplest.

Note that rigorous interacting relativistic theories in 2D and 3D exist;

see, e.g.,

J. Glimm and A. Jaffe,

Quantum Physics: A Functional Integral Point of View,

Springer, Berlin 1987.

This book is quite difficult on first reading.

Volume 3 of Thirring's Course in Mathematical Physics

(which only deals with nonrelativistic QM but in a reasonably

rigorous way) might be a good preparation to the functional analysis

needed. A more leisurely introduction of the physical side of the

matter is in

Elcio Abdalla, M. Christina Abdalla, Klaus D. Rothe

Non-Perturbative Methods in 2 Dimensional Quantum Field Theory

World Scientific, 1991, revised 2nd. ed. 2001.


The book is about rigorous results, with a focus on solvable models.

Note that 'solvable' means in this context 'being able to

find a closed analytic expression for all S-matrix elements'.

These solvable models are to QFT what the hydrogen atom is to

quantum mechanics. The helium atom is no longer 'solvable' in the

present sense, though of course very accurate approximate

calculations are possible.

Unfortunately, solvable models appear to be restricted to 2

dimensions.

The deeper reason for the observation that dimension d=2 is special

seems to be that in 2D the light cone is just a pair of lines.

Thus space and time look completely alike, and by a change of

variables (light front quantization), one can disentangle things nicely

and find a good Hamiltonian description.

This is no longer the case in higher dimensions. (But 4D light front

quantization, using a tangent plane to the light cone, is well alive

as an approximate technique, e.g., to get numerical results from QCD.)

Thus, while 2D solvable models pave the way to get some rigorous

understanding of the concepts, they are no substitute for the

functional analytic techniques needed to handle the non-solvable

models such as Phi^4 theory.


S6e. Constructive field theory


Rigorously defined Lorentz-covariant quantum field theories are

known to exist in 2 and 3 dimensions; the standard reference (for d=2)

is the book by

J. Glimm and A. Jaffe,

Quantum physics. A functional integral point of view

New York, 1981

A recent review of the achievements of constructive

quantum field theory in dimensions < 4 is

V. Rivasseau

Constructive Field Theory and Applications:

Perspectives and Open Problems,

J. Math. Phys. 41 (2000), 3764-3775.

The case d=4 is a famous unsolved problem; the special case of 4D

quantum Yang-Mills gauge theory with a compact, simple, nonabelian

gauge group is one of the Clay Millennium problems with a 1 million

Dollar prize attached to its solution.

Let me explain some aspects of the construction given in

Glimm and Jaffe.

First one needs to understand that the construction breaks the Lorentz

symmetry. This is (although they don't draw this connection) because, in

irreducible Poincare representations, one can construct only three

commuting coordinates, and their construction is observer-dependent,

i.e., dependent on singling out a preferred time. Of course, the final

theory is again Lorentz invariant.

To motivate the construction, one therefore needs to choose a time

coordinate; then one performs an analytic continuation to Euclidean time

(i.e., i*t in place of t), and shows that one gets an SO(4) symmetric

field theory in place of the Lorentz symmetry. The advantage gained is

that the functional calculus over a space with definite metric is

well-defined mathematically (via a limit approach through lattices, or

via Wiener measures) - this is just classical stochastic calculus.

Conversely, and this is the constructive part, given an SO(4) symmetric

field theory, one can choose a direction as Euclidean time and obtain

(via a fairly simple construction detailed in Chapter 7) within that

theory a well-defined Hamiltonian on a suitably constructed Hilbert

space of 3-dimensional fields. This Hamiltonian defines a time

evolution as in ordinary quantum mechanics. The nontrivial part

(this is the Osterwalder-Schrader reconstruction theorem, stated in Chapter 6

but proved much later in the book - the forward references in Glimm

and Jaffe are, unfortunately, quite confusing) is to show that the

resulting theory is Lorentz invariant.

Thus the construction reduces to constructing the Euclidean field

theory. This is done via a lattice regularization; indeed, all lattice

field theory and computation is based on the Euclidean formulation

rather than the Minkowski formulation.

In 2D and 3D, the existing analytic error estimation techniques are

sufficient to prove the existence of the limit with suitably

renormalized operators. In 4D, there are additional technical

problems that have not been overcome so far. But neither has it been

proved that any of the 4D field theories cannot exist. There are some

informal arguments suggesting this or that, but none of them is

conclusive in the sense of having paved the way towards a

construction or a no-go theorem.


S6f. The classical limit in relativistic QFT

The classical limit of a quantum field theory is the

theory defined by taking the Lagrangian occurring in the functional

formalism and making the corresponding action stationary.

Note that a functional integral is an integral in which all

fields have classical meaning. The quantum interpretation comes

from taking the functional integral as a generating functional for

S-matrix elements, while the classical interpretation comes from

taking a saddle point approximation. Since the k loop contributions

scale with hbar^k, they disappear in the classical limit hbar to 0,

so only the tree diagrams are left in the expansion, which correspond

to the saddle point approximation in the functional integral.
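
The statement that only the saddle point survives as hbar goes to 0 can
be checked in a zero-dimensional toy model, where the "functional
integral" is an ordinary integral over one real variable x. The Python
sketch below compares W(J,hbar) = -hbar log integral exp(-(S(x)-J*x)/hbar) dx
with the minimum of S(x)-J*x. The quartic action, the source value J=1,
the integration grid, and all names are assumptions chosen for
illustration only.

```python
import math


def action(x, J):
    """Toy action with source J: S(x) - J*x, where S(x) = x^2/2 + x^4/4."""
    return 0.5 * x * x + 0.25 * x ** 4 - J * x


def grid(lo=-4.0, hi=4.0, n=40001):
    """Equidistant grid on [lo, hi]; returns the points and the step size."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)], step


def tree_level(J):
    """Saddle point value: the minimum of the action over the grid."""
    xs, _ = grid()
    return min(action(x, J) for x in xs)


def W(J, hbar):
    """-hbar * log of the integral of exp(-action/hbar), by a Riemann sum."""
    xs, step = grid()
    values = [action(x, J) for x in xs]
    lowest = min(values)  # factored out so that exp() cannot overflow
    z = step * sum(math.exp(-(v - lowest) / hbar) for v in values)
    return lowest - hbar * math.log(z)


tree = tree_level(1.0)
for hbar in (0.1, 0.01, 0.001):
    print(hbar, W(1.0, hbar), W(1.0, hbar) - tree)
```

The difference in the last column is the toy analogue of the loop
corrections; it shrinks as hbar decreases.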

This needs a slight qualification for Fermions, e.g., electrons.

A fermion field Psi(x) itself, being an anticommuting field,

has no direct classical meaning, but has the numerical advantage

that it is a field in 3 instead of 6 variables. Products of two

Psi terms commute with each other, hence have a direct classical

interpretation. Indeed, classically there is an electron density

field W(x,p) given by the Wigner transform of Psi(x)Psi(y)^*,

where Psi(x) is the classical Grassmann field occurring in the

Lagrangian, satisfying a Dirac equation with an electromagnetic

interaction added. This field W(x,p) is measurable and plays a role

in semiconductor modeling. (In the definition of the Wigner transform,

a second hbar appears, a remnant of second quantization. If one

moves this to zero, too, the description in terms of Psi is no longer

possible, and one gets instead a Vlasov equation for W.)

Thus the classical limit of the standard model is a mathematically

well-defined theory, while the quantum version is only perturbatively

defined, which means, it is mathematically undefined - even for QED.

Nevertheless, the renormalization prescriptions make at least the

coefficients of the asymptotic series in hbar well-defined, which is

what particle physicists use to extract approximate physical

predictions.

In this relaxed sense, the quantum standard model is also well-defined.


S6g. What are interpolating fields?


Traditional QFT has rules for computing reasonable approximations

to the S-matrix of a field theory. The S-matrix describes the behavior

of a state of the system under a transition from time t=-inf to time

t=+inf. But in a complete dynamical theory, one would like to be able

to know what happens in-between at finite times. In nonrelativistic QM,

this information is given by the Schroedinger equation. In QFT it is

given by the interpolating field - called interpolating since it

interpolates between the infinite limiting times.

More precisely, the dynamical information about the interpolating

field is represented mathematically in the Wightman functions,

which give the (renormalized) vacuum expectations of field products

at arbitrary combination of space-time points.

Unfortunately, no one knows how to compute the latter in relativistic

4D quantum field theories. However, Wightman functions have been

constructed rigorously in lower dimension (more precisely

in certain superrenormalizable theories in 2 and 3 dimensions).


S6h. Hilbert space and Hamiltonian in relativistic quantum field theory


Most of current quantum field theory (i.e., everything with exception

of 2D and 3D constructive field theory - which doesn't even cover QED)

does not have a well-defined Hilbert space at all, in which a

time operator would be defined.

Well-defined are only the asymptotic Hilbert spaces of in and out

states for scattering experiments. These are Fock spaces of

free particles, and hence defined on a mass shell.

There is a basic result called Haag's theorem which states that

these asymptotic Fock spaces cannot carry a nontrivial local dynamics,

as would be required for a field theory.

The full dynamics can be defined only indirectly, via CTP (closed

time path) integration, and subject to all interpretation problems

of the renormalization procedures.

Constructing for a relativistic field theory a physical

Hamiltonian which is bounded below is really difficult, and has

been achieved only in less than 4D theories.

The construction is usually based on a preferred time coordinate

which is needed in all cases I am familiar with:

- in the Foldy-Wouthuysen transformation (for the Dirac equation,

where p_0 also fails to have the right properties),

- in the Newton-Wigner construction (for single particles in

an arbitrary massive irreducible representation of the

Poincare group) and

- in the Osterwalder-Schrader reconstruction theorem (for

Lorentz-invariant field theories from Euclidean field theories).

While the Hilbert space and the Hamiltonian depend on the choice of

the time coordinate, the physics is independent of it since all these

Hilbert spaces are isomorphic via isomorphisms that map the

Hamiltonians into each other.


S6i. 2-dimensional quantum field theory


Much of the state of the art in 2-dimensional relativistic quantum

field theories is covered in two books,

Elcio Abdalla, M. Christina Abdalla, Klaus D. Rothe

Non-Perturbative Methods in 2 Dimensional Quantum Field Theory

World Scientific, 1991, revised 2nd. ed. 2001.


J. Glimm and A. Jaffe,

Quantum Physics: A Functional Integral Point of View,

Springer, Berlin 1987.

The first book treats exactly solvable theories, the second book
treats general polynomial interactions. The methods are completely

different in the two cases, and the two books are essentially disjoint.

Unfortunately, both books are somewhat difficult to read.

Abdalla et al. treat those (very special) 2-dimensional quantum field

theories having closed analytic expressions for all S-matrix elements.

These solvable models are to 2-dimensional quantum field theory what

the hydrogen atom is to quantum mechanics. It gives lots of details

about many solvable models, but I found it too specialized to give me

a feeling of general 2-dimensional quantum field theory.

Glimm and Jaffe assume a lot of measure theory and functional

analysis.

This is summarized in Appendix A of their Part I, but working first

through Volume 3 of Thirring's Course in Mathematical Physics (which

only deals with nonrelativistic QM but in a reasonably rigorous way)

would be a good preparation for tackling Glimm and Jaffe.

They construct - rigorously - for 2-dimensional relativistic

Lagrangian scalar field theories with polynomial interaction a Hilbert

space, a well-defined Hamiltonian, a well-defined unitary dynamics,

with well-defined bound states that are eigenstates of the

Hamiltonian,

and everything is invariant under the 2D Poincare group ISO(1,1).

Chapter 3 defines a rigorous version of the path integral for ordinary

quantum mechanics, or rather for the Euclidean version of it, with the

i in the Schroedinger equation dropped. This amounts to analytic

continuation to imaginary time, where everything is easy and

respectable. In place of a hyperbolic differential equation one gets

a parabolic one (the heat equation), which makes things tractable

since the heat kernel is positive and hence the measures needed to

make the path integral rigorous are positive Wiener measures, with a

good rigorous theory.

Quantum field theory starts in Chapter 6. It is presented in a

Euclidean and a Minkowski version, the former being an analytic

continuation of the latter. Both versions are defined axiomatically,

by the Osterwalder-Schrader axioms and the Wightman axioms,

respectively. Again, the Euclidean version is the tractable one,

in which one can generalize the path integral and perform the

estimates needed for proving the existence of all the tools.

The Osterwalder-Schrader theory then guarantees that, given the

satisfaction of the Euclidean axioms, analytic continuation to

the Minkowski case is indeed possible. This is outlined in Section 6.1;

the remainder of the chapter discusses the (easy) special case of

free fields.
Chapters 7-12 and 19 then define the machinery needed to show how

to satisfy the axioms in the case of 2-dimensional relativistic

Lagrangian scalar field theories with polynomial interaction.

Chapter 7 discusses the Gaussian measures that define the Euclidean

path integral of free fields, Chapter 8 presents a rigorous theory of

perturbation theory for Euclidean path integrals, and the remaining

chapters mentioned provide the estimates needed to make sure that

everything works.


S7a. What is the mass gap?


In a relativistic theory, whenever there is a state with definite

4-momentum p, there is also one with definite momentum p' = Lambda p

obtained by applying a Lorentz transform Lambda. The orbit of

4-momenta obtained in this way forms a hyperboloid in the future

cone (because of causality), characterized by a mass m>=0:

p^2=m^2, p_0>0.

This includes as a limiting case massless states with m=0,

where the orbit consists of the future light cone with 0 excluded.

Therefore the possible values of p are characterized by the possible

values of m, which defines the mass spectrum of the theory. The mass

spectrum is the relativistic analogue of the energy spectrum of the

Hamiltonian in a nonrelativistic theory, shifted such that the ground

state has E=0.

The only state with zero momentum is the ground state, usually called

the vacuum. If the values of p^2 for the realizable nonzero p are

bounded below by a positive number, the theory is said to have a mass

gap. The largest value of m>0 for which m^2 is such a lower bound

defines the precise value of the mass gap. Usually there is a state

for which p^2=m^2; this is then interpreted as the state of a

single 'dressed' particle.

In general, the mass spectrum consists of a discrete and a continuous

part. The discrete part of the spectrum corresponds to bound states,

the continuous part to scattering states.

The continuous spectrum starts when there is the possibility of

scattering, which means that the energy is large enough that two

asymptotically independent systems can exist. Given a state of mass

m, one expects to have states with two almost independent systems of

mass m and an arbitrary relative momentum, giving a continuous

spectrum of scattering states with all possible squared momenta

exceeding (2m)^2, as a simple calculation reveals:

If p is the sum of two timelike vectors p1,p2 of mass m (with spatial

parts \p1,\p2) then

p^2 = (sqrt(\p1^2+m^2)+sqrt(\p2^2+m^2))^2 - (\p1+\p2)^2

= 2m^2 + 2 sqrt((\p1^2+m^2)(\p2^2+m^2)) -2\p1 dot \p2

By making \p2=-\p1 one gets arbitrarily large values of p^2, hence

part of the continuous spectrum. The minimum of p^2 must occur by

Cauchy/Schwarz for \p2=\p1, and is then (2m)^2, independent of the

spatial momentum.
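
The two-particle computation above is easy to check numerically. The
following Python sketch evaluates p^2 for the sum of two on-shell
momenta of equal mass m (units with c=1); the function name and the
sample momenta are assumptions made for the example.

```python
import math


def invariant_mass_squared(p1, p2, m):
    """p^2 for the sum of two on-shell 4-momenta of mass m (c = 1).

    p1 and p2 are the spatial momenta; the energies follow from the
    mass shell condition E = sqrt(|p|^2 + m^2).
    """
    e1 = math.sqrt(sum(x * x for x in p1) + m * m)
    e2 = math.sqrt(sum(x * x for x in p2) + m * m)
    total = [a + b for a, b in zip(p1, p2)]
    return (e1 + e2) ** 2 - sum(x * x for x in total)


m = 1.0
k = (0.3, -1.2, 2.0)
print(invariant_mass_squared(k, k, m))  # equal momenta: the threshold (2m)^2
for scale in (1.0, 10.0, 100.0):
    back = (-scale, 0.0, 0.0)
    forth = (scale, 0.0, 0.0)
    print(scale, invariant_mass_squared(forth, back, m))  # grows without bound
```

Equal spatial momenta reproduce the threshold (2m)^2 for any k, while
opposite momenta give 4(m^2+|\p1|^2), which covers the whole continuum
above the threshold.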

Thus the continuous spectrum extends from mass 2m to infinity,

where m is the mass gap.

There may be bound states with mass m_b<2m, forming the discrete

spectrum. These are not scattering states, hence not obtained by

simply adding momenta. For bound states of k particles with masses

m_1,...,m_k, one needs to subtract from (m_1+...+m_k)c^2 the binding

energy of the bound particles. There might be bound states

with mass m_b>2m embedded in the continuous spectrum, but these are

possible only if there are selection rules that forbid the decay into

particles with smaller mass.

In particular, the state of minimal mass m, if it exists, is always

a bound state (including the case of a single particle).

If there is no mass gap, one expects massless dressed particles

to be present. This corresponds to the limiting case m --> 0 of the

above discussion.


S7b. Why can a bound state of massless quarks be heavy?


A system has a well-defined mass if it is in an eigenstate of p^2,

where p is the total momentum operator (whatever this is;

relativistically, bound states are very poorly understood).

So to understand, view it from a nonrelativistic perspective.

Because of E=mc^2, the mass shows up as energy, i.e., as eigenvalue

of the Hamiltonian.

Now a bound state at rest defines the rest energy, and by giving

it uniform motion one can increase the energy by an arbitrary amount

of kinetic energy. The rest energy (and hence the rest mass), on the

other hand, is determined by the discrete spectrum of the Hamiltonian

in reduced coordinates, i.e., with center of mass motion separated out.

For forces that decay with distance, a bound state necessarily has

a mass that is less than the sum of the masses of the constituents.

For particles involving quarks, this does not apply since the strong

force increases with distance. Hence the rest mass of a bound state of

quarks could be anything.


S7c. Bound states in relativistic quantum field theory


Bound states are supposed to be poles of the S-matrix, and

Bethe-Salpeter equations for the bound state dynamics can be

obtained approximately from resumming infinite families of

Feynman diagrams. See Chapter 14 of Weinberg's QFT I. But...

Perturbative QED (even in Scharf's rigorous treatment)

has nothing at all to say about how to model bound states.

Bound states don't exist perturbatively: The poles in the S-matrix

can arise only by summing infinitely many Feynman diagrams.

(Sum the geometric series 1+x+x^2+... to see how poles arise by

summation.)

I haven't seen a single rigorous treatment of such an issue in

quantum field theory.
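
The geometric series remark can be made explicit in a few lines of
Python. Every finite order of the expansion is a polynomial in x and
hence finite everywhere; only the resummed expression 1/(1-x) has a
pole at x=1. The function names are illustrative, and the series is of
course only a caricature of a sum of Feynman diagrams.

```python
def partial_sum(x, order):
    """1 + x + ... + x**order: a polynomial, finite for every finite x."""
    return sum(x ** k for k in range(order + 1))


def resummed(x):
    """Closed form 1/(1 - x) of the full series, with a pole at x = 1."""
    return 1.0 / (1.0 - x)


# Inside the radius of convergence both agree ...
print(partial_sum(0.5, 60), resummed(0.5))

# ... but near x = 1 no fixed order reproduces the blow-up of the sum.
for x in (0.9, 0.99, 0.999):
    print(x, partial_sum(x, 10), resummed(x))
```

At x=1 itself the order-10 polynomial is simply 11, while the resummed
expression is undefined: the pole is invisible at any finite order.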

Weinberg states in his QFT book (Vol. I) repeatedly that bound state

problems (and this includes the Lamb shift) are still very poorly

understood (though the Lamb shift is one of the most accurately

predicted physical quantities). On p.564 he says,

'These problems are those involving bound states [...]

such problems necessarily involve a breakdown of ordinary

perturbation theory. [...] The pole therefore can only arise

from a divergence of the sum of all diagrams [...]'

On p.560, he writes,

'It must be said that the theory of relativistic effects

and radiative corrections in bound states is not yet in an

entirely satisfactory shape.'

This remark suggests that he seems to think that, in contrast,

for scattering problems, the theory is in an entirely satisfactory

state, as given in the rest of his book. Thus 'satisfactory'

does not mean 'mathematically rigorous', but only

'well understood from a physical, approximate point of view'.

There are, of course, methods for approximating bound state

problems,

based on Bethe-Salpeter equations, Schwinger-Dyson equations, and

some other approaches. See, e.g., the review

H. Grotch and D.A. Owen,

Foundations of Physics 32 (2002), 1419-1457.

or hep-ph/0308280.

But all of this is done in completely uncontrolled approximations,

and to get numerically consistent results is currently more an

art than a science.

This leaves plenty of scope for interesting (but hard)

new work on bound states on both the physical and mathematical

side.

S8a. Why renormalization?


Quantum field theory is what particle physicists define it to be, and

this includes many working interacting QFTs. But it is not a theory

in the mathematical sense. This is due to the freedom they take

when discussing the renormalization needed to remove formal

infinities from their theories.

Finite renormalization just refers to the fact that the coefficients

in a Hamiltonian are not directly measurable but only computable as

function of some key observables. It is simply a consequence of the

historical accident that these coefficients were given names (masses,

charges) that sound like real properties, while they are in fact

indirectly related to them.

Thus in solid state physics one gets bare masses of quasiparticles

from the coefficients of a Hamiltonian, but they are just parameters

and related to the measurable masses by some transformation, which

is dubbed the finite renormalization.

Infinite renormalization is needed in ordinary QM when the potential

gets too singular, for example with delta-function potentials that

model contact interactions. Hardly ever discussed in textbooks but

important for understanding. See, e.g., hep-th/9710061, or Chapter I.3

of

R. Jackiw,

Diverse topics in theoretical and mathematical physics,

World Scientific, Singapore 1995.

A paper by Dimock (Comm. Math. Phys. 57 (1977), 51-66) shows


that, at least in 2 dimensions, delta-function potentials define

the correct nonrelativistic limit of local scalar field theories.

In mathematical terms, infinite renormalization means that the

interaction is a limit of regularized interactions related to fixed

measurable quantities by finite transformations which, however,

diverge when the regularization is removed. The limiting interaction

remains, however, well-defined as a densely defined operator in

Hilbert space.
For exactly the same reason it is needed in relativistic QFT, since

local fields imply singular interactions. But in 4 dimensions, the

limiting process is not well understood mathematically.

In 1+1 dimensions, everything is well-defined mathematically

in terms of rigorous renormalization theory, for arbitrary polynomial

interactions. (See the book by Glimm and Jaffe).

The 1+2-dimensional case is significantly more difficult and needs

a restriction on the polynomial degree. There is a nontrivial

renormalization theory for Phi^4 theory, which is mathematically

well understood.

Only the 1+3 dimensional case is at present completely open.

What is loosely called 'infinite' in traditional discussions of

renormalization means, strictly speaking, only that the limit where

a cutoff goes to infinity does not exist. At any finite value of the

cutoff, both the Hamiltonian and the counterterms are finite.

If it were not so, one couldn't do renormalization and get something

finite. The problem solved by Tomonaga, Schwinger and Feynman,

for which they got the Nobel prize, was that they discovered how to
produce a well-defined limiting theory for cutoff to infinity

which allows one to extract finite values for quantities that can be

compared with experiment.

All renormalization until today follows the same pattern.

One does certain formal computations at

finite cutoff and, at some point where it no longer does harm,

moves the cutoff to infinity, being left with approximate

formulas at some (fixed or variable) loop order which no

longer contain a cutoff and have finite values.


S8b. Renormalization without infinities I


Renormalization in QFT is often associated with the need to handle

infinities. This makes everything look like nonsense from a

mathematical point of view. But this is just the sloppiness of

physicists; it is not difficult to get a satisfying view of

renormalization without encountering any weird infinities.

The basic principles can be explained without knowing anything about

quantum mechanics, since renormalization is a much more general

phenomenon associated with idealizations in a theory and the

corresponding limits. As such it is also needed in various classical

situations (classical point electrons, turbulence, etc.)

hep-th/0212049 is a nice paper discussing most of renormalization

without ever mentioning fields (which come in quite late).

In all cases, we want to describe a situation which is a limit of more

complex and often less symmetric situations. This limit is the only

problematic thing, and sometimes generates infinities if done in an

improper way. Just as when trying to compute

s_N = sum_{k=0:N} (-1)^k/(k+1)^s = u_N - v_N

by summing the even and odd contributions u_N and v_N separately.

The limit N to inf is well-defined for s>0, but can be obtained only

for s>1 by going to the limit in u_N and v_N separately.
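
A short Python sketch makes this visible for s=1/2, where the
alternating sum converges but its even and odd parts do not. The
function name and the chosen values of N are assumptions made for
illustration.

```python
def split_sums(N, s):
    """Return (u_N, v_N), the even-k and odd-k parts of
    s_N = sum_{k=0}^{N} (-1)^k / (k+1)^s, so that s_N = u_N - v_N."""
    u = sum(1.0 / (k + 1) ** s for k in range(0, N + 1, 2))
    v = sum(1.0 / (k + 1) ** s for k in range(1, N + 1, 2))
    return u, v


for N in (10**2, 10**3, 10**4, 10**5):
    u, v = split_sums(N, 0.5)
    print(N, u, v, u - v)
```

The columns u_N and v_N grow roughly like sqrt(N), while their
difference settles down. Taking the limit in u_N and v_N separately
would give the meaningless expression inf-inf.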

One needs to proceed similarly to techniques for evaluating limits which

give naively inf-inf, by using some transformation that cancels the

infinities analytically. Example:

lim sqrt(n^2+n)-sqrt(n^2+1)

= lim ((n^2+n)-(n^2+1))/(sqrt(n^2+n)+sqrt(n^2+1))

= lim (n-1)/(sqrt(n^2+n)+sqrt(n^2+1)) = 1/2.
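
The same point shows up in floating point arithmetic. In the Python
sketch below (function names are illustrative, and n is assumed to be a
float), the naive difference of two huge numbers loses all accuracy for
large n, while the analytically transformed expression stays close to
1/2.

```python
import math


def naive(n):
    """Direct evaluation: a difference of two large, nearly equal numbers."""
    return math.sqrt(n * n + n) - math.sqrt(n * n + 1)


def rationalized(n):
    """The same quantity after cancelling the large parts analytically."""
    return (n - 1) / (math.sqrt(n * n + n) + math.sqrt(n * n + 1))


for n in (1e1, 1e4, 1e8, 1e12, 1e17):
    print(n, naive(n), rationalized(n))
```

For moderate n both columns agree; for very large n only the second
one remains meaningful, because the cancellation was done before any
numbers were inserted.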

In quantum physics, the data (the Hamiltonian in QM, the action in

QFT) depend on some parameter vector v of dimension d, say, without

direct physical meaning. For example, v may consist of bare mass,

bare charge, and bare coupling constant.

Without the renormalization conditions we get a family of solutions

parameterized by v from which we can compute measurable quantities

combined into a vector q=q_N(v) of some dimension e>d,

where N is the parameter in which we want to take the limit.

(N might be an energy cutoff at energies beyond observability, and q

the observed particle spectrum.)

Anything we can reliably measure must clearly be essentially

independent

of N, once N is large enough. Therefore the equation q=q_N(v) defines

a (generically) d-dimensional manifold in R^e whose limit as a set is also

a well-defined d-dimensional manifold. This is the manifold of interest,

since picking a particular finite value for N is usually subjective.

In a theory with finite renormalization, this limit manifold can still

be parameterized by v, since the limit

q(v)= lim_{N to inf} q_N(v) (*)

exists. Although v is unobservable it can be calculated from the

measurements by solving the equation q=q(v) in the least squares sense.


Rather than doing that (which would be numerically best in case the

measurements are inexact or q(v) is not exactly known) one proceeds

in theoretical work as if a d-dimensional vector mu of key physical

data and a corresponding subset of d equations were known exactly,

and can be solved exactly for v=v(mu).

Then one gets a renormalized parameterization

q=q_ren(mu), with q_ren(mu)= q(v(mu)), (**)

expressing everything in terms of the physical parameters mu.

When the limit (*) does not exist, the situation is more complicated.

Since there is no limiting q, one has to work at finite N. Proceeding

as before, one solves d of the equations in q=q_N(v) for v, getting

v=v_N(mu), but since the limit (*) does not exist, there will also be no limit


v(mu) = lim_{N to inf} v_N(mu)

which would enable the use of (**). Instead, v_N(mu) diverges.

Loosely speaking, we get infinite bare masses and bare coupling

constants. But this limit will never be used, hence there are no
problems. It is just the loose way of speaking that creates the

impression of weirdness. The 'infinities' are caused by the nature

of the interactions. If they are too singular for a standard treatment

then the limits needed for a finite renormalization simply do not

exist anymore.

But this does not mean that the theory becomes meaningless but only

that one has to be careful in performing the limit only where it is

allowed to do so. This requires a small change in our procedure.

At finite N, we can still define a renormalized parameterization


q = q_{N,ren}(mu), with q_{N,ren}(mu)= q_N(v_N(mu)).

For a renormalizable theory, the limit

q_ren(mu) = lim_{N to inf} q_{N,ren}(mu)

exists although neither q_N nor v_N converge.

Once this limit replaces the naive bare recipe (*)-(**) which is

ill-defined, everything behaves properly as it should.
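A toy illustration may help. The model below is purely hypothetical (the functions q_N, v_N and q_N_ren and the logarithmic divergence are assumptions chosen for simplicity, with d=1 and e=2): neither q_N(v) nor v_N(mu) has a limit, but the renormalized parameterization does.

```python
import math


def q_N(v, N):
    """Hypothetical toy observables (e = 2) from one bare parameter v (d = 1)."""
    bare = v + math.log(N)  # diverges as N grows at fixed v
    return (bare, bare ** 2 + 1.0 / N)


def v_N(mu, N):
    """Solve the first equation, q_1 = mu, for the bare parameter."""
    return mu - math.log(N)  # diverges as N grows at fixed mu


def q_N_ren(mu, N):
    """Renormalized parameterization q_{N,ren}(mu) = q_N(v_N(mu))."""
    return q_N(v_N(mu, N), N)


if __name__ == "__main__":
    mu = 1.5
    for N in (1e2, 1e6, 1e12):
        print(N, v_N(mu, N), q_N_ren(mu, N))
```

In this toy case q_{N,ren}(mu) tends to (mu, mu^2), so the second observable is predicted from the first without any infinite quantity ever being used.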

The situation may be slightly more complex than indicated above.

Instead of working with directly measurable quantities one often

works with formally more tractable quantities q that are finitely

related to the key measurable quantities mu (such as observed mass

spectra). However, their definition depends on an additional scale

parameter E that fixes the renormalization conditions. (This parameter

should not be mixed up with the cutoff energy, which after

renormalization is always infinite!)

Thus we actually have q=q_N(v,E), solve some of these equations for

v=v_N(mu,E), and get as a result

q = q_{N,ren}(mu,E), with q_{N,ren}(mu,E)= q_N(v_N(mu,E),E),

and in the limit

q_ren(mu,E) = lim_{N to inf} q_{N,ren}(mu,E).

But since the scale E can be chosen arbitrarily, the final renormalized

result of physical predictions P(q,E) must be

independent of E. Thus,

d/dE P(q_ren(mu,E),E) = 0,

which is a form of the renormalization group equations.
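Written out with the chain rule, this condition takes the following form. This is only a sketch; the abbreviation beta_i is notation assumed here for illustration and does not appear in the text above.

```latex
\frac{d}{dE} P\bigl(q_{\rm ren}(\mu,E),E\bigr)
 = \frac{\partial P}{\partial E}
 + \sum_i \frac{\partial P}{\partial q_i}\,
   \frac{\partial q_{{\rm ren},i}}{\partial E}(\mu,E) = 0,
\qquad\text{i.e.}\qquad
\Bigl(E\frac{\partial}{\partial E}
 + \sum_i \beta_i \frac{\partial}{\partial q_i}\Bigr) P = 0,
\quad
\beta_i := E\,\frac{\partial q_{{\rm ren},i}}{\partial E}.
```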

To get a renormalized Hamiltonian, one also needs wave function

renormalization, which means using a cutoff-dependent inner

product in the space of wave functions (in the functional

Schr"odinger picture). The limiting Hamiltonian is perturbatively

well-defined in the physical Hilbert space obtained as limit of

renormalized Hilbert spaces at finite cutoff, as the cutoff goes

to infinity.

S8c. Renormalization without infinities II


In bare (divergent) QFT, infinities arise because integrals taken over

unbounded momenta don't exist; so doing it leads to nonsense.

Instead, proper QFT takes regularized integrals, for example by

adding an explicit cutoff Lambda. This simply means that everything is

calculated with an action that depends on Lambda as an additional

parameter. Once this is done, everything is finite, but depends on Lambda.


The only problem with that is that the cutoff destroys Lorentz

covariance - apart from that it would be a completely respectable

field theory in itself. Now Lorentz invariance is violated only

at energies >O(Lambda); hence to have the theory conform to

physics that can be checked it suffices to take Lambda large.

But for aesthetic reasons or since we believe that symmetries are

fundamental, we want to have fully invariant theories. This requires

that we let Lambda go to infinity.

But in order that the results have a finite limit we must at the
same time make the coupling constants g dependent on Lambda.

If this is done in a correct way (and the textbooks on QFT teach

one or more of the known correct ways under the heading of

'renormalization'), one encounters no infinities at all in the

whole process.

Thus renormalized quantities are never infinite.

The essentials of the renormalization process, namely the need for

Lambda-dependent coupling constants for sufficiently singular

Hamiltonians, can be understood nonperturbatively on the

nonrelativistic level.

What happens is that one has a family of Hamiltonians

H(Lambda,g) that depend on a scale parameter Lambda and a coupling

constant g (or several). H(Lambda,g) has a good limit H(g) as Lambda

to inf, with g fixed, but the corresponding limit of the resolvent

G(Lambda,g) does not exist; hence if one tries to do calculations with

H(g) directly (the 1930 way of doing things, which was a dead end),

one gets infinities all over the place.

On the other hand, if one chooses a good parameterization g=g(Lambda,mu)

then, although H(Lambda,g(Lambda,mu)) has no longer a good limit as

Lambda to inf, its resolvent G(Lambda,g(Lambda,mu)) has a well-defined

limit G(mu). (At least in 1D and 2D field theory, where this can

be proved in certain cases. In 3D and 4D, one probably needs also

a Lambda-dependent inner product defining the Hilbert space

to ensure that one ends up in the right representation,

and Lambda-dependent wave functions to ensure that the limiting

renormalized wave functions remain bounded in the limiting

renormalized inner product.)

Since all dynamical information including scattering information

is in the resolvent, G(mu) defines a good physical model for a

scattering process.

In some simple cases, renormalization can be done nonperturbatively.

For example, standard perturbation theory for a Hamiltonian

p^2/2m +g delta(x) produces infinities. The renormalization of this

particular example is treated nonperturbatively in hep-th/9305052.

Thus, infinities only appear if one takes the limit in a way it

cannot be taken consistently.
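As a concrete sketch of such a nonperturbative renormalization, consider a delta potential in two space dimensions with a sharp momentum cutoff. The choice of two dimensions, the units hbar=1, the sign convention T = g/(1 + g I), and all function names are assumptions made here for illustration; they are not taken from the paper cited above. The coupling g(Lambda) is fixed by demanding a bound state at energy -kappa_b^2/(2m); it tends to zero as Lambda grows, while 1/T at any other energy has a finite limit.

```python
import math


def loop_integral(kappa, cutoff, m=1.0):
    """I = int_{|k|<cutoff} d^2k/(2 pi)^2 / (k^2/2m + kappa^2/2m), hbar = 1."""
    return (m / (2.0 * math.pi)) * math.log(1.0 + (cutoff / kappa) ** 2)


def bare_coupling(cutoff, kappa_b, m=1.0):
    """g(Lambda), fixed by a bound state at energy -kappa_b^2/(2m)."""
    return -1.0 / loop_integral(kappa_b, cutoff, m)


def inverse_t_matrix(kappa, cutoff, kappa_b, m=1.0):
    """1/T = 1/g + I at energy -kappa^2/(2m), with the cutoff-dependent g."""
    return 1.0 / bare_coupling(cutoff, kappa_b, m) + loop_integral(kappa, cutoff, m)


def inverse_t_matrix_limit(kappa, kappa_b, m=1.0):
    """Closed form of the limit cutoff -> infinity."""
    return (m / (2.0 * math.pi)) * math.log((kappa_b / kappa) ** 2)


if __name__ == "__main__":
    for cutoff in (1e1, 1e3, 1e6):
        print(cutoff, bare_coupling(cutoff, 1.0),
              inverse_t_matrix(2.0, cutoff, 1.0))
    print("limit:", inverse_t_matrix_limit(2.0, 1.0))
```

The loop integral alone diverges logarithmically with the cutoff; only the combination with the cutoff-dependent coupling converges.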

Of course, the relativistic case is more involved and at present

not understood nonperturbatively, but there is no difference in principle.


The local interaction of the formal Lorentz invariant action is

replaced by a nonlocal interaction depending on the UV cutoff Lambda.

Thus one has V(g,Lambda) in place of V(g), where g are the coupling

constants (including masses).

To do so, one writes the (Euclidean = Wick rotated) field as

Phi(x) = integral dp exp(-i p dot x) Phihat(p)

and substitutes it into the action. This gives an action in the

momentum representation. Then one regularizes the interaction term by


throwing away the momenta above some cutoff Lambda.

Introducing the cutoff makes the interaction nonlocal, as one can see

by going from the momentum representation of the regularized

interaction term back to the position representation by substituting

Phihat(p) = const * integral dx exp(i p dot x) Phi(x).

Instead of the delta functions which would appear without the

cutoff there are now explicit nonlocal potential terms.
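The simplest instance can be made explicit. Assuming one dimension and a sharp cutoff (both assumptions of this sketch, as is the function name), the delta function is replaced by the kernel sin(Lambda x)/(pi x), which has height Lambda/pi and width of order 1/Lambda.

```python
import math


def smeared_delta(x, cutoff):
    """(1/2pi) * integral_{-cutoff}^{cutoff} dp exp(i p x) = sin(cutoff*x)/(pi*x)."""
    if x == 0.0:
        return cutoff / math.pi  # limiting value at the origin
    return math.sin(cutoff * x) / (math.pi * x)


if __name__ == "__main__":
    for cutoff in (1.0, 10.0, 100.0):
        first_zero = math.pi / cutoff  # the kernel spreads over this distance
        print(cutoff, smeared_delta(0.0, cutoff), first_zero)
```

At any finite cutoff the kernel connects points a distance of order 1/Lambda apart, which is the nonlocality referred to above.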

(Note that Coulomb interaction in nonrelativistic QFT is nonlocal.

See also

H. Ekstein, Phys. Rev. 117, 1590-1595 (1960)

for more on nonlocal interactions and relations to the S-matrix.)

(But actually one does not need to care about locality or not,

since the regularized interaction in the momentum representation

is mathematically ok and one can do everything else in this representation.)


More precisely, one starts with the smeared Lagrangian interaction

defined by the cutoff, uses the representation of the S-matrix as

a time-ordered exponential to work out the corresponding

Hamiltonian interaction in the interaction picture, and takes this

as definition of the regularized dynamics. (Note that Haag's theorem,

which asserts that a nontrivial Lorentz-invariant theory satisfying

microlocality cannot have an interaction picture, does not apply since

the theory with cutoff is neither Lorentz invariant nor microlocal.)

From here on, one can do standard perturbation theory without

encountering any infinity at all; one gets meaningful

formulas throughout the whole renormalization procedure.

All contributions to the S-matrix elements of this regularized theory

are finite, and give (after analytic continuation to real time)

the S-matrix of the regularized interaction.

The result is an asymptotic series S(g,Lambda) for the S-matrix of

the regularized interaction, with finite, computable coefficients.

This S-matrix is unitary and has all properties one would like to have,

except that, because of the cutoff, it is only approximately Lorentz invariant.


Of course, for a general nonlocal theory in position representation,

one gets more complicated Feynman rules than those traditionally

written down. In momentum space, the formulas become the standard

formulas, but with explicit cutoff included. Thus to do the suggested

exercise, one should always work in the momentum representation.

To restore Lorentz invariance, one uses a running coupling constant

g=g(Lambda,mu) which, for fixed renormalization point mu (a vector of

the same dimension as g containing the free constants in the matching

of the renormalization conditions), is uniquely determined

(for any fixed renormalization scheme) as the solution of a

renormalization group equation whose coefficients are also defined

as a (presumably even convergent) asymptotic expansion.

Having this, one can take the limit

S(mu) = lim_{Lambda to inf} S(g(Lambda,mu),Lambda)

which is an asymptotic series in hbar with finite, computable

coefficients when the theory is renormalizable, and is Lorentz

invariant and microlocal.

Thus one gets the desired Lorentz invariant, microlocal theory

as a perturbatively well-defined limit of perturbatively well-defined

but not Lorentz invariant or microlocal theories.

At the very end one can pass to the limit, but not earlier.

The only infinity encountered is not worse than the infinity

encountered in defining Riemann integrals over the real line,

where one also gets a finite limit by letting a finite cutoff go to infinity.


The real mathematical difficulties in QFT are not in the renormalization

procedure but in giving a nonperturbative construction of the S-matrix



S8d. Renormalization and coarse graining


In QFT, there are two different scales, one on the bare level and one
on the renormalized level, and the meaning of the renormalization

group is slightly different from that in statistical mechanics.

On the statistical mechanics level, there is the cutoff beyond which

one cannot (or does not want to) observe anything. This effective

cutoff is a parameter Lambda in an effective theory defined by coarse

graining.

The effective theory depends on Lambda: For different values of Lambda you get a

_different_ effective theory, though their low energy predictions are

essentially the same. This is expressed by the Wilson flow, described

by renormalization group equations that relate the parameters

g(Lambda,mu) in the different effective theories such that some key

low energy observables mu keep the same values.

The number of such key observables (i.e., the dimension of mu)

equals the number of parameters in the effective theory

(i.e., the dimension of g); most other observables are different

at different cutoffs (though only slightly if they are observable at

low energy), because of the coarse graining done when lowering

the cutoff scale Lambda.

In QFT, the above is mimicked on the _bare_ level. The cutoff is a

large energy Lambda beyond which the bare interaction is modified to

be able to get a meaningful limit; this corresponds to coarse-graining.

The resulting bare theory with cutoff Lambda is a well-defined

effective theory and behaves precisely as described above.

To define the renormalized theory, one needs, in addition to the

cutoff, renormalization conditions defining the bare parameters in

terms of renormalized parameters q.

These conditions depend on a renormalization scale E figuring in the

equations defining the renormalization conditions. Because of the

dimensional nature of momentum, there always has to be such a

parameter E, no matter which renormalization procedure is followed.

In QFT, one usually refers to a mass scale M, which is the same as

E=Mc^2 in units such that c=1. Then M is the constant needed in the

renormalization conditions to relate certain computable expressions

to the renormalized parameters. This is discussed at length in

the QFT book by Peskin and Schroeder, Section 12.2, for a massless

Phi^4 theory, and in Section 12.5 for the general case. (For an online

source, see, e.g., equations (9)-(11) of hep-th/9804079.

M is introduced there without comment; the role of M is described

later, after (20).) In the following, I continue to use E in place of M.

Thus the bare parameters are functions g(Lambda,q,E) of the cutoff

Lambda, the renormalized parameters q, and the renormalization scale E.

The renormalization group equations in the statistical mechanics

sense (the Wilson flow) would describe how g(Lambda,q,E) changes as

the cutoff Lambda is altered. However, in QFT, this is of no physical

interest. Indeed, Lambda is completely eliminated from considerations:

The renormalized theory is obtained at fixed E by letting the cutoff

Lambda go to infinity. This has the effect that the bare parameters

become meaningless, since the limit

lim_{Lambda to inf} g(Lambda,q,E)

does not exist. At this stage it becomes obvious that all bare objects

are unphysical.

Although nonphysical, the renormalization group equations in

Lambda are an important tool in the _construction_ of QFTs, where the

limit of all correlation functions must be shown to exist in a

suitable topology, and the absence

of divergences shown. In the weakest topology, based on the

ultrametric norm and corresponding to perturbation theory at all

orders, this is shown rigorously in a nice book

M. Salmhofer,

Renormalization: An Introduction,

Springer, Berlin 1999.

Unfortunately, this topology is too weak to give the existence of

the correlation functions as functions; they are only shown to exist

as formal power series.

All expressions of the theory that survive the limit, in particular

all n-point correlation functions, n=1,2,3,...,

describe observable physics. They can therefore be expressed as

functions of q and E only, whose detailed form comes from the

standard theory. However, there is a little twist since the scale E

can be chosen arbitrarily, hence cannot be measurable.

In terms of a fixed set of physical parameters mu (measurable

under well-defined experimental conditions), we can predict mu

by some function of q and E, mu=mu(q,E). Solving for q, we can

express q in terms of mu and E,

q = q_ren(mu,E).

But the exact renormalized result of a physical prediction P(q,E)

must be completely independent of E, uniquely determined by the

physical parameters mu. Thus,

d/dE P(q_ren(mu,E),E) = 0,

which are the Callan-Symanzik equations, the renormalization group

equations of interest in quantum field theory.

In contrast to the Wilson flow, however, the sliding scale in the

Callan-Symanzik flow is the renormalization scale E and _not_ the

cutoff Lambda (which at this stage is already infinite). Moreover,

since observable physics is completely independent of the

renormalization scale E, the latter has no intuitive 'physical' meaning.


There is no relation between the two flows, except by analogy.

The Wilson flow is needed to _get_ the renormalized theory

at fixed renormalization conditions, the Callan-Symanzik flow

describes what happens when you _change_ these conditions.


S8e. Renormalization scale and experimental energy scale


The picture drawn in the preceding is somewhat incomplete with

regard to the practice of computing, due to the fact that we cannot

compute this renormalized theory at any E, since it is exceedingly


Thus we need to consider approximations. These approximations are

no longer independent of E, since the approximation errors depend

on it. It turns out that the approximation errors are small only

when the energy scale of the experiment for which a prediction is

made is close to the renormalization scale E, since (see, e.g.,

Weinberg's QFT book, Vol. 2, Chapter 18.1) the perturbative

expansion contains arbitrary powers of log(E_experiment/E) which

therefore must be kept small.

Thus one needs to evaluate the theory near the scale of interest.

However, perturbation theory is valid only near a fixed point E^* of

the renormalization group equations. Therefore, one determines

approximate formulas for the quantities q_ren(mu,E) with E close to

the appropriate fixed point E^*, and then uses (also approximate)

renormalization group equations to transform the result to the

scale of interest.

Thus there are two different scales involved, the energy scale

E_exp where the experiments are done, and the renormalization scale

E_ren (previously denoted by E).

On the experimental side, coupling constants (such as the charge)

are determined with reference to some effective, coarse grained theory


(such as the nonrelativistic Schroedinger equation). This effective

theory depends on E_exp (for QED, the charge is traditionally defined

in the low energy limit E_exp to 0). This effective theory behaves

like any other coarse-grained theory, giving rise to running coupling

constants such as e=e_exp(E_exp). But these depend on the details of

the coarse-graining scheme, and the computed results depend on the

coarse-graining, too, and hence on E_exp.

The experimental running coupling constants are only loosely related

to the running coupling constants such as e=e_ren(E_ren) obtained by

the Callan-Symanzik equation (= the renormalization group equation

in terms of the renormalization scale E_ren). The latter are, in theory,

uniquely defined by the renormalization prescription. There the

coupling constants are defined not by an experimental prescription

but as parameters in the renormalization prescription. For example,

in Phi^4 theory, lambda=lambda(M) is defined by equation (12.30) in

Peskin/Schroeder (and E_ren=Mc^2), and the charge e=e(M) in QED

by (10.39) [but at spacelike momentum p^2=-M^2 as in Chapter 12].

As discussed, the physical predictions at any energy are completely

independent of M if e(M) and the other renormalized parameters slide

with M. At least this would be the case in a fully nonperturbative

calculation (which we cannot do). However, the few-loop approximations

depend heavily on M, and give a reasonable approximation to the

exact theory _only_ at energies close to E_ren=Mc^2. Thus the few-loop

approximation behaves just like an effective theory, provided we choose

E_ren = E_exp (or close). But the analogy is not complete since

in a true effective theory we could choose the coarse-graining scale

anywhere at or above E_exp, while for good few-loop approximations

we need to choose it always close to E_exp.

Thus, if one could solve the equations exactly, the dependence on M

and the Callan-Symanzik equation would be completely irrelevant,

and nothing at all could be extracted from it. But in practice one

can work only at few loops, and then different values of M may give

vastly different results, and the equation is very useful since it

enables one to work with the right M.

The renormalization group equations are used to move from an

M near the fixed point (where one can do perturbation theory and has

reliable few-loop calculations but where the approximation errors =

the higher order terms in the perturbation series are huge)

to an M near the experimental scale (where the approximation error

is small, and the few-loop calculation therefore reasonably accurate).

This is often expressed by saying, loosely, that the renormalization

group approach partially resums the perturbation series.

One gets what is called 'renormalization group improved perturbation

theory', which is predictive about a much larger range of coupling

constants than simple renormalized perturbation theory (which only

works for very weak coupling).
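The resummation of logarithms can be illustrated with the one-loop running of a QED-like coupling. The sketch below assumes a single charged Dirac fermion of unit charge, for which d alpha/d ln E = (2/(3 pi)) alpha^2 at one loop; the function names and the deliberately large sample coupling are illustrative assumptions. The fixed-order expansion is a geometric series in alpha0*log(E/E0), and the renormalization group solution sums it in closed form.

```python
import math

# One-loop coefficient for one Dirac fermion of unit charge:
# d(alpha)/d(ln E) = B * alpha**2
B = 2.0 / (3.0 * math.pi)


def alpha_resummed(alpha0, log_ratio):
    """Solution of the one-loop flow equation, with log_ratio = ln(E/E0)."""
    return alpha0 / (1.0 - B * alpha0 * log_ratio)


def alpha_fixed_order(alpha0, log_ratio, order):
    """The same quantity truncated at a given power of alpha0 * log_ratio."""
    x = B * alpha0 * log_ratio
    return alpha0 * sum(x ** k for k in range(order + 1))


if __name__ == "__main__":
    alpha0, log_ratio = 0.3, math.log(100.0)  # hypothetical strong-ish coupling
    for order in (0, 1, 2, 5):
        print(order, alpha_fixed_order(alpha0, log_ratio, order))
    print("resummed:", alpha_resummed(alpha0, log_ratio))
```

When alpha0*log(E/E0) is not small, low orders are visibly off, whereas choosing the renormalization scale near the experimental scale makes the logarithm, and hence the correction, small.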


S8f. Dimensional regularization


The neatest way to perform regularization, and the only one which

works well in complicated cases such as nonabelian gauge theories

is dimensional regularization. Unfortunately, it is presented

in most textbooks in a way that looks quite mysterious, involving

unphysical fractional dimensions. This is however just sloppiness

on the side of physical tradition, and a more rigorous approach

removes everything strange.

The rules for dimensional regularization are derived in Euclidean

space rather than Minkowski space. To get the latter, one needs an
additional analytic continuation.

For p in Euclidean d-space (d>0 integral), we put p^2=p^Tp.

If d is a positive integer and f(p^2) is integrable (i.e. decays fast

enough), then standard Lebesgue integration gives the formula

integral dp^d/(2 pi)^d f(p^2)

= C_d integral_0^inf dr r^{d-1}f(r^2), (1)

where C_d is given in terms of the Gamma function as

C_d = 2 pi^{d/2}/((2 pi)^d Gamma(d/2)). (2)

We observe that the formula (2) makes sense for arbitrary complex d

with nonnegative real part, and that therefore for

f(r^2)=r^{2j}/(r^2+m^2)^n, n>j+d/2,

the well-defined right hand side of (1) is an expression I(d,j,n)

which depends analytically on d,j,n.

In particular, the cases j=0 and j=1 lead to the expressions

given in P/S (7.85/86). A similar reasoning produces (7.87) and

more complicated rules analogous to those given in P/S on p.807

(where, however, analytic continuation to Minkowski space has

already been performed). These rules, together with the

Feynman trick stated as (A.39) on p.806 of P/S, can be used to evaluate

integrals of arbitrary rational Lorentz-invariant expressions

provided that they decay fast enough.

Note that the resulting formula

integral dp^d/(2 pi)^d f(p^2) = I(d,j,n) (3)

is valid only for positive integral d and n>j+d/2; the latter ensures

sufficiently fast decay at infinity to make the Lebesgue integral well-defined.

For other values the above computations are meaningless, and any

contradiction derived from it is therefore irrelevant.

As irrelevant as the well-known fact that a divergent alternating

infinite sum can be given any value whatsoever by formal


Remarkably, however, I(d,j,n) (and the analogous formulas on

p.807) can be analytically continued to the interesting case

d=4-eps. This allows us to _define_ an _extended_ Lebesgue integral

for d=4-eps by the formula

integral dp^d/(2 pi)^d p^{2j}/(p^2+m^2)^n := I(d,j,n) (4)

and similar expressions for arbitrary rational Lorentz-invariant

expressions. Moreover, if these expressions happen to have

good limits for eps to 0 (which cannot happen for (4) but for

suitable linear combinations) they define the value also for d=4.

The derivation ensures that it gives the correct results in all

cases where the integral makes sense in the traditional (Lebesgue) sense.

Thus we have defined a consistent extension of the Lebesgue

integral of Lorentz-invariant expressions to the singular case.

This is similar in spirit to Lebesgue's extension of the Riemann

integral to the Lebesgue integral.
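The construction can be tried out numerically. The sketch below assumes the standard Beta-function closed form for I(d,j,n), with the angular factor taken to include the 1/(2 pi)^d of the measure; the function names and the midpoint quadrature are illustrative choices. It checks the closed form against the radial integral for integer d, then evaluates it near d=4, where a single integral has a 1/eps pole but a difference of two integrals with different masses stays finite.

```python
import math


def C(d):
    """Angular factor: area of the unit sphere in d dimensions over (2 pi)^d."""
    return 2.0 * math.pi ** (d / 2.0) / ((2.0 * math.pi) ** d * math.gamma(d / 2.0))


def I_closed(d, j, n, m=1.0):
    """Closed form of int d^dp/(2 pi)^d p^(2j)/(p^2+m^2)^n, analytic in d."""
    return (math.gamma(j + d / 2.0) * math.gamma(n - j - d / 2.0)
            / ((4.0 * math.pi) ** (d / 2.0) * math.gamma(d / 2.0) * math.gamma(n))
            * m ** (d + 2 * j - 2 * n))


def I_radial(d, j, n, m=1.0, steps=20000):
    """Radial integral C_d * int_0^inf dr r^(d-1) r^(2j)/(r^2+m^2)^n,
    by the midpoint rule after substituting r = m*tan(theta).
    Only meaningful for n > j + d/2."""
    a = d - 1 + 2 * j          # power of sin(theta)
    b = 2 * n - d - 2 * j - 1  # power of cos(theta)
    h = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += math.sin(theta) ** a * math.cos(theta) ** b
    return C(d) * m ** (d + 2 * j - 2 * n) * total * h


if __name__ == "__main__":
    print(I_radial(3, 0, 2), I_closed(3, 0, 2))  # ordinary convergent case
    eps = 1e-6
    print(eps * I_closed(4 - eps, 0, 2))         # residue of the 1/eps pole
    print(I_closed(4 - eps, 0, 2, m=1.0) - I_closed(4 - eps, 0, 2, m=2.0))
```

The last line is an example of a linear combination with a good limit for eps to 0, namely log(2)/(8 pi^2) for masses 1 and 2.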

A good, mathematically rigorous exposition of d-dimensional integration


theory for general complex dimension d is given in

P. Etingof,

Note on dimensional regularization,

pp. 597-607 in: Pierre Deligne et al.,

Quantum Fields and Strings, A Course for Mathematicians, Vol. 1,

Amer. Math. Soc., Providence, Rhode Island, 1999

See also

The theory of renormalization now shows that all integrals

occurring in the expressions for S-matrix elements in renormalizable

theories have a well-defined _extended_ Lebesgue integral for d=4.

This is all that is required for consistency.

For those who dislike unphysical complex dimensions,

the uniqueness of analytic continuation implies that one can

get completely equivalent results by keeping the physical

dimension d=4. In this case, one must replace the propagator

(p^2+m^2)^{-1} by (p^2+m^2)^{-n} with sufficiently large n,

and continue the result analytically to the physical value n=1.

Then all integrals are (in Euclidean space) ordinary Lebesgue

integrals. The formulas used for the extended Lebesgue integral

defined as above still apply; however, computations are now slightly

more involved.

Those who worry about the appropriateness of analytic continuation

might wish to consider the functions f, g defined by



in the real domain. They are equal for d<2 but f does not make

sense for d>=2. Nevertheless, it makes exceedingly much sense

to extend the definition of f to arguments d>2 by making


a definition. Indeed, g(d) is the unique meromorphic extension

of f to arbitrary complex arguments.

This uniqueness is in the nature of analytic continuation,

and makes the latter an extremely useful device in many applications.

It is the reason why we can use such useful equations

as exp(ix)=cos(x)+i*sin(x), which one would have no right

to use if one would not silently identify analytic functions

defined on part of their domain with the full analytic function

on the associated Riemann surface.


S8g. Nonrelativistic quantum field theory


The right way to understand relativistic QFT is to regard it as

a limit of nonlocal nonrelativistic quantum field theory.

The latter is much better behaved.

Interacting QFT in 3+1 dimensions exists, however, as a rigorous

mathematical theory in the nonrelativistic case, since there only

finite renormalizations are needed and no infinities occur.

In this context, Feynman-Dyson perturbation theory can be given a

rigorous meaning. Note that nonrelativistic QFT is nonlocal

because of the Coulomb potential interaction.

Interacting QFT based on Feynman-Dyson perturbation theory

in 3+1 dimensions exists as a rigorous mathematical theory

in the relativistic case, as a limit of smeared, nonrelativistic

theories. This is done for Phi^4 theory in all details in

Salmhofer's book. For technical reasons, one gets the results

however only in a very weak topology corresponding to power series

in the coupling constant, rather than as true functions of the

coupling constants. Thus perturbative relativistic QFT is rigorously

established in 4D while nonperturbative relativistic QFT in 4D

is still elusive.

However, the infinities that plague 4D relativistic QFT are already

present in 3D, and there rigorous constructions have been given.

Exactly the same kind of renormalization tricks are used in 3D.

Thus our present lack of understanding cannot be blamed on

renormalization, but has to do with the difficulty of getting

the hard analytical estimates needed to justify the constructions.


S8h. Nonrenormalizable theories as effective theories

The difference between renormalizable and unrenormalizable theories is

that the former are specified by a (small) finite number of parameters

while the latter are specified by an infinite number of parameters.

In a renormalizable quantum field theory, only few counterterms

must be added to the action in order to get a consistent

finite perturbative expansion at all orders. This means that a few

parameters suffice to get a consistent theory which will be correct

at the energies of interest (which should be essentially independent

of what happens at the inaccessible large energies).

In a nonrenormalizable quantum field theory, infinitely many

counterterms must be added to the action in order to get a consistent

finite perturbative expansion at all orders. This means that with a few

parameters one can only get an effective low order theory, which may,

however, still be good enough at the energies of interest.

But for better approximation, one needs to determine more and

more parameters...

In both cases, it is possible to extract approximate results from

computations, and the parameters can be tuned to fit the

results. This gives a consistent procedure for predictions. Indeed,

many nonrenormalizable theories are in use as effective field theories.

(See hep-ph/0308266 for a recent survey on effective field theories.)

People who dislike nonrenormalizable theories do this on the basis of

a claim that their predictive value is nil because of the infinitely

many constants. But this is as unfounded as saying that thermodynamics


is not predictive because it depends on a function (the expression for

the free energy, say) that requires an infinite number of degrees of

freedom for its complete specification. Clearly, in the latter case,

the widespread use of finitely parameterized imperfect free energies

does not hamper the usefulness of thermodynamics. The same can be

said about nonrenormalizable field theories. It only implies that to

extract arbitrarily precise predictions one needs correspondingly

much information as input. We know that this is the case already for

many simpler phenomena in physics.

See also

J.Gegelia, G.Japaridze, N.Kiknadze, K.Turashvili

"Renormalization" Of Non Renormalizable Theories


J Gegelia, G Japaridze
Perturbative Approach to Non-renormalizable Theories



S8i. What about infrared divergences?


Renormalization theory deals with the regularization of ultraviolet

divergences, occurring at very high but unobservable energies.

In contrast, infrared divergences arise if there are problems at

very low energies. They are not cured by renormalization and need

completely different techniques.

Theories without massless particles have no infrared problems at all,

since at low energies only few particles can coexist. Indeed,

the sum of the rest masses of physical particles is bounded by the

total energy of the system.

In QED one has infrared problems because the photon is massless,

so a bound on the sum of the rest masses does not limit the number of

possible photons. Indeed, a closer calculation shows that there

may be an arbitrary number of very low energy ('soft') photons.

One can handle the situation in some approximation by giving the

photon a tiny mass mu. But this is an _additional_ parameter, quite

different from the renormalization scale M. And the renormalized

theory at finite mu depends on mu (so that one needs to take

in the end the limit mu to 0 to get physically correct results),

while it is still independent of M.

A better way to handle the infrared divergences is to avoid them

completely by using coherent states. These sum the contributions of

arbitrarily many soft photons in a coherent way.


S9a. Summing divergent series

There is a second kind of divergences, different from those cured

by renormalization.

Most perturbation series in QFT are believed to be asymptotic only,

hence divergent. Strong arguments (which haven't lost in half a

century their persuasive power) supporting the view that one should

expect the divergence of the QED (and other relativistic QFTs)

power series for S-matrix elements, for all values of

alpha>0 (and independent of energy) are given in

F.J. Dyson,

Divergence of perturbation theory in quantum electrodynamics,

Phys. Rev. 85 (1952), 631--632.

The remarkable fact is that QED is very accurate in spite of this.

It produces verifiable predictions by restricting attention to the

first few terms of a (most probably divergent) asymptotic series,

but it has no way to make sense of the whole series.

This is what Dirac found deficient in the foundations.

An asymptotic series is a series such as

f(x) = sum_{k=0:inf} k! x^k

with radius of convergence zero. For small enough x, the first few

terms give seemingly good approximations, but if one includes - for

any fixed nonzero x - enough terms, the series diverges. Thus, as Dirac

asserts, one neglects arbitrarily large terms to get the approximations

which work so well in QED.
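The behavior of the terms can be checked directly. The following is
only a small sketch, with x = 1/10 as an arbitrary illustrative choice
and exact rational arithmetic to avoid rounding issues: the terms
k! x^k first decrease, reach their smallest size near k = 1/x, and
then grow without bound.

```python
from fractions import Fraction
from math import factorial


def series_terms(x, n_terms):
    """Return the exact terms k! * x**k for k = 0, ..., n_terms - 1."""
    return [factorial(k) * x**k for k in range(n_terms)]


x = Fraction(1, 10)                  # illustrative "small" value of x (assumption)
terms = series_terms(x, 40)

smallest = min(terms)
k_smallest = terms.index(smallest)   # first index at which the minimum occurs

print("smallest term at k =", k_smallest, "with value", float(smallest))
print("term at k = 39 has value", float(terms[39]))
```

Truncating near the smallest term is the usual rule of thumb for
getting the best accuracy out of such a series; adding more terms
beyond that point makes the approximation worse.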

There are infinitely many different ways to assign to an

asymptotic series a function with this series as Taylor expansion.

The problem is to have a way to choose the right one. Borel

summation is often taken as default, but seems to be no cure for QFT in view

of the so-called renormalon problem.

At present, there is no sound mathematical foundation of relativistic

quantum field theory. Who finds one will be awarded one of the

1 Million Dollar Clay Millennium prizes...

If we have a well-defined Hamiltonian H(g) depending infinitely

differentiably on a parameter g, it typically has a well-defined

S-matrix S(g), also depending infinitely differentiably on g.

Perturbation theory computes a power series expansion

S(g)=S_0 + S_1 g + ...

which often diverges for all g although each S_k is finite.

This happens already for the anharmonic oscillator with

H(g)= 1/2 (p^2+q^2) +g q^4.

Thus a correct Hamiltonian with a convergent (in the anharmonic

oscillator case even finite, hence trivially convergent) expansion

is quite consistent with a divergent expansion of the S-matrix.

However, one can still extract information by so-called

resumming techniques. One can study these things quite well with

functions which have known asymptotic expansions

(e.g., improper integrals, using Watson's lemma).

In many cases (and under well-defined conditions), the resulting

infinite series is Borel summable in the following sense: To sum

f(x) = sum a_k x^k (1)

if it is divergent or very slowly convergent, one can sum instead

its Borel transform

Bf(z) = sum a_k/k! z^k (2)

which obviously converges much faster (if not yet, one could probably

repeat the procedure). In many cases, f can be reconstructed from Bf

by means of

Sf(x) = integral_0^inf dz/x exp(-z/x) Bf(z)

= integral_0^inf dt exp(-t) Bf(tx).

Sf is called the Borel summation of the asymptotic series (1),

and is defined whenever Bf is convergent. If Bf has singularities,

the integral over t may have to be done along a contour in the complex
plane; see, e.g., physics/0010038.
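As a hedged numerical illustration of formulas (1) and (2), consider
the alternating series with a_k = (-1)^k k! (an assumption made here
so that Bf has no singularity on the integration path; it is not the
series k! x^k used earlier). Then Bf(z) = 1/(1+z), and the integral
for Sf can be approximated by a simple Simpson rule. The cutoff t_max
and the step count are arbitrary choices for this sketch.

```python
import math


def partial_sum(x, n_terms):
    """Partial sum of the divergent series sum_k (-1)^k k! x^k."""
    return sum((-1) ** k * math.factorial(k) * x ** k for k in range(n_terms))


def borel_sum(x, t_max=60.0, n_steps=60000):
    """Approximate Sf(x) = integral_0^inf exp(-t) Bf(t x) dt with Bf(z) = 1/(1+z).

    Uses composite Simpson on [0, t_max]; n_steps must be even.
    The neglected tail is smaller than exp(-t_max).
    """
    h = t_max / n_steps

    def integrand(t):
        return math.exp(-t) / (1.0 + t * x)

    total = integrand(0.0) + integrand(t_max)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


x = 0.1
print("Borel sum          :", borel_sum(x))
print("10-term partial sum:", partial_sum(x, 10))
print("40-term partial sum:", partial_sum(x, 40))
```

The short partial sum agrees with the Borel sum to within the size of
the first omitted term, while the long partial sum is useless.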

It is easy to show that BSf=Bf and that Sf has the same asymptotic

expansion as f. Moreover, the identity Sf(x)=f(x) can be easily

verified if (1) has a positive radius of convergence, but also under

other natural assumptions (but stronger than simply asserting that (1)

is an asymptotic expansion for f).

The book

J.S. Feldman, T.R. Hurd, L. Rosen and J.D. Wright,

QED: A proof of renormalizability,

Lecture Notes in Physics 312,

Springer, Berlin 1988

claims to prove on p. 112ff that the coefficients in the loop

expansion of the QED S-matrix are bounded by const*(n!)^{1/2}/R^n for

some R>0, which would imply that it is locally Borel summable.

But hep-ph/9701418 seems to make opposite claims.

See also hep-ph/9807443.

Of course, since there are many functions with the same asymptotic

expansion (e.g., one can add arbitrary multiples of terms like

e^{-a/x}, e^{-a/x} log x, etc.),

one has to show that the Borel summed Sf actually has the
properties that the original f was supposed to have (and from which

the asymptotic series was derived). If, in addition, f is

uniquely determined by these properties, we know that f=Sf.

Unfortunately, a proof for such a statement is missing in QED.

In some 2D cases, where nonperturbative QM applies, one can show that

the nonperturbative result satisfies the properties needed to show

that Borel summation of the perturbative expansion reproduces the

nonperturbative result. See also the thread

Re: unsolved problems in QED

starting with


With experimental results one just has numbers, and not infinite

series, so questions of convergence do not occur.

On the other hand, if one knows of an infinite series a finite number

of terms only, the result can be, strictly speaking, anything.

But usually one applies some extrapolation algorithm

(e.g., the epsilon or eta algorithm) to get a meaningful guess for the

limit, and estimates the error by doing the same several times,

keeping a variable number of terms. The difference between

consecutive results can count as a reasonable (though not foolproof)

error estimate of these results. Similarly, given a finite number

of coefficients of a power series, one can use Pade approximation to

find an often excellent approximation of the 'intended' function,

although of course, a finite series says, strictly speaking, nothing

about the limit of the sequence.
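The extrapolation procedure just described can be sketched with
Wynn's epsilon algorithm. The example below is an illustration only:
the test series (the alternating harmonic series, whose limit is
log 2) and the numbers of terms (7 and 9) are my choices, not taken
from any QED computation.

```python
import math


def wynn_epsilon(partial_sums):
    """Extrapolate the limit of a sequence of partial sums.

    Builds the columns of Wynn's epsilon table,
        eps_{k+1}(j) = eps_{k-1}(j+1) + 1 / (eps_k(j+1) - eps_k(j)),
    and returns the last entry of the highest even-numbered column.
    """
    prev = [0.0] * (len(partial_sums) + 1)   # column eps_{-1}, all zeros
    curr = list(partial_sums)                # column eps_0
    best = curr[-1]
    column = 0
    while len(curr) > 1:
        nxt = [prev[j + 1] + 1.0 / (curr[j + 1] - curr[j])
               for j in range(len(curr) - 1)]
        prev, curr = curr, nxt
        column += 1
        if column % 2 == 0:                  # only even columns estimate the limit
            best = curr[-1]
    return best


# Partial sums of 1 - 1/2 + 1/3 - ... (slowly convergent, limit log 2).
sums = []
s = 0.0
for k in range(1, 10):
    s += (-1) ** (k + 1) / k
    sums.append(s)

estimate_7 = wynn_epsilon(sums[:7])
estimate_9 = wynn_epsilon(sums)
print("last partial sum   :", sums[-1])
print("estimate, 7 terms  :", estimate_7)
print("estimate, 9 terms  :", estimate_9)
print("heuristic error    :", abs(estimate_9 - estimate_7))
print("true error         :", abs(estimate_9 - math.log(2)))
```

The difference between the two estimates plays the role of the
(not foolproof) error estimate mentioned above.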

But to have reliable bounds one needs to know an exact definition of

what one is approximating, and work from there. Such an exact

definition is, at present, missing for quantum electrodynamics.


S9b. Is QED consistent?


Quantum electrodynamics (QED) gives the most accurate predictions

quantum physics currently has to offer.

The anomalous magnetic dipole moment matches the experimental value

to 12 significant digits:

M. Passera,

Precise mass-dependent QED contributions to leptonic g-2 at order

alpha^2 and alpha^3,

Phys. Rev. D 75, 013002 (2007).

B. Odom, D. Hanneke, B. D'Urso, and G. Gabrielse,

New Measurement of the Electron Magnetic Moment Using a

One-Electron Quantum Cyclotron,

Phys. Rev. Lett. 97, 030801 (2006)


The Lamb shift, whose prediction made QED and renormalization

respectable, is much more difficult to measure with high precision,

hence offers no such phenomenal test of accuracy:

S.G. Karshenboim,

Precision physics of simple atoms: QED tests, nuclear structure

and fundamental constants,

Phys. Rep. 422 (2005), 1-63

Nevertheless, many physicists think that QED cannot be a consistent

theory. There is a phenomenon called the Landau pole:

It indicates that at extremely large energies (far beyond the range of

physical validity of QED, even far beyond the Planck energy) something

might go wrong with QED. (QED loses its validity already at energies

of about 10^11 eV, where the weak interaction becomes essential.

The Planck energy at about 10^28 eV is the limit where some current

theories try to make predictions. But the Landau pole, if it exists,

has an energy far larger than the latter.)
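To get a feeling for the orders of magnitude, here is a rough sketch
based on the standard one-loop formula for the running coupling,
1/alpha(E) = 1/alpha - (2/(3 pi)) log(E/m_e), with only the electron
loop included. This simplification and the rounded input values are
assumptions of the sketch; the result is a lowest-order estimate, not
a statement about the exact theory.

```python
import math

ALPHA = 1 / 137.036      # fine structure constant at zero energy (rounded)
M_E_EV = 0.511e6         # electron rest energy in eV (rounded)
PLANCK_EV = 1e28         # Planck energy in eV, as quoted in the text


def running_alpha(energy_ev):
    """One-loop running coupling, electron loop only (assumption of this sketch)."""
    slope = 2 * ALPHA / (3 * math.pi)
    return ALPHA / (1 - slope * math.log(energy_ev / M_E_EV))


# The denominator vanishes at E = m_e * exp(3 pi / (2 alpha)).
# Work with log10 to stay clear of floating-point overflow.
log10_landau = math.log10(M_E_EV) + (3 * math.pi / (2 * ALPHA)) / math.log(10)

print("log10(E_Landau / eV), lowest order:", round(log10_landau))
print("1/alpha at the Planck energy      :", 1 / running_alpha(PLANCK_EV))
```

In this approximation the coupling has changed only modestly at the
Planck energy, and the pole lies hundreds of orders of magnitude
higher.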

This is probably why Yang-Mills and not quantum electrodynamics was

chosen as the model theory for the millennium prize.

Since the existence of the Landau pole is confirmed only in low order

perturbation theory and in lattice calculations,

hep-lat/9801004 and hep-th/9712244

the question whether the alleged Landau pole implies limits to the

consistency of QED has currently no rigorous mathematical substance.

The observations about the Landau pole in perturbation theory can be

recast in mathematically rigorous terms using so-called renormalons,

obstructions to Borel summability; see

V. Rivasseau,

From Perturbative to Constructive Renormalization,

Princeton 1991

But the resulting analysis is inconclusive as regards the existence

of the theory.
The quality of the computed approximations to QED is a strong

indication that there should be a consistent mathematical foundation

(for not too high energies), although it hasn't been found yet.

There is no indication at all that at the energies where QED

suffices to describe our world (with electrons and nuclei considered

elementary particles), it should be inconsistent. To show this

rigorously, or to disprove it, therefore remains another unsolved

(and for physics more important) problem.

Perturbative QED is only a rudimentary version of the 'real QED';

this can be seen from the fact that Scharf's results on the external field case

G. Scharf,

Finite Quantum Electrodynamics: The Causal Approach, 2nd ed.

New York: Springer-Verlag, 1995.

are much stronger (he constructs in his book the S-matrix)

than those for QED proper (where he only shows the existence of

the power series in alpha, but not their convergence).

J.S. Feldman, T.R. Hurd, L. Rosen and J.D. Wright,

QED: A proof of renormalizability,

Lecture Notes in Physics 312,

Springer, Berlin 1988

gives a rigorous proof of perturbative existence of QED at all orders.

This means that a formal power series for the S-matrix is shown to

exist rigorously. This includes renormalization and is sufficient for

actual computations since a few terms in the power series give very

high accuracy.

However, the power series is believed to diverge if enough

(i.e., infinitely many) terms are added, and a consistent

nonperturbative treatment of full QED is presently missing.

The quest for 'existence' of QED is the quest for a framework

where the formulas make sense nonperturbatively, and where the

power series in alpha is a Taylor expansion of a (presumably

nonanalytic) function of alpha that is mathematically well-defined

for alpha around 1/137 and not too high energy. This is still open.

More precisely: Probably QED (and thus the QED S-matrix)

exists nonperturbatively as a 2-parameter theory depending on

the fine structure constant alpha and the electron mass m_e; these

parameters are the zero energy limits of the corresponding


running coupling constants, and is defined for alpha <= 1/137 and

input energies <= some number E_limit(alpha,m_e) larger

than the physical validity of pure QED. What is needed is

a mathematical proof that the QED S-matrix exists for 0<alpha<1/137

(rather than only for infinitesimal alpha, as currently established)

as a unitary operator S(alpha) in the Hilbert space H(Emax) of all

in-states of energy <= E_max=E_limit(alpha), for some reasonable


We know from perturbation theory how to compute in such a range the

coefficients of an asymptotic series in alpha for S(alpha).

We also have a number of nonperturbative approximation schemes that

give certain nonperturbative results (such as the Lamb shift).

But we currently do not have a way to ascertain that some well-defined

object S(alpha) exists that has this asymptotic series. The quest for

proving that QED exists is that of finding a construction for S(alpha)

that makes rigorous sense and has the known asymptotic expansion.

QED is renormalizable at all loops, which means that the power series

expansion of the S-matrix is mathematically well-defined at ordinary

energies. The _only_ thing that is missing is to give its limit a

mathematically well-defined meaning.

Note that the S-matrix S commutes with the Hamiltonian;

hence if P is the orthogonal projector to the space H_limit of

states involving only energies < E_limit(alpha)

then PSP is unitary on H_limit, and my conjecture is that PSP has

some (yet unknown but) rigorous nonperturbative construction.

The Landau pole (if it exists) just gives an upper bound to the allowed

energies. E_limit(alpha) is a function of alpha, which according to

perturbation theory has to satisfy

E_limit(0) < (Landau-pole in lowest order)

(and possibly decreases with increasing alpha); apart from that,

the known approximate results do not restrict the likely mathematical

validity of pure quantum electrodynamics.

A cautious evaluation of the situation is given in Weinberg's QFT book,

Vol. 2, pp.136-138 - all options are left open. On the other hand,

D. Espriu and R. Tarrach,

The case for triviality,

Phys. Lett. B383 (1996) 482,

argue that, because of the Landau pole, quantum electrodynamics is

only an effective field theory.

To summarize:
QED is renormalizable at all loops, which means that the power series

expansion of the S-matrix is mathematically well-defined at ordinary

energies. The _only_ thing that is missing is to give its limit a

mathematically well-defined meaning derived from a formulation of

QED that makes sense also at finite times and not only as a transition

from t=-infinity to t=+infinity.


S9c. What about relativistic QFT at finite times?


Although many time-dependent observable consequences of QED

can be deduced in a nonrigorous way in the Schwinger-Keldysh

= closed time path (CTP) formalism, there is at present no rigorous

relativistic quantum field theory at finite times in 4 dimensions.

In lower dimensions, for all theories where Wightman

functions can be constructed rigorously, there is an associated

Hilbert space on which corresponding (smeared) Wightman fields

and generators of the Poincare group are densely defined.

This implies that there is a well-defined Hamiltonian H=cp_0 that

provides via the Schroedinger equation the dynamics of wave functions

in time.

In particular, if the Wightman functions are constructed via the

Osterwalder-Schrader reconstruction theorem, both the Hilbert space

and the Hamiltonian are available in terms of the probability measure

on the function space of integrable functions of the corresponding

Euclidean fields. For details, see, e.g., Section 6.1 of

J. Glimm and A. Jaffe,

Quantum Physics: A Functional Integral Point of View,

Springer, Berlin 1987.

In particular, (6.1.6), (6.1.11) and Theorem 6.1.3 are relevant.

Unfortunately, no Wightman functions have been constructed so far

for interacting 4D quantum field theories; see the FAQ entry on

'Is there a rigorous interacting QFT in 4 dimensions?'.

However, the functional integration measure of Euclidean QED is known

to exist perturbatively at all orders (Tomonaga, Schwinger and Feynman

got the Nobel prize for this), though a nonperturbative construction

is still missing. By analytic continuation as in the

Osterwalder-Schrader reconstruction theorem, one should be able to

obtain a perturbatively valid Hamiltonian for QED (cf. Theorem 6.1.3

in Glimm and Jaffe).

Current 4D QFT in its usual textbook form is based on perturbation

theory for free (i.e., asymptotic in- and out-) states; therefore it

gives only predictions that relate the in- and out-states.

(But see below for the CTP techniques, which are not of the standard

perturbative form and give far more information.)

This information is contained in the S-matrix elements.

From the S-matrix, one can then derive further information, e.g.,

about bound state energies as poles.

In nonrelativistic QM, one has a well-defined dynamics at finite

times, given by the Schroedinger equation. This dynamics can be recast

in terms of Feynman path integrals. Unfortunately, this does not

extend to the relativistic case.

The problem with relativistic path integrals is that they are formal

objects without a clear numerical meaning: whatever one tries to

compute with them turns out to be infinite.

Only selected objects derived from path integrals can be given

meaning by means of the renormalization procedure. The books show how to

give meaning to S-matrix elements between asymptotic in and out states.


The (Minkowski space) path integral is ill-defined as a number,

but, after regularization, well-defined as a formal power series in

hbar (the latter is often set to 1 to simplify typography,

but this makes things more difficult to grasp). The Legendre transform

of the logarithm is then also defined as a formal power series, and

by letting the coupling constants depend on the regularization parameter

eps (or Lambda), one can take the limit eps to 0 (or Lambda to infinity)

to get the effective action, again as a formal power series.

From there, one can get the S-matrix, again as a formal power series.

For QED, the first few terms give highly accurate approximations;

for other QFTs, partial resumming of these series gives acceptable

results in agreement with experiment.

Expanding objects of interest as power series is the hallmark of the

so-called perturbative approach. In contrast, nonperturbative methods

try to give meaning to the actual sums, though no one has succeeded so far.


Indeed, convergence questions are open, although it is generally

believed that (as most series coming from a saddle point expansion

of an integral) the series is only asymptotic. See the section on

'Summing divergent series' in this FAQ.

But I haven't seen a single article that gives meaning (i.e.,

infrared and ultraviolet finite, renormalization scheme independent

properties) to, say, quantum electrodynamics states at finite t and

their propagation in time.

People don't even know what an initial state should be in a 4D

relativistic QFT (i.e., from which space to take the states at

finite t); so how can they know how to propagate it...

Thus the standard textbook theory gives an S-matrix (or rather an

asymptotic series for it) but not a dynamics at finite times.

This does not mean that there is no dynamical reality underlying

4D relativistic QFT. It only means that no one has been able to find

a working, logically consistent framework for it.

Probably people working in QFT imagine something like a state


in some unspecified Hilbert space underlying their formalism.

After all, this is how one justifies that the functional integral works.

Indeed, one can compute - nonrigorously, in renormalized perturbation

theory - many time-dependent things, namely via the Schwinger-Keldysh

(or closed time path = CTP) formalism; see, e.g.,

For example,

E. Calzetta and B. L. Hu,

Nonequilibrium quantum fields: Closed-time-path effective action,

Wigner function, and Boltzmann equation,

Phys. Rev. D 37 (1988), 2878-2900.

derive finite-time Boltzmann-type kinetic equations from quantum

field theory using the CTP formalism.

There are also successful nonrelativistic approximations with

relativistic corrections, within the framework of NRQED and NRQCD,

which are used to compute bound state properties and spectral shifts.

See, e.g., hep-ph/9209266, hep-ph/9805424, hep-ph/9707481, and


There is also an interesting particle-based approximation to QED

by Barut, which might well turn out to become the germ of an exact
particle interpretation of standard renormalized QED. See

A.O. Barut and J.F. Van Huele, Phys. Rev. A 32 (1985), 3187-3195,

and the discussion in Phys. Rev. A 34 (1986), 3500-3501,3502-3503.

Approximately renormalized Hamiltonians, and with them an approximate

dynamics, can also be constructed via similarity renormalization;

see, e.g.,

S.D. Glazek and K.G. Wilson,

Phys. Rev. D 48 (1993), 5863-5872.


It is usually applied in the front form (cf. the FAQ entry on

'Forms of relativistic dynamics'), as the many references to this

paper in (search for:

author:glazek author:wilson) show. See also hep-ph/0009071.

S.D. Glazek and K.G. Wilson,

Phys. Rev. D 49 (1994), 4214-4218.

gives a proof that, for renormalizable theories without

massless particles, similarity renormalization results in a

perturbatively finite Hamiltonian at all orders of perturbation theory.

A different, more explicit renormalized Hamiltonian framework is

given for quantum electrodynamics (but with a small photon mass

to avoid infrared problems) in the instant form in

E.V. Stefanovich,

Quantum Field Theory without Infinities

Ann. Phys. 292 (2001), 139-156.

Apart from the photon mass, it appears to be equivalent with QED

on the renormalized level, and provides on the perturbative level

a representation of the Poincare group in the instant form, and

therefore a dynamical interpretation. (But many of his foundational

views voiced in hp:// are far from

being justified.)

Thus both similarity renormalization and Stefanovich renormalization

give infrared-regularized QED a dynamical content at every order of

perturbation theory by providing approximate but finite,

UV renormalized Hamiltonians to each order that are asymptotic to a

formal Hamiltonian that acts formally as the

generator of translations in the Poincare group.

Convergence questions are not discussed. Also, the infrared

divergences are not addressed but must be removed by assuming a


photon mass, thus spoiling gauge invariance.

While Dyson's argument (see the FAQ entry on 'Summing divergent

series') implies that it is not reasonable to demand a convergent

S-matrix expansion, the limit Hamiltonian in these approaches could

still be convergent. If this could be shown and the massless limit

for the photon performed, it would amount to an existence proof of

quantum electrodynamics.

In general, the correct Hamiltonian is

H = lim_{Lambda to inf} H(Lambda,g(Lambda,E,q_phys)),

where H(Lambda,g) is the canonical Hamiltonian with cutoff Lambda

and parameter vector g (containing the so-called bare mass, charge or

coupling constant, and field renormalization factor), and

g(Lambda,E,q_phys) is the cutoff-dependent parameter vector as

determined by renormalization conditions at energy scale E, which

relate g to a set of physical parameters q_phys (consisting, in case

of QED, of the physical electron mass m, the physical electron charge e,

or, equivalently, the observed value of the fine structure constant

alpha).


This limit probably exists, at least for renormalizable,

asymptotically free theories, at least in 1D and 2D field theory,

where this can be proved in certain cases. In 3D and 4D, one probably

needs also a Lambda-dependent inner product defining the Hilbert space

to ensure that one ends up in the right representation,

and Lambda-dependent wave functions to ensure that the limiting

renormalized wave functions remain bounded in the limiting

renormalized inner product.

The consistency problem in a Hamiltonian approach to quantum field

theory is precisely to show that this limit indeed exists.

The missing consistent dynamical theory in 4D relativistic QFT

may also have consequences for the foundations of quantum mechanics.


Clearly, measurements happen in finite time, hence cannot

be described at present in a fundamental way (i.e., beyond the

nonrelativistic QM approximation). Thus foundational

studies based on nonrelativistic QM are naturally incomplete.

This implies that it is quite possible that a solution of the

unresolved issues in relativistic QFT is related to the unresolved

issues in quantum measurement theory.


S9d. Perturbation theory and instantaneous forces


In classical relativity theory, causality demands that all forces

are retarded. In relativistic quantum theory, this principle is

somewhat obscured, due to the approximations needed to get a

dynamical picture. The general practice is to expand in powers

of v/c, where v is a velocity and c is the speed of light.

When doing this, the resulting formulas look instantaneous at

each order of perturbation theory, which might invite unfounded


However, the same already happens at the classical level, where

the situation is easy to understand. The retarded terms must

reappear when summing terms to all orders.

This is most easily seen by noting that a retarded differential

equation (for simplicity 1D, but the 4D case is similar)

dx(t)/dt = f(x(t-tau)),

when expanded in powers of the small parameter tau, becomes a

higher order ordinary differential equation at fixed order.

To see this, differentiate the original equation k times and

introduce new functions

x_0=x, x_1=dx/dt, ..., x_k=d^kx/dt^k

to get a system of retarded differential equations.

Then expand the equation for dx_k/dt up to order n-k.

Then substitute terms on the right hand side.

The approximate equation is manifestly instantaneous, but it

describes the perturbative behavior of the retarded equation.

Thus perturbation theory in v/c cannot be used to decide about the

instantaneous or retarded nature of quantum dynamics.
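A minimal worked example, under the assumption of the linear choice
f(x) = -x (my choice for illustration): the retarded equation
dx/dt = -x(t-tau) has solutions x(t) = exp(lam*t) with
lam = -exp(-lam*tau). Expanding in tau and substituting as described
above gives the instantaneous equations dx/dt = -c x with
c = 1, 1+tau, 1+tau+(3/2)tau^2 at orders 0, 1, 2. The sketch compares
these truncated rates with the rate of the retarded equation.

```python
import math


def retarded_rate(tau, iterations=200):
    """Solve lam = -exp(-lam * tau) by fixed-point iteration.

    The iteration converges for small tau; lam is the exponential rate of
    the solution exp(lam * t) of dx/dt = -x(t - tau).
    """
    lam = -1.0
    for _ in range(iterations):
        lam = -math.exp(-lam * tau)
    return lam


def truncated_rate(tau, order):
    """Rate of the instantaneous ODE obtained at the given order in tau (0, 1 or 2)."""
    coefficients = [1.0, 1.0, 1.5]   # lam = -(1 + tau + (3/2) tau^2 + ...)
    return -sum(coefficients[k] * tau ** k for k in range(order + 1))


tau = 0.1                            # illustrative small delay (assumption)
exact = retarded_rate(tau)
for order in range(3):
    approx = truncated_rate(tau, order)
    print("order", order, "rate", approx, "error", abs(approx - exact))
```

Each truncated equation contains no delay at all, yet its rate
approaches that of the retarded equation as the order increases.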


S9e. QED and relativistic quantum chemistry


Relativistic quantum chemistry is needed to predict properties

of heavy atoms. This is usually done by invoking the Dirac-Fock

Hamiltonian, which is an approximation of the QED Hamiltonian

for which the multiparticle bound state problem is tractable.

Here are a few samples of what can be done:

The first is explicitly time-dependent;

the second is about bound state calculations;

the third shows how to add further QED corrections;

the fourth shows how the Dirac-Fock Hamiltonian arises as an

approximation of QED. The last gives a discussion of some

mathematical problems involved.


Electron correlations and spin-orbit interaction in two-photon

ionization of closed-shell atoms: A relativistic time-dependent

Dirac-Fock approach

Phys. Rev. A 42, 3801-3818 (1990)

Bieron et al.

Large-scale multiconfigurational Dirac-Fock calculations of the

hyperfine-structure constants and determination of the nuclear

quadrupole moment of 49Ti

Phys. Rev. A 59, 4295-4299 (1999)


Multiconfiguration Dirac-Fock calculations of transition energies

with QED corrections in three-electron ions

Phys. Rev. A 42, 5139-5149 (1990)

P Chaix and D Iracane

From quantum electrodynamics to mean-field theory.

I. The Bogoliubov-Dirac-Fock formalism

J. Phys. B: At. Mol. Opt. Phys. 22 (1989) 3791-3814

M Defranceschi and C Le Bris

Computing a molecule in its environment: A mathematical viewpoint

Int J Quantum Chemistry 71 (1999) 227-250


S9f. Are protons described by QED?


The traditional field equations of quantum electrodynamics (QED),

which can be found in any textbook on quantum field theory, describe

only electrons, positrons, and photons, but not protons, although

the latter have electromagnetic interactions.

The reason is that, unlike free electrons and positrons, free protons

do not obey the Dirac equation since they have form factors which are

(unlike for electrons and positrons) determined not only by interactions

with photons, but primarily by the inner structure of the proton.

Thus even bare protons cannot be understood as point particles, which

makes standard QED equations inapplicable.

To understand the proton's form factors from first principles needs

quantum chromodynamics (QCD) - and even then they are imperfectly

understood.


In the traditional QED treatment of molecules and their interaction

with light, protons and other nuclei are typically treated as classical

sources of electromagnetic fields when determining the structure of

the electron. (The resulting effective potential between the

nuclear positions is quantized afterwards if a full classical treatment

is not adequate). This gives excellent agreement with experiment,

in particular for the hydrogen atom.

Of course, one can treat QED together with a proton field as an

effective (and nonrenormalizable) theory, in which in addition to the

Dirac equation for the bare electrons there is a Dirac-like equation,

modified by the form factors, for the bare protons. To describe atoms

correctly, one needs also fields for neutrons and mesons, and

appropriate interaction terms between them, leading to quantum

hadrodynamics (plus QED). This accounts for all practically

relevant properties of atoms (including nuclear fission and fusion).


S10a. How are matrices and tensors related?


Mathematicians and physicists differ in the notation used for

vectors, tensors, matrices, and multilinear forms. Here is

a dictionary.

T^q = tensor product of q copies of the vector space T;

in particular, T^0=S is the algebra of scalar fields and T^1=T.

T^p_q = space of all linear mappings from T^q to T^p;

elements are (p,q)-tensors with p upper and q lower indices.

T_0^q = T^q

T_p^0 =: T_p = (T^p)^* is the so-called dual space of T^p;

in particular, T_1 = T^* is the dual space of T;

its elements are the linear forms = covectors.

One can associate with every A in T^p_q canonically a multilinear

mapping B: T^q tensor T_p --> S with

B(s,t) = t(As) for s in T^q, t in T_p,

and conversely; indeed, since the image As of s under A is in T^p,

its image t(As) is a well-defined scalar. Using the B's in place of

the A's gives an alternative way of defining tensors, although one

less convenient for visualization.

Given a basis on T and a dual cobasis on T^*, one can use coordinates.

Then physicists write

- elements of T as vectors = column vectors with an upper index,

- elements of T^* as linear forms = 1-forms = covectors = row vectors

with a lower index,

- elements of T^q as multivectors with q upper indices,

- elements of T_p as multicovectors with p lower indices,

- elements of T^q_p as mixed multi(co)vectors with p lower and q upper

indices.

(There is also a dual version of this, where vectors are considered

as rows and covectors as columns. The remainder then changes

accordingly.)

In particular,

(0,0)-tensor = scalar,

(1,0)-tensor = vector (vector in T=T^1) = column vector,

(0,1)-tensor = covector (vector in the dual space T^*=T_1)

= row vector,

(1,1)-tensor = matrix (linear mapping from T to T).

Clearly, the columns of the matrix A_i^k are column vectors = vectors,

the rows are row vectors = covectors, and the indexing is consistent.

The requirement that basis and cobasis are dual is equivalent to the

statement that for every vector u and covector w (i.e., linear mapping

from vectors to scalars),

w(u) = w_i u^i;

here the Einstein convention is used that formulas involving

pairs of equally labelled indices, one of them a lower index

and the other an upper index, must be interpreted as a sum over these

indices.


Mathematicians using linear algebra (where no tensors of order

>2 appear) write instead all indices as lower indices, no matter

whether they belong to row vectors, column vectors, or matrices.

They also write all sums explicitly, consider all vectors given

by a single letter as column vectors, and write covectors (1-forms)

explicitly using the transposition sign (^T, but statisticians often

use a prime ' instead, which is also the form used in Matlab).

This has many advantages and allows a simple notation

which increases understandability of otherwise long formulas.

Phys. notation: s = x^k y_k x vector, y covector

Math. notation: s = sum_k y_k x_k

or simply s=y^Tx.

Phys. notation: y_i = A_i^k x_k x,y vectors, A matrix

Math. notation: y_i = sum_k A_ik x_k

or simply y=Ax.

Phys. notation: s = A_i^i A matrix

Math. notation: s = sum_i A_ii

or simply s = tr A (trace).

Phys. notation: y_i = A_i^j B_j^k x_k x,y vectors, A,B matrices,

Math. notation: y_i = sum_jk A_ij B_jk x_k

or simply y=ABx.

Phys. notation: y_i = A_i^j B_j^k C_k^l D_l^m x_m

x,y vectors, A,B,C,D matrices

Math. notation: y_i = sum_jklm A_ij B_jk C_kl D_lm x_m

or simply y=ABCDx.
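The equivalence of the two notations can be checked mechanically.
The following sketch writes the index form as explicit sums over
repeated indices and compares it with the index-free matrix form;
the 2x2 matrices and the vector are arbitrary illustrative values,
and the helper names are mine.

```python
def matvec(A, x):
    """Index-free y = Ax, implemented as y_i = sum_k A_ik x_k."""
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]


def matmul(A, B):
    """Index-free C = AB, implemented as C_ik = sum_j A_ij B_jk."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))]
            for i in range(len(A))]


def trace(A):
    """s = tr A, i.e. s = sum_i A_ii."""
    return sum(A[i][i] for i in range(len(A)))


A = [[1, 2], [3, 4]]     # illustrative values
B = [[0, 1], [1, 0]]
x = [5, 7]
n = len(x)

# Index form: y_i = sum_jk A_ij B_jk x_k (sum over both repeated indices).
y_index = [sum(A[i][j] * B[j][k] * x[k] for j in range(n) for k in range(n))
           for i in range(n)]

# Index-free form: y = ABx.
y_matrix = matvec(matmul(A, B), x)

print("index form :", y_index)
print("matrix form:", y_matrix)
print("trace of A :", trace(A))
```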

The linear algebra notation is compact and index-free,

in spite of the fact that coordinates are being used.

For higher order tensors, the advantages of the linear algebra

notation are less pronounced since one has to specify which

pairs of indices must be contracted. However, often, an index-free

notation is still possible:

Phys. notation: A_li = R_ijkl b^j c^k

Math. notation: A(u,v) = R(v,b,c,u)

Phys. notation: A_l^i = R^i_j^k_l b^j c_k

Math. notation: A(u,v^T) = R(v^T,b,c^T,u)

Phys. notation: A_i^j = R_i^k_k^j

Math. notation: A = tr_23 R,

where the subscripts indicate which indices must be contracted.

All this is completely independent of any metric.

If a metric = nondegenerate symmetric (0,2)-tensor g is given on T,

which associates with u,v in T the scalar g(u,v),

one can canonically identify vectors and covectors, at the

expense of some confusion if one is not careful.

This reads in physicists' notation as follows: The metric is

g_ik=g_ki (expressing the symmetry),

and for every vector u^k, the associated covector is

u_i = g_ik u^k.

Conversely, one can reconstruct the vector from the covector using

u^k = g^ik u_i,

where g^ik=g^ki is the inverse metric, a symmetric (2,0)-tensor which

for consistency must satisfy the equations

g_ij g^kj = delta_i^k (*)

with the Kronecker delta

delta_i^k = 1 if i=k, = 0 otherwise,

which is the identity matrix written as a (1,1)-tensor in index

notation. Nondegeneracy is precisely the solvability of (*) for the

dual metric.
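As a small check of these formulas, the sketch below lowers and
raises an index with a hypothetical 2x2 metric (chosen by me so that
its inverse has integer entries) and verifies condition (*).

```python
g = [[2, 1], [1, 1]]          # hypothetical metric g_ik (symmetric, det = 1)
g_inv = [[1, -1], [-1, 2]]    # its inverse g^ik (also symmetric)
n = len(g)


def lower(g, u_up):
    """u_i = g_ik u^k"""
    return [sum(g[i][k] * u_up[k] for k in range(n)) for i in range(n)]


def raise_index(g_inv, u_down):
    """u^k = g^ik u_i"""
    return [sum(g_inv[i][k] * u_down[i] for i in range(n)) for k in range(n)]


# Condition (*): g_ij g^kj = delta_i^k.
delta = [[sum(g[i][j] * g_inv[k][j] for j in range(n)) for k in range(n)]
         for i in range(n)]

u_up = [3, -2]                         # components u^k of a vector
u_down = lower(g, u_up)                # associated covector u_i
u_back = raise_index(g_inv, u_down)    # should reproduce u^k

print("delta  :", delta)
print("u_i    :", u_down)
print("u^k    :", u_back)
print("g(u,u) :", sum(u_down[i] * u_up[i] for i in range(n)))
```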

Mathematicians find it confusing to label different objects with the

same symbol, and prefer to always distinguish between a vector and its


canonically associated covector. Given a basis of T and the dual

cobasis of T^*, coordinates (row and column vectors) can be used to

define the elements of T and T^*; the metric g in T_2 is represented

in these coordinates by an invertible symmetric matrix = (1,1)-tensor G

such that

g(u,v) = u^TGv for u,v in T.

The canonical pairing induced by the metric therefore associates with

the vector u the covector

w^T = u^TG. (**)

Conversely, one can reconstruct from the covector w^T the canonically

associated vector

u = G^{-1}w.

The dual metric therefore maps u^T, v^T to u^TG^{-1}v, and is

represented by the inverse matrix G^{-1}.

The relation between the physicists' form and the linear algebra form

of writing things can be inferred from (**) - we simply have

Phys. notation: g_ik

Math. notation: G = (g_ik)

Phys. notation: g^ik

Math. notation: G^{-1} = (g^ik)

Again, the linear algebra notation is compact and index-free,

in spite of the fact that coordinates are being used.
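
As a small numerical illustration of lowering and raising an index with G and G^{-1}, here is a sketch in plain Python. The 2x2 matrix G and the vector u are arbitrary example values, and the helper names inverse_2x2 and matvec are introduced here for illustration; exact fractions are used so that the round trip can be checked without rounding errors.

```python
from fractions import Fraction

def inverse_2x2(G):
    """Inverse of a 2x2 matrix; fails exactly when the metric is degenerate."""
    (a, b), (c, d) = G
    det = a * d - b * c
    if det == 0:
        raise ValueError("metric is degenerate")
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, x):
    """Matrix-vector product (M x)_i = sum_k M_ik x_k."""
    return [sum(M[i][k] * x[k] for k in range(len(x))) for i in range(len(M))]

# Example metric: symmetric and invertible (det = -3), hence indefinite.
G = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(-1)]]
u = [Fraction(3), Fraction(5)]

w = matvec(G, u)            # lower the index: w_i = g_ik u^k, i.e. w^T = u^T G
G_inv = inverse_2x2(G)      # components g^ik of the dual metric
u_back = matvec(G_inv, w)   # raise the index again: u^k = g^ik w_i
```

Since G is symmetric, the column vector G u has the same entries as the row vector u^T G, so matvec suffices for both directions.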


S10b. Is quantum mechanics compatible with general relativity?


The difficulty of reconciling quantum mechanics and general relativity

counts as one of the big problems of fundamental physics.

There appears to be a problem because canonical quantum gravity

based on quantizing the Hilbert action is nonrenormalizable.

(See the section on 'Renormalization in quantum gravity' in this

FAQ about how nevertheless to renormalize a nonrenormalizable field theory.)


The difference between renormalizable and nonrenormalizable theories is


that the former are specified by a (small) finite number of parameters

while the latter are specified by an infinite number of parameters.

In both cases, it is possible to extract approximate results from

computations, and the parameters can be tuned to fit the experimental


results. This gives a consistent procedure for predictions. Indeed,

many nonrenormalizable theories are in use as effective field theories.

(See hep-ph/0308266 for a recent survey on effective field theories.)

People who dislike nonrenormalizable theories do this on the basis of

a claim that their predictive value is nil because of the infinitely

many constants. But this is as unfounded as saying that thermodynamics


is not predictive because it depends on a function (the expression for

the free energy, say) that requires an infinite number of degrees of

freedom for its complete specification. Clearly, in the latter case,

the widespread use of finitely parameterized imperfect free energies

does not hamper the usefulness of thermodynamics. The same can be

said about nonrenormalizable field theories. It only implies that to

extract arbitrarily precise predictions one needs correspondingly

much information as input. We know that this is the case already for

many simpler phenomena in physics. (For indications that canonical

quantum gravity is nonperturbatively renormalizable see, e.g.,

hep-th/0110021, hep-th/0312114, hep-th/0304222.)

A different matter is the dream of a fundamental theory without any

free parameters, which of course conflicts with a theory in which

infinitely many parameters are needed for its complete specification.

But there is no theorem that says that nature is governed by unique

principles. It is quite likely that the designer of the universe

had some choices besides the constraints imposed by logical consistency.


Thus I think this dream (which also fuels string theory) is misguided,

and the correct quantum version of general relativity is standard,

nonrenormalizable canonical quantum gravity.

This means that, quite likely, general relativity is fully compatible

with quantum mechanics.

Of course this conflicts with the view of powerful groups within

theoretical physics, who maintain that their approach to quantum

gravity (either string theory, or loop quantum gravity) is the road

to success. But from what I have seen (at a somewhat superficial

level of understanding) I trust neither string theory

nor loop quantum gravity to be close to the truth. In any case,

both are completely separated from experimental verification.

If experiments in the near future can probe some features of quantum

gravity, it will be for small quantum systems interacting with

external electromagnetic and gravitational fields. See gr-qc/0408010.


S10c. Difficulties in quantizing gravity


(i) (mathematical) No consistent interacting relativistic quantum

field theory is known in 4 dimensions.

(ii) (theoretical) The accepted ways to avoid divergences in

expressions for scattering amplitudes that work in simpler theories

all fail because of the lack of renormalizability. See, e.g.,

the references in Section 2.2 of

(iii) (theoretical) The theories for which a (perturbatively)

finite scattering theory is available have not been related

quantitatively to the established theories.

A convincing classical limit (to general relativity),

nonrelativistic limit (to a multiparticle Schroedinger equation with

Newtonian interaction), and low energy limit (at currently accessible

energies no new particles apart from the graviton) would be needed.

(iv) (conceptual) The three limits pose severe constraints on possible

quantum gravity theories, and it requires much imagination to come up

with a conceptual basis in which these limits make sense and are

tractable. (But see the preceding entry.)

(v) (experimental) Quantum effects in gravity are so weak that no

experiments sensitive to quantum effects are in reach in the near

future, and the data from astronomy that may cast light on quantum

gravity are scarce. (Quantum gravity is not demanded by unexplained

data but only by the quest for consistency with particle physics.)


S10d. Renormalization in quantum gravity


Renormalization of QFTs is needed to make the coefficients in the

loop expansion (i.e., the expansion in powers of Planck's constant hbar)

of the S-matrix well-defined.

Canonical quantum gravity is the theory obtained by writing down the

Einstein-Hilbert action in a (3+1)-dimensional splitting (ADM formalism)

and either fixing coordinates and solving the constraints (reduced phase


space quantization) or quantizing using Dirac's approach to constrained

systems (Dirac quantization).

Covariant quantum gravity is the theory obtained as follows:

Write down the classical Hilbert action for general relativity,

look at the corresponding functional integral defined perturbatively

as for QED or QCD, and try to compute S-matrix elements using the

usual renormalization prescriptions for the integrals corresponding

to the various Feynman diagrams.

Quantum field theories are nowadays almost always defined in the

covariant way; the covariant approach has the advantage of being

manifestly invariant under the full symmetry group. (The canonical

approach to scalar QED fails in certain versions to preserve

Poincare symmetries, due to term ordering problems; see

gr-qc/9403065.) On the other hand, the canonical approach is

intrinsically nonperturbative, while the covariant approach needs

extra tricks (renormalization group enhancements) to get partial

nonperturbative results.
Covariant quantum gravity only works in the traditional way up to

1 loop (and together with matter not even then); at higher loops

(i.e., for corrections of higher order in the Planck constant hbar)

one needs more and more counterterms to make the resulting

integrals finite. See

S. Deser,

Infinities in Quantum Gravities,

(and references [2,4] there). This is called 'nonrenormalizability',

and is the main blemish of covariant quantum gravity.

(For other potential problems, see, e.g., gr-qc/0108040.)

Note that quantum gravity, though nonrenormalizable in the

established sense, is renormalizable in a weak sense,

where infinitely many counterterms are allowed; see

J. Gomis and S. Weinberg,

Are Nonrenormalizable Gauge Theories Renormalizable?

Most researchers in quantum gravity want a renormalizable theory

in the strong sense (so that finitely many counterterms suffice);

then covariant quantum gravity is out, and people look

for fancy alternatives (loop quantum gravity, superstring

theory, etc.). However, these theories have their own difficulties.

Some online references are:

gr-qc/9803024: Strings, loops and others: a critical survey

of the present approaches to quantum gravity

gr-qc/9710008: Loop quantum gravity

hep-th/9709062: Introduction to superstring theory

astro-ph/0304507: Update on string theory

hep-th/0311044: The nature and status of string theory

physics/0605105: a short review of superstring theories

gr-qc/0410049 shows how gravity derives from string theory;

a more complete derivation is in Section 3.7 of Polchinski's book.

Phys. Rev. Lett. 60, 2105-2108 (1988) discusses the lack of Borel

summability of the S-matrix expansion for the bosonic string.

hp:// tells about the state

in 2003 concerning the claims of (super)string theory to be a

renormalizable quantum theory. Only the 2 loop case seems to be

settled; see arXiv:hep-th/0501197 and hep-th/0211111 (especially

Section 14 of the latter for the unsolved problems at 3 loops and


Others treat covariant quantum gravity just as they treat

nonrenormalizable effective field theories, and fare well with it.

See, for example,

C.P. Burgess,

Quantum Gravity in Everyday Life:

General Relativity as an Effective Field Theory

Living Reviews in Relativity 7 (2004), 5

for 1-loop corrections, and

Donoghue, J.F., and Torma, T.,

Power counting of loop diagrams in general relativity,

Phys. Rev. D, 54, 4963-4972,

for higher-loop behavior.

Section 4.1 discusses recent computational studies showing that

covariant quantum gravity regarded as an effective field theory

predicts quantitative leading quantum corrections to the

Schwarzschild, Kerr-Newman, and Reissner-Nordstroem metrics.

Only a few new parameters arise at each loop order, in particular only

one (the coefficient of curvature^2) at one loop.

In particular, at one loop, Newton's constant of gravitation becomes

a running coupling constant with

G(r) = G - 167/(30 pi) G^2/r^2 + ...

in terms of a renormalization length scale r.
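
To get a feeling for the size of this correction, one can evaluate the leading term numerically. The sketch below rests on two assumptions that are not spelled out above: the coefficient is read as 167/(30 pi), and the factors of hbar and c are restored by dimensional analysis, so that the relative correction is 167/(30 pi) times (Planck length / r)^2. The function name relative_correction and the solar mass value are illustrative choices.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
C = 2.99792458e8         # speed of light, m/s
G_NEWTON = 6.67430e-11   # Newton's constant, m^3/(kg s^2)
M_SUN = 1.989e30         # approximate solar mass, kg

def relative_correction(r):
    """Leading-order |G(r) - G| / G at length scale r (in metres).

    Assumes the correction term is 167/(30 pi) * G * hbar / (c^3 r^2).
    """
    planck_length_sq = HBAR * G_NEWTON / C**3
    return 167.0 / (30.0 * math.pi) * planck_length_sq / r**2

planck_length = math.sqrt(HBAR * G_NEWTON / C**3)

# Schwarzschild radius of one solar mass, about 3 km.
r_sun_horizon = 2.0 * G_NEWTON * M_SUN / C**2
correction_at_horizon = relative_correction(r_sun_horizon)
```

Under these assumptions the relative correction at the horizon of a solar-mass black hole is smaller than 10^-76, and it only becomes of order one when r approaches the Planck length.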

Here is a quote from Section 4.1:

''Numerically, the quantum corrections are so miniscule as to be

unobservable within the solar system for the forseeable future.

Clearly the quantum-gravitational correction is numerically extremely

small when evaluated for garden-variety gravitational fields in the

solar system, and would remain so right down to the event horizon

if the sun were a black hole. At face value it is only for separations

comparable to the Planck length that quantum gravity effects become

important. To the extent that these estimates carry over to quantum

effects right down to the event horizon on curved black hole

geometries (more about this below) this makes quantum corrections

irrelevant for physics outside of the event horizon, unless the

black hole mass is as small as the Planck mass''


S10e. Hadamard states and their Hilbert spaces

In his book on quantum field theory in curved spacetime

Wald delineates a class of 2-point functions called Hadamard states

that have locally the same kind of singular behavior as the flat

free 2-point functions. This class of states is also natural from

several other points of view, though I cannot give details off-hand

since this is slightly outside my field of knowledge.

Associated to each Hadamard state is a Gaussian state |0>

of the quantum field which is constructed from the 2-point function

via Wick's theorem. This state is often called a 'vacuum state',

though this is not quite appropriate, unless one allows the vacuum

to carry gravitational and electromagnetic fields. A more appropriate

name would be a 'coherent state' since it is the generalization of

coherent states in the Fock spaces considered in optics.

Each Gaussian state produces a Hilbert space of wave functions

consisting of linear combinations of the a^*_k1 a^*_k2 ...|0>,

weighted by sufficiently smooth functions of the k's to render

their norm finite.

All states in this Hilbert space are also physically reasonable,

but they do not have the same basic (vacuum-like)

status as the Hadamard states since they are no longer Gaussian,

and hence are harder to work with.

But you can evaluate <psi|phi(x)phi(y)|psi> in such a state by

expanding everything in terms of vacuum expectations of expressions

in a's and a^*'s and applying Wick's theorem. Their leading singular

behavior is probably the same as for the Gaussian state itself,

though I haven't tried to check this.


S10f. Why do gravitons have spin 2?


The reason is that gravitation is described by a metric

(symmetric 2-tensor field) modulo general covariance,

which gives locally, in the tangent Minkowski space of any point,

a spin 2 representation of the Poincare group.

Gravitational waves have to be (classically) long range,

which requires (after quantization) massless particles.

Thus gravitons (although never observed) should be massless

spin 2 particles.
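
The phrase 'metric modulo general covariance' can be made concrete by the usual counting of degrees of freedom in the linearized theory. This is the standard textbook sketch and not part of the argument above; the symbols h_{\mu\nu} (metric perturbation around the Minkowski metric) and \xi_\mu (generator of an infinitesimal coordinate change) are introduced here for illustration.

```latex
g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}, \qquad
h_{\mu\nu} \to h_{\mu\nu} + \partial_\mu \xi_\nu + \partial_\nu \xi_\mu ,
\\
\underbrace{10}_{\text{components of } h_{\mu\nu} = h_{\nu\mu}}
- \underbrace{4}_{\text{gauge functions } \xi_\mu}
- \underbrace{4}_{\text{constraints}}
= 2 \quad \text{(helicities } \pm 2\text{)} .
```

The two remaining polarizations are those of a massless particle of helicity +2 and -2.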

S10g. What is the tetrad formalism?


A way of writing general relativity such that it can be

applied to a spinor (e.g. electron) field.

A tetrad is a set of four linearly independent

vector fields e_0, e_1, e_2, e_3.

Considering them orthonormal in the sense that

g(e_j,e_k)=eta_jk (*)

where eta is the Minkowski metric defines the

metric g uniquely; conversely, for any metric one can

choose (on any chart) such an orthonormal basis.

If the manifold is parallelizable then one can choose

the ONB even globally. In 4 dimensions, any manifold

which allows spinors to be defined consistently is

parallelizable (by a result of Geroch), hence reality

is most likely described by such a manifold.

Using (*), one can rewrite any formula involving the

metric into one involving instead tetrads, and many

things simplify - using tetrads is closer to the Cartan

formalism of differential geometry than using the metric

directly. E.g.,

sqrt(-det g) = det(e).
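
The determinant identity follows in one line from the orthonormality condition (*). In the sketch below it is assumed that e denotes the matrix of co-tetrad components e^a{}_\mu, i.e., the inverse of the matrix formed by the components of the vector fields e_0,...,e_3; this reading is an assumption, since the convention is not stated above.

```latex
g_{\mu\nu} = \eta_{ab}\, e^a{}_\mu\, e^b{}_\nu
\;\Longrightarrow\;
\det g = \det(\eta)\,(\det e)^2 = -(\det e)^2
\;\Longrightarrow\;
\sqrt{-\det g} = |\det e| .
```

For a tetrad with positive orientation the absolute value can be dropped, which gives the formula as written.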

One has to be slightly careful not to confuse curved

and flat indices, but this is learnt very quickly.

Then one needs much less index shifting.

For gravitation coupled to a (classical) Dirac field,

the tetrad formalism is indispensable, since spinors

cannot be defined without a flat representation.


S10h. Energy in general relativity


Energy is not an absolute concept, but depends on the observer

(in the nonrelativistic case, by choice of a velocity,

in the relativistic case, by choice of a time-like unit

vector that defines the direction of time and hence the

time coordinate).

In classical mechanics there is always a (up to rotations)

distinguished center of mass frame where the whole system

is at rest and the center of mass at zero.

The observer is usually (silently) considered to be at rest

with respect to that frame; then there is no ambiguity

left in the energy.

In special relativity things are already more problematic

since there is no natural center of mass. But one can fix

the time direction by taking it to be that of the total

4-momentum of the whole system. This again fixes a frame,

now up to Euclidean motions. On the other hand, this is not

what an observer (who has a slightly different eigentime

depending on its 4-momentum) sees, and must be corrected


In general relativity the conserved total 4-momentum is

identically zero, so there is no longer a way to fix a

time direction. But assuming an asymptotically flat

space-time one can take its flat coordinate system

(determined up to a Poincare transformation) and

use it to chart the localized part, and gets a Minkowski

description, to which the preceding applies.

In general relativity, the concept of energy depends on the

choice of a spacelike hypersurface defining a region of space

and a time-like vector field along that hypersurface defining

the direction of time: Then the integral of [part of]

the (0,0)-component of the energy-momentum tensor over this

hypersurface defines the corresponding [part of the]

energy in this region.
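
In formulas, one common way of writing this integral is the following sketch. The symbols are introduced here for illustration: \Sigma is the chosen hypersurface, n its unit normal, h the determinant of the induced 3-metric, and u the time-like vector field chosen by the observer.

```latex
E[\Sigma, u] = \int_\Sigma T_{\mu\nu}\, u^\mu n^\nu \, \sqrt{h}\; d^3x ,
\qquad
E[\Sigma, u] = \int_\Sigma T_{00}\, \sqrt{h}\; d^3x
\quad \text{if } u = n = \partial_0 \text{ on } \Sigma .
```

Restricting T_{\mu\nu} to the terms that define the system of interest gives the corresponding part of the energy.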

This allows one to talk about the (observer-dependent) energy

of a subsystem, or of all matter in the universe, etc.

Observer-independent is the energy-momentum tensor density

as a whole, but not energy.

The weak-field limit defines a preferred coordinate system,

thus reducing the arbitrariness to the choice of the time

direction, and the nonrelativistic limit fixes this choice

to be the direction of the total momentum of the reference

object (e.g., the earth or sun or our galaxy). This makes

everything completely determined and gives us a good

energy for everyday life.

Note that using the concept of energy does not require

a global conservation law.

Even in nonrelativistic classical mechanics, energy is conserved

only for isolated systems, while the concept is used very

profitably in all sorts of nonisolated settings. It just means that

one needs to account in the balance equations for what happens

at the boundary, and (if necessary) include friction terms

(which describe, so to speak, the boundary to the neglected

microscopic degrees of freedom).

Thus, to connect general relativity to what most physicists actually

study, namely systems localized in a small region of space and time

(small may mean, e.g., a laboratory, the earth, the solar system,

or our galaxy - within an hour, a year, a few millennia, etc.)

one needs to make precise what energy means for such pieces of the

whole universe.

This requires that the observer specifies the region of space of

interest, and the length of time of interest, including the way time

is supposed to flow. The observer also has to specify which part of

the energy is of interest, i.e., the terms in the energy-momentum

tensor that define the system (contrasted to the environment -

which make up all the other terms).

After all that is done, energy has a well-defined meaning,

as given above.
On the other hand, the observer-independent notion generalizing

energy is the full energy-momentum tensor; its tensor

nature reflects the need for observer information to extract

from it numerical values, i.e. real numbers that can be compared

with experiment. But apart from energy it also contains the

observer-independent part of the information about momentum

and stress, which themselves are also observer-dependent.


S10i. What happened to the aether?


The aether as supporting substance for electromagnetic waves

was a standard hypothesis in the 19th century but fell out of

favor with the successes of relativity theory.

When in vogue, the aether was the substance filling empty space

- i.e., the physics of the aether is the physics of empty space.

In a way, the classical background field (also termed the 'vacuum',

or, more neutrally, a 'coherent state' or - in quantum gravity -

a 'Hadamard state') around which the quantum field is expanded into

excitation modes (photons, gravitons, etc.) is the modern equivalent

of the aether. However, nobody uses the term since it is fraught with

misleading connotations, and not really needed.

In modern language, the aether is called the vacuum, and the properties


of the aether are the properties of the vacuum.

While the 19th century aether was thought to be at rest,

the 20th century aether (= the vacuum in a quantum field theory)

is a Poincare invariant state with zero quantum numbers.

(In a putative quantum gravity, it would even be a diffeomorphism

invariant state, should something like that exist. The Unruh effect

indicates, however, that there is probably no objective vacuum,

since emptiness is observer dependent.)

Indeed, Poincare invariance is the modern way of saying

'being at rest' - the momentum of a Poincare invariant state is zero

in every frame of reference, and the mass of a Poincare invariant

state must also be zero, which implies that the vacuum is empty

in terms of mass. (It is however allowed to be filled by a constant

nonzero Higgs field, as required in the standard model.)

Identifying the aether and the vacuum is consistent with the way
Einstein thought about the topic, as the following quotes (translated

here from the German) from Einstein's lecture at the University of

Leyden, 1920, show:

''Since such fields also occur in the vacuum - i.e., in the free

aether - the aether, too, appears as a carrier of electromagnetic

fields.''

''One may add that the whole change in the conception of the aether

which the special theory of relativity brought about consisted in

taking away from the aether its last mechanical quality, namely its

immobility.''

''One may assume the existence of an aether; only one must give up

ascribing a definite state of motion to it, i.e., one must by

abstraction take from it the last mechanical characteristic which

Lorentz had still left it.''

''The aether of the general theory of relativity is a medium which is

itself devoid of all mechanical and kinematical properties, but which

helps to determine mechanical (and electromagnetic) events.''

''Thus one may well also say that the aether of the general theory of

relativity has emerged from the Lorentzian aether through

relativization.''

''... To deny the aether ultimately means to assume that empty space

has no physical properties whatsoever...''

For the complete speech in German and in English translation, see


(the part with the above quotes is not freely available online).

Note that the QFT vacuum is considered by many as a very dynamical

entity, being able

1. to have excitations, namely single particles and multiparticle

states; in particular photons = quantized electromagnetic waves,

2. to exhibit spontaneous symmetry breaking, and

3. to generate random particle-antiparticle pairs.

(In some people's imagination, being able to

4. allow whole universes to pop up or disappear!)

Thus the modern vacuum looks much more like the 19th century aether
(whose excitations were the classical electromagnetic waves)

than the classical vacuum to which Einstein was referring.


S10j. What is time?


It is commonly asserted that in general relativity there is no

absolute simultaneity. On the other hand, it is asserted that

we see the Sun as it was 8 minutes ago and the Andromeda nebula

as it was 2.5 million years ago. These assertions seem to conflict with

each other - apparently we have no diffeomorphism invariant way

of assigning a relative time to a distant object.

Let us take a closer look at the issues involved.

The invariant way of defining present is to say that

x and y are present if the two points are in a spacelike relation,

and to say y was earlier (or later) than x if y lies in or on

the past (or future) light cone.

Thus the present is well-defined as the complement of the

closed light cone.
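
The classification into present, earlier and later can be sketched for the flat (Minkowski) case as follows. This is only an illustration under stated assumptions: events are given as tuples (t, x1, x2, x3) in an inertial frame, the sign convention is s^2 = c^2 dt^2 - |dx|^2, and the function names and the rounded light travel time of 499 seconds to the Sun are chosen for the example.

```python
def interval_squared(x, y, c=1):
    """Minkowski interval s^2 = c^2 dt^2 - |dx|^2 between events x and y."""
    dt = y[0] - x[0]
    dr2 = sum((b - a) ** 2 for a, b in zip(x[1:], y[1:]))
    return (c * dt) ** 2 - dr2

def relation(x, y, c=1):
    """Classify the event y relative to the event x."""
    s2 = interval_squared(x, y, c)
    if s2 < 0:
        return "present"        # spacelike separation
    if y[0] == x[0]:
        return "same event"     # s2 >= 0 with dt = 0 forces dx = 0
    # y lies in or on the light cone of x
    return "earlier" if y[0] < x[0] else "later"

# Units: seconds and light-seconds, so c = 1.
here_now = (0, 0, 0, 0)
sun_as_seen = (-499, 499, 0, 0)   # emission of the light arriving here now
sun_same_t = (0, 499, 0, 0)       # the Sun at the same coordinate time
```

Here relation(here_now, sun_as_seen) returns "earlier" with interval zero, since the emission event lies on the past light cone, while relation(here_now, sun_same_t) returns "present".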

Now suppose that you look at the sun. If one is really pedantic,

one would have to say that you see the sun in your eye, as a

2D object, and not out there in 3D. But we are accustomed to

interpret our sensations in 3D and hence put the sun far away

but into the here.

In general relativity, one goes a step further.

One thinks in terms of the 4D spacetime manifold and places the sun

there. Calculating the length of the geodesic gives a value of 0,

so the sun is not in your present. Considering the sign of the

time component in an arbitrary proper Lorentz frame, one finds that

the sun is in your past, as everything you observe.

But the amount of invariant time passed, as measured by the metric,

is zero. This looks like a paradox. What happened with the claimed

8 minutes?

The answer is that the metric time is not the right way to measure

time. It is the only time available in a Poincare-invariant flat

universe, or in a diffeomorphism invariant curved universe.

An empty universe where only noninteracting observers

move has no notion of simultaneity.

But a matter-filled, homogeneous and isotropic universe

generally has one, defined by the rest frame of the galactic fluid

with which general relativity models cosmology.

Since the fluid breaks Lorentz symmetry (except in

very special cases, which are ruled out by experiment)

it creates a preferred foliation of spacetime.

This foliation gives a well-defined cosmic time, when

scaled to make the expansion of the universe uniform.

(Actually there are several natural scalings = monotone

transformations of the time parameter;

see Section 27.9 in Misner/Thorne/Wheeler, so cosmic time

without a reference to the scale used is ambiguous.)

This cosmic time figures in all models of cosmology.

The values commonly talked about when quoting times

for cosmological events, such as the date of the big bang

or the time a photon seen now left the Andromeda nebula,

refer to this cosmological time.


S10k. Time in quantum mechanics

In the traditional formulation of quantum mechanics, time is not an

observable. Nevertheless it can be observed...

In the Schroedinger picture, the state is defined at fixed times,

which distinguishes the time. In this picture, time measurement

is difficult to discuss since the time at which a state is considered

is always sharp.

In the Heisenberg picture, time is simply a parameter in the

observables, and therefore also distinguished, but in a different way.

Parameters are in fact just continuous indices and not observables.

As 3 is not an observable while p_3 is one, so t is not an observable

but H(t) is one. Observables have at _each_ time an expected value;

the moment of time (''now'') is not modelled as observable.

But what can be modelled is a clock, i.e., a system with an observable

which changes with time in a predictable way. If the observable u(t)

of a system satisfies

ubar(t) := <u(t)> = u_0 + v (t - t_0) (v nonzero) (*)

with sufficient accuracy, one has a clock and can find out by means of

<u(t)> how much time

T = Delta t

passed between two observed data sets.

This is also the usual way we measure time in classical physics.

Of course, to be a meaningful time measurement, T must be large


compared with the intrinsic uncertainty

Sigma_T := |v^{-1}| sigma(u(t)).

Here

sigma(u(t)) = sqrt(<(u(t)-ubar(t))^2>)

is the standard deviation in the properly calibrated

(quantum mechanical) state <.>. If (*) has significant errors

then Sigma_T is of course correspondingly larger.
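
The two formulas for a clock can be put into a few lines of code. The sketch below assumes the linear law (*) holds exactly; the function names and the numbers of the example clock (rate v = 2 pointer units per second, standard deviation 0.1 units) are hypothetical.

```python
def elapsed_time(ubar_1, ubar_2, v):
    """T = Delta t from two mean readings, assuming ubar(t) = u_0 + v (t - t_0)."""
    if v == 0:
        raise ValueError("a clock needs a nonzero rate v")
    return (ubar_2 - ubar_1) / v

def intrinsic_uncertainty(sigma_u, v):
    """Sigma_T = |v^{-1}| sigma(u(t))."""
    return abs(1.0 / v) * sigma_u

# Hypothetical clock: the pointer advances by 2 units per second,
# with a standard deviation of 0.1 units in the pointer reading.
T = elapsed_time(1.0, 7.0, v=2.0)
Sigma_T = intrinsic_uncertainty(0.1, v=2.0)
meaningful = T > 10 * Sigma_T   # 'large compared with', with an arbitrary factor 10
```

The factor 10 in the last line is only a placeholder for 'large compared with'; what counts as large enough depends on the accuracy required.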

In relativistic quantum field theory (which in its covariant version

can only be formulated in the Heisenberg picture), the 1-dimensional

time t turns into the 4-dimensional space-time position x. Now x

is a vector parameter in the observables (fields), and hence is not an

observable. Space and time are now on the same level (allowing a

covariant point of view), but both as non-observables.

The observables are fields; positions and times of particles are

modelled by unsharp 1-dimensional world lines characterized by a high

density of the expectations of the corresponding fields.

(Think of the trace of a particle in a bubble chamber.)

For position and time measurement, one now needs a 4-vector field

u(x) with

<u(x)> = u_0 + V (x - x_0)

and a nonsingular 4x4 matrix V, and the intrinsic uncertainty

takes the form

Sigma_T := sigma(V^{-1}u(x)),

where

sigma(a(x)) = sqrt(<(a(x)-abar(x))^*(a(x)-abar(x))>)

and abar(x) := <a(x)>.


Conclusion: In nonrelativistic quantum mechanics, time is always

measured indirectly via the expectations of distinguished observables

of clocks in calibrated quantum mechanical states. In relativistic

quantum field theory, the same holds for both position and time.

However, this analysis works only when one assigns to single clocks

a well-defined state, hence assumes a version of the Copenhagen interpretation.


From the point of view of the minimal statistical interpretation,

one needs in contrast a whole ensemble of identically prepared

clocks to measure time...

Note that in relativistic quantum mechanics, a single particle is

described (in the absence of an external field) by an irreducible

representation of the Poincare group. Here only the components of

4-momentum and the 4-angular momentum are observables. From these,

one can reconstruct observer-dependent 3-dimensional (Newton-Wigner)

position operators satisfying canonical commutation rules, but not

a time operator.


S10l. Diffeomorphism invariant classical mechanics


In mechanics, time is a point in a 1-dimensional manifold,

and diffeomorphisms are just smooth reparameterizations of the time.

For any Lagrangian of the form

L(q,qdot,t) := U(q(t)) qdot(t),

where q is an n-dimensional column vector and U an n-dimensional

row vector, the action

S = integral L(q,qdot,t) dt

is diffeomorphism invariant. As a consequence, the Noether energy

(the formal Hamiltonian constructed in the transition from a Lagrangian


to a Hamiltonian formulation) vanishes identically and has no physical

content. Indeed, one can bring an arbitrary Hamiltonian system

xdot=H_p(p,x) , pdot=-H_x(p,x),

where H is the physically relevant energy, into the above form by setting


q^T = (x^T,p^T,s),

U(q) = (p^T,0^T,-H(p,x)).

For a careful discussion see Section 4.3 of

PJ Olver,

Applications of Lie groups to differential equations,

Springer, New York 1993.
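
For the choice q^T = (x^T,p^T,s), U(q) = (p^T,0^T,-H(p,x)), the claims can be checked by a short calculation. The following is a sketch written out here for illustration, with the variations with respect to p, x and s denoted by \delta p, \delta x and \delta s.

```latex
L = U(q)\,\dot q = p^T \dot x - H(p,x)\,\dot s ,
\\
E = \frac{\partial L}{\partial \dot q}\,\dot q - L
  = U(q)\,\dot q - U(q)\,\dot q = 0 ,
\\
\delta p:\ \dot x = H_p(p,x)\,\dot s , \qquad
\delta x:\ \dot p = -H_x(p,x)\,\dot s , \qquad
\delta s:\ \frac{d}{dt} H(p,x) = 0 .
```

Since L is homogeneous of degree 1 in \dot q, the action does not change under reparameterizations of t, and the Noether energy E vanishes identically. Choosing the parameterization with s = t turns the first two equations into the original Hamiltonian system, while the third expresses the conservation of the physically relevant energy H.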

Those who can read German can find more in the section on

''Diffeomorphismeninvariante klassische Mechanik'' in my

German Theoretische-Physik-FAQ at

For diffeomorphism invariant reformulations of arbitrary field

theories, see
C.G. Torre,

Covariant phase space formulation of parameterized field theories,

J. Math. Phys. 33 (1992) 3802-3812



S10m. The concept of ''Now''


Time is passing - what is ''now'' in our subjective experience

changes. But there is no concept of ''now'' in physics.

Classical nonrelativistic mechanics does not know the concept of now.

One declares some time to be ''now'' - but which time one declares to

be ''now'' is completely subjective (i.e., in different situations it

will be declared differently). Similarly, one declares some position

to be ''here'', but which position you declare to be ''here'' is

completely subjective, in the same sense.

Classical relativistic mechanics does not know the concept of now,

either, but things change a little: Here one declares some event
(= spacetime point) to be ''here and now'' - but which event one

declares to be ''here and now'' is completely subjective.

Nonrelativistic quantum mechanics treats time completely differently

from space (time is a parameter, space coordinates are operators),

and introduces stochastic elements into the dynamics.

But with respect to ''here'' and ''now'', the situation is identical

with that in the classical nonrelativistic case.

Relativistic quantum mechanics restores the treatment of space and

time on equal footing (space and time coordinates are parameters),

and introduces stochastic elements into the dynamics.

But with respect to ''here and now'', the situation is identical

with that in the classical relativistic case.

Once one has chosen ''here'' and ''now'', respectively ''here and now'',

it serves as origin of the tangent hyperplane, in which localized, flat

physics can be done, reflecting faithfully what happens in a

neighborhood of the spacetime point. This is the domain of relativistic

quantum field theory.

S11a. A concise formulation of the measurement problem of QM


Quantum mechanics asserts in the Born rule (also called Lueders' rule)

that when a particle prepared in a pure state passes an ideal

measuring instrument characterized by a finite family of mutually

orthogonal projectors P_k (with P_k = P_k^*, P_k P_l = delta_kl P_l

and sum_k P_k = 1), it transforms the pure state psi into the pure

state psi_k = P_k psi/sqrt(p_k) with probability p_k = psi^* P_k psi.
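
The rule can be illustrated on a small finite-dimensional example. The sketch below is plain Python with hypothetical data (a 3-level system and two projectors chosen for illustration); it computes the probabilities p_k = psi^* P_k psi and the projected states, dividing P_k psi by sqrt(p_k) so that each psi_k again has norm 1.

```python
import math

def apply(P, psi):
    """Matrix-vector product P psi."""
    return [sum(P[i][j] * psi[j] for j in range(len(psi))) for i in range(len(P))]

def inner(phi, psi):
    """Inner product phi^* psi (conjugate-linear in the first argument)."""
    return sum(a.conjugate() * b for a, b in zip(phi, psi))

def measure(projectors, psi):
    """Return the list of pairs (p_k, psi_k) for a normalized state psi."""
    outcomes = []
    for P in projectors:
        P_psi = apply(P, psi)
        p = inner(psi, P_psi).real                  # p_k = psi^* P_k psi
        post = [a / math.sqrt(p) for a in P_psi] if p > 0 else None
        outcomes.append((p, post))
    return outcomes

# Hypothetical example: P0 projects onto the first two basis vectors,
# P1 onto the third; P0 + P1 = 1 and P0 P1 = 0.
P0 = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
P1 = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
psi = [0.6, 0.0, 0.8j]

outcomes = measure([P0, P1], psi)
```

For this state the two probabilities are 0.36 and 0.64 up to rounding, and they add up to 1 because the projectors sum to the identity.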

This is a consistent rule in a purely statistical interpretation

in which psi is an objective property of a source (describing the

statistical behavior of an ideal - stationary and pure - source of

particles) rather than an objective property of each individual particle.


The measurement problem arises when (as is commonly informally done)


the wave function is regarded as an objective property of a particle.

Then the stochastic transformation demanded by the Born rule, called

the collapse of the wave function, conflicts with the deterministic,

unitary dynamics of the wave function demanded by quantum mechanics


of the joint system consisting of particle+instrument+environment.

The unitary dynamics predicts that the joint system is in a macroscopic

superposition, which is not observed.

Note that a measurement does not need a conscious observer.

A measurement is any permanent record of an event, whether or not

anyone has seen it. Thus the terabytes of collision data collected

by CERN are measurements, although most of them have never been

looked at by anybody. We human beings only look at crude summaries

of such high tech data, but the collapse (which gives rise to

individual particle tracks) is clearly independent of whether or when

we look at them.


S11b. The double slit experiment


The double slit experiment, where a broad beam of particles passes

a screen with two slits, is one of the most fundamental quantum experiments.

Standard wave function arguments for purely unitary quantum mechanics

predict (at best) that the effect of the screen is to turn a particle

in a pure state psi into a superposition of at least three terms,

one each for being in one of the two beams (for sufficiently wide

slits) or spherical waves (if the slits are narrow enough)

passing the slit and a third (or more) for the particle being stuck

somewhere on the screen.

This conclusion is arrived at as a simple consequence of linearity of

the Schroedinger equation, together with natural assumptions of what

happens for particles prepared in coherent states.

But it is generally believed - and assumed in _all_ discussions of

interference - that a double slit screen projects a particle with

incoming wave function psi with the correct Born probability to a

particle in a superposition of the two beams that pass the slits.

The challenge is to derive this from a quantum model of the situation,

without invoking explicit collapse anywhere in the derivation.

As long as this cannot be done convincingly, I don't consider the

measurement problem solved.

For a precise version of a (slightly different) challenge, see

S11c. The Stern-Gerlach experiment


Another basic quantum experiment is the Stern-Gerlach experiment.

An input beam of silver atoms is passed through an inhomogeneous

magnetic field in a fixed direction, which produces a sideways

classical force on each silver atom proportional to the atom's

magnetic moment. The magnetic field is said to split the input beam

into two separate beams corresponding to atoms of spin up and down,

respectively, which shows in the experiment as silver spots where

the beams hit a screen. If the beam of silver atoms is replaced by

a beam of electrons with very low intensity and the screen is replaced

by a more sensitive detector, one observes single detection events,

each randomly at one of the two spots. Each such event is generally

interpreted as a spin measurement (up or down), which makes sense

only if the wave function actually collapses to |up> or |down>.

(Though this is very questionable since the electron stops existing

as an object separable from the screen.)

If a blocker is put in the way of one of the beams, the corresponding

spot on the screen disappears, but if the blocker is sensitive as well,

single observations are found to occur at the blocker as well.

According to strictly orthodox but purely unitary quantum mechanics,

the situation is the following:

If a single particle leaves the magnetic area, it is in an entangled

state consisting of a bilocal superposition of wave packets somewhere

along the two beams. When it encounters the blocker,

this single electron turns into a still bilocal superposition of wave

packets: One remains stuck where the blocked beam meets the block

and the other continues its motion along the unblocked beam.

A little later, this second wave packet meets the screen, and we end up

with a still bilocal superposition of wave packets, now both sitting

at the end points of the respective beam. Without the blocker,

essentially the same happens, except that the electron ends up

in a superposition of two spots on the screen.

More precisely, what happens is that if one starts with a pure

state |x,p> |left>, where |x,p> denotes an approximately coherent

state with position x and momentum p, and

|left> = (|up> + |down>)/sqrt(2),

one gets approximately a superposition

1/sqrt(2)(|x^+(t),p^+(t)>|up> +|x^-(t),p^-(t)>|down>),

where the parameters in the approximately coherent states follow

classical paths in phase space determined by approximately classical

motion due to the magnetic field, the blocker and the screen.

After hitting blocker and screen, respectively, positions are constant

and momenta vanish, and the particle is in a superposition of two spots.


All this follows without difficulty from the superposition principle,

i.e., from the linearity of the Schroedinger equation.

To match observations in an objective interpretation of the wave

function, one needs a mechanism for changing the unobserved

superposition of spots into the observed definite spot. In an

observer-independent interpretation this has to happen in the split

moment between the particle feeling the presence of blocker or screen

and hitting or passing it. This is the so-called collapse of the wave function.

According to the old school (von Neumann, London and Bauer),

in a purely unitary setting it requires a conscious look at what

really happened to change the superposition of spots into a definite

spot, which gives quantum mechanics an uncomfortable subjective,

human-centered touch.

S11d. The minimal interpretation


The minimal interpretation of quantum mechanics does not model

what really happens - it only claims probabilities.

When quantum mechanics is applied to small systems, one usually asks

only for statistical information. Here a collapse simply means a

change of the point of view resulting in taking conditional

expectations, and all difficulties disappear.

In that case, each particle simply moves in an undeclared and

undeclarable fashion along the experimental setting, the classical

instruments are always in a definite state, and instead of

superpositions one has probabilities of observation of exactly one

of the possible results in the superposition.

Now all objectivity (sources and preparation, detectors and

measurements) is in the classical setting only, which coexists

with the somewhat spooky quantum world, connected by quantum


The problem here is how to unify what happens classically

and quantum mechanically. This minimal view becomes inconsistent

once one wants to consider the classical system as a large quantum

system - all objectivity disappears since macroscopic superpositions

are possible.

(Generally, nonlinear modifications of the Schroedinger dynamics

are considered a possible way out, but this introduces other problems.)

The main limitation of the minimal interpretation is that it does not

apply to systems that are so large that they are unique.

Today no one disputes that the sun is governed by quantum mechanics.

But one cannot apply statistical reasoning to unique systems, such as

the sun as a whole.

If quantum mechanics is a universal theory of nature, it should also

apply to the sun as a whole. At least we know that it applies to the

extent that it governs the energy generating processes in the sun.

The actual numerical analysis of models of the sun just

treats the nuclear reactions within a classical reaction-diffusion

framework, which (in principle - I don't know whether anyone has

actually done it) should be derivable from quantum mechanics using

statistical mechanics arguments.

A purely statistical interpretation also has a problem with the

notion of probability. (See the discussion on probability elsewhere

in this FAQ.) Probability (and hence the quantum state that predicts it)

is often seen as a subjective view about the experimenter's assumed

knowledge, or the knowledge an experimenter could gain when 100%

attentive. There is the subjectivist difficulty of determining

whose knowledge counts and why unobserved (and hence unknown)

classical processes still make a difference;

but one could imagine an ideal classical observer of the status of

Laplace's demon, for whom these problems would be absent.


S11e. The preferred basis problem


Born's rule, stated in the form that |<phi|psi>|^2 is the probability

that a system prepared in state psi is, upon measurement, found in state

phi, is valid only if a complete set of commuting observables is

measured and phi belongs to the preferred basis determined by the

experimental setting (i.e., the family of projectors).

Given the present state of the universe (which fixes the experimental

setting), there is no choice in the preferred basis. Thus, in a

mathematical model of quantum mechanics in the large, it has to

be deduced from the assumptions about the initial state and the dynamics.

The preferred basis is fully determined by Nature, and that's why we can

find it out. Given an unknown instrument, one finds out by

experimenting with the new piece, letting it interact with systems

of known properties, and matching the collected data to trial models

until one fits. This is how things are indeed done in practice.

The process is called model calibration (or parameter estimation if

the model is fixed up to adjustable parameters).

At first, one never knows a new instrument precisely, and has to check

out its properties. After sufficient experience with enough instruments,

one knows reasonably well what to expect of the next, similar one.

Then only fine-tuning is needed, which saves time. And this knowledge

can be used to create new instruments which are likely to behave a

certain way; but one still has to check to which extent they actually

do, since no theoretical design is realized exactly in practice.

Not even in the classical, macroscopic domain!

Nature's choice is systematic, hence after having

seen that a number of screens have a preferred position basis,

we conclude that this is the case generally. As for a spectrometer,

if it is built with a prism to analyze light, it is reduced by theory

to the observation of light or current at certain positions of the

screen, which is done in the preferred position basis. Something

similar can be said about the Stern-Gerlach experiment.

So once one knows _some_ of Nature's preferences and the general


one can deduce other preferences.

The challenge posed in the measurement problem is to deduce

from first principles that a screen made of quantum matter,

with two slits in it, actually has a preferred position basis and

projects the incoming system to the part determined by the slits.


S11f. Master equation and pointer variables


On an approximate level, the preferred basis problem is approached

via quantum master equations.

A quantum master equation is a dynamical equation for the density

matrix of a dissipative quantum system, which approximates a quantum system


weakly coupled to an environment at time scales long compared to the

typical interaction time but short enough to avoid recurrence effects.

More precisely, the dynamics is given by a completely positive

Markovian semigroup in a representation named after Lindblad,

who discovered its general form.

For a classical damped linear system xdot(t)=Ax(t) with a matrix A

whose spectrum is in the left complex half plane, the contribution of x

in the invariant subspace corresponding to eigenvalues which are not

purely imaginary decays to zero, so that at large times t,

x(t) essentially approaches the invariant subspace corresponding to

purely imaginary eigenvalues.
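
As a small numerical illustration of this classical statement, here is
a sketch with a hypothetical 3x3 matrix A (chosen for the example, not
taken from any reference) having eigenvalues +i, -i and -1. The
component along the decaying direction dies out, while the oscillating
part keeps its size.

```python
import numpy as np

# Hypothetical example matrix: eigenvalues +i, -i (oscillation) and -1 (decay)
A = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0]])

def evolve(A, x0, t):
    """Solve xdot = A x via the eigendecomposition of a diagonalizable A."""
    w, V = np.linalg.eig(A)
    coeffs = np.linalg.solve(V, x0.astype(complex))   # expand x0 in eigenvectors
    return np.real(V @ (np.exp(w * t) * coeffs))      # x(t) = V exp(t w) V^{-1} x0

x0 = np.array([1.0, 0.0, 1.0])
x_late = evolve(A, x0, 20.0)

print("oscillating part:", x_late[:2])   # stays on the unit circle
print("decaying part:   ", x_late[2])    # of order exp(-20)
```

For this particular A the exact solution is (cos t, -sin t, exp(-t)),
so the result can be checked by hand.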

For a quantum master equation, a similar analysis holds and shows that

(under suitable conditions) the density matrix at times much larger

than the so-called decoherence time approaches a block diagonal form

in a suitable basis. Thus it (almost) commutes with a special set

of observables, which define the 'pointer variables' of the system.

These pointer variables therefore behave essentially classically.

If the pointer variables form a complete set of commuting variables,

the density matrix approaches a diagonal matrix, and the basis in

which this happens is called the 'preferred basis'.
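
The simplest instance of this behavior is pure dephasing of a two-level
system. The sketch below is an illustration under assumptions, not a
model of any particular experiment: it takes a vanishing Hamiltonian
and a single Lindblad operator L = sqrt(gamma/2) sigma_z, for which the
Lindblad equation reduces to
drho/dt = (gamma/2) (sigma_z rho sigma_z - rho).
The names rhs and integrate and the values of gamma and t_end are
hypothetical. The equation is integrated with a standard Runge-Kutta
scheme.

```python
import numpy as np

Z = np.diag([1.0, -1.0])   # sigma_z; its eigenbasis is the preferred basis here

def rhs(rho, gamma):
    """Dephasing generator: drho/dt = (gamma/2) (Z rho Z - rho)."""
    return 0.5 * gamma * (Z @ rho @ Z - rho)

def integrate(rho, gamma, t_end, steps=1000):
    """Classical fourth-order Runge-Kutta integration of the master equation."""
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(rho, gamma)
        k2 = rhs(rho + 0.5 * h * k1, gamma)
        k3 = rhs(rho + 0.5 * h * k2, gamma)
        k4 = rhs(rho + h * k3, gamma)
        rho = rho + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return rho

plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho0 = np.outer(plus, plus)                  # pure state, all entries 1/2
rho_late = integrate(rho0, gamma=1.0, t_end=10.0)

print(rho_late)                              # nearly diag(1/2, 1/2)
print("purity:", np.trace(rho_late @ rho_late))
```

The off-diagonal entries decay like exp(-gamma t) while the diagonal
stays fixed, so the final density matrix is (almost) diagonal in the
sigma_z basis. Note that the result is a mixture with purity close to
1/2, not one of the two basis states.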

For details, see, e.g., cond-mat/0011204 or gr-qc/9406054


S11g. Does decoherence solve the measurement problem?


Many physicists nowadays think that decoherence provides a fully

satisfying answer to the measurement problem. But this is an illusion.

Decoherence is the (experimentally verified) decay of

off-diagonal contributions in a density matrix (written in a

preferred basis), when information dissipates into unobservable

degrees of freedom in the environment of a system.

In particular, decoherence reduces a pure state to a _mixture_

of eigenstates. This is enough to induce classical features

in many large quantum systems, characterized by a lack of

interference terms.
Thus decoherence is very valuable in understanding the classical

features of a world that is fundamentally quantum.

On the other hand, the 'collapse of the wave function'

selects _one_ of the eigenstates as the observed one.

This ''problem of definite outcomes'' is part of the measurement

problem. It is still a riddle, and not explained by decoherence.

See the excellent survey article

M. Schlosshauer,

Decoherence, the measurement problem, and interpretations of quantum mechanics,

Rev. Mod. Phys. 76 (2005), 1267-1305.


The champions of the decoherence approach are (not always

but at least sometimes) quite careful to delineate what decoherence

can do and what it leaves open. For example, Erich Joos, coauthor

of the nice book 'Decoherence and the Appearance of a Classical World

in Quantum Theory',

explicitly states in the last paragraph of p.3 in quant-ph/9908008

that (and why) decoherence does not resolve the measurement problem.

If the big crowd has a cruder point of view, it means nothing but

lack of familiarity with the details.

If the quantum mechanical state is taken only as a description

of a large ensemble, as in the Statistical Interpretation

(see next question), there is no problem.

But the riddle is present if one insists that the quantum mechanical

state describes a single quantum system (as seems to be required for

today's experiments on single atoms in an ion trap), which makes the

collapse a necessity.

In spite of all results about decoherence,

Wigner's mathematically rigorous analysis of the incompatibility

of unrestricted unitarity, the unrestricted superposition principle

and collapse, Chapter II.2 in:

J.A. Wheeler and W. H. Zurek (eds.),

Quantum theory and measurement.

Princeton Univ. Press, Princeton 1983,

in particular pp. 285-288, is unassailable.

In a nutshell, Wigner's argument goes as follows:

If a measurement of 'up' turns the complete system

(including the measured system, the detector, and the environment)

into the state

psi_1 = |up> tensor |up-detected> tensor |env_1>

and a measurement of 'down' turns it into

psi_2 = |down> tensor |down-detected> tensor |env_2>

and the projections of these states are stable under repetition

of the measurement (but possibly with different |env> parts),

then, by linearity, measuring the state

|left> = (|up> + |down>)/sqrt(2)

necessarily places the whole system into the superposition

(psi_1 + psi_2)/sqrt(2)

of such states and _not_ (as would be needed to account for the

experimental observations) into a state of the form psi_1 or psi_2,

depending on the result of the measurement.
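
The linearity step can be made concrete in a toy model. The sketch below
is only an illustration under simplifying assumptions: the environment
is left out, the detector is a hypothetical three-state system (ready,
up-detected, down-detected), and the measurement interaction U is taken
to be a permutation matrix, which is one of many unitaries with the
required action on |up>|ready> and |down>|ready>.

```python
import numpy as np

up, down = np.array([1.0, 0.0]), np.array([0.0, 1.0])
ready, up_det, down_det = np.eye(3)      # three orthonormal detector states

# Toy measurement interaction on system (x) detector, basis index 3*s + d:
# for spin up the detector state is shifted by 1, for spin down by 2 (mod 3).
U = np.zeros((6, 6))
for s, shift in ((0, 1), (1, 2)):
    for d in range(3):
        U[3 * s + (d + shift) % 3, 3 * s + d] = 1.0

psi_1 = np.kron(up, up_det)              # result of measuring 'up'
psi_2 = np.kron(down, down_det)          # result of measuring 'down'

left = (up + down) / np.sqrt(2)
result = U @ np.kron(left, ready)        # unitary 'measurement' of |left>

print(result)
print((psi_1 + psi_2) / np.sqrt(2))      # the same vector
```

Any other unitary with the same action on the two product states gives
the same superposition, since only linearity is used.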

Wigner's reasoning implies that a

resolution of the measurement problem requires giving up one of

the two holy tenets of traditional quantum mechanics: unrestricted

unitarity or the unrestricted superposition principle.

Von Neumann and with him most textbook authors opted for giving up
unitarity by introducing collapse as a process independent of the

Schroedinger equation. This is no longer adequate since we now know

that there is no dividing line between classical and quantum, so

that a measurement can no longer be idealized in the traditional

fashion. But then there is no longer a clear place for when the

collapse happens, and more specific solutions are called for.

My paper

A. Neumaier,

Collapse challenge for interpretations of quantum mechanics


(see also

contains a collapse challenge for interpretations of quantum mechanics

that brings to a focus the requirements for a good solution of the

measurement problem.

In my opinion, the collapse is no fundamental principle but

the result of _approximating_ the entangled dynamics of a system

with its environment by a Markovian dynamics for the system itself,

resulting in a dissipative master equation of Lindblad type.

The latter has a built-in collapse. The validity of the Markov

approximation is an _additional_ assumption beyond decoherence,

which is responsible for the collapse. Its nature is similar to

that of the so-called Stosszahlansatz in the derivation of the

Boltzmann equation.

Quantum optics and hence all high quality experiments for

the foundations of quantum mechanics are unthinkable without

the Markov approximation.


S12b. Which textbook of quantum mechanics is best for foundations?


For large ensembles, there seems to be no disagreement about the

interpretation. The book

A. Peres,

Quantum theory - concepts and methods,

Kluwer, Dordrecht 1993

is probably the most useful (i.e., both clear and applicable)

account of foundational aspects on this level. It is not the easiest

book, though, and reading it demands more attention than, say,

Sakurai's book. The latter is much more readable but has sloppy

foundations only; see the discussion in


There are also nice online treatises on certain aspects.

For the basics as related to quantum information theory, see, e.g.,

M. Plenio, Quantum Mechanics

M.B. Plenio and V. Vedral

Entanglement in Quantum Information Theory


M.B. Plenio and P.L. Knight

The Quantum Jump Approach to Dissipative Dynamics in Quantum Optics


Modern experiments appear to need, however, a quantum mechanics

of individual systems, and that's where controversy and confusion

prevail. I find none of the existing interpretations convincing,

and wrote up in Int. J. Mod. Phys. B 17 (2003), 2937-2980

= quant-ph/0303047 my own constructive (but incomplete) view

of the matter.

This paper is completely self-contained and works directly

with the statistical mechanics version of QM, with the

benefit that it avoids many of the traditional obscurities.

It discusses complementarity, ensembles, uncertainty relations,

probability, quantum logic, nonlocality, Bell inequalities,

sharpness of measurements, and rudiments of quantum dynamics.

The German ''Theoretische Physik FAQ'' at

contains a German language exposition of my consistent experiment

interpretation of quantum mechanics, which is a much extended version

of the above and gives a consistent setting for a quantum universe

which explains the nature of quantum chance. A paper on this

(in English) is in preparation.

For the history of the interpretation of QM, see the excellent book

Max Jammer

The philosophy of quantum mechanics.

The interpretations of quantum mechanics in historical perspective

Wiley, New York 1974

and the collection of original papers,

J.A. Wheeler and W. H. Zurek (eds.),

Quantum theory and measurement.

Princeton Univ. Press, Princeton 1983.


S12c. What is the role of quantum logic?


Quantum logic is a variant of logic often thought to be

appropriate for the foundations of quantum mechanics.

A good exposition is given in

K. Svozil,

Quantum Logic,

Springer, Singapore 1998.

The book is nice and useful for its material on hidden-variable

related arguments.

However, all that is commonly argued in textbooks about QM is argued

in terms of classical logic. Even a cursory look at the large

quantum mechanical literature reveals that quantum logic only has

a marginal spectator role in QM, while all proofs of all properties

of quantum systems have always been discussed using the familiar

classical logic. Even in Svozil's book, one can see that quantum

logic is argued in terms of classical logic, and that it has

essentially no role in the analysis of actual physical situations

(apart from those used for testing the foundations).

Beyond a certain point, quantum logic is sterile, which is the reason

it never figures in textbooks (except perhaps in passing).

All one ever needs to know about quantum logic (unless one wants to

specialize in it) is summarized in Sections 6 and 7 of my paper

Int. J. Mod. Phys. B 17 (2003), 2937-2980 = quant-ph/0303047.


S12d. Stochastic quantum mechanics


For certain Hamiltonians, the Schroedinger equation can be rewritten

as a classical diffusion process. This leads to the stochastic

quantum mechanics of Nelson. For an overview, see, e.g.,
While it gives an interesting aspect to quantum mechanics and its

classical limit, Nelson's description has a severe deficiency

in that it cannot handle the situation when the wave function vanishes

at some point. At all such points, R has a singularity and S is

entirely undefined (where psi = exp(R+iS) with real R and S).

This happens, e.g., for excited states of hydrogen,

hence is an integral part of standard quantum mechanics.

Even if one argues that such states are idealized and cannot occur,

it seems not to be possible to show that a state that is everywhere

nonzero will preserve this property under time evolution.

Thus Nelson's representations may develop spurious singularities

which are not in the observable part of quantum mechanics.

Also, it is awkward to do scattering calculations in Nelson's

framework. Moreover, Nelson, as quoted on p. 16 of the above paper,

says correctly,

''Quantum mechanics can treat much more general Hamiltonians

for which there is no stochastic theory.''

Thus it is unlikely to be useful as a 'fundamental' description

of nature.

Instead, natural stochastic forms of quantum mechanics are those of

quantum diffusion processes and quantum jump processes, in which

the wave function itself is regarded as a classical random object.

For their use in an experimental context, see, e.g., quant-ph/9805027.


S12e. Is there a relativistic measurement theory?


Real measurements take time, and are not instantaneous.

To treat the collapse as instantaneous is an idealization,

valid for many applications of quantum mechanics.

If relativistic effects play a role, one needs to use

quantum field theory. However, the measurement process in

quantum field theory is very poorly researched.

Thus statements about the conflict of instantaneous collapse

and relativity theory are based on very shaky grounds.

For measurement in the relativistic case (but without

invoking field theory) see quant-ph/9906034 and other papers

by Peres and/or Terno available in the arxiv.

They indicate the absence of problems, as far as such a simplified

analysis can be trusted.


S12f. Quantum mechanics and dice


It is frequently held that quantum mechanics makes only statements

about probabilities and not about single events.

This is very strange for a theory that claims to be the foundation

for everything scientifically observable.

According to the probabilistic view, quantum mechanics is incapable

of making any statement about dice that have been thrown already.

Although we can observe with perfect accuracy the value of the throw,

all that traditional quantum mechanics can give is the probability

distribution of the possible values of the throw, if this value were

not yet known.

Quantum mechanics has similar difficulties coping with other

actual events, since it never ever predicts what must happen or what

must have happened, but only gives probabilities.

This is of little consequence for quantities like the value of a

throw of three dice, but is a severe defect when discussing the

trajectories of the planets of the Solar System (for which we cannot

make meaningful statistics), of air planes, or of cars.

Clearly there must be something objective about these, although

traditional quantum mechanical interpretations - taken seriously -

are unable to account for definite individual events.


S13a. Random numbers and other random objects


In probability theory, a random number is just a random variable x,

i.e., a measurable function on the set Omega of possible experiments,

that assigns to each experiment omega in Omega the value x(omega)

of x in this experiment.

In the important, 'noninformative' case where the measure is invariant

under a group transitive on Omega, so that all experiments are

identical copies of one another, physicists refer to this set Omega

as a (classical) 'ensemble',

although they are usually too vague to express this in formal terms.

The terminology easily extends to the inhomogeneous case if one

allows in ensembles each realization with a different frequency.

Mathematicians prefer to leave the set Omega (which they call the

'sample space') unspecified and talk about 'realizations' in place of

'experiments'. Thus, for each experiment omega in Omega, x(omega) is

a realization of x, i.e., what physicists would call the value found

in this particular experiment.

By giving a specific definition of the sigma algebra of interest,

and specific recipes defining x(omega), one has a model world in which

realizations make perfect sense.

A difficulty is, of course, that we do not have such a model for the

real world, and hence must resort to empirical approximations when

treating real-life problems. (This places physicists at a slight

disadvantage; however, there is the compensating advantage that their

results apply to real life instead of only satisfying one's sense of

beauty and precision....)

The only thing not specified in probability theory (unless one specifies

a particular model as indicated above) is the mechanism that draws

the number, and hence there is no way to know which experiment

has been realized. Therefore, probability theory makes only statements

about _all_ realizations simultaneously.

Example. Given the axioms of probability theory, a random number

uniformly distributed between zero and one is defined as a random

variable x such that

<f(x)> = integral_0^1 f(s) ds

for all Lebesgue-integrable functions f on [0,1], and any x(omega) is a

realization of it, i.e., an actual number in [0,1]. (In particular,

random numbers are _not_ numbers!)

Mechanisms to draw numbers that may be used as approximations to a

sequence of independent realizations x(omega) are called random number

generators. They do not produce random numbers (since random numbers

are not numbers but measurable functions). Instead, they produce

sequences that look like typical

realizations of sequences of independent, uniformly distributed random

numbers (in the sense that they usually pass with high confidence level

certain statistical tests valid for such random sequences).

Therefore, the numbers they generate are used in practice as (often

completely adequate) substitutes for random numbers.
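
A minimal sketch of this use, assuming Python's standard generator (a
Mersenne Twister) as the mechanism that draws the numbers: the
defining property <f(x)> = integral_0^1 f(s) ds is checked empirically
for the arbitrarily chosen f(s) = s^2, whose integral is 1/3. The
function name sample_mean and the sample size are hypothetical choices.

```python
import random

def sample_mean(f, n, seed=0):
    """Average f over n numbers drawn by a seeded pseudo-random generator."""
    rng = random.Random(seed)            # fixed seed: the sequence is reproducible
    return sum(f(rng.random()) for _ in range(n)) / n

estimate = sample_mean(lambda s: s * s, 100_000)
exact = 1.0 / 3.0                        # integral of s^2 over [0,1]

print(estimate, exact)
```

The agreement is only approximate, with an error that shrinks like
1/sqrt(n). With a fixed seed the sequence is completely deterministic,
which underlines that the generator produces ordinary numbers, not a
random variable.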

(On the other hand, there is no uniformly distributed random natural

number since the uniform measure on natural numbers,

mu(f) = sum_{k>=0} f(k) is not normalizable.)

Random numbers are comparatively simple objects. More complicated random

objects need more sophisticated ensembles but otherwise everything

remains analogous.

Let us consider the physically important example of Brownian motion.

Brownian motion (the random walk in space) is modelled by an ensemble

whose realizations (members) are the continuous paths in R^3 that are

Hoelder continuous with every exponent less than 1/2. The probability of any particular

realization of a random walk is exactly zero, and statements with

positive probability must hold in uncountably many realizations.

Nevertheless, the ensemble is precisely the set Omega composed of all

such realizations. And the appropriate sigma algebra carrying the

Wiener measure needed to describe the random walk is indeed an algebra

of subsets of Omega.

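
The square-root scaling behind the exponent 1/2 can be seen in a finite
sample of approximate realizations. This is only a sketch: the time step
dt, the number of paths and the lags are hypothetical choices, and a
discretized walk with Gaussian steps stands in for the continuous
process.

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 1e-3
# 500 sample paths in R^3, each built from 1000 independent Gaussian steps
steps = rng.normal(0.0, np.sqrt(dt), size=(500, 1000, 3))
paths = np.cumsum(steps, axis=1)

def rms_displacement(paths, lag):
    """Root mean square distance travelled over `lag` time steps."""
    diff = paths[:, lag:, :] - paths[:, :-lag, :]
    return np.sqrt(np.mean(np.sum(diff ** 2, axis=2)))

lags = np.array([1, 4, 16, 64])
rms = np.array([rms_displacement(paths, k) for k in lags])

# Slope of log(rms) against log(time lag): close to 1/2 for Brownian motion
slope = np.polyfit(np.log(lags * dt), np.log(rms), 1)[0]
print(slope)
```

Each row of paths is one (approximate) realization. Statements such as
the value of the slope concern the whole ensemble and are only
estimated from the finite sample.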
Repeatedly tossing a fair coin is also a (kind of trivial) stochastic

process. A fair coin that can be thrown an unlimited number of times

with independent outcomes (sampling with replacement) cannot be

modelled by the sigma algebra 2^{0,1} over Omega_1 ={0,1}, since

this has not even two independent bits. Its sigma algebra is based

on the infinite ensemble Omega_inf consisting of all possible

sequences of outcomes, and is the tensor product of infinitely many

copies of 2^{0,1}. This setting is necessary in order to provide

meaning to the concept of 'independent trial' which

underlies most of statistical reasoning.

Because of the assumed independence of the trials, one can reduce all

computations to computations within 2^{0,1}. This is generally done

in elementary probability theory, to simplify the presentation.

But once one looks at binary processes which are even slightly

correlated (history-dependent), one needs the full sigma algebra

over Omega_inf.
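
To see why single-toss probabilities are not enough, consider a sketch
of a slightly history-dependent binary process. The model is a
hypothetical two-state Markov chain: each new bit repeats the previous
one with probability stay. For every value of stay the individual
tosses are fair, but only stay = 0.5 gives independent trials.

```python
import random

def correlated_bits(n, stay, seed=0):
    """Binary sequence in which each bit repeats its predecessor with probability `stay`."""
    rng = random.Random(seed)
    bits = [rng.randrange(2)]                    # fair first toss
    for _ in range(n - 1):
        prev = bits[-1]
        bits.append(prev if rng.random() < stay else 1 - prev)
    return bits

def summary(bits):
    """Return (fraction of heads, fraction of tosses repeating the previous one)."""
    heads = sum(bits) / len(bits)
    repeats = sum(a == b for a, b in zip(bits, bits[1:])) / (len(bits) - 1)
    return heads, repeats

independent = summary(correlated_bits(200_000, stay=0.5))
sticky = summary(correlated_bits(200_000, stay=0.9))

print("independent:", independent)   # both fractions near 0.5
print("sticky:     ", sticky)        # heads near 0.5, repeats near 0.9
```

Both sequences look like a fair coin when judged toss by toss. The
difference shows up only in statistics of pairs, i.e., in events that
live in the sigma algebra over sequences rather than in 2^{0,1}.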


S13b. What is the meaning of probabilities?

To say that

"The probability that someone in risk group A will die of cancer is 1/3"

does _not_ mean that

"10 out of 30 people in risk group A will die of cancer".

It only means that,

"on the average, 10 out of 30 randomly chosen people in risk group A

will die of cancer".

This can be checked (in the limit) by many repeated simulations,

or (directly) by a theoretical computation; both require that the

complete ensemble is available. Of course, in using probabilities for

predictive purposes, an insurance company tacitly assumes

(without any guarantee)

that the group of 30 people of interest is actually well approximated

by a random sample, so that one can expect 10 out of the 30 to die of

cancer. But this tacit assumption may well turn out to be wrong.

Statements about ensembles are in principle exactly checkable:

Operationally, to say that "The probability that someone in

risk group A will die of cancer is 1/3" means nothing more or less

than that exactly 1/3 of _all_ people in risk group A will die of cancer.

(This assumes that risk group A is finite. For infinite ensembles,

to define the precise meaning of '1/3 of all', one needs to go into

technicalities leading to measure theory. Indeed, measures are the

mathematically rigorous versions of 'classical ensembles' in general.

For quantum ensembles, see quant-ph/0303047.)

Of course, we cannot check this before we have information about

how _all_ people in risk group A died, but once we have this

information, we can check and verify or falsify the statement.
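
The difference between the exact statement about the ensemble and the
statement about samples of 30 can be sketched with a hypothetical finite
risk group. The group size of 3000, the number of repetitions and the
seed are arbitrary choices made for the illustration.

```python
import random

rng = random.Random(1)

# Hypothetical finite risk group A: 1 = dies of cancer, 0 = does not
ensemble = [1] * 1000 + [0] * 2000
exact_fraction = sum(ensemble) / len(ensemble)   # exactly 1/3, checkable on the ensemble

# Many random samples of 30 people each
counts = [sum(rng.sample(ensemble, 30)) for _ in range(5000)]
average = sum(counts) / len(counts)
spread = (min(counts), max(counts))

print("fraction in the full ensemble:", exact_fraction)
print("average count in samples of 30:", average)
print("smallest and largest count:", spread)
```

The count in an individual sample scatters visibly around 10; only the
average over many samples, or the full ensemble itself, reproduces the
value 1/3.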

In terms of precise mathematics: A classical ensemble is the set of

elementary events underlying the sigma algebra over which the measure

is defined. For example, in any finite sigma algebra containing random

variables representing a fair coin (realizations 0,1; 1=head

with probability 50%), one has a finite ensemble of elementary events,

and exactly half of them come out heads. For an infinite sigma algebra,

the ensemble is infinite; but with the natural weighting, again exactly

half of them come out heads.

Usually, however, we only have incomplete knowledge about the ensemble.

For example, 'Tossing 10 fair coins' is just a sloppy way of saying

'Selecting a sample of size 10 from the total ensemble'.

The sigma algebra for modeling this must contain at least 10

random variables representing fair coins. This is the case, e.g., in the

direct product of N>=10 sigma algebras isomorphic to 2^{0,1}. For


it is obvious that here the number of heads is 5 (=50%) only on

average over many random samples; and it is impossible to infer the

exact probability from a single sample.

This is why statisticians say that they _estimate_ probabilities

based on _incomplete_ knowledge, collected from a sample.

The resulting estimated probabilities are known to be inherently

inaccurate; but they can be checked approximately by independent


(cross-validation) providing confidence levels indicating how much

the predictions can be trusted.

On the other hand, they _compute_ probabilities from _complete_

knowledge about the ensemble, namely the theoretical probability

distribution. Thus if complete information goes in, exact information

comes out, while computations based on incomplete information

naturally only give approximate results inheriting some uncertainty

from the input.

Computed probabilities are powerful, but only if the assumed

model is correct. Empirical estimates are usually inaccurate but useful.

The two approaches are not contradictory; indeed, they are combined

in practice without difficulties at all.

The only subjective aspect in the whole thing is the choice of a

stochastic model when making theoretical predictions; and even this

is made almost objective by the standard rules of statistical

inference and model building.

Indeed, the choice of ensemble is _always_ a subjective act that

determines what the probabilities mean. It encodes what the user is

prepared to assume about the given situation. Once the ensemble is

chosen - either a theoretical, exactly known ensemble defined by

specifying a distribution, or a real-life ensemble of which only

a (perhaps growing) sample is available - all probabilities have an

objective meaning.

A chosen ensemble is knowledge precisely if it is close to the correct

ensemble, and we have a good idea of how close it is.

That's why we value highly scientists such as Gibbs who guessed

the right ensembles for statistical mechanics, which turned out to be

a highly accurate description of equilibrium situations.

Only good choices are knowledge.

And what is good is found out only through proper checking,

and not through the principle of insufficient reason.

In case of tossing a coin we know that the fairness assumption is

usually reasonable, being consistent with experience.

In case of taking an exam with a newly appointed professor about whom

no one knows anything, reasoning from the two possible outcomes

(pass or fail) and the principle of insufficient reason to assign

a probability of 50% failure is ridiculous, and dangerous for those

who are not prepared.


S13c. What about the subjective interpretation of probabilities?


People with a preference for subjective interpretations would say

''probabilities depend on someone's knowledge''

instead of

''probabilities are a property of the ensemble under consideration''.

They talk of ''arrival of new information'' or ''learning'' instead of

the objective and unassailable formulation ''restricting the ensemble

to a subset defined by the conditions'' when discussing conditional

probabilities (the classical analogue of the statistical collapse of

the wave packet in quantum mechanics).

But knowledge is an even more poorly defined concept than probability,

which at least has an undisputed axiomatic basis. Thus explaining

probability in terms of knowledge only makes the meaning of probability

more foggy by putting it deep into the psychological realm.

Moreover, the subjective interpretation based on the Bayesian


of conditional probability has no formal way of coping with

misinformation (the ensemble grows if one learns that some of the

information one believed to know turns out to be false!) while,

on the objective level, the latter is just another change of the

ensemble.

Thus the subjective interpretation of probability is an inadequate

foundation for the use of probabilities in physics.


S13d. Are probabilities limits of relative frequencies?


Sometimes, probabilities are regarded as limits of relative

frequencies as the number of trials becomes arbitrarily large.

But the weak law of large numbers only guarantees that most trial

histories will give a sequence of relative frequencies that converge

to the probability. It might just fail for the one actually tried...

Moreover, in practice we only have partial knowledge of such an infinite

sequence of trials (which cannot be performed). This knowledge about

the sample gives no knowledge at all about the limiting ensemble,

just as the knowledge of the first n items of a sequence gives, in

theory, no knowledge at all about the limit of the sequence.

That we often estimate the limit using a small part of the sequence

is another matter, and is like estimating probabilities from samples.

But the estimate may be completely wrong.

Thus interpreting probability as relative frequency is a philosophically

difficult interpretation step. For a thorough discussion, see the very

informative books by

T.L. Fine,
Theory of probability; an examination of foundations.

Acad. Press, New York 1973.


L. Sklar,

Physics and Chance,

Cambridge Univ. Press, Cambridge 1993.


S13e. How meaningful are probabilities of single events?


(Note: In this FAQ, 'event' is always understood in the ordinary sense

of the word, as 'something specific happening'.

In axiomatic probability theory based on Kolmogorov's axioms,

there is a slightly different, formal meaning of an event as an

element of the underlying sigma algebra.

An axiomatic foundation of probability theory equivalent to that of

Kolmogorov, but not based on sigma algebras, can be found in the book

'Probability via Expectation' by Paul Whittle, and a quantum extension

in quant-ph/0303047.)

Probabilities of single events are not at all meaningful

- at least not in any scientific sense -, although we are

used to scientific-sounding phrases such as

''There is a 60% probability for rain tomorrow''.

Instead, probabilities are properties of ensembles of events.

In the case just cited, the ensemble is the set of all 'tomorrow's

(or rather an infinite idealization of it), and the probability is not

an exact probability, but an estimate computed on the basis of a sample

of former 'tomorrow's, together with statistical weather models.

Probability assignments to single events can be neither verified nor

falsified. Indeed, suppose we intend to throw a coin exactly once.

Person A claims 'the probability of the coin coming out head is 50%'.

Person B claims 'the probability of the coin coming out head is 20%'.

Person C claims 'the probability of the coin coming out head is 80%'.

Now we throw the coin and find 'head'. Who was right? It is impossible

to decide.

Thus there cannot be objective content in the statement

'the probability of the coin coming out head is p', when applied to

a single case. Subjectively, of course, every person may feel

(and is entitled to feel) right about their probability assignment.

But for use in science, such a subjective view (where everyone is right,

no matter which statement was made) is completely useless.

What is the probability that a particular person, Mrs. X, will die of

cancer? This is a single event that either will happen, or will not

happen. If one considers this single event only, the probability is 1

or 0, depending on what will actually happen. (But this sort of

probability is not what we talk about in physics.)

On the other hand one may assign a probability based on some facts

about Mrs. X (smoker? age? gender? already ill?, etc).

Each collection of such facts determines an ensemble of people,

from which one can form a statistical estimate of the probability.

It clearly depends on which sort of ensemble one regards Mrs. X

to belong to, what probability one will assign. Mrs. X belongs to many

ensembles, and the answer is different for each of these.

Thus probabilities are meaningful not as a property of the single event

but only as a property of the ensemble under consideration.

This can also be seen from the mathematical foundations. Classical

probabilities are determined by measures over some sigma algebra.

All statements in measure theory are _only_ about expectations and

probabilities of all possible (often infinitely many) realizations

simultaneously, and say nothing at all about any particular

realization.

For a random sequence consisting of 9 independent bits, with 0 and 1

equally likely, the sequence 111111111 has exactly the same status

and exactly the same probability as the sequences 110100101 or

000000000, although only the second sequence looks random.

(A random sequence is _not_ a sequence of numbers but a sequence of

random numbers = measurable functions. Only the _realizations_ of a

random sequence are sequences of ordinary numbers. Sequences of

ordinary numbers are _never_ random, but they can 'look random',

in a subjective sense.)


S13f. Objective probabilities


Consider a physical die (for simplicity assumed perfectly symmetric)

with six elementary events 1,...,6.

If the die is not thrown, all events are equivalent, and the

probabilities are 1/6 for each event. These probabilities are

associated to the die (_not_ to a throw), and can be determined

uniquely from the knowledge of the geometry and composition of

the die. All of probability theory happens at this level,

since the 'happening' of an event is not formally defined.

If the die is thrown, a given event (say 3) either happens or

does not happen. If the event happens (does not happen), the

statement 'This throw is a 3' is true (false), hence has a

probability of 100% (0%), although before the throw, these

probabilities are not yet known. These probabilities are

associated to each particular throw (_not_ to the die).

Thus a die functions as a potential stationary source of throws,

and hence _defines_ an ensemble of (conceivable) throws.

An actual throw, though a realization of this ensemble,

is determined by the outcome, and cannot be assigned a

probability different from 0 or 1.

[See, e.g., the wikipedia entry

''Omega is a non-empty set, sometimes called the "sample space",

each of whose members is thought of as a potential outcome of a

random experiment.''

'is thought of' signifies the interpretational level.

Probabilities are only about 'potential outcomes' (what I call

conceivable), not about actual ones.]

A stationary source has objective probability distributions

for random vectors computable from observations made on it.

These are given in terms of an objective expectation mapping

and an associated density. In principle, this density can be

measured arbitrarily well, and if the form and composition of

the source is known, can be objectively predicted from

physical theories.

Thus objective probability distributions exist always when the

generating ensemble is completely known, and more generally

whenever it is objectively determined.

Similarly, in quantum theory, a laser is a potential stationary source

of photons, the oven in a Stern-Gerlach experiment is a stationary

source of electrons, etc. The sources are in well-defined,

objective quantum mechanical states, defining ensembles with

objectively predictable properties.


S13g. How probable are realizations of stochastic processes?


In a stochastic setting, _every_ realization of a stochastic process

typically has probability 0; nevertheless, exactly one of them actually

happens.

Taking for simplicity the stochastic process defined by independent

flips of a fair coin, a realization is an infinite binary sequence,

and each of these has probability zero. (Partial realizations of

finite length N each have a probability of 2^-N which is extremely

tiny for large N.)

For discrete stochastic processes having a continuum of allowed values

at each time step, even partial realizations have zero probability,

except in degenerate situations. The same holds for continuous-time

stochastic processes.

The case of measuring electron spin, say, is more difficult to analyze

because as stated, it is not yet a well-defined stochastic process.

If it is taken as a continuous measurement, the flips occur at random

times, and so even a single flip at a definite time has probability

zero.

If it is taken as a discrete process, we need to specify a measuring

protocol that applies at definite, equidistant times. Then it is likely

that there are some correlations, and probabilities even of finite

pieces of a particular realization are hard to come by. Nevertheless,

under reasonably random circumstances (for example, when measuring spins

of independent electrons), the probability of the most likely sequence

of N measurements decreases exponentially with N, and the probability of

a complete realization (infinite sequence) is again zero.


S13h. How do probabilities apply in practice?


If one has a sound probabilistic model of a multitude of independent

events e_i with the same assigned probability p, one would be surprised

if the frequency of events is not close to p within a small multiple of

sqrt(p(1-p)/N), where N is the number of events. Rather than just

accepting a rare occurrence

(e.g., a brick going upwards due to fluctuations) as something within

one's probabilistic model, one would probably rather try to explain

it away by assuming a hidden, unobserved cause (someone throwing it).
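
The rule of thumb with sqrt(p(1-p)/N) can be sketched in a few lines of
Python. The function name and the example numbers are chosen for this
illustration only; what counts as a 'small multiple' (2? 3? 5?) is left
to the user, in line with the fuzziness of the notion of surprise.

```python
import math


def deviation_in_standard_errors(count, trials, p):
    """Distance between the observed frequency count/trials and p,
    measured in units of sqrt(p*(1-p)/trials)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if trials <= 0:
        raise ValueError("trials must be positive")
    standard_error = math.sqrt(p * (1.0 - p) / trials)
    return abs(count / trials - p) / standard_error


# 60 heads in 100 tosses of a coin assumed fair: about 2 standard errors,
# mildly surprising; 90 heads would be about 8 and call for an explanation.
print(deviation_in_standard_errors(60, 100, 0.5))
print(deviation_in_standard_errors(90, 100, 0.5))
```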


The way probabilities are used in practice is always as informative

guides of what to expect, but not as statements with a 100% exact

meaning. I wrote a paper on surprise:

A. Neumaier,

Fuzzy modeling in terms of surprise,

Fuzzy Sets and Systems 135 (2003), 21-38.

that may help understand the fuzziness inherent in our concepts of



S13i. Incomplete knowledge and statistics


It is often erroneously assumed that incomplete knowledge can

always be described by statistics. But this is by no means the case.

If one knows about a number x only that it is in [0,1], one cannot

apply statistics since one knows nothing at all about the distribution

(except for its support). It is perfectly consistent with

the knowledge that in fact always x=0.75, except that one does not

know it, or that x oscillates regularly, or....

The ignorance is in this case simply deterministic lack of information.

In particular, it would be a mistake to assume that the distribution

is uniform (ignorance interpretation). Using the noninformative prior

of the Bayesian school, which makes this assumption, may be seriously


More realistically, in engineering, an uncertainty of 5% in the elastic

modulus of steel bars may be the only information available

to an architect; but 3/4 of the bars used later in the building

may have a deviation of 0.1% and the remaining quarter one of 3.7%.

In general, all one can deduce from information that takes the form of

deterministic bounds on a vector x of variables and/or on expressions

in x are bounds on derived quantities y=f(x) one would like to compute

from it. This leads to global optimization problems, where f(x) is

minimized or maximized subject to the known constraints. See

The lack of knowledge that statistics can model is of a different kind.

It assumes that the _maximal_attainable_ knowledge about the system

- at the given level of description - is a probability distribution,

and that this probability distribution is indeed known.

The knowledge of the probability distribution can be replaced by a

qualitative knowledge of it (e.g. 'some Gaussian distribution'),

together with the knowledge of an incomplete sample from the


of interest; in this case, however, the best statistics can offer are

parameter estimation techniques that give credible probability

distributions compatible at some confidence level with the sample


There are also combinations of both kinds of incomplete information,

where one knows the maximal knowledge about a system should be

stochastic, but one lacks complete information on the distribution.

This is handled by the field of 'imprecise probability', although

there is not yet a generally accepted way for analyzing such

situations, and different schools with quite different basic

approaches compete. See, e.g., the links in

Theoretical physics is always concerned with describing the maximal

attainable knowledge about a system (at a given level of description),

irrespective of what anyone actually knows about it. In this way,

and only in this way, it is possible to get close to the objectivity

that science always is striving for.


S13j. Priors and entropy in probability theory


For a probability distribution on a finite set of alternatives,

given by probabilities p_n summing to 1, the Shannon entropy is

defined by

S = - sum p_n log_2 p_n.

The main use of the entropy concept is the maximum entropy principle,

used to define various interesting ensembles by maximizing the entropy

subject to constraints defined by known expectation values

<f> = sum p_n f(n)

for certain key observables f.
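
For a finite set of alternatives and a single constraint <f> = target,
the maximizer has the form p_n proportional to exp(-b f(n)), with the
multiplier b fixed by the constraint. The following Python sketch finds
b by bisection. The function name, the bracket [-50,50] for b, and the
example of a die with prescribed mean 4.5 are assumptions made for this
illustration only.

```python
import math


def maxent_distribution(f_values, target, iterations=200):
    """Maximum entropy probabilities p_n on a finite set, subject to
    sum_n p_n f(n) = target, in the form p_n ~ exp(-b*f(n))."""
    if not min(f_values) < target < max(f_values):
        raise ValueError("target must lie strictly between min f and max f")

    def distribution(b):
        weights = [math.exp(-b * f) for f in f_values]
        total = sum(weights)
        return [w / total for w in weights]

    def mean(b):
        return sum(p * f for p, f in zip(distribution(b), f_values))

    # Assumed bracket for b; adequate when |f(n)| is at most about 10.
    lo, hi = -50.0, 50.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean(mid) > target:
            lo = mid   # the mean decreases with b, so b must be larger
        else:
            hi = mid
    return distribution(0.5 * (lo + hi))


faces = [1, 2, 3, 4, 5, 6]
print(maxent_distribution(faces, 3.5))   # unconstrained mean: uniform
print(maxent_distribution(faces, 4.5))   # weights shifted to large faces
```

Bisection is used because the mean is a monotone function of b (its
derivative is minus the variance of f), so no derivative information or
external library is needed.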

If the number of alternatives is infinite, this formula must be

appropriately generalized. In the literature, one finds various

possibilities, the most common being, for random vectors with

probability density p(x), the absolute entropy

S = - k_B integral dx p(x) log p(x)

with the Boltzmann constant k_B and Lebesgue measure dx.

The value of the Boltzmann constant k_B is conventional and has no

effect on the use of entropy in applications.

There is also the relative entropy

S = - k_B integral dx p(x) log (p(x)/p_0(x)),

which involves an arbitrary positive function p_0(x). If p_0(x)

is a probability density then, with this sign convention, the relative

entropy is nonpositive.

For a probability distribution over an _arbitrary_ sigma algebra

of events, the absolute entropy makes no sense since there is no

distinguished measure and hence no meaningful absolute probability

density. One needs to assume a measure to be able to define a

probability density (namely as the Radon-Nikodym derivative,

assuming it exists). This measure is called the prior (it is often

improper = not normalizable to a probability density).

Once one has specified a prior dmu,

<f(x)> = integral dmu(x) rho(x) f(x)

defines the density rho(x), and then

S(rho)= <-k_B log(rho(x))>

defines the entropy with respect to this prior. Note that the

condition for rho to define a probability density is

integral dmu(x) rho(x) = <1> = 1.

In many cases, symmetry considerations suggest a unique natural prior.

For random variables on a locally compact homogeneous space (such as

the real line, the circle, n-dimensional space or the n-dimensional

sphere), the conventional measure is the invariant Haar measure.

In particular, for probability theory of finitely many alternatives,

it is conventional to consider the symmetric group on the set of

alternatives and take as the (proper) prior the uniform measure, giving

<f(x)> = sum_x rho(x) f(x).

The density rho(x) agrees with the probability p_x, and the

corresponding entropy is the Shannon entropy if one takes k_B = 1/log 2.

For random variables whose support is R or R^n, the conventional

symmetry group is the translation group, and the corresponding

(improper) prior is the Lebesgue measure. In this case one obtains

the absolute entropy given above. But one could also take as prior

a noninvariant measure

dmu(x) = dx p_0(x);

then the density becomes rho(x)=p(x)/p_0(x), and one arrives at the

relative entropy.

If there is no natural transitive symmetry group, there is no natural

prior, and one has to make other useful choices. In particular, this

is the case for random natural numbers.

Choice A. Treating the natural numbers as a limiting situation of

finite intervals [0:n] suggests using the measure with

integral dmu(x) phi(x) = sum_n phi(n)

as (improper) prior, making

<f(x)> = sum_n rho(n) f(n)

the definition of the density; in this case, p_n=rho(n) is the

probability of getting n.

Choice B. Statistical mechanics suggests using as (proper) prior

instead a measure with

integral dmu(x) phi(x) = sum_n h^n phi(n)/n!,

where h is Planck's constant, making

<f(x)> = sum_n rho(n) h^n f(n)/n!

the definition of the density; in this case, p_n=h^n rho(n)/n! is the

probability of getting n.

The maximum entropy ensemble defined by given expectations depends on

the prior chosen. In particular, if the mean of a random natural number

is given, choice A leads to a geometric distribution, while

choice B leads to a Poisson distribution. The latter is the one

relevant for statistical mechanics. Indeed, choice B is the prior

needed in statistical mechanics of systems with an indefinite

number n of particles to get the 'correct Boltzmann counting' in the

grand canonical ensemble. With choice A, the maximum entropy

solution is unrelated to the distributions arising in statistical

mechanics.

Thus while the geometric distribution has greater Shannon entropy

than the Poisson distribution, this is irrelevant for classical physics.

In statistical physics with an indeterminate number of particles,

only the relative entropy corresponding to choice B is meaningful.

(In the quantum physics of systems with discrete spectrum, however,

the microcanonical ensemble is the right prior, and then Shannon's

entropy is the correct one.)
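
The dependence on the prior can be made concrete numerically. For a
prescribed mean m, maximizing the entropy under choice A gives
p_n = (1-q) q^n with q = m/(1+m), while choice B gives rho(n)
proportional to exp(-b n), hence p_n = e^{-m} m^n/n!. The following
Python sketch compares the Shannon entropies of the two results. The
function names are hypothetical, and the infinite sums are truncated at
n_max, an assumption that is harmless for moderate means.

```python
import math


def geometric_pmf(mean, n_max):
    """Maximum entropy distribution for given mean under choice A."""
    q = mean / (1.0 + mean)
    return [(1.0 - q) * q**n for n in range(n_max + 1)]


def poisson_pmf(mean, n_max):
    """Maximum entropy distribution for given mean under choice B."""
    pmf = []
    term = math.exp(-mean)          # p_0
    for n in range(n_max + 1):
        pmf.append(term)
        term *= mean / (n + 1)      # p_{n+1} = p_n * mean/(n+1)
    return pmf


def shannon_entropy_bits(pmf):
    """S = - sum p_n log_2 p_n, skipping terms with p_n = 0."""
    return -sum(p * math.log2(p) for p in pmf if p > 0.0)


def mean_of(pmf):
    return sum(n * p for n, p in enumerate(pmf))


geo = geometric_pmf(3.0, 400)
poi = poisson_pmf(3.0, 400)
print(mean_of(geo), shannon_entropy_bits(geo))
print(mean_of(poi), shannon_entropy_bits(poi))
```

Both distributions have the same mean, but the geometric one has the
larger Shannon entropy, since it maximizes the Shannon entropy among
all distributions with that mean.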

The identification of 'information' and 'Shannon entropy'

is dubious for situations with infinitely many alternatives.

Shannon assumes in his analysis that without knowledge, all

alternatives are equally likely, which makes no sense in the infinite

case, and may even be debated in the finite case.

(One of the problems of a subjective, Bayesian approach to

probability is that one always needs a prior before information

theoretic arguments make sense. If there is doubt about the former

the results become doubtful, too. Since information theory in

statistical mechanics works out correctly _only_ if one uses the

right prior (choice B) and the right knowledge (expectations of

the additive conserved quantities in the equilibrium case),

both the prior and the knowledge are objectively determined.

But this is strange for a subjective approach such as the information

theoretic one, and casts doubt on the relevance of information

theory in the foundations.)


S14a. Theoretical challenges close to experimental data

Many theoretical physicists seem to think that the only worthwhile

challenges in theoretical physics can be found at >TeV energies.

But, (un?)fortunately, there are challenges, as difficult and

as exciting, in the realm of normal energies, deep in the limits

of the unknown (as regards understanding), and far more relevant

in my opinion.

The manpower and money invested in the exotic realms of nature

at very large energies would be much better spent on these challenges

closer to experimental data.

For example, finding a consistent nonperturbative setting for

QFT, or giving a meaning to the concept of the ground state of

a Helium atom in quantum electrodynamics (extended by a field

describing the nuclei).

I have not seen a single field-theoretic treatment of Helium,

surely a simple system.

Helium is a bound state with well-defined asymptotic behavior,

as well-defined as a dressed electron or photon, but there is no

clear conceptual basis for this in QFT although there should be

such a concept. That's why I think it is a very important

unsolved problem.

There are papers making heuristic approximations

(see hep-ph/9612330) which give accurate predictions - cf.

Phys. Rev. A 65 (2002), 032516 and Phys. Rev. Lett. 84 (2000), 3274 -,

but they don't give a clue what a helium atom 'is' in QFT.

Moreover, they treat two electrons in a classical external

Coulomb field instead of a system of two electrons and a nucleus.

The current treatment of bound states in QFT (see elsewhere in this FAQ)

is a very loose patchwork of techniques borrowed from perturbative

field theory and nonrelativistic quantum mechanics that should make

every theoretician shudder. There are some beginnings in algebraic

QFT of what bound states should be, but nothing convincing on the

quantitative level.

A theory of everything should also be able to answer questions

that are well established experimentally but not understood

from the foundations.

For example, deriving the Navier-Stokes equations

for water from quantum theory is another challenge

that has so far remained unmet; it was done long ago for

dilute gases, but no one extended it to dense fluids.

There are severe difficulties to overcome, but we know both

the final result (to much better accuracy than the parameters

of the standard model) and the supposed underlying microscopic

model (unlike in quantum gravity); and the availability of

a derivation might even have long-term engineering consequences

for predicting properties of fluids under thermodynamic conditions

where experiments are difficult or impossible.

I am not an expert in this topic, but here are some pointers to

what I have seen about the problem.

I have never seen any microscopic derivation of Navier-Stokes

for water, although this is by far the most important application.

The statistical mechanics text of Reichl derives the equations

in Chapter 14F from thermodynamics, and in Chapter 16C-F

(for dilute monatomic gases) from classical statistical physics.

Fujita, Nonequilibrium statistical mechanics, derives Boltzmann

from QM in Chapter 4.2 and Chapter 6; Navier-Stokes would

be roughly analogous (for dilute gases). Similarly for many

other books on nonequilibrium stat. mech.

Mueller/Ruggeri, Extended Thermodynamics, treat relativistic versions,

deriving them from the Boltzmann equation and from


Volume 9 of Landau/Lifschitz discusses techniques for the condensed

state in general, but no derivation of Navier-Stokes.

J. Math. Phys 11 (1970), 2481 is a paper summarizing in the

introduction what was known by 1970.

Phys. Rev. D 53, 5799-5809 (1996) derives hydrodynamic equations

from quantum fields but only in a scalar phi^4 theory.

More recent related work includes

Phys. Rev. D 68, 085009 (2003)

Phys. Rev. D 64, 025001 (2001)

Phys. Rev. D 61, 125013 (2000)

Thus there is a well-trodden pathway for the dilute gas case,

and a set of tools for the condensed phase, but no synthesis

of the two.

If you find better references, please let me know.


S14b. Does the standard model predict chemistry?


The standard model is widely believed to be in agreement with

all we know about matter and radiation on earth, within the range of

accessible energies, as long as gravitational effects can be neglected.

But this does not mean that it has a high predictivity, except

on the level of high energy elementary particle scattering.

The reason is that we can compute from it almost nothing at the scales

of interest in nuclear, atomic, or molecular physics.

Lattice gauge calculations show that the standard model implies the

existence of baryons such as proton and neutron with masses that

match the experimental masses with an accuracy of about 5%.

This is far too low to be of use in chemistry or even in nuclear

physics. The accuracy of the effective forces between them is even

lower.

We have very little control over confinement, which is essential to

get useful forces at the energies relevant for nuclear physics.

Thus predictivity of the standard model for nuclear information

is almost nil.

And indeed, nuclear physicists do not use the standard model

(except for paying religious lip service to it), but work with

their own phenomenological models. They just borrow some of the

symmetries. These were of course known long before the standard

model was born, and built into the latter to match reality; so they

cannot count as predictions from the standard model.

If we had only the standard model and the numerical estimates

for the constants of effective actions computed from it,

this would give _very_ poor predictions of properties of protons,

neutrons, and their bound states.

One can show that the effective dynamics of protons and neutrons is

governed by effective field theories whose form can be derived

from the standard model (but also follows from assumed symmetry

principles built into the standard model) but whose coefficients

are derived by fitting calculations to _measured_ data about form

factors of proton and neutron, which have _not_ been calculated

from the standard model but must be put in by hand as additional


From this, one can calculate the energy of the nuclei, using a combined

droplet/shell model. We understand the structure of nuclei, in


with the standard model, but _not_ derived from it.

If we had only the standard model and the numerical estimates


from it, this would give _very_ poor predictions of nuclear properties.

There would be neither nuclear energy nor nuclear weapons based on

knowledge derived from the standard model only.

Even knowing the properties of proton and neutron from


and the effective equations (but nothing else) does not allow to get

highly accurate predictions for the properties of larger nuclei.

At atomic distances from the nucleus (for QED-dominated


one can further approximate the theory by Dirac-Fock equations,

or, for light nuclei, by Schroedinger's equation

for electrons and nuclei together with relativistic corrections.

The details of the nuclei become irrelevant for atomic physics and

chemistry, except for their atomic weights. These cannot be derived

accurately enough from lower levels, and must again be supplemented

by additional experimental information.

If we had only the standard model and the numerical estimates


from it, this would give _very_ poor predictions of most chemical

properties of everything including the hydrogen spectrum.

Only starting on this level, _assuming_ the properties of the nuclei

and the electron, we are able to predict much of macroscopic physics:

We can solve the Dirac equation exactly for hydrogen, and

compute the radiative corrections from QED and other corrections from

the Standard Model. It agrees with the experimental measurement of

hydrogen spectra to extraordinary accuracy. We can understand why the

periodic table works, and predict the properties of even large

atoms (such as the color of gold) reasonably well using the Dirac-Fock

equations.

From this level on upwards, one has enough experimental data to

calculate chemical information for small molecules that is predictive

in the sense that it may give quantitative information that is

reasonably accurate and not put in by hand.

But already for proteins, one again needs to complement the


input by measurements to get predictions of reasonable accuracy.

Thus the standard model is a very inaccurate tool for chemistry.

It is useful only for elementary particle scattering experiments.

At each higher level, one needs additional information from

experiment to complement the predictions of the lower levels.


S14c. Is the result of a measurement a real number?


A single measurement (reading from a scale) always gives a rational

number, at least if the scale is in terms of rational units.

(If the scale gives an angle in degrees which is then converted into

arc length, the measurement gives rational multiples of pi instead).

However, this is by convention only, since a pointer position is

just a position in 3-space which must be translated into a number

by a subjective reading or by a digital reading device of limited

resolution. Thus the true position is not determined accurately

enough to associate it with a single number.

Infinitely many rationals (and uncountably many reals) are

compatible with any observable state of the voltmeter.

That's why the error bars are intrinsic to measurement results, even to

single readings. Deleting them and claiming exact measurement results

is just laziness, acceptable when the resolution of an instrument is


Therefore, according to the standards of NIST (National Institute

of Standards and Technology), a measurement gives an interval

consisting of a rational number together with an error bar; see

Of course, the error bar is also somewhat uncertain, but one generally

accounts for this uncertainty by rounding it upwards, to make the

whole estimate conservative.
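
The upward rounding of the error bar can be illustrated by a short
Python sketch. It is not meant to reproduce any NIST procedure; the
function name, the decimal input strings and the number of retained
places are assumptions made for this illustration only.

```python
from decimal import Decimal, ROUND_CEILING


def conservative_interval(value, uncertainty, places):
    """Return (lower, upper, error_bar), where the error bar is the given
    uncertainty rounded upward to the requested number of decimal places."""
    step = Decimal(1).scaleb(-places)          # places=2 gives 0.01
    error_bar = Decimal(uncertainty).quantize(step, rounding=ROUND_CEILING)
    center = Decimal(value)
    return center - error_bar, center + error_bar, error_bar


# A reading of 1.234 with estimated uncertainty 0.0123, reported with an
# error bar of two decimal places: the bar becomes 0.02, not 0.01.
lower, upper, bar = conservative_interval("1.234", "0.0123", 2)
print(lower, upper, bar)
```

Decimal strings are used instead of binary floating point numbers so
that the rounding direction is not spoiled by representation errors.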

The NIST definition has the advantage that it also applies to indirect

measurements obtained from raw measurements by some


Indeed, most high quality measurements are of this kind.

Nevertheless there is no contradiction if one assumes that reality is

governed by equations in terms of exact real (or complex) numbers,

and only the measurement abilities are limited.


S14d. Why use complex numbers in physics?


Complex numbers are _the_ natural number system for all but

elementary physics; one needs them to make sense of many advanced

concepts. Avoiding complex numbers would make much

of what is done incomprehensible.

Already Fourier analysis is most natural with complex numbers,

though here it could be avoided by using trigonometric series


The time-independent Schroedinger equation defines the

Fourier components of real, measurable expectations. So it is

very natural that quantum mechanics is based on complex entities.

Dispersion relations in optics are natural only in a complex setting.

Spectra of nonhermitian operators, essential for dissipative systems

even in the classical case, are always complex.

Analytic continuation plays a significant role in some physical

theories. For example, lattice gauge theory works in a continuation

of quantum field theory to Euclidean space, and the results must be

continued back to Minkowski space to get physical meaning.

On the other hand, at first sight it seems that only real quantities

are measurable. However this only holds for the most direct measurements,

where you read a number from a meter. Most measurements are of a

more indirect kind, and then this restriction no longer applies.

To measure a family of physical quantities x_l (l=1,...,n),

one measures some related real quantities r_1,...,r_m connected to

the x_l by a system of equations F(x,r)=0 (in the absence of

measurement errors). In fact, there will always be measurement errors,


hence one generally uses more equations than unknowns and solves

the least squares problem ||F(x,r)||^2=min (or a more complicated

related problem if a model of measurement errors is available)

to get an estimate of x.

This recipe is universally used for all sorts of measurements and

works whether the x_l are real or complex.
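
The recipe can be sketched in a few lines of Python. In this
illustration (all names and numbers are hypothetical), the unknown is a
single complex amplitude x = a + i b, and each real reading is
r_k = Re(x exp(i theta_k)) at a known phase theta_k, so that
F_k(x,r) = Re(x exp(i theta_k)) - r_k. There are eight equations for
two real unknowns, and the least squares problem is solved through its
2x2 normal equations.

```python
import cmath
import math


def estimate_complex_amplitude(phases, readings):
    """Least squares estimate of complex x from real readings.

    Model: r_k = Re(x * exp(i*theta_k)) = a*cos(theta_k) - b*sin(theta_k),
    with x = a + i*b. Minimizes the sum of squared residuals.
    """
    suu = suv = svv = sur = svr = 0.0
    for theta, r in zip(phases, readings):
        u, v = math.cos(theta), -math.sin(theta)  # coefficients of a and b
        suu += u * u
        suv += u * v
        svv += v * v
        sur += u * r
        svr += v * r
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("the phases do not determine x")
    # Cramer's rule for the 2x2 normal equations.
    a = (sur * svv - suv * svr) / det
    b = (suu * svr - suv * sur) / det
    return complex(a, b)


# Synthetic example: a hypothetical true value and slightly perturbed readings.
x_true = complex(1.5, -0.8)
phases = [2.0 * math.pi * k / 8 for k in range(8)]
readings = [(x_true * cmath.exp(1j * theta)).real + 0.01 * (-1) ** k
            for k, theta in enumerate(phases)]

x_est = estimate_complex_amplitude(phases, readings)
print(x_est)
```

Every reading is a real number, but the quantity estimated from the
readings is complex.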


S15a. How precise can physical language be?


The relation between theory and reality necessarily uses ordinary

language and is therefore somewhat fuzzy. If one insists on 100%

unambiguous statements, one is on the level of pure mathematics or

mathematical physics (platonic reality), and cannot have any contact

with (physical) reality.

The best one can do is to have completely precise concepts on the

theoretical level and a description in ordinary, informal language

that relates theory to reality. In the formal theory, all concepts

can be precisely defined, and get names corresponding to their

use in reality. This ensures that one knows precisely what one talks

about - on the conceptual level.

In this informal language there must be room for linguistic

approximations without specifying their quality more than by

fuzzy words interpreted by the circumstances, since this is the way

we necessarily perceive reality.

When formulating the interface between theory and reality,

one must use the formulations of the people who actually use this interface.

They know how 'large' something must be to be taken as 'infinite'.

They estimate limits from finite sequences (most of numerical

analysis would be void if we couldn't...), usually quite successfully

- although this is meaningless mathematically.

A mathematical limit in theory does _not_ translate into a


limit in reality.

This is necessary since all our observations are finite, and most of

them are noisy. As there are approximate ways of determining the


of the Moon, but no exact methods, so there are approximate methods

for determining probabilities, but no exact ones. Exact real numbers

belong to theory, not to reality. (Even counting is not sure to result

in an integer. What about the number of people in a room when just

someone enters?)

Careful protocols for experimentation and measurement are useful to

achieve a certain amount of objectivity and repeatability, but even the

best protocols cannot reduce the level of fuzziness in the interface

between theory and reality to zero. I recommend

Experimentation and Measurement, by W.J. Youden,

reprinted 1997 by the National Institute of Standards and Technology


Although a very old paper (from 1961), it is still considered by NIST

to be up to date and exemplary in its lessons about measurements.

Among other things, it discusses on pp. 26ff in greatest detail

how to measure the thickness of a sheet of paper in an ensemble of

sheets typically called a thick book.

If one follows his argument closely, one finds that even classically,

observables such as the 'thickness of a sheet of paper' are

probabilistic only, notwithstanding that probably everything relevant

about paper can be understood by classical mechanics and

Thus there are no exact concepts in observed Nature.

But in a good theory of Nature, all concepts should be exact.


S15b. Why bother about rigor in physics?


Approximate methods are almost always more efficient than rigorous ones.


You can see this, for example, from the way integrals are calculated in

numerical analysis. No one uses the 'constructive proof' by

Riemann sums or, harder, by measure theory.

But for the logical coherence of a theory, the rigorous approach

is important.

To prove that a long, complicated expression in a single variable is

monotone may be quite hard and exceed the capacity of a typical

mathematician or physicist, but to evaluate it at a few hundred points

and look at the plot generated is easy.
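
Both points can be illustrated by a small Python sketch (the function
names are my own and the test functions are arbitrary examples). It
compares a plain Riemann sum with Simpson's rule at about the same
number of function evaluations, and it checks monotonicity on a grid of
a few hundred points. The grid check is a plausibility test only, not a
proof.

```python
import math


def left_riemann_sum(f, a, b, n):
    """Riemann sum with n left endpoints, as in the textbook definition."""
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))


def simpson(f, a, b, n):
    """Composite Simpson rule with n subintervals (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    total += 4.0 * sum(f(a + i * h) for i in range(1, n, 2))
    total += 2.0 * sum(f(a + i * h) for i in range(2, n, 2))
    return total * h / 3.0


def looks_monotone(f, a, b, n=300):
    """Check on a grid whether f appears to be nondecreasing on [a, b]."""
    values = [f(a + (b - a) * i / n) for i in range(n + 1)]
    return all(u <= v for u, v in zip(values, values[1:]))


# Integral of exp over [0, 1]; the exact value is e - 1.
exact = math.e - 1.0
riemann_error = abs(left_riemann_sum(math.exp, 0.0, 1.0, 100) - exact)
simpson_error = abs(simpson(math.exp, 0.0, 1.0, 100) - exact)
print(riemann_error, simpson_error)

print(looks_monotone(lambda x: x + math.sin(x), 0.0, 10.0))
print(looks_monotone(math.sin, 0.0, 10.0))
```

The grid check can only refute monotonicity; a function that passes it
may still fail to be monotone between the grid points.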

If you (the reader) are satisfied with the latter, never try to

understand mathematical physics - it will be a waste of your time.

But if you want to have physics in general look like classical

Hamiltonian mechanics - a beautiful piece of mathematically rich

and powerful theory, then you should not be satisfied with the way

current quantum field theory (say) is done, and keep looking for

a better, more solid, foundation.

About the pitfalls of using mathematics ''formally'' (i.e., without

bothering about convergence of the expressions, existence or

interchangeability of limits, etc.), I recommend reading

F. Gieres,

Mathematical surprises and Dirac's formalism in quantum mechanics,


Rep. Prog. Phys. 63 (2000) 1893-1931.



G. Bonneau, J. Faraut, G. Valent,

Self-adjoint extensions of operators and the teaching of quantum mechanics,


Amer. J. Phys. 69 (2001) 322-331.


See also:

K. Davey,

Is Mathematical Rigor Necessary in Physics?

British J. Phil. Science 54 (2003), 439-463.

On the other hand, on the way towards finding out what is true,

nonrigorous first steps are the rule, even for die-hard

mathematicians. The role of intuition and nonrigorous thinking in

mathematics is well depicted in the classics

J. Hadamard,

An essay on the psychology of invention in the mathematical field,

Princeton 1945.


G. Polya,

Mathematics and plausible reasoning,

2 Vols., 1954.


G. Polya,

Mathematical discovery,

John Wiley and Sons, New York, 1962.

More recently, the article

A. Jaffe and F. Quinn,

"Theoretical mathematics": Toward a cultural synthesis of

mathematics and theoretical physics,

Bull. Amer. Math. Soc. (N.S.) 29 (1993) 1-13.


reports on the potential and dangers of nonrigorous approaches

to scientific truth. This paper was commented on in contributions

by a number of influential mathematicians and mathematical physicists in


M. Atiyah et al.,

Responses to ``Theoretical Mathematics: Toward a cultural

synthesis of mathematics and theoretical physics'',

by A. Jaffe and F. Quinn,

Bull. Amer. Math. Soc. 30 (1994) 178-207.


and the response of Jaffe and Quinn is given in

A. Jaffe and F. Quinn,

Response to comments on ``Theoretical mathematics'',

Bull. Amer. Math. Soc. 30 (1994) 208-211.


See also

D. Zeilberger,

Theorems for a Price: Tomorrow's Semi-Rigorous Mathematical Culture


J. Borwein, P. Borwein, R. Girgensohn and S. Parnes

Experimental Mathematics: A Discussion



S15c. Justifying the foundations of a theory


Quantum mechanics is a somewhat unintuitive theory, and generated

a lot of foundational literature aimed at justification and

explanation of the conceptual basis.

Justification of the basic postulates of any theory is necessarily

circular. If it were not, the postulates would not be basic but derivable.

One must take all the basic postulates as a single foundation

on which everything else rests without circularity.

But the basic postulates themselves can only be motivated, but not


Most people simply trust that tradition selected good foundations.

If you want to probe that trust you can go into studying the sea of

publications on the foundations of quantum mechanics. But unless

you are very dedicated and spend a lot of effort on it,

it is likely that you'll drown there before having found satisfaction...


S15d. Foundations, theory and experiment


Foundations of physics is the quest for getting the mathematical

concepts right to be able to do correct physics and think correctly

about it. Without correct concepts operational statements have no

meaning. The theory defines what a measurement is. Outside the

immediate realm of everyday experience, one needs already the

conceptual basis to even discuss what has operational meaning.

These statements apply both to good and bad theories. Even a bad theory

defines what a measurement is; it just defines it more poorly.

There is in fact a cross-fertilization between measurement and

foundations. If one gets better, the other profits from it.

On the other hand, fuzzy foundations lead to poor judgment and

ambiguity in measurements, and poor measurements lead to low

discrimination among theoretical alternatives.

One can observe from history that progress in concepts leads to better

investigations of nature, and better experiments lead to higher

demands on the theory, forcing people to look for more stringent

concepts and simpler or more encompassing frameworks.


S15e. Theoretical physics as a formal model of reality


Can the meaning of all terms in a physical model be determined

precisely without an infinite regress? I want to show that the

answer is a clear `yes'.

Look at the question `What is a force?' To answer this, one needs

to consider the concepts of force, mass, acceleration, pressure,

stress, recoil, perhaps the gravitational field, etc., in total

a small number of physical items. If we want to define them in reality,

we don't get an infinite chain but a circular definition -- we can only

define one in terms of another, illustrating the concepts by pointing

to situations where we hope everything is obvious.

In practice (i.e., in teaching physics), this works alright since

each of us knows reality already

and only needs enough context to identify the usage of the concepts --

there is essentially only one fit that works, and once the

light goes on, we understand -- or at least the level of

understanding deepens. (Later, when doing high precision


we may notice that our understanding is not adequate,

and become more careful and sophisticated, and at some advanced

stage one can probably write a whole book to get definitions

that are really precise...)

But there is another way that is fruitful and neither circular nor

infinite. It is obtained by mimicking how modern logic investigates

its foundations. It assumes that we know at the 'external reality'

level what logic is; then it builds a formal model, a 'formal reality',

in which one can talk about everything one talks about in 'real' logic,

but in completely formal terms.

You don't need to know what truth, propositions, etc. are in reality,

but you declare the rules for manipulating

them -- since this is the heart of the matter.

This is done in exactly the same way as the Greeks declared rules for

manipulating geometric terms. In addition, they had definitions like

'a point is what has no parts'; but in modern geometry, this is

considered to be not a well-defined formal statement

(instead it has the circular character of relating the concept

to reality), and hence is simply dropped from the list of axioms.

So modern geometers define a projective plane by a few simple axioms:


''There are points, there are lines, there is a relation which

tells which points are on which lines, through any two distinct

points there is exactly one line, and any two distinct lines

have exactly one common point.''

That's all, and it is enough to do planar projective geometry with

full clarity and completeness. We do not need to know anything about

the objects to analyze a situation

(unless we want to check its impact on external reality).

Of course, it is good to have a few more restrictions and concepts to

go really deep, but this is supposed to be just an example.
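
That such rules can be handled purely formally is easy to demonstrate.
The following sketch checks the two incidence rules quoted above for a
finite set of points and lines. The function name is hypothetical, and
the example is the well-known seven-point Fano plane with its points
labelled 0,...,6; the further restrictions alluded to above (which
exclude degenerate cases) are deliberately not checked.

```python
from itertools import combinations


def is_projective_plane(points, lines):
    """Check the two incidence rules for a finite set of points and lines.

    Rule 1: through any two distinct points there is exactly one line.
    Rule 2: any two distinct lines have exactly one common point.
    """
    lines = [frozenset(line) for line in lines]
    for p, q in combinations(points, 2):
        if sum(1 for line in lines if p in line and q in line) != 1:
            return False
    for line1, line2 in combinations(lines, 2):
        if len(line1 & line2) != 1:
            return False
    return True


# The Fano plane: 7 points, 7 lines, 3 points on each line.
FANO_LINES = [
    {0, 1, 2}, {0, 3, 4}, {0, 5, 6},
    {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5},
]
print(is_projective_plane(range(7), FANO_LINES))

# Four points with all six connecting lines: rule 1 holds, rule 2 fails,
# since the lines {0, 1} and {2, 3} have no common point.
FOUR_POINT_LINES = [{0, 1}, {0, 2}, {0, 3}, {1, 2}, {1, 3}, {2, 3}]
print(is_projective_plane(range(4), FOUR_POINT_LINES))
```

The check never asks what a point or a line 'is'; only the incidence
relation enters.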

In the same way, one can discuss _everything_ about

the real logic in the formal model of logic, and reach clarity.

It is my proposal to do this for physics as well.

Actually it has been nearly achieved in classical physics,

and fully achieved in Hamiltonian mechanics.

You start with a phase space and a Hamiltonian which fall from

heaven. (They are motivated by circular arguments, but these


are not part of the theory in the formal sense.)

Having this, you can build a whole world, with atoms,

dynamics, paths, forces, accelerations, stress, etc.

In fact, you can discuss any question about the classical world

in this mathematical frame, without ever needing any undefined term.
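
A toy illustration, as a sketch only: take a one-dimensional phase space
with coordinates (q,p) and the Hamiltonian of a harmonic oscillator.
Velocity, force and the whole path are then derived from H alone. The
parameters m and k, the step size, and the function names are
assumptions made for this example; the derivatives of H are taken
numerically, and the leapfrog scheme used here presupposes a separable
Hamiltonian H = T(p) + V(q).

```python
import math


def hamiltonian(q, p, m=1.0, k=1.0):
    """H(q, p) for a harmonic oscillator (illustrative choice of system)."""
    return p * p / (2.0 * m) + 0.5 * k * q * q


def velocity(q, p, h=1e-6):
    """dq/dt = dH/dp, approximated by a central difference."""
    return (hamiltonian(q, p + h) - hamiltonian(q, p - h)) / (2.0 * h)


def force(q, p, h=1e-6):
    """dp/dt = -dH/dq, approximated by a central difference."""
    return -(hamiltonian(q + h, p) - hamiltonian(q - h, p)) / (2.0 * h)


def leapfrog_step(q, p, dt):
    """One leapfrog step; valid for separable Hamiltonians only."""
    p_half = p + 0.5 * dt * force(q, p)
    q_new = q + dt * velocity(q, p_half)
    p_new = p_half + 0.5 * dt * force(q_new, p_half)
    return q_new, p_new


# Start at rest at q = 1; for m = k = 1 the exact path is q(t) = cos(t).
q, p = 1.0, 0.0
dt, steps = 0.01, 1000
initial_energy = hamiltonian(q, p)
for _ in range(steps):
    q, p = leapfrog_step(q, p, dt)

print(q, math.cos(dt * steps))
print(abs(hamiltonian(q, p) - initial_energy))
```

Nothing beyond the phase space and H is put in by hand; 'force' and
'velocity' are defined terms, not primitive ones.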

Formal reality is defined by what is expressible in terms of the

concepts already available, and 'true' reality with its circularity

never enters except as a guide to formulating new concepts and

discussing their consequences.

This is what I think theoretical physics is about.

It builds a formal model of the world, with a 'formal reality',

in which every important concept from experimental physics has

a well-defined formal meaning, and in which every reasonable

question about the physical world can be posed and investigated.

What can be posed and analyzed in such a framework counts

as understood, and understanding of nature increases by bringing

more and more into such a formal model, until everything about

physical nature is representable.

My vision is that the same is possible and desirable for quantum

physics. For me, realizing this vision is

equivalent to having understood quantum physics.

So I want to have a mathematical quantum model of nature,

in which one can talk about all the things physicists talk about

when they talk about nature in the physical sense. In particular,

there will be concepts like particles, fields, detectors, measurement,

probability, memory, etc. but -- unlike in real nature --

they will have a precise and unambiguous formal definition,

of the same formal quality as force, acceleration, etc. are defined

in Hamiltonian mechanics.

Then we can ask about the "meaning" of each term,

and get a well-defined answer within the formalism,

without infinite regress.


S16a. On progress in science


The frontier in science is the frontier because there is no clear

understanding of what is beyond. All that is there is a set of

questions bothering those close to the frontier, and a set of

experiences of more or less failed attempts to push the frontier


Real improvements in difficult matters never come by starting from

scratch - they come from patiently building upon the best of

what already exists, being open-minded but critical about new

possibilities, and trying to integrate what looks most promising.

Those who had the questions and found real answers published them and

advanced the state of the art. The others can only share their

experience and their chart of the uncharted territory. As one can see

from the conflicting opinions, these charts are not reliable.

If you (the reader) want to proceed further, you need to learn to

see with your own eyes, take your own risks, and find out for yourself

what can be trusted. There are no guides beyond a certain point.

And don't count on recognition before you actually succeed!

As long as ideas are tentative and not validated by experiment,

they are always hard to defend. Success comes late -

either with a triumphal experimental verification, or if people

realize that a new way is significantly simpler than the tradition.

If neither happens, people will stick to the tradition,

except for a minority who lives from exploring the consequences of

the idea.

Innovative research is always a risky business - one must be prepared

to continue one's work no matter how much it is criticized, but one

should also learn as much as possible from one's critics. Then - if it is

indeed the right track - success will come sooner or later.

But who knows beforehand what will turn out to be the right track?

So people have a right to be critical...

Critics usually just present a statement, or point to an incoherence in

an opponent's statement. To learn from it is a nontrivial task,

since it means that one has to find out

a) how to make the criticism strongest, in a constructive sense, and

b) how best to defend the original statement.

Finding this out is learning from it.

Everyone starts their journey from where they are, in the direction they

find most promising. The others observe what they do and have to make

up their own mind. If people knew what is the right start and the right

direction, all important unsolved problems would be solved by now.

The journey is a journey to collect understanding of the ill-understood.

To find bugs in a computer code one doesn't go around speculating,

but one carefully compares evidence available and stays as close as

possible to the code. Physics needs to find the bug in its

foundations, and as with computer programs, it will be very subtle

and will be found only by a careful investigator, not by a dreamer.

Of course, a certain amount of creativity is needed. But it must be

guided closely by general knowledge of similar problems already


and on the structure of the system, together with the information

turned up by a detailed analysis of the code.

Thus imaginative speculation works only if checked and confirmed by

detailed code analysis. And most of the wild ideas are useless.

Not a procedure I'd recommend for research, though, unfortunately,

it has become fashionable in some quarters of theoretical physics...

Rather, learn as much as you can about how and why the good

work, and if you have the calibre to be an innovator, you might be

able to spot what went wrong. But not by searching in the mist; your

search should always be well-directed, or you'll go in circles...

Judging from my own experience, understanding is not something that

springs into one's head without preparation, but is the result of

walking attentively and open-mindedly along many blind alleys, until one

sees one which smells like being the real thing. Then one starts

grinding away in this direction, and in this process discovers what

should have been the guiding principle that would have avoided all

the dead ends, bringing one directly to the goal. Then, and only then,

the right understanding governs the remainder of the search.

This is not only my personal experience but seems to be the general

pattern: See

G. Polya,

Mathematical Discovery,

John Wiley and Sons, New York, 1962.


S16b. How different are physical sciences and social sciences?


From the subject matter treated, a lot. From the modeling side far less.

There is no difference in principle. All science is based on observation

and experiment. All experimental data must be observed according to

well-defined protocols, to be objective (and hence science).

The main difference between physical sciences and social sciences is

that in the former one generally studies systems which are strongly

constrained by the experimental setting, so that they give much more

predictable results.

In both cases, however, the correct mathematical model is that of a

stochastic process, and physical sciences and social sciences only

differ in the size of the noise relative to the signal.

Sometimes to the extent that one can ignore the noise and treat a

physical system as deterministic, while a social system can never

be controlled well enough to make the remaining fluctuations negligible.


S16c. Can good theories be falsified?


The philosopher Karl Popper claimed that falsifiability is the

hallmark of scientific theories. But scientific practice speaks

against him.

A correct theory cannot be falsified, and in this sense is not

falsifiable, in spite of Popper. (Falsifiability can be asserted

only in a counterfactual sense, that there are _conceivable_ situations

that, according to the theory, are excluded. But for a correct theory,

these situations will never happen, hence are completely fictitious.)

What happens with good theories is, at worst, that their region of

validity or accuracy gets restricted as new data about more remote

instances come in.

In today's understanding, people are careful to indicate the

limits where a theory is claimed to be valid, and the accuracy

to which its answers are to be trusted.

For example, the Standard Model is claimed to be valid whenever

gravitation is negligible, accuracies conform to present possibilities,

and energies are well below a putative unification scale.

Failures outside this domain are not counted as falsifications.

While limits and accuracy claims are not necessarily part

of the theory proper, they are part of the theory as actually taught

and applied. Indeed, although people try to extrapolate, one can

never be sure whether a theory is correct outside the domain where

the data were collected.

But one can be reasonably sure within the domain where enough data

are available. Good scientific practice requires that a good theory

agrees with the data within the tolerances claimed.

Once this is the case, these theories can never be falsified.

Rather, if people find disagreement in experiments, the

theory falsifies the experimental arrangement or analysis.

All science students who ever did experiments in the lab know

very well that this is common practice.

The degree of caution and care at the highest

level of quality has been increasing through the centuries.

It is now too late to ask Newton whether he believed his theory was
valid without restrictions. (Or are there any hints in the Principia

Mathematica?) Certainly Newton's theory as taught today is taught

(i.e., with the restriction that it is valid at speeds small compared

to c and at distances large compared to the radius of the largest atom).

But we nevertheless believe that it is the 'same' theory, and if

Newton lived today, I think he would agree with that.

And Newton's theory will never be falsified, unless God suddenly

decides to change the physics of the Universe.

(That the observed advance of Mercury's perihelion did not match

Newton's theory was known as a limiting condition already before

relativity was born.)


S16d. What, then, distinguishes a good theory?


We can _know_ whether a theory has been correct in the past,

and we can _trust_ that it will remain so in the future.

There is no other kind of knowledge than that of the past.

Relying on the assumption that ''anything in the future is like in the past'' is an

act of faith. The question is not about faith or not, but about

faith in what is best supported by past experience.

Theories that conform with the past are easy to trust.

But they come in different degrees of stringency.

Theories which are not restrictive at all but accommodate everything

(such as astrology or psychoanalysis) are in vogue (as society shows)

but useless (and probably harmful). These are the ones that Popper


Highly restrictive theories (what Popper calls scientific) are preferred

by those who want to control their destiny as far as possible.

Theories like Newton's, general relativity, or QED are extremely

restrictive and in agreement with past experience, hence both

trustworthy and very useful.

What makes a theory good is not its potential falsifiability, but that

it drastically reduces the number of possibilities which are present

without the theory, without eliminating something that can actually happen.


If you have no theory and put two marbles into your empty pocket,

and then another two, you don't know how many marbles you can
take out.

If you know arithmetic and the law of conservation of marbles you can

predict that exactly four can be taken out. This is testable, and will

always come out correct. So you have a correct theory. Of course, its

validity is not unlimited, since it assumes that your pocket does not

have a hole; so if some experiment does not conform to your theory

since you can only take out three, you suspect that the domain of

validity was violated; you check for the hole - and surely you'll

find it.

This is exactly analogous to the way Newton's theory works, within

its domain of validity. If it fails, we suspect speed close to c,

or highly accurate measurements, or tiny distances. And surely

we'll find it so.


S16e. When is a theory preferred to another one?


Frequently, Ockham's razor

''frustra fit per plura quod potest fieri per pauciora'',

that we should not use more degrees of freedom than are

necessary to model a phenomenon, is invoked to argue that the theory

with the fewest parameters is the best. But this is true only

when taken with many grains of salt.

Chemists prefer as a starting point of their deepest investigations

the theory based on Dirac-Fock theory or even cruder approximations,

treating the nuclei (for large problems even atoms) as elementary.

This gives them all the information they need, while they can deduce

nothing at all from the standard model which is supposed to be a much

more exact and general theory.

Thus what is preferred depends a lot on which use can be made of it.

Ockham's razor is appropriate only if two theories allow the same

deductions with a similar amount of work, or if the more parsimonious

theory is even superior in allowing one to derive the desired


Nothing in science is against a complicated model if it gives more ready

access to the quantities of interest than a formally simpler but

computationally more difficult or even intractable formulation.

Given only the standard model + classical relativity

(allegedly correctly describing all phenomena of the world at

accessible energies, distances, and accuracy), we'd know very little

about our world, and only very inaccurately. Not even the masses of

the nuclei can be predicted at present with any confidence, let alone

the properties of water or gold.

And given only string theory (a theory without any free parameter),

we'd know essentially nothing about our world.

(See http://

for further discussion of Ockham's razor.)


S16f. What is a fact?


In discussions on sci.physics.research, one often finds very good

information, but also often poor and misleading information.

How to distinguish the good from the poor?

Everything called knowledge is in fact a set of beliefs of the

person claiming it. And this set of beliefs is more or less close

to the objective truth, depending on the standards of that person.

Calling so-called knowledge a set of beliefs does not contradict the

objectivity of mathematical definitions. When I say that a Banach

space is a normed, complete vector space, I both state my belief

and happen to coincide with the social consensus of the guild of

mathematicians. And when I say that state reduction is a

physical process, I both state my belief and happen to coincide with

famous physicists like von Neumann and many others, and this is good

enough to make this statement honestly, since the community has not

reached an agreement on the matter.

Telling others what one thinks is true in no way manipulates

others any more than feeding others what one thinks is nourishing.

But as we shouldn't accept being fed by those with poor judgment

about food, we shouldn't accept an opinion for the truth if offered

by someone with poor judgment about the relevant areas.

It is obvious that whatever a person claims is first and foremost

his or her personal opinion, and not a fact. Whoever takes it for a

fact is simply misleading himself or herself. Thus there is no

need to qualify each of one's statements by clumsy phrases like

'in my opinion', or 'according to what I have read/understood', or

'as far as I am informed' or 'since this makes most sense to me'.

These phrases accompany silently any statement by anyone.

It is also obvious that an opinion doesn't become a fact because

it is believed by half the number of people from a particular

ensemble; truth would otherwise become dependent on the choice

of this ensemble.

Thus one needs to check the claims, to listen to different sides

of a controversy, to ask for sources or justification of an opinion.

In this way, anyone who wants to get a clear picture soon notices

which claims are trustworthy, which ones are tenable but somewhat

shaky, and which ones are poorly founded.

On the other hand, in participating in a discussion,

honesty only requires that one asserts what one thinks

is true, and gives one's reasons upon request.

This is the scientific approach, since it lets others check upon

the trustworthiness of a claim.


S16g. Physics and experience


On superficial reasoning, time is only a concept that helps us

to order our experiences. Thus,

''experience exists; time does not''.

By exactly the same argument,

''experience exists; space does not''

''experience exists; mass does not''

''experience exists; charge does not''

''experience exists; gravitation does not''


Physics is exactly about the concepts that are substituted for

experience to make experience quantitatively predictable.

Therefore, in this deeper sense, time, space, mass, charge,

gravitation, etc. exist, and are more fundamental than experience.


S16h. Modeling reality


In describing reality from a physics point of view,

the person modeling a system of interest makes certain

choices. These consist in choosing a mathematical model

of the system, and setting up a correspondence between

informal objects related to the system and formal objects

in the mathematical model.

More specifically, an assertion about reality is modelled

as a mathematical assertion about mathematical objects in the

mathematical model that carry the same names as those

in the reality they are supposed to model.


S16i. What is a system (e.g., an ideal gas)?


Theories of physics do not say what a system (such as an

electron, a star, an ideal gas, a crystal) is in reality.

Nevertheless, it is possible to check the reality contents

of a physical theory. How does this come about?

Let us consider thermodynamics. Thermodynamics does not say which

system is an ideal gas, which is only a van-der-Waals gas,

which is a liquid, or a solid.

Indeed, such questions need not be answered by the theory.

Instead, they are answered by checking how a system behaves:

If a real system behaves as the theory for an ideal gas (a solid,

a crystal, an electron) requires, a physicist will say it 'is'

an ideal gas (a solid, a crystal, an electron); if not, it is not.

While this definition may seem circular, it isn't once it is recognized

that one can check some characterizing properties of systems that


a particular label (such as 'ideal gas') by a small number

of measurements, and then deduce many more properties from the


that can be checked subsequently.

Engineers call this process 'system identification'.

Thus the task of theory is to provide models with just enough

flexibility that they cover the range of relevant possibilities,

while being still restricted enough so that one can identify

the system with a limited amount of data. Exactly in this case

a theory has predictive value.
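
A minimal sketch of such an identification, in Python: three readings
(p,V,T) are used to fit the single constant c in p V = c T by least
squares; if all readings agree with the fit within a tolerance, the
system is labelled an ideal gas, and the fitted c predicts the pressure
in a state that was never measured. The readings are synthetic,
generated from model equations rather than taken from an experiment;
the function names, the tolerance of 1 percent, and the van der Waals
constants (illustrative values of the order of those tabulated for
carbon dioxide) are assumptions of this example.

```python
R = 8.314  # J/(mol K), molar gas constant (rounded)


def ideal_gas_pressure(V, T, n=1.0):
    """Pressure of n moles of an ideal gas, used to generate synthetic data."""
    return n * R * T / V


def van_der_waals_pressure(V, T, n=1.0, a=0.364, b=4.27e-5):
    """Pressure of a van der Waals gas; a and b are illustrative constants."""
    return n * R * T / (V - n * b) - a * n * n / (V * V)


def fit_ideal_gas(samples):
    """Least squares fit of c in p*V = c*T from samples (p, V, T)."""
    numerator = sum(p * V * T for p, V, T in samples)
    denominator = sum(T * T for _, _, T in samples)
    return numerator / denominator


def behaves_as_ideal_gas(samples, rel_tol=0.01):
    """True if all samples agree with the fitted ideal gas law within rel_tol."""
    c = fit_ideal_gas(samples)
    return all(abs(p * V - c * T) <= rel_tol * abs(c * T)
               for p, V, T in samples)


# Three states (volume in m^3, temperature in K) for one mole of gas.
states = [(1e-4, 300.0), (2e-4, 300.0), (5e-4, 350.0)]
system_a = [(ideal_gas_pressure(V, T), V, T) for V, T in states]
system_b = [(van_der_waals_pressure(V, T), V, T) for V, T in states]

print(behaves_as_ideal_gas(system_a))
print(behaves_as_ideal_gas(system_b))

# Once system A is identified, the fit predicts an unmeasured state.
c = fit_ideal_gas(system_a)
predicted_pressure = c * 400.0 / 1e-3
print(predicted_pressure)
```

The label 'ideal gas' is assigned from a few readings, and the
prediction for the new state is then open to an independent check.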


S16j. When is a theory confirmed?


Any deviation from a law can only be 'confirmed' by narrowing error

bars for the parameters modeling the deviation. As long as the error

bars contain zero, the law counts as confirmed.

With time, confirmation of the law may be at a higher level of

accuracy, or (as in the case of neutrino masses) confirmation of the

deviation (if the more accurate error bars no longer contain zero).
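
The criterion can be written as a tiny check. This is a minimal
sketch, assuming that the deviation parameter is reported as
estimate +- uncertainty with a symmetric error bar; the function name
law_status and all numbers are invented for illustration.

```python
def law_status(estimate, uncertainty):
    """Classify a measured deviation parameter.

    The law predicts that the parameter is exactly zero.  The error bar
    is taken as the symmetric interval
    [estimate - uncertainty, estimate + uncertainty]; a real analysis
    would also fix a confidence level for it.
    """
    lower = estimate - uncertainty
    upper = estimate + uncertainty
    if lower <= 0.0 <= upper:
        return "law confirmed to accuracy %g" % max(abs(lower), abs(upper))
    return "deviation confirmed"

# Invented numbers: successively more accurate measurements.
print(law_status(0.3, 0.5))     # error bar contains zero
print(law_status(0.02, 0.05))   # still contains zero, higher accuracy
print(law_status(0.02, 0.005))  # zero is excluded
```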

If one disputes any of the established theories because of not enough

confirmation, one can as well dispute Lorentz symmetry, translation

invariance, zero photon mass, general relativity, etc., which are

basic to contemporary physics but all confirmed only to a certain accuracy.


There are experiments testing the limits of all these assumptions,

but even when one of these experiments succeeds (as in the case of

neutrino masses), the previous theory remains valid to the accuracy

it was known to be valid before. In this sense, older theories don't

die even when they are superseded. A well-known case is Newton's

gravitational theory which is still taught and heavily used

although not completely correct.


S16k. What is real?


All physics is just a handy way of thinking about certain phenomena.

This - a handy way of thinking - is what it means that something

- the concept we find useful - exists.

We say that people exist, because they are a handy way to describe

certain blobs of matter like ourselves. We say that electrons exist,

because they are a handy way to describe ionization phenomena.

We say that photons exist because they are a handy way to describe
quantum optics phenomena.

Photons are objectively real because they are needed in the only

comprehensive coherent theory of microscopic interactions that we

know of.

On the other hand, 'photon' is merely a word that physicists use on

paper and in conversation. But in precisely the same sense that

entropy, energy, or the electromagnetic field are merely words that

physicists use on paper and in conversation.

Even our best concepts are 'merely' words.

If we give up concepts, only an undifferentiated happening in

space-time remains, and even talking about this becomes impossible.


S16l. How many angels fit onto the tip of a needle?


Anton Zeilinger writes in
''the question whether such a description exists or not was therefore

similarly irrelevant as, according to Pauli, the old question

how many angels fit onto the tip of a needle.''

This question has become a well-known metaphor for doing

irrelevant physics.

But how old is this question really?

Who was the person who discussed it seriously?

mentions explicitly Chillingworth's

''Religion of Protestants a Safe Way to Salvation''

(1638, reprinted 1972, 12th unnumbered page of the preface)

accusing unnamed scholars of debating

''Whether a Million of Angels may not fit upon a needles point?''

It seems that, as here, the question has always been used in a derisive

manner only. In the historical essay

E.D. Sylla,

Swester Katrei and Gregory of Rimini:

Angels, God and mathematics in the fourteenth century,

pp. 251-270 in:

Mathematics and the Divine: A Historical Study

(T. Koetsier and L. Bergmans, eds.)

Elsevier 2005,


Sylla conjectures that the question might have been coined by

Thomas Hobbes, who had learnt the scholastic tradition in Oxford

between 1603 and 1608. See also

But similar questions were discussed much earlier. Sylla

mentions an anonymous 14th century mystical treatise

''Swester Katrei'' (= Sister Kate or Sister Catherine) referring to

''a thousand souls in heaven sitting on the point of a needle''.

Cf. also the paper

G.M. Ross,


Philosophy 60 (1985), 495-511.

and the web site

Sister Catherine (Schwester Katrei)


Even earlier and most prominent is the discussion of angels

in Thomas Aquinas' ''Summa Theologica'', published in 1266.

It is surprisingly interesting.

It looks as if Aquinas was the first writer anticipating quantum theory

and the Pauli exclusion principle. Replace 'angel' by 'electron' and he

sounds surprisingly modern; in modern terms, angels are Fermions,

according to Thomas Aquinas.

An English translation of the ''Summa Theologica'' is available online.

Part I (http://

contains the chapter on angels.

Sections 50-53 on their substance relate to their physical

properties and hence is of scientific interest.

There he discusses the properties of a point particle from

a logical point of view. His 'angels' are not the winged creatures
we might imagine them to be, but incorruptible, indivisible,

extended objects, ''form without matter'', with quite precise properties:


Two angels cannot be in the same place, but they have

virtual (sic!) positions, and can be in an extended place:

''So the entire body to which he is applied by his power,

corresponds as one place to him.''

They may go from one place to another with or without being


in between:

''But an angel's substance is not subject to place as contained

thereby, but is above it as containing it: hence it is under

his control to apply himself to a place just as he wills,

either through or without the intervening place.''

Their number roughly matches the number of electrons:

''Hence it must be said that the angels, even inasmuch as they are

immaterial substances, exist in exceeding great number, far beyond

all material multitude.''

(With ''angel'' interpreted as ''electron'', ''immaterial'' could thus

be interpreted as zero baryon number.)

Like early chemists hiding their scientific insights in an alchemist

guise, he might have phrased his speculations in terms of notions

acceptable to his clerical colleagues...

If we attribute to the Greeks the concept of the atom (though they

thought of it in - for modern ears bizarre - terms that have little

to do with our modern view), we should perhaps be as generous to


Aquinas and attribute to him the exclusion principle.

On a more tongue-in-cheek basis, the Annals of Improbable Research

published an article

A. Sandberg,

Quantum Gravity Treatment of the Angel Density Problem,

Annals of Improbable Research 7 (Issue 3), (2001), 5-8.


S17a. How to get information from sci.physics.research


If you read sci.physics.research out of curiosity, you may find that

the discussions get too specific for you but make you curious to

learn more about the background. But it may be difficult to find out

where to get started.

The right way to find out is to ask on sci.physics.research

for what you need, in response to someone's contribution.

The writers usually know how they got the knowledge, and are happy to

give you hints or recommendations, and others will join in if they

think they have better advice. The more specific your question, the

more likely you'll get an answer, and the more useful it will

be for others, too. By asking good questions you are doing a

service to all.

My Lord Jesus Christ, for whom I live, asserted:

"Ask, and it will be given you; search, and you will find; knock,

and the door will be opened for you. For everyone who asks receives,
and everyone who searches finds, and for everyone who knocks,

the door will be opened." (Matth. 7:7-8)

It took me a while to realize that this was excellent advice.


S17b. How to get your work published


You did some work that you think is great (or at least reasonable),

but it was rejected by the journal you sent it to?

This is disappointing, but not the end of all hope...

Rejection letters usually give some reasons for rejection; if they

don't you may request (in a polite way!) getting reasons so that you

can learn from them. And then _do_ learn from them! Usually the reasons

for rejection are sound and mean at least that you didn't pose your

case well. It also takes some time to learn the standards that

publications should respect, and it is likely that you violated

some of the unspoken rules without realizing it.

If your idea is far from mainstream, you also need to convince people
that your approach is sound and merits spending the time to read

through the new proposal. This is difficult since you need to build

up trust; it requires that you have a high level of frustration tolerance.


The less mainstream an idea, the stronger must be its contents and the

more carefully it must be argued to be publishable; use the feedback you

get to find out the standards expected and then go and meet them.

The difference between a crank and a serious researcher is that

the latter learns from criticisms and grows through each feedback,

while the former 'knows' (and acts on this assumption) that he is right

and that established physics is just rejecting him or her for no good reason.


If you enter a correspondence with anyone who takes the time to

read your work, stay polite even when the answers you get are not

what you hoped for. Once the tone of your mail gets defensive or

aggressive, you have probably lost your case - your partner sees that

you try to replace facts by emotions and your credibility is gone.

Time is precious for active scientists. So keep your article as short

as possible without losing substance. 120 pages of detailed analysis,

say, is too much for most people to read, unless they already have high
confidence that the content is sound. If you really need 120 pages

to make your case you need to make short versions of your long paper

that allow others to do checks for reasonableness with less effort.

You'd then have a 1/2 page abstract, a 3 page introductory essay,

a 7 page outline version, a 20 page version with the key steps, and a

full paper with all the details, and each of these versions should be

self-contained and allow the reader to get a feeling of what you do,

and why you succeed - in terms of background that shows that you are

familiar with the state of the art, and in a language that is both

understandable and concise. Then anyone reading it gets a sense of

high quality work that is informative and inviting.

Note that the most important task is not to present your claim and

praise or defend your work, but to convince others that your claim

deserves trust enough to spend time on checking it.

It is all too easy to make claims that are unsubstantiated but

embedded in a complicated manuscript where one gets easily lost,

loses track of what is important, and therefore misses the mistakes

or gaps in the arguments. It is the responsibility of the innovator

to present the news in a way that makes checking and trusting easy.


Of course one can find many published papers that do not meet these
standards. This is probably because their content is not important

enough to require high standards of checking, or because their

conclusions are not inviting suspicion. But innovative work invites

suspicion since it is far from the common and, if relevant, therefore

requires higher standards to be accepted.


S17c. How to respond to critical referee reports


[This is taken verbatim from]

What Should I Do When a Referee Criticizes My Paper?

Read the referee report carefully and dispassionately. Approach the

report with an open mind. What may at first seem like a devastating

blow is perhaps a request for more information or for a more detailed

explanation. At other times the referee may indeed have found a fatal

flaw in the research or logic. Put yourself in the position of a

reader, which is exactly the position of the referee. Is the paper

well written? Is the presentation clear, unambiguous, and logical?

Respond to all referee comments, suggestions, and criticisms.

Explain which changes have been made and state your position on points

of disagreement. In our experience, appropriate response to some

referee comments may require more research or even reconsideration

of the research project.


S17d. How to sell your revolutionary idea


Unless you don't care about making a fool of yourself, don't tell

it to others before you have worked out enough details to be convincing.

Your audience is very likely to be skeptical (since there are too many

revolutionary ideas around which don't stand the test); so you need

to make the best use of this fact.

The secret is that most people like to answer questions that

fall into their field of expertise, if it does not take too

much effort to reply. But few like to listen to half-baked

(or even fully baked but only outlined) ideas;

too many such offers come from cranks. The devil is always
in the details; and if you can't provide them it is likely

they'll think it is because it does not work or does not offer

any advantage.

So the right approach is to ask them for (and afterwards study!)

information about what is known in the direction you want to go,

rather than proposing the revolutionary way of doing it correctly.

Take heed of the advice of an old saint:

"Let every man be swift to hear, slow to speak, slow to wrath."

(The Bible, James 1:19)

If you really can do it better than others, and you don't find

prior relevant work in the literature, work it out yourself and

show with a nontrivial application that you can do _something_

more efficiently than tradition. Then submit it to a respectable

journal, and people are likely to listen.

If you get negative feedback from referees, take it seriously,

learn from it as much as you can. Raise your standards according

to what you learn, and accommodate the criticism in your future work.

The referees are usually competent and have a point in what they say.

If not, it is likely that your work was presented in a fashion prone

to misunderstanding - in this case formulate your results more

carefully, taking into account accepted tradition. It is an author's

obligation to minimize the chances of misunderstanding by potential

readers. I gained a lot from considering the referees' advice in the

many papers I have written. And it takes a while to learn how to

write good papers...

Even if your work is good but not mainstream, it may take

persistence to publicize it properly; publishing is not enough.

But publicizing does not mean boasting with great claims -

this makes people suspicious and is therefore counterproductive.

Be modest in your claims - claim what you can actually prove,

but not what you only dream of proving one day.

See also: The Crackpot Index (by John Baez) at


S17e. Useful background, online lecture notes, etc.


(incomplete, just some useful references)

The Nobel Prize Winners in Physics

Nobel lectures of the laureates, and their biographies -

worth reading - can be regarded as a sort of lively answer

to the question:

What has been important enough in physics to deserve a big prize?

Gerard 't Hooft,

How to Become a _good_ theoretical physicist


A tree (well, almost) of physics fields, subfields, and concepts.

The leaves explain things in some detail. There is an index

but it does not contain references to each node.

Organization seems to be experimental physics oriented;

for example, I have not found nodes with 'statistical mechanics'

or 'quantum field theory'.

Everything You Always Wanted to Know About the Hydrogen Atom

(But Were Afraid to Ask)


Theory of Renormalization and Regularization

contains a very useful set of online notes that may serve

as an introduction to QFT from a mathematical physics point of view.

Lecture Scripts and Online Courses on Quantum Mechanics


and on other physics topics

Introduction to General Relativity (video)

Review articles on Local Quantum Physics

Norbert Dragon,

Remarks on Quantum Mechanics
Lost & Regained Causes in theoretical physics

Selected Classic Papers from the History of Chemistry

Digest of moderated newsgroup sci.physics.research

Historical Physics Lecture Notes

* Freeman J. Dyson, Advanced Quantum Mechanics 1951

* Fritz Rohrlich, Applied Quantum Electrodynamics, 1953

* Green and Sengers Proc. 1965 conference on critical phenomena

* Cyril Domb's brief historical survey on critical phenomena, 1985

* 1993 roundtable, Physics in Transition

Resources for the History of Physics & Allied Fields

Sidney Coleman

Lecture notes on quantum field theory



S17f. Stories about physicists


Memories about Theoretical Physicists (by R.F. Streater)

Short Stories


Parables for Modern Academia (by D. and L. Haarsma)



S17g. Other physics FAQs


Usenet Physics FAQ

(extensive, has also links to further physics-related FAQs)

Physics FAQ

(a list of links)

Plasma FAQ

Quantum Physics FAQ

(current views of Erich Joos)

Physik und das Drumherum

(Physics FAQ in German)


S17h. Naming in science

How do scientific concepts, effects, or inventions get named after

their discoverers?

It is good practice to name important concepts, effects, or inventions

created by esteemed colleagues after them - good names are always hard


to find, and besides names clearly related to the content, names

naturally related to the history stick best. If a naming is successful

(in that others find it appropriate and useful) it will spread,

and soon everyone is using it. Then the name is established.

It is bad practice if authors call something by their own name

before it has been established by others. It suggests both vanity

and a lack of confidence that others do a good naming job.

And if the self-chosen vanity name does not stick, it serves them

right for having made a fool of themselves.

On the other hand, naming is at times unfair. Not rarely in the past,

a concept (or theorem, etc.) got the name of one of its main

rather than that of its creator.

There are several reasons for this.

It takes time (and a certain amount of interest) to find the true

origin of a concept; but a good name is needed once it is used by more

than a few people. But once a name is established, it is nearly

impossible to change it.

A concept may also be rediscovered independently of its first inception.

If the time wasn't ripe for it the first time, it is likely that the

name of the rediscoverer sticks, and the voices of those who had

the first source come too late.

See also:

List of misnamed theorems


S18a. What is the meaning of 'self-consistent'?

A self-consistent solution (or method, or theory) refers to the fact

that one has two sets of equations relating two sets of unknown

quantities, and wants to solve the equations jointly for the unknowns.

If aspect A of a theory says y=x^2 and aspect B of the theory

(or of another theory) says x=y-2 then self-consistency means that

both equations are assumed to be valid, giving

x^2 = y = x+2,

which leads to the two solutions

x=2, y=4 and x=-1, y=1.

That's all. Of course, the self-consistent Hartree-Fock method,

say, has more variables and is harder to solve, but the principle

is the same.
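
The toy example above can also be solved the way larger
self-consistent problems are usually attacked numerically: start with
a guess, apply the two aspects in turn, and repeat until nothing
changes any more. The following Python sketch only illustrates this
idea; it is not the Hartree-Fock algorithm, and the function name
self_consistent and its parameters are chosen for this example.

```python
import math

def self_consistent(x0, branch=1.0, tol=1e-12, max_iter=200):
    """Iterate until x and y satisfy both y = x^2 and x = y - 2.

    Each sweep uses aspect B to get y from the current x (y = x + 2)
    and then aspect A, solved for x, to update x (x = branch*sqrt(y)).
    branch = +1 or -1 selects which root of y = x^2 is followed.
    """
    x = x0
    for _ in range(max_iter):
        y = x + 2.0                    # aspect B: x = y - 2
        x_new = branch * math.sqrt(y)  # aspect A: y = x^2
        if abs(x_new - x) < tol:
            return x_new, x_new ** 2
        x = x_new
    raise RuntimeError("no convergence")

x1, y1 = self_consistent(0.0, branch=+1.0)  # approaches x=2,  y=4
x2, y2 = self_consistent(0.0, branch=-1.0)  # approaches x=-1, y=1
print(x1, y1)
print(x2, y2)
```

The order of the two updates matters: iterating x -> x^2 - 2 instead
has the same two fixed points, but the iteration does not settle down
at either of them.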


S18b. What is a vector?


A vector is (for the beginner) a list of numbers written below each

other. For example, the x, y, and z coordinates of a point in a

3-dimensional coordinate system. Physicists write the three

coordinates as x_1, x_2, x_3 and combine them into a vector

simply called x.

      /     \
      | x_1 |
  x = | x_2 |    (The parentheses look a bit awkward in ascii.)
      | x_3 |
      \     /

The same for a list of n numbers. This gives a vector x with n

coordinates x_1,...,x_n, and is thought of as a point in a

space with n dimensions.

Two vectors are added or subtracted just by adding or subtracting

their entries. A vector is multiplied by a number just by multiplying

each entry with the number. Then there is the inner product of two vectors,


x dot y = sum_i x_i*y_i

which is a number and not a vector.
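
For readers who like to experiment, these operations can be tried out
in Python, with a vector represented as a plain list of numbers.
This is a minimal sketch; the helper names add, subtract, scale, and
dot are chosen for this example and are not taken from any library.

```python
def add(x, y):
    """Entrywise sum of two vectors with the same number of entries."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of entries")
    return [xi + yi for xi, yi in zip(x, y)]

def subtract(x, y):
    """Entrywise difference of two vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of entries")
    return [xi - yi for xi, yi in zip(x, y)]

def scale(c, x):
    """Multiply each entry of the vector x by the number c."""
    return [c * xi for xi in x]

def dot(x, y):
    """Inner product x dot y = sum_i x_i*y_i, a number."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of entries")
    return sum(xi * yi for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
print(add(x, y))       # [5.0, 7.0, 9.0]
print(subtract(x, y))  # [-3.0, -3.0, -3.0]
print(scale(2.0, x))   # [2.0, 4.0, 6.0]
print(dot(x, y))       # 32.0
```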

Once you have mastered vectors you need to understand matrices.

These are rectangular arrays of numbers.

Later you need to enrich the meaning of a vector by learning

the concept of a vector space. Now all sorts of objects might

also deserve the name vector, most prominently functions,

matrices, tensors, operators. They behave in many respects

just like ordinary vectors.


S18c. Learning quantum mechanics at age 14


If you want to learn about quantum physics and really understand it,

you need to learn first how to do calculations with vectors and

matrices. Look in your local library for math books about

'linear algebra' or 'analytic geometry'. You may have to try

several before you find one suitable at your level.

Linear algebra (i.e., vectors and matrices) is more fundamental

to quantum mechanics than calculus, although the latter is needed

to understand how things change steadily with time.

But one can understand the time-independent part of quantum mechanics


already without calculus, namely everything involving entanglement,

Schroedinger's cat, quantum cryptography, and the like.

This only needs linear algebra, which may be easier.

(On the other hand, calculus is not really difficult either,

once one gets used to it.)

Maybe at first it is better to get math schoolbooks from your

older peers. Good school books are written in a way that they can be

used for self study. If you are motivated it can be very exciting!

If you like math it is much less work than you might think,

and it is fun! Just start with next year's textbook and read it

in your spare time! I started reading math beyond my age when I

was 12, and never regretted it.

With the right motivation, you can learn 10 times as fast

as when you just wait till the subject comes up in school!

And it will be 10 times as interesting!

You don't need to do all the exercises but just enough that you

think you know how it works. Go back to practicing more if you

need it. This speeds up things a lot.

Also, you don't need to read everything in the order it is in

the book - just go where your curiosity leads you, and if you
encounter something you don't know yet, go back to where it was

introduced. In this way you get the idea of what is happening

long before you understand it thoroughly, and it will be a

motivation to learn the missing things.

Learning math and physics is a life-long challenge (so much

interesting stuff accumulated over the centuries...),

and you can't start early enough.

And at any time in life there will be parts you understand well,

parts you understand partly or superficially only, and parts

where you know little more than a few buzz words. So you need

not aim at understanding everything fully on first acquaintance,

but learn whatever you can in whatever order you pick it up.

The stuff to be practiced and learnt well is only the part that

comes up over and over again. When you realize that then you know

what to learn, and you quickly see how to do it!


S18d. Research at age 16

At 16, you should spend your time learning rather than

doing research. Lacking ideas means knowing too little...

Once you know enough about what others did and where they

got stuck, you'll have more than enough ideas to work on.

I'd like to suggest that you read the Nobel lectures of the

physics Nobel laureates,

The material spans a whole century, and will occupy you for long!

It will put your mind to themes that have been important enough

to merit the prize; most of them will continue to be important in

the future.

In parallel, use the web to sort out all concepts used in the Nobel

lectures that you don't yet understand; at first it will be a lot,

and you have to search a bit to find out where the basics you need are

well explained. Some items might be explained in this theoretical

physics FAQ, or in the book mentioned at the top of this FAQ.

Doing both will put you on a learning track which will end in a

research career and bear plenty of fruit.


S18e. Are there indefinite Hilbert spaces?


There are no indefinite Hilbert spaces. There are, however,

vector spaces with a distinguished indefinite inner product;

these are called Krein spaces. Their structure is much weaker than

that of Hilbert spaces; there is no natural topology, no completeness,

nothing resembling a Hilbert space except the inner product.
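
What 'indefinite' means can be seen in a finite-dimensional toy
example. The sketch below is an assumption made for illustration
only: a real 2-dimensional space with the inner product
<x,y> = x_1*y_1 - x_2*y_2. It shows vectors with positive, negative,
and zero inner product with themselves; the questions of topology and
completeness mentioned above only arise in infinite dimensions and
are not visible here.

```python
def indefinite_inner(x, y, signs=(1.0, -1.0)):
    """Inner product <x,y> = sum_i signs[i]*x_i*y_i on real 2-vectors.

    The sign pattern (+1, -1) is a hypothetical choice for this example.
    """
    return sum(s * xi * yi for s, xi, yi in zip(signs, x, y))

e1 = (1.0, 0.0)
e2 = (0.0, 1.0)
n = (1.0, 1.0)

print(indefinite_inner(e1, e1))  # 1.0, positive
print(indefinite_inner(e2, e2))  # -1.0, negative
print(indefinite_inner(n, n))    # 0.0, although n is not the zero vector
```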

Since there are physical situations where indefinite inner products

arise naturally, some people show their lack of knowledge of the

literature by referring to Krein spaces as indefinite Hilbert spaces.

But if a few people do so, it doesn't mean that the terminology is


For example, quant-ph/0211048 uses this poor terminology.

The ghosts referred to in this paper are nonphysical vectors in a

Krein space which contains a definite subspace of physical vectors

whose completion gives the physical Hilbert space. This is a natural

construction in gauge theories (Gupta-Bleuler formalism) where

the direct construction of a physical Hilbert space would

manifestly break Lorentz and/or gauge invariance, while the

nonphysical, bigger Krein space enjoys all desired invariance properties.


The indefinite metric in relativity, also mentioned in that paper,

has nothing to do with indefinite Hilbert spaces, since the

underlying vector spaces (Minkowski space in special relativity,

the tangent spaces at space-time points in general relativity)

are 4-dimensional spaces with the ordinary Euclidean topology

(although the metric is non-Euclidean).


S19a. God and physics


This is most likely to be controversial; but you might be

interested in how the author of this FAQ sees the issues.

The following links are to some relevant pages from my web site.

How Do We Know Whether God Acts In The World?

''I found the assumption that `God acts in the world' a superior
way of organizing the events that I see or hear happen.''

Knowledge, Chance, and Creation

(On the difficulty to know, and the role of the second law of

thermodynamics as an instrument of creation)

How to study

''When I questioned the bible about the attitude appropriate

to the study of science I found the following instructions.''

How to Create a Universe - Instructions for an Apprentice God.

(A fantasy to be read at leisure time)

Science and Faith

(an extensive collection of links)

''Science is the truth only in matters that can be objectified;

in the spiritual world, where values, goals, authority and purpose

are located, science has nothing to say. It is a poor life that is

restricted to the scientific standard of truth, where you and I are

nothing but a collection of atoms without meaning and purpose.

Realizing the narrow-minded nature of science opens the gate to an

understanding of God that complements the scientific truth and gives

life, love and peace.''

and in German:

Gott - die grosse Unbekannte

Mathematik, Physik und Ewigkeit (mit einem Augenzwinkern



S20a. Acknowledgments


Thanks to the contributors to the newsgroup sci.physics.research

for their more or less challenging questions and comments, without

which this FAQ wouldn't exist.

Thanks also to Steve Carlip, Norbert Dragon,

Hendrik van Hees, Don Koks, Nick Maclaren,

Alejandro Rivero, Joe Rongen, and Gerard Westendorp

for useful comments that led to improvements in the FAQ.

Finally, thanks to God for his wonderful and interesting universe,

and for the gift of being able to understand his wonders.