
Fundamental Quantum Mechanics for Engineers

Leon van Dommelen

03/22/09 Version 4.2 alpha


Copyright
Copyright 2004, 2007, 2008 and on, Leon van Dommelen. You are allowed
to copy or print out this work for your personal use. You are allowed to attach
additional notes, corrections, and additions, as long as they are clearly identified
as not being part of the original document nor written by its author.
Conversions to html of the pdf version of this document are stupid, since
there is a much better native html version already available, so try not to do it.
Dedication

To my parents, Piet and Rietje.

Contents

Preface xxvii
To the Student . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxviii
Comments and Feedback . . . . . . . . . . . . . . . . . . . . . . . . xxx

I Basic Quantum Mechanics 1


1 Mathematical Prerequisites 3
1.1 Complex Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Functions as Vectors . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 The Dot, oops, INNER Product . . . . . . . . . . . . . . . . . . 8
1.4 Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Eigenvalue Problems . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6 Hermitian Operators . . . . . . . . . . . . . . . . . . . . . . . . 13
1.7 Additional Points . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.7.1 Dirac notation . . . . . . . . . . . . . . . . . . . . . . . . 16
1.7.2 Additional independent variables . . . . . . . . . . . . . 17

2 Basic Ideas of Quantum Mechanics 19


2.1 The Revised Picture of Nature . . . . . . . . . . . . . . . . . . 19
2.2 The Heisenberg Uncertainty Principle . . . . . . . . . . . . . . 22
2.3 The Operators of Quantum Mechanics . . . . . . . . . . . . . . 23
2.4 The Orthodox Statistical Interpretation . . . . . . . . . . . . . 25
2.4.1 Only eigenvalues . . . . . . . . . . . . . . . . . . . . . . 25
2.4.2 Statistical selection . . . . . . . . . . . . . . . . . . . . . 27
2.5 A Particle Confined Inside a Pipe . . . . . . . . . . . . . . . . . 28
2.5.1 The physical system . . . . . . . . . . . . . . . . . . . . 29
2.5.2 Mathematical notations . . . . . . . . . . . . . . . . . . 30
2.5.3 The Hamiltonian . . . . . . . . . . . . . . . . . . . . . . 30
2.5.4 The Hamiltonian eigenvalue problem . . . . . . . . . . . 31
2.5.5 All solutions of the eigenvalue problem . . . . . . . . . . 32


2.5.6 Discussion of the energy values . . . . . . . . . . . . . . 36


2.5.7 Discussion of the eigenfunctions . . . . . . . . . . . . . . 37
2.5.8 Three-dimensional solution . . . . . . . . . . . . . . . . . 40
2.5.9 Quantum confinement . . . . . . . . . . . . . . . . . . . 43
2.6 The Harmonic Oscillator . . . . . . . . . . . . . . . . . . . . . . 46
2.6.1 The Hamiltonian . . . . . . . . . . . . . . . . . . . . . . 47
2.6.2 Solution using separation of variables . . . . . . . . . . . 48
2.6.3 Discussion of the eigenvalues . . . . . . . . . . . . . . . . 51
2.6.4 Discussion of the eigenfunctions . . . . . . . . . . . . . . 53
2.6.5 Degeneracy . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.6.6 Non-eigenstates . . . . . . . . . . . . . . . . . . . . . . . 59

3 Single-Particle Systems 63
3.1 Angular Momentum . . . . . . . . . . . . . . . . . . . . . . . . 63
3.1.1 Definition of angular momentum . . . . . . . . . . . . . 63
3.1.2 Angular momentum in an arbitrary direction . . . . . . . 64
3.1.3 Square angular momentum . . . . . . . . . . . . . . . . . 66
3.1.4 Angular momentum uncertainty . . . . . . . . . . . . . . 69
3.2 The Hydrogen Atom . . . . . . . . . . . . . . . . . . . . . . . . 70
3.2.1 The Hamiltonian . . . . . . . . . . . . . . . . . . . . . . 70
3.2.2 Solution using separation of variables . . . . . . . . . . . 71
3.2.3 Discussion of the eigenvalues . . . . . . . . . . . . . . . . 76
3.2.4 Discussion of the eigenfunctions . . . . . . . . . . . . . . 79
3.3 Expectation Value and Standard Deviation . . . . . . . . . . . 84
3.3.1 Statistics of a die . . . . . . . . . . . . . . . . . . . . . . 85
3.3.2 Statistics of quantum operators . . . . . . . . . . . . . . 86
3.3.3 Simplified expressions . . . . . . . . . . . . . . . . . . . . 88
3.3.4 Some examples . . . . . . . . . . . . . . . . . . . . . . . 89
3.4 The Commutator . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.4.1 Commuting operators . . . . . . . . . . . . . . . . . . . 92
3.4.2 Noncommuting operators and their commutator . . . . . 93
3.4.3 The Heisenberg uncertainty relationship . . . . . . . . . 94
3.4.4 Commutator reference [Reference] . . . . . . . . . . . . . 95
3.5 The Hydrogen Molecular Ion . . . . . . . . . . . . . . . . . . . 98
3.5.1 The Hamiltonian . . . . . . . . . . . . . . . . . . . . . . 99
3.5.2 Energy when fully dissociated . . . . . . . . . . . . . . . 99
3.5.3 Energy when closer together . . . . . . . . . . . . . . . . 100
3.5.4 States that share the electron . . . . . . . . . . . . . . . 101
3.5.5 Comparative energies of the states . . . . . . . . . . . . 103
3.5.6 Variational approximation of the ground state . . . . . . 104
3.5.7 Comparison with the exact ground state . . . . . . . . . 106

4 Multiple-Particle Systems 109


4.1 Wave Function for Multiple Particles . . . . . . . . . . . . . . . 109
4.2 The Hydrogen Molecule . . . . . . . . . . . . . . . . . . . . . . 111
4.2.1 The Hamiltonian . . . . . . . . . . . . . . . . . . . . . . 111
4.2.2 Initial approximation to the lowest energy state . . . . . 112
4.2.3 The probability density . . . . . . . . . . . . . . . . . . . 113
4.2.4 States that share the electrons . . . . . . . . . . . . . . . 114
4.2.5 Variational approximation of the ground state . . . . . . 117
4.2.6 Comparison with the exact ground state . . . . . . . . . 118
4.3 Two-State Systems . . . . . . . . . . . . . . . . . . . . . . . . . 119
4.4 Spin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.5 Multiple-Particle Systems Including Spin . . . . . . . . . . . . . 125
4.5.1 Wave function for a single particle with spin . . . . . . . 125
4.5.2 Inner products including spin . . . . . . . . . . . . . . . 128
4.5.3 Commutators including spin . . . . . . . . . . . . . . . . 129
4.5.4 Wave function for multiple particles with spin . . . . . . 130
4.5.5 Example: the hydrogen molecule . . . . . . . . . . . . . 132
4.5.6 Triplet and singlet states . . . . . . . . . . . . . . . . . . 133
4.6 Identical Particles . . . . . . . . . . . . . . . . . . . . . . . . . 134
4.7 Ways to Symmetrize the Wave Function . . . . . . . . . . . . . 137
4.8 Matrix Formulation . . . . . . . . . . . . . . . . . . . . . . . . 143
4.9 Heavier Atoms [Descriptive] . . . . . . . . . . . . . . . . . . . . 147
4.9.1 The Hamiltonian eigenvalue problem . . . . . . . . . . . 147
4.9.2 Approximate solution using separation of variables . . . 148
4.9.3 Hydrogen and helium . . . . . . . . . . . . . . . . . . . . 150
4.9.4 Lithium to neon . . . . . . . . . . . . . . . . . . . . . . . 152
4.9.5 Sodium to argon . . . . . . . . . . . . . . . . . . . . . . 156
4.9.6 Potassium to krypton . . . . . . . . . . . . . . . . . . . . 157
4.10 Pauli Repulsion [Descriptive] . . . . . . . . . . . . . . . . . . . 157
4.11 Chemical Bonds [Descriptive] . . . . . . . . . . . . . . . . . . . 159
4.11.1 Covalent sigma bonds . . . . . . . . . . . . . . . . . . . 159
4.11.2 Covalent pi bonds . . . . . . . . . . . . . . . . . . . . . . 160
4.11.3 Polar covalent bonds and hydrogen bonds . . . . . . . . 161
4.11.4 Promotion and hybridization . . . . . . . . . . . . . . . . 162
4.11.5 Ionic bonds . . . . . . . . . . . . . . . . . . . . . . . . . 165
4.11.6 Limitations of valence bond theory . . . . . . . . . . . . 166

5 Time Evolution 169


5.1 The Schrödinger Equation . . . . . . . . . . . . . . . . . . . . . 169
5.1.1 Energy conservation . . . . . . . . . . . . . . . . . . . . 170
5.1.2 Stationary states . . . . . . . . . . . . . . . . . . . . . . 171
5.1.3 Time variations of symmetric two-state systems . . . . . 172

5.1.4 Time variation of expectation values . . . . . . . . . . . 173


5.1.5 Newtonian motion . . . . . . . . . . . . . . . . . . . . . 174
5.1.6 Heisenberg picture . . . . . . . . . . . . . . . . . . . . . 175
5.1.7 The adiabatic approximation . . . . . . . . . . . . . . . 178
5.2 Unsteady Perturbations of Systems . . . . . . . . . . . . . . . . 178
5.2.1 Schrödinger equation for a two-state system . . . . . . . 179
5.2.2 Stimulated and spontaneous emission . . . . . . . . . . . 180
5.2.3 Effect of a single wave . . . . . . . . . . . . . . . . . . . 181
5.2.4 Forbidden transitions . . . . . . . . . . . . . . . . . . . . 183
5.2.5 Selection rules . . . . . . . . . . . . . . . . . . . . . . . . 184
5.2.6 Angular momentum conservation . . . . . . . . . . . . . 185
5.2.7 Parity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
5.2.8 Absorption of a single weak wave . . . . . . . . . . . . . 189
5.2.9 Absorption of incoherent radiation . . . . . . . . . . . . 191
5.2.10 Spontaneous emission of radiation . . . . . . . . . . . . . 193
5.3 Position and Linear Momentum . . . . . . . . . . . . . . . . . . 194
5.3.1 The position eigenfunction . . . . . . . . . . . . . . . . . 195
5.3.2 The linear momentum eigenfunction . . . . . . . . . . . 197
5.4 Wave Packets in Free Space . . . . . . . . . . . . . . . . . . . . 199
5.4.1 Solution of the Schrödinger equation. . . . . . . . . . . . 199
5.4.2 Component wave solutions . . . . . . . . . . . . . . . . . 200
5.4.3 Wave packets . . . . . . . . . . . . . . . . . . . . . . . . 201
5.4.4 Group velocity . . . . . . . . . . . . . . . . . . . . . . . 203
5.5 Motion near the Classical Limit . . . . . . . . . . . . . . . . . . 206
5.5.1 Motion through free space . . . . . . . . . . . . . . . . . 206
5.5.2 Accelerated motion . . . . . . . . . . . . . . . . . . . . . 207
5.5.3 Decelerated motion . . . . . . . . . . . . . . . . . . . . . 207
5.5.4 The harmonic oscillator . . . . . . . . . . . . . . . . . . 207
5.6 WKB Theory of Nearly Classical Motion . . . . . . . . . . . . . 209
5.7 Scattering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
5.7.1 Partial reflection . . . . . . . . . . . . . . . . . . . . . . 213
5.7.2 Tunneling . . . . . . . . . . . . . . . . . . . . . . . . . . 214
5.8 Reflection and Transmission Coefficients . . . . . . . . . . . . . 216
5.9 Alpha Decay of Nuclei . . . . . . . . . . . . . . . . . . . . . . . 217

II Advanced Topics 225


6 Numerical Procedures 227
6.1 The Variational Method . . . . . . . . . . . . . . . . . . . . . . 227
6.1.1 Basic variational statement . . . . . . . . . . . . . . . . 227
6.1.2 Differential form of the statement . . . . . . . . . . . . . 228

6.1.3 Example application using Lagrangian multipliers . . . . 229


6.2 The Born-Oppenheimer Approximation . . . . . . . . . . . . . 231
6.2.1 The Hamiltonian . . . . . . . . . . . . . . . . . . . . . . 232
6.2.2 The basic Born-Oppenheimer approximation . . . . . . . 233
6.2.3 Going one better . . . . . . . . . . . . . . . . . . . . . . 235
6.3 The Hartree-Fock Approximation . . . . . . . . . . . . . . . . . 238
6.3.1 Wave function approximation . . . . . . . . . . . . . . . 238
6.3.2 The Hamiltonian . . . . . . . . . . . . . . . . . . . . . . 244
6.3.3 The expectation value of energy . . . . . . . . . . . . . . 246
6.3.4 The canonical Hartree-Fock equations . . . . . . . . . . . 248
6.3.5 Additional points . . . . . . . . . . . . . . . . . . . . . . 250

7 Solids 259
7.1 Molecular Solids [Descriptive] . . . . . . . . . . . . . . . . . . . 259
7.2 Ionic Solids [Descriptive] . . . . . . . . . . . . . . . . . . . . . . 261
7.3 Introduction to Band Structure [Descriptive] . . . . . . . . . . . 266
7.4 Metals [Descriptive] . . . . . . . . . . . . . . . . . . . . . . . . 268
7.4.1 Lithium . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
7.4.2 One-dimensional crystals . . . . . . . . . . . . . . . . . . 270
7.4.3 Wave functions of one-dimensional crystals . . . . . . . . 271
7.4.4 Analysis of the wave functions . . . . . . . . . . . . . . . 274
7.4.5 Floquet (Bloch) theory . . . . . . . . . . . . . . . . . . . 275
7.4.6 Fourier analysis . . . . . . . . . . . . . . . . . . . . . . . 276
7.4.7 The reciprocal lattice . . . . . . . . . . . . . . . . . . . . 277
7.4.8 The energy levels . . . . . . . . . . . . . . . . . . . . . . 278
7.4.9 Electrical conduction . . . . . . . . . . . . . . . . . . . . 279
7.4.10 Merging and splitting bands . . . . . . . . . . . . . . . . 280
7.4.11 Three-dimensional metals . . . . . . . . . . . . . . . . . 282
7.5 Covalent Materials [Descriptive] . . . . . . . . . . . . . . . . . . 286
7.6 Confined Free Electrons . . . . . . . . . . . . . . . . . . . . . . 289
7.6.1 The Hamiltonian eigenvalue problem . . . . . . . . . . . 290
7.6.2 Solution by separation of variables . . . . . . . . . . . . 290
7.6.3 Discussion of the solution . . . . . . . . . . . . . . . . . 292
7.6.4 A numerical example . . . . . . . . . . . . . . . . . . . . 294
7.6.5 The density of states and confinement . . . . . . . . . . 295
7.6.6 Relation to Bloch functions . . . . . . . . . . . . . . . . 301
7.7 Free Electrons in a Lattice . . . . . . . . . . . . . . . . . . . . . 301
7.7.1 The lattice structure . . . . . . . . . . . . . . . . . . . . 303
7.7.2 Occupied states and Brillouin zones . . . . . . . . . . . . 305
7.8 Nearly-Free Electrons . . . . . . . . . . . . . . . . . . . . . . . 309
7.8.1 Energy changes due to a weak lattice potential . . . . . . 310
7.8.2 Discussion of the energy changes . . . . . . . . . . . . . 312

7.9 Quantum Statistical Mechanics . . . . . . . . . . . . . . . . . . 317


7.10 Additional Points [Descriptive] . . . . . . . . . . . . . . . . . . 322
7.10.1 Thermal properties . . . . . . . . . . . . . . . . . . . . . 322
7.10.2 Ferromagnetism . . . . . . . . . . . . . . . . . . . . . . . 327
7.10.3 X-ray diffraction . . . . . . . . . . . . . . . . . . . . . . 330

8 Basic and Quantum Thermodynamics 337


8.1 Temperature . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
8.2 Single-Particle and System Eigenfunctions . . . . . . . . . . . . 339
8.3 How Many System Eigenfunctions? . . . . . . . . . . . . . . . . 344
8.4 Particle-Energy Distribution Functions . . . . . . . . . . . . . . 349
8.5 The Canonical Probability Distribution . . . . . . . . . . . . . 351
8.6 Low Temperature Behavior . . . . . . . . . . . . . . . . . . . . 353
8.7 The Basic Thermodynamic Variables . . . . . . . . . . . . . . . 356
8.8 Introduction to the Second Law . . . . . . . . . . . . . . . . . . 360
8.9 The Reversible Ideal . . . . . . . . . . . . . . . . . . . . . . . . 361
8.10 Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
8.11 The Big Lie of Distinguishable Particles . . . . . . . . . . . . . 374
8.12 The New Variables . . . . . . . . . . . . . . . . . . . . . . . . . 374
8.13 Microscopic Meaning of the Variables . . . . . . . . . . . . . . . 381
8.14 Application to Particles in a Box . . . . . . . . . . . . . . . . . 382
8.14.1 Bose-Einstein condensation . . . . . . . . . . . . . . . . 384
8.14.2 Fermions at low temperatures . . . . . . . . . . . . . . . 385
8.14.3 A generalized ideal gas law . . . . . . . . . . . . . . . . . 387
8.14.4 The ideal gas . . . . . . . . . . . . . . . . . . . . . . . . 387
8.14.5 Blackbody radiation . . . . . . . . . . . . . . . . . . . . 389
8.14.6 The Debye model . . . . . . . . . . . . . . . . . . . . . . 392

9 Electromagnetism 395
9.1 All About Angular Momentum . . . . . . . . . . . . . . . . . . 395
9.1.1 The fundamental commutation relations . . . . . . . . . 396
9.1.2 Ladders . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
9.1.3 Possible values of angular momentum . . . . . . . . . . . 400
9.1.4 A warning about angular momentum . . . . . . . . . . . 402
9.1.5 Triplet and singlet states . . . . . . . . . . . . . . . . . . 403
9.1.6 Clebsch-Gordan coefficients . . . . . . . . . . . . . . . . 405
9.1.7 Pauli spin matrices . . . . . . . . . . . . . . . . . . . . . 409
9.1.8 General spin matrices . . . . . . . . . . . . . . . . . . . . 411
9.2 The Relativistic Dirac Equation . . . . . . . . . . . . . . . . . . 412
9.3 The Electromagnetic Hamiltonian . . . . . . . . . . . . . . . . 414
9.4 Maxwell’s Equations [Descriptive] . . . . . . . . . . . . . . . . . 417
9.5 Example Static Electromagnetic Fields . . . . . . . . . . . . . . 424

9.5.1 Point charge at the origin . . . . . . . . . . . . . . . . . 425


9.5.2 Dipoles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
9.5.3 Arbitrary charge distributions . . . . . . . . . . . . . . . 433
9.5.4 Solution of the Poisson equation . . . . . . . . . . . . . . 436
9.5.5 Currents . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
9.5.6 Principle of the electric motor . . . . . . . . . . . . . . . 439
9.6 Particles in Magnetic Fields . . . . . . . . . . . . . . . . . . . . 442
9.7 Stern-Gerlach Apparatus [Descriptive] . . . . . . . . . . . . . . 445
9.8 Nuclear Magnetic Resonance . . . . . . . . . . . . . . . . . . . 446
9.8.1 Description of the method . . . . . . . . . . . . . . . . . 446
9.8.2 The Hamiltonian . . . . . . . . . . . . . . . . . . . . . . 447
9.8.3 The unperturbed system . . . . . . . . . . . . . . . . . . 449
9.8.4 Effect of the perturbation . . . . . . . . . . . . . . . . . 451

10 Some Additional Topics 455


10.1 Perturbation Theory . . . . . . . . . . . . . . . . . . . . . . . . 455
10.1.1 Basic perturbation theory . . . . . . . . . . . . . . . . . 455
10.1.2 Ionization energy of helium . . . . . . . . . . . . . . . . 457
10.1.3 Degenerate perturbation theory . . . . . . . . . . . . . . 461
10.1.4 The Zeeman effect . . . . . . . . . . . . . . . . . . . . . 463
10.1.5 The Stark effect . . . . . . . . . . . . . . . . . . . . . . . 464
10.1.6 The hydrogen atom fine structure . . . . . . . . . . . . . 467
10.2 Quantum Field Theory in a Nanoshell . . . . . . . . . . . . . . 480
10.2.1 Occupation numbers . . . . . . . . . . . . . . . . . . . . 481
10.2.2 Annihilation and creation operators . . . . . . . . . . . . 487
10.2.3 Quantization of radiation . . . . . . . . . . . . . . . . . . 495
10.2.4 Spontaneous emission . . . . . . . . . . . . . . . . . . . . 502
10.2.5 Field operators . . . . . . . . . . . . . . . . . . . . . . . 505
10.2.6 An example using field operators . . . . . . . . . . . . . 506

11 The Interpretation of Quantum Mechanics 511


11.1 Schrödinger’s Cat . . . . . . . . . . . . . . . . . . . . . . . . . . 512
11.2 Instantaneous Interactions . . . . . . . . . . . . . . . . . . . . . 513
11.3 Global Symmetrization . . . . . . . . . . . . . . . . . . . . . . 518
11.4 Conservation Laws and Symmetries . . . . . . . . . . . . . . . . 518
11.5 Failure of the Schrödinger Equation? . . . . . . . . . . . . . . . 522
11.6 The Many-Worlds Interpretation . . . . . . . . . . . . . . . . . 525

A Notes 533
A.1 Why another book on quantum mechanics? . . . . . . . . . . . 533
A.2 History and wishlist . . . . . . . . . . . . . . . . . . . . . . . . 537
A.3 Lagrangian mechanics . . . . . . . . . . . . . . . . . . . . . . . 541

A.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 541


A.3.2 Generalized coordinates . . . . . . . . . . . . . . . . . . 542
A.3.3 Lagrangian equations of motion . . . . . . . . . . . . . . 543
A.3.4 Hamiltonian dynamics . . . . . . . . . . . . . . . . . . . 546
A.4 Special relativity . . . . . . . . . . . . . . . . . . . . . . . . . . 548
A.4.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
A.4.2 Overview of relativity . . . . . . . . . . . . . . . . . . . . 549
A.4.3 Lorentz transformation . . . . . . . . . . . . . . . . . . . 552
A.4.4 Proper time and distance . . . . . . . . . . . . . . . . . . 555
A.4.5 Subluminal and superluminal effects . . . . . . . . . . . 556
A.4.6 Four-vectors . . . . . . . . . . . . . . . . . . . . . . . . . 558
A.4.7 Index notation . . . . . . . . . . . . . . . . . . . . . . . 559
A.4.8 Group property . . . . . . . . . . . . . . . . . . . . . . . 560
A.4.9 Intro to relativistic mechanics . . . . . . . . . . . . . . . 562
A.4.10 Lagrangian mechanics . . . . . . . . . . . . . . . . . . . 565
A.5 Completeness of Fourier modes . . . . . . . . . . . . . . . . . . 569
A.6 Derivation of the Euler formula . . . . . . . . . . . . . . . . . . 573
A.7 Nature and real eigenvalues . . . . . . . . . . . . . . . . . . . . 573
A.8 Are Hermitian operators really like that? . . . . . . . . . . . . . 573
A.9 Are linear momentum operators Hermitian? . . . . . . . . . . . 574
A.10 Why boundary conditions are tricky . . . . . . . . . . . . . . . 574
A.11 Extension to three-dimensional solutions . . . . . . . . . . . . . 575
A.12 Derivation of the harmonic oscillator solution . . . . . . . . . . 576
A.13 More on the harmonic oscillator and uncertainty . . . . . . . . 580
A.14 Derivation of a vector identity . . . . . . . . . . . . . . . . . . . 581
A.15 Derivation of the spherical harmonics . . . . . . . . . . . . . . . 581
A.16 The effective mass . . . . . . . . . . . . . . . . . . . . . . . . . 584
A.17 The hydrogen radial wave functions . . . . . . . . . . . . . . . . 587
A.18 Inner product for the expectation value . . . . . . . . . . . . . 590
A.19 Why commuting operators have a common set of eigenvectors . 590
A.20 The generalized uncertainty relationship . . . . . . . . . . . . . 591
A.21 Derivation of the commutator rules . . . . . . . . . . . . . . . . 592
A.22 Is the variational approximation best? . . . . . . . . . . . . . . 594
A.23 Solution of the hydrogen molecular ion . . . . . . . . . . . . . . 594
A.24 Accuracy of the variational method . . . . . . . . . . . . . . . . 595
A.25 Positive molecular ion wave function . . . . . . . . . . . . . . . 597
A.26 Molecular ion wave function symmetries . . . . . . . . . . . . . 597
A.27 Solution of the hydrogen molecule . . . . . . . . . . . . . . . . 598
A.28 Hydrogen molecule ground state and spin . . . . . . . . . . . . 600
A.29 Shielding approximation limitations . . . . . . . . . . . . . . . 601
A.30 Why the s states have the least energy . . . . . . . . . . . . . . 601
A.31 Why energy eigenstates are stationary . . . . . . . . . . . . . . 602

A.32 Better description of two-state systems . . . . . . . . . . . . . . 602


A.33 The evolution of expectation values . . . . . . . . . . . . . . . . 602
A.34 The virial theorem . . . . . . . . . . . . . . . . . . . . . . . . . 603
A.35 The energy-time uncertainty relationship . . . . . . . . . . . . . 603
A.36 The adiabatic theorem . . . . . . . . . . . . . . . . . . . . . . . 604
A.36.1 Derivation of the theorem . . . . . . . . . . . . . . . . . 604
A.36.2 Some implications . . . . . . . . . . . . . . . . . . . . . . 607
A.37 The two-state approximation of radiation . . . . . . . . . . . . 608
A.38 Selection rules . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
A.39 About spectral broadening . . . . . . . . . . . . . . . . . . . . . 613
A.40 Derivation of the Einstein B coefficients . . . . . . . . . . . . . 614
A.41 Parseval and the Fourier inversion theorem . . . . . . . . . . . 617
A.42 Derivation of group velocity . . . . . . . . . . . . . . . . . . . . 618
A.43 Details of the animations . . . . . . . . . . . . . . . . . . . . . 621
A.44 Derivation of the WKB approximation . . . . . . . . . . . . . . 629
A.45 WKB solution near the turning points . . . . . . . . . . . . . . 630
A.46 Three-dimensional scattering . . . . . . . . . . . . . . . . . . . 634
A.46.1 Partial wave analysis . . . . . . . . . . . . . . . . . . . . 637
A.46.2 The Born approximation . . . . . . . . . . . . . . . . . . 641
A.46.3 The Born series . . . . . . . . . . . . . . . . . . . . . . . 643
A.47 The evolution of probability . . . . . . . . . . . . . . . . . . . . 644
A.48 A basic description of Lagrangian multipliers . . . . . . . . . . 648
A.49 The generalized variational principle . . . . . . . . . . . . . . . 650
A.50 Spin degeneracy . . . . . . . . . . . . . . . . . . . . . . . . . . 651
A.51 Derivation of the approximation . . . . . . . . . . . . . . . . . 652
A.52 Why a single Slater determinant is not exact . . . . . . . . . . 657
A.53 Simplification of the Hartree-Fock energy . . . . . . . . . . . . 658
A.54 Integral constraints . . . . . . . . . . . . . . . . . . . . . . . . . 662
A.55 Generalized orbitals . . . . . . . . . . . . . . . . . . . . . . . . 663
A.56 Derivation of the Hartree-Fock equations . . . . . . . . . . . . . 665
A.57 Why the Fock operator is Hermitian . . . . . . . . . . . . . . . 671
A.58 “Correlation energy” . . . . . . . . . . . . . . . . . . . . . . . . 672
A.59 Explanation of the London forces . . . . . . . . . . . . . . . . . 675
A.60 Ambiguities in the definition of electron affinity . . . . . . . . . 679
A.61 Why Floquet theory should be called so . . . . . . . . . . . . . 681
A.62 Superfluidity versus BEC . . . . . . . . . . . . . . . . . . . . . 681
A.63 Explanation of Hund’s first rule . . . . . . . . . . . . . . . . . . 684
A.64 The mechanism of ferromagnetism . . . . . . . . . . . . . . . . 686
A.65 Number of system eigenfunctions . . . . . . . . . . . . . . . . . 687
A.66 The fundamental assumption of quantum statistics . . . . . . . 690
A.67 A problem if the energy is given . . . . . . . . . . . . . . . . . 692
A.68 Derivation of the particle energy distributions . . . . . . . . . . 693

A.69 The canonical probability distribution . . . . . . . . . . . . . . 699


A.70 Analysis of the ideal gas Carnot cycle . . . . . . . . . . . . . . 701
A.71 The recipe of life . . . . . . . . . . . . . . . . . . . . . . . . . . 702
A.72 Checks on the expression for entropy . . . . . . . . . . . . . . . 703
A.73 Chemical potential and distribution functions . . . . . . . . . . 706
A.74 The Fermi-Dirac integral for small temperature . . . . . . . . . 710
A.75 Physics of the fundamental commutation relations . . . . . . . 710
A.76 Multiple angular momentum components . . . . . . . . . . . . 711
A.77 Components of vectors are less than the total vector . . . . . . 711
A.78 The spherical harmonics with ladder operators . . . . . . . . . 712
A.79 Why angular momenta components can be added . . . . . . . . 712
A.80 Why the Clebsch-Gordan tables are bidirectional . . . . . . . . 713
A.81 How to make Clebsch-Gordan tables . . . . . . . . . . . . . . . 713
A.82 Machine language version of the Clebsch-Gordan tables . . . . . 713
A.83 The triangle inequality . . . . . . . . . . . . . . . . . . . . . . . 714
A.84 Awkward questions about spin . . . . . . . . . . . . . . . . . . 715
A.85 More awkwardness about spin . . . . . . . . . . . . . . . . . . . 716
A.86 Emergence of spin from relativity . . . . . . . . . . . . . . . . . 717
A.87 Electromagnetic evolution of expectation values . . . . . . . . . 719
A.88 Existence of magnetic monopoles . . . . . . . . . . . . . . . . . 721
A.89 More on Maxwell’s third law . . . . . . . . . . . . . . . . . . . 721
A.90 Various electrostatic derivations. . . . . . . . . . . . . . . . . . 722
A.90.1 Existence of a potential . . . . . . . . . . . . . . . . . . 722
A.90.2 The Laplace equation . . . . . . . . . . . . . . . . . . . . 723
A.90.3 Egg-shaped dipole field lines . . . . . . . . . . . . . . . . 724
A.90.4 Ideal charge dipole delta function . . . . . . . . . . . . . 724
A.90.5 Integrals of the current density . . . . . . . . . . . . . . 725
A.90.6 Lorentz forces on a current distribution . . . . . . . . . . 726
A.90.7 Field of a current dipole . . . . . . . . . . . . . . . . . . 727
A.90.8 Biot-Savart law . . . . . . . . . . . . . . . . . . . . . . . 729
A.91 Energy due to orbital motion in a magnetic field . . . . . . . . 730
A.92 Energy due to electron spin in a magnetic field . . . . . . . . . 731
A.93 Setting the record straight on alignment . . . . . . . . . . . . . 733
A.94 Solving the NMR equations . . . . . . . . . . . . . . . . . . . . 733
A.95 Derivation of perturbation theory . . . . . . . . . . . . . . . . . 733
A.96 Stark effect on the hydrogen ground state . . . . . . . . . . . . 739
A.97 Dirac fine structure Hamiltonian . . . . . . . . . . . . . . . . . 740
A.98 Classical spin-orbit derivation . . . . . . . . . . . . . . . . . . . 747
A.99 Expectation powers of r for hydrogen . . . . . . . . . . . . . . . 750
A.100 Symmetry eigenvalue preservation . . . . . . . . . . . . . . 754
A.101 Everett's theory and vacuum energy . . . . . . . . . . . . . 755
A.102 A tenth of a googol in universes . . . . . . . . . . . . . . . . 755

Web Pages 759

Notations 761
List of Figures

1.1 The classical picture of a vector. . . . . . . . . . . . . . . . . . . 6


1.2 Spike diagram of a vector. . . . . . . . . . . . . . . . . . . . . . 6
1.3 More dimensions. . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Infinite dimensions. . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 The classical picture of a function. . . . . . . . . . . . . . . . . 7
1.6 Forming the dot product of two vectors. . . . . . . . . . . . . . 8
1.7 Forming the inner product of two functions. . . . . . . . . . . . 9

2.1 A visualization of an arbitrary wave function. . . . . . . . . . . 20


2.2 Combined plot of position and momentum components. . . . . . 22
2.3 The uncertainty principle illustrated. . . . . . . . . . . . . . . . 23
2.4 Classical picture of a particle in a closed pipe. . . . . . . . . . . 29
2.5 Quantum mechanics picture of a particle in a closed pipe. . . . . 29
2.6 Definitions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.7 One-dimensional energy spectrum for a particle in a pipe. . . . . 36
2.8 One-dimensional ground state of a particle in a pipe. . . . . . . 38
2.9 Second and third lowest one-dimensional energy states. . . . . . 39
2.10 Definition of all variables. . . . . . . . . . . . . . . . . . . . . . 40
2.11 True ground state of a particle in a pipe. . . . . . . . . . . . . . 42
2.12 True second and third lowest energy states. . . . . . . . . . . . . 43
2.13 A combination of ψ111 and ψ211 seen at some typical times. . . . 45
2.14 The harmonic oscillator. . . . . . . . . . . . . . . . . . . . . . . 47
2.15 The energy spectrum of the harmonic oscillator. . . . . . . . . . 52
2.16 Ground state ψ000 of the harmonic oscillator . . . . . . . . . . . 54
2.17 Wave functions ψ100 and ψ010 . . . . . . . . . . . . . . . . . . . . 55
2.18 Energy eigenfunction ψ213 . . . . . . . . . . . . . . . . . . . . . . 56
2.19 Arbitrary wave function (not an energy eigenfunction). . . . . . 59

3.1 Spherical coordinates of an arbitrary point P. . . . . . . . . . . 64


3.2 Spectrum of the hydrogen atom. . . . . . . . . . . . . . . . . . . 76
3.3 Ground state wave function ψ100 of the hydrogen atom. . . . . . 79
3.4 Eigenfunction ψ200 . . . . . . . . . . . . . . . . . . . . . . . . . . 80


3.5 Eigenfunction ψ210 , or 2pz . . . . . . . . . . . . . . . . . . . . . . 81


3.6 Eigenfunction ψ211 (and ψ21−1 ). . . . . . . . . . . . . . . . . . . 81
3.7 Eigenfunctions 2px , left, and 2py , right. . . . . . . . . . . . . . . 82
3.8 Hydrogen atom plus free proton far apart. . . . . . . . . . . . . 100
3.9 Hydrogen atom plus free proton closer together. . . . . . . . . . 100
3.10 The electron being anti-symmetrically shared. . . . . . . . . . . 102
3.11 The electron being symmetrically shared. . . . . . . . . . . . . . 103

4.1 State with two neutral atoms. . . . . . . . . . . . . . . . . . . . 114


4.2 Symmetric sharing of the electrons. . . . . . . . . . . . . . . . . 116
4.3 Antisymmetric sharing of the electrons. . . . . . . . . . . . . . . 116
4.4 Approximate solutions for hydrogen (left) and helium (right). . 151
4.5 Approximate solutions for lithium (left) and beryllium (right). 153
4.6 Example approximate solution for boron. . . . . . . . . . . . . . 155
4.7 Covalent sigma bond consisting of two 2pz states. . . . . . . . . 159
4.8 Covalent pi bond consisting of two 2px states. . . . . . . . . . . 160
4.9 Covalent sigma bond consisting of a 2pz and a 1s state. . . . . . 161
4.10 Shape of an sp3 hybrid state. . . . . . . . . . . . . . . . . . . . . 163
4.11 Shapes of the sp2 (left) and sp (right) hybrids. . . . . . . . . . . 164

5.1 Emission and absorption of radiation by an atom. . . . . . . . . 180


5.2 Triangle inequality. . . . . . . . . . . . . . . . . . . . . . . . . . 187
5.3 Approximate Dirac delta function δ_ε(x − x̄) is shown left. The
true delta function δ(x − x̄) is the limit when ε becomes zero, and
is an infinitely high, infinitely thin spike, shown right. It is the
eigenfunction corresponding to a position x̄. . . . . . . . . . . 196
5.4 The real part (red) and envelope (black) of an example wave. . . 201
5.5 The wave moves with the phase speed. . . . . . . . . . . . . . . 201
5.6 The real part (red) and magnitude or envelope (black) of a wave
packet. (Schematic) . . . . . . . . . . . . . . . . . . . . . . . . . 202
5.7 The velocities of wave and envelope are not equal. . . . . . . . . 203
5.8 A particle in free space. . . . . . . . . . . . . . . . . . . . . . . 206
5.9 An accelerating particle. . . . . . . . . . . . . . . . . . . . . . . 207
5.10 A decelerating particle. . . . . . . . . . . . . . . . . . . . . . 208
5.11 Unsteady solution for the harmonic oscillator. The third picture
shows the maximum distance from the nominal position that the
wave packet reaches. . . . . . . . . . . . . . . . . . . . . . . . . 208
5.12 Harmonic oscillator potential energy V , eigenfunction h50 , and
its energy E50 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
5.13 A partial reflection. . . . . . . . . . . . . . . . . . . . . . . . . . 214
5.14 A tunneling particle. . . . . . . . . . . . . . . . . . . . . . . 214
5.15 Penetration of an infinitely high potential energy barrier. . . . . 215

5.16 Schematic of a scattering potential and the asymptotic behavior
of an example energy eigenfunction for a wave packet coming in
from the far left. . . . . . . . . . . . . . . . . . . . . . . . . . 216
5.17 Half-life versus energy release for the atomic nuclei marked in
NUBASE 2003 as showing pure alpha decay with unqualified en-
ergies. Top: only the even values of the mass and atomic numbers
cherry-picked. Inset: really cherry-picking, only a few even mass
numbers for thorium and uranium! Bottom: all the nuclei except
one. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
5.18 Schematic potential for an alpha particle that tunnels out of a
nucleus. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
5.19 Half-life predicted by the Gamow / Gurney & Condon theory
versus the true value. . . . . . . . . . . . . . . . . . . . . . . . . 224

7.1 Billiard-ball model of the salt molecule. . . . . . . . . . . . . . . 262


7.2 Billiard-ball model of a salt crystal. . . . . . . . . . . . . . . . . 263
7.3 The salt crystal disassembled to show its structure. . . . . . . . 265
7.4 Sketch of electron energy spectra in solids. . . . . . . . . . . . . 266
7.5 The lithium atom, scaled more correctly than in chapter 4.9 . . 268
7.6 Body-centered-cubic (bcc) structure of lithium. . . . . . . . . . 269
7.7 Fully periodic wave function of a two-atom lithium “crystal.” . . 271
7.8 Flip-flop wave function of a two-atom lithium “crystal.” . . . . . 272
7.9 Wave functions of a four-atom lithium “crystal.” The actual
picture is that of the fully periodic mode. . . . . . . . . . . . . . 273
7.10 Reciprocal lattice of a one-dimensional crystal. . . . . . . . . . . 277
7.11 Schematic of energy bands. . . . . . . . . . . . . . . . . . . . . . 278
7.12 Energy versus linear momentum. . . . . . . . . . . . . . . . . . 280
7.13 Schematic of merging bands. . . . . . . . . . . . . . . . . . . . . 281
7.14 A primitive cell and primitive translation vectors of lithium. . . 282
7.15 Wigner-Seitz cell of the bcc lattice. . . . . . . . . . . . . . . . . 283
7.16 Schematic of crossing bands. . . . . . . . . . . . . . . . . . . . . 287
7.17 Ball and stick schematic of the diamond crystal. . . . . . . . . . 288
7.18 Allowed wave number vectors. . . . . . . . . . . . . . . . . . . . 292
7.19 Schematic energy spectrum of the free electron gas. . . . . . . . 293
7.20 Occupied wave number states and Fermi surface in the ground
state . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
7.21 Density of states for the free electron gas. . . . . . . . . . . . . . 296
7.22 Energy states, top, and density of states, bottom, when there is
severe confinement in the y-direction, as in a quantum well. . . . 297
7.23 Energy states, top, and density of states, bottom, when there
is severe confinement in both the y- and z-directions, as in a
quantum wire. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299

7.24 Energy states, top, and density of states, bottom, when there is
severe confinement in all three directions, as in a quantum dot or
artificial atom. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
7.25 Wave number vectors seen in a cross section of constant kz . Top:
sinusoidal solutions. Bottom: exponential solutions. . . . . . . . 302
7.26 Assumed simple cubic reciprocal lattice, shown as black dots, in
cross-section. The boundaries of the surrounding primitive cells
are shown as thin red lines. . . . . . . . . . . . . . . . . . . . . 304
7.27 Occupied states for one, two, and three free electrons per physical
lattice cell. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
7.28 Redefinition of the occupied wave number vectors into Brillouin
zones. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
7.29 Second, third, and fourth Brillouin zones seen in the periodic
zone scheme. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
7.30 The red dot shows the wavenumber vector of a sample free elec-
tron wave function. It is to be corrected for the lattice potential. 310
7.31 The grid of nonzero Hamiltonian perturbation coefficients and
the problem sphere in wave number space. . . . . . . . . . . . . 311
7.32 Tearing apart of the wave number space energies. . . . . . . . . 313
7.33 Effect of a lattice potential on the energy. The energy is repre-
sented by the square distance from the origin, and is relative to
the energy at the origin. . . . . . . . . . . . . . . . . . . . . . . 314
7.34 Bragg planes seen in wave number space cross section. . . . . . 315
7.35 Occupied states for the energies of figure 7.33 if there are two
valence electrons per lattice cell. Left: energy. Right: wave
numbers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
7.36 Smaller lattice potential. From top to bottom shows one, two
and three valence electrons per lattice cell. Left: energy. Right:
wave numbers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
7.37 Sketch of electron energy spectra in solids at absolute zero tem-
perature. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
7.38 Sketch of electron energy spectra in solids at a nonzero temperature. 318
7.39 Specific heat at constant volume of gases. Temperatures from
absolute zero to 1200 K. Data from NIST-JANAF and AIP. . . 323
7.40 Specific heat at constant pressure of solids. Temperatures from
absolute zero to 1200 K. Carbon is diamond; graphite is similar.
Water is ice and liquid. Data from NIST-JANAF, CRC, AIP,
Rohsenow et al. . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
7.41 Depiction of an electromagnetic ray. . . . . . . . . . . . . . . . . 331
7.42 Law of reflection in elastic scattering from a plane. . . . . . . . 331
7.43 Scattering from multiple “planes of atoms”. . . . . . . . . . . . 332
7.44 Difference in travel distance when scattered from P rather than O. 334

8.1 Graphical depiction of an arbitrary system energy eigenfunction
for 95 distinguishable particles. . . . . . . . . . . . . . . . . . 341
8.2 Graphical depiction of an arbitrary system energy eigenfunction
for 95 identical bosons. . . . . . . . . . . . . . . . . . . . . . . . 342
8.3 Graphical depiction of an arbitrary system energy eigenfunction
for 31 identical fermions. . . . . . . . . . . . . . . . . . . . . . . 343
8.4 Illustrative small model system having 4 distinguishable particles.
The particular eigenfunction shown is arbitrary. . . . . . . . . . 346
8.5 The number of system energy eigenfunctions for a simple model
system with only three energy buckets. Positions of the squares
indicate the numbers of particles in buckets 2 and 3; darkness
of the squares indicates the relative number of eigenfunctions
with those bucket numbers. Left: system with 4 distinguishable
particles, middle: 16, right: 64. . . . . . . . . . . . . . . . . . . 346
8.6 Number of energy eigenfunctions on the oblique energy line in 8.5.
(The curves are mathematically interpolated to allow a continu-
ously varying fraction of particles in bucket 2.) Left: 4 particles,
middle: 64, right: 1024. . . . . . . . . . . . . . . . . . . . . . . . 348
8.7 Probabilities for bucket-number sets for the simple 64 particle
model system if there is uncertainty in energy. More proba-
ble bucket-number distributions are shown darker. Left: iden-
tical bosons, middle: distinguishable particles, right: identical
fermions. The temperature is the same as in figure 8.5. . . . . . 353
8.8 Probabilities of bucket-number sets for the simple 64 particle
model system if bucket 1 is a non-degenerate ground state. Left:
identical bosons, middle: distinguishable particles, right: identi-
cal fermions. The temperature is the same as in figure 8.7. . . . 354
8.9 Like figure 8.8, but at a lower temperature. . . . . . . . . . . . . 354
8.10 Like figure 8.8, but at a still lower temperature. . . . . . . . . . 355
8.11 Schematic of the Carnot refrigeration cycle. . . . . . . . . . . . 362
8.12 Schematic of the Carnot heat engine. . . . . . . . . . . . . . . . 365
8.13 A generic heat pump next to a reversed Carnot one with the same
heat delivery. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
8.14 Comparison of two different integration paths for finding the en-
tropy of a desired state. The two different integration paths are
in black and the yellow lines are reversible adiabatic process lines. 368

9.1 Example bosonic ladders. . . . . . . . . . . . . . . . . . . . . . . 399


9.2 Example fermionic ladders. . . . . . . . . . . . . . . . . . . . . . 399
9.3 Triplet and singlet states in terms of ladders . . . . . . . . . . . 405
9.4 Clebsch-Gordan coefficients of two spin one half particles. . . . . 406
9.5 Clebsch-Gordan coefficients for l_b equal to one half. . . . . . . 407

9.6 Clebsch-Gordan coefficients for l_b equal to one. . . . . . . . . . 408


9.7 Relationship of Maxwell’s first equation to Coulomb’s law. . . . 418
9.8 Maxwell’s first equation for a more arbitrary region. The figure
to the right includes the field lines through the selected points. . 419
9.9 The net number of field lines leaving a region is a measure for
the net charge inside that region. . . . . . . . . . . . . . . . . . 420
9.10 Since magnetic monopoles do not exist, the net number of mag-
netic field lines leaving a region is always zero. . . . . . . . . . . 421
9.11 Electric power generation. . . . . . . . . . . . . . . . . . . . . . 422
9.12 Two ways to generate a magnetic field: using a current (left) or
using a varying electric field (right). . . . . . . . . . . . . . . . . 423
9.13 Electric field and potential of a charge that is distributed uni-
formly within a small sphere. The dotted lines indicate the values
for a point charge. . . . . . . . . . . . . . . . . . . . . . . . . . 428
9.14 Electric field of a two-dimensional line charge. . . . . . . . . . . 429
9.15 Field lines of a vertical electric dipole. . . . . . . . . . . . . . . 430
9.16 Electric field of a two-dimensional dipole. . . . . . . . . . . . . . 431
9.17 Field of an ideal magnetic dipole. . . . . . . . . . . . . . . . . . 432
9.18 Electric field of an almost ideal two-dimensional dipole. . . . . . 433
9.19 Magnetic field lines around an infinite straight electric wire. . . 437
9.20 An electromagnet consisting of a single wire loop. The generated
magnetic field lines are in blue. . . . . . . . . . . . . . . . . . . 438
9.21 A current dipole. . . . . . . . . . . . . . . . . . . . . . . . . . . 439
9.22 Electric motor using a single wire loop. The Lorentz forces (black
vectors) exerted by the external magnetic field on the electric
current carriers in the wire produce a net moment M on the loop.
The self-induced magnetic field of the wire and the corresponding
radial forces are not shown. . . . . . . . . . . . . . . . . . . . . 440
9.23 Variables for the computation of the moment on a wire loop in a
magnetic field. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440
9.24 Larmor precession of the expectation spin (or magnetic moment)
vector around the magnetic field. . . . . . . . . . . . . . . . . . 450
9.25 Probability of being able to find the nuclei at elevated energy
versus time for a given perturbation frequency ω. . . . . . . . . 452
9.26 Maximum probability of finding the nuclei at elevated energy. . 452
9.27 A perturbing magnetic field, rotating at precisely the Larmor
frequency, causes the expectation spin vector to come cascading
down out of the ground state. . . . . . . . . . . . . . . . . . . . 453

10.1 Graphical depiction of an arbitrary system energy eigenfunction
for 95 distinguishable particles. . . . . . . . . . . . . . . . . . 482

10.2 Graphical depiction of an arbitrary system energy eigenfunction
for 95 identical bosons. . . . . . . . . . . . . . . . . . . . . . . 483
10.3 Graphical depiction of an arbitrary system energy eigenfunction
for 31 identical fermions. . . . . . . . . . . . . . . . . . . . . . . 484
10.4 Example wave functions for a system with just one type of single
particle state. Left: identical bosons; right: identical fermions. . 485
10.5 Annihilation and creation operators for a system with just one
type of single particle state. Left: identical bosons; right: identi-
cal fermions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488

11.1 Separating the hydrogen ion. . . . . . . . . . . . . . . . . . . . . 513


11.2 The Bohm experiment before the Venus measurement (left), and
immediately after it (right). . . . . . . . . . . . . . . . . . . . . 514
11.3 Spin measurement directions. . . . . . . . . . . . . . . . . . . . 515
11.4 Earth’s view of events (left), and that of a moving observer (right). 517
11.5 Bohm’s version of the Einstein, Podolsky, Rosen Paradox . . . . 525
11.6 Non entangled positron and electron spins; up and down. . . . . 526
11.7 Non entangled positron and electron spins; down and up. . . . . 526
11.8 The wave functions of two universes combined . . . . . . . . . . 526
11.9 The Bohm experiment repeated. . . . . . . . . . . . . . . . . . . 529
11.10 Repeated experiments on the same electron. . . . . . . . . . . 530

A.1 Coordinate systems for the Lorentz transformation. . . . . . . . 552


A.2 Example elastic collision seen by different observers. . . . . . . . 563
A.3 A completely inelastic collision. . . . . . . . . . . . . . . . . . . 565
A.4 Example energy eigenfunction for the particle in free space. . . . 621
A.5 Example energy eigenfunction for a particle entering a constant
accelerating force field. . . . . . . . . . . . . . . . . . . . . . . . 622
A.6 Example energy eigenfunction for a particle entering a constant
decelerating force field. . . . . . . . . . . . . . . . . . . . . . . . 623
A.7 Example energy eigenfunction for the harmonic oscillator. . . . . 624
A.8 Example energy eigenfunction for a particle encountering a brief
accelerating force. . . . . . . . . . . . . . . . . . . . . . . . . . . 625
A.9 Example energy eigenfunction for a particle tunneling through a
barrier. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626
A.10 Example energy eigenfunction for tunneling through a delta func-
tion barrier. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626
A.11 The Airy Ai and Bi functions that solve the Hamiltonian eigen-
value problem for a linearly varying potential energy. Bi very
quickly becomes too large to plot for positive values of its argu-
ment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631
A.12 Connection formulae for a turning point from classical to tunneling. 632

A.13 Connection formulae for a turning point from tunneling to classical. 632


A.14 WKB approximation of tunneling. . . . . . . . . . . . . . . . . . 633
A.15 Scattering of a beam off a target. . . . . . . . . . . . . . . . . . 635
A.16 Graphical interpretation of the Born series. . . . . . . . . . . . . 644
A.17 Possible polarizations of a pair of hydrogen atoms. . . . . . . . . 676
A.18 Schematic of an example boson distribution in a bucket. . . . . 690
A.19 Schematic of the Carnot refrigeration cycle. . . . . . . . . . . . 702
List of Tables

2.1 First few one-dimensional eigenfunctions of the harmonic oscillator. 50

3.1 The first few spherical harmonics. . . . . . . . . . . . . . . . . . 67


3.2 The first few radial wave functions for hydrogen. . . . . . . . . . 74

4.1 Abbreviated periodic table of the elements, showing element symbol,
atomic number, ionization energy, and electronegativity. . . . . . 152

9.1 Electromagnetics I: Fundamental equations and basic solutions. . 426


9.2 Electromagnetics II: Electromagnetostatic solutions. . . . . . . 427

Preface

To the Student
This book was primarily written for engineering graduate students who find
themselves caught up in nanotechnology. It is a simple fact that the typical
engineering education does not provide anywhere close to the amount of physics
you will need to make sense out of the literature of your field. You can start
from scratch as an undergraduate in the physics department, or you can read
this book.
This book covers the real quantum mechanics; it is not just a summary
of selected results as you can find elsewhere. The first part of this book pro-
vides a solid introduction to classical (i.e. nonrelativistic) quantum mechanics.
It is intended to explain the ideas both rigorously and clearly. It follows a
“just-in-time” learning approach. The mathematics is fully explained, but not
emphasized. The intention is not to practice clever mathematics, but to under-
stand quantum mechanics. The coverage is at the normal calculus and physics
level of undergraduate engineering students. If you did well in these courses, you
should be able to understand the discussion, assuming that you start reading
from the beginning. There are some hints in the notations section, if you forgot
some calculus. If you forgot some physics, just don’t worry too much about it:
quantum physics is so much different that even the most basic concepts need to
be covered from scratch.
Derivations are usually “banned” to notes at the end of this book, in case
you need them for one reason or the other. They correct a considerable number
of mistakes that you will find in other mainstream books. No doubt they add
some new ones. Let me know and I will fix them in a jiffy; that is the advantage
of a web book.
Some sections are marked [descriptive]. These sections provide a nonmathe-
matical introduction to material that you will almost certainly run into if
you stay in nanotechnology. Read through them, more than once, so that you
have a general idea of what they are all about. If you need to know more about
these topics, the introductions in this book should make dedicated textbooks
more easily accessible.


The second part of this book discusses more advanced topics. It starts with
numerical methods, since engineering graduate students are typically supported
by a research grant, and the quicker you can produce some results, the better.
A description of density functional theory is still missing, unfortunately.
The remaining chapters of the second part are intended to provide a crash
course on many topics that nano literature would consider elementary physics,
but that nobody has ever told you about. Most of it is not really part of what
is normally understood to be a quantum mechanics course. Reading, rereading,
and understanding it is highly recommended anyway.
The purpose is not just to provide basic literacy in those topics, although
that is very important. But the purpose is also to explain enough of their funda-
mentals, in terms that an engineer can understand, so that you can make sense
of the literature in those fields if you do need to know more than can be covered
here. Consider these chapters gateways into their topic areas.
There is a final chapter on how to interpret quantum mechanics philosoph-
ically. Read it if you are interested; it will probably not help you do quantum
mechanics any better. But as a matter of basic literacy, it is good to know how
truly weird quantum mechanics really is.
The usual “Why this book?” blah-blah can be found in a note at the back
of this book, {A.1}. A version history is in note {A.2}.

Acknowledgments
This book is for a large part based on my reading of the excellent book by
Griffiths, [10]. It now contains essentially all material in that book in one way
or the other. (But you may need to look in the notes for some of it.) This book
also evolved to include a lot of additional material that I thought would be
appropriate for a physically-literate engineer. There are chapters on numerical
methods, thermodynamics, solids, and electromagnetism.
Somewhat to my surprise, I find that my coverage actually tends to be closer
to Yariv’s book, [20]. I still think Griffiths is more readable for an engineer,
though Yariv has some very good items Griffiths does not.
The discussions on two-state systems are mainly based on Feynman’s notes,
[9, chapters 8-11]. Since it is hard to determine the precise statements being
made, much of that has been augmented by data from web sources, mainly those
referenced.
The nanomaterials lectures of colleague Anter El-Azab that I audited in-
spired me to add a bit on simple quantum confinement to the first system
studied, the particle in the box. That does add a bit to a section that I wanted
to keep as simple as possible, but then I figure it also adds a sense that this is
really relevant stuff for future engineers. I also added a discussion of the effects
of confinement on the density of states to the section on the free electron gas.
I thank Swapnil Jain for pointing out that the initial subsection on quantum
confinement in the pipe was definitely unclear and is hopefully better now.
I thank Johann Joss for pointing out a mistake in the formula for the aver-
aged energy of two-state systems. Harald Kirsch reported various problems in
the sections on conservation laws and on position eigenfunctions.
The note on the derivation of the selection rules is from [10] and lecture notes
from a University of Tennessee quantum course taught by Marianne Breinig.
The subsection on conservation laws and selection rules is mainly from Ellis,
[4].
The section on the Born-Oppenheimer approximation comes from Wikipe-
dia, [[9]], with modifications including the inclusion of spin.
The section on the Hartree-Fock method is mainly based on Szabo and
Ostlund [18], a well-written book, with some Parr and Yang [13] thrown in.
The section on solids is mainly based on Sproull, [16], a good source for
practical knowledge about application of the concepts. It is surprisingly up to
date, considering it was written half a century ago. Various items, however,
come from Kittel [11]. The discussion of ionic solids really comes straight from
hyperphysics [[4]]. I prefer hyperphysics’ example of NaCl, instead of Sproull’s
equivalent discussion of KCl. My colleague Steve Van Sciver helped me get
some handle on what to say about helium and Bose-Einstein condensation.
The thermodynamics section started from Griffiths’ discussion, [10], which
follows Yariv’s, [20]. However, it expanded greatly during writing. It now comes
mostly from Baierlein [3], with some help from Feynman, [7], and some of the
books I use in undergraduate thermo.
The derivation of the classical energy of a spinning particle in a magnetic
field is from Yariv, [20].
The section on nuclear models is pieced together from the Handbook of
Physics, Hyperphysics, Mayer’s Nobel prize lecture, and various web sources.
The brief description of quantum field theory is mostly from Wikipedia, with
a bit of fill-in from Feynman [7] and Kittel [11]. The example on field operators
is an exercise from Srednicki [17], whose solution was posted online by a TA of
Joe Polchinski from UCSB.
The many-worlds discussion is based on Everett’s exposition, [5]. It is bril-
liant but quite impenetrable.
The idea of using the Lagrangian for the derivations of relativistic mechanics
is from A. Kompanayets, Theoretical Physics, an excellent book.
Acknowledgements for specific items are not listed here if a citation is given
in the text, or if, as far as I know, the argument is standard theory. This is a
textbook, not a research paper or historical note. But if a reference is appropriate
somewhere, let me know.

Comments and Feedback


If you find an error, please let me know. There seems to be an unending supply
of them. As one author described it brilliantly, “the hand is still writing though
the brain has long since disengaged.”
Also let me know if you find points that are unclear to the intended reader-
ship, ME graduate students with a typical exposure to mathematics and physics,
or equivalent. Every section, except a few explicitly marked as requiring ad-
vanced linear algebra, should be understandable by anyone with a good knowl-
edge of calculus and undergraduate physics.
The same goes for sections that cannot be understood without delving back into
earlier material. All within reason, of course. If you pick a random starting word
among the half million or so and start reading from there, you most likely will
be completely lost. But sections are intended to be fairly self-contained, and
you should be able to read one without backing up through all of the text.
General editorial comments are also welcome. I’ll skip the philosophical
discussions. I am an engineer.
Feedback can be e-mailed to me at quantum@dommelen.net.
This is a living document. I am still adding things here and there, and fixing
various mistakes and doubtful phrasing. Even before every comma is perfect,
I think the document can be of value to people looking for an easy-to-read
introduction to quantum mechanics at a calculus level. So I am treating it as
software, with version numbers indicating the level of confidence I have in it all.
Part I

Basic Quantum Mechanics

Chapter 1

Mathematical Prerequisites

Quantum mechanics is based on a number of advanced mathematical ideas that are described in this chapter.

1.1 Complex Numbers


Quantum mechanics is full of complex numbers, numbers involving

i = √−1.

Note that √−1 is not an ordinary, “real”, number, since there is no real number whose square is −1; the square of a real number is always positive. This section
summarizes the most important properties of complex numbers.
First, any complex number, call it c, can by definition always be written in
the form
c = cr + ici (1.1)

where both cr and ci are ordinary real numbers, not involving √−1. The number
cr is called the real part of c and ci the imaginary part.
You can think of the real and imaginary parts of a complex number as the
components of a two-dimensional vector:

[Picture: the complex number c drawn as an arrow from the origin, with horizontal component cr and vertical component ci.]

The length of that vector is called the “magnitude,” or “absolute value” |c| of the complex number. It equals

|c| = √(cr² + ci²).


Complex numbers can be manipulated pretty much in the same way as ordinary numbers can. A relation to remember is:

1/i = −i (1.2)

which can be verified by multiplying top and bottom of the fraction by i and noting that by definition i² = −1 in the bottom.
The complex conjugate of a complex number c, denoted by c∗ , is found by
replacing i everywhere by −i. In particular, if c = cr + ici , where cr and ci are
real numbers, the complex conjugate is

c∗ = cr − ici (1.3)

The following picture shows that graphically, you get the complex conjugate of
a complex number by flipping it over around the horizontal axis:

[Picture: the arrow for c, with components cr and ci, and its mirror image below the horizontal axis, the arrow for c∗, with components cr and −ci.]

You can get the magnitude of a complex number c by multiplying c with its complex conjugate c∗ and taking a square root:

|c| = √(c∗c) (1.4)

If c = cr + ici, where cr and ci are real numbers, multiplying out c∗c shows the magnitude of c to be

|c| = √(cr² + ci²)
which is indeed the same as before.
From the above graph of the vector representing a complex number c, the
real part is cr = |c| cos α where α is the angle that the vector makes with the
horizontal axis, and the imaginary part is ci = |c| sin α. So you can write any
complex number in the form

c = |c| (cos α + i sin α)

The critically important Euler formula says that:

cos α + i sin α = e^{iα} (1.5)



So, any complex number can be written in “polar form” as

c = |c| e^{iα} (1.6)

where both the magnitude |c| and the phase angle (or argument) α are real numbers.
Any complex number of magnitude one can therefore be written as e^{iα}. Note that the only two real numbers of magnitude one, 1 and −1, are included for α = 0, respectively α = π. The number i is obtained for α = π/2 and −i for α = −π/2.
(See note {A.6} if you want to know where the Euler formula comes from.)
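These relations are easy to play with numerically. The following sketch, in Python (the particular numbers are arbitrary example values), verifies the Euler formula and the polar form; it is only an illustration, not part of the mathematics:

    import cmath

    alpha = 0.7318                       # an arbitrary phase angle
    lhs = cmath.exp(1j * alpha)          # e^{i alpha}
    rhs = cmath.cos(alpha) + 1j * cmath.sin(alpha)
    print(abs(lhs - rhs))                # ~1e-16: the Euler formula checks out

    c = 2 + 3j                           # an arbitrary complex number
    mag, phase = abs(c), cmath.phase(c)  # magnitude |c| and phase angle alpha
    print(abs(mag * cmath.exp(1j * phase) - c))   # ~0: c = |c| e^{i alpha}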

Key Points
⋄ Complex numbers include the square root of minus one, i, as a valid
number.
⋄ All complex numbers can be written as a real part plus i times an
imaginary part, where both parts are normal real numbers.
⋄ The complex conjugate of a complex number is obtained by replacing
i everywhere by −i.
⋄ The magnitude of a complex number is obtained by multiplying the
number by its complex conjugate and then taking a square root.
⋄ The Euler formula relates exponentials to sines and cosines.

1.1 Review Questions


1 Multiply out (2 + 3i)2 and then find its real and imaginary part.
2 Show more directly that 1/i = −i.
3 Multiply out (2+3i)(2−3i) and then find its real and imaginary part.
4 Find the magnitude or absolute value of 2 + 3i.
5 Verify that (2 − 3i)2 is still the complex conjugate of (2 + 3i)2 if both
are multiplied out.
6 Verify that e^{−2i} is still the complex conjugate of e^{2i} after both are rewritten using the Euler formula.
7 Verify that (e^{iα} + e^{−iα})/2 = cos α.
8 Verify that (e^{iα} − e^{−iα})/2i = sin α.

1.2 Functions as Vectors


The second mathematical idea that is critical for quantum mechanics is that
functions can be treated in a way that is fundamentally not that much different
from vectors.
A vector f~ (which might be velocity ~v , linear momentum p~ = m~v , force F~ ,
or whatever) is usually shown in physics in the form of an arrow:

Figure 1.1: The classical picture of a vector.

However, the same vector may instead be represented as a spike diagram, by plotting the value of the components versus the component index:

Figure 1.2: Spike diagram of a vector.


(The symbol i for the component index is not to be confused with i = √−1.)
In the same way as in two dimensions, a vector in three dimensions, or, for
that matter, in thirty dimensions, can be represented by a spike diagram:

Figure 1.3: More dimensions.



For a large number of dimensions, and in particular in the limit of infinitely many dimensions, the large values of i can be rescaled into a continuous coordinate, call it x. For example, x might be defined as i divided by the number of dimensions. In any case, the spike diagram becomes a function f(x):

Figure 1.4: Infinite dimensions.

The spikes are usually not shown:

Figure 1.5: The classical picture of a function.

In this way, a function is just a vector in infinitely many dimensions.
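To see the limiting process at work, the little Python sketch below (assuming the numpy library; the choice f(x) = sin(x) and the rescaling x = i/n are arbitrary) samples ever more “components” fi = f(i/n). As n grows, the spike diagram fills in toward the continuous graph of the function:

    import numpy as np

    def spikes(n):
        """Return the n components f_i = f(x_i), with index i rescaled to x_i = i/n."""
        i = np.arange(1, n + 1)
        x = i / n            # the index, rescaled into a continuous coordinate
        return x, np.sin(x)

    for n in (5, 50, 500):
        x, f = spikes(n)
        print(n, "components, spaced", x[1] - x[0], "apart")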

Key Points
⋄ Functions can be thought of as vectors with infinitely many compo-
nents.
⋄ This allows quantum mechanics to do the same things with functions as
you can do with vectors.

1.2 Review Questions


1 Graphically compare the spike diagram of the 10-dimensional vector
~v with components (0.5,1,1.5,2,2.5,3,3.5,4,4.5,5) with the plot of the
function f (x) = 0.5x.
2 Graphically compare the spike diagram of the 10-dimensional unit
vector ı̂3 , with components (0,0,1,0,0,0,0,0,0,0), with the plot of the
function f (x) = 1. (No, they do not look alike.)

1.3 The Dot, oops, INNER Product


The dot product of vectors is an important tool. It makes it possible to find
the length of a vector, by multiplying the vector by itself and taking the square
root. It is also used to check if two vectors are orthogonal: if their dot product
is zero, they are. In this subsection, the dot product is defined for complex
vectors and functions.
The usual dot product of two vectors f~ and ~g can be found by multiplying
components with the same index i together and summing that:
f~ · ~g ≡ f1 g1 + f2 g2 + f3 g3
(The emphatic equal, ≡, is commonly used to indicate “is by definition equal” or
“is always equal.”) Figure 1.6 shows multiplied components using equal colors.

Figure 1.6: Forming the dot product of two vectors.

Note the use of numeric subscripts, f1 , f2 , and f3 rather than fx , fy , and fz ;


it means the same thing. Numeric subscripts allow the three-term sum above to be written more compactly as:

f~ · ~g ≡ Σ_{all i} fi gi

The Σ is called the “summation symbol.”


The length of a vector f~, indicated by |f~| or simply by f, is normally computed as

|f~| = √(f~ · f~) = √( Σ_{all i} fi² )

However, this does not work correctly for complex vectors. The difficulty is that
terms of the form fi² are no longer necessarily positive numbers. For example, i² = −1.
Therefore, it is necessary to use a generalized “inner product” for complex
vectors, which puts a complex conjugate on the first vector:
hf~|~g i ≡ Σ_{all i} fi∗ gi (1.7)

If vector f~ is real, the complex conjugate does nothing, and the inner product
hf~|~g i is the same as the dot product f~ · ~g . Otherwise, in the inner product f~
and ~g are no longer interchangeable; the conjugates are only on the first factor,
f~. Interchanging f~ and ~g changes the inner product value into its complex
conjugate.
The length of a nonzero vector is now always a positive number:
|f~| = √( hf~|f~i ) = √( Σ_{all i} |fi|² ) (1.8)

Physicists take the inner product “bracket” verbally apart as hf~| (the “bra”) and |~g i (the “ket”), and refer to vectors as bras and kets.


The inner product of functions is defined in exactly the same way as for
vectors, by multiplying values at the same x position together and summing.
But since there are infinitely many x-values, the sum becomes an integral:

hf |gi = ∫_{all x} f∗(x) g(x) dx (1.9)

as illustrated in figure 1.7.

Figure 1.7: Forming the inner product of two functions.

The equivalent of the length of a vector is in the case of a function called its “norm:”

||f || ≡ √( hf |f i ) = √( ∫_{all x} |f(x)|² dx ) (1.10)

The double bars are used to avoid confusion with the absolute value of the
function.
A vector or function is called “normalized” if its length or norm is one:

hf |f i = 1 iff f is normalized. (1.11)



(“iff” should really be read as “if and only if.”)


Two vectors, or two functions, f and g are by definition orthogonal if their
inner product is zero:
hf |gi = 0 iff f and g are orthogonal. (1.12)

Sets of vectors or functions that are all
• mutually orthogonal, and
• normalized
occur a lot in quantum mechanics. Such sets should be called “orthonormal”, though the less precise term “orthogonal” is often used instead. This document will refer to them correctly as being orthonormal.
So, a set of functions or vectors f1 , f2 , f3 , . . . is orthonormal if
0 = hf1 |f2 i = hf2 |f1 i = hf1 |f3 i = hf3 |f1 i = hf2 |f3 i = hf3 |f2 i = . . .
and
1 = hf1 |f1 i = hf2 |f2 i = hf3 |f3 i = . . .
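A crude numerical version of these definitions may help make them concrete. The sketch below (Python, assuming numpy; the grid resolution is an arbitrary choice) approximates the inner product integral (1.9) by a sum over grid points. It confirms that sin(x) and cos(x) are orthogonal on 0 ≤ x ≤ 2π, and that the inner product of sin(x) with itself is π, so that its norm is √π:

    import numpy as np

    def inner(f, g, a, b, n=100000):
        """Approximate the inner product: integral of conj(f(x)) g(x) over [a, b]."""
        x = a + (np.arange(n) + 0.5) * (b - a) / n   # midpoint grid
        return np.sum(np.conj(f(x)) * g(x)) * (b - a) / n

    print(inner(np.sin, np.cos, 0, 2 * np.pi))   # ~0: sin and cos are orthogonal
    print(inner(np.sin, np.sin, 0, 2 * np.pi))   # ~3.14159: <sin|sin> = pi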

Key Points
⋄ For complex vectors and functions, the normal dot product becomes
the inner product.
⋄ To take an inner product of vectors, (1) take complex conjugates of the
components of the first vector; (2) multiply corresponding components
of the two vectors together; and (3) sum these products.
⋄ To take an inner product of functions, (1) take the complex conjugate
of the first function; (2) multiply the two functions; and (3) integrate
the product function. The real difference from vectors is integration
instead of summation.
⋄ To find the length of a vector, take the inner product of the vector with
itself, and then a square root.
⋄ To find the norm of a function, take the inner product of the function
with itself, and then a square root.
⋄ A pair of functions, or a pair of vectors, are orthogonal if their inner
product is zero.
⋄ A set of functions, or a set of vectors, form an orthonormal set if every
one is orthogonal to all the rest, and every one is of unit norm or length.

1.3 Review Questions


1 Find the following inner product of the two vectors: h(1 + i, 2 − i)|(2i, 3)i.

2 Find the length of the vector (1 + i, 3).

3 Find the inner product of the functions sin(x) and cos(x) on the
interval 0 ≤ x ≤ 1.
4 Show that the functions sin(x) and cos(x) are orthogonal on the in-
terval 0 ≤ x ≤ 2π.
5 Verify that sin(x) is not a normalized function on the interval 0 ≤
x ≤ 2π, and normalize it by dividing by its norm.
6 Verify that the most general multiple of sin(x) that is normalized on the interval 0 ≤ x ≤ 2π is e^{iα} sin(x)/√π where α is any arbitrary real number. So, using the Euler formula, the following multiples of sin(x) are all normalized: sin(x)/√π, (for α = 0), − sin(x)/√π, (for α = π), and i sin(x)/√π, (for α = π/2).
7 Show that the functions e^{4iπx} and e^{6iπx} are an orthonormal set on the interval 0 ≤ x ≤ 1.

1.4 Operators
This section defines operators, which are a generalization of matrices. Operators
are the principal components of quantum mechanics.
In a finite number of dimensions, a matrix A can transform any arbitrary vector ~v into a different vector A~v :

~v −→ (matrix A) −→ w~ = A~v

Similarly, an operator transforms a function into another function:

f (x) −→ (operator A) −→ g(x) = Af (x)

Some simple examples of operators:

f (x) −→ x̂ −→ g(x) = x f (x)

f (x) −→ d/dx −→ g(x) = f ′(x)

Note that a hat is often used to indicate operators; for example, x̂ is the symbol for the operator that corresponds to multiplying by x. If it is clear that something is an operator, such as d/dx, no hat will be used.

It should really be noted that the operators of interest in quantum mechanics are “linear” operators: if you multiply f by a number, Af gets multiplied by that same number; also, if you sum f and g, A(f + g) will be Af plus Ag.
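In a program, an operator is naturally something that takes a function and returns a new function. The sketch below (Python; the finite-difference step h is an arbitrary small number, so the derivative is only approximate) builds the two example operators above that way and spot-checks the linearity property at one point:

    import math

    def X(f):                      # the operator x^: turns f(x) into x f(x)
        return lambda x: x * f(x)

    def D(f, h=1e-6):              # the operator d/dx, as a central difference
        return lambda x: (f(x + h) - f(x - h)) / (2 * h)

    f, g, c, x0 = math.sin, math.cos, 5.0, 0.3
    lhs = D(lambda x: c * f(x) + g(x))(x0)   # D applied to c f + g
    rhs = c * D(f)(x0) + D(g)(x0)            # c (D f) plus D g
    print(abs(lhs - rhs))                    # ~0 up to round-off: D is linear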

Key Points
⋄ Matrices turn vectors into other vectors.
⋄ Operators turn functions into other functions.

1.4 Review Questions


1 So what is the result if the operator d/dx is applied to the function
sin(x)?
2 If, say, x̂² sin(x) is simply the function x² sin(x), then what is the difference between x̂² and x²?

3 A less self-evident operator than the above examples is a shift operator like ∆π/2 that shifts the graph of a function towards the left by an amount π/2: ∆π/2 f (x) = f (x + ½π). (Curiously enough, shift operators turn out to be responsible for the law of conservation of momentum.) Show that ∆π/2 turns sin(x) into cos(x).
4 The inversion operator Inv turns f (x) into f (−x). It plays a part in
the question to what extent physics looks the same when seen in the
mirror. Show that Inv leaves cos(x) unchanged, but turns sin(x) into
− sin(x).

1.5 Eigenvalue Problems


To analyze quantum mechanical systems, it is normally necessary to find so-
called eigenvalues and eigenvectors or eigenfunctions. This section defines what
they are.
A nonzero vector ~v is called an eigenvector of a matrix A if A~v is a multiple
of the same vector:
A~v = a~v iff ~v is an eigenvector of A (1.13)
The multiple a is called the eigenvalue. It is just a number.
A nonzero function f is called an eigenfunction of an operator A if Af is a
multiple of the same function:
Af = af iff f is an eigenfunction of A. (1.14)

For example, e^x is an eigenfunction of the operator d/dx with eigenvalue 1, since de^x/dx = 1 e^x.
However, eigenfunctions like e^x are not very common in quantum mechanics since they become very large at large x, and that typically does not describe physical situations. The eigenfunctions of d/dx that do appear a lot are of the form e^{ikx}, where i = √−1 and k is an arbitrary real number. The eigenvalue is ik:

d e^{ikx}/dx = ik e^{ikx}

Function e^{ikx} does not blow up at large x; in particular, the Euler formula (1.5) says:

e^{ikx} = cos(kx) + i sin(kx)

The constant k is called the “wave number.”
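A symbolic spot-check of this eigenvalue relation takes only a few lines (a sketch assuming the sympy library is available):

    import sympy as sp

    x, k = sp.symbols('x k', real=True)
    f = sp.exp(sp.I * k * x)                           # candidate eigenfunction e^{ikx}
    print(sp.simplify(sp.diff(f, x) - sp.I * k * f))   # 0: so d/dx e^{ikx} = ik e^{ikx}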

Key Points
⋄ If a matrix turns a nonzero vector into a multiple of that vector, that
vector is an eigenvector of the matrix, and the multiple is the eigen-
value.
⋄ If an operator turns a nonzero function into a multiple of that function,
that function is an eigenfunction of the operator, and the multiple is
the eigenvalue.

1.5 Review Questions


1 Show that e^{ikx}, above, is also an eigenfunction of d²/dx², but with eigenvalue −k². In fact, it is easy to see that the square of any operator has the same eigenfunctions, but with the squared eigenvalues.
2 Show that any function of the form sin(kx) and any function of the form cos(kx), where k is a constant called the wave number, is an eigenfunction of the operator d²/dx², though they are not eigenfunctions of d/dx.
3 Show that sin(kx) and cos(kx), with k a constant, are eigenfunctions
of the inversion operator Inv, which turns any function f (x) into
f (−x), and find the eigenvalues.

1.6 Hermitian Operators


Most operators in quantum mechanics are of a special kind called “Hermitian”.
This section lists their most important properties.

An operator is called Hermitian when it can always be flipped over to the other side if it appears in an inner product:

hf |Agi = hAf |gi always iff A is Hermitian. (1.15)

That is the definition, but Hermitian operators have the following additional special properties:
• They always have real eigenvalues, not involving i = √−1. (But the eigenfunctions, or eigenvectors if the operator is a matrix, might be complex.) Physical values such as position, momentum, and energy are ordinary real numbers since they are eigenvalues of Hermitian operators {A.7}.
• Their eigenfunctions can always be chosen so that they are normalized
and mutually orthogonal, in other words, an orthonormal set. This tends
to simplify the various mathematics a lot.
• Their eigenfunctions form a “complete” set. This means that any function
can be written as some linear combination of the eigenfunctions. (There is
a proof in note {A.5} for an important example. But see also {A.8}.) In
practical terms, it means that you only need to look at the eigenfunctions
to completely understand what the operator does.
In the linear algebra of real matrices, Hermitian operators are simply sym-
metric matrices. A basic example is the inertia matrix of a solid body in New-
tonian dynamics. The orthonormal eigenvectors of the inertia matrix give the
directions of the principal axes of inertia of the body.
An orthonormal complete set of eigenvectors or eigenfunctions is an example
of a so-called “basis.” In general, a basis is a minimal set of vectors or functions
that you can write all other vectors or functions in terms of. For example, the
unit vectors ı̂, ̂, and k̂ are a basis for normal three-dimensional space. Every
three-dimensional vector can be written as a linear combination of the three.
The following properties of inner products involving Hermitian operators are
often needed, so they are listed here:

If A is Hermitian: hg|Af i = hf |Agi∗ , hf |Af i is real. (1.16)

The first says that you can swap f and g if you take complex conjugate. (It is
simply a reflection of the fact that if you change the sides in an inner product,
you turn it into its complex conjugate. Normally, that puts the operator at the
other side, but for a Hermitian operator, it does not make a difference.) The
second is important because ordinary real numbers typically occupy a special
place in the grand scheme of things. (The fact that the inner product is real
merely reflects the fact that if a number is equal to its complex conjugate, it
must be real; if there was an i in it, the number would change when taking the complex conjugate.)
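Both properties are easy to check numerically for a concrete case. The sketch below (Python with numpy; the operator is i d/dx acting on a periodic function, which review question 4 below shows to be Hermitian, and the function f is an arbitrary example) confirms that the inner product of f with Af comes out real:

    import numpy as np

    n = 4096
    x = np.linspace(0, 2 * np.pi, n, endpoint=False)
    dx = x[1] - x[0]
    f = np.exp(1j * x) + 2 * np.exp(2j * x)   # some periodic function
    Af = 1j * np.gradient(f, dx)              # A = i d/dx, by finite differences
    print(np.sum(np.conj(f) * Af) * dx)       # imaginary part ~0: <f|Af> is real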

Key Points
⋄ Hermitian operators can be flipped over to the other side in inner
products.
⋄ Hermitian operators have only real eigenvalues.
⋄ Hermitian operators have a complete set of orthonormal eigenfunctions
(or eigenvectors).

1.6 Review Questions


1 Show that the operator 2̂ is a Hermitian operator, but î is not.
2 Generalize the previous question, by showing that any complex con-
stant c comes out of the right hand side of an inner product un-
changed, but out of the left hand side as its complex conjugate;
hf |cgi = c hf |gi, hcf |gi = c∗ hf |gi.
As a result, a number c is only a Hermitian operator if it is real: if c
is complex, the two expressions above are not the same.
3 Show that an operator such as x̂², corresponding to multiplying by a real function, is a Hermitian operator.
4 Show that the operator d/dx is not a Hermitian operator, but id/dx
is, assuming that the functions on which they act vanish at the ends of
the interval a ≤ x ≤ b on which they are defined. (Less restrictively,
it is only required that the functions are “periodic”; they must return
to the same value at x = b that they had at x = a.)
5 Show that if A is a Hermitian operator, then so is A². As a result, under the conditions of the previous question, −d²/dx² is a Hermitian operator too. (And so is just d²/dx², of course, but −d²/dx² is the one with the positive eigenvalues, the squares of the eigenvalues of i d/dx.)
6 A complete set of orthonormal eigenfunctions of −d²/dx² on the interval 0 ≤ x ≤ π that are zero at the end points are the infinite set of functions

sin(x)/√(π/2), sin(2x)/√(π/2), sin(3x)/√(π/2), sin(4x)/√(π/2), ...

Check that these functions are indeed zero at x = 0 and x = π, that they are indeed orthonormal, and that they are eigenfunctions of −d²/dx² with the positive real eigenvalues
1, 4, 9, 16, . . .
Completeness is a much more difficult thing to prove, but they are.
The completeness proof in the notes covers this case.

7 A complete set of orthonormal eigenfunctions of the operator i d/dx that are periodic on the interval 0 ≤ x ≤ 2π are the infinite set of functions

..., e^{−3ix}/√(2π), e^{−2ix}/√(2π), e^{−ix}/√(2π), 1/√(2π), e^{ix}/√(2π), e^{2ix}/√(2π), e^{3ix}/√(2π), ...

Check that these functions are indeed periodic, orthonormal, and that they are eigenfunctions of i d/dx with the real eigenvalues

. . . , 3, 2, 1, 0, −1, −2, −3, . . .

Completeness is a much more difficult thing to prove, but they are.


The completeness proof in the notes covers this case.
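The orthonormality claimed in questions 6 and 7 can also be spot-checked numerically. A sketch for the sine set of question 6 (Python, assuming numpy; the grid size is arbitrary):

    import numpy as np

    n = 100000
    x = (np.arange(n) + 0.5) * np.pi / n   # midpoint grid on the interval [0, pi]
    dx = np.pi / n
    f = lambda m: np.sin(m * x) / np.sqrt(np.pi / 2)

    for m in (1, 2, 3):
        print([round(float(np.sum(f(m) * f(l)) * dx), 6) for l in (1, 2, 3)])
    # prints ~[1,0,0], [0,1,0], [0,0,1]: the set is orthonormal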

1.7 Additional Points


This subsection describes a few further issues of importance for this document.

1.7.1 Dirac notation


Physicists like to write inner products such as hf |Agi in “Dirac notation”:

hf |A|gi

since this conforms more closely to how you would think of it in linear algebra: hf~| is the bra, A the operator, and |~g i the ket.

The various advanced ideas of linear algebra can be extended to operators in this way, but they will not be needed in this book.
In any case, hf |Agi and hf |A|gi mean the same thing:

∫_{all x} f∗(x) (Ag(x)) dx

If A is a Hermitian operator, this book will on occasion use the additional bar
to indicate that the operator has been brought to the other side to act on f
instead of g.

1.7.2 Additional independent variables


In many cases, the functions involved in an inner product may depend on more
than a single variable x. For example, they might depend on the position (x, y, z)
in three dimensional space.
The rule to deal with that is to ensure that the inner product integrations
are over all independent variables. For example, in three spatial dimensions:

hf |gi = ∫_{all x} ∫_{all y} ∫_{all z} f∗(x, y, z) g(x, y, z) dx dy dz

Note that the time t is a somewhat different variable from the rest, and time
is not included in the inner product integrations.
Chapter 2

Basic Ideas of Quantum Mechanics

In this chapter the basic ideas of quantum mechanics are described and then
two examples are worked out.
Before embarking on this chapter, do take note of the very sage advice
given by Richard Feynman, Nobel-prize winning pioneer of relativistic quantum
mechanics:

Do not keep saying to yourself, if you can possibly avoid it, “But
how can it be like that?” because you will get “down the drain,” into
a blind alley from which nobody has yet escaped. Nobody knows how
it can be like that. [Richard P. Feynman (1965) The Character of
Physical Law]

And it may be uncertain whether Niels Bohr, Nobel-prize winning pioneer of early quantum mechanics, actually said it to Albert Einstein, and if so, exactly
what he said, but it may be the best statement about quantum mechanics of
all:

Stop telling God what to do.

2.1 The Revised Picture of Nature


This section describes the view quantum mechanics has of nature, which is in
terms of a mysterious function called the “wave function”.
According to quantum mechanics, the way that the old Newtonian physics
describes nature is wrong if examined closely enough. Not just a bit wrong.
Totally wrong. For example, the Newtonian picture for a particle of mass m


looks like:

velocity: u = dx/dt, etcetera
linear momentum: px = m u, etcetera
Newton’s second law: m du/dt = dpx/dt = Fx = −∂V/∂x, etcetera

The problems? A numerical position for the particle simply does not exist. A
numerical velocity or linear momentum for the particle does not exist.
What does exist according to quantum mechanics is the so-called wave func-
tion Ψ(x, y, z; t). Its square magnitude, |Ψ|2 , can be shown as grey tones (darker
where the magnitude is larger):

Figure 2.1: A visualization of an arbitrary wave function.

The physical meaning of the wave function is known as “Born’s statistical interpretation”: darker regions are regions where the particle is more likely to
be found if the location is narrowed down. More precisely, if ~r = (x, y, z) is a
given location, then
|Ψ(~r; t)|2 d3~r (2.1)
is the probability of finding the particle within a small volume, of size d3~r =
dx dy dz, around that given location, if such a measurement is attempted.
(And if such a position measurement is actually done, it affects the wave
function: after the measurement, the new wave function will be restricted to
the volume to which the position was narrowed down. But it will spread out
again in time if allowed to do so afterwards.)
The particle must be found somewhere if you look everywhere. In quantum
mechanics, that is expressed by the fact that the total probability to find the
particle, integrated over all possible locations, must be 100% (certainty):

∫_{all ~r} |Ψ(~r; t)|² d³~r = 1 (2.2)

In other words, proper wave functions are normalized, hΨ|Ψi = 1.
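Numerically, normalizing a wave function simply means dividing it by the square root of its total probability integral. A one-dimensional sketch (Python with numpy; the Gaussian blob and the grid are arbitrary example choices):

    import numpy as np

    x = np.linspace(-10, 10, 100001)
    dx = x[1] - x[0]
    Psi = np.exp(-x**2)                   # an arbitrary, unnormalized blob
    total = np.sum(np.abs(Psi)**2) * dx   # total probability before normalizing
    Psi = Psi / np.sqrt(total)
    print(np.sum(np.abs(Psi)**2) * dx)    # 1.0: the wave function is now normalized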


The position of macroscopic particles is typically very much narrowed down
by incident light, surrounding objects, earlier history, etcetera. For such par-
ticles, the “blob size” of the wave function is extremely small. As a result,
claiming that a macroscopic particle, is, say, at the center point of the wave
function blob may be just fine in practical applications. But when you are in-
terested in what happens on very small scales, the nonzero blob size can make
a big difference.
In addition, even on macroscopic scales, position can be ill defined. Consider
what happens if you take the wave function blob apart and send half to Mars
and half to Venus. Quantum mechanics allows it; this is what happens in a
“scattering” experiment. You would presumably need to be extremely careful
to do it on such a large scale, but there is no fundamental theoretical objection
in quantum mechanics. So, where is the particle now? Hiding on Mars? Hiding
on Venus?
Orthodox quantum mechanics says: neither. It will pop up on one of the
two if measurements force it to reveal its presence. But until that moment, it
is just as ready to pop up on Mars as on Venus, at an instant’s notice. If it was
hiding on Mars, it could not possibly pop up on Venus on an instant’s notice;
the fastest it would be allowed to move is at the speed of light. Worse, when the
electron does pop up on Mars, it must communicate that fact instantaneously to
Venus to prevent itself from also popping up there. That requires that quantum
mechanics internally communicates at speeds faster than the speed of light, the
so-called Einstein-Podolski-Rosen paradox. A famous theorem by John Bell in
1964 implies that nature really does communicate instantaneously; it is not just
some unknown deficiency in the theory of quantum mechanics, chapter 11.2.
Of course, quantum mechanics is largely a matter of inference. The wave
function cannot be directly observed. But that is not as strong an argument
against quantum mechanics as it may seem. After almost a century, quantum
mechanics is still standing, with no real “more reasonable” competitors, ones
that stay closer to the Newtonian picture. And the best minds in physics have
tried.

Key Points
⋄ According to quantum mechanics, particles do not have definite values
of position or velocity when examined closely enough.
⋄ What they do have is a “wave function” that depends on position.
⋄ Larger values of the magnitude of the wave function, (indicated in this
book by darker regions,) correspond to regions where the particle is
more likely to be found if a location measurement is done.

⋄ Such a measurement changes the wave function; the measurement itself


creates the reduced uncertainty in position that exists immediately
after the measurement.

⋄ In other words, the wave function is all there is; you cannot identify
a hidden position in a given wave function, just create a new wave
function that more precisely locates the particle.

⋄ The creation of such a more localized wave function during a position


measurement is governed by laws of chance: the more localized wave
function is more likely to end up in regions where the initial wave
function had a larger magnitude.

⋄ Proper wave functions are normalized.

2.2 The Heisenberg Uncertainty Principle

The Heisenberg uncertainty principle is a way of expressing the qualitative properties of quantum mechanics in an easy-to-visualize way.
Figure 2.2 is a combination plot of the position x of a particle and the
corresponding linear momentum px = mu, (with m the mass and u the velocity
in the x-direction):

Figure 2.2: Combined plot of position and momentum components.

Figure 2.3 shows what happens if you squeeze down on the particle to try
to restrict it to one position x: it stretches out in the momentum direction:

Figure 2.3: The uncertainty principle illustrated.

Heisenberg showed that according to quantum mechanics, the area of the blue “blob” cannot be contracted to a point. When you try to narrow down the
position of a particle, you get into trouble with momentum. Conversely, if you
try to pin down a precise momentum, you lose all hold on the position.

Key Points
⋄ The Heisenberg uncertainty principle says that there is always a mini-
mum combined uncertainty in position and linear momentum.
⋄ It implies that a particle cannot have a mathematically precise position,
because that would require an infinite uncertainty in linear momentum.
⋄ It also implies that a particle cannot have a mathematically precise
linear momentum (velocity), since that would imply an infinite uncer-
tainty in position.

2.3 The Operators of Quantum Mechanics


The numerical quantities that the old Newtonian physics uses, (position, mo-
mentum, energy, ...), are just “shadows” of what really describes nature: opera-
tors. The operators described in this section are the key to quantum mechanics.
As the first example, while a mathematically precise value of the position x of a particle never exists, there is instead an x-position operator x̂. It turns the wave function Ψ into xΨ:

Ψ(x, y, z, t) −→ x̂ −→ xΨ(x, y, z, t) (2.3)

The operators ŷ and ẑ are defined similarly to x̂.


Instead of a linear momentum px = mu, there is an x-momentum operator

p̂x = (h̄/i) ∂/∂x (2.4)

that turns Ψ into its x-derivative:

Ψ(x, y, z, t) −→ p̂x = (h̄/i) ∂/∂x −→ (h̄/i) Ψx(x, y, z, t) (2.5)

The constant h̄ is called Planck’s constant. (Or rather, it is Planck’s original constant h divided by 2π.) If it had been zero, all these troubles with quantum mechanics would not occur. The blobs would become points. Unfortunately, h̄ is very small, but nonzero. It is about 10⁻³⁴ kg m²/s.
The factor i in p̂x makes it a Hermitian operator (a proof of that is in note {A.9}). All operators reflecting macroscopic physical quantities are Hermitian.
The operators p̂y and p̂z are defined similarly to p̂x.
The kinetic energy operator T̂ is:

T̂ = (p̂x² + p̂y² + p̂z²)/(2m) (2.6)

Its shadow is the Newtonian notion that the kinetic energy equals:

T = ½ m (u² + v² + w²) = [(mu)² + (mv)² + (mw)²]/(2m)

This is an example of the “Newtonian analogy”: the relationships between the different operators in quantum mechanics are in general the same as those between the corresponding numerical values in Newtonian physics. But since the momentum operators are gradients, the actual kinetic energy operator is:

T̂ = −(h̄²/2m) (∂²/∂x² + ∂²/∂y² + ∂²/∂z²). (2.7)
Mathematicians call the set of second order derivative operators in the kinetic energy operator the “Laplacian”, and indicate it by ∇²:

∇² ≡ ∂²/∂x² + ∂²/∂y² + ∂²/∂z² (2.8)

In those terms, the kinetic energy operator can be written more concisely as:

T̂ = −(h̄²/2m) ∇² (2.9)
Following the Newtonian analogy once more, the total energy operator, indicated by H, is the sum of the kinetic energy operator above and the potential energy operator V(x, y, z, t):

H = −(h̄²/2m) ∇² + V (2.10)

This total energy operator H is called the Hamiltonian and it is very impor-
tant. Its eigenvalues are indicated by E (for energy), for example E1 , E2 , E3 , . . .
with:
Hψn = En ψn for n = 1, 2, 3, ... (2.11)
where ψn is eigenfunction number n of the Hamiltonian.
It is seen later that in many cases a more elaborate numbering of the eigen-
values and eigenvectors of the Hamiltonian is desirable instead of using a single
counter n. For example, for the electron of the hydrogen atom, there is more
than one eigenfunction for each different eigenvalue En , and additional counters
l and m are used to distinguish them. It is usually best to solve the eigenvalue
problem first and decide on how to number the solutions afterwards.
(It is also important to remember that in the literature, the Hamiltonian
eigenvalue problem is commonly referred to as the “time-independent Schrö-
dinger equation.” However, this book prefers to reserve the term Schrödinger
equation for the unsteady evolution of the wave function.)
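As a concrete illustration of these operators (a sketch assuming the sympy library), applying p̂x to the plane wave e^{ikx} multiplies it by h̄k, and applying the kinetic energy operator multiplies it by the Newtonian-looking (h̄k)²/2m:

    import sympy as sp

    x, k, hbar, m = sp.symbols('x k hbar m', real=True, positive=True)
    Psi = sp.exp(sp.I * k * x)                          # a plane wave
    px_Psi = (hbar / sp.I) * sp.diff(Psi, x)            # p^_x applied to Psi
    T_Psi = -(hbar**2 / (2 * m)) * sp.diff(Psi, x, 2)   # T^ applied to Psi
    print(sp.simplify(px_Psi / Psi))   # hbar*k: the momentum eigenvalue
    print(sp.simplify(T_Psi / Psi))    # hbar**2*k**2/(2*m): the Newtonian analogy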

Key Points
⋄ Physical quantities correspond to operators in quantum mechanics.
⋄ Expressions for various important operators were given.
⋄ Kinetic energy is in terms of the so-called Laplacian operator.
⋄ The important total energy operator, (kinetic plus potential energy,)
is called the Hamiltonian.

2.4 The Orthodox Statistical Interpretation


In addition to the operators defined in the previous section, quantum mechanics
requires rules on how to use them. This section gives those rules, along with a critical discussion of what they really mean.

2.4.1 Only eigenvalues


According to quantum mechanics, the only “measurable values” of position,
momentum, energy, etcetera, are the eigenvalues of the corresponding operator.
For example, if the total energy of a particle is “measured”, the only numbers
that can come out are the eigenvalues of the total energy Hamiltonian.
There is really no controversy that only the eigenvalues come out; this has
been verified overwhelmingly in experiments, often to astonishingly many digits of accuracy. It is the reason for the line spectra that allow the elements to be

recognized, either on earth or halfway across the observable universe, for lasers,
for the blackbody radiation spectrum, for the value of the speed of sound, for
the accuracy of atomic clocks, for the properties of chemical bonds, for the fact
that a Stern-Gerlach apparatus does not fan out a beam of atoms but splits it
into discrete rays, and countless other basic properties of nature.
But the question why and how only the eigenvalues come out is much more
tricky. In general the wave function that describes physics is a combination
of eigenfunctions, not a single eigenfunction. (Even if the wave function was
an eigenfunction of one operator, it would not be one of another operator.) If
the wave function is a combination of eigenfunctions, then why is the measured
value not a combination, (maybe some average), of eigenvalues, but a single
eigenvalue? And what happens to the eigenvalues in the combination that do
not come out? It is a question that has plagued quantum mechanics since the
beginning.
The most generally given answer in the physics community is the “orthodox
interpretation.” It is commonly referred to as the “Copenhagen Interpretation”,
though that interpretation, as promoted by Niels Bohr, was actually much more
circumspect than what is usually presented.

According to the orthodox interpretation, “measurement” causes the wave function Ψ to “collapse” into one of the eigenfunctions of the quantity being measured.

Staying with energy measurements as the example, any total energy “mea-
surement” will cause the wave function to collapse into one of the eigenfunctions
ψn of the total energy Hamiltonian. The energy that is measured is the corre-
sponding eigenvalue:
Before the energy measurement: Ψ = c1 ψ1 + c2 ψ2 + . . . , and the energy is uncertain.
After the energy measurement: Ψ = cn ψn for some n, and the energy is En.

This story, of course, is nonsense. It makes a distinction between “nature” (the particle, say) and the “measurement device” supposedly producing an exact
value. But the measurement device is a part of nature too, and therefore also
uncertain. What measures the measurement device?
Worse, there is no definition at all of what “measurement” is or is not,
so anything physicists, and philosophers, want to put there goes. Needless to
say, theories have proliferated, many totally devoid of common sense. The more
reasonable “interpretations of the interpretation” tend to identify measurements
as interactions with macroscopic systems. Still, there is no indication how and
when a system would be sufficiently macroscopic, and how that would produce
a collapse or at least something approximating it.

If that is not bad enough, quantum mechanics already has a law, called the
Schrödinger equation (chapter 5.1), that says how the wave function evolves.
This equation contradicts the collapse, (chapter 11.5.)
The collapse in the orthodox interpretation is what the classical theater
world would have called “Deus ex Machina”. It is a god that appears out of thin
air to make things right. A god that has the power to distort the normal laws
of nature at will. Mere humans may not question the god. In fact, physicists
tend to actually get upset if you do.
However, it is a fact that after a real-life measurement has been made, further
follow-up measurements have statistics that are consistent with a collapsed wave
function, (which can be computed.) The orthodox interpretation does describe
what happens practically in actual laboratory settings well. It just offers no
practical help in circumstances that are not so clear cut, being phrased in terms
that are essentially meaningless.

Key Points
⋄ Even if a system is initially in a combination of the eigenfunctions of a
physical quantity, a measurement of that quantity pushes the measured
system into a single eigenfunction.
⋄ The measured value is the corresponding eigenvalue.

2.4.2 Statistical selection


There is another hot potato besides the collapse itself; it is the selection of the
eigenfunction to collapse to. If the wave function before a “measurement” is a
combination of many different eigenfunctions, then what eigenfunction will the
measurement produce? Will it be ψ1 ? ψ2 ? ψ10 ?
The answer of the orthodox interpretation is that nature contains a mysteri-
ous random number generator. If the wave function Ψ before the “measurement”
equals, in terms of the eigenfunctions,

Ψ = c1 ψ1 + c2 ψ2 + c3 ψ3 + . . .

then this random number generator will, in Einstein’s words, “throw the dice”
and select one of the eigenfunctions based on the result. It will collapse the wave function to eigenfunction ψ1 on average in a fraction |c1|² of the cases, it will collapse the wave function to ψ2 in a fraction |c2|² of the cases, etc.

The orthodox interpretation says that the square magnitudes of the coefficients of the eigenfunctions give the probabilities of the corresponding eigenvalues.
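The statistical rule itself is easy to simulate (a sketch in Python with numpy; the coefficient values are arbitrary examples). Each simulated “measurement” picks eigenfunction ψn with probability |cn|², and the observed frequencies indeed approach those probabilities:

    import numpy as np

    c = np.array([0.6, 0.8j])           # c1 and c2 in Psi = c1 psi1 + c2 psi2
    p = np.abs(c)**2                    # the probabilities |c_n|^2; they sum to 1
    rng = np.random.default_rng(0)
    picks = rng.choice(len(c), size=100000, p=p)   # simulated collapses
    print(p)                                       # [0.36 0.64]
    print(np.bincount(picks) / picks.size)         # observed fractions, close to p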

This too describes very well what happens practically in laboratory exper-
iments, but offers again no insight into why and when. And the notion that
nature would somehow come with, maybe not a physical random number gen-
erator, but certainly an endless sequence of truly random numbers seemed very
hard to believe even for an early pioneer of quantum mechanics like Einstein.
Many have proposed that the eigenfunction selections are not truly random, but
reflect unobserved “hidden variables” that merely seem random to us humans.
Yet, after almost a century, none of these theories have found convincing evi-
dence or general acceptance. Physicists still tend to insist quite forcefully on
a literal random number generator. Somehow, when belief is based on faith,
rather than solid facts, tolerance of alternative views is much less, even among
scientists.
While the usual philosophy about the orthodox interpretation can be taken
with a big grain of salt, the bottom line to remember is:

Random collapse of the wave function, with chances governed by the square magnitudes of the coefficients, is indeed the correct way for us humans to describe what happens in our observations.

As explained in chapter 11.6, this is despite the fact that the wave function does
not collapse: the collapse is an artifact produced by limitations in our capability
to see the entire picture. We humans have no choice but to work within our
limitations, and within these, the rules of the orthodox interpretation do apply.
Schrödinger gave a famous, rather cruel, example of a cat in a box to show
how weird the predictions of quantum mechanics really are. It is discussed in
chapter 11.1.

Key Points
⋄ If a system is initially in a combination of the eigenfunctions of a
physical quantity, a measurement of that quantity picks one of the
eigenvalues at random.
⋄ The chances of a given eigenvalue being picked are proportional to the
square magnitude of the coefficient of the corresponding eigenfunction
in the combination.

2.5 A Particle Confined Inside a Pipe


This section demonstrates the general procedure for analyzing quantum systems
using a very elementary example. The system to be studied is that of a particle, say an electron, confined to the inside of a narrow pipe with sealed ends. This example will be studied in some detail, since if you understand it thoroughly, it becomes much easier not to get lost in the more advanced examples of quantum mechanics discussed later. And as the final subsection 2.5.9 shows, the particle in a pipe is really quite interesting despite its simplicity.

2.5.1 The physical system


The system to be analyzed is shown in figure 2.4 as it would appear in classical
non-quantum physics. A particle is bouncing around between the two ends of a

Figure 2.4: Classical picture of a particle in a closed pipe.

pipe. It is assumed that there is no friction, so the particle will keep bouncing
back and forward forever. (Friction is a macroscopic effect that has no place in
the sort of quantum-scale systems analyzed here.) Typically, classical physics
draws the particles that it describes as little spheres, so that is what figure 2.4
shows.
The actual quantum system to be analyzed is shown in figure 2.5. A particle

Figure 2.5: Quantum mechanics picture of a particle in a closed pipe.

like an electron has no (known) specific shape or size, but it does have a wave
function “blob.” So in quantum mechanics the equivalent of a particle bouncing
around is a wave function blob bouncing around between the ends of the pipe.
Please do not ask what this impenetrable pipe is made of. It is obviously a
crude idealization. You could imagine that the electron is a valence electron in
a very tiny bar of copper. In that case the pipe walls would correspond to the
surface of the copper bar, and it is assumed that the electron cannot get off the
bar.
But of course, a copper bar would have nuclei, and other electrons, and the
analysis here does not consider those. So maybe it is better to think of the
particle as being a lone helium atom stuck inside a carbon nanotube.

Key Points

⋄ An idealized problem of a particle bouncing about in a pipe will be


considered.

2.5.2 Mathematical notations


The first step in the solution process is to describe the problem mathematically.
To do so, an x-coordinate that measures longitudinal position inside the pipe
will be used, as shown in figure 2.6. Also, the length of the pipe will be called
ℓx .

[Sketch: the x-axis runs along the pipe, from x = 0 at one end to x = ℓx at the other; ℓx is the pipe length.]

Figure 2.6: Definitions.

To make the problem as easy to solve as possible, it will be assumed that the
only position coordinate that exists is the longitudinal position x along the pipe.
For now, the existence of any coordinates y and z that measure the location in
cross section will be completely ignored.

Key Points
⋄ The only position coordinate to be considered for now is x.
⋄ The notations have been defined.

2.5.3 The Hamiltonian


To analyze a quantum system you must find the Hamiltonian. The Hamiltonian
is the total energy operator, equal to the sum of kinetic plus potential energy.
The potential energy V is the easiest to find: since it is assumed that the
particle does not experience forces inside the pipe, (until it hits the ends of the
pipe, that is), the potential energy must be constant inside the pipe:

V = constant

(The force is minus the derivative of the potential energy, so a constant potential energy produces zero force.) Further, since the value of the constant does not

make any difference physically, you may as well assume that it is zero and save
some writing:
V =0
Next, the kinetic energy operator T̂ is needed. You can just look up its precise form in section 2.3 and find it is:

T̂ = −(h̄²/2m) ∂²/∂x²
Note that only the x-term is used here; the existence of the other two coordinates
y and z is completely ignored. The constant m is the mass of the particle, and
h̄ is Planck’s constant.
Since the potential energy is zero, the Hamiltonian H is just this kinetic energy:

H = −(h̄²/2m) ∂²/∂x² (2.12)

Key Points
⋄ The one-dimensional Hamiltonian (2.12) has been written down.

2.5.4 The Hamiltonian eigenvalue problem


With the Hamiltonian H found, the next step is to formulate the Hamilto-
nian eigenvalue problem, (or “time-independent Schrödinger equation.”). This
problem is always of the form
Hψ = Eψ
Any nonzero solution ψ of this equation is called an energy eigenfunction and
the corresponding constant E is called the energy eigenvalue.
Substituting the Hamiltonian for the pipe as found in the previous subsec-
tion, the eigenvalue problem is:

−(h̄²/2m) ∂²ψ/∂x² = Eψ (2.13)
The problem is not complete yet. These problems also need so-called “boundary conditions”, conditions that say what happens at the ends of the x range.
In this case, the ends of the x range are the ends of the pipe. Now recall that
the square magnitude of the wave function gives the probability of finding the
particle. So the wave function must be zero wherever there is no possibility of
finding the particle. That is outside the pipe: it is assumed that the particle is
confined to the pipe. So the wave function is zero outside the pipe. And since

the outside of the pipe starts at the ends of the pipe, that means that the wave
function must be zero at the ends {A.10}:

ψ = 0 at x = 0 and ψ = 0 at x = ℓx (2.14)

Key Points
⋄ The Hamiltonian eigenvalue problem (2.13) has been found.
⋄ It also includes the boundary conditions (2.14).

2.5.5 All solutions of the eigenvalue problem


The previous section found the Hamiltonian eigenvalue problem to be:

−(h̄²/2m) ∂²ψ/∂x² = Eψ
Now you need to solve this equation. Mathematicians call an equation of this
type an ordinary differential equation; “differential” because it has a derivative
in it, and “ordinary” since there are no derivatives with respect to variables
other than x.
If you do not know how to solve ordinary differential equations, it is no big
deal. The best way is usually to look them up anyway. The equation above can
be found in most mathematical table books, e.g. [15, item 19.7]. According to
what it says there, (with changes in notation), if you assume that the energy E
is negative, the solution is

ψ = C1 e^{κx} + C2 e^{−κx}   with κ = √(−2mE)/h̄

This solution may easily be checked by simply substituting it into the ordinary differential equation.
As far as the ordinary differential equation is concerned, the constants C1
and C2 could be any two numbers. But you also need to satisfy the two boundary
conditions given in the previous subsection. The boundary condition that ψ = 0
when x = 0 produces, if ψ is as above,

C1 e⁰ + C2 e⁰ = 0

and since e⁰ = 1, this can be used to find an expression for C2 :

C2 = −C1

The second boundary condition, that ψ = 0 at x = ℓx , produces

C1 e^{κℓx} + C2 e^{−κℓx} = 0

or, since you just found out that C2 = −C1,

C1 (e^{κℓx} − e^{−κℓx}) = 0

This equation spells trouble because the term between parentheses cannot be
zero; the exponentials are not equal. Instead C1 will have to be zero; that is
bad news since it implies that C2 = −C1 is zero too, and then so is the wave
function ψ:
ψ = C1 e^{κx} + C2 e^{−κx} = 0
A zero wave function is not acceptable, since there would be no possibility to
find the particle anywhere!
Everything was done right. So the problem must be the initial assumption
that the energy is negative. Apparently, the energy cannot be negative. This
can be understood from the fact that for this particle, the energy is all kinetic
energy. Classical physics would say that the kinetic energy cannot be negative
because it is proportional to the square of the velocity. You now see that
quantum mechanics agrees that the kinetic energy cannot be negative, but says
it is because of the boundary conditions on the wave function.
Try again, but now assume that the energy E is zero instead of negative. In
that case the solution of the ordinary differential equation is according to [15,
item 19.7]
ψ = C1 + C2 x
The boundary condition that ψ = 0 at x = 0 now produces:

C1 + C2 · 0 = C1 = 0

so C1 must be zero. The boundary condition that ψ = 0 at x = ℓx gives:

0 + C2 ℓx = 0

so C2 must be zero too. Once again there is no nonzero solution, so the assump-
tion that the energy E can be zero must be wrong too.
Note that classically, it is perfectly OK for the energy to be zero: it would
simply mean that the particle is sitting in the pipe at rest. But in quantum
mechanics, zero kinetic energy is not acceptable, and it all has to do with
Heisenberg’s uncertainty principle. Since the particle is restricted to the inside
of the pipe, its position is constrained, and so the uncertainty principle requires
that the linear momentum must be uncertain. Uncertain momentum cannot be

zero momentum; measurements will show a range of values for the momentum
of the particle, implying that it is in motion and therefore has kinetic energy.
Try, try again. The only possibility left is that the energy E is positive. In that case, the solution of the ordinary differential equation is according to [15, item 19.7]:

ψ = C1 cos(kx) + C2 sin(kx)   with k = √(2mE)/h̄
The boundary condition that ψ = 0 at x = 0 is:

C1 · 1 + C2 · 0 = C1 = 0

so C1 must be zero. The boundary condition ψ = 0 at x = ℓx is then:

0 + C2 sin(kℓx ) = 0

There finally is a possibility to get a nonzero coefficient C2: this equation can be satisfied if sin(kℓx) = 0 instead of C2. In fact, there is not just one possibility for
this to happen: a sine is zero when its argument equals π, 2π, 3π, . . .. So there
is a nonzero solution for each of the following values of the positive constant k:
k = π/ℓx, k = 2π/ℓx, k = 3π/ℓx, ...
Each of these possibilities gives one solution ψ. Different solutions ψ will be
distinguished by giving them a numeric subscript:
ψ1 = C2 sin(πx/ℓx), ψ2 = C2 sin(2πx/ℓx), ψ3 = C2 sin(3πx/ℓx), ...
The generic solution can be written more concisely using a counter n as:
ψn = C2 sin(nπx/ℓx) for n = 1, 2, 3, . . .
Let’s check the solutions. Clearly each is zero when x = 0 and when x = ℓx .
Also, substitution of each of the solutions into the ordinary differential equation
−(h̄²/2m) ∂²ψ/∂x² = Eψ

shows that they all satisfy it, provided that their energy values are, respectively:

E1 = h̄²π²/(2mℓx²), E2 = 2²h̄²π²/(2mℓx²), E3 = 3²h̄²π²/(2mℓx²), ...
or generically:

En = n²h̄²π²/(2mℓx²) for n = 1, 2, 3, . . .

There is one more condition that must be satisfied: each solution must be
normalized so that the total probability of finding the particle integrated over
all possible positions is 1 (certainty). That requires:
1 = hψn |ψn i = |C2|² ∫_{x=0}^{ℓx} sin²(nπx/ℓx) dx

which after integration fixes C2 (assuming you choose it to be a positive real number):

C2 = √(2/ℓx)
Summarizing the results of this subsection, there is not just one energy
eigenfunction and corresponding eigenvalue, but an infinite set of them:

ψ1 = √(2/ℓx) sin(πx/ℓx),   E1 = h̄²π²/(2mℓx²)
ψ2 = √(2/ℓx) sin(2πx/ℓx),  E2 = 2²h̄²π²/(2mℓx²)
ψ3 = √(2/ℓx) sin(3πx/ℓx),  E3 = 3²h̄²π²/(2mℓx²)    (2.15)
...

or in generic form:

ψn = √(2/ℓx) sin(nπx/ℓx),  En = n²h̄²π²/(2mℓx²)  for n = 1, 2, 3, 4, 5, . . . (2.16)

The next thing will be to take a better look at these results.
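A quick numerical sanity check of (2.16) may be reassuring (a sketch with numpy, in convenience units h̄ = m = ℓx = 1, an arbitrary choice): each ψn should have unit norm, and the kinetic energy ∫ (h̄²/2m)(dψn/dx)² dx should reproduce En:

    import numpy as np

    hbar = m = lx = 1.0                 # convenience units (an arbitrary choice)
    n = 3                               # which eigenfunction to check
    x = np.linspace(0.0, lx, 100001)
    dx = x[1] - x[0]
    psi = np.sqrt(2 / lx) * np.sin(n * np.pi * x / lx)
    E = n**2 * hbar**2 * np.pi**2 / (2 * m * lx**2)

    print(np.sum(psi**2) * dx)          # ~1: the eigenfunction is normalized
    kinetic = (hbar**2 / (2 * m)) * np.sum(np.gradient(psi, dx)**2) * dx
    print(E, kinetic)                   # the two values agree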

Key Points
⋄ After a lot of grinding mathematics armed with table books, the energy eigenfunctions and eigenvalues have finally been found.
⋄ There are infinitely many of them.
⋄ They are as listed in (2.16) above. The first few are also written out
explicitly in (2.15).

2.5.5 Review Questions


1 Write down eigenfunction number 6.
2 Write down eigenvalue number 6.

2.5.6 Discussion of the energy values


This subsection discusses the energy that the particle in the pipe can have. It
was already discovered in the previous subsection that the particle cannot have
negative energy, nor zero energy. In fact, according to the orthodox interpreta-
tion, the only values that the total energy of the particle can take are the energy
eigenvalues
E1 = h̄²π²/(2mℓx²), E2 = 2²h̄²π²/(2mℓx²), E3 = 3²h̄²π²/(2mℓx²), ...
derived in the previous subsection.
Energy values are typically shown graphically in the form of an “energy
spectrum”, as in figure 2.7. Energy is plotted upwards, so the vertical height

[Figure 2.7: One-dimensional energy spectrum for a particle in a pipe. Energy E increases upward; the levels shown are h̄²π²/2mℓx² (n = 1), 4h̄²π²/2mℓx² (n = 2), 9h̄²π²/2mℓx² (n = 3), 16h̄²π²/2mℓx² (n = 4), and 25h̄²π²/2mℓx² (n = 5).]

of each energy level indicates the amount of energy it has. To the right of each
energy level, the solution counter, or “quantum number”, n is listed.
Classically, the total energy of the particle can have any nonnegative value.
But according to quantum mechanics, that is not true: the total energy must
be one of the levels shown in the energy spectrum figure 2.7. It should be noted
that for a macroscopic particle, you would not know the difference; the spacing
between the energy levels is macroscopically very fine, since Planck’s constant h̄
is so small. However, for a quantum-scale system, the discreteness of the energy


values can make a major difference.
Another point: at absolute zero temperature, the particle will be stuck in
the lowest possible energy level, E1 = h̄2 π 2 /2mℓ2x , in the spectrum figure 2.7.
This lowest possible energy level is called the “ground state.” Classically you
would expect that at absolute zero the particle has no kinetic energy, so zero
total energy. But quantum mechanics does not allow it. Heisenberg’s principle
requires some momentum, and hence kinetic energy, to remain for a confined particle
even at zero temperature.

Key Points
⋄ Energy values can be shown as an energy spectrum.
⋄ The possible energy levels are discrete.
⋄ But for a macroscopic particle, they are extremely close together.
⋄ The ground state of lowest energy has nonzero kinetic energy.

2.5.6 Review Questions


1 Plug the mass of an electron, m = 9.10938 10−31 kg, and the rough
size of a hydrogen atom, call it ℓx = 2 10−10 m, into the expression
for the ground state kinetic energy and see how big it is. Note that
h̄ = 1.05457 10−34 J s. Express in units of eV, where one eV equals
1.60218 10−19 J.
2 Just for fun, plug macroscopic values, m = 1 kg and ℓx = 1 m, into
the expression for the ground state energy and see how big it is. Note
that h̄ = 1.05457 10−34 J s.
3 What is the eigenfunction number, or quantum number, n that pro-
duces a macroscopic amount of energy, 1 J, for macroscopic values
m = 1 kg and ℓx = 1 m? With that many energy levels involved,
would you see the difference between successive ones?
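As a supplement to these review questions, here is a minimal Python sketch, not
part of the original text, that evaluates the ground state energy E1 = h̄²π²/(2mℓx²)
for both the atomic-scale and the macroscopic case; the specific numbers are the
ones quoted in the questions, and the printed values in the comments are approximate:

import math

hbar = 1.05457e-34      # J s
eV = 1.60218e-19        # J per eV

def E1(m, lx):
    # ground state energy of a particle of mass m in a pipe of length lx
    return hbar**2 * math.pi**2 / (2 * m * lx**2)

# An electron confined to roughly the size of a hydrogen atom:
print(E1(9.10938e-31, 2e-10) / eV)   # about 9.4 eV: large on atomic scales

# A macroscopic particle:
print(E1(1.0, 1.0))                  # about 5.5e-68 J: utterly negligible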

2.5.7 Discussion of the eigenfunctions


This subsection discusses the one-dimensional energy eigenfunctions of the par-
ticle in the pipe. The solution of subsection 2.5.5 found them to be:
ψ1 = √(2/ℓx) sin(πx/ℓx),   ψ2 = √(2/ℓx) sin(2πx/ℓx),   ψ3 = √(2/ℓx) sin(3πx/ℓx),   ...

The first one to look at is the ground state eigenfunction


ψ1 = √(2/ℓx) sin(πx/ℓx).

It is plotted at the top of figure 2.8. As noted in section 2.1, it is the square
magnitude of a wave function that gives the probability of finding the particle.
So, the second graph in figure 2.8 shows the square of the ground state wave
function, and the higher values of this function then give the locations where the
particle is more likely to be found. This book shows regions where the particle
is more likely to be found as darker regions, and in those terms the probability
of finding the particle is as shown in the bottom graphic of figure 2.8.

Figure 2.8: One-dimensional ground state of a particle in a pipe, plotted as
ψ1, as |ψ1|², and as grey tones: dark in the middle of the pipe and light near
the ends.

It is seen that in the ground state, the particle is much more likely to be found somewhere
in the middle of the pipe than close to the ends.
Figure 2.9 shows the two next lowest energy states
ψ2 = √(2/ℓx) sin(2πx/ℓx)   and   ψ3 = √(2/ℓx) sin(3πx/ℓx)

as grey tones. Regions where the particle is relatively likely to be found alternate
with ones where it is less likely to be found. And the higher the energy, the
more such regions there are. Also note that in sharp contrast to the ground
state, for eigenfunction ψ2 there is almost no likelihood of finding the particle
close to the center.
Needless to say, none of those energy states looks at all like the wave func-
tion blob bouncing around in figure 2.5. Moreover, it turns out that energy
eigenstates are stationary states: the probabilities shown in figures 2.8 and 2.9
do not change with time.
Figure 2.9: Second and third lowest one-dimensional energy states, plotted as
ψn and |ψn|² and as grey tones: two likely (dark) regions for ψ2 and three for
ψ3, separated by light bands where the particle is unlikely to be found.

In order to describe a localized wave function blob bouncing around, states
of different energy must be combined. It will take until chapter 5.5.4 before
the analytical tools to do so have been described. For now, the discussion
must remain restricted to just finding the energy levels. And these are impor-
tant enough by themselves anyway, sufficient for many practical applications of
quantum mechanics.

Key Points
⋄ In the energy eigenfunctions, the particle is not localized to within any
particular small region of the pipe.
⋄ In general there are regions where the particle may be found separated
by regions in which there is little chance to find the particle.
⋄ The higher the energy level, the more such regions there are.

2.5.7 Review Questions


1 So how does, say, the one-dimensional eigenstate ψ6 look?
2 Generalizing the results above, for eigenfunction ψn , any n, how many
distinct regions are there where the particle may be found?

3 If you are up to a trick question, consider the following. There are


no forces inside the pipe, so the particle has to keep moving until it
hits an end of the pipe, then reflect backward until it hits the other
side and so on. So, it has to cross the center of the pipe regularly.
But in the energy eigenstate ψ2 , the particle has zero chance of ever
being found at the center of the pipe. What gives?
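For readers who like to experiment, the following Python sketch, not part of
the original text, evaluates the eigenfunctions (2.16) numerically; it confirms
that ψ2 vanishes at the center and counts the separate likely regions for a few
n (the answer to review question 2). The grid size and the choice ℓx = 1 are
arbitrary:

import numpy as np

lx = 1.0

def psi(n, x):
    # the eigenfunctions of (2.16)
    return np.sqrt(2.0 / lx) * np.sin(n * np.pi * x / lx)

# psi_2 vanishes at the center of the pipe:
print(psi(2, lx / 2))                   # zero to machine precision
# psi_n has n - 1 interior zeros, hence n separate likely regions;
# count the zeros as sign changes on a fine grid of interior points:
x = np.linspace(0.0, lx, 10001)[1:-1]
for n in (1, 2, 3, 6):
    changes = np.count_nonzero(np.diff(np.sign(psi(n, x))))
    print(f"n = {n}: {changes + 1} likely regions")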

2.5.8 Three-dimensional solution

The solution for the particle stuck in a pipe that was obtained in the previous
subsections cheated. It pretended that there was only one spatial coordinate x.
Real life is three-dimensional. And yes, as a result, the solution as obtained is
simply wrong.
Fortunately, it turns out that you can fix up the problem pretty easily if you
assume that the pipe has a square cross section. There is a way of combining
one-dimensional solutions for all three coordinates into full three-dimensional
solutions. This is called the “separation of variables” idea: Solve each of the
three variables x, y, and z separately, then combine the results.
The full coordinate system for the problem is shown in figure 2.10: in addi-
tion to the x coordinate along the length of the pipe, there is also a y-coordinate
giving the vertical position in cross section, and similarly a z-coordinate giving
the position in cross section towards you.

Figure 2.10: Definition of all variables: the x-axis along the length of the pipe
from x = 0 to x = ℓx, the y-axis across the pipe from y = 0 to y = ℓy, and the
z-axis, with length ℓz, towards you.

Now recall the one-dimensional solutions that were obtained assuming there
is just an x-coordinate, but add subscripts “x” to keep them apart from any
solutions for y and z:

ψx1 = √(2/ℓx) sin(πx/ℓx)    Ex1 = h̄²π²/(2mℓx²)

ψx2 = √(2/ℓx) sin(2πx/ℓx)   Ex2 = 2²h̄²π²/(2mℓx²)          (2.17)

ψx3 = √(2/ℓx) sin(3πx/ℓx)   Ex3 = 3²h̄²π²/(2mℓx²)

...

or in generic form:

ψxnx = √(2/ℓx) sin(nx πx/ℓx)   Exnx = nx²h̄²π²/(2mℓx²)   for nx = 1, 2, 3, . . .   (2.18)

Since it is assumed that the cross section of the pipe is square or rectangular
of dimensions ℓy × ℓz , the y and z directions have one-dimensional solutions
completely equivalent to the x direction:
ψyny = √(2/ℓy) sin(ny πy/ℓy)   Eyny = ny²h̄²π²/(2mℓy²)   for ny = 1, 2, 3, . . .   (2.19)

and
ψznz = √(2/ℓz) sin(nz πz/ℓz)   Eznz = nz²h̄²π²/(2mℓz²)   for nz = 1, 2, 3, . . .   (2.20)

After all, there is no fundamental difference between the three coordinate direc-
tions; each is along an edge of a rectangular box.
Now it turns out, {A.11}, that the full three-dimensional problem has eigen-
functions ψnx ny nz that are simply products of the one dimensional ones:
ψnx ny nz = √(8/(ℓx ℓy ℓz)) sin(nx πx/ℓx) sin(ny πy/ℓy) sin(nz πz/ℓz)   (2.21)

There is one such three-dimensional eigenfunction for each set of three numbers
(nx , ny , nz ). These numbers are the three “quantum numbers” of the eigenfunc-
tion.

Further, the energy eigenvalues Enx ny nz of the three-dimensional problem


are the sum of those of the one-dimensional problems:

Enx ny nz = nx²h̄²π²/(2mℓx²) + ny²h̄²π²/(2mℓy²) + nz²h̄²π²/(2mℓz²)   (2.22)
For example, the ground state of lowest energy occurs when all three quan-
tum numbers are lowest, nx = ny = nz = 1. The three-dimensional ground
state wave function is therefore:
ψ111 = √(8/(ℓx ℓy ℓz)) sin(πx/ℓx) sin(πy/ℓy) sin(πz/ℓz)   (2.23)
This ground state is shown in figure 2.11. The y- and z-factors ensure that the
wave function is now zero at all the surfaces of the pipe.

Figure 2.11: True ground state of a particle in a pipe: the x-factor ψx1 and its
square, and the y-factor ψy1 and its square, with the corresponding dark and
light regions.

The ground state energy is:


E111 = h̄²π²/(2mℓx²) + h̄²π²/(2mℓy²) + h̄²π²/(2mℓz²)   (2.24)
Since the cross section dimensions ℓy and ℓz are small compared to the length
of the pipe, the last two terms are large compared to the first one. They make
the true ground state energy much larger than the one-dimensional value, which
was just the first term.
The next two lowest energy levels occur for nx = 2, ny = nz = 1 respectively
nx = 3, ny = nz = 1. (The latter assumes that the cross section dimensions
are small enough that the alternative possibilities ny = 2, nx = nz = 1 and
nz = 2, nx = ny = 1 have more energy.) The energy eigenfunctions
ψ211 = √(8/(ℓx ℓy ℓz)) sin(2πx/ℓx) sin(πy/ℓy) sin(πz/ℓz)   (2.25)
ψ311 = √(8/(ℓx ℓy ℓz)) sin(3πx/ℓx) sin(πy/ℓy) sin(πz/ℓz)   (2.26)

are shown in figure 2.12. They have energy levels:

E211 = 4h̄²π²/(2mℓx²) + h̄²π²/(2mℓy²) + h̄²π²/(2mℓz²)
E311 = 9h̄²π²/(2mℓx²) + h̄²π²/(2mℓy²) + h̄²π²/(2mℓz²)   (2.27)

Figure 2.12: True second and third lowest energy states.
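A small Python sketch, not part of the original text, can make concrete the
statement that three-dimensional energies are sums of one-dimensional ones;
the pipe dimensions below, with ℓy = ℓz = ℓx/10, are arbitrary example values:

import itertools

def E(n, l):
    # one-dimensional energy, in units of hbar^2 pi^2 / 2m
    return n**2 / l**2

lx, ly, lz = 1.0, 0.1, 0.1
levels = sorted((E(nx, lx) + E(ny, ly) + E(nz, lz), (nx, ny, nz))
                for nx, ny, nz in itertools.product(range(1, 6), repeat=3))
for energy, nums in levels[:4]:
    print(nums, energy)
# -> (1,1,1), (2,1,1), (3,1,1), (4,1,1): for a long thin pipe the lowest
#    levels all keep ny = nz = 1, as claimed in the text.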

Key Points
⋄ Three-dimensional energy eigenfunctions can be found as products of
one-dimensional ones.
⋄ Three-dimensional energies can be found as sums of one-dimensional
ones.
⋄ Example three-dimensional eigenstates have been shown.

2.5.8 Review Questions


1 If the cross section dimensions ℓy and ℓz are one tenth the size of the
pipe length, how much bigger are the energies Ey1 and Ez1 compared
to Ex1 ? So, by what percentage is the one-dimensional ground state
energy Ex1 as an approximation to the three-dimensional one, E111 ,
then in error?
2 At what ratio of ℓy /ℓx does the energy E121 become higher than the
energy E311 ?
3 Shade the regions where the particle is likely to be found in the ψ322
energy eigenstate.

2.5.9 Quantum confinement


Normally, motion in physics occurs in three dimensions. Even in a narrow pipe,
in classical physics a point particle of zero size would be able to move in all

three directions. But in quantum mechanics, if the pipe gets very narrow, the
motion becomes truly one-dimensional.
To understand why, the first problem that must be addressed is what “mo-
tion” means in the first place, because normally motion is defined as change in
position, and in quantum mechanics particles do not have a well-defined posi-
tion.
Consider the particle in the ground state of lowest energy, shown in figure
2.11. This is one boring state; the picture never changes. You might be surprised
by that; after all, it was found that the ground state has energy, and it is all
kinetic energy. If the particle has kinetic energy, should not the positions where
the particle is likely to be found change with time?
The answer is no; kinetic energy is not directly related to changes in likely
positions of a particle; that is only an approximation valid for macroscopic
systems. It is not necessarily true for quantum-scale systems, certainly not if
they are in the ground state. Like it or not, in quantum mechanics kinetic
energy is second-order derivatives of the wave function, and nothing else.
Next, as already pointed out, all the other energy eigenstates, like those in
figure 2.12, have the same boring property of not changing with time.
Things only become somewhat interesting when you combine states of dif-
ferent energy. As the simplest possible example, consider the possibility that
the particle has the wave function:
Ψ = √(4/5) ψ111 + √(1/5) ψ211

at some starting time, which will be taken as t = 0. According to the orthodox


interpretation, in an energy measurement this particle would have a 4/5 = 80%
chance of being found at the ground state energy E111 and a 20% chance of
being found at the elevated energy level E211 . So there is now uncertainty in
energy; that is critical.
In chapter 5.1 it will be found that for nonzero times, the wave function of
this particle is given by
Ψ = √(4/5) e^{−iE111 t/h̄} ψ111 + √(1/5) e^{−iE211 t/h̄} ψ211 .

Using this expression, the probability of finding the particle, |Ψ|2 , can be plotted
for various times. That is done in figure 2.13 for four typical times. It shows
that with uncertainty in energy, the wave function blob does move. It performs
a periodic oscillation: after figure 2.13(d), the wave function returns to state
2.13(a), and the cycle repeats.
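The time dependence is easy to explore numerically. The Python sketch below,
not part of the original text, evaluates the probability density for a
one-dimensional analog of this combination; the choice of units, with
h̄ = m = ℓx = 1, and of a coarse sample grid are arbitrary:

import numpy as np

lx = 1.0
x = np.linspace(0, lx, 5)
E = lambda n: n**2 * np.pi**2 / 2           # pipe energies in these units
psi = lambda n: np.sqrt(2 / lx) * np.sin(n * np.pi * x / lx)

def prob(t):
    # |Psi|^2 for Psi = sqrt(4/5) e^{-iE1 t} psi_1 + sqrt(1/5) e^{-iE2 t} psi_2
    Psi = (np.sqrt(0.8) * np.exp(-1j * E(1) * t) * psi(1)
           + np.sqrt(0.2) * np.exp(-1j * E(2) * t) * psi(2))
    return np.abs(Psi)**2

period = 2 * np.pi / (E(2) - E(1))            # the oscillation period
print(np.round(prob(0.0), 3))                 # blob leans to one side ...
print(np.round(prob(period / 2), 3))          # ... and to the other later
print(np.allclose(prob(0.0), prob(period)))   # True: the cycle repeats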
You would not yet want to call the particle localized, but at least the
locations where the particle can be found are now bouncing back and for-
wards between the ends of the pipe. And if you add additional wave functions
ψ311 , ψ411 , . . . , you can get closer and closer to a localized wave function blob
bouncing around.

Figure 2.13: A combination of ψ111 and ψ211 seen at some typical times (four
frames, (a) through (d)).
But if you look closer at figure 2.13, you will note that the wave function
blob does not move at all in the y-direction; it remains at all times centered
around the horizontal pipe centerline. It may seem that this is no big deal; just
add one or more wave functions with an ny value greater than one, like ψ121 ,
and bingo, there will be interesting motion in the y-direction too.
But there is a catch, and it has to do with the required energy. According
to the previous section, the kinetic energy in the y-direction takes the values

Ey1 = h̄²π²/(2mℓy²),   Ey2 = 4h̄²π²/(2mℓy²),   Ey3 = 9h̄²π²/(2mℓy²),   ...

Those will be very large energies for a narrow pipe in which ℓy is small. The
particle will certainly have the large energy Ey1 in the y-direction; if it is in the
pipe at all it has at least that amount of energy. But if the pipe is really narrow,
it will simply not have enough additional, say thermal, energy to get anywhere
close to the next level Ey2 . The kinetic energy in the y-direction will therefore
be stuck at the lowest possible level Ey1 .
The result is that absolutely nothing interesting goes on in the y-direction.
As far as a particle in a narrow pipe is concerned, the y direction might just as
well not exist. It is ironic that while the kinetic energy in the y-direction, Ey1 ,
is very large, nothing actually happens in that direction.
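To put a rough number on this freeze-out, here is a Python sketch, not part of
the original text; the 1 nm width, the room temperature, and the use of the
electron mass are all arbitrary example assumptions:

import math

hbar = 1.05457e-34      # J s
kB = 1.38065e-23        # Boltzmann constant, J/K
m = 9.10938e-31         # electron mass, kg
ly = 1e-9               # an example 1 nm wide pipe

Ey = lambda n: n**2 * hbar**2 * math.pi**2 / (2 * m * ly**2)
gap = Ey(2) - Ey(1)                  # energy needed to reach the next y-level
print(gap / (kB * 300))              # about 44: tens of times thermal energy
# With the gap tens of times kT, thermal motion cannot excite the
# y-direction, so the particle stays stuck at E_y1: effectively 1D motion.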
If the pipe is also narrow in the z-direction, the only interesting motion is in
the x-direction, making the nontrivial physics truly one-dimensional. It becomes
a “quantum wire”. However, if the pipe size in the z-direction is relatively wide,
the particle will have lots of different energy states in the z-direction available
too and the motion will be two-dimensional, a “quantum well”. Conversely, if

the pipe is narrow in all three directions, you get a zero-dimensional “quantum
dot” in which the particle does nothing unless it gets a sizable chunk of energy.
An isolated atom can be regarded as an example of a quantum dot; the
electrons are confined to a small region around the nucleus and will be at a single
energy level unless they are given a considerable amount of energy. But note
that when people talk about quantum confinement, they are normally talking
about semiconductors, for which similar effects occur at significantly larger
scales, maybe tens of times as large, making them much easier to manufacture.
An actual quantum dot is often referred to as an “artificial atom”, and has
properties similar to those of a real atom.
It may give you a rough idea of all the interesting things you can do in
nanotechnology when you restrict the motion of particles, in particular of elec-
trons, in various directions. You truly change the dimensionality of the normal
three-dimensional world into a lower dimensional one. Only quantum mechanics
can explain why, by making the energy levels discrete instead of continuously
varying. And the lower dimensional worlds can have your choice of topology (a
ring, a letter 8, a sphere, a cylinder, a Möbius strip, . . . ) to make things really
exciting.

Key Points
⋄ Quantum mechanics allows you to create lower-dimensional worlds for
particles.

2.6 The Harmonic Oscillator


This section provides an in-depth discussion of a basic quantum system. The
case to be analyzed is a particle constrained by forces to remain at approximately
the same position. This can describe systems such as an atom in a solid or in
a molecule. If the forces pushing the particle back to its nominal position are
proportional to the distance that the particle moves away from it, you have what
is called a harmonic oscillator. This is usually also a good approximation for
other constrained systems as long as the distances from the nominal position
remain small.
The particle’s displacement from the nominal position will be indicated by
(x, y, z). The forces keeping the particle constrained can be modeled as springs,
as sketched in figure 2.14. The stiffness of the springs is characterized by the
so called “spring constant” c, giving the ratio between force and displacement.
Note that it will be assumed that the three spring stiffnesses are equal.
Figure 2.14: The harmonic oscillator.

According to classical Newtonian physics, the particle vibrates back and
forth around its nominal position with a frequency


ω = √(c/m)   (2.28)
in radians per second. This frequency remains a convenient computational
quantity in the quantum solution.

Key Points
⋄ The system to be described is that of a particle held in place by forces
that increase proportional to the distance that the particle moves away
from its equilibrium position.
⋄ The relation between distance and force is assumed to be the same in
all three coordinate directions.
⋄ Number c is a measure of the strength of the forces and ω is the fre-
quency of vibration according to classical physics.

2.6.1 The Hamiltonian


In order to find the energy levels that the oscillating particle can have, you must
first write down the total energy Hamiltonian.
As far as the potential energy is concerned, the spring in the x-direction
holds an amount of potential energy equal to ½cx², and similarly the ones in
the y- and z-directions.
To this total potential energy, you need to add the kinetic energy operator
T̂ from section 2.3 to get the Hamiltonian:

H = −(h̄²/2m) (∂²/∂x² + ∂²/∂y² + ∂²/∂z²) + ½c (x² + y² + z²)   (2.29)

Key Points
⋄ The Hamiltonian (2.29) has been found.

2.6.2 Solution using separation of variables


This section finds the energy eigenfunctions and eigenvalues of the harmonic os-
cillator using the Hamiltonian as found in the previous subsection. Every energy
eigenfunction ψ and its eigenvalue E must satisfy the Hamiltonian eigenvalue
problem, (or “time-independent Schrödinger equation”):
" Ã ! #
h̄2 ∂2 ∂2 ∂2 1
³
2 2 2
´
− + + + 2
c x +y +z ψ = Eψ (2.30)
2m ∂x2 ∂y 2 ∂z 2

The boundary condition is that ψ becomes zero at large distance from the
nominal position. After all, the magnitude of ψ tells you the relative probability
of finding the particle at that position, and because of the rapidly increasing
potential energy, the chances of finding the particle very far from the nominal
position should be vanishingly small.
Like for the particle in the pipe of the previous section, it will be assumed
that each eigenfunction is a product of one-dimensional eigenfunctions, one in
each direction:
ψ = ψx (x)ψy (y)ψz (z) (2.31)
Finding the eigenfunctions and eigenvalues by making such an assumption is
known in mathematics as the “method of separation of variables”.
Substituting the assumption in the eigenvalue problem above, and dividing
everything by ψx (x)ψy (y)ψz (z) reveals that E consists of three parts that will
be called Ex , Ey , and Ez :

E = Ex + Ey + Ez

Ex = −(h̄²/2m) ψx′′(x)/ψx(x) + ½cx²
Ey = −(h̄²/2m) ψy′′(y)/ψy(y) + ½cy²          (2.32)
Ez = −(h̄²/2m) ψz′′(z)/ψz(z) + ½cz²

where the primes indicate derivatives. The three parts represent the x, y, and
z-dependent terms.
By the definition above, the quantity Ex can only depend on x; variables
y and z do not appear in its definition. But actually, Ex cannot depend on x
either, since Ex = E − Ey − Ez , and none of those quantities depends on x. The
inescapable conclusion is that Ex must be a constant, independent of all three
variables (x, y, z). The same way Ey and Ez must be constants.
If now in the definition of Ex above, both sides are multiplied by ψx (x), a


one-dimensional eigenvalue problem results:
" #
h̄2 ∂ 2
− + 1 cx2 ψx = Ex ψx (2.33)
2m ∂x2 2

The operator within the square brackets here, call it Hx , involves only the x-
related terms in the full Hamiltonian. Similar problems can be written down
for Ey and Ez . Separate problems in each of the three variables x, y, and z have
been obtained, explaining why this mathematical method is called separation
of variables.
Solving the one dimensional problem for ψx can be done by fairly elementary
but elaborate means. If you are interested, you can find how it is done in note
{A.12}, but that is mathematics and it will not teach you much about quantum
mechanics. It turns out that, like for the particle in the pipe of the previous
section, there is again an infinite number of different solutions for Ex and ψx :

Ex0 = (1/2) h̄ω    ψx0(x) = h0(x)
Ex1 = (3/2) h̄ω    ψx1(x) = h1(x)          (2.34)
Ex2 = (5/2) h̄ω    ψx2(x) = h2(x)
...

Unlike for the particle in the pipe, here by convention the solutions are numbered
starting from 0, rather than from 1. So the first eigenvalue is Ex0 and the first
eigenfunction ψx0 . That is just how people choose to do it.
Also, the eigenfunctions are not sines like for the particle in the pipe; instead,
as table 2.1 shows, they take the form of some polynomial times an exponential.
But you will probably really not care much about what kind of functions they
are anyway unless you end up writing a textbook on quantum mechanics and
have to plot them. In that case, you can find a general expression, (A.27), in
note {A.12}.
But it is the eigenvalues that you may want to remember from this solution.
According to the orthodox interpretation, these are the measurable values of
the total energy in the x-direction (potential energy in the x-spring plus kinetic
energy of the motion in the x-direction.) Instead of writing them all out as was
done above, they can be described using the generic expression:
Exnx = (2nx + 1)/2 h̄ω   for nx = 0, 1, 2, 3, . . .   (2.35)
Table 2.1: First few one-dimensional eigenfunctions of the harmonic oscillator,
in terms of ξ = x/ℓ, with ℓ = √(h̄/mω) and ω = √(c/m):

h0(x) = 1/(πℓ²)^{1/4} e^{−ξ²/2}
h1(x) = 2ξ/(4πℓ²)^{1/4} e^{−ξ²/2}
h2(x) = (2ξ² − 1)/(4πℓ²)^{1/4} e^{−ξ²/2}
h3(x) = (2ξ³ − 3ξ)/(9πℓ²)^{1/4} e^{−ξ²/2}
h4(x) = (4ξ⁴ − 12ξ² + 3)/(576πℓ²)^{1/4} e^{−ξ²/2}

The eigenvalue problem has now been solved, because the equations for Y
and Z are mathematically the same and must therefore have corresponding
solutions:

Eyny = (2ny + 1)/2 h̄ω   for ny = 0, 1, 2, 3, . . .   (2.36)

Eznz = (2nz + 1)/2 h̄ω   for nz = 0, 1, 2, 3, . . .   (2.37)
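If you want to convince yourself of (2.34) without doing the math of note
{A.12}, the following Python sketch, not part of the original text, checks
numerically that h0 and h1 of table 2.1 satisfy the one-dimensional problem
(2.33) with eigenvalues (1/2)h̄ω and (3/2)h̄ω; the units, with h̄ = m = c = 1
so that ω = ℓ = 1, are an arbitrary choice:

import numpy as np

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]
h0 = np.pi**-0.25 * np.exp(-x**2 / 2)
h1 = 2 * x * (4 * np.pi)**-0.25 * np.exp(-x**2 / 2)

def H(psi):
    # -(1/2) d^2 psi/dx^2 + (1/2) x^2 psi, derivative by finite differences
    d2 = (np.roll(psi, -1) - 2 * psi + np.roll(psi, 1)) / dx**2
    return -0.5 * d2 + 0.5 * x**2 * psi

# Since the h_n are normalized, the inner product <h_n|H h_n> is E_xn:
print(round(np.trapz(h0 * H(h0), dx=dx), 4))   # -> 0.5, i.e. hbar omega / 2
print(round(np.trapz(h1 * H(h1), dx=dx), 4))   # -> 1.5, i.e. 3 hbar omega / 2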
The total energy E of the complete system is the sum of Ex , Ey , and Ez .
Any nonnegative choice for number nx , combined with any nonnegative choice
for number ny , and for nz , produces one combined total energy value Exnx +
Eyny + Eznz , which will be indicated by Enx ny nz . Putting in the expressions for
the three partial energies above, these total energy eigenvalues become:
Enx ny nz = (2nx + 2ny + 2nz + 3)/2 h̄ω   (2.38)
where the “quantum numbers” nx , ny , and nz may each have any value in the
range 0, 1, 2, 3, . . .
The corresponding eigenfunction of the complete system is:

ψnx ny nz = hnx (x)hny (y)hnz (z) (2.39)

where the functions h0 , h1 , ... are in table 2.1 or in (A.27) if you need them.
Note that the nx , ny , nz numbering system for the solutions arose naturally
from the solution process; it was not imposed a priori.
Key Points
⋄ The eigenvalues and eigenfunctions have been found, skipping a lot of
tedious math that you can check when the weather is bad during spring
break.
⋄ Generic expressions for the eigenvalues are above in (2.38) and for the
eigenfunctions in (2.39).

2.6.2 Review Questions


1 Write out the ground state energy.
2 Write out the ground state wave function fully.
3 Write out the energy E100 .
4 Write out the eigenstate ψ100 fully.

2.6.3 Discussion of the eigenvalues


As the previous subsection showed, for every set of three nonnegative whole
numbers nx , ny , nz , there is one unique energy eigenfunction, or eigenstate,
(2.39) and a corresponding energy eigenvalue (2.38). The “quantum numbers”
nx , ny , and nz correspond to the numbering system of the one-dimensional
solutions that make up the full solution.
This section will examine the energy eigenvalues. These are of great physical
importance, because according to the orthodox interpretation, they are the only
measurable values of the total energy, the only energy levels that the oscillator
can ever be found at.
The energy levels can be plotted in the form of a so-called “energy spectrum”,
as in figure 2.15. The energy values are listed along the vertical axis, and the
sets of quantum numbers nx , ny , nz for which they occur are shown to the right
of the plot.
The first point of interest illustrated by the energy spectrum is that the
energy of the oscillating particle cannot take on any arbitrary value, but only
certain discrete values. Of course, that is just like for the particle in the pipe
of the previous section, but for the harmonic oscillator, the energy levels are
evenly spaced. In particular the energy value is always an odd multiple of ½h̄ω.
It contradicts the Newtonian notion that a harmonic oscillator can have any
energy level. But since h̄ is so small, about 10−34 kg m2 /s, macroscopically the
different energy levels are extremely close together. Though the old Newtonian
theory is strictly speaking incorrect, it remains an excellent approximation for
macroscopic oscillators.
Figure 2.15: The energy spectrum of the harmonic oscillator. The levels and
their sets of quantum numbers (nx, ny, nz) are:
3/2 h̄ω: (0,0,0);
5/2 h̄ω: (1,0,0), (0,1,0), (0,0,1);
7/2 h̄ω: (2,0,0), (0,2,0), (0,0,2), (1,1,0), (1,0,1), (0,1,1);
9/2 h̄ω: (3,0,0), (0,3,0), (0,0,3), (2,1,0), (0,2,1), (1,0,2), (0,1,2), (2,0,1),
(1,2,0), (1,1,1).

Also note that the energy levels have no largest value; however high the
energy of the particle in a true harmonic oscillator may be, it will never escape.
The further it tries to go, the larger the forces that pull it back. It can’t win.
Another striking feature of the energy spectrum is that the lowest possible
energy is again nonzero. The lowest energy occurs for nx = ny = nz = 0 and
has a value:
E000 = (3/2) h̄ω   (2.40)

So, even at absolute zero temperature, the particle is not completely at rest at
its nominal position; it still has (3/2) h̄ω worth of kinetic and potential energy left
that it can never get rid of. This lowest energy state is the ground state.
The reason that the energy cannot be zero can be understood from the
uncertainty principle. To get the potential energy to be zero, the particle would
have to be at its nominal position for certain. But the uncertainty principle
does not allow a certain position. Also, to get the kinetic energy to be zero,
the linear momentum would have to be zero for certain, and the uncertainty
relationship does not allow that either.
The actual ground state is a compromise between uncertainties in momen-
tum and position that make the total energy as small as Heisenberg’s relation-
ship allows. There is enough uncertainty in momentum to keep the particle near
the nominal position, minimizing potential energy, but there is still enough un-
certainty in position to keep the momentum low, minimizing kinetic energy. In
fact, the compromise results in potential and kinetic energies that are exactly
equal, {A.13}.
For energy levels above the ground state, figure 2.15 shows that there is a
rapidly increasing number of different sets of quantum numbers nx , ny , and nz
that all produce that energy. Since each set represents one eigenstate, it means
that multiple states produce the same energy.

Key Points
⋄ Energy values can be graphically represented as an energy spectrum.
⋄ The energy values of the harmonic oscillator are equally spaced, with
a constant energy difference of h̄ω between successive levels.
⋄ The ground state of lowest energy has nonzero kinetic and potential
energy.
⋄ For any energy level above the ground state, there is more than one
eigenstate that produces that energy.

2.6.3 Review Questions


1 Verify that the sets of quantum numbers shown in the spectrum figure
2.15 do indeed produce the indicated energy levels.
2 Verify that there are no sets of quantum numbers missing in the
spectrum figure 2.15; the listed ones are the only ones that produce
those energy levels.
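Review question 2 can also be checked by brute force. The Python sketch below,
not part of the original text, enumerates all sets of quantum numbers at each
of the lowest energy levels:

from itertools import product

for N in range(4):                   # energy E = (2N + 3)/2 hbar omega
    sets = [t for t in product(range(N + 1), repeat=3) if sum(t) == N]
    print(f"E = {2*N + 3}/2 hbar omega: {len(sets)} states", sets)
# -> 1, 3, 6, 10 states: exactly the sets listed in figure 2.15, and
#    in general (N+1)(N+2)/2 of them.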

2.6.4 Discussion of the eigenfunctions


This section takes a look at the energy eigenfunctions of the harmonic oscillator
to see what can be said about the position of the particle at various energy
levels.
At absolute zero temperature, the particle will be in the ground state of
lowest energy. The eigenfunction describing this state has the lowest possible
numbering nx = ny = nz = 0, and is according to (2.39) of subsection 2.6.2
equal to
ψ000 = h0 (x)h0 (y)h0 (z) (2.41)
where function h0 is in table 2.1. The wave function in the ground state must
be equal to the eigenfunction to within a constant:

Ψgs = c000 h0 (x)h0 (y)h0 (z) (2.42)

where the magnitude of the constant c000 must be one. Using the expression for
function h0 from table 2.1, the properties of the ground state can be explored.
As noted earlier in section 2.1, it is useful to plot the square magnitude of


Ψ as grey tones, because the darker regions will be the ones where the particle
is more likely to be found. Such a plot for the ground state is shown in figure
2.16. It shows that in the ground state, the particle is most likely to be found
near the nominal position, and that the probability of finding the particle falls
off quickly to zero beyond a certain distance from the nominal position.

Figure 2.16: Ground state ψ000 of the harmonic oscillator.

The region in which the particle is likely to be found extends, roughly speak-
ing, about a distance ℓ = √(h̄/mω) from the nominal position. For a macroscopic
oscillator, this will be a very small distance because of the smallness of h̄. That
is somewhat comforting, because macroscopically, you would expect an oscilla-
tor to be able to be at rest at the nominal position. While quantum mechanics
does not allow it, at least the distance ℓ from the nominal position, and the
energy (3/2) h̄ω are extremely small.
But obviously, the bad news is that the ground state probability density of
figure 2.16 does not at all resemble the classical Newtonian picture of a localized
particle oscillating back and forwards. In fact, the probability density does not
even depend on time: the chances of finding the particle in any given location
are the same for all times. The probability density is also spherically symmetric;
it only depends on the distance from the nominal position, and is the same at all
angular orientations. To get something that can start to resemble a Newtonian
spring-mass oscillator, one requirement is that the energy is well above the
ground level.
Turning now to the second lowest energy level, this energy level is achieved
by three different energy eigenfunctions, ψ100 , ψ010 , and ψ001 . The probability
distribution of each of the three takes the form of two separate “blobs”; figure
2.17 shows ψ100 and ψ010 when seen along the z-direction. In case of ψ001 , one
blob hides the other, so this eigenfunction was not shown.

Figure 2.17: Wave functions ψ100 and ψ010 .

Obviously, these states too do not resemble a Newtonian oscillator at all.


The probability distributions once again stay the same at all times. (This is a
consequence of energy conservation, as discussed later in chapter 5.1.2.) Also,
while in each case there are two blobs occupied by a single particle, the particle
will never be be caught on the symmetry plane in between the blobs, which
naively could be taken as a sign of the particle moving from one blob to the
other.
The eigenfunctions for still higher energy levels show similar lack of resem-
blance to the classical motion. As an arbitrary example, figure 2.18 shows
eigenfunction ψ213 when looking along the z-axis. To resemble a classical oscil-
lator, the particle would need to be restricted to, maybe not an exact moving
point, but at most a very small moving region. Instead, all energy eigenfunc-
tions have steady probability distributions and the locations where the particle
may be found extend over large regions. It turns out that there is an uncertainty
principle involved here: in order to get some localization of the position of the
particle, you need to allow some uncertainty in its energy. This will have to
wait until much later, in chapter 5.5.4.
Figure 2.18: Energy eigenfunction ψ213.

The basic reason that quantum mechanics is so slow is simple. To analyze,
say, the x-motion, classical physics says: “the value of the total energy Ex is

Ex = ½mẋ² + ½cx²,

now go analyze the motion!”. Quantum mechanics says: “the total energy
operator Hx is

Hx = ½m ((h̄/im) ∂/∂x)² + ½c x̂²,

now first figure out the possible energy values Ex0 , Ex1 , . . . before you can even
start thinking about analyzing the motion.”

Key Points
⋄ The ground state wave function is spherically symmetric: it looks the
same seen from any angle.
⋄ In energy eigenstates the particle position is uncertain.

2.6.4 Review Questions


1 Write out the ground state wave function and show that it is indeed
spherically symmetric.
2 Show that the ground state wave function is maximal at the origin
and, like all the other energy eigenfunctions, becomes zero at large
distances from the origin.
3 Write down the explicit expression for the eigenstate ψ213 using table
2.1, then verify that it looks like figure 2.18 when looking along the
z-axis, with the x-axis horizontal and the y-axis vertical.

2.6.5 Degeneracy
As the energy spectrum figure 2.15 illustrated, the only energy level for which
there is only a single energy eigenfunction is the ground state. All higher energy
levels are what is called “degenerate”; there is more than one eigenfunction that
produces that energy. (In other words, more than one set of three quantum
numbers nx , ny , and nz .)
It turns out that degeneracy always results in nonuniqueness of the eigen-
functions. That is important for a variety of reasons. For example, in the
quantum mechanics of molecules, chemical bonds often select among nonunique
theoretical solutions those that best fit the given conditions. Also, to find spe-
cific mathematical or numerical solutions for the eigenfunctions of a quantum
system, the nonuniquenesses will somehow have to be resolved.
Nonuniqueness also poses problems for advanced analysis. For example, sup-
pose you try to analyze the effect of various small perturbations that a harmonic
oscillator might experience in real life. Analyzing the effect of small perturba-
tions is typically a relatively easy mathematical problem: the perturbation will
slightly change an eigenfunction, but it can still be approximated by the unper-
turbed one. So, if you know the unperturbed eigenfunction you are in business;
unfortunately, if the unperturbed eigenfunction is not unique, you may not know
which is the right one to use in the analysis.
The nonuniqueness arises from the fact that:

Linear combinations of eigenfunctions at the same energy level pro-


duce alternative eigenfunctions that still have that same energy level.

For example, the eigenfunctions ψ100 and ψ010 of the harmonic oscillator
have the same energy E100 = E010 = (5/2) h̄ω (as does ψ001 , but this example will
be restricted to two eigenfunctions.) Any linear combination of the two has that
energy too, so you could replace eigenfunctions ψ100 and ψ010 by two alternative
ones such as:

(ψ100 + ψ010)/√2   and   (ψ010 − ψ100)/√2
It is readily verified these linear combinations are indeed still eigenfunctions with
eigenvalue E100 = E010 : applying the Hamiltonian H to either one will multiply
each term by E100 = E010 , hence the entire combination by that amount. How
do these alternative eigenfunctions look? Exactly like ψ100 and ψ010 in figure
2.17, except that they are rotated over 45 degrees. Clearly then, they are just
as good as the originals, just seen under a different angle.
Which raises the question, how come the analysis ended up with the ones
that it did in the first place? The answer is in the method of separation of
variables that was used in subsection 2.6.2. It produced eigenfunctions of the
form hnx (x)hny (y)hnz (z) that were not just eigenfunctions of the full Hamilto-
nian H, but also of the partial Hamiltonians Hx , Hy , and Hz , being the x-, y-,
and z-parts of it.
For example, ψ100 = h1(x)h0(y)h0(z) is an eigenfunction of Hx with eigen-
value Ex1 = (3/2) h̄ω, of Hy with eigenvalue Ey0 = (1/2) h̄ω, and of Hz with eigenvalue
Ez0 = (1/2) h̄ω, as well as of H with eigenvalue E100 = (5/2) h̄ω.
The alternative eigenfunctions are still eigenfunctions of H, but no longer of
the partial Hamiltonians. For example,

(ψ100 + ψ010)/√2 = (h1(x)h0(y)h0(z) + h0(x)h1(y)h0(z))/√2

is not an eigenfunction of Hx : taking Hx times this eigenfunction would multiply
the first term by Ex1 but the second term by Ex0 .
So, the obtained eigenfunctions were really made determinate by ensuring
that they are simultaneously eigenfunctions of H, Hx , Hy , and Hz . The nice
thing about them is that they can answer questions not just about the total
energy of the oscillator, but also about how much of that energy is in each of
the three directions.

Key Points
⋄ Degeneracy occurs when different eigenfunctions produce the same en-
ergy.
⋄ It causes nonuniqueness: alternative eigenfunctions will exist.
⋄ That can make various analyses a lot more complex.

2.6.5 Review Questions


1 Just to check that this book is not lying, (you cannot be too careful),
write down the analytical expression for ψ100 and ψ010 using table
2.1, then (ψ100 + ψ010)/√2 and (ψ010 − ψ100)/√2. Verify that the
latter two are the functions ψ100 and ψ010 in a coordinate system
(x̄, ȳ, z) that is rotated 45 degrees counter-clockwise around the z-
axis compared to the original (x, y, z) coordinate system.
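The rotation property in this review question can also be verified numerically.
The Python sketch below, not part of the original text, uses units in which
ℓ = 1 (an arbitrary choice) and checks the identity at random sample points:

import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 1000))       # arbitrary sample points

h0 = lambda u: np.pi**-0.25 * np.exp(-u**2 / 2)
h1 = lambda u: 2 * u * (4 * np.pi)**-0.25 * np.exp(-u**2 / 2)

psi100 = h1(x) * h0(y) * h0(z)
psi010 = h0(x) * h1(y) * h0(z)

# the coordinates rotated 45 degrees around the z-axis:
xbar, ybar = (x + y) / np.sqrt(2), (y - x) / np.sqrt(2)
print(np.allclose((psi100 + psi010) / np.sqrt(2),
                  h1(xbar) * h0(ybar) * h0(z)))
# -> True: the alternative eigenfunction is just the original one, rotated.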
2.6.6 Non-eigenstates
It should not be thought that the harmonic oscillator only exists in energy
eigenstates. The opposite is more like it. Anything that somewhat localizes
the particle will produce an uncertainty in energy. This section explores the
procedures to deal with states that are not energy eigenstates.
First, even if the wave function is not an energy eigenfunction, it can still
always be written as a combination of the eigenfunctions:

Ψ(x, y, z, t) = Σ_{nx=0}^{∞} Σ_{ny=0}^{∞} Σ_{nz=0}^{∞} cnx ny nz ψnx ny nz   (2.43)

That this is always possible is a consequence of the completeness of the eigen-


functions of Hermitian operators such as the Hamiltonian. An arbitrary example
of such a combination state is shown in figure 2.19.

Figure 2.19: Arbitrary wave function (not an energy eigenfunction).

The coefficients cnx ny nz in the combination are important: according to the


orthodox statistical interpretation, their square magnitude gives the probability
to find the energy to be the corresponding eigenvalue Enx ny nz . For example,
|c000 |2 gives the probability of finding that the oscillator is in the ground state
of lowest energy.
If the wave function Ψ is in a known state, (maybe because the position of
the particle was fairly accurately measured), then each coefficient cnx ny nz can
be found by computing an inner product:

cnx ny nz = ⟨ψnx ny nz |Ψ⟩   (2.44)

The reason this works is orthonormality of the eigenfunctions. As an exam-


ple, consider the case of coefficient c100 :

c100 = ⟨ψ100 |Ψ⟩ = ⟨ψ100 |c000 ψ000 + c100 ψ100 + c010 ψ010 + c001 ψ001 + c200 ψ200 + . . .⟩

Now proper eigenfunctions of Hermitian operators are orthonormal; the inner


product between different eigenfunctions is zero, and between identical eigen-
functions is one:

⟨ψ100 |ψ000 ⟩ = 0    ⟨ψ100 |ψ100 ⟩ = 1    ⟨ψ100 |ψ010 ⟩ = 0    ⟨ψ100 |ψ001 ⟩ = 0    . . .

So, the inner product above must indeed produce c100 .
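The same bookkeeping is easy to mimic numerically. The Python sketch below,
not part of the original text, builds a one-dimensional wave function from two
oscillator eigenfunctions with made-up coefficients (0.6 and 0.8 are arbitrary
choices whose square magnitudes sum to one) and recovers them from the inner
products (2.44); units with ℓ = 1 are again arbitrary:

import numpy as np

xi = np.linspace(-10, 10, 20001)
dx = xi[1] - xi[0]
h0 = np.pi**-0.25 * np.exp(-xi**2 / 2)
h1 = 2 * xi * (4 * np.pi)**-0.25 * np.exp(-xi**2 / 2)

Psi = 0.6 * h0 + 0.8 * h1                 # a made-up wave function

c0 = np.trapz(h0 * Psi, dx=dx)            # <h0|Psi>
c1 = np.trapz(h1 * Psi, dx=dx)            # <h1|Psi>
print(round(c0, 6), round(c1, 6))         # -> 0.6 and 0.8, by orthonormality
print(round(c0**2, 3), round(c1**2, 3))   # -> probabilities 0.36 and 0.64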


Chapter 5.1 will discuss another reason why the coefficients are important:
they determine the time evolution of the wave function. It may be recalled that
the Hamiltonian, and hence the eigenfunctions derived from it, did not involve
time. However, the coefficients do.
Even if the wave function is initially in a state involving many eigenfunctions,
such as the one in figure 2.19, the orthodox interpretation says that energy
“measurement” will collapse it into a single eigenfunction. For example, assume
that the energies in all three coordinate directions are measured and that they
return the values:

Ex2 = (5/2) h̄ω    Ey1 = (3/2) h̄ω    Ez3 = (7/2) h̄ω

for a total energy E = (15/2) h̄ω. Quantum mechanics could not exactly predict that
this was going to happen, but it did predict that the energies had to be odd
multiples of ½h̄ω. Also, quantum mechanics gave the probability of measuring
the given values to be whatever |c213 |² was. Or in other words, what |⟨ψ213 |Ψ⟩|²
was.
After the example measurement, the predictions become much more specific,
because the wave function is now collapsed into the measured one:

Ψnew = c^{new}_{213} ψ213

This eigenfunction was shown earlier in figure 2.18.


If another measurement of the energies is now done, the only values that
can come out are Ex2 , Ey1 , and Ez3 , the same as in the first measurement.
There is now certainty of getting those values; the probability |c^{new}_{213}|² = 1. This
will continue to be true for energy measurements until the system is disturbed,
maybe by a position measurement.
Key Points
⋄ The basic ideas of quantum mechanics were illustrated using an exam-
ple.
⋄ The energy eigenfunctions are not the only game in town. Their seem-
ingly lowly coefficients are important too.
⋄ When the wave function is known, the coefficient of any eigenfunction
can be found by taking an inner product of the wave function with
that eigenfunction.
Chapter 3

Single-Particle Systems

In this chapter, the machinery to deal with single particles is worked out, cul-
minating in the vital solutions of the hydrogen atom and hydrogen molecular
ion.

3.1 Angular Momentum


Before a solution can be found for the important electronic structure of the hy-
drogen atom, the basis for the description of all the other elements and chemical
bonds, first angular momentum must be discussed. Like in the classical New-
tonian case, angular momentum is essential for the analysis, and in quantum
mechanics, angular momentum is also essential for describing the final solution.
Moreover, the quantum properties of angular momentum turn out to be quite
unexpected and important for practical applications.

3.1.1 Definition of angular momentum


The old Newtonian physics defines angular momentum ~L as the vectorial prod-
uct ~r × ~p, where ~r is the position of the particle in question and ~p is its linear
momentum.
Following the Newtonian analogy, quantum mechanics substitutes the gradi-
ent operator h̄∇/i for the linear momentum, so the angular momentum operator
becomes:

L̂ = (h̄/i) r̂ × ∇    where r̂ ≡ (x̂, ŷ, ẑ) and ∇ ≡ (∂/∂x, ∂/∂y, ∂/∂z)   (3.1)
Unlike the Hamiltonian, the angular momentum operator is not specific to a
given system. All observations about angular momentum will apply regardless
of the physical system being studied.


Key Points
⋄ The angular momentum operator (3.1) has been identified.

3.1.2 Angular momentum in an arbitrary direction


The intent in this subsection is to find the operator for the angular momentum
in an arbitrary direction and its eigenfunctions and eigenvalues.
For convenience, the direction in which the angular momentum is desired
will be taken as the z-axis of the coordinate system. In fact, much of the
mathematics that you do in quantum mechanics requires you to select some
arbitrary direction as your z-axis, even if the physics itself does not have any
preferred direction. It is further conventional in the quantum mechanics of
atoms and molecules to draw the chosen z-axis horizontal, (though not in [10]
or [20]), and that is what will be done here.

Figure 3.1: Spherical coordinates of an arbitrary point P: the distance r from
the origin, the polar angle θ away from the z-axis, and the azimuthal angle φ
around the z-axis.

Things further simplify greatly if you switch from Cartesian coordinates x,


y, and z to “spherical coordinates” r, θ, and φ, as shown in figure 3.1. The
coordinate r is the distance from the chosen origin, θ is the angular position
away from the chosen z-axis, and φ is the angular position around the z-axis,
measured from the chosen x-axis.
In terms of these spherical coordinates, the z-component of angular momen-
tum simplifies to:
L̂z ≡ (h̄/i) ∂/∂φ   (3.2)
This can be verified by looking up the gradient operator ∇ in spherical coor-


dinates in [15, pp. 124-126] and then taking the component of ~r × ∇ in the
z-direction.
In any case, with a bit of thought, it clearly makes sense: the z-component
of linear momentum classically describes the motion in the direction of the z-
axis, while the z-component of angular momentum describes the motion around
the z-axis. So if in quantum mechanics the z-linear momentum is h̄/i times the
derivative with respect to the coordinate z along the z-axis, then surely the logical
equivalent for z-angular momentum is h̄/i times the derivative with respect to
the angle φ around the z-axis?
Anyway, the eigenfunctions of the operator L̂z above turn out to be expo-
nentials in φ. More precisely, the eigenfunctions are of the form

C(r, θ) e^{imφ}   (3.3)

where m is a constant and C(r, θ) can be any arbitrary function of r and θ. The
number m is called the “magnetic quantum number”. It must be an integer,
one of . . . , −2, −1, 0, 1, 2, 3, . . . The reason is that if you increase the angle φ by
2π, you make a complete circle around the z-axis and return to the same point.
Then the eigenfunction (3.3) must again be the same, but that is only the case
if m is an integer, as can be verified from the Euler formula (1.5).
The above solution is easily verified directly, and the eigenvalue Lz identi-
fied, by substitution into the eigenvalue problem L̂z Ce^{imφ} = Lz Ce^{imφ} using the
expression for L̂z above:

(h̄/i) ∂(Ce^{imφ})/∂φ = Lz Ce^{imφ}   =⇒   (h̄/i) im Ce^{imφ} = Lz Ce^{imφ}
It follows that every eigenvalue is of the form:

Lz = mh̄ for m an integer (3.4)

So the angular momentum in a given direction cannot just take on any value:
it must be a whole multiple m, (possibly negative), of Planck’s constant h̄.
Compare that with the linear momentum component pz which can take on
any value, within the accuracy that the uncertainty principle allows. Lz can
only take discrete values, but they will be precise. And since the z-axis was
arbitrary, this is true in any direction you choose.

Key Points
⋄ Even if the physics that you want to describe has no preferred direction,
you usually need to select some arbitrary z-axis to do the mathematics
of quantum mechanics.
⋄ Spherical coordinates based on the chosen z-axis are needed in this and
subsequent analysis. They are defined in figure 3.1.
⋄ The operator for the z-component of angular momentum is (3.2), where
φ is the angle around the z-axis.
⋄ The eigenvalues, or measurable values, of angular momentum in any
arbitrary direction are whole multiples m, possibly negative, of h̄.
⋄ The whole multiple m is called the magnetic quantum number.

3.1.2 Review Questions


1 If the angular momentum in a given direction is a multiple of h̄ =
1.05457 10−34 J s, then h̄ should have units of angular momentum.
Verify that.
2 What is the magnetic quantum number of a macroscopic, 1 kg, par-
ticle that is encircling the z-axis at a distance of 1 m at a speed of 1
m/s? Write out as an integer, and show digits you are not sure about
as a question mark.
3 Actually, based on the derived eigenfunction, C(r, θ)eimφ , would any
macroscopic particle ever be at a single magnetic quantum number
in the first place? In particular, what can you say about where the
particle can be found in an eigenstate?
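For review question 2, a one-line computation already makes the point. The
Python sketch below, not part of the original text, takes Lz = mvr from
classical mechanics for the stated macroscopic values:

hbar = 1.05457e-34            # J s
Lz = 1.0 * 1.0 * 1.0          # m v r in kg m^2 / s
print(Lz / hbar)              # about 9.5e33: an astronomically large integer
# Successive m values differ by only one part in about 1e34, so the
# discreteness of L_z is completely unobservable macroscopically.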

3.1.3 Square angular momentum


Besides the angular momentum in an arbitrary direction, the other quantity of
primary importance is the magnitude of the angular momentum. This is the
length of the angular momentum vector, √(~L · ~L). The square root is awkward,
though; it is easier to work with the square angular momentum:

L² ≡ ~L · ~L

This subsection discusses the L̂² operator and its eigenvalues.
Like the L̂z operator of the previous subsection, L̂² can be written in terms
of spherical coordinates. To do so, note first that, {A.14},

L̂ · L̂ = (h̄/i)(~r × ∇) · (h̄/i)(~r × ∇) = −h̄² ~r · (∇ × (~r × ∇))

and then look up the gradient and the curl in [15, pp. 124-126]. The result is:

L̂² ≡ −(h̄²/sin θ) ∂/∂θ (sin θ ∂/∂θ) − (h̄²/sin²θ) ∂²/∂φ²   (3.5)
Obviously, this result is not as intuitive as the L̂z operator of the previous
subsection, but once again, it only involves the spherical coordinate angles.
The measurable values of square angular momentum will be the eigenvalues of
this operator. However, that eigenvalue problem is not easy to solve. In fact
the solution is not even unique.

Y0^0 = √(1/4π)

Y1^0 = √(3/4π) cos θ              Y2^0 = √(5/16π) (3cos²θ − 1)
Y1^1 = −√(3/8π) sin θ e^{iφ}       Y2^1 = −√(15/8π) sin θ cos θ e^{iφ}
Y1^{−1} = √(3/8π) sin θ e^{−iφ}     Y2^{−1} = √(15/8π) sin θ cos θ e^{−iφ}
                                  Y2^2 = √(15/32π) sin²θ e^{2iφ}
                                  Y2^{−2} = √(15/32π) sin²θ e^{−2iφ}

Table 3.1: The first few spherical harmonics.

The solution to the problem may be summarized as follows. First, the
nonuniqueness is removed by demanding that the eigenfunctions are also eigenfunc-
tions of L̂z , the operator of angular momentum in the z-direction. This makes
the problem solvable, {A.15}, and the resulting eigenfunctions are called the
“spherical harmonics” Ylm (θ, φ). The first few are given explicitly in table 3.1.
In case you need more of them for some reason, there is a generic expression
(A.28) in note {A.15}.
These eigenfunctions can additionally be multiplied by any arbitrary func-
tion of the distance from the origin r. They are normalized to be orthonormal
integrated over the surface of the unit sphere:
∫_{θ=0}^{π} ∫_{φ=0}^{2π} Yl^m(θ, φ)* Yl′^m′(θ, φ) sin θ dθ dφ = { 1 if l = l′ and m = m′; 0 otherwise }   (3.6)
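The orthonormality relation (3.6) is easily spot-checked by numerical
quadrature. The Python sketch below, not part of the original text, does so
for Y1^0 and Y1^1 from table 3.1; the grid resolution is an arbitrary choice:

import numpy as np

theta = np.linspace(0, np.pi, 401)[:, None]       # polar angle grid
phi = np.linspace(0, 2 * np.pi, 401)[None, :]     # azimuthal angle grid

Y10 = np.sqrt(3 / (4 * np.pi)) * np.cos(theta) * np.ones_like(phi)
Y11 = -np.sqrt(3 / (8 * np.pi)) * np.sin(theta) * np.exp(1j * phi)

def inner(Ya, Yb):
    # integral of Ya* Yb sin(theta) over the surface of the unit sphere
    w = np.conj(Ya) * Yb * np.sin(theta)
    return np.trapz(np.trapz(w, phi[0], axis=1), theta[:, 0])

print(round(abs(inner(Y10, Y10)), 4))   # -> 1.0
print(round(abs(inner(Y11, Y11)), 4))   # -> 1.0
print(round(abs(inner(Y10, Y11)), 4))   # -> 0.0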
The spherical harmonics Ylm are sometimes symbolically written in “ket nota-
tion” as |l m⟩.
What to say about them, except that they are in general a mess? Well, at
least every one is proportional to e^{imφ}, as an eigenfunction of L̂z should be. More
importantly, the very first one, Y00 is independent of angular position compared
to the origin (it is the same for all θ and φ angular positions.) This eigenfunction
corresponds to the state in which there is no angular momentum around the
origin at all. If a particle has no angular momentum around the origin, it can
be found at all angular locations relative to it with equal probability.
Far more important than the details of the eigenfunctions themselves are
the eigenvalues that come rolling out of the analysis. A spherical harmonic Ylm
has an angular momentum in the z-direction

Lz = mh̄ (3.7)

where the integer m is called the magnetic quantum number, as noted in the
previous subsection. That is no surprise, because the analysis demanded that
they take that form. The new result is that a spherical harmonic has a square
angular momentum
L2 = l(l + 1)h̄2 (3.8)
where l is also an integer, and is called the “azimuthal quantum number”. It is
maybe a weird result, (why not simply l2 h̄2 ?) but that is what square angular
momentum turns out to be.
The azimuthal quantum number is at least as large as the magnitude of the
magnetic quantum number m:
l ≥ |m| (3.9)

The reason is that L̂² = L̂x² + L̂y² + L̂z² must be at least as large as L̂z²; in terms of
eigenvalues, l(l + 1)h̄² must be at least as large as m²h̄². As it is, with l ≥ |m|,
either the angular momentum is completely zero, for l = m = 0, or L² is always
greater than Lz².

Key Points
⋄ The operator for square angular momentum is (3.5).
⋄ The eigenfunctions of both square angular momentum and angular
momentum in the chosen z-direction are called the spherical harmonics
Ylm .
⋄ If a particle has no angular momentum around the origin, it can be
found at all angular locations relative to it with equal probability.
⋄ The eigenvalues for square angular momentum take the counter-in-
tuitive form L2 = l(l + 1)h̄2 where l is a nonnegative integer, one of
0, 1, 2, 3, . . ., and is called the azimuthal quantum number.
⋄ The azimuthal quantum number l is always at least as big as the ab-
solute value of the magnetic quantum number m.
3.1.3 Review Questions


1 The general wave function of a state with azimuthal quantum number
l and magnetic quantum number m is Ψ = R(r)Ylm (θ, φ), where R(r)
is some further arbitrary function of r. Show that the condition for
this wave function to be normalized, so that the total probability of
finding the particle integrated over all possible positions is one, is
that

∫_{r=0}^{∞} R(r)* R(r) r² dr = 1.

2 Can you invert the statement about zero angular momentum and say:
if a particle can be found at all angular positions compared to the
origin with equal probability, it will have zero angular momentum?
3 What is the minimum amount that the total square angular mo-
mentum is larger than just the square angular momentum in the
z-direction for a given value of l?

3.1.4 Angular momentum uncertainty


Rephrasing the final results of the previous subsection, if there is nonzero an-
gular momentum, the angular momentum in the z-direction is always less than
the total angular momentum. There is something funny going on here. The
z-direction can be chosen arbitrarily, and if you choose it in the same direction
as the angular momentum vector, then the z-component should be the entire
vector. So, how can it always be less?
The answer of quantum mechanics is that the looked-for angular momentum
vector does not exist. No axis, however arbitrarily chosen, can align with a
nonexisting vector.
There is an uncertainty principle here, similar to the one of Heisenberg for
position and linear momentum. For angular momentum, it turns out that if
the component of angular momentum in a given direction, here taken to be z,
has a definite value, then the components in both the x and y directions will be
uncertain. (Details will be given in chapter 9.1.1). The wave function will be
in a state where Lx and Ly have a range of possible values m1 h̄, m2 h̄, . . ., each
with some probability. Without definite x and y components, there simply is
no angular momentum vector.
It is tempting to think of quantities that have not been measured, such as the
angular momentum vector in this example, as being merely “hidden.” However,
the impossibility for the z-axis to ever align with any angular momentum vector
shows that there is a fundamental difference between “being hidden” and “not
existing”.
Key Points
⋄ According to quantum mechanics, an exact nonzero angular momen-
tum vector will never exist. If one component of angular momentum
has a value, then the other two components will be uncertain.

3.2 The Hydrogen Atom


This section examines the critically important case of the hydrogen atom. The
hydrogen atom consists of a nucleus which is just a single proton, and an electron
encircling that nucleus. The nucleus, being much heavier than the electron, can
be assumed to be at rest, and only the motion of the electron is of concern.
The energy levels of the electron determine the photons that the atom will
absorb or emit, allowing the powerful scientific tool of spectral analysis. The
electronic structure is also essential for understanding the properties of the other
elements and of chemical bonds.

3.2.1 The Hamiltonian


The first step is to find the Hamiltonian of the electron. The electron experiences
an electrostatic Coulomb attraction to the oppositely charged nucleus. The
corresponding potential energy is

    V = − e²/(4πǫ₀ r)    (3.10)
with r the distance from the nucleus. The constant

    e = 1.6 10⁻¹⁹ C    (3.11)

is the magnitude of the electric charges of the electron and proton, and the
constant
    ǫ₀ = 8.85 10⁻¹² C²/J m    (3.12)
is called the “permittivity of space.”
Unlike for the harmonic oscillator discussed earlier, this potential energy
cannot be split into separate parts for Cartesian coordinates x, y, and z. To
do the analysis for the hydrogen atom, you must put the nucleus at the origin
of the coordinate system and use spherical coordinates r (the distance from the
nucleus), θ (the angle from an arbitrarily chosen z-axis), and φ (the angle around
the z-axis); see figure 3.1. In terms of spherical coordinates, the potential energy
above depends on just the single coordinate r.

To get the Hamiltonian, you need to add to this potential energy the kinetic
energy operator T̂ of chapter 2.3, which involves the Laplacian. The Laplacian
in spherical coordinates is readily available in table books, [15, p. 126], and the
Hamiltonian is thus found to be:
    H = −(h̄²/2mₑr²) { ∂/∂r (r² ∂/∂r) + (1/sin θ) ∂/∂θ (sin θ ∂/∂θ) + (1/sin²θ) ∂²/∂φ² } − e²/(4πǫ₀ r)    (3.13)

where
    mₑ = 9.109 10⁻³¹ kg    (3.14)
is the mass of the electron.
It may be noted that the small proton motion can be corrected for by slightly
adjusting the mass of the electron to be an effective 9.1044 10⁻³¹ kg, {A.16}.
This makes the solution exact, except for extremely small errors due to rela-
tivistic effects. (These are discussed in chapter 10.1.6.)

Key Points
⋄ To analyze the hydrogen atom, you must use spherical coordinates.
⋄ The Hamiltonian in spherical coordinates has been written down. It is
(3.13).

3.2.2 Solution using separation of variables


This subsection describes in general lines how the eigenvalue problem for the
electron of the hydrogen atom is solved. The basic ideas are like those used to
solve the particle in a pipe and the harmonic oscillator, but in this case, they
are used in spherical coordinates rather than Cartesian ones. Without getting
too much caught up in the mathematical details, do not miss the opportunity
of learning where the hydrogen energy eigenfunctions and eigenvalues come
from. This is the crown jewel of quantum mechanics; brilliant, almost flawless,
critically important; one of the greatest works of physical analysis ever.
The eigenvalue problem for the Hamiltonian, as formulated in the previous
subsection, can be solved by searching for solutions ψ that take the form of
a product of functions of each of the three coordinates: ψ = R(r)Θ(θ)Φ(φ).
More concisely, ψ = RΘΦ. The problem now is to find separate equations for
the individual functions R, Θ, and Φ from which they can then be identified.
The arguments are similar as for the harmonic oscillator, but messier, since
the coordinates are more entangled. First, substituting ψ = RΘΦ into the
Hamiltonian eigenvalue problem Hψ = Eψ, with the Hamiltonian H as given

in the previous subsection and E the energy eigenvalue, produces:


" ( Ã ! Ã ! ) #
h̄2 ∂ ∂ 1 ∂ ∂ 1 ∂2 e2 1
− r2 + sin θ + − RΘΦ
2me r2 ∂r ∂r sin θ ∂θ ∂θ sin2 θ ∂φ2 4πǫ0 r
= ERΘΦ

To reduce this problem, pre-multiply by 2mₑr²/RΘΦ and then separate the
various terms:
    −(h̄²/R) ∂/∂r (r² ∂R/∂r) + (1/ΘΦ) { −(h̄²/sin θ) ∂/∂θ (sin θ ∂/∂θ) − (h̄²/sin²θ) ∂²/∂φ² } ΘΦ − 2mₑr² e²/(4πǫ₀ r) = 2mₑr² E    (3.15)

Next identify the terms involving the angular derivatives and name them Eθφ .
They are:

    (1/ΘΦ) { −(h̄²/sin θ) ∂/∂θ (sin θ ∂/∂θ) − (h̄²/sin²θ) ∂²/∂φ² } ΘΦ = Eθφ
By this definition, Eθφ only depends on θ and φ, not r. But it cannot depend on θ
or φ either, since none of the other terms in the original equation (3.15) depends
on them. So Eθφ must be a constant, independent of all three coordinates.
Then multiplying the angular terms above by ΘΦ produces a reduced eigenvalue
problem involving ΘΦ only, with eigenvalue Eθφ :
" Ã ! #
h̄2 ∂ ∂ h̄2 ∂ 2
− sin θ − ΘΦ = Eθφ ΘΦ (3.16)
sin θ ∂θ ∂θ sin2 θ ∂φ2

Repeat the game with this reduced eigenvalue problem. Multiply by sin²θ/ΘΦ,
and name the only φ-dependent term Eφ . It is:
    −(h̄²/Φ) ∂²Φ/∂φ² = Eφ

By definition Eφ only depends on φ, but since the other two terms in the equa-
tion it came from did not depend on φ, Eφ cannot either, so it must be another
constant. What is left is a simple eigenvalue problem just involving Φ:
    −h̄² ∂²Φ/∂φ² = Eφ Φ

And that is readily solvable.



In fact, the solution to this final problem has already been given, since the operator involved is just the square of the angular momentum operator L̂z of section 3.1.2:

    −h̄² ∂²Φ/∂φ² = ((h̄/i) ∂/∂φ)² Φ = L̂z² Φ

So this equation must have the same eigenfunctions as the operator L̂z,

    Φm = e^(imφ)

and must have the square eigenvalues

    Eφ = (mh̄)²

(each application of L̂z multiplies the eigenfunction by mh̄). It may be recalled that the magnetic quantum number m must be an integer.
The eigenvalue problem (3.16) for ΘΦ is even easier; it is exactly the one for
the square angular momentum L² of section 3.1.3. (So, no, there was not really a need to solve for Φ separately.) Its eigenfunctions are therefore the spherical
harmonics,
ΘΦ = Ylm (θ, φ)
and its eigenvalues are
    Eθφ = l(l+1)h̄²
It may be recalled that the azimuthal quantum number l must be an integer
greater than or equal to |m|.
Returning now to the solution of the original eigenvalue problem (3.15), re-
placement of the angular terms by Eθφ = l(l + 1)h̄2 turns it into an ordinary
differential equation problem for the radial factor R(r) in the energy eigenfunc-
tion. As usual, this problem is a pain to solve, so that is again shoved away in
a note, {A.17}.
It turns out that the solutions of the radial problem can be numbered using
a third quantum number, n, called the “principal quantum number”. It is larger
than the azimuthal quantum number l, which in turn must be at least as large
as the absolute value of the magnetic quantum number:

n > l ≥ |m| (3.17)

so the principal quantum number must be at least 1. And if n = 1, then


l = m = 0.
In terms of these three quantum numbers, the final energy eigenfunctions of
the hydrogen atom are of the general form:

ψnlm = Rnl (r)Ylm (θ, φ) (3.18)



where the spherical harmonics Ylm were described in section 3.1.3. The brand
new radial wave functions Rnl can be found written out in table 3.2 for small
values of n and l, or in note {A.17}, (A.30), for any n and l. They are usually
written in terms of a scaled radial distance from the nucleus ρ = r/a0 , where
the length a0 is called the “Bohr radius” and has the value

    a₀ = 4πǫ₀h̄²/(mₑe²) ≈ 0.529177 10⁻¹⁰ m    (3.19)

or about half an Ångstrom. The Bohr radius is a really good length scale to
describe atoms in terms of. The Ångstrom itself is a good choice too; it is 10⁻¹⁰ m, or one tenth of a nanometer.

    R10 = (2/√(a₀³)) e^(−ρ)
    R20 = ((2 − ρ)/(2√(2a₀³))) e^(−ρ/2)
    R21 = (ρ/(2√(6a₀³))) e^(−ρ/2)
    R30 = ((54 − 36ρ + 4ρ²)/(81√(3a₀³))) e^(−ρ/3)
    R31 = ((24ρ − 4ρ²)/(81√(6a₀³))) e^(−ρ/3)
    R32 = (4ρ²/(81√(30a₀³))) e^(−ρ/3)

    a₀ = 4πǫ₀h̄²/(mₑe²)    ρ = r/a₀

Table 3.2: The first few radial wave functions for hydrogen.
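As an aside, these radial functions satisfy the normalization condition of the review question in subsection 3.1.3, ∫₀^∞ Rnl² r² dr = 1. A minimal numerical sanity check, not part of the original text and assuming Python with numpy and scipy, verifies this for the R10 entry:

    import numpy as np
    from scipy.integrate import quad

    a0 = 1.0                                   # work in units where a0 = 1
    R10 = lambda r: 2.0 / np.sqrt(a0**3) * np.exp(-r / a0)

    # Radial normalization: integral of R10^2 r^2 dr over 0..infinity
    norm, _ = quad(lambda r: R10(r)**2 * r**2, 0.0, np.inf)
    print(norm)                                # 1.0 to numerical precision

The same check works for the other entries of the table.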

The energy eigenvalues are much simpler and more interesting than the
eigenfunctions; they are

    En = − (h̄²/2mₑa₀²) (1/n²) = E1/n²    n = 1, 2, 3, . . .    E1 = − h̄²/2mₑa₀² = −13.6057 eV    (3.20)
where eV stands for electron volt, a unit of energy equal to 1.60218 10⁻¹⁹ J. It is
the energy that an electron picks up during a 1 volt change in electric potential.
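As a quick sanity check, not in the original text, the following Python sketch plugs the physical constants quoted in this section into (3.19) and (3.20):

    import math

    hbar = 1.05457e-34     # J s
    m_e  = 9.109e-31       # kg, the electron mass (3.14)
    e    = 1.602e-19       # C, the proton charge (3.11)
    eps0 = 8.85e-12        # C^2/J m, the permittivity of space (3.12)

    a0 = 4 * math.pi * eps0 * hbar**2 / (m_e * e**2)   # Bohr radius (3.19)
    E1 = -hbar**2 / (2 * m_e * a0**2)                  # ground state (3.20)

    print(a0)          # about 0.529e-10 m
    print(E1 / e)      # about -13.6 eV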
You may wonder why the energy only depends on the principal quantum
number n, and not also on the azimuthal quantum number l and the magnetic
quantum number m. Well, the choice of z-axis was arbitrary, so it should not

seem that strange that the physics would not depend on the angular momentum
in that direction. But that the energy does not depend on l is nontrivial: if you
solve the simpler problem of a particle stuck inside an impenetrable spherical
container, using procedures from {A.46}, the energy values depend on both n
and l. So, that is just the way it is. (It stops being true anyway if you include
relativistic effects in the Hamiltonian.)
Since the lowest possible value of the principal quantum number n is one,
the ground state of lowest energy E1 is eigenfunction ψ100 .

Key Points
⋄ Skipping a lot of math, energy eigenfunctions ψnlm and their energy
eigenvalues En have been found.
⋄ There is one eigenfunction for each set of three integer quantum num-
bers n, l, and m satisfying n > l ≥ |m|. The number n is called the
principal quantum number.
⋄ The typical length scale in the solution is called the Bohr radius a0 ,
which is about half an Ångstrom.
⋄ The derived eigenfunctions ψnlm are eigenfunctions of
• z-angular momentum, with eigenvalue Lz = mh̄;
• square angular momentum, with eigenvalue L² = l(l+1)h̄²;
• energy, with eigenvalue En = −h̄²/(2mₑa₀²n²).
⋄ The energy values only depend on the principal quantum number n.
⋄ The ground state is ψ100 .

3.2.2 Review Questions


1 Use the tables for the radial wave functions and the spherical har-
monics to write down the wave function

ψnlm = Rnl (r)Ylm (θ, φ)

for the case of the ground state ψ100 .


Check that the state is normalized. Note: ∫₀^∞ e^(−2u) u² du = ¼.
2 Use the generic expression

    ψnlm = −(2/n²) √{ (n−l−1)! / [(n+l)! a₀]³ } (2ρ/n)^l L^(2l+1)_(n+l)(2ρ/n) e^(−ρ/n) Ylm(θ, φ)

with ρ = r/a₀ and Ylm from the spherical harmonics table to find the ground state wave function ψ100. Note: the Laguerre polynomial L1(x) = 1 − x and, for any p, L^p_1 is just its p-th derivative.

3 Plug numbers into the generic expression for the energy eigenvalues,

    En = − (h̄²/2mₑa₀²) (1/n²),

where a₀ = 4πǫ₀h̄²/(mₑe²), to find the ground state energy. Express it in eV, where 1 eV equals 1.6022 10⁻¹⁹ J. Values for the physical
constants can be found at the start of this section and in the notations
section.

3.2.3 Discussion of the eigenvalues


The only energy values that the electron in the hydrogen atom can have are the
“Bohr energies” derived in the previous subsection:

    En = − (h̄²/2mₑa₀²) (1/n²)    n = 1, 2, 3, . . .

This subsection discusses the physical consequences of this result.

Figure 3.2: Spectrum of the hydrogen atom.

To aid the discussion, the allowed energies are plotted in the form of an
energy spectrum in figure 3.2. To the right of the lowest three energy levels the
values of the quantum numbers that give rise to those energy levels are listed.
The first thing that the energy spectrum illustrates is that the energy lev-
els are all negative, unlike the ones of the harmonic oscillator, which were all

positive. However, that does not mean much; it results from defining the poten-
tial energy of the harmonic oscillator to be zero at the nominal position of the
particle, while the hydrogen potential is instead defined to be zero at large dis-
tance from the nucleus. (It will be shown later, chapter 5.1.4, that the average
potential energy is twice the value of the total energy, and the average kinetic
energy is minus the total energy, making the average kinetic energy positive as
it should be.)
A more profound difference is that the energy levels of the hydrogen atom
have a maximum value, namely zero, while those of the harmonic oscillator
went all the way to infinity. It means physically that while the particle can
never escape in a harmonic oscillator, in a hydrogen atom, the electron escapes
if its total energy is greater than zero. Such a loss of the electron is called
“ionization” of the atom.
There is again a ground state of lowest energy; it has total energy

E1 = −13.6 eV (3.21)

(an eV or “electron volt” is 1.6 10⁻¹⁹ J). The ground state is the state in which
the hydrogen atom will be at absolute zero temperature. In fact, it will still be
in the ground state at room temperature, since even then the energy of heat
motion is unlikely to raise the energy level of the electron to the next higher
one, E2 .
The ionization energy of the hydrogen atom is 13.6 eV; this is the minimum
amount of energy that must be added to raise the electron from the ground
state to the state of a free electron.
If the electron is excited from the ground state to a higher but still bound
energy level, (maybe by passing a spark through hydrogen gas), it will in time
again transition back to a lower energy level. Discussion of the reasons and the
time evolution of this process will have to wait until chapter 5.2. For now, it
can be pointed out that different transitions are possible, as indicated by the
arrows in figure 3.2. They are named by their final energy level to be Lyman,
Balmer, or Paschen series transitions.
The energy lost by the electron during a transition is emitted as a quantum
of electromagnetic radiation called a photon. The most energetic photons, in
the ultraviolet range, are emitted by Lyman transitions. Balmer transitions
emit visible light and Paschen ones infrared.
The photons emitted by isolated atoms at rest must have an energy very
precisely equal to the difference in energy eigenvalues; anything else would vi-
olate the requirement of the orthodox interpretation that only the eigenvalues
are observable. And according to the “Planck-Einstein relation,” the photon’s
energy equals the angular frequency ω of its electromagnetic vibration times h̄:

En1 − En2 = h̄ω.



Thus the spectrum of the light emitted by hydrogen atoms is very distinctive and
can be identified to great accuracy. Different elements have different spectra,
and so do molecules. It all allows atoms and molecules to be correctly recognized
in a lab or out in space.
Atoms and molecules may also absorb electromagnetic energy of the same
frequencies to enter an excited state and eventually emit it again in a different
direction, chapter 5.2. In this way, they can remove these frequencies from light
that passes them on its way to earth, resulting in an absorption spectrum. Since
hydrogen is so prevalent in the universe, its energy levels as derived here are
particularly important in astronomy.
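To make the series concrete, here is a short Python sketch, added for illustration and using the constants quoted earlier, that evaluates the Planck-Einstein relation for the lowest transition of each named series:

    import math

    E1 = -13.6057 * 1.60218e-19     # ground state energy in J, from (3.20)
    hbar = 1.05457e-34              # J s
    c = 2.99792e8                   # m/s, the speed of light

    def wavelength(n1, n2):
        """Wavelength of the photon emitted in a transition n1 -> n2."""
        omega = (E1 / n1**2 - E1 / n2**2) / hbar    # Planck-Einstein
        return 2 * math.pi * c / omega

    for n2, series in ((1, "Lyman"), (2, "Balmer"), (3, "Paschen")):
        lam = wavelength(n2 + 1, n2)
        print(series, round(lam * 1e9), "nm")
    # Lyman 2->1: 122 nm (ultraviolet); Balmer 3->2: 656 nm (visible);
    # Paschen 4->3: 1875 nm (infrared)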

Key Points
⋄ The energy levels of the electron in a hydrogen atom have a highest
value. This energy is by convention taken to be the zero level.
⋄ The ground state has an energy 13.6 eV below this zero level.
⋄ If the electron in the ground state is given an additional amount of
energy that exceeds the 13.6 eV, it has enough energy to escape from
the nucleus. This is called ionization of the atom.
⋄ If the electron transitions from a bound energy state with a higher
principal quantum number n1 to a lower one n2 , it emits radiation
with an angular frequency ω given by
h̄ω = En1 − En2

⋄ Similarly, atoms with energy En2 may absorb electromagnetic energy


of such a frequency.

3.2.3 Review Questions


1 If there are infinitely many energy levels E1 , E2 , E3 , E4 , E5 , E6 , . . .,
where did they all go in the energy spectrum?
2 What is the value of energy level E2 ? And E3 ?
3 Based on the results of the previous question, what is the color of
the light emitted in a Balmer transition from energy E3 to E2 ? The
Planck-Einstein relation says that the angular frequency ω of the
emitted photon is its energy divided by h̄, and the wave length of
light is 2πc/ω where c is the speed of light. Typical wave lengths of
visible light are: violet 400 nm, indigo 445 nm, blue 475 nm, green
510 nm, yellow 570 nm, orange 590 nm, red 650 nm.
4 What is the color of the light emitted in a Balmer transition from an
energy level En with a high value of n to E2 ?

3.2.4 Discussion of the eigenfunctions


The appearance of the energy eigenstates will be of great interest in under-
standing the heavier elements and chemical bonds. This subsection describes
the most important of them.
It may be recalled from subsection 3.2.2 that there is one eigenfunction ψnlm
for each set of three integer quantum numbers. They are the principal quan-
tum number n (determining the energy of the state), the azimuthal quantum
number l (determining the square angular momentum), and the magnetic quan-
tum number m (determining the angular momentum in the chosen z-direction.)
They must satisfy the requirements that
n > l ≥ |m|
For the ground state, with the lowest energy E1 , n = 1 and hence accord-
ing to the conditions above both l and m must be zero. So the ground state
eigenfunction is ψ100 ; it is unique.
The expression for the wave function of the ground state is (from the results
of subsection 3.2.2):
    ψ100(r) = (1/√(πa₀³)) e^(−r/a₀)    (3.22)

where a₀ is called the “Bohr radius”,

    a₀ = 4πǫ₀h̄²/(mₑe²) = 0.53 10⁻¹⁰ m    (3.23)
The square magnitude of the energy states will again be displayed as grey
tones, darker regions corresponding to regions where the electron is more likely
to be found. The ground state is shown this way in figure 3.3; the electron may
be found within a blob size that is about thrice the Bohr radius, or roughly an
Ångstrom (10⁻¹⁰ m), in diameter.

Figure 3.3: Ground state wave function ψ100 of the hydrogen atom.

It is the quantum mechanical refusal of electrons to restrict themselves to a single location that gives atoms their size. If Planck’s constant h̄ had been zero, so would have been the Bohr radius, and the electron would have been in the nucleus. It would have been a very different world.
The ground state probability distribution is spherically symmetric: the prob-
ability of finding the electron at a point depends on the distance from the nu-
cleus, but not on the angular orientation relative to it.
The excited energy levels E2 , E3 , . . . are all degenerate; as the spectrum
figure 3.2 indicated, there is more than one eigenstate producing each level.
Let’s have a look at the states at energy level E2 now.
Figure 3.4 shows energy eigenfunction ψ200 . Like ψ100 , it is spherically sym-
metric. In fact, all eigenfunctions ψn00 are spherically symmetric. However,
the wave function has blown up a lot, and now separates into a small, more or
less spherical region in the center, surrounded by a second region that forms a
spherical shell. Separating the two is a radius at which there is zero probability
of finding the electron.

Figure 3.4: Eigenfunction ψ200 .

The state ψ200 is commonly referred to as the “2s” state. The 2 indicates
that it is a state with energy E2 . The “s” indicates that the azimuthal quantum
number is zero; just think “spherically symmetric.” Similarly, the ground state
ψ100 is commonly indicated as “1s”, having the lowest energy E1 .
States which have azimuthal quantum number l = 1 are called “p” states, for
some historical reason. In particular, the ψ21m states are called “2p” states. As
first example of such a state, figure 3.5 shows ψ210 . This wave function squeezes
itself close to the z-axis, which is plotted horizontally by convention. There is
zero probability of finding the electron at the vertical x, y-symmetry plane, and
maximum probability at two symmetric points on the z-axis. Since the wave
function squeezes close to the z axis, this state is often more specifically referred
to as the “2pz ” state. Think “points along the z-axis.”

Figure 3.5: Eigenfunction ψ210 , or 2pz .

Figure 3.6 shows the other two “2p” states, ψ211 and ψ21−1 . These two
states look exactly the same as far as the probability density is concerned. It
is somewhat hard to see in the figure, but they really take the shape of a torus
around the left-to-right z-axis.

Figure 3.6: Eigenfunction ψ211 (and ψ21−1 ).

Eigenfunctions ψ200 , ψ210 , ψ211 , and ψ21−1 are degenerate: they all four have
the same energy E2 = −3.4 eV. The consequence is that they are not unique.
Combinations of them can be formed that have the same energy. These combi-
nation states may be more important physically than the original eigenfunctions.
In particular, the torus-shaped eigenfunctions ψ211 and ψ21−1 are often not
very useful for descriptions of heavier elements and chemical bonds. Two states
that are more likely to be relevant here are called 2px and 2py ; they are the
combination states:
    2px: (1/√2)(−ψ211 + ψ21−1)    2py: (i/√2)(ψ211 + ψ21−1)    (3.24)

These two states are shown in figure 3.7; they look exactly like the “pointer”
state 2pz of figure 3.5, except that they squeeze along the x-axis, respectively
the y-axis, instead of along the z-axis. (Since the y-axis is pointing towards
you, 2py looks rotationally symmetric. Seen from the side, it would look like pz
in figure 3.5.)

Figure 3.7: Eigenfunctions 2px , left, and 2py , right.

Note that unlike the two original states ψ211 and ψ21−1 , the states 2px and
2py do not have a definite value of the z-component of angular momentum; the
z-component has a 50/50 uncertainty of being either +h̄ or −h̄. But that is
not important in most circumstances. What is important is that when multiple
electrons occupy the p states, mutual repulsion effects tend to push them into
the px , py , and pz states.
So, the four independent eigenfunctions at energy level E2 are best thought
of as consisting of one spherically symmetrical 2s state, and three directional
states, 2px , 2py , and 2pz , pointing along the three coordinate axes.
But even that is not always ideal; as discussed in chapter 4.11.4, for many
chemical bonds, especially those involving the important element carbon, still
different combination states called “hybrids” show up. They involve combina-
tions of the 2s and the 2p states and therefore have uncertain square angular
momentum as well.

Key Points
⋄ The typical size of eigenstates is given by the Bohr radius, making the
size of the atom of the order of an Å.
⋄ The ground state ψ100 , or 1s state, is nondegenerate: no other set of
quantum numbers n, l, m produces energy E1 .

⋄ All higher energy levels are degenerate, there is more than one eigen-
state producing that energy.
⋄ All states of the form ψn00 , including the ground state, are spherically
symmetric, and are called s states. The ground state ψ100 is the 1s
state, ψ200 is the 2s state, etcetera.
⋄ States of the form ψn1m are called p states. The basic 2p states are
ψ21−1 , ψ210 , and ψ211 .
⋄ The state ψ210 is more specifically called the 2pz state, since it squeezes
itself around the z-axis.
⋄ There are similar 2px and 2py states that squeeze around the x and y
axes. Each is a combination of ψ21−1 and ψ211 .
⋄ The four spatial states at the E2 energy level can therefore be thought
of as one spherically symmetric 2s state and three 2p pointer states
along the axes.
⋄ However, since the E2 energy level is degenerate, eigenstates of still
different shapes are likely to show up in applications.

3.2.4 Review Questions


1 At what distance r from the nucleus, expressed as a multiple of the Bohr radius a₀, does the square of the ground state wave function become less than one percent of its value at the nucleus? What is that distance expressed in Å?
2 Check from the conditions

n > l ≥ |m|

that ψ200 , ψ211 , ψ210 , and ψ21−1 are the only states of the form ψnlm
that have energy E2 . (Of course, all their combinations, like 2px
and 2py , have energy E2 too, but they are not simply of the form
ψnlm , but combinations of the “basic” solutions ψ200 , ψ211 , ψ210 , and
ψ21−1 .)
3 Check that the states

    2px = (1/√2)(−ψ211 + ψ21−1)    2py = (i/√2)(ψ211 + ψ21−1)

are properly normalized.



3.3 Expectation Value and Standard Deviation


It is a striking consequence of quantum mechanics that physical quantities may
not have a value. This occurs whenever the wave function is not an eigenfunction
of the quantity of interest. For example, the ground state of the hydrogen atom
is not an eigenfunction of the position operator x̂, so the x-position of the
electron does not have a value. According to the orthodox interpretation, it
cannot be predicted with certainty what a measurement of such a quantity will
produce.
However, it is possible to say something if the same measurement is done
on a large number of systems that are all the same before the measurement.
An example would be x-position measurements on a large number of hydrogen
atoms that are all in the ground state before the measurement. In that case, it
is relatively straightforward to predict what the average, or “expectation value,”
of all the measurements will be.
The expectation value is certainly not a replacement for the classical value
of physical quantities. For example, for the hydrogen atom in the ground state,
the expectation position of the electron is in the nucleus by symmetry. Yet
because the nucleus is so small, measurements will never find it there! (The
typical measurement will find it a distance comparable to the Bohr radius away.)
Actually, that is good news, because if the electron would be in the nucleus as
a classical particle, its potential energy would be almost minus infinity instead
of the correct value of about -27 eV. It would be a very different universe. Still,
having an expectation value is of course better than having no information at
all.
The average discrepancy between the expectation value and the actual mea-
surements is called the “standard deviation.” In the hydrogen atom example,
where typically the electron is found a distance comparable to the Bohr radius
away from the nucleus, the standard deviation in the x-position turns out to be
exactly one Bohr radius. (The same of course for the standard deviations in the
y- and z-positions away from the nucleus.)
In general, the standard deviation is the quantitative measure for how much
uncertainty there is in a physical value. If the standard deviation is very small
compared to what you are interested in, it is probably OK to use the expectation
value as a classical value. It is perfectly fine to say that the electron of the
hydrogen atom that you are measuring is in your lab but it is not OK to say
that it has countless electron volts of negative potential energy because it is in
the nucleus.
This section discusses how to find expectation values and standard deviations
after a brief introduction to the underlying ideas of statistics.

Key Points

⋄ The expectation value is the average value obtained when doing mea-
surements on a large number of initially identical systems. It is as
close as quantum mechanics can come to having classical values for
uncertain physical quantities.
⋄ The standard deviation is how far the individual measurements on av-
erage deviate from the expectation value. It is the quantitative measure
of uncertainty in quantum mechanics.

3.3.1 Statistics of a die


Since it seems to us humans as if, in Einstein’s words, God is playing dice with
the universe, it may be a worthwhile idea to examine the statistics of a die first.
For a fair die, each of the six numbers will, on average, show up a fraction
1/6 of the number of throws. In other words, each face has a probability of 1/6.
The average value of a large number of throws is called the expectation value.
For a fair die, the expectation value is 3.5. After all, number 1 will show up in
about 1/6 of the throws, as will numbers 2 through 6, so the average is
    [(number of throws) × (⅙·1 + ⅙·2 + ⅙·3 + ⅙·4 + ⅙·5 + ⅙·6)] / (number of throws) = 3.5
The general rule to get the expectation value is to sum the probability for each
value times the value. In this example:
    ⅙·1 + ⅙·2 + ⅙·3 + ⅙·4 + ⅙·5 + ⅙·6 = 3.5

Note that the name “expectation value” is very poorly chosen. Even though
the average value of a lot of throws will be 3.5, you would surely not expect to
throw 3.5. But it is probably too late to change the name now.
The maximum possible deviation from the expectation value does of course
occur when you throw a 1 or a 6; the absolute deviation is then |1 − 3.5| =
|6 − 3.5| = 2.5. It means that the possible values produced by a throw can
deviate as much as 2.5 from the expectation value.
However, the maximum possible deviation from the average is not a useful
concept for quantities like position, or for the energy levels of the harmonic
oscillator, where the possible values extend all the way to infinity. So, instead
of the maximum deviation from the expectation value, some average deviation
is better. The most useful of those is called the “standard deviation”, denoted
by σ. It is found in two steps: first the average square deviation from the
expectation value is computed, and then a square root is taken of that. For the
die that works out to be:

    σ = [⅙(1 − 3.5)² + ⅙(2 − 3.5)² + ⅙(3 − 3.5)² + ⅙(4 − 3.5)² + ⅙(5 − 3.5)² + ⅙(6 − 3.5)²]^(1/2) = 1.71
On average then, the throws are 1.71 points off from 3.5.
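The two-step recipe is easy to mirror in a few lines of Python; this sketch, added for illustration, reproduces the numbers above:

    import math

    values = [1, 2, 3, 4, 5, 6]
    p = 1.0 / 6.0                  # probability of each face of a fair die

    expectation = sum(p * v for v in values)
    variance = sum(p * (v - expectation)**2 for v in values)
    sigma = math.sqrt(variance)

    print(expectation)             # 3.5
    print(sigma)                   # about 1.71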

Key Points
⋄ The expectation value is obtained by summing the possible values times
their probabilities.
⋄ To get the standard deviation, first find the average square deviation
from the expectation value, then take a square root of that.

3.3.1 Review Questions


1 Suppose you toss a coin a large number of times, and count heads as
one, tails as two. What will be the expectation value?
2 Continuing this example, what will be the maximum deviation?
3 Continuing this example, what will be the standard deviation?
4 Have I got a die for you! By means of a small piece of lead integrated
into its light-weight structure, it does away with that old-fashioned
uncertainty. It comes up six every time! What will be the expectation
value of your throws? What will be the standard deviation?

3.3.2 Statistics of quantum operators


The expectation values of the operators of quantum mechanics are defined in
the same way as those for the die.
Consider an arbitrary physical quantity, call it a, and assume it has an
associated operator A. For example, if the physical quantity a is the total
energy E, A will be the Hamiltonian H.
The equivalent of the face values of the die are the values that the quantity a
can take, and according to the orthodox interpretation, that are the eigenvalues
a1 , a2 , a 3 , . . .
of the operator A.
Next, the probabilities of getting those values are according to quantum
mechanics the square magnitudes of the coefficients when the wave function is
written in terms of the eigenfunctions of A. In other words, if α1 , α2 , α3 , . . . are
the eigenfunctions of operator A, and the wave function is
Ψ = c1 α1 + c2 α2 + c3 α3 + . . .

then |c1|² is the probability of value a1, |c2|² the probability of value a2, etcetera.
The expectation value is written as ⟨a⟩, or as ⟨A⟩, whichever is more appealing. Like for the die, it is found as the sum of the probability of each value times the value:

    ⟨a⟩ = |c1|² a1 + |c2|² a2 + |c3|² a3 + . . .
Of course, the eigenfunctions might be numbered using multiple indices;
that does not really make a difference. For example, the eigenfunctions ψnlm of
the hydrogen atom are numbered with three indices. In that case, if the wave
function of the hydrogen atom is
Ψ = c100 ψ100 + c200 ψ200 + c210 ψ210 + c211 ψ211 + c21−1 ψ21−1 + c300 ψ300 + . . .
then the expectation value for energy will be, noting that E1 = −13.6 eV, E2 = −3.4 eV, . . .:

    ⟨E⟩ = −|c100|² 13.6 eV − |c200|² 3.4 eV − |c210|² 3.4 eV − |c211|² 3.4 eV − . . .

Also, the expectation value of the square angular momentum will be, recalling that its eigenvalues are l(l+1)h̄²,

    ⟨L²⟩ = |c100|² 0 + |c200|² 0 + |c210|² 2h̄² + |c211|² 2h̄² + |c21−1|² 2h̄² + |c300|² 0 + . . .

And the expectation value of the z-component of angular momentum will be, recalling that its eigenvalues are mh̄,

    ⟨Lz⟩ = |c100|² 0 + |c200|² 0 + |c210|² 0 + |c211|² h̄ − |c21−1|² h̄ + |c300|² 0 + . . .
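Schematically, each expectation value is just a probability-weighted sum of eigenvalues. The sketch below, an added illustration with made-up coefficients (any set whose square magnitudes sum to one would do), works in eV and in units of h̄:

    hbar = 1.0     # work in units of hbar

    # Hypothetical coefficients c_nlm of a wave function expanded in the
    # hydrogen eigenfunctions psi_nlm; the keys are (n, l, m).
    c = {(1, 0, 0): 0.8, (2, 0, 0): 0.4, (2, 1, 1): 0.3, (2, 1, -1): 0.33166}
    assert abs(sum(abs(v)**2 for v in c.values()) - 1.0) < 1e-4  # normalized

    E  = sum(abs(v)**2 * (-13.6057 / n**2)     for (n, l, m), v in c.items())
    L2 = sum(abs(v)**2 * l * (l + 1) * hbar**2 for (n, l, m), v in c.items())
    Lz = sum(abs(v)**2 * m * hbar              for (n, l, m), v in c.items())

    print(E, L2, Lz)    # <E> in eV, <L^2> in hbar^2, <Lz> in hbar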

Key Points
⋄ The expectation value of a physical quantity is found by summing its
eigenvalues times the probability of measuring that eigenvalue.
⋄ To find the probabilities of the eigenvalues, the wave function Ψ can
be written in terms of the eigenfunctions of the physical quantity. The
probabilities will be the square magnitudes of the coefficients of the
eigenfunctions.

3.3.2 Review Questions


1 The 2px pointer state of the hydrogen atom was defined as
    (1/√2)(−ψ211 + ψ21−1).
What are the expectation values of energy, square angular momen-
tum, and z-angular momentum for this state?
2 Continuing the previous question, what are the standard deviations
in energy, square angular momentum, and z-angular momentum?

3.3.3 Simplified expressions


The procedure described in the previous section to find the expectation value
of a quantity is unwieldy: it requires that first the eigenfunctions of the quan-
tity are found, and next that the wave function is written in terms of those
eigenfunctions. There is a quicker way.
Assume that you want to find the expectation value, ⟨a⟩ or ⟨A⟩, of some quantity a with associated operator A. The simpler way to do it is as an inner product:

    ⟨A⟩ = ⟨Ψ|A|Ψ⟩.    (3.25)

(Recall that ⟨Ψ|A|Ψ⟩ is just the inner product ⟨Ψ|AΨ⟩; the additional separating bar is often visually convenient, though.) This formula for the expectation value is easily remembered as “leaving out Ψ” from the inner product bracket. The reason that ⟨Ψ|A|Ψ⟩ works for getting the expectation value is given in note {A.18}.
The simplified expression for the expectation value can also be used to find the standard deviation, σA or σa:

    σA = √⟨(A − ⟨A⟩)²⟩    (3.26)

where ⟨(A − ⟨A⟩)²⟩ is the inner product ⟨Ψ|(A − ⟨A⟩)²|Ψ⟩.

Key Points
⋄ The expectation value of a quantity a with operator A can be found as ⟨A⟩ = ⟨Ψ|A|Ψ⟩.
⋄ Similarly, the standard deviation can be found as σA = √⟨(A − ⟨A⟩)²⟩.

3.3.3 Review Questions


1 The 2px pointer state of the hydrogen atom was defined as
    (1/√2)(−ψ211 + ψ21−1).
where both ψ211 and ψ21−1 are eigenfunctions of the total energy
Hamiltonian H with eigenvalue E2 and of square angular momentum L̂² with eigenvalue 2h̄²; however, ψ211 is an eigenfunction of z-angular momentum L̂z with eigenvalue h̄, while ψ21−1 is one with
eigenvalue −h̄. Evaluate the expectation values of energy, square an-
gular momentum, and z-angular momentum in the 2px state using
inner products. (Of course, since 2px is already written out in terms
of the eigenfunctions, there is no simplification in this case.)

2 Continuing the previous question, evaluate the standard deviations in energy, square angular momentum, and z-angular momentum in the 2px state using inner products.

3.3.4 Some examples


This section gives some examples of expectation values and standard deviations
for known wave functions.
First consider the expectation value of the energy of the hydrogen atom in
its ground state ψ100 . The ground state is an energy eigenfunction with the
lowest possible energy level E1 = −13.6 eV as eigenvalue. So, according to
the orthodox interpretation, energy measurements of the ground state can only
return the value E1 , with 100% certainty.
Clearly, if all measurements return the value E1 , then the average value must
be that value too. So the expectation value hEi should be E1 . In addition, the
measurements will never deviate from the value E1 , so the standard deviation
σE should be zero.
It is instructive to check those conclusions using the simplified expressions
for expectation values and standard deviations from the previous subsection.
The expectation value can be found as:

    ⟨E⟩ = ⟨H⟩ = ⟨Ψ|H|Ψ⟩

In the ground state


Ψ = c100 ψ100
where c100 is a constant of magnitude one, and ψ100 is the ground state eigen-
function of the Hamiltonian H with the lowest eigenvalue E1 . Substituting this
Ψ, the expectation value of the energy becomes

    ⟨E⟩ = ⟨c100ψ100|H c100ψ100⟩ = c100* c100 ⟨ψ100|E1ψ100⟩ = c100* c100 E1 ⟨ψ100|ψ100⟩

since Hψ100 = E1ψ100 by the definition of eigenfunction. Note that constants come out of the inner product bra as their complex conjugate, but unchanged
out of the ket. The final expression shows that ⟨E⟩ = E1 as it should, since c100 has magnitude one, while ⟨ψ100|ψ100⟩ = 1 because proper eigenfunctions are
normalized to one. So the expectation value checks out OK.
The standard deviation

    σE = √⟨(H − ⟨E⟩)²⟩

checks out OK too:

    σE = √⟨ψ100|(H − E1)² ψ100⟩

and since Hψ100 = E1 ψ100 , you have that (H − E1 )ψ100 is zero, so σE is zero as
it should be.
In general,
If the wave function is an eigenfunction of the measured variable, the
expectation value will be the eigenvalue, and the standard deviation
will be zero.
To get uncertainty, in other words, a nonzero standard deviation, the wave
function should not be an eigenfunction of the quantity being measured.
For example, the ground state of the hydrogen atom is an energy eigenfunc-
tion, but not an eigenfunction of the position operators. The expectation value
for the position coordinate x can still be found as an inner product:
    ⟨x⟩ = ⟨ψ100|x̂ ψ100⟩ = ∫∫∫ x |ψ100|² dx dy dz.

This integral is zero. The reason is that |ψ100 |2 , shown as grey scale in figure
3.3, is symmetric around x = 0; it has the same value at a negative value of x as
at the corresponding positive value. Since the factor x in the integrand changes
sign, integration values at negative x cancel out against those at positive x. So ⟨x⟩ = 0.
The position coordinates y and z go the same way, and it follows that the
expectation value of position is at (x, y, z) = (0, 0, 0); the expectation position
of the electron is in the nucleus.
In fact, all basic energy eigenfunctions ψnlm of the hydrogen atom, like figures
3.3, 3.4, 3.5, 3.6, as well as the combination states 2px and 2py of figure 3.7,
have a symmetric probability distribution, and all have the expectation value of
position in the nucleus. (For the hybrid states discussed later, that is no longer
true.)
But don’t really expect to ever find the electron in the negligible small
nucleus! You will find it at locations that are on average one standard deviation
away from it. For example, in the ground state
    σx = √⟨(x − ⟨x⟩)²⟩ = √⟨x²⟩ = √( ∫∫∫ x² |ψ100(x, y, z)|² dx dy dz )

which is positive since the integrand is everywhere positive. So, the results of
x-position measurements are uncertain, even though they average out to the
nominal position x = 0. The negative experimental results for x average away
against the positive ones. The same is true in the y- and z-directions. Thus the
expectation position is the nucleus even though the electron will really
never be found there.
If you actually do the integral above, (it is not difficult in spherical coordi-
nates,) you find that the standard deviation in x equals the Bohr radius. So on

average, the electron will be found at an x-distance equal to the Bohr radius
away from the nucleus. Similar deviations will occur in the y and z directions.
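For the spherically symmetric ground state, ⟨x²⟩ = ⟨r²⟩/3, so the triple integral reduces to one over spherical shells. The following numerical check, added here and assuming numpy and scipy, confirms that σx comes out as exactly one Bohr radius (working in units where a₀ = 1):

    import numpy as np
    from scipy.integrate import quad

    a0 = 1.0
    psi2 = lambda r: np.exp(-2.0 * r / a0) / (np.pi * a0**3)   # |psi100|^2

    # <x^2> = <r^2>/3; integrate |psi|^2 r^2 over shells of area 4 pi r^2
    r2_mean, _ = quad(lambda r: r**2 * psi2(r) * 4.0 * np.pi * r**2,
                      0.0, np.inf)
    sigma_x = np.sqrt(r2_mean / 3.0)
    print(sigma_x)      # 1.0: one Bohr radius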
The expectation value of linear momentum in the ground state can be found from the linear momentum operator p̂x = (h̄/i) ∂/∂x:

    ⟨px⟩ = ⟨ψ100|p̂x ψ100⟩ = ∫∫∫ ψ100 (h̄/i) (∂ψ100/∂x) dx dy dz = (h̄/i) ∫∫∫ ∂(½ψ100²)/∂x dx dy dz
This is again zero, since differentiation turns a symmetric function into an an-
tisymmetric one, one which changes sign between negative and corresponding
positive positions. Alternatively, just perform integration with respect to x,
noting that the wave function is zero at infinity.
More generally, the expectation value for linear momentum is zero for all
the energy eigenfunctions; that is a consequence of Ehrenfest’s theorem cov-
ered in chapter 5.1. The standard deviations are again nonzero, so that linear
momentum is uncertain like position is.
All these observations carry over in the same way to the eigenfunctions
ψnx ny nz of the harmonic oscillator. They too all have the expectation values of
position at the origin, in other words in the nucleus, and the expectation linear
momenta equal to zero.
If combinations of energy eigenfunctions are considered, it changes. Such
combinations may have nontrivial expectation positions and linear momenta.
A discussion will have to wait until chapter 5.

Key Points
⋄ Examples of certain and uncertain quantities were given for example
wave functions.
⋄ A quantity is certain when the wave function is an eigenfunction of
that quantity.

3.4 The Commutator


As the previous section discussed, the standard deviation σ is a measure of
the uncertainty of a property of a quantum system. The larger the standard
deviation, the farther typical measurements stray from the expected average
value. Quantum mechanics often requires a minimum amount of uncertainty
when more than one quantity is involved, like position and linear momentum
in Heisenberg’s uncertainty principle. In general, this amount of uncertainty is
related to an important mathematical object called the “commutator”, to be
discussed in this section.

3.4.1 Commuting operators


First, note that in many cases there is no fundamental prohibition against more
than one quantity having a definite value at the same time. For example, if the
electron of the hydrogen atom is in a ψnlm eigenstate, its total energy, square
angular momentum, and z-component of angular momentum all have precise
values at the same time.
More generally, two different quantities with operators A and B have precise
values if the wave function is an eigenfunction of both A and B. So, the question
whether two quantities can be certain at the same time is really whether their
operators A and B have common eigenfunctions. And it turns out that the
answer has to do with whether these operators “commute”, in other words, on
whether their order can be reversed as in AB = BA.
In particular, {A.19}:

Iff two Hermitian operators commute, there is a complete set of eigenfunctions that is common to them both.

(For more than two operators, each operator has to commute with all others.)
For example, the operators Hx and Hy of the harmonic oscillator of chapter
2.6.2 commute:
" #" #
h̄2 ∂ 2 h̄2 ∂ 2
Hx Hy Ψ = − + 1 cx2 − + 1 cy 2 Ψ
2m ∂x2 2 2m ∂y 2 2
à !2
h̄2 ∂4Ψ h̄2 ∂ 2 21 cy 2 Ψ 1 2 h̄2 ∂ 2 Ψ 1 2 1 2
= − − 2 cx + 2 cx 2 cy Ψ
2m ∂x2 ∂y 2 2m ∂x2 2m ∂y 2
= Hy Hx Ψ

This is true since it makes no difference whether you differentiate Ψ first with
respect to x and then with respect to y or vice versa, and since the ½cy² can be pulled in front of the x-differentiations and the ½cx² can be pushed inside the
y-differentiations, and since multiplications can always be done in any order.
The same way, Hz commutes with Hx and Hy , and that means that H com-
mutes with them all, since H is just their sum. So, these four operators should
have a common set of eigenfunctions, and they do: they are the eigenfunctions ψnx ny nz derived in chapter 2.6.2.
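If you do not feel like writing out the differentiations by hand, a computer algebra system will confirm the conclusion. The sketch below, not part of the original text and assuming sympy is installed, applies Hx Hy − Hy Hx to an arbitrary smooth function and finds zero:

    import sympy as sp

    x, y = sp.symbols('x y')
    hbar, m, c = sp.symbols('hbar m c', positive=True)
    Psi = sp.Function('Psi')(x, y)    # an arbitrary smooth wave function

    Hx = lambda f: -hbar**2/(2*m) * sp.diff(f, x, 2) + c*x**2/2 * f
    Hy = lambda f: -hbar**2/(2*m) * sp.diff(f, y, 2) + c*y**2/2 * f

    # the commutator [Hx, Hy] applied to Psi cancels term by term
    print(sp.expand(Hx(Hy(Psi)) - Hy(Hx(Psi))))    # prints 0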
Similarly, for the hydrogen atom, the total energy Hamiltonian H, the square
angular momentum operator L̂² and the z-component of angular momentum L̂z all commute, and they have the common set of eigenfunctions ψnlm.
Note that such eigenfunctions are not necessarily the only game in town.
As a counter-example, for the hydrogen atom H, L̂², and the x-component of angular momentum L̂x also all commute, and they too have a common set of

eigenfunctions. But that will not be the ψnlm, since L̂x and L̂z do not commute.
(It will however be the ψnlm after you rotate them all 90 degrees around the
y-axis.) It would certainly be simpler mathematically if each operator had just
one unique set of eigenfunctions, but nature does not cooperate.

Key Points
⋄ Operators commute if you can change their order, as in AB = BA.
⋄ For commuting operators, a common set of eigenfunctions exists.
⋄ For those eigenfunctions, the physical quantities corresponding to the
commuting operators all have precise values at the same time.

3.4.1 Review Questions


1 The pointer state
    2px = (1/√2)(−ψ211 + ψ21−1).
is one of the eigenstates that H, L̂², and L̂x have in common. Check that it is not an eigenstate that H, L̂², and L̂z have in common.

3.4.2 Noncommuting operators and their commutator


Two quantities with operators that do not commute cannot in general have
definite values at the same time. If one has a value, the other is in general
uncertain.
The qualification “in general” is needed because there may be exceptions.
The angular momentum operators do not commute, but it is still possible for
the angular momentum to be zero in all three directions. But as soon as the
angular momentum in any direction is nonzero, only one component of angular
momentum can have a definite value.
A measure for the amount to which two operators A and B do not commute is
the difference between AB and BA; this difference is called their “commutator”
[A, B]:
[A, B] ≡ AB − BA (3.27)
A nonzero commutator [A, B] demands a minimum amount of uncertainty
in the corresponding quantities a and b. It can be shown, {A.20}, that the
uncertainties, or standard deviations, σa in a and σb in b are at least so large
that:
    σa σb ≥ ½ |⟨[A, B]⟩|    (3.28)
This equation is called the “generalized uncertainty relationship”.

Key Points
⋄ The commutator of two operators A and B equals AB − BA and is
written as [A, B].
⋄ The product of the uncertainties in two quantities is at least one half
the magnitude of the expectation value of their commutator.

3.4.3 The Heisenberg uncertainty relationship


This section will work out the uncertainty relationship of the previous subsection
for the position and linear momentum in an arbitrary direction. The result will
be a precise mathematical statement of the Heisenberg uncertainty principle.
To be specific, the arbitrary direction will be taken as the x-axis, so the
position operator will be x̂, and the linear momentum operator p̂x = (h̄/i) ∂/∂x. These two operators do not commute; p̂x x̂Ψ is simply not the same as x̂p̂x Ψ: p̂x x̂Ψ means multiply function Ψ by x to get the product function xΨ and then apply p̂x on that product, while x̂p̂x Ψ means apply p̂x on Ψ and then multiply the resulting function by x. The difference is found from writing it out:

    p̂x x̂Ψ = (h̄/i) ∂(xΨ)/∂x = (h̄/i) Ψ + (h̄/i) x (∂Ψ/∂x) = −ih̄Ψ + x̂p̂x Ψ
the second equality resulting from differentiating out the product.
Comparing start and end shows that the difference between x̂p̂x and p̂x x̂ is not zero, but ih̄. By definition, this difference is their commutator:

    [x̂, p̂x] = ih̄    (3.29)

This important result is called the “canonical commutation relation.” The com-
mutator of position and linear momentum in the same direction is the nonzero
constant ih̄.
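The little computation above can be delegated to a computer algebra system too. The following sketch, an added illustration assuming sympy is available, applies x̂p̂x − p̂x x̂ to an arbitrary function ψ(x) and recovers ih̄ψ:

    import sympy as sp

    x, hbar = sp.symbols('x hbar')
    psi = sp.Function('psi')(x)

    x_op = lambda f: x * f                          # position operator
    p_op = lambda f: (hbar / sp.I) * sp.diff(f, x)  # momentum operator

    print(sp.simplify(x_op(p_op(psi)) - p_op(x_op(psi))))  # I*hbar*psi(x)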
Because the commutator is nonzero, there must be nonzero uncertainty in-
volved. Indeed, the generalized uncertainty relationship of the previous subsec-
tion becomes in this case:
    σx σpx ≥ ½h̄    (3.30)
This is the uncertainty relationship as first formulated by Heisenberg.
It implies that when the uncertainty in position σx is narrowed down to zero,
the uncertainty in momentum σpx must become infinite to keep their product
nonzero, and vice versa. More generally, you can narrow down the position of
a particle and you can narrow down its momentum. But you can never reduce
the product of the uncertainties σx and σpx below ½h̄, whatever you do.
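As a numerical illustration, added here and not part of the original text: a Gaussian wave function achieves the minimum product exactly, whatever width s you pick, as the grid computation below confirms:

    import numpy as np

    hbar = 1.0
    s = 0.7                          # width parameter of the Gaussian
    x = np.linspace(-30.0, 30.0, 200001)
    dx = x[1] - x[0]
    psi = (2*np.pi*s**2)**(-0.25) * np.exp(-x**2 / (4*s**2))  # normalized

    sigma_x = np.sqrt(np.sum(x**2 * psi**2) * dx)     # <x> = 0 by symmetry
    dpsi = np.gradient(psi, dx)
    sigma_p = np.sqrt(np.sum(hbar**2 * dpsi**2) * dx) # <p> = 0 too
    print(sigma_x * sigma_p / hbar)  # 0.5: the Gaussian saturates (3.30)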

It should be noted that the uncertainty relationship is often written as


∆px ∆x ≥ ½h̄ or even as ∆px ∆x ≈ h̄, where ∆p and ∆x are taken to be vaguely
described “uncertainties” in momentum and position, rather than rigorously
defined standard deviations. And people write a corresponding uncertainty re-
lationship for time, ∆E∆t ≥ ½h̄, because relativity suggests that time should
be treated just like space. But note that unlike the linear momentum operator,
the Hamiltonian is not at all universal. So, you might guess that the definition
of the “uncertainty” ∆t in time would not be universal either, and you would
be right. One common definition will be given later in chapter 5.1.4.

Key Points
⋄ The canonical commutator [x̂, p̂x] equals ih̄.
⋄ If either the uncertainty in position in a given direction or the uncer-
tainty in linear momentum in that direction is narrowed down to zero,
the other uncertainty blows up.
⋄ The product of the two uncertainties is at least the constant ½h̄.

3.4.3 Review Questions


1 This sounds serious! If I am driving my car, the police require me to know my speed (linear momentum). Also, I would like to know
where I am. But neither is possible according to quantum mechanics.

3.4.4 Commutator reference [Reference]


It is a fact of life in quantum mechanics that commutators pop up all over the
place. Not just in uncertainty relations, but also in the time evolution of expec-
tation values, in angular momentum, and in quantum field theory, the advanced
theory of quantum mechanics used in solids and relativistic applications. This
section can make your life easier dealing with them. Browse through it to see
what is there. Then come back when you need it.
Recall the definition of the commutator [A, B] of any two operators A and
B:
[A, B] = AB − BA (3.31)
By this very definition, the commutator is zero for any two operators A1 and
A2 that commute, (whose order can be interchanged):

[A1 , A2 ] = 0 if A1 and A2 commute; A1 A2 = A2 A1 . (3.32)



If operators all commute, all their products commute too:

    [A1 A2 . . . Ak, Ak+1 . . . An] = 0 if A1, A2, . . . , Ak, Ak+1, . . . , An all commute.    (3.33)
Everything commutes with itself, of course:

[A, A] = 0, (3.34)

and everything commutes with a numerical constant; if A is an operator and a is some number, then:
[A, a] = [a, A] = 0. (3.35)
The commutator is “antisymmetric”; in simpler words, if you interchange the sides, it changes sign, {A.21}:

[B, A] = −[A, B]. (3.36)

For the rest however, linear combinations multiply out just like you would ex-
pect:

[aA + bB, cC + dD] = ac[A, C] + ad[A, D] + bc[B, C] + bd[B, D], (3.37)

(in which it is assumed that A, B, C, and D are operators, and a, b, c, and d numerical constants.)
To deal with commutators that involve products of operators, the rule to
remember is: “the first factor comes out at the front of the commutator, the
second at the back”. More precisely:

    [AB, . . .] = A[B, . . .] + [A, . . .]B,    [. . . , AB] = A[. . . , B] + [. . . , A]B.    (3.38)
So, if A or B commutes with the other side of the commutator, it can simply be taken out at its side (the second commutator will be zero). For example,

[A1 B, A2 ] = A1 [B, A2 ], [BA1 , A2 ] = [B, A2 ]A1

if A1 and A2 commute.
Now from the general to the specific. Because changing sides in a commu-
tator merely changes its sign, from here on only one of the two possibilities will
be shown. First the position operators all mutually commute:

    [x̂, ŷ] = [ŷ, ẑ] = [ẑ, x̂] = 0    (3.39)

as do position-dependent operators such as a potential energy V (x, y, z):

    [x̂, V(x, y, z)] = [ŷ, V(x, y, z)] = [ẑ, V(x, y, z)] = 0    (3.40)

This illustrates that if a set of operators all commute, then all combinations of
those operators commute too.
The linear momentum operators all mutually commute:

    [p̂x, p̂y] = [p̂y, p̂z] = [p̂z, p̂x] = 0    (3.41)

However, position operators and linear momentum operators in the same direc-
tion do not commute; instead:

    [x̂, p̂x] = [ŷ, p̂y] = [ẑ, p̂z] = ih̄    (3.42)

As seen in the previous subsection, this lack of commutation causes the Heisen-
berg uncertainty principle. Position and linear momentum operators in different
directions do commute:

    [x̂, p̂y] = [x̂, p̂z] = [ŷ, p̂z] = [ŷ, p̂x] = [ẑ, p̂x] = [ẑ, p̂y] = 0    (3.43)

A generalization that is frequently very helpful is:


    [f, p̂x] = ih̄ ∂f/∂x    [f, p̂y] = ih̄ ∂f/∂y    [f, p̂z] = ih̄ ∂f/∂z    (3.44)
where f is any function of x, y, and z.
Unlike linear momentum operators, angular momentum operators do not
mutually commute. The commutators are given by the so-called “fundamental commutation relations:”

    [L̂x, L̂y] = ih̄L̂z    [L̂y, L̂z] = ih̄L̂x    [L̂z, L̂x] = ih̄L̂y    (3.45)

Note the . . . xyzxyz . . . order of the indices that produces positive signs; a reversed . . . zyxzy . . . order adds a minus sign. For example, [L̂z, L̂y] = −ih̄L̂x because y following z is in reversed order.
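These relations are easy to check in a finite basis. The sketch below, an added illustration using the standard 3 × 3 matrix representations of the angular momentum components for l = 1 (in units of h̄), verifies (3.45) and the sign reversal:

    import numpy as np

    hbar = 1.0
    s = 1 / np.sqrt(2)
    Lx = hbar * s * np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=complex)
    Ly = hbar * s * np.array([[0, -1j, 0], [1j, 0, -1j], [0, 1j, 0]])
    Lz = hbar * np.diag([1, 0, -1]).astype(complex)

    print(np.allclose(Lx @ Ly - Ly @ Lx, 1j * hbar * Lz))   # True
    print(np.allclose(Lz @ Ly - Ly @ Lz, -1j * hbar * Lx))  # True: reversed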
The angular momentum components do all commute with the square angular
momentum operator:
    [L̂x, L̂²] = [L̂y, L̂²] = [L̂z, L̂²] = 0    where L̂² = L̂x² + L̂y² + L̂z²    (3.46)

Just the opposite of the situation for linear momentum, position and angular
momentum operators in the same direction commute,
    [x̂, L̂x] = [ŷ, L̂y] = [ẑ, L̂z] = 0    (3.47)

but those in different directions do not:


b ] = [L
[xb, L b , yb] = ih̄zb [yb, L
b ] = [L
b , zb] = ih̄x b ] = [L
b [zb, L b ,x
y x z y x z b] = ih̄y
b (3.48)

Square position commutes with all components of angular momentum,


    [r̂², L̂x] = [r̂², L̂y] = [r̂², L̂z] = [r̂², L̂²] = 0    (3.49)

The commutator between position and square angular momentum is, using vec-
tor notation for conciseness,
    [r⃗, L̂²] = −2h̄² r⃗ − 2ih̄ r⃗ × L⃗ = −2h̄² r⃗ + 2ih̄ (r⃗ · r⃗) p⃗ − 2ih̄ r⃗ (r⃗ · p⃗)    (3.50)
The commutators between linear and angular momentum are very similar
to the ones between position and angular momentum:
    [p̂x, L̂x] = [p̂y, L̂y] = [p̂z, L̂z] = 0    (3.51)

    [p̂x, L̂y] = [L̂x, p̂y] = ih̄p̂z    [p̂y, L̂z] = [L̂y, p̂z] = ih̄p̂x    [p̂z, L̂x] = [L̂z, p̂x] = ih̄p̂y    (3.52)

    [p̂², L̂x] = [p̂², L̂y] = [p̂², L̂z] = [p̂², L̂²] = 0    (3.53)

    [p⃗, L̂²] = −2h̄² p⃗ − 2ih̄ p⃗ × L⃗ = 2h̄² p⃗ + 2ih̄ (r⃗ · p⃗) p⃗ − 2ih̄ r⃗ (p⃗ · p⃗)    (3.54)
b ~pb · ~pb) (3.54)
The following commutators are also useful:
b
~ Lb 2 ] = 2ih̄~r L
b2 b 2 ], L
b 2 ] = 2h̄2 (~r L
b2 + L
b 2~r )
[~r × L, [[~r, L (3.55)
Commutators involving spin are discussed in a later chapter, 4.5.3.

Key Points
⋄ Rules for evaluating commutators were given.
⋄ Return to this subsection if you need to figure out some commutator
or the other.

3.5 The Hydrogen Molecular Ion


The hydrogen atom studied earlier is where full theoretical analysis stops.
Larger systems are just too difficult to solve analytically. Yet, it is often quite
possible to understand the solution of such systems using approximate arguments. As an example, this section considers the H₂⁺ ion. This ion consists of
two protons and a single electron circling them. It will be shown that a chemical
bond forms that holds the ion together. The bond is a “covalent” one, in which
the protons share the electron.
The general approach will be to compute the energy of the ion, and to show
that the energy is less when the protons are sharing the electron as a molecule
than when they are far apart. This must mean that the molecule is stable:
energy must be expended to take the protons apart.
The approximate technique to be used to find the state of lowest energy is
a basic example of what is called a “variational method.”

3.5.1 The Hamiltonian


First the Hamiltonian is needed. Since the protons are so much heavier than
the electron, to good approximation they can be considered fixed points in the
energy computation. That is called the “Born-Oppenheimer approximation”.
In this approximation, only the Hamiltonian of the electron is needed. It makes
things a lot simpler, which is why the Born-Oppenheimer approximation is a
common assumption in applications of quantum mechanics.
Compared to the Hamiltonian of the hydrogen atom of section 3.2.1, there
are now two terms to the potential energy, the electron experiencing attraction
to both protons:
    H = −(h̄²/2mₑ) ∇² − e²/(4πǫ₀ rL) − e²/(4πǫ₀ rR)    (3.56)
where rL and rR are the distances from the electron to the left and right protons,

    rL ≡ |r⃗ − r⃗Lp|    rR ≡ |r⃗ − r⃗Rp|    (3.57)

with r⃗Lp the position of the left proton and r⃗Rp that of the right one.
The hydrogen ion in the Born-Oppenheimer approximation can be solved
analytically using “prolate spheroidal coordinates.” However, approximations
will be used here. For one thing, you learn more about the physics that way.

Key Points
⋄ In the Born-Oppenheimer approximation, the electronic structure is
computed assuming that the nuclei are at fixed positions.
⋄ The Hamiltonian in the Born-Oppenheimer approximation has been
found. It is above.

3.5.2 Energy when fully dissociated


The fully dissociated state is when the protons are very far apart and there
is no coherent molecule, as in figure 3.8. The best the electron can do under
those circumstances is to combine with either proton, say the left one, and form
a hydrogen atom in the ground state of lowest energy. In that case the right
proton will be alone. According to the solution for the hydrogen atom, the
electron loses 13.6 eV of energy by going in the ground state around the left
proton. Of course, it would lose the same energy going into the ground state
around the right proton, but for now, assume that it is around the left proton.
The wave function describing this state is just the ground state ψ100 derived
for the hydrogen atom, equation (3.22), but the distance should be measured

Figure 3.8: Hydrogen atom plus free proton far apart.

from the position ~rLp of the left proton instead of from the origin:

ψ = ψ100 (|~r − ~rLp |)

To shorten the notations, this wave function will be denoted by ψL :

ψL (~r) ≡ ψ100 (|~r − ~rLp |) (3.58)

Similarly the wave function that would describe the electron as being in the
ground state around the right proton will be denoted as ψR , with

ψR (~r) ≡ ψ100 (|~r − ~rRp |) (3.59)

Key Points
⋄ When the protons are far apart, there are two lowest energy states, ψL
and ψR , in which the electron is in the ground state around the left,
respectively right, proton. In either case there is a hydrogen atom
plus a free proton.

3.5.3 Energy when closer together

Figure 3.9: Hydrogen atom plus free proton closer together.



When the protons get a bit closer to each other, but still well apart, the
distance rR between the electron orbiting the left proton and the right proton
decreases, as sketched in figure 3.9. The potential that the electron sees is now
not just that of the left proton; the distance rR is no longer so large that the
−e2 /4πǫ0 rR potential can be completely neglected.
However, assuming that the right proton stays sufficiently clear of the elec-
tron wave function, the distance rR between electron and right proton can still
be averaged out as being the same as the distance d between the two protons.
Within that approximation, it simply adds the constant −e2 /4πǫ0 d to the Hamil-
tonian of the electron. And adding a constant to a Hamiltonian does not change
the eigenfunction; it only changes the eigenvalue, the energy, by that constant.
So the ground state ψL of the left proton remains a good approximation to the
lowest energy wave function.
Moreover, the decrease in energy due to the electron/right proton attraction
is balanced by an increase in energy of the protons by their mutual repulsion, so
the total energy of the ion remains the same. In other words, the right proton
is to first approximation neither attracted nor repelled by the neutral hydrogen
atom on the left. To second approximation the right proton does change the
wave function of the electron a bit, resulting in some attraction, but this effect
will be ignored.
So far, it has been assumed that the electron is circling the left proton. But
the case that the electron is circling the right proton is of course physically
equivalent. In particular the energy must be exactly the same by symmetry.

Key Points
⋄ To first approximation, there is no attraction between the free proton
and the neutral hydrogen atom, even somewhat closer together.

3.5.4 States that share the electron


The approximate energy eigenfunction ψL that describes the electron as be-
ing around the left proton has the same energy as the eigenfunction ψR that
describes the electron as being around the right one. Therefore any linear com-
bination of the two,
ψ = aψL + bψR (3.60)
is also an eigenfunction with the same energy. In such combinations, the electron
is shared by the protons, in ways that depend on the chosen values of a and b.
Note that the constants a and b are not independent: the wave function
should be normalized, hψ|ψi = 1. Since ψL and ψR are already normalized, and

assuming that a and b are real, this works out to

$$\langle a\psi_L + b\psi_R \,|\, a\psi_L + b\psi_R \rangle = a^2 + b^2 + 2ab\langle\psi_L|\psi_R\rangle = 1 \qquad (3.61)$$

As a consequence, only the ratio of the coefficients, a/b, can be chosen freely.
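Numerically, fixing the ratio works out as follows; a minimal sketch, in which
the value used for the overlap integral ⟨ψL|ψR⟩ is just a made-up placeholder:

```python
import numpy as np

def coefficients(ratio, overlap):
    # solve a^2 + b^2 + 2ab<psi_L|psi_R> = 1 with a = ratio*b
    b = 1.0/np.sqrt(ratio**2 + 1.0 + 2.0*ratio*overlap)
    return ratio*b, b

S = 0.46                       # made-up placeholder for <psi_L|psi_R>
print(coefficients(+1.0, S))   # symmetric state, a = b
print(coefficients(-1.0, S))   # antisymmetric state, b = -a
```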
A particularly interesting case is the “antisymmetric” one, b = −a. As figure
3.10 shows, in this state there is zero probability of finding the electron at the
symmetry plane midway in between the protons. The reason is that ψL and ψR

Figure 3.10: The electron being anti-symmetrically shared.

are equal at the symmetry plane, making their difference zero.


This is actually a quite weird result. You combine two states, in both of
which the electron has some probability of being at the symmetry plane, and in
the combination the electron has zero probability of being there. The probability
of finding the electron at any position, including the symmetry plane, in the first
state is given by |ψL |2 . Similarly, the probability of finding the electron in the
second state is given by |ψR |2 . But for the combined state nature does not
do the logical thing of adding the two probabilities together to come up with
½|ψL|² + ½|ψR|².
Instead of adding physically observable probabilities, nature squares the un-
observable wave function aψL − aψR to find the new probability distribution.
The squaring adds a cross term, −2a2 ψL ψR , that simply adding probabilities
does not have. This term has the physical effect of preventing the electron from
being at the symmetry plane, but it does not have a normal physical explanation.
There is no force repelling the electrons from the symmetry plane or anything
like that. Yet it looks as if there is one in this state.
The most important combination of ψL and ψR is the “symmetric” one,
b = a. In this case, there is increased probability for the electron to be at the
symmetry plane, as shown in figure 3.11.
A state in which the electron is shared is truly a case of the electron being
in two different places at the same time. For if, instead of sharing the electron,
each proton were given its own half electron, the expression for the Bohr
radius, a0 = 4πǫ0h̄²/mee², shows that the eigenfunctions ψL and ψR would have
to blow up in radius by a factor four. The energy would then reduce by the

Figure 3.11: The electron being symmetrically shared.

same factor four. That is simply not what happens. You get the physics of a
complete electron being present around each proton with 50% probability, not
the physics of half an electron being present for sure.

Key Points
⋄ This subsection brought home the physical weirdness arising from the
mathematics of the unobservable wave function.
⋄ In particular, within the approximations made, there exist states that
all have the same ground state energy, but whose physical properties
are dramatically different.
⋄ The protons may “share the electron.” In such states there is a prob-
ability of finding the electron around either proton.
⋄ Even if the protons share the electron equally as far as the probability
distribution is concerned, different physical states are still possible. In
the symmetric case that the wave functions around the protons have
the same sign, there is increased probability of the electron being found
in between the protons. In the antisymmetric case of opposite sign,
there is decreased probability of the electron being found in between
the protons.

3.5.5 Comparative energies of the states


The previous two subsections described states of the hydrogen molecular ion in
which the electron is around a single proton, as well as states in which it is shared
between protons. To the approximations made, all these states have the same
energy. Yet, if the expectation energy of the states is more accurately examined,
it turns out that increasingly large differences show up when the protons get
closer together. The symmetric state has the least energy, the antisymmetric
state the highest, and the states where the electron is around a single proton
have something in between.

It is not that easy to see physically why the symmetric state has the lowest
energy. An argument is often made that in the symmetric case, the electron
has increased probability of being in between the protons, where it is most
effective in pulling them together. However, actually the potential energy of the
symmetric state is higher than for the other states: putting the electron midway
in between the two protons means having to pull it away from one of them.
The Feynman lectures on physics, [9], argue instead that in the symmetric
case, the electron is somewhat less constrained in position. According to the
Heisenberg uncertainty relationship, that allows it to have less variation in mo-
mentum, hence less kinetic energy. Indeed the symmetric state does have less
kinetic energy, but this is almost totally achieved at the cost of a corresponding
increase in potential energy, rather than due to a larger area to move in at the
same potential energy. And the kinetic energy is not really directly related to
available area in any case. The argument is not incorrect, but in what sense it
explains, rather than just summarizes, the answer is debatable.

Key Points
⋄ The energies of the discussed states are not the same when examined
more closely.
⋄ The symmetric state has the lowest energy, the antisymmetric one the
highest.

3.5.6 Variational approximation of the ground state


The objective of this subsection is to use the rough approximations of the pre-
vious subsections to get some very concrete data on the hydrogen molecular
ion.
The idea is simple but powerful: since the true ground state is the state
of lowest energy among all wave functions, the best among approximate wave
functions is the one with the lowest energy. In the previous subsections, ap-
proximations to the ground state were discussed that took the form aψL + bψR ,
where ψL described the state where the electron was in the ground state around
the left proton, and ψR where it was around the right proton. The wave function
of this type with the lowest energy will produce the best possible data on the
true ground state, {A.22}.
Note that all that can be changed in the approximation aψL + bψR to the
wave function is the ratio of the coefficients a/b, and the distance between the
protons d. If the ratio a/b is fixed, a and b can be computed from it using the
normalization condition (3.61), so there is no freedom to choose them individually.

The basic idea is now to search through all possible values of a/b and d until
you find the values that give the lowest energy.
This sort of method is called a “variational method” because at the minimum
of energy, the derivatives of the energy must be zero. That in turn means that
the energy does not vary with infinitesimally small changes in the parameters
a/b and d.
To find the minimum energy is nothing that an engineering graduate student
could not do, but it does take some effort. You cannot find the best values of
a/b and d analytically; you have to have a computer find the energy at a lot
of values of d and a/b and search through them to find the lowest energy. Or
actually, simply having a computer print out a table of values of energy versus
d for a few typical values of a/b, including a/b = 1 and a/b = −1, and looking
at the print-out to see where the energy is most negative works fine too. That
is what the numbers below came from.
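If you would like to reproduce such a print-out, the sketch below is one way
to do it in Python. It is not the book's actual program; it uses the standard
closed-form expressions for the overlap, attraction, and exchange integrals of
two 1s states in atomic units, which should be equivalent to the inner product
integrals of note {A.23}, and it tabulates the two typical ratios a/b = 1 and
a/b = −1 mentioned above:

```python
import numpy as np

# Atomic units: 1 hartree = 27.21 eV, 1 Bohr radius = 0.529 Angstrom.
def expectation_energy(d, sym=+1):
    """Energy of psi_L + sym*psi_R with the protons d Bohr radii apart."""
    S = np.exp(-d)*(1 + d + d**2/3)          # overlap <psi_L|psi_R>
    J = -1/d + np.exp(-2*d)*(1 + 1/d)        # <psi_L| -1/r_R |psi_L>
    K = -np.exp(-d)*(1 + d)                  # <psi_L| -1/r_L |psi_R>
    E_elec = -0.5 + (J + sym*K)/(1 + sym*S)  # electron expectation energy
    return E_elec + 1/d                      # plus the proton repulsion energy

d = np.linspace(1.0, 8.0, 1401)
E = expectation_energy(d, sym=+1)            # the symmetric state, a/b = 1
i = E.argmin()
print("bond length:    %.2f Angstrom" % (0.529*d[i]))
print("binding energy: %.1f eV" % ((-0.5 - E[i])*27.21))
# The antisymmetric state, sym=-1, stays above -0.5 hartree for all d:
# it does not produce a bound molecule at all.
```

This prints a bond length of about 1.3 Å and a binding energy of about 1.8 eV,
matching the numbers discussed next.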
You do want to evaluate the energy of the approximate states accurately as
the expectation value. If you do not find the energy as the expectation value,
the results may be less dependable. Fortunately, finding the expectation energy
for the given approximate wave functions can be done exactly; the details are
in note {A.23}.
If you actually go through the steps, your print-out should show that the
minimum energy occurs when a = b, the symmetric state, and at a separation
distance between the protons equal to about 1.3 Å. This separation distance is
called the “bond length”. The minimum energy is found to be about 1.8 eV
below the energy of -13.6 eV when the protons are far apart. So it will take
at least 1.8 eV to take the ground state with the protons at a distance of 1.3
Å completely apart into well separated protons. For that reason, the 1.8 eV is
called the “binding energy”.

Key Points
⋄ The best approximation to the ground state using approximate wave
functions is the one with the lowest energy.
⋄ Making such an approximation is called a variational method.
⋄ The energy should be evaluated as the expectation value of the Hamil-
tonian.
⋄ Using combinations of ψL and ψR as approximate wave functions, the
approximate ground state turns out to be the one in which the electron
is symmetrically shared between the protons.
⋄ The binding energy is the energy required to take the molecule apart.
⋄ The bond length is the distance between the nuclei.

3.5.6 Review Questions


1 The solution for the hydrogen molecular ion requires elaborate eval-
uations of inner product integrals and a computer evaluation of the
state of lowest energy. As a much simpler example, you can try out
the variational method on the one-dimensional case of a particle stuck
inside a pipe, as discussed in chapter 2.5. Take the approximate wave
function to be:
ψ = ax(ℓ − x)
Find a from the normalization requirement that the total probability
of finding the particle integrated over all possible x-positions is one.
Then evaluate the energy hEi as hψ|H|ψi, where according to chapter
2.5.3, the Hamiltonian is

$$H = -\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2}$$
Compare the ground state energy with the exact value,

$$E_1 = \hbar^2\pi^2/2m\ell^2$$
(Hints: $\int_0^\ell x(\ell-x)\,dx = \ell^3/6$ and $\int_0^\ell x^2(\ell-x)^2\,dx = \ell^5/30$)
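If you want to check your algebra for this question, symbolic computation
can do the integrals; a sketch, assuming Python with sympy:

```python
import sympy as sp

x, l, m, hbar = sp.symbols('x ell m hbar', positive=True)
psi = x*(l - x)                          # trial wave function, not yet normalized

a2 = 1/sp.integrate(psi**2, (x, 0, l))   # a^2 from the normalization requirement

H_psi = -hbar**2/(2*m)*sp.diff(psi, x, 2)            # apply the Hamiltonian
E = sp.simplify(a2*sp.integrate(psi*H_psi, (x, 0, l)))
print(E)                                             # 5*hbar**2/(ell**2*m)
print(sp.simplify(E/(hbar**2*sp.pi**2/(2*m*l**2))))  # 10/pi**2, about 1.3% high
```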

3.5.7 Comparison with the exact ground state


The variational solution derived in the previous subsection is only a crude ap-
proximation of the true ground state of the hydrogen molecular ion. In par-
ticular, the assumption that the molecular wave function can be approximated
using the individual atom ground states is only valid when the protons are far
apart, and is inaccurate if they are 1.3 Å apart, as the solution says they are.
Yet, for such a poor wave function, the main results are surprisingly good.
For one thing, it leaves no doubt that a bound state really exists. The reason is
that the true ground state must always have a lower energy than any approxi-
mate one. So, the binding energy must be at least the 1.8 eV predicted by the
approximation.
In fact, the experimental binding energy is 2.8 eV. The found approximate
value is only a third less, pretty good for such a simplistic assumption for the
wave function. It is really even better than that, since a fair comparison requires
the absolute energies to be compared, rather than just the binding energy; the
approximate solution has −15.4 eV, rather than −16.4. This high accuracy
for the energy using only marginal wave functions is one of the advantages of
variational methods {A.24}.

The estimated bond length is not too bad either; experimentally the protons
are 1.06 Å apart instead of 1.3 Å. (The analytical solution using spheroidal co-
ordinates mentioned earlier gives 2.79 eV and 1.06 Å, in good agreement with
the experimental values. But even that solution is not really exact: the electron
does not bind the nuclei together rigidly, but more like a spring force. As a re-
sult, the nuclei behave like a harmonic oscillator around their common center of
gravity. Even in the ground state, they will retain some uncertainty around the
1.06 Å position of minimal energy, and a corresponding small amount of addi-
tional molecular kinetic and potential energy. The improved Born-Oppenheimer
approximation of chapter 6.2.3 can be used to compute such effects.)
The qualitative properties of the approximate wave function are correct. For
example, it can be seen that the exact ground state wave function must be real
and positive {A.25}; the approximate wave function is real and positive too.
It can also be seen that the exact ground state must be symmetric around
the symmetry plane midway between the protons, and rotationally symmetric
around the line connecting the protons, {A.26}. The approximate wave function
has both those properties too.
Incidentally, the fact that the ground state wave function must be real and
positive is a much more solid reason why the protons must share the elec-
tron symmetrically than the physical arguments given in subsection 3.5.5, even
though it is more mathematical.

Key Points
⋄ The obtained approximate ground state is pretty good.
⋄ The protons really share the electron symmetrically in the ground state.
Chapter 4

Multiple-Particle Systems

So far, only wave functions for single particles have been discussed. This chapter
explains how the ideas generalize to more particles.

4.1 Wave Function for Multiple Particles


While a single particle is described by a wave function Ψ(~r; t), a system of two
particles, call them 1 and 2, is described by a wave function
Ψ(~r1 ,~r2 ; t) (4.1)
depending on both particle positions. The value of |Ψ(~r1 ,~r2 ; t)|2 d3~r1 d3~r2 gives
the probability of simultaneously finding particle 1 within a vicinity d3~r1 of ~r1
and particle 2 within a vicinity d3~r2 of ~r2 .
The wave function must be normalized to express that the particles must
be somewhere:
$$\langle\Psi|\Psi\rangle_6 = \int\!\!\int |\Psi(\vec r_1,\vec r_2;t)|^2 \, d^3\vec r_1\, d^3\vec r_2 = 1 \qquad (4.2)$$
where the subscript 6 of the inner product is just a reminder that the integration
is over all six scalar position coordinates of Ψ.
The underlying idea of increasing system size is “every possible combina-
tion:” allow for every possible combination of state for particle 1 and state for
particle 2. For example, in one dimension, all possible x-positions of particle
1 geometrically form an x1 -axis. Similarly all possible x-positions of particle
2 form an x2 -axis. If every possible position x1 is separately combined with
every possible position x2 , the result is an x1 , x2 -plane of possible positions of
the combined system.
Similarly, in three dimensions the three-dimensional space of positions ~r1
combines with the three-dimensional space of positions ~r2 into a six-dimensional
space having all possible combinations of values for ~r1 with all possible values
for ~r2 .


The increase in the number of dimensions when the system size increases
is a major practical problem for quantum mechanics. For example, a single
arsenic atom has 33 electrons, and each electron has 3 position coordinates. It
follows that the wave function is a function of 99 scalar variables. (Not even
counting the nucleus, spin, etcetera.) In a brute-force numerical solution of
the wave function, maybe you could restrict each position coordinate to only
ten computational values, if no very high accuracy is desired. Even then, Ψ
values at 10⁹⁹ different combined positions must be stored, requiring maybe
10⁹¹ Gigabytes of storage. To do a single multiplication on each of those
numbers within a few years would require a computer with a speed of 10⁸²
Gigaflops. No need to take any of that arsenic to be long dead before an answer
is obtained. (Imagine what it would take to compute a microgram of arsenic
instead of an atom.) Obviously, more clever numerical procedures are needed.
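The estimates above are quick to redo; a throwaway sketch with the same crude
assumptions (ten values per coordinate, 16 bytes per stored complex number):

```python
import math

points = 10**99                 # ten values for each of the 99 coordinates
gigabytes = 16*points/1e9       # one 16-byte complex number per point
seconds = 3*365.25*24*3600      # "within a few years"
gigaflops = points/seconds/1e9
print("storage: about 10^%.0f gigabytes" % math.log10(gigabytes))
print("speed:   about 10^%.0f gigaflops" % math.log10(gigaflops))
```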
Sometimes the problem size can be reduced. In particular, the problem for
a two-particle system like the proton-electron hydrogen atom can be reduced to
that of a single particle using the concept of effective mass. That is shown in
note {A.16}.

Key Points
⋄ To describe multiple-particle systems, just keep adding more indepen-
dent variables to the wave function.
⋄ Unfortunately, this makes many-particle problems impossible to solve
by brute force.

4.1 Review Questions


1 A simple form that a six-dimensional wave function can take is a
product of two three-dimensional ones, as in ψ(~r1 ,~r2 ) = ψ1 (~r1 )ψ2 (~r2 ).
Show that if ψ1 and ψ2 are normalized, then so is ψ.
2 Show that for a simple product wave function as in the previous
question, the relative probabilities of finding particle 1 near a position
~ra versus finding it near another position ~rb is the same regardless
where particle 2 is. (Or rather, where particle 2 is likely to be found.)
Note: This is the reason that a simple product wave function is called
“uncorrelated.” For particles that interact with each other, an un-
correlated wave function is often not a good approximation. For
example, two electrons repel each other. All else being the same,
the electrons would rather be at positions where the other electron is
nowhere close. As a result, it really makes a difference for electron 1
where electron 2 is likely to be and vice-versa. To handle such situa-
tions, usually sums of product wave functions are used. However, for

some cases, like for the helium atom, a single product wave function
is a perfectly acceptable first approximation. Real-life electrons are
crowded together around attracting nuclei and learn to live with each
other.

4.2 The Hydrogen Molecule


This section uses similar approximations as for the hydrogen molecular ion of
chapter 3.5 to examine the neutral H2 hydrogen molecule. This molecule has
two electrons circling two protons. It will turn out that in the ground state, the
protons share the two electrons, rather than each being assigned one. This is
typical of covalent bonds.
Of course, “share” is a vague term, but the discussion will show what it
really means in terms of the six-dimensional electron wave function.

4.2.1 The Hamiltonian


Just like for the hydrogen molecular ion of chapter 3.5, for the neutral molecule
the Born-Oppenheimer approximation will be made that the protons are at given
fixed points. So the problem simplifies to just finding the wave function of the
two electrons, Ψ(~r1 ,~r2 ), where ~r1 and ~r2 are the positions of the two electrons
1 and 2. In terms of scalar arguments, the wave function can be written out
further as Ψ(x1 , y1 , z1 , x2 , y2 , z2 ).
In the Hamiltonian, following the Newtonian analogy the kinetic and poten-
tial energy operators simply add:
$$H = -\frac{\hbar^2}{2m_e}\left(\nabla_1^2 + \nabla_2^2\right)
- \frac{e^2}{4\pi\epsilon_0}\left(\frac{1}{r_{1L}} + \frac{1}{r_{1R}}
+ \frac{1}{r_{2L}} + \frac{1}{r_{2R}} - \frac{1}{|\vec r_1 - \vec r_2|}\right) \qquad (4.3)$$

In this expression, the Laplacians of the first two, kinetic energy, terms are
with respect to the position coordinates of the two electrons:

$$\nabla_1^2 = \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial y_1^2} + \frac{\partial^2}{\partial z_1^2}
\qquad
\nabla_2^2 = \frac{\partial^2}{\partial x_2^2} + \frac{\partial^2}{\partial y_2^2} + \frac{\partial^2}{\partial z_2^2}.$$

The next four terms in the Hamiltonian (4.3) are the attractive potentials be-
tween the electrons and the protons, with r1L , r2L , r1R , and r2R being the
distances between electrons 1 and 2 and the left, respectively right proton. The
final term represents the repulsive potential between the two electrons.

Key Points

⋄ The Hamiltonian for the 6-dimensional electron wave function has been
written down.

4.2.1 Review Questions


1 Verify that the repulsive potential between the electrons is infinitely
large when the electrons are at the same position.
Note: You might therefore think that the wave function needs to be
zero at the locations in six-dimensional space where ~r1 = ~r2 . Some
authors refer to that as a “Coulomb hole.” But the truth is that in
quantum mechanics, electrons are smeared out due to uncertainty.
That causes electron 1 to “see electron 2 at all sides”, and vice-versa,
and they therefore do not encounter any unusually large potential
when the wave function is nonzero at ~r1 = ~r2 . In general, it is just
not worth the trouble for the electrons to stay away from the same
position: that would reduce their uncertainty in position, increasing
their uncertainty-demanded kinetic energy.
2 Note that the total kinetic energy term is simply a multiple of the
six-dimensional Laplacian operator. It treats all Cartesian position
coordinates exactly the same, regardless of which direction or which
electron it is. Is this still the case if other particles are involved?

4.2.2 Initial approximation to the lowest energy state


The next step is to identify an approximate ground state for the hydrogen
molecule. Following the same approach as in chapter 3.5, it will first be assumed
that the protons are relatively far apart. One obvious approximate solution is
then that of two neutral atoms, say the one in which electron 1 is around the
left proton in its ground state and electron 2 is around the right one.
To formulate the wave function for that, the shorthand notation ψL will
again be used for the wave function of a single electron that is in the ground state
around the left proton and ψR for one that is in the ground state around the
right hand one:
ψL (~r) ≡ ψ100 (|~r − ~rLp |) ψR (~r) ≡ ψ100 (|~r − ~rRp |)
where ψ100 is the hydrogen atom ground state (3.22), and ~rLp and ~rRp are the
positions of the left and right protons.
The wave function that describes that electron 1 is in the ground state around
the left proton and electron 2 around the right one will be approximated to be
the product of the single electron states:
ψ(~r1 ,~r2 ) = ψL (~r1 )ψR (~r2 )

Taking the combined wave function as a product of single electron states


is really equivalent to an assumption that the two electrons are independent.
Indeed, for the product state, the probability of finding electron 1 at position
~r1 and electron 2 at ~r2 is:

$$|\psi_L(\vec r_1)|^2\, d^3\vec r_1 \times |\psi_R(\vec r_2)|^2\, d^3\vec r_2$$

or in words:
[probability of finding 1 at ~r1 unaffected by where 2 is]
× [probability of finding 2 at ~r2 unaffected by where 1 is]

Such product probabilities are characteristic of statistically independent quan-


tities. As a simple example, the chances of getting a three in the first throw of
a die and a five in the second throw are ⅙ × ⅙ or 1 in 36. Throwing the three
does not affect the chances of getting a five in the second throw.

Key Points
⋄ When the protons are well apart, an approximate ground state is that
of two neutral atoms.
⋄ Single electron wave functions for that case are ψL and ψR .
⋄ The complete wave function for that case is ψL (~r1 )ψR (~r2 ), assuming
that electron 1 is around the left proton and electron 2 around the
right one.

4.2.2 Review Questions


1 If electron 2 does not affect where electron 1 is likely to be, how would
a grey-scale picture of the probability of finding electron 1 look?
2 When the protons are close to each other, the electrons do affect each
other, and the wave function above is no longer valid. But suppose
you were given the true wave function, and you were once again asked
to draw the blob showing the probability of finding electron 1 (using
a plotting package, say). What would the big problem be?

4.2.3 The probability density


For multiple-particle systems like the electrons of the hydrogen molecule, show-
ing the magnitude of the wave function as grey tones no longer works since it is
a function in six-dimensional space. You cannot visualize six-dimensional space.

However, at every spatial position ~r in normal space, you can instead show the
“probability density” n(~r), which is the probability per unit volume of finding
either electron in a vicinity d3~r of the point. This probability is found as
$$n(\vec r) = \int |\Psi(\vec r,\vec r_2)|^2\, d^3\vec r_2 + \int |\Psi(\vec r_1,\vec r)|^2\, d^3\vec r_1 \qquad (4.4)$$
|Ψ(~r,~r2 )| d ~r2 + |Ψ(~r1 ,~r)|2 d3~r1 (4.4)

since the first integral gives the probability of finding electron 1 at ~r regardless
of where electron 2 is, (i.e. integrated over all possible positions for electron 2),
and the second gives the probability of finding 2 at ~r regardless of where 1 is.
Since d3~r is vanishingly small, the chances of finding both particles in it at the
same time are zero.
The probability density n(~r) for state ψL (~r1 )ψR (~r2 ) with electron 1 around
the left proton and electron 2 around the right one is shown in figure 4.1. Of
course the probability density for the state ψR (~r1 )ψL (~r2 ) with the electrons
exchanged would look exactly the same.

Figure 4.1: State with two neutral atoms.
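Because ψL and ψR are normalized, the two integrals in (4.4) collapse for this
product state to the single-electron densities, n = |ψL|² + |ψR|². A small
sketch that evaluates n along the axis through the nuclei (atomic units; the
proton separation used is a made-up placeholder):

```python
import numpy as np

d = 4.0                                # made-up proton separation, Bohr radii
zL, zR = -d/2, +d/2                    # proton positions on the z-axis

def psi100(r):                         # hydrogen ground state, atomic units
    return np.exp(-r)/np.sqrt(np.pi)

z = np.linspace(-8.0, 8.0, 401)        # points on the axis through the nuclei
# for Psi = psi_L(r1) psi_R(r2) the integrals in (4.4) reduce by normalization:
n = psi100(np.abs(z - zL))**2 + psi100(np.abs(z - zR))**2
print(z[np.argmax(n)])                 # the density peaks at a proton position
```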

Key Points
⋄ The probability density is the probability per unit volume of finding
an electron, whichever one, near a given point.

4.2.3 Review Questions


1 Suppose, given the wave function ψL (~r1 )ψR (~r2 ), you found an elec-
tron near the left proton. What electron would it probably be? Sup-
pose you found an electron at the point halfway in between the pro-
tons. What electron would that likely be?

4.2.4 States that share the electrons


This section will examine the states where the protons share the two electrons.

The first thing is to shorten the notations a bit. So, the state ψL (~r1 )ψR (~r2 )
which describes that electron 1 is around the left proton and electron 2 around
the right one will be indicated by ψL ψR , using the convention that the first
factor refers to electron 1 and the second to electron 2. In this convention, the
state where electron 1 is around the right proton and electron 2 around the left
one is ψR ψL , shorthand for ψR (~r1 )ψL (~r2 ). It is of course physically the same
thing as ψL ψR ; the two electrons are identical.
The “every possible combination” idea of combining every possible state for
electron 1 with every possible state for electron 2 would suggest that the states
ψL ψL and ψR ψR should also be included. But these states have the electrons
around the same proton, and that is not going to be energetically favorable due
to the mutual repulsion of the electrons. So they are not useful for finding a
simple approximate ground state of lowest energy.
States where the electrons are no longer assigned to a particular proton can
be found as linear combinations of ψL ψR and ψR ψL :

ψ = aψL ψR + bψR ψL (4.5)

In such a combination each electron has a probability of being found about


either proton, but wherever it is found, the other electron will be around the
other proton.
The eigenfunction must be normalized, which noting that ψL and ψR are
real and normalized produces

$$\langle\psi|\psi\rangle_6 = \langle a\psi_L\psi_R + b\psi_R\psi_L \,|\, a\psi_L\psi_R + b\psi_R\psi_L\rangle
= a^2 + b^2 + 2ab\langle\psi_L|\psi_R\rangle^2 = 1 \qquad (4.6)$$

assuming that a and b are real. As a result, only the ratio a/b can be chosen
freely. The probability density of the combination can be found to be:
$$n = \psi_L^2 + \psi_R^2 + 2ab\langle\psi_L|\psi_R\rangle
\left\{ 2\psi_L\psi_R - \langle\psi_L|\psi_R\rangle\left(\psi_L^2 + \psi_R^2\right) \right\} \qquad (4.7)$$

The most important combination state is the one with b = a:

ψ(~r1 ,~r2 ) = a [ψL (~r1 )ψR (~r2 ) + ψR (~r1 )ψL (~r2 )] (4.8)

This state is called “symmetric with respect to exchanging electron 1 with elec-
tron 2,” or more precisely, with respect to replacing ~r1 by ~r2 and vice-versa.
Such an exchange does not change this wave function at all. If you change ~r1
into ~r2 and vice-versa, you still end up with the same wave function.
The probability density of this wave function looks like figure 4.2. It has
increased likelihood for electrons to be found in between the protons, compared
to figure 4.1 in which each proton had its own electron.
The state with b = −a,

ψ(~r1 ,~r2 ) = a [ψL (~r1 )ψR (~r2 ) − ψR (~r1 )ψL (~r2 )] (4.9)

Figure 4.2: Symmetric sharing of the electrons.

is called “antisymmetric” with respect to exchanging electron 1 with electron 2:


swapping ~r1 and ~r2 changes the sign of the wave function, but leaves it otherwise
unchanged. As seen in figure 4.3, the antisymmetric state has decreased likelihood
for electrons to be found in between the protons.

Figure 4.3: Antisymmetric sharing of the electrons.

Key Points
⋄ In state ψL ψR , the electron numbered 1 is around the left proton and
2 around the right one.
⋄ In state ψR ψL , the electron numbered 1 is around the right proton and
2 around the left one.
⋄ In the symmetric state a(ψL ψR +ψR ψL ) the protons share the electrons
equally; each electron has an equal chance of being found around either
proton. In this state there is increased probability of finding an electron
somewhere in between the protons.
⋄ In the antisymmetric state a(ψL ψR − ψR ψL ) the protons also share
the electrons equally; each electron has again an equal chance of be-
ing found around either proton. But in this state there is decreased
probability of finding an electron somewhere in between the protons.
⋄ So, like for the molecular ion, at large proton separations the weird trick
of shuffling unobservable wave functions around does again produce
different physical states with pretty much the same energy.

4.2.4 Review Questions


1 Obviously, the visual difference between the various states is minor.
It may even seem counter-intuitive that there is any difference at all:
the states ψL ψR and ψR ψL are exactly the same physically, with one
electron around each proton. So why would their combinations be
any different?
The quantum difference would be much more clear if you could see
the full 6-dimensional wave function, but visualizing 6-dimensional
space just does not work. However, if you restrict yourself to only
looking on the z-axis through the nuclei, you get a drawable z1 , z2 -
plane describing near what axial combinations of positions you are
most likely to find the two electrons. In other words: what would
be the chances of finding electron 1 near some axial position z1 and
electron 2 at the same time near some other axial position z2 ?
Try to guess these probabilities in the z1 , z2 -plane as grey tones,
(darker if more likely), and then compare with the answer.
2 Based on the previous question, how would you think the probability
density n(z) would look on the axis through the nuclei, again ignoring
the existence of positions beyond the axis?

4.2.5 Variational approximation of the ground state


The purpose of this section is to find an approximation to the ground state
of the hydrogen molecule using the rough approximation of the wave function
described in the previous subsections.
Like for the hydrogen molecular ion of chapter 3.5.6, the idea is that since
the true ground state is the state of lowest energy among all wave functions, the
best among approximate wave functions is the one with the lowest energy. The
approximate wave functions are here of the form aψL ψR + bψR ψL ; in these the
protons share the electrons, but in such a way that when one electron is around
the left proton, the other is around the right one, and vice-versa.
A computer program is again needed to print out the expectation value of
the energy for various values of the ratio of coefficients a/b and proton-proton
distance d. And worse, the expectation value of energy for given a/b and d is a
six-dimensional integral, and parts of it cannot be done analytically; numerical
integration must be used. That makes it a much more messy problem, {A.27}.
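To give at least a taste of what such a numerical integration involves, the
sketch below estimates a single contribution, the expectation value of the
electron-electron repulsion for the product state ψLψR, by Monte Carlo sampling
in atomic units. This is purely illustrative and is not the actual procedure
of note {A.27}:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
d = 1.64                               # 0.87 Angstrom in Bohr radii

def sample_1s(center_z, n):
    """Draw n positions from |psi_100|^2: radius ~ Gamma(3, 1/2), direction uniform."""
    r = rng.gamma(3.0, 0.5, size=n)    # radial density 4 r^2 exp(-2r)
    v = rng.normal(size=(n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    pts = r[:, None]*v
    pts[:, 2] += center_z
    return pts

r1 = sample_1s(-d/2, N)                # electron 1 around the left proton
r2 = sample_1s(+d/2, N)                # electron 2 around the right proton
r12 = np.linalg.norm(r1 - r2, axis=1)
print("electron-electron repulsion: %.1f eV" % (np.mean(1/r12)*27.21))
```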
You might just want to take it on faith that the binding energy, at the state
of lowest energy found, turns out to be 3.2 eV, at a proton to proton spacing of
0.87 Å, and that it occurs for the symmetric state a = b.

Key Points
⋄ An approximate ground state can be found for the hydrogen molecule
using a variational method much like that for the molecular ion.

4.2.6 Comparison with the exact ground state


The solution for the ground state of the hydrogen molecule obtained in the
previous subsection is, like the one for the molecular ion, pretty good. The
approximate binding energy, 3.2 eV, is not too much different from the experi-
mental value of 4.52 eV. Similarly, the bond length of 0.87 Å is not too far from
the experimental value of 0.74 Å.
Qualitatively, the exact ground state wave function is real, positive and
symmetric with respect to reflection around the symmetry plane and to rotations
around the line connecting the protons, and so is the approximate one. The
reasons for these properties are similar to those for the molecular ion; {A.25,A.26}.
One very important new symmetry for the neutral molecule is the effect of
exchanging the electrons, replacing ~r1 by ~r2 and vice-versa. The approximate
wave function is symmetric (unchanged) under such an exchange, and so is the
exact wave function. To understand why, note that the operation of exchanging
the electrons commutes with the Hamiltonian, (exchanging identical electrons
physically does not do anything). So energy eigenfunctions can be taken to
be also eigenfunctions of the “exchange operator.” Furthermore, the exchange
operator is a Hermitian one, (taking it to the other side in inner products is
equivalent to a simple name change of integration variables,) so it has real eigen-
values. And more specifically, the eigenvalues can only be plus or minus one,
since swapping electrons does not change the magnitude of the wave function.
So the energy eigenfunctions, including the ground state, must be symmetric
under electron exchange (eigenvalue one), or antisymmetric (eigenvalue minus
one.) Since the ground state must be everywhere positive, (or more precisely, of
a single sign), a sign change due to swapping electrons is not possible. So only
the symmetric possibility exists for the ground state.
One issue that does not occur for the molecular ion, but only for the neutral
molecule is the mutual repulsion between the two electrons. This repulsion is
reduced when the electron clouds start to merge, compared to what it would be
if the clouds were more compact. (A similar effect is that the gravity force of the
earth decreases when you go down below the surface. To be sure, the potential
energy keeps going down, or up for electron clouds, but not as much as it would
otherwise. Compare figure 9.13.) Since the nuclei are compact, it gives an
advantage to nucleus-electron attraction over electron-electron repulsion. This

increases the binding energy significantly; in the approximate model from about
1.8 eV to 3.2 eV. It also allows the protons to approach more closely; {A.27}.
The question has been asked whether there should not be an “activation
energy” involved in creating the hydrogen molecule from the hydrogen atoms.
The answer is no, hydrogen atoms are radicals, not stable molecules that need
to be taken apart before recombining. In fact, the hydrogen atoms attract each
other even at large distances due to Van der Waals attraction, chapter 7.1, an
effect lost in the approximate wave functions used in this section. But hydrogen
atoms that fly into each other also have enough energy to fly apart again; some
of the excess energy must be absorbed elsewhere to form a stable molecule. Ac-
cording to web sources, hydrogen molecule formation in the universe is believed
to typically occur on dust specks.

Key Points
⋄ The approximate ground state is pretty good, considering its simplicity.

4.3 Two-State Systems


The protons in the H₂⁺ hydrogen molecular ion of chapter 3.5 are held together
by a single shared electron. However, in the H2 neutral hydrogen molecule of
the previous section, they are held together by a shared pair of electrons. The
main purpose of this section is to shed some light on the question why chemical
bonds involving a single electron are relatively rare, while bonds involving pairs
of shared electrons are common.
The common concept relating the two bonds is that of “two state systems.”
Such systems involve two basic states ψ1 and ψ2 . For the hydrogen molecular
ion, one state, ψ1 = ψL , described that the electron was around the left proton,
the other, ψ2 = ψR , that it was around the right one. For the hydrogen molecule,
ψ1 = ψL ψR had electron 1 around the left proton and electron 2 around the right
one; ψ2 = ψR ψL was the same, except with the electrons reversed.
There are many other physical situations that may be described as two state
systems. Covalent chemical bonds involving atoms other than hydrogen would
be an obvious example. Just substitute a positive ion for one or both protons.
A further example is provided by nuclear forces. Nuclear forces can be
thought of as effects of nucleons sharing various particles, in particular π-mesons,
much like the protons share the electron in the hydrogen molecular ion. (In fact,
all four fundamental forces of physics can be described in terms of the sharing
of particles.)
The C₆H₆ “benzene molecular ring” consists of a hexagon of 6 carbon atoms
that are held together by 9 covalent bonds. The way that the 9 bonds between

the 6 atoms can be arranged is to make every second bond a double one. How-
ever, that still leaves two possibilities, by swapping the locations of the single
and double bonds, hence two different states ψ1 and ψ2 .
The NH₃ “ammonia molecule” consists of a nitrogen atom bonded to three
hydrogen atoms. By symmetry, the logical place for the nitrogen atom to sit
would surely be in the center of the triangle formed by the three hydrogen atoms.
But it does not sit there. If it was in the center of the triangle, the angles between
the hydrogen atoms, measured from the nitrogen nucleus, should be 120◦ each.
However, as discussed later in chapter 4.11.3, valence bond theory requires that
the angles should be about 90◦ , not 120◦ . (The actual angles are about 108◦
because of reasons similar to those for water as discussed in chapter 4.11.3.)
The key point here is that the nitrogen must sit to the side of the triangle, and
there are two sides, producing once again two different states ψ1 and ψ2 .
In each case described above, there are two logical physical states ψ1 and ψ2 .
The peculiarities of two state systems arise from states that are combinations
of these two states, as in
Ψ = aψ1 + bψ2
Note that according to the ideas of quantum mechanics, the square magni-
tude of the first coefficient of the combined state, |a|2 , represents the probability
of being in state ψ1 and |b|2 the probability of being in state ψ2 . Of course, the
total probability of being in one of the states should be one:

|a|2 + |b|2 = 1

(This is only true if the ψ1 and ψ2 states are orthonormal. In the hydrogen
molecule cases, orthonormalizing the basic states would change them a bit, but
their physical nature would remain much the same, especially if the protons are
not too close.)
The key question is what combination of states has the lowest energy. The
expectation value of energy is

hEi = haψ1 + bψ2 |H|aψ1 + bψ2 i

This can be multiplied out as, (remember that numerical factors come out of
the left of an inner product as complex conjugates,)

hEi = a∗ aH11 + a∗ bH12 + b∗ aH21 + b∗ bH22

where the shorthand notation

H11 = hψ1 |Hψ1 i, H12 = hψ1 |Hψ2 i, H21 = hψ2 |Hψ1 i, H22 = hψ2 |Hψ2 i

was used. Note that H11 and H22 are real, (1.16), and the states will be ordered
so that H11 is less or equal to H22 . Normally, H12 and H21 are not real but

complex conjugates, (1.16), but you can always change the definition of, say, ψ1
by a factor of magnitude one to make H12 equal to a real and negative number,
and then H21 will be that same negative number. Also note that a∗ a = |a|2 and
b∗ b = |b|2 .
The above expression for the expectation energy consists of two kinds of
terms, which will be called:

the averaged energy: |a|2 H11 + |b|2 H22 (4.10)


the twilight terms: (a∗ b + b∗ a)H12 (4.11)

Each of those contributions will be discussed in turn.


The averaged energy is the energy that you would intuitively expect the
combined wave function to have. It is a straightforward average of the energies
of the two component states ψ1 and ψ2 times the probabilities of being in those
states. In particular, in the important case that the two states have the same
energy, the averaged energy is that energy. What is more logical than that any
mixture of two states with the same energy would have that energy too?
But the twilight terms throw a monkey wrench in this simplistic thinking. It
can be seen that they will always make the ground state energy lower than the
energy H11 of the lowest component state. (To see that, just take a and b positive
real numbers and b small enough that b2 can be neglected.) This lowering of
the energy below the lowest component state comes out of the mathematics of
combining states; absolutely no new physical forces are added to produce it.
But if you try to describe it in terms of classical physics, it really looks like a
mysterious new “twilight force” is in operation here. It is no new force; it is the
weird mathematics of quantum mechanics.
So, what are these twilight terms physically? If you mean, what are they in
terms of classical physics, there is simply no answer. But if you mean, what are
they in terms of normal language, rather than formulae, it is easy. Just have
another look at the definition of the twilight terms; they are a measure of the
inner product hψ1 |Hψ2 i. That is the energy you would get if nature was in state
ψ1 and in state ψ2 at the same time. On quantum scales, nature can get really, really
ethereal, where it moves beyond being describable by classical physics, and the
result is very concrete, but weird, interactions. For, at these scales twilight is
real, and classical physics is not.
For the twilight terms to be nonzero, there must be a region where the two
states overlap, i.e. there must be a region where both ψ1 and ψ2 are nonzero.
In the simplest case of the hydrogen molecular ion, if the atoms are far apart,
the left and right wave functions do not overlap and the twilight terms will be
zero. For the hydrogen molecule, it gets a bit less intuitive, since the overlap
should really be visualized in the six-dimensional space of those functions. But
still, the terms are zero when the atoms are far apart.

The twilight terms are customarily referred to as “exchange terms,” but


everybody seems to have a different idea of what that is supposed to mean.
The reason may be that these terms pop up all over the place, in all sorts of
very different settings. This book prefers to call them twilight terms, since that
most clearly expresses what they really are. Nature is in a twilight zone of
ambiguity.
The lowering of the energy by the twilight terms produces more stable chem-
ical bonds than you would expect. Typically, the effect of the terms is greatest
if the two basic states ψ1 and ψ2 are physically equivalent and have the same
energy. This is the case for the hydrogen examples and most of the others
mentioned. For such states, the ground state will occur for an equal mixture of
states, a = b = √(1/2), because then the twilight terms are most negative. In that
case, the lowest energy, call it EL , is an amount |H12 | below the energy H11 = H22
of the component states.
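In matrix terms, minimizing ⟨E⟩ over all normalized combinations is the
eigenvalue problem of the two by two matrix of the Hij coefficients, assuming
that ψ1 and ψ2 are orthonormal. A small sketch with made-up numbers:

```python
import numpy as np

H11, H22, H12 = -13.6, -13.6, -1.8   # made-up values in eV, with H12 negative
H = np.array([[H11, H12],
              [H12, H22]])
E, V = np.linalg.eigh(H)             # eigenvalues come out in ascending order
print(E[0])                          # lowest energy: H11 - |H12| = -15.4
print(V[:, 0])                       # its (a, b): the equal mixture, up to sign
# With H11 well below H22 instead, the lowest eigenvector has |a| close to 1.
```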
On the other hand, if the lower energy state ψ1 has significantly less energy
than state ψ2 , then the minimum energy will occur for |a| ≈ 1 and |b| ≈ 0. (This
assumes that the twilight terms are not big enough to dominate the energy.) In
that case ab ≈ 0, which pretty much takes the twilight terms (4.11) out of the
picture completely.
This happens for the single-electron bond of the hydrogen molecular ion if
the second proton is replaced by another ion, say a lithium ion. The energy in
state ψ1 where the electron is around the proton will be less than that of state
ψ2 where it is around the lithium ion. For such asymmetrical single-electron
bonds, the twilight terms are not likely to help forge a strong bond. While it
turns out that the LiH⁺ ion is stable, the binding energy is only 0.14 eV or
so, compared to 2.8 eV for the H₂⁺ ion. Also, the LiH⁺ bond seems to be best
described as polarization of the hydrogen atom by the lithium ion, instead of as
a true chemical bond.
In contrast, for the two-electron bond of the neutral hydrogen molecule, if
the second proton is replaced by a lithium ion, states ψ1 and ψ2 will still be the
same: both have one electron around the proton and one around the lithium ion.
The two states do have the electrons reversed, but the electrons are identical.
Thus the twilight terms are still likely to be effective. Indeed neutral LiH lithium
hydride exists as a stable molecule with a binding energy of about 2.5 eV at
low pressures. It should be noted that the LiH bond is very ionic, with the
“shared” electrons mostly at the hydrogen side, so the actual ground state is
quite different from the model. But the model should be better when the nuclei
are farther apart, so the analysis can at least justify the existence of a significant
bond.

Key Points

⋄ In quantum mechanics, the energy of different but physically equivalent


states can be lowered by mixing them together.
⋄ This lowering of energy does not come from new physical forces, but
from the weird mathematics of the wave function.
⋄ The effect tends to be much less when the original states are physically
very different.
⋄ One important place where states are indeed physically the same is in
chemical bonds involving pairs of electrons. The equivalent states here
merely have the identical electrons interchanged.

4.3 Review Questions


1 The effectiveness of mixing states was already shown by the hydro-
gen molecule and molecular ion examples. But the generalized story
above restricts the “basis” states to be orthogonal, and the states
used in the hydrogen examples were not.
Show that if ψ1 and ψ2 are not orthogonal states, but are normalized
and produce a real and positive value for hψ1 |ψ2 i, like in the hydrogen
examples, then orthogonal states can be found in the form

ψ̄1 = α (ψ1 − εψ2 ) ψ̄2 = α (ψ2 − εψ1 ) .

For normalized ψ1 and ψ2 the Cauchy-Schwarz inequality says that


hψ1 |ψ2 i will be less than one. If the states do not overlap much, it
will be much less than one and ε will be small.
(If ψ1 and ψ2 do not meet the stated requirements, you can always
redefine them by factors aeic and be−ic , with a, b, and c real, to get
states that do.)
2 Show that it does not have an effect on the solution whether or not the
basic states ψ1 and ψ2 are normalized, like in the previous question,
before the state of lowest energy is found.
This requires no detailed analysis; just check that the same solu-
tion can be described using the nonorthogonal and orthogonal basis
states. It is however an important observation for various numerical
solution procedures: your set of basis functions can be cleaned up
and simplified without affecting the solution you get.

4.4 Spin
At this stage, it becomes necessary to look somewhat closer at the various
particles involved in quantum mechanics themselves. The analysis so far already

used the fact that particles have a property called mass, a quantity that special
relativity has identified as being an internal amount of energy. It turns out that
in addition particles have a fixed amount of “built-in” angular momentum,
called “spin.” Spin reflects itself, for example, in how a charged particle such
as an electron interacts with a magnetic field.
To keep it apart from spin, from now on the angular momentum of a particle
due to its motion will be referred to as “orbital” angular momentum. As was
discussed in chapter 3.1, the square orbital angular momentum of a particle is
given by
L² = l(l + 1)h̄²
where the azimuthal quantum number l is a nonnegative integer.
The square spin angular momentum of a particle is given by a similar ex-
pression:
S² = s(s + 1)h̄² (4.12)
but the “spin s” is a fixed number for a given type of particle. And while l can
only be an integer, the spin s can be any multiple of one half.
Particles with half integer spin are called “fermions.” For example, electrons,
protons, and neutrons all three have spin s = ½ and are fermions.
Particles with integer spin are called “bosons.” For example, photons have
spin s = 1. The π-mesons have spin s = 0 and gravitons, unobserved at the
time of writing, should have spin s = 2.
The spin angular momentum in an arbitrarily chosen z-direction is

Sz = mh̄ (4.13)

the same formula as for orbital angular momentum, and the values of m range
again from −s to +s in integer steps. For example, photons can have spin in
a given direction that is h̄, 0, or −h̄. (The photon, a relativistic particle with
zero rest mass, has only two spin states along the direction of propagation; the
zero value does not occur in this case. But photons radiated by atoms can still
come off with zero angular momentum in a direction normal to the direction of
propagation. A derivation is in chapter 10.2.3.)
The common particles, (electrons, protons, neutrons), can only have spin
angular momentum ½h̄ or −½h̄ in any given direction. The positive sign state
is called “spin up”, the negative one “spin down”.
It may be noted that the proton and neutron are not elementary particles,
but are baryons, consisting of three quarks. Similarly, mesons consist of a quark
and an anti-quark. Quarks have spin ½, which allows baryons to have spin 3/2 or
½. (It is not self-evident, but spin values can be additive or subtractive within
the confines of their discrete allowable values; see chapter 9.1.) The same way,
mesons can have spin 1 or 0.

Spin states are commonly shown in “ket notation” as |s m⟩. For example,
the spin-up state for an electron is indicated by |½ ½⟩ and the spin-down state
as |½ −½⟩. More informally, ↑ and ↓ are often used.
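The bookkeeping of the allowed values is simple enough to mechanize; a trivial
sketch:

```python
from fractions import Fraction

def m_values(s):
    """Allowed spin quantum numbers m = -s, -s+1, ..., +s."""
    s = Fraction(s)
    out, m = [], -s
    while m <= s:
        out.append(m)
        m += 1
    return out

print([str(m) for m in m_values('1/2')])   # electron: ['-1/2', '1/2']
print([str(m) for m in m_values('1')])     # photon:   ['-1', '0', '1']
```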

Key Points
⋄ Most particles have internal angular momentum called spin.
⋄ The square spin angular momentum and its quantum number s are
always the same for a given particle.
⋄ Electrons, protons and neutrons all have spin ½. Their spin angular
momentum in a given direction is either ½h̄ or −½h̄.
⋄ Photons have spin one. Possible values for their angular momentum in
a given direction are h̄, zero, or −h̄, though zero does not occur in the
direction of propagation.
⋄ Particles with integer spin, like photons, are called bosons. Particles
with half-integer spin, like electrons, protons, and neutrons, are called
fermions.
⋄ The spin-up state of a spin one-half particle like an electron is usually
indicated by |½ ½⟩ or ↑. Similarly, the spin-down state is indicated by
|½ −½⟩ or ↓.

4.4 Review Questions


1 Delta particles have spin s = 3/2. What values can their spin angular
momentum in a given direction have?
2 Delta particles have spin 3/2. What value does their total square an-
gular momentum have?

4.5 Multiple-Particle Systems Including Spin


Quantum mechanics as discussed so far must be generalized to account for
particles that have spin. Just like there is a probability that a particle is at
some position ~r, there is the additional probability that it has spin angular
momentum Sz in an arbitrarily chosen z-direction, and this must be included
in the wave function. This section discusses how.

4.5.1 Wave function for a single particle with spin


The first question is how spin should be included in the wave function of a
single particle. If spin is ignored, a single particle has a wave function Ψ(~r; t),

depending on position ~r and on time t. Now, the spin Sz is just some other
scalar variable that describes the particle, in that respect no different from say
the x-position of the particle. The “every possible combination” idea of allowing
every possible combination of states to have its own probability indicates that
Sz needs to be added to the list of variables. So the complete wave function Ψ
of the particle can be written out fully as:

Ψ ≡ Ψ(~r, Sz ; t) (4.14)

The value of |Ψ(~r, Sz ; t)|2 d3~r gives the probability of finding the particle within
a vicinity d3~r of ~r and with spin angular momentum in the z-direction Sz .
But note that there is a big difference between the spin “coordinate” and
the position coordinates: while the position variables can take on any value,
the values of Sz are highly limited. In particular, for the electron, proton, and
neutron, Sz can only be ½h̄ or −½h̄, nothing else. You do not really have a full
Sz “axis”, just two points.
As a result, there are other meaningful ways of writing the wave function.
The full wave function Ψ(~r, Sz ; t) can be thought of as consisting of two parts
Ψ+ and Ψ− that only depend on position:

$$\Psi_+(\vec r;t) \equiv \Psi(\vec r, \tfrac{1}{2}\hbar; t) \quad\text{and}\quad
\Psi_-(\vec r;t) \equiv \Psi(\vec r, -\tfrac{1}{2}\hbar; t) \qquad (4.15)$$

These two parts can in turn be thought of as being the components of a two-
dimensional vector that only depends on position:
~Ψ(~r; t) ≡ ( Ψ+ (~r; t) )
            ( Ψ− (~r; t) )

Remarkably, Dirac found that the wave function for particles like electrons has
to be a vector, if it is assumed that the relativistic equations take a guessed
simple and beautiful form, like the Schrödinger and all other basic equations
of physics are simple and beautiful. Just like relativity reveals that particles
should have built-in energy, it also reveals that particles like electrons have
built-in angular momentum. A description of the Dirac equation is in chapter
9.2 if you are curious.
The two-dimensional vector is called a “spinor” to indicate that its compo-
nents do not change like those of ordinary physical vectors when the coordinate
system is rotated. (How they do change is of no importance here, but will even-
tually be described in note {A.84}.) The spinor can also be written in terms of
a magnitude times a unit vector:
~Ψ(~r; t) = Ψm (~r; t) ( χ1 (~r; t) )
                      ( χ2 (~r; t) ) .
This book will just use the scalar wave function Ψ(~r, Sz ; t), not a vector one.
But it is often convenient to write the scalar wave function in a form equivalent
to the vector one:

Ψ(~r, Sz ; t) = Ψ+ (~r; t)↑(Sz ) + Ψ− (~r; t)↓(Sz ). (4.16)

The square magnitude of function Ψ+ gives the probability of finding the particle
near a position with spin-up. That of Ψ− gives the probability of finding it with
spin-down. The “spin-up” function ↑(Sz ) and the “spin-down” function ↓(Sz )
are in some sense the equivalent of the unit vectors ı̂ and ̂ in normal vector
analysis; they have by definition the following values:

↑(+h̄/2) = 1    ↑(−h̄/2) = 0    ↓(+h̄/2) = 0    ↓(−h̄/2) = 1.

The function arguments will usually be left away for conciseness, so that

Ψ = Ψ+ ↑ + Ψ− ↓

is the way the wave function of, say, an electron will normally be written out.
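
To see the two equivalent descriptions side by side, here is a minimal numerical sketch (not part of the original text). The Gaussian spatial components psi_plus and psi_minus are arbitrary stand-ins for whatever Ψ+ and Ψ− happen to be in a given problem:

    import numpy as np

    hbar = 1.0  # work in units where h-bar equals one

    # Hypothetical spatial components; any normalizable functions would do.
    def psi_plus(r, t):   # spin-up component Psi_+(r; t)
        return np.exp(-np.dot(r, r))
    def psi_minus(r, t):  # spin-down component Psi_-(r; t)
        return 0.5 * np.exp(-np.dot(r, r))

    def up(Sz):    # "spin-up" function: 1 at Sz = +hbar/2, 0 at Sz = -hbar/2
        return 1.0 if Sz == +hbar/2 else 0.0
    def down(Sz):  # "spin-down" function: 0 at Sz = +hbar/2, 1 at Sz = -hbar/2
        return 1.0 if Sz == -hbar/2 else 0.0

    # Scalar form: Psi(r, Sz; t) = Psi_+ up(Sz) + Psi_- down(Sz)
    def Psi(r, Sz, t):
        return psi_plus(r, t) * up(Sz) + psi_minus(r, t) * down(Sz)

    # Spinor form: the same information stored as a two-component vector.
    def Psi_spinor(r, t):
        return np.array([psi_plus(r, t), psi_minus(r, t)])

    r = np.array([0.3, 0.0, -0.2])
    print(Psi(r, +hbar/2, 0.0) == Psi_spinor(r, 0.0)[0])  # True
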

Key Points
⋄ Spin must be included as an independent variable in the wave function
of a particle with spin.
⋄ Usually, the wave function Ψ(~r, Sz ; t) of a single particle with spin
s = 1/2 will be written as

Ψ = Ψ+ ↑ + Ψ− ↓

where Ψ+ (~r; t) determines the probability of finding the particle near
a given location ~r with spin up, and Ψ− (~r; t) the one for finding it spin
down.
⋄ The functions ↑(Sz ) and ↓(Sz ) have the values

↑(+h̄/2) = 1    ↑(−h̄/2) = 0    ↓(+h̄/2) = 0    ↓(−h̄/2) = 1

and represent the pure spin-up, respectively spin-down states.

4.5.1 Review Questions


1 What is the normalization requirement of the wave function of a spin
1/2 particle in terms of Ψ+ and Ψ− ?
4.5.2 Inner products including spin


Inner products are important: they are needed for finding normalization fac-
tors, expectation values, uncertainty, approximate ground states, etcetera. The
additional spin coordinates add a new twist, since there is no way to integrate
over the few discrete points on the spin “axis”. Instead, you must sum over
these points.
As an example, the inner product of two arbitrary electron wave functions
Ψ1 (~r, Sz ; t) and Ψ2 (~r, Sz ; t) is
⟨Ψ1 |Ψ2 ⟩ = Σ_{Sz =±h̄/2} ∫_{all ~r} Ψ∗1 (~r, Sz ; t) Ψ2 (~r, Sz ; t) d3~r

or writing out the two-term sum,


⟨Ψ1 |Ψ2 ⟩ = ∫_{all ~r} Ψ∗1 (~r, +h̄/2; t) Ψ2 (~r, +h̄/2; t) d3~r + ∫_{all ~r} Ψ∗1 (~r, −h̄/2; t) Ψ2 (~r, −h̄/2; t) d3~r

The individual factors in the integrals are by definition the spin-up components
Ψ1+ and Ψ2+ and the spin down components Ψ1− and Ψ2− of the wave functions,
so:
⟨Ψ1 |Ψ2 ⟩ = ∫_{all ~r} Ψ∗1+ (~r; t) Ψ2+ (~r; t) d3~r + ∫_{all ~r} Ψ∗1− (~r; t) Ψ2− (~r; t) d3~r

In other words, the inner product with spin evaluates as

⟨Ψ1+ ↑ + Ψ1− ↓|Ψ2+ ↑ + Ψ2− ↓⟩ = ⟨Ψ1+ |Ψ2+ ⟩ + ⟨Ψ1− |Ψ2− ⟩ (4.17)

It is spin-up components together and spin-down components together.
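
Numerically, this recipe is literally “sum over the two spin values, then integrate over space.” A sketch, with a one-dimensional grid standing in for three-dimensional space and made-up Gaussian wave functions (none of this is from the book itself):

    import numpy as np

    # Crude discretization: a 1D grid stands in for all of space.
    x = np.linspace(-8.0, 8.0, 2001)
    dx = x[1] - x[0]

    # Hypothetical wave functions, stored as (spin component, position).
    Psi1 = np.array([np.exp(-x**2), 0.3 * np.exp(-(x - 1)**2)])
    Psi2 = np.array([np.exp(-(x + 1)**2), 0.5 * np.exp(-x**2)])

    def inner(Psi_a, Psi_b):
        """<Psi_a|Psi_b>: sum over the spin values, integrate over space."""
        return sum(np.sum(np.conj(Psi_a[s]) * Psi_b[s]) * dx for s in (0, 1))

    # The same number per (4.17): spin-up together plus spin-down together.
    direct = (np.sum(np.conj(Psi1[0]) * Psi2[0])
              + np.sum(np.conj(Psi1[1]) * Psi2[1])) * dx
    print(np.isclose(inner(Psi1, Psi2), direct))  # True
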


Another way of looking at this, or maybe remembering it, is to note that
the spin states are an orthonormal pair,

⟨↑|↑⟩ = 1    ⟨↑|↓⟩ = ⟨↓|↑⟩ = 0    ⟨↓|↓⟩ = 1 (4.18)

as can be verified directly from the definitions of those functions as given in
the previous subsection. Then you can think of an inner product with spin as
multiplying out as:

⟨Ψ1+ ↑ + Ψ1− ↓|Ψ2+ ↑ + Ψ2− ↓⟩
= ⟨Ψ1+ |Ψ2+ ⟩⟨↑|↑⟩ + ⟨Ψ1+ |Ψ2− ⟩⟨↑|↓⟩ + ⟨Ψ1− |Ψ2+ ⟩⟨↓|↑⟩ + ⟨Ψ1− |Ψ2− ⟩⟨↓|↓⟩
= ⟨Ψ1+ |Ψ2+ ⟩ + ⟨Ψ1− |Ψ2− ⟩

Key Points
⋄ In inner products, you must sum over the spin states.
⋄ For spin 1/2 particles:

⟨Ψ1+ ↑ + Ψ1− ↓|Ψ2+ ↑ + Ψ2− ↓⟩ = ⟨Ψ1+ |Ψ2+ ⟩ + ⟨Ψ1− |Ψ2− ⟩

which is spin-up components together plus spin-down components together.
⋄ The spin-up and spin-down states ↑ and ↓ are an orthonormal pair.

4.5.2 Review Questions


1 Show that the normalization requirement for the wave function of a
spin 1/2 particle in terms of Ψ+ and Ψ− requires its norm √⟨Ψ|Ψ⟩ to
be one.
2 Show that if ψL and ψR are normalized spatial wave functions, then
a combination like (ψL ↑ + ψR ↓)/√2 is a normalized wave function
with spin.

4.5.3 Commutators including spin


There is no known “internal physical mechanism” that gives rise to spin like
there is for orbital angular momentum. Fortunately, this lack of detailed information about spin is largely made up for by knowledge of its commutators.
In particular, physicists have concluded that spin components satisfy the
same commutation relations as the components of orbital angular momentum:

[Ŝx , Ŝy ] = ih̄Ŝz    [Ŝy , Ŝz ] = ih̄Ŝx    [Ŝz , Ŝx ] = ih̄Ŝy    (4.19)

These equations are called the “fundamental commutation relations.” As will
be shown in chapter 9.1, a large amount of information about spin can be teased
from them.
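
For spin 1/2, these commutators are easy to check by brute force, using the standard representation of the spin components as h̄/2 times the 2 × 2 Pauli matrices (that representation is standard quantum mechanics, not something derived at this point in the book):

    import numpy as np

    hbar = 1.0
    # Spin 1/2 components as (hbar/2) times the standard Pauli matrices.
    Sx = (hbar / 2) * np.array([[0, 1], [1, 0]], dtype=complex)
    Sy = (hbar / 2) * np.array([[0, -1j], [1j, 0]], dtype=complex)
    Sz = (hbar / 2) * np.array([[1, 0], [0, -1]], dtype=complex)

    def comm(A, B):  # the commutator [A, B] = AB - BA
        return A @ B - B @ A

    print(np.allclose(comm(Sx, Sy), 1j * hbar * Sz))  # True
    print(np.allclose(comm(Sy, Sz), 1j * hbar * Sx))  # True
    print(np.allclose(comm(Sz, Sx), 1j * hbar * Sy))  # True
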
Further, spin operators commute with all functions of the spatial coordi-
nates and with all spatial operators, including position, linear momentum, and
orbital angular momentum. The reason why can be understood from the given
description of the wave function with spin. First of all, the square spin operator Ŝ2 just multiplies the entire wave function by the constant h̄2 s(s + 1), and
everything commutes with a constant. And the operator Ŝz of spin in an arbitrary z-direction commutes with spatial functions and operators in much the
same way that an operator like ∂/∂x commutes with functions depending on y
and with ∂/∂y. The z-component of spin corresponds to an additional “axis”
separate from the x, y, and z ones, and Ŝz only affects the variation in this
additional direction. For example, for a particle with spin one half, Ŝz multiplies
the spin-up part of the wave function Ψ+ by the constant +h̄/2 and Ψ− by −h̄/2.
Spatial functions and operators commute with these constants for both Ψ+ and
Ψ− , hence commute with Ŝz for the entire wave function. Since the z-direction
is arbitrary, this commutation applies for any spin component.

Key Points
⋄ While a detailed mechanism of spin is missing, commutators with spin
can be evaluated.
⋄ The components of spin satisfy the same mutual commutation relations
as the components of orbital angular momentum.
⋄ Spin commutes with spatial functions and operators.

4.5.3 Review Questions


1 Are not some commutators missing from the fundamental commutation relations? For example, what is the commutator [Ŝy , Ŝx ]?

4.5.4 Wave function for multiple particles with spin


The extension of the ideas of the previous subsections towards multiple particles
is straightforward. For two particles, such as the two electrons of the hydrogen
molecule, the full wave function follows from the “every possible combination”
idea as
Ψ = Ψ(~r1 , Sz1 ,~r2 , Sz2 ; t) (4.20)
The value of |Ψ(~r1 , Sz1 ,~r2 , Sz2 ; t)|2 d3~r1 d3~r2 gives the probability of simultane-
ously finding particle 1 within a vicinity d3~r1 of ~r1 with spin angular momentum
in the z-direction Sz1 , and particle 2 within a vicinity d3~r2 of ~r2 with spin an-
gular momentum in the z-direction Sz2 .
Restricting the attention again to spin 1/2 particles like electrons, protons
and neutrons, there are now four possible spin states at any given point, with
corresponding spatial wave functions

Ψ++ (~r1 ,~r2 ; t) ≡ Ψ(~r1 , +h̄/2,~r2 , +h̄/2; t)
Ψ+− (~r1 ,~r2 ; t) ≡ Ψ(~r1 , +h̄/2,~r2 , −h̄/2; t)
Ψ−+ (~r1 ,~r2 ; t) ≡ Ψ(~r1 , −h̄/2,~r2 , +h̄/2; t)          (4.21)
Ψ−− (~r1 ,~r2 ; t) ≡ Ψ(~r1 , −h̄/2,~r2 , −h̄/2; t)
For example, |Ψ+− (~r1 ,~r2 ; t)|2 d3~r1 d3~r2 gives the probability of finding particle
1 within a vicinity d3~r1 of ~r1 with spin up, and particle 2 within a vicinity d3~r2
of ~r2 with spin down.
The wave function can be written using purely spatial functions and purely
spin functions as
Ψ(~r1 , Sz1 ,~r2 , Sz2 ; t) = Ψ++ (~r1 ,~r2 ; t)↑(Sz1 )↑(Sz2 ) + Ψ+− (~r1 ,~r2 ; t)↑(Sz1 )↓(Sz2 )
+ Ψ−+ (~r1 ,~r2 ; t)↓(Sz1 )↑(Sz2 ) + Ψ−− (~r1 ,~r2 ; t)↓(Sz1 )↓(Sz2 )
As you might guess from this multi-line display, usually this will be written
more concisely as
Ψ = Ψ++ ↑↑ + Ψ+− ↑↓ + Ψ−+ ↓↑ + Ψ−− ↓↓
by leaving out the arguments of the spatial and spin functions. The understand-
ing is that the first of each pair of arrows refers to particle 1 and the second to
particle 2.
The inner product now evaluates as
⟨Ψ1 |Ψ2 ⟩ = Σ_{Sz1 =±h̄/2} Σ_{Sz2 =±h̄/2} ∫_{all ~r1} ∫_{all ~r2} Ψ∗1 (~r1 , Sz1 ,~r2 , Sz2 ; t) Ψ2 (~r1 , Sz1 ,~r2 , Sz2 ; t) d3~r1 d3~r2

This can be written in terms of the purely spatial components as

⟨Ψ1 |Ψ2 ⟩ = ⟨Ψ1++ |Ψ2++ ⟩ + ⟨Ψ1+− |Ψ2+− ⟩ + ⟨Ψ1−+ |Ψ2−+ ⟩ + ⟨Ψ1−− |Ψ2−− ⟩    (4.22)
It reflects the fact that the four spin basis states ↑↑, ↑↓, ↓↑, and ↓↓ are an
orthonormal quartet.
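
The orthonormality of the quartet is easy to verify numerically by representing ↑ and ↓ as the unit vectors (1, 0) and (0, 1) and forming the two-particle states as Kronecker (tensor) products. A sketch, using notation of its own rather than the book’s:

    import numpy as np

    up = np.array([1.0, 0.0])    # spin-up as a unit vector
    down = np.array([0.0, 1.0])  # spin-down as a unit vector

    # Two-particle spin states as Kronecker (tensor) products.
    basis = {"uu": np.kron(up, up), "ud": np.kron(up, down),
             "du": np.kron(down, up), "dd": np.kron(down, down)}

    # <a|b> equals 1 when a = b and 0 otherwise: an orthonormal quartet.
    for na, a in basis.items():
        for nb, b in basis.items():
            assert np.isclose(a @ b, 1.0 if na == nb else 0.0)
    print("orthonormal quartet confirmed")
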

Key Points
⋄ The wave function of a single particle with spin generalizes in a straight-
forward way to multiple particles with spin.
⋄ The wave function of two spin 1/2 particles can be written in terms of
spatial components multiplying pure spin states as
Ψ = Ψ++ ↑↑ + Ψ+− ↑↓ + Ψ−+ ↓↑ + Ψ−− ↓↓
where the first arrow of each pair refers to particle 1 and the second to
particle 2.
⋄ In terms of spatial components, the inner product ⟨Ψ1 |Ψ2 ⟩ evaluates
as inner products of matching spin components:
⟨Ψ1++ |Ψ2++ ⟩ + ⟨Ψ1+− |Ψ2+− ⟩ + ⟨Ψ1−+ |Ψ2−+ ⟩ + ⟨Ψ1−− |Ψ2−− ⟩
⋄ The four spin basis states ↑↑, ↑↓, ↓↑, and ↓↓ are an orthonormal quartet.

4.5.4 Review Questions


1 As an example of the orthonormality of the two-particle spin states,
verify that ⟨↑↑|↓↑⟩ is zero, so that ↑↑ and ↓↑ are indeed orthogonal.
Do so by explicitly writing out the sums over Sz1 and Sz2 .
2 A more concise way of understanding the orthonormality of the two-
particle spin states is to note that an inner product like ⟨↑↑|↓↑⟩ equals
⟨↑|↓⟩⟨↑|↑⟩, where the first inner product refers to the spin states of
particle 1 and the second to those of particle 2. The first inner product
is zero because of the orthogonality of ↑ and ↓, making ⟨↑↑|↓↑⟩ zero
too.
To check this argument, write out the sums over Sz1 and Sz2 for
⟨↑|↓⟩⟨↑|↑⟩ and verify that it is indeed the same as the written out
sum for ⟨↑↑|↓↑⟩ given in the answer for the previous question.
The underlying mathematical principle is that sums of products can
be factored into separate sums as in:

Σ_{all Sz1} Σ_{all Sz2} f (Sz1 ) g(Sz2 ) = [ Σ_{all Sz1} f (Sz1 ) ] [ Σ_{all Sz2} g(Sz2 ) ]

This is similar to the observation in calculus that integrals of products
can be factored into separate integrals:

∫_{all ~r1} ∫_{all ~r2} f (~r1 ) g(~r2 ) d3~r1 d3~r2 = [ ∫_{all ~r1} f (~r1 ) d3~r1 ] [ ∫_{all ~r2} g(~r2 ) d3~r2 ]

4.5.5 Example: the hydrogen molecule


As an example, this section considers the ground state of the hydrogen molecule.
It was found in section 4.2 that the ground state electron wave function must
be of the approximate form

ψgs,0 = a [ψL (~r1 )ψR (~r2 ) + ψR (~r1 )ψL (~r2 )]

where ψL was the electron ground state of the left hydrogen atom, and ψR the
one of the right one; a was just a normalization constant. This solution excluded
all consideration of spin.
Including spin, the ground state wave function must be of the general form

ψgs = ψ++ ↑↑ + ψ+− ↑↓ + ψ−+ ↓↑ + ψ−− ↓↓.

As you might guess, in the ground state, each of the four spatial functions ψ++ ,
ψ+− , ψ−+ , and ψ−− must be proportional to the no-spin solution ψgs,0 above.
Anything else would have more than the lowest possible energy, {A.28}.
So the approximate ground state including spin must take the form

ψgs = a [ψL (~r1 )ψR (~r2 ) + ψR (~r1 )ψL (~r2 )] [a++ ↑↑ + a+− ↑↓ + a−+ ↓↑ + a−− ↓↓]
(4.23)
where a++ , a+− , a−+ , and a−− are constants.

Key Points
⋄ The electron wave function ψgs,0 for the hydrogen molecule derived
previously ignored spin.
⋄ In the full electron wave function, each spatial component must sepa-
rately be proportional to a(ψL ψR + ψR ψL ).

4.5.5 Review Questions


1 Show that the normalization requirement for ψgs means that

|a++ |2 + |a+− |2 + |a−+ |2 + |a−− |2 = 1

4.5.6 Triplet and singlet states


In the case of two particles with spin 1/2, it is often more convenient to use
slightly different basis states to describe the spin states than the four arrow
combinations ↑↑, ↑↓, ↓↑, and ↓↓. The more convenient basis states can be
written in |s m⟩ ket notation, and they are:

the triplet states:  |1 1⟩ = ↑↑    |1 0⟩ = (↑↓ + ↓↑)/√2    |1 −1⟩ = ↓↓
the singlet state:   |0 0⟩ = (↑↓ − ↓↑)/√2                            (4.24)
A state |s m⟩ has net spin s, giving a net square angular momentum s(s + 1)h̄2 ,
and has net angular momentum in the z-direction mh̄. For example, if the two
particles are in the state |1 1⟩, the net square angular momentum is 2h̄2 , and
their net angular momentum in the z-direction is h̄.
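
These net spin values can be spot-checked numerically by building the total spin operators Ŝ = Ŝ1 + Ŝ2 from the 2 × 2 spin 1/2 matrices with Kronecker products; chapter 9.1 does the honest analytical version. A sketch:

    import numpy as np

    hbar = 1.0
    I2 = np.eye(2)
    Sx = (hbar/2) * np.array([[0, 1], [1, 0]], dtype=complex)
    Sy = (hbar/2) * np.array([[0, -1j], [1j, 0]], dtype=complex)
    Sz = (hbar/2) * np.array([[1, 0], [0, -1]], dtype=complex)

    # Total spin components for two particles: S = S1 (x) I + I (x) S2.
    tot = [np.kron(S, I2) + np.kron(I2, S) for S in (Sx, Sy, Sz)]
    S2 = sum(S @ S for S in tot)  # the net square angular momentum operator

    up, down = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    triplet_11 = np.kron(up, up)                                    # |1 1>
    singlet = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)  # |0 0>

    print(np.allclose(S2 @ triplet_11, 2 * hbar**2 * triplet_11))  # True: s(s+1) = 2
    print(np.allclose(S2 @ singlet, 0 * singlet))                  # True: s = 0
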
The ↑↓ and ↓↑ states can be written as


↑↓ = (|1 0⟩ + |0 0⟩)/√2        ↓↑ = (|1 0⟩ − |0 0⟩)/√2
This shows that while they have zero angular momentum in the z-direction,
they do not have a value for the net spin: they have a 50/50 probability of net
spin 1 and net spin 0. A consequence is that ↑↓ and ↓↑ cannot be written in
|s m⟩ ket notation; there is no value for s.
Incidentally, note that z-components of angular momentum simply add up,
as the Newtonian analogy suggests. For example, for ↑↓, the +h̄/2 spin angular
momentum of the first electron adds to the −h̄/2 of the second electron to produce zero. But Newtonian analysis does not allow square angular momenta to
be added together, and neither does quantum mechanics. In fact, it is quite a
messy exercise to actually prove that the triplet and singlet states have the net
spin values claimed above. (See chapter 9.1 if you want to see how it is done.)
The spin states ↑ = |1/2 1/2⟩ and ↓ = |1/2 −1/2⟩ that apply for a single spin-1/2
particle are often referred to as the “doublet” states, since there are two of them.

Key Points
⋄ The set of spin states ↑↑, ↑↓, ↓↑, and ↓↓ are often better replaced by
the triplet and singlet states |1 1⟩, |1 0⟩, |1 −1⟩, and |0 0⟩.
⋄ The triplet and singlet states have definite values for the net square
spin.

4.5.6 Review Questions


1 Like the states ↑↑, ↑↓, ↓↑, and ↓↓, the triplet and singlet states are
an orthonormal quartet. For example, check that the inner product
of |1 0⟩ and |0 0⟩ is zero.

4.6 Identical Particles


A number of the counter-intuitive features of quantum mechanics have already
been discussed: Electrons being neither on Mars nor on Venus until they pop up
at either place. Superluminal interactions. The fundamental impossibility of
improving the accuracy of both position and momentum beyond a given limit.
Collapse of the wave function. A hidden random number generator. Quantized
energies and angular momenta. Nonexisting angular momentum vectors. Intrinsic angular momentum. But nature has one more trick up its sleeve, and it
is a big one.
Nature entangles all identical particles with each other. Specifically, it re-
quires that the wave function remains unchanged if any two identical bosons
are exchanged. If particles i and j are identical bosons, then:

Ψ (~r1 , Sz1 , . . . ,~ri , Szi , . . . ,~rj , Szj , . . .) = Ψ (~r1 , Sz1 , . . . ,~rj , Szj , . . . ,~ri , Szi , . . .)
(4.25)
On the other hand, nature requires that the wave function changes sign if
any two identical fermions are exchanged. If particles i and j are identical
fermions, (say, both electrons), then:

Ψ (~r1 , Sz1 , . . . ,~ri , Szi , . . . ,~rj , Szj , . . .) = −Ψ (~r1 , Sz1 , . . . ,~rj , Szj , . . . ,~ri , Szi , . . .)
(4.26)
In other words, the wave function must be symmetric with respect to ex-
change of identical bosons, and antisymmetric with respect to exchange of iden-
tical fermions. This greatly restricts what wave functions can be.
For example, consider what this means for the electron structure of the
hydrogen molecule. The approximate ground state of lowest energy was in the
previous section found to be

ψgs = a [ψL (~r1 )ψR (~r2 ) + ψR (~r1 )ψL (~r2 )] [a++ ↑↑ + a+− ↑↓ + a−+ ↓↑ + a−− ↓↓]
(4.27)
where ψL was the ground state of the left hydrogen atom, ψR the one of the right
one, first arrows indicate the spin of electron 1 and second arrows the one of
electron 2, and a and the a±± are constants.
But since the two electrons are identical fermions, this wave function must
turn into its negative under exchange of the two electrons. Exchanging the two
electrons produces

−ψgs = a [ψL (~r2 )ψR (~r1 ) + ψR (~r2 )ψL (~r1 )] [a++ ↑↑ + a+− ↓↑ + a−+ ↑↓ + a−− ↓↓] ;

note in particular that since the first arrow of each pair is taken to refer to
electron 1, exchanging the electrons means that the order of each pair of arrows
must be inverted. To compare the above wave function with the nonexchanged
version (4.27), reorder the terms back to the same order:

−ψgs = a [ψL (~r1 )ψR (~r2 ) + ψR (~r1 )ψL (~r2 )] [a++ ↑↑ + a−+ ↑↓ + a+− ↓↑ + a−− ↓↓]

The spatial factor is seen to be the same as the nonexchanged version in (4.27);
the spatial part is symmetric under particle exchange. The sign change will
have to come from the spin part.
Since each of the four spin states is independent from the others, the co-
efficient of each of these states will have to be the negative of the one of the
nonexchanged version. For example, the coefficient a++ of ↑↑ must be the neg-
ative of the coefficient a++ of ↑↑ in the nonexchanged version, otherwise there
is a conflict at Sz1 = h̄/2 and Sz2 = h̄/2, where only the spin state ↑↑ is nonzero.
Something can only be the negative of itself if it is zero, so a++ must be zero
to satisfy the antisymmetry requirement. The same way, a−− = −a−− , re-
quiring a−− to be zero too. The remaining two spin states both require that
a+− = −a−+ , but this can be nonzero.
So, due to the antisymmetrization requirement, the full wave function of the
ground state must be,

ψgs = a [ψL (~r1 )ψR (~r2 ) + ψR (~r1 )ψL (~r2 )] a+− [↑↓ − ↓↑]

or after normalization, noting that a factor of magnitude one is always arbitrary,


ψgs = a [ψL (~r1 )ψR (~r2 ) + ψR (~r1 )ψL (~r2 )] (↑↓ − ↓↑)/√2 .
It is seen that the antisymmetrization requirement restricts the spin state to be
the “singlet” one, as defined in the previous section. It is the singlet spin state
that achieves the sign change when the two electrons are exchanged; the spatial
part remains unchanged.
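
A quick numerical spot check of that sign change, with arbitrary Gaussians standing in for ψL and ψR (a sketch only; the real molecular wave functions are of course different):

    import numpy as np

    hbar = 1.0

    def psiL(r):  # hypothetical ground state around the left proton
        return np.exp(-np.sum((r - np.array([-1.0, 0.0, 0.0]))**2))
    def psiR(r):  # hypothetical ground state around the right proton
        return np.exp(-np.sum((r - np.array([+1.0, 0.0, 0.0]))**2))

    def up(Sz):   return 1.0 if Sz == +hbar/2 else 0.0
    def down(Sz): return 1.0 if Sz == -hbar/2 else 0.0

    def psi_gs(r1, Sz1, r2, Sz2):
        spatial = psiL(r1)*psiR(r2) + psiR(r1)*psiL(r2)              # symmetric
        spin = (up(Sz1)*down(Sz2) - down(Sz1)*up(Sz2)) / np.sqrt(2)  # singlet
        return spatial * spin

    r1, r2 = np.array([0.2, 0.1, 0.0]), np.array([-0.5, 0.3, 0.4])
    a = psi_gs(r1, +hbar/2, r2, -hbar/2)
    b = psi_gs(r2, -hbar/2, r1, +hbar/2)  # the two electrons exchanged
    print(np.isclose(a, -b))  # True: the wave function is antisymmetric
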
If the electrons had been bosons, the spin state could have been any
combination of the three triplet states. The symmetrization requirement for
fermions is much more restrictive than the one for bosons.
Since there are a lot more electrons in the universe than just these two, you
might rightly ask where antisymmetrization stops. The answer given in chapter
11.3 is: nowhere. But don’t worry about it. The existence of electrons that are
too far away to affect the system being studied can be ignored.

Key Points
⋄ The wave function must be symmetric (must stay the same) under
exchange of identical bosons.
⋄ The wave function must be antisymmetric (must turn into its negative)
under exchange of identical fermions (e.g., electrons.)
⋄ Especially the antisymmetrization requirement greatly restricts what
wave functions can be.
⋄ The antisymmetrization requirement forces the electrons in the hydro-
gen molecule ground state to assume the singlet spin state.

4.6 Review Questions


1 Check that indeed any linear combination of the triplet states is un-
changed under particle exchange.
2 Suppose the electrons of the hydrogen molecule are in the excited
antisymmetric spatial state

a [ψL (~r1 )ψR (~r2 ) − ψR (~r1 )ψL (~r2 )] .

In that case what can you say about the spin state?
Yes, in this case the spin would be less restricted if the electrons
were bosons. But antisymmetric spatial states themselves are pretty
restrictive in general. The precise sense in which the antisymmetriza-
tion requirement is more restrictive than the symmetrization require-
ment will be explored in the next section.

4.7 Ways to Symmetrize the Wave Function


This section discusses ways in which the symmetrization requirements for wave
functions of systems of identical particles can be achieved in general. This is a
key issue in the numerical solution of any nontrivial quantum system, so this
section will examine it in some detail.
It will be assumed that the approximate description of the wave function is
done using a set of chosen single-particle functions, or “states”,

ψ1p (~r, Sz ), ψ2p (~r, Sz ), . . .

An example is provided by the approximate ground state of the hydrogen
molecule from the previous section,

a [ψL (~r1 )ψR (~r2 ) + ψR (~r1 )ψL (~r2 )] (↑↓ − ↓↑)/√2 .
This can be multiplied out to be
(a/√2) [ ψL (~r1 )↑(Sz1 )ψR (~r2 )↓(Sz2 ) + ψR (~r1 )↑(Sz1 )ψL (~r2 )↓(Sz2 )
       − ψL (~r1 )↓(Sz1 )ψR (~r2 )↑(Sz2 ) − ψR (~r1 )↓(Sz1 )ψL (~r2 )↑(Sz2 ) ]

and consists of four single-particle functions:

ψ1p (~r, Sz ) = ψL (~r)↑(Sz )    ψ2p (~r, Sz ) = ψL (~r)↓(Sz )
ψ3p (~r, Sz ) = ψR (~r)↑(Sz )    ψ4p (~r, Sz ) = ψR (~r)↓(Sz ).

The first of the four functions represents a single electron in the ground state
around the left proton with spin up, the second a single electron in the same
spatial state with spin down, etcetera. For better accuracy, more single-particle
functions could be included, say excited atomic states in addition to the ground
states. In terms of the above four functions, the expression for the hydrogen
molecule ground state is

(a/√2) ψ1p (~r1 , Sz1 )ψ4p (~r2 , Sz2 ) + (a/√2) ψ3p (~r1 , Sz1 )ψ2p (~r2 , Sz2 )
− (a/√2) ψ2p (~r1 , Sz1 )ψ3p (~r2 , Sz2 ) − (a/√2) ψ4p (~r1 , Sz1 )ψ1p (~r2 , Sz2 )

The issue in this section is that the above hydrogen ground state is just one
special case of the most general wave function for the two particles that can be
formed from four single-particle states:

Ψ(~r1 , Sz1 ,~r2 , Sz2 ; t) =


a11 ψ1p (~r1 , Sz1 )ψ1p (~r2 , Sz2 ) + a12 ψ1p (~r1 , Sz1 )ψ2p (~r2 , Sz2 ) +
a13 ψ1p (~r1 , Sz1 )ψ3p (~r2 , Sz2 ) + a14 ψ1p (~r1 , Sz1 )ψ4p (~r2 , Sz2 ) +
a21 ψ2p (~r1 , Sz1 )ψ1p (~r2 , Sz2 ) + a22 ψ2p (~r1 , Sz1 )ψ2p (~r2 , Sz2 ) +
a23 ψ2p (~r1 , Sz1 )ψ3p (~r2 , Sz2 ) + a24 ψ2p (~r1 , Sz1 )ψ4p (~r2 , Sz2 ) +
a31 ψ3p (~r1 , Sz1 )ψ1p (~r2 , Sz2 ) + a32 ψ3p (~r1 , Sz1 )ψ2p (~r2 , Sz2 ) +
a33 ψ3p (~r1 , Sz1 )ψ3p (~r2 , Sz2 ) + a34 ψ3p (~r1 , Sz1 )ψ4p (~r2 , Sz2 ) +
a41 ψ4p (~r1 , Sz1 )ψ1p (~r2 , Sz2 ) + a42 ψ4p (~r1 , Sz1 )ψ2p (~r2 , Sz2 ) +
a43 ψ4p (~r1 , Sz1 )ψ3p (~r2 , Sz2 ) + a44 ψ4p (~r1 , Sz1 )ψ4p (~r2 , Sz2 )

This can be written much more concisely using summation indices as

Ψ(~r1 , Sz1 ,~r2 , Sz2 ; t) = Σ_{n1 =1}^{4} Σ_{n2 =1}^{4} an1 n2 ψn1p (~r1 , Sz1 ) ψn2p (~r2 , Sz2 )

However, the individual terms will be fully written out for now to reduce the
mathematical abstraction. The individual terms are sometimes called “Hartree
products.”
The antisymmetrization requirement says that the wave function must be
antisymmetric under exchange of the two electrons. More concretely, it must
turn into its negative when the arguments ~r1 , Sz1 and ~r2 , Sz2 are swapped. To
understand what that means, the various terms need to be arranged in groups:

I: a11 ψ1p (~r1 , Sz1 )ψ1p (~r2 , Sz2 )


II : a22 ψ2p (~r1 , Sz1 )ψ2p (~r2 , Sz2 )
III : a33 ψ3p (~r1 , Sz1 )ψ3p (~r2 , Sz2 )
IV : a44 ψ4p (~r1 , Sz1 )ψ4p (~r2 , Sz2 )
V: a12 ψ1p (~r1 , Sz1 )ψ2p (~r2 , Sz2 ) + a21 ψ2p (~r1 , Sz1 )ψ1p (~r2 , Sz2 )
VI : a13 ψ1p (~r1 , Sz1 )ψ3p (~r2 , Sz2 ) + a31 ψ3p (~r1 , Sz1 )ψ1p (~r2 , Sz2 )
VII : a14 ψ1p (~r1 , Sz1 )ψ4p (~r2 , Sz2 ) + a41 ψ4p (~r1 , Sz1 )ψ1p (~r2 , Sz2 )
VIII : a23 ψ2p (~r1 , Sz1 )ψ3p (~r2 , Sz2 ) + a32 ψ3p (~r1 , Sz1 )ψ2p (~r2 , Sz2 )
IX : a24 ψ2p (~r1 , Sz1 )ψ4p (~r2 , Sz2 ) + a42 ψ4p (~r1 , Sz1 )ψ2p (~r2 , Sz2 )
X: a34 ψ3p (~r1 , Sz1 )ψ4p (~r2 , Sz2 ) + a43 ψ4p (~r1 , Sz1 )ψ3p (~r2 , Sz2 )

Within each group, all terms involve the same combination of functions, but in
a different order. Different groups have a different combination of functions.
Now if the electrons are exchanged, it turns the terms in groups I through IV
back into themselves. Since the wave function must change sign in the exchange,
and something can only be its own negative if it is zero, the antisymmetrization
requirement requires that the coefficients a11 , a22 , a33 , and a44 must all be zero.
Four coefficients have been eliminated from the list of unknown quantities.
Further, in each of the groups V through X with two different states, exchange of the two electrons turns the terms into each other, except for their
coefficients. If that is to achieve a change of sign, the coefficients must be each
other’s negatives; a21 = −a12 , a31 = −a13 , . . . . So only six coefficients a12 , a13 ,
. . . still need to be found from other physical requirements, such as energy min-
imization for a ground state. Less than half of the original sixteen unknowns
survive the antisymmetrization requirement, significantly reducing the problem
size.
There is a very neat way of writing the antisymmetrized wave function of
systems of fermions, which is especially convenient for larger numbers of parti-
cles. It is done using determinants. The antisymmetric wave function for the
above example is:
Ψ = a12 | ψ1p (~r1 , Sz1 )  ψ2p (~r1 , Sz1 ) |  +  a13 | ψ1p (~r1 , Sz1 )  ψ3p (~r1 , Sz1 ) |
        | ψ1p (~r2 , Sz2 )  ψ2p (~r2 , Sz2 ) |         | ψ1p (~r2 , Sz2 )  ψ3p (~r2 , Sz2 ) |

  + a14 | ψ1p (~r1 , Sz1 )  ψ4p (~r1 , Sz1 ) |  +  a23 | ψ2p (~r1 , Sz1 )  ψ3p (~r1 , Sz1 ) |
        | ψ1p (~r2 , Sz2 )  ψ4p (~r2 , Sz2 ) |         | ψ2p (~r2 , Sz2 )  ψ3p (~r2 , Sz2 ) |

  + a24 | ψ2p (~r1 , Sz1 )  ψ4p (~r1 , Sz1 ) |  +  a34 | ψ3p (~r1 , Sz1 )  ψ4p (~r1 , Sz1 ) |
        | ψ2p (~r2 , Sz2 )  ψ4p (~r2 , Sz2 ) |         | ψ3p (~r2 , Sz2 )  ψ4p (~r2 , Sz2 ) |

These determinants are called “Slater determinants”.


To find the actual hydrogen molecule ground state from the above expres-
sion, additional physical requirements have to be imposed. For example, the
coefficients a12 and a34 can reasonably be ignored for the ground state, because
according to the given definition of the states, their Slater determinants have the
electrons around the same nucleus, and that produces elevated energy due to the
mutual repulsion of the electrons. Also, following the arguments of section 4.2,
the coefficients a13 and a24 must be zero since their Slater determinants produce
the excited antisymmetric spatial state ψL ψR − ψR ψL times the ↑↑, respectively
↓↓ spin states. Finally, the coefficients a14 and a23 must be opposite in order
that their Slater determinants combine into the lowest-energy symmetric spa-
tial state ψL ψR + ψR ψL times the ↑↓ and ↓↑ spin states. That leaves the single
coefficient a14 that can be found from the normalization requirement, taking it
real and positive for convenience.
But the issue in this section is what the symmetrization requirements say
about wave functions in general, whether they are some ground state or not.
And for four single-particle states for two identical fermions, the conclusion is
that the wave function must be some combination of the six Slater determinants,
regardless of what other physics may be relevant.
The next question is how that conclusion changes if the two particles involved
are not fermions, but identical bosons. The symmetrization requirement is then
that exchanging the particles must leave the wave function unchanged. Since
the terms in groups I through IV do remain the same under particle exchange,
their coefficients a11 through a44 can have any nonzero value. This is the sense in
which the antisymmetrization requirement for fermions is much more restrictive
than the one for bosons: groups involving a duplicated state must be zero for
fermions, but not for bosons.
In groups V through X, where particle exchange turns each of the two terms
into the other one, the coefficients must now be equal instead of negatives;
a21 = a12 , a31 = a13 , . . . . That eliminates six coefficients from the original
sixteen unknowns, leaving ten coefficients that must be determined by other
physical requirements on the wave function.
(The equivalent of Slater determinants for bosons are “permanents,” basi-
cally determinants with all minus signs in their definition replaced by plus signs.
Unfortunately, many of the helpful properties of determinants do not apply to
permanents.)
All of the above arguments can be extended to the general case that N ,
instead of 4, single-particle functions ψ1p (~r, Sz ), ψ2p (~r, Sz ), . . . , ψNp (~r, Sz ) are used
to describe I, instead of 2, particles. Then the most general possible wave
function assumes the form:

Ψ = Σ_{n1 =1}^{N} Σ_{n2 =1}^{N} . . . Σ_{nI =1}^{N} an1 n2 ...nI ψn1p (~r1 , Sz1 ) ψn2p (~r2 , Sz2 ) . . . ψnIp (~rI , SzI )    (4.28)
where the an1 n2 ...nI are numerical coefficients that are to be chosen to satisfy the
physical constraints on the wave function, including the (anti)symmetrization
requirements.
This summation is again the “every possible combination” idea of combin-
ing every possible state for particle 1 with every possible state for particle 2,
etcetera. So the total sum above contains N^I terms: there are N possibilities
for the function number n1 of particle 1, times N possibilities for the function
number n2 of particle 2, ... In general then, a corresponding total of N^I unknown coefficients an1 n2 ...nI must be determined to find out the precise wave
function.
But for identical particles, the number that must be determined is much
less. That number can again be determined by dividing the terms into groups in
which the terms all involve the same combination of I single-particle functions,
just in a different order. The simplest groups are those that involve just a
single single-particle function, generalizing the groups I through IV in the earlier
example. Such groups consist of only a single term; for example, the group that
only involves ψ1p consists of the single term

a11...1 ψ1p (~r1 , Sz1 )ψ1p (~r2 , Sz2 ) . . . ψ1p (~rI , SzI ).

At the other extreme, groups in which every single-particle function is different
have as many as I! terms, since I! is the number of ways that I different items
can be ordered. In the earlier example, those were groups V through X, each
having 2! = 2 terms. If there are more than two particles, there will also be
groups in which some states are the same and some are different.
For identical bosons, the symmetrization requirement says that all the co-
efficients within a group must be equal. Any term in a group can be turned
into any other by particle exchanges; so, if they would not all have the same
coefficients, the wave function could be changed by particle exchanges. As a
result, for identical bosons the number of unknown coefficients reduces to the
number of groups.
For identical fermions, only groups in which all single-particle functions are
different can be nonzero. That follows because if a term has a duplicated single-
particle function, it turns into itself without the required sign change under an
exchange of the particles of the duplicated function.
So there is no way to describe a system of I identical fermions with anything
less than I different single-particle functions ψnp . This critically important observation is known as the “Pauli exclusion principle:” I − 1 fermions occupying
I − 1 single-particle functions exclude an I-th fermion from simply entering the
same I − 1 functions; a new function must be added to the mix for each additional fermion. The more identical fermions there are in a system, the more
different single-particle functions are required to describe it.
Each group involving I different single-particle functions ψn1p , ψn2p , . . . ψnIp reduces under the antisymmetrization requirement to a single Slater determinant
of the form

          | ψn1p (~r1 , Sz1 )  ψn2p (~r1 , Sz1 )  ψn3p (~r1 , Sz1 )  · · ·  ψnIp (~r1 , Sz1 ) |
          | ψn1p (~r2 , Sz2 )  ψn2p (~r2 , Sz2 )  ψn3p (~r2 , Sz2 )  · · ·  ψnIp (~r2 , Sz2 ) |
(1/√I!)   | ψn1p (~r3 , Sz3 )  ψn2p (~r3 , Sz3 )  ψn3p (~r3 , Sz3 )  · · ·  ψnIp (~r3 , Sz3 ) |    (4.29)
          |       ...               ...               ...           . . .       ...          |
          | ψn1p (~rI , SzI )  ψn2p (~rI , SzI )  ψn3p (~rI , SzI )  · · ·  ψnIp (~rI , SzI ) |

multiplied by a single unknown coefficient. The normalization factor 1/√I! has
been thrown in merely to ensure that if the functions ψnp are orthonormal, then
so are the Slater determinants. Using Slater determinants ensures the required
sign changes of fermion systems automatically, because determinants change
sign if two rows are exchanged.
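
In numerical work a Slater determinant is evaluated as an ordinary matrix determinant. A sketch with three hypothetical single-particle functions of a combined coordinate x (standing in for position and spin together), confirming the sign flip when two particles are exchanged:

    import numpy as np
    from math import factorial

    # Hypothetical single-particle functions of a combined coordinate x.
    psi_p = [lambda x: np.exp(-x**2),
             lambda x: x * np.exp(-x**2),
             lambda x: (2*x**2 - 1) * np.exp(-x**2)]

    def slater(coords):
        """Antisymmetric I-particle wave function (4.29) at given coordinates."""
        I = len(coords)
        A = np.array([[psi_p[n](coords[i]) for n in range(I)] for i in range(I)])
        return np.linalg.det(A) / np.sqrt(factorial(I))

    x = [0.3, -0.7, 1.2]
    swapped = [x[1], x[0], x[2]]  # exchange particles 1 and 2
    print(np.isclose(slater(x), -slater(swapped)))  # True: sign change
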
In the case that the bare minimum of I functions is used to describe I
identical fermions, only one Slater determinant can be formed. Then the antisymmetrization requirement reduces the I^I unknown coefficients an1 n2 ...nI to
just one, a12...I ; obviously a tremendous reduction.
At the other extreme, when the number of functions N is very large, much
larger than I^2 to be precise, most terms have all indices different and the
reduction is “only” from N^I to about N^I /I! terms. The latter would also be true
for identical bosons.
The functions had better be chosen to produce a good approximation to the
wave function with a small number of terms. As an arbitrary example to focus
the thoughts, if N = 100 functions are used to describe an arsenic atom, with
I = 33 electrons, there would be a prohibitive 10^66 terms in the sum (4.28).
Even after reduction to Slater determinants, there would still be a prohibitive
3 × 10^26 or so unknown coefficients left. The precise expression for the number
of Slater determinants is called “N choose I;” it is given by
à !
N N! N (N − 1)(N − 2) . . . (N − I + 1)
= = ,
I (N − I)!I! I!
since the top gives the total number of terms that have all functions different, (N
possible functions for particle 1, times N − 1 possible functions left for particle
2, etcetera,) and the bottom reflects that it takes I! of them to form a single
Slater determinant.
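
For the arsenic example, the counts quoted above are quick to reproduce with Python’s standard library:

    from math import comb

    N, I = 100, 33
    print(N**I)        # 100**33 = 10**66 Hartree product terms
    print(comb(N, I))  # "N choose I" Slater determinants, about 3e26
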
The basic “Hartree-Fock” approach, discussed in chapter 6.3, goes to the
extreme in reducing the number of functions: it uses the very minimum of I
single-particle functions. However, rather than choosing these functions a priori,
they are adjusted to give the best approximation that is possible with a single
Slater determinant. Unfortunately, if a single determinant still turns out to
be not accurate enough, adding a few more functions quickly blows up in your
face. Adding just one more function gives I more determinants; adding another
function gives another I(I + 1)/2 more determinants, etcetera.

Key Points
⋄ Wave functions for multiple-particle systems can be formed using sums
of products of single-particle wave functions.
⋄ The coefficients of these products are constrained by the symmetriza-
tion requirements.
⋄ In particular, for identical fermions such as electrons, the single-particle
wave functions must combine into Slater determinants.
⋄ Systems of identical fermions require at least as many single-particle
states as there are particles. This is known as the Pauli exclusion
principle.
⋄ If more single-particle states are used to describe a system, the problem
size increases rapidly.

4.7 Review Questions


1 How many single-particle states would a basic Hartree-Fock approx-
imation use to compute the electron structure of an arsenic atom?
How many Slater determinants would that involve?
2 If two more single-particle states would be used to improve the accu-
racy for the arsenic atom, (one more normally does not help), how
many Slater determinants could be formed with those states?

4.8 Matrix Formulation


When the number of unknowns in a quantum mechanical problem has been
reduced to a finite number, the problem can be reduced to a linear algebra
one. This allows the problem to be solved using standard analytical or numer-
ical techniques. This section describes how the linear algebra problem can be
obtained.
Typically, quantum mechanical problems can be reduced to a finite number
of unknowns using some finite set of chosen wave functions, as in the previous
section. There are other ways to make the problems finite; it does not really
make a difference here. But in general some simplification will still be needed
afterwards. A multiple sum like equation (4.28) for distinguishable particles
is awkward to work with, and when various coefficients drop out for identical
particles, it gets even messier. So as a first step, it is best to order the terms
involved in some way; any ordering will in principle do. Ordering allows each
term to be indexed by a single counter q, being the place of the term in the
ordering.
Using an ordering, the wave function for a total of I particles can be written
more simply as

Ψ = a1 ψ1I (~r1 , Sz1 ,~r2 , Sz2 , . . . ,~rI , SzI ) + a2 ψ2I (~r1 , Sz1 ,~r2 , Sz2 , . . . ,~rI , SzI ) + . . .

or in index notation:
Ψ = Σ_{q=1}^{Q} aq ψqI (~r1 , Sz1 ,~r2 , Sz2 , . . . ,~rI , SzI ).    (4.30)

where Q is the total count of the chosen I-particle wave functions and the single
counter q in aq replaces a set of I indices in the description used in the previous
section. The I-particle functions ψqI are allowed to be anything; individual
(Hartree) products of single-particle wave functions for distinguishable particles
as in (4.28), Slater determinants for identical fermions, permanents for identical
bosons, or whatever. The only thing that will be assumed is that they are
mutually orthonormal. (Which means that any underlying set of single-particle
functions ψnp (~r, Sz ) as described in the previous section should be orthonormal.
If they are not, there are procedures like Gram-Schmidt to make them so. Or
you can just put in some correction terms.)
Under those conditions, the energy eigenvalue problem Hψ = Eψ takes the
form:
Σ_{q=1}^{Q} H aq ψqI = Σ_{q=1}^{Q} E aq ψqI

The trick is now to take the inner product of both sides of this equation with
each function ψqI in the set of wave functions in turn. In other words, take an
inner product with hψ1I | to get one equation, then take an inner product with
hψ2I | to get a second equation, and so on. This produces, using the fact that the
functions are orthonormal to clean up the right-hand side,

H11 a1 + H12 a2 + . . . + H1Q aQ = Ea1
H21 a1 + H22 a2 + . . . + H2Q aQ = Ea2
  ···
Hq1 a1 + Hq2 a2 + . . . + HqQ aQ = Eaq
  ···
HQ1 a1 + HQ2 a2 + . . . + HQQ aQ = EaQ
where
H11 = ⟨ψ1I |Hψ1I ⟩,   H12 = ⟨ψ1I |Hψ2I ⟩,   . . . ,   HQQ = ⟨ψQI |HψQI ⟩.
are the matrix coefficients, or Hamiltonian coefficients.
This can again be written more compactly in index notation:
Σ_{q̄=1}^{Q} Hqq̄ aq̄ = Eaq   for q = 1, 2, . . . , Q   with Hqq̄ = ⟨ψqI |Hψq̄I ⟩    (4.31)

which is just a finite-size matrix eigenvalue problem.
Since the functions ψqI are known, chosen, functions, and the Hamiltonian H
is also known, the matrix coefficients Hqq̄ can be determined. The eigenvalues
E and corresponding eigenvectors (a1 , a2 , . . .) can then be found using linear
algebra procedures. Each eigenvector produces a corresponding approximate
eigenfunction a1 ψ1I + a2 ψ2I + . . . with an energy equal to the eigenvalue E.
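
As a concrete illustration, here is a Q = 2 version of this eigenvalue problem handed to a linear algebra package. It uses the J and −L shorthand of the review questions below, with made-up numerical values, since the actual integrals can only be evaluated numerically:

    import numpy as np

    # Hamiltonian coefficient matrix for Q = 2 basis functions; the values
    # of J and L are placeholders for integrals computed numerically.
    J, L = -1.0, 0.2
    H = np.array([[J, -L],
                  [-L, J]])

    E, a = np.linalg.eigh(H)  # eigenvalues and eigenvector columns
    print(E)        # [J - L, J + L] = [-1.2, -0.8]
    print(a[:, 0])  # proportional to (1, 1)/sqrt(2): the lowest-energy state
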

Key Points
⋄ Operator eigenvalue problems can be approximated by the matrix
eigenvalue problems of linear algebra.
⋄ That allows standard analytical or numerical techniques to be used in
their solution.

4.8 Review Questions


1 As a relatively simple example, work out the above ideas for the Q = 2
hydrogen molecule spatial states ψ1I = ψL ψR and ψ2I = ψR ψL . Write
the matrix eigenvalue problem and identify the two eigenvalues and
eigenvectors. Compare with the results of section 4.3.
Assume that ψL and ψR have been slightly adjusted to be orthonormal. Then ψ1I and ψ2I are orthonormal too, since the various six-dimensional inner product integrals, like

⟨ψ1I |ψ2I ⟩ ≡ ⟨ψL ψR |ψR ψL ⟩ ≡ ∫_{all ~r1} ∫_{all ~r2} ψL (~r1 )ψR (~r2 ) ψR (~r1 )ψL (~r2 ) d3~r1 d3~r2

can according to the rules of calculus be factored into three-dimensional integrals as

⟨ψ1I |ψ2I ⟩ = [ ∫_{all ~r1} ψL (~r1 ) ψR (~r1 ) d3~r1 ] [ ∫_{all ~r2} ψR (~r2 ) ψL (~r2 ) d3~r2 ] = ⟨ψL |ψR ⟩⟨ψR |ψL ⟩


which is zero if ψL and ψR are orthonormal.


Also, do not try to find actual values for H11 , H12 , H21 , and H22 .
As section 4.2 noted, that can only be done numerically. Instead just
refer to H11 as J and to H12 as −L:

H11 ≡ ⟨ψ1I |Hψ1I ⟩ ≡ ⟨ψL ψR |HψL ψR ⟩ ≡ J
H12 ≡ ⟨ψ1I |Hψ2I ⟩ ≡ ⟨ψL ψR |HψR ψL ⟩ ≡ −L.

Next note that you also have

H22 ≡ ⟨ψ2I |Hψ2I ⟩ ≡ ⟨ψR ψL |HψR ψL ⟩ = J
H21 ≡ ⟨ψ2I |Hψ1I ⟩ ≡ ⟨ψR ψL |HψL ψR ⟩ = −L

because they are the exact same inner product integrals; the difference
is just which electron you number 1 and which one you number 2 that
determines whether the wave functions are listed as ψL ψR or ψR ψL .
2 Find the eigenstates for the same problem, but now including spin.
As section 4.7 showed, the antisymmetric wave function with spin
consists of a sum of six Slater determinants. Ignoring the highly
excited first and sixth determinants that have the electrons around
the same nucleus, the remaining C = 4 Slater determinants can be
written out explicitly to give the two-particle states
ψ1I = (ψL ψR ↑↑ − ψR ψL ↑↑)/√2        ψ2I = (ψL ψR ↑↓ − ψR ψL ↓↑)/√2
ψ3I = (ψL ψR ↓↑ − ψR ψL ↑↓)/√2        ψ4I = (ψL ψR ↓↓ − ψR ψL ↓↓)/√2

Note that the Hamiltonian does not involve spin, to the approximation used in most of this book, so that, following the techniques of
section 4.5, an inner product like H23 = ⟨ψ2I |Hψ3I ⟩ can be written out
like

H23 = (1/2)⟨ψL ψR ↑↓ − ψR ψL ↓↑|H(ψL ψR ↓↑ − ψR ψL ↑↓)⟩
    = (1/2)⟨ψL ψR ↑↓ − ψR ψL ↓↑|(HψL ψR )↓↑ − (HψR ψL )↑↓⟩

and then multiplied out into inner products of matching spin components to give

H23 = −(1/2)⟨ψL ψR |HψR ψL ⟩ − (1/2)⟨ψR ψL |HψL ψR ⟩ = L.
The other 15 matrix coefficients can be found similarly, and most will
be zero.
If you do not have experience with linear algebra, you may want to
skip this question, or better, just read the solution. However, the
four eigenvectors are not that hard to guess; maybe easier to guess
than correctly derive.

4.9 Heavier Atoms [Descriptive]


This section solves the ground state electron configuration of the atoms of ele-
ments heavier than hydrogen. The atoms of the elements are distinguished by
their “atomic number” Z, which is the number of protons in the nucleus. For
the neutral atoms considered in this section, Z is also the number of electrons
circling the nucleus.
A crude approximation will be made to deal with the mutual interactions
between the electrons. Still, many properties of the elements can be understood
using this crude model, such as their geometry and chemical properties, and
how the Pauli exclusion principle raises the energy of the electrons.
This is a descriptive section, in which no new analytical procedures are
taught. However, it is a very important section to read, and reread, because
much of our qualitative understanding of nature is based on the ideas in this
section.

4.9.1 The Hamiltonian eigenvalue problem


The procedure to find the ground state of the heavier atoms is similar to the
one for the hydrogen atom of chapter 3.2. The total energy Hamiltonian for the
electrons of an element with atomic number Z is:

H = Σ_{i=1}^{Z} [ − (h̄2/2me) ∇i2 − (e2/4πǫ0)(Z/ri) + (1/2) Σ_{ī≠i} (e2/4πǫ0)(1/|~ri − ~rī|) ]    (4.32)

Within the brackets, the first term represents the kinetic energy of electron
number i out of Z, the second the attractive potential due to the nuclear charge
Ze, and the final term is the repulsion by all the other electrons. In the Hamil-
tonian as written, it is assumed that half of the energy of a repulsion is credited
to each of the two electrons involved, accounting for the factor 1/2.
The Hamiltonian eigenvalue problem for the energy states takes the form:
Hψ(~r1 , Sz1 ,~r2 , Sz2 , . . . ,~rZ , SzZ ) = Eψ(~r1 , Sz1 ,~r2 , Sz2 , . . . ,~rZ , SzZ )

Key Points
⋄ The Hamiltonian for the electron structure has been written down.
4.9.2 Approximate solution using separation of variables


The Hamiltonian eigenvalue problem of the previous subsection cannot be solved
exactly. The repulsive interactions between the electrons, given by the last term
in the Hamiltonian are too complex.
More can be said under the, really poor, approximation that each electron
“sees” a repulsion by the other Z − 1 electrons that averages out as if the other
electrons are located in the nucleus. The other Z − 1 electrons then reduce the
net charge of the nucleus from Ze to e. Another way of saying this is that each
of the Z − 1 other electrons “shields” one proton in the nucleus, allowing only
a single remaining proton charge to filter through.
In this crude approximation, the electrons do not notice each other at all;
they see only a single charge hydrogen nucleus. Obviously then, the wave func-
tion solutions for each electron should be the ψnlm eigenfunctions of the hydrogen
atom, which were found in chapter 3.2.
To verify this explicitly, the approximate Hamiltonian is
H = Σ_{i=1}^{Z} [ − (h̄2/2m) ∇i2 − (e2/4πǫ0)(1/ri) ]

since this represents a system of noninteracting electrons in which each experiences a hydrogen nucleus potential. This can be written more concisely as

H = Σ_{i=1}^{Z} hi

where hi is the hydrogen-atom Hamiltonian for electron number i,

hi = − (h̄2/2m) ∇i2 − (e2/4πǫ0)(1/ri).
The approximate Hamiltonian eigenvalue problem can now be solved using
a method of separation of variables in which solutions are sought that take the
form of products of single-electron wave functions:

ψ Z = ψ1e (~r1 , Sz1 )ψ2e (~r2 , Sz2 ) . . . ψZe (~rZ , SzZ ).


Substitution of this assumption into the eigenvalue problem Σi hi ψ Z = Eψ Z
and dividing by ψ Z produces

[h1 ψ1e (~r1 , Sz1 )] / ψ1e (~r1 , Sz1 ) + [h2 ψ2e (~r2 , Sz2 )] / ψ2e (~r2 , Sz2 ) + . . . = E

since h1 only does anything to the factor ψ1e (~r1 , Sz1 ), h2 only does anything to
the factor ψ2e (~r2 , Sz2 ), etcetera.
The first term in the equation above must be some constant ǫ1 ; it cannot
vary with ~r1 or Sz1 as ψ1e (~r1 , Sz1 ) itself does, since none of the other terms in
the equation varies with those variables. That means that
h1 ψ1e (~r1 , Sz1 ) = ǫ1 ψ1e (~r1 , Sz1 ),
which is a hydrogen atom eigenvalue problem for the single-electron wave function of electron 1. So, the single-electron wave function of electron 1 can be any
one of the hydrogen atom wave functions from chapter 3.2; allowing for spin,
the possible solutions are,
ψ100 (~r1 )↑(Sz1 ), ψ100 (~r1 )↓(Sz1 ), ψ200 (~r1 )↑(Sz1 ), ψ200 (~r1 )↓(Sz1 ), . . .
The energy ǫ1 is the corresponding hydrogen atom energy level, E1 for ψ100 ↑
or ψ100 ↓, E2 for any of the eight states ψ200 ↑, ψ200 ↓, ψ211 ↑, ψ211 ↓, ψ210 ↑, ψ210 ↓,
ψ21−1 ↑, ψ21−1 ↓, etcetera.
The same observations hold for the other electrons; their single-electron
eigenfunctions are ψnlm ↕ hydrogen atom ones, (where ↕ can be either ↑ or ↓.)
Their individual energies must be the corresponding hydrogen atom energy lev-
els.
The final wave functions for all Z electrons are then each a product of Z
hydrogen-atom wave functions,
ψn1 l1 m1 (~r1 )↕(Sz1 ) ψn2 l2 m2 (~r2 )↕(Sz2 ) . . . ψnZ lZ mZ (~rZ )↕(SzZ )
and the total energy is the sum of all the corresponding hydrogen atom energy
levels,
En1 + En2 + . . . + EnZ .
This solves the Hamiltonian eigenvalue problem under the shielding approx-
imation. The bottom line is: just multiply Z hydrogen energy eigenfunctions
together to get an energy eigenfunction for a heavier atom. The energy is
the sum of the Z hydrogen energy levels. However, the electrons are identical
fermions, so different eigenfunctions must still be combined together in Slater
determinants to satisfy the antisymmetrization requirements for electron ex-
change, as discussed in section 4.7. That will be done during the discussion of
the different atoms that is next.
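
Under this shielding approximation the total energy is plain arithmetic on the hydrogen levels En = E1 /n2 with E1 = −13.6 eV. A sketch (the lithium configuration used as the example anticipates subsection 4.9.4):

    E1 = -13.6  # hydrogen ground state energy in eV

    def E_hydrogen(n):
        """Hydrogen energy level for principal quantum number n, in eV."""
        return E1 / n**2

    # Shielding approximation: total energy is the sum over the electrons.
    # Example: lithium, two electrons with n = 1 plus one with n = 2.
    lithium = [1, 1, 2]
    print(sum(E_hydrogen(n) for n in lithium))  # about -30.6 eV
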

Key Points
⋄ The Hamiltonian eigenvalue problem is too difficult to solve analyti-
cally.
⋄ To simplify the problem, the detailed interactions between electrons
are ignored. For each electron, it is assumed that the only effect of
the other electrons is to cancel, or “shield,” that many protons in the
nucleus, leaving only a hydrogen nucleus strength.
⋄ This is a very crude approximation.


⋄ It implies that the Z-electron wave functions are products of the single-
electron hydrogen atom wave functions. Their energy is the sum of the
corresponding single-electron hydrogen energy levels.
⋄ These wave functions must still be combined together to satisfy the
antisymmetrization requirement (Pauli exclusion principle).

4.9.3 Hydrogen and helium


This subsection starts off the discussion of the approximate ground states of the
elements. Atomic number Z = 1 corresponds to hydrogen, which was already
discussed in chapter 3.2. The lowest energy state, or ground state, is ψ100 ,
(3.22), also called the “1s” state, and the single electron can be in the spin-up
or spin-down versions of that state, or in any combination of the two. The most
general ground state wave function is therefore:
Ψ(~r1 , Sz1 ) = a1 ψ100 (~r1 )↑(Sz1 ) + a2 ψ100 (~r1 )↓(Sz1 ) = ψ100 (~r1 ) ( a1 ↑(Sz1 ) + a2 ↓(Sz1 ) )
(4.33)
The “ionization energy” that would be needed to remove the electron from the
atom is the absolute value of the energy eigenvalue E1 , or 13.6 eV, as derived
in chapter 3.2.
For helium, with Z = 2, in the ground state both electrons are in the
lowest possible energy state ψ100 . But since electrons are identical fermions, the
antisymmetrization requirement now rears its head. It requires that the two
states ψ100 (~r)↑(Sz ) and ψ100 (~r)↓(Sz ) appear together in the form of a Slater
determinant (chapter 4.7):
Ψ(~r1 , Sz1 ,~r2 , Sz2 ; t) = (a/√2) | ψ100 (~r1 )↑(Sz1 )  ψ100 (~r1 )↓(Sz1 ) |    (4.34)
                                   | ψ100 (~r2 )↑(Sz2 )  ψ100 (~r2 )↓(Sz2 ) |

or, writing out the Slater determinant:


aψ100 (~r1 )ψ100 (~r2 ) (↑(Sz1 )↓(Sz2 ) − ↓(Sz1 )↑(Sz2 ))/√2 .
The spatial part is symmetric with respect to exchange of the two electrons.
The spin state is antisymmetric; it is the singlet configuration with zero net
spin of section 4.5.6.
Figure 4.4 shows the approximate probability density for the first two ele-
ments, indicating where electrons are most likely to be found. In actuality, the
shielding approximation underestimates the nuclear attraction and the shown
helium atom is much too big.
Figure 4.4: Approximate solutions for hydrogen (left) and helium (right).

It is good to remember that the ψ100 ↑ and ψ100 ↓ states are commonly indi-
cated as the “K shell” after the first initial of the airline of the Netherlands.
The analysis predicts that the ionization energy to remove one electron from
helium would be 13.6 eV, the same as for the hydrogen atom. This is a very
bad approximation indeed; the truth is almost double, 24.6 eV.
The problem is the assumption that the repulsion by the other electron “shields” one of the two protons in the helium nucleus, so that only a
single-proton hydrogen nucleus is seen. When electron wave functions overlap
significantly as they do here, their mutual repulsion is a lot less than you would
naively expect, (compare figure 9.13). As a result, the second proton is only
partly shielded, and the electron is held much more tightly than the analysis
predicts. See chapter 10.1.2 for better estimates of the helium atom size and
ionization energy.
However, despite the inaccuracy of the approximation chosen, it is probably
best to stay consistent, and not fool around at random. It must just be accepted
that the theoretical energy levels will be too small in magnitude {A.29}.
The large ionization energy of helium is one reason that it is chemically inert.
Helium is called a “noble” gas, presumably because nobody expects nobility to
do anything.

Key Points
⋄ The ground states of the atoms of the elements are to be discussed.
⋄ Element one is hydrogen, solved before. Its ground state is ψ100 with
arbitrary spin. Its ionization energy is 13.6 eV.
⋄ Element two is helium. Its ground state has both electrons in the
lowest-energy spatial state ψ100 , and locked into the singlet spin state.
Its ionization energy is 24.6 eV.
⋄ The large ionization energy of helium means it holds onto its two elec-
trons tightly. Helium is an inert noble gas.
⋄ The two “1s” states ψ100 ↑ and ψ100 ↓ are called the “K shell.”

4.9.4 Lithium to neon


The next element is lithium, with three electrons. This is the first element for
which the antisymmetrization requirement forces the theoretical energy to go
above the hydrogen ground state level E1 . The reason is that there is no way
to create an antisymmetric wave function for three electrons using only the two
lowest energy states ψ100 ↑ and ψ100 ↓. A Slater determinant for three electrons
must have three different states. One of the eight ψ2lm ↕ states with energy E2
will have to be thrown into the mix.
This effect of the antisymmetrization requirement, that a new state must
become “occupied” every time an electron is added is known as the Pauli ex-
clusion principle. It causes the energy values to become larger and larger as the
supply of low energy states runs out.
The transition to the higher energy level E2 is reflected in the fact that in
the “periodic table” of the elements, table 4.1, lithium starts a new row.

        I          II         III        IV         V          VI         VII        O

K   H 1                                                                          He 2
    13.6 2.20                                                                    24.6 —

L   Li 3       Be 4       B 5        C 6        N 7        O 8        F 9        Ne 10
    5.4 0.98   9.3 1.57   8.3 2.04   11.3 2.55  14.5 3.04  13.6 3.44  17.4 3.98  21.6 —

M   Na 11      Mg 12      Al 13      Si 14      P 15       S 16       Cl 17      Ar 18
    5.1 0.93   7.6 1.31   6.0 1.61   8.1 1.90   10.5 2.19  10.4 2.58  13.0 3.16  15.8 —

N   K 19       Ca 20      Ga 31      Ge 32      As 33      Se 34      Br 35      Kr 36
    4.3 0.82   6.1 1.00   6.0 1.81   7.9 2.01   9.8 2.18   9.7 2.55   11.8 2.96  14.0 —

transition metals:
    Sc 21      Ti 22      V 23       Cr 24      Mn 25      Fe 26      Co 27      Ni 28      Cu 29     Zn 30
    6.5 1.36   6.8 1.54   6.7 1.63   6.8 1.66   7.4 1.55   7.9 1.83   7.9 1.88   7.6 1.91   7.7 1.9   9.4 1.65

Table 4.1: Abbreviated periodic table of the elements, showing element symbol,
atomic number, ionization energy (eV), and electronegativity.

For the third electron of the lithium atom, the available states with theoretical energy E2 are the ψ200 ↕ “2s” states and the ψ211 ↕, ψ210 ↕, and ψ21−1 ↕ “2p”
states, a total of eight possible states. These states are, of course, commonly
called the “L shell.”
Within the crude nuclear shielding approximation made, all eight states have
the same energy. However, on closer examination, the spherically symmetric 2s
states really have less energy than the 2p ones. Very close to the nucleus,
shielding is not a factor and the full attractive nuclear force is felt. So a state
in which the electron is more likely to be close to the nucleus has less energy.
That are the 2s states; in the 2p states, which have nonzero orbital angular
momentum, the electron tends to stay away from the immediate vicinity of the
nucleus {A.30}.
Within the assumptions made, there is no preference with regard to the spin
direction of the 2s state, allowing two Slater determinants to be formed.
(a1/√6) | ψ100 (~r1 )↑(Sz1 )  ψ100 (~r1 )↓(Sz1 )  ψ200 (~r1 )↑(Sz1 ) |
        | ψ100 (~r2 )↑(Sz2 )  ψ100 (~r2 )↓(Sz2 )  ψ200 (~r2 )↑(Sz2 ) |
        | ψ100 (~r3 )↑(Sz3 )  ψ100 (~r3 )↓(Sz3 )  ψ200 (~r3 )↑(Sz3 ) |

+ (a2/√6) | ψ100 (~r1 )↑(Sz1 )  ψ100 (~r1 )↓(Sz1 )  ψ200 (~r1 )↓(Sz1 ) |
          | ψ100 (~r2 )↑(Sz2 )  ψ100 (~r2 )↓(Sz2 )  ψ200 (~r2 )↓(Sz2 ) |    (4.35)
          | ψ100 (~r3 )↑(Sz3 )  ψ100 (~r3 )↓(Sz3 )  ψ200 (~r3 )↓(Sz3 ) |

Figure 4.5: Approximate solutions for lithium (left) and beryllium (right).

It is common to say that the third electron “goes into a ψ200 state.” Of
course that is not quite precise; the Slater determinants above have the first
two electrons in ψ200 states too. But the third electron adds the third state
to the mix, so in that sense it more or less “owns” the state. For the same
reason, the Pauli exclusion principle is commonly phrased as “no two electrons
may occupy the same state”, even though the Slater determinants imply that
all electrons share all states equally.

Since the third electron is bound with the much lower energy |E2 | instead
of |E1 |, it is rather easily given up. Despite the fact that the lithium ion has a
nucleus that is 50% stronger than the one of helium, it only takes an ionization
energy of 5.4 eV to remove an electron from lithium, versus 24.6 eV for helium.
The theory would predict an ionization energy |E2| = 3.4 eV for lithium, which
is close, so it appears that the two 1s electrons shield their protons quite well
from the 2s one. This is in fact what one would expect, since the 1s electrons
are quite close to the nucleus compared to the large radial extent of the 2s state.
Lithium will readily give up its loosely bound third electron in chemical
reactions. Conversely, helium would have even less hold on a third electron
than lithium, because it has only two protons in its nucleus. Helium simply
does not have what it takes to seduce an electron away from another atom.
This is the second part of the reason that helium is chemically inert: it neither
will give up its electrons nor take on additional ones.
Thus the Pauli exclusion principle causes different elements to behave chem-
ically in very different ways, even elements that are just one unit apart in
atomic number, such as helium (inert) and lithium (very active).
For beryllium, with four electrons, the same four states as for lithium com-
bine in a single 4 × 4 Slater determinant:
\[
\frac{a}{\sqrt{24}}
\begin{vmatrix}
\psi_{100}(\vec r_1)\uparrow(S_{z1}) & \psi_{100}(\vec r_1)\downarrow(S_{z1}) & \psi_{200}(\vec r_1)\uparrow(S_{z1}) & \psi_{200}(\vec r_1)\downarrow(S_{z1}) \\
\psi_{100}(\vec r_2)\uparrow(S_{z2}) & \psi_{100}(\vec r_2)\downarrow(S_{z2}) & \psi_{200}(\vec r_2)\uparrow(S_{z2}) & \psi_{200}(\vec r_2)\downarrow(S_{z2}) \\
\psi_{100}(\vec r_3)\uparrow(S_{z3}) & \psi_{100}(\vec r_3)\downarrow(S_{z3}) & \psi_{200}(\vec r_3)\uparrow(S_{z3}) & \psi_{200}(\vec r_3)\downarrow(S_{z3}) \\
\psi_{100}(\vec r_4)\uparrow(S_{z4}) & \psi_{100}(\vec r_4)\downarrow(S_{z4}) & \psi_{200}(\vec r_4)\uparrow(S_{z4}) & \psi_{200}(\vec r_4)\downarrow(S_{z4})
\end{vmatrix}
\tag{4.36}
\]

The ionization energy jumps up to 9.3 eV, due to the increased nuclear strength
and the fact that the fellow 2s electron does not shield its proton as well as the
two 1s electrons do theirs.
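For readers who like to see the antisymmetry at work, here is a minimal numerical sketch in Python. It uses made-up one-dimensional functions standing in for the actual ψ100 and ψ200 spatial states, with the spin factor folded into the state index; only the mechanics of the determinant matter. It confirms that swapping two electrons flips the sign of a Slater determinant, and that two electrons in identical coordinates make it vanish:

    import math
    import numpy as np

    # Made-up 1D "orbitals" standing in for psi_100*up, psi_100*down,
    # psi_200*up, psi_200*down. Spin is folded into the state index, so one
    # coordinate below stands for position and spin together.
    def orbitals(x):
        return np.array([np.exp(-x**2), x*np.exp(-x**2),
                         np.exp(-x**2/2), x*np.exp(-x**2/2)])

    def slater(xs):
        # Row i holds the four states evaluated at electron i's coordinate;
        # the normalized determinant is the antisymmetric wave function.
        m = np.array([orbitals(x) for x in xs])
        return np.linalg.det(m) / math.sqrt(math.factorial(len(xs)))

    print(slater([0.3, -0.7, 1.1, 0.2]))   # some nonzero value
    print(slater([-0.7, 0.3, 1.1, 0.2]))   # electrons 1 and 2 swapped: sign flips
    print(slater([0.3, 0.3, 1.1, 0.2]))    # two identical coordinates: zero

The last line is the Pauli exclusion principle in action: with two identical rows, the determinant, and hence the wave function, is zero.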
For boron, one of the ψ21m “2p” states will need to be occupied. Within
the approximations made, there is no preference for any particular state. As an
example, figure 4.6 shows the approximate solution in which the ψ210 , or “2pz ”
state is occupied. It may be recalled from figure 3.5 that this state remains close
to the z-axis (which is horizontal in the figure.) As a result, the wave function
becomes directional. The ionization energy decreases a bit to 8.3 eV, indicating
that indeed the 2p states have higher energy than the 2s ones.
For carbon, a second ψ21m state needs to be occupied. Within the approxi-
mations made, the second 2p electron could also go into the 2pz state. How-
ever, in actuality, repulsion by the electron already in the 2pz state makes it
preferable for the new electron to stay away from the z-axis, which it can do by
going into say the 2px state. This state is around the vertical x-axis instead of
the horizontal z-axis. As noted in chapter 3.2, 2px is a ψ21m combination state.
For nitrogen, the third 2p electron can go into the 2py state, which is around
the y-axis. There are now three 2p electrons, each in a different spatial state.

Figure 4.6: Example approximate solution for boron.

However, for oxygen the game is up. There are no more free spatial states in
the L shell. The new electron will have to go, say, into the 2py state, pairing up
with the electron already there in an opposite-spin singlet state. The repulsion
by the fellow electron in the same state is reflected in a decrease in ionization
energy compared to nitrogen.
For fluorine, the next electron goes into the 2px state, leaving only the 2pz
state unpaired.
For neon, all 2p electrons are paired, and the L shell is full. This makes neon
an inert noble gas like helium: it cannot accommodate any more electrons at
the E2 energy level, and, with the strongest nucleus among the L-shell elements,
it holds tightly onto the electrons it has.
On the other hand, the previous element, fluorine, has a nucleus that is al-
most as strong, and it can accommodate an additional electron in its unpaired
2pz state. So fluorine is very willing to steal an electron if it can get away with
it. The capability to draw electrons from other elements is called “electronega-
tivity,” and fluorine is the most electronegative of them all.
Neighboring elements oxygen and nitrogen are less electronegative, but oxy-
gen can accommodate two additional electrons rather than one, and nitrogen
will even accommodate three.

Key Points
⋄ The Pauli exclusion principle forces states of higher energy to become
occupied when the number of electrons increases. This raises the energy
levels greatly above what they would be otherwise.
⋄ With the third element, lithium, one of the ψ200↕ “2s” states be-
comes occupied. Because of the higher energy of those states, the
third electron is readily given up; the ionization energy is only 5.4 eV.

⋄ Conversely, helium will not take on a third electron.


⋄ The fourth element is beryllium, with both 2s states occupied.
⋄ For boron, carbon, nitrogen, oxygen, fluorine, and neon, the successive
ψ21m “2p” states become occupied.
⋄ Neon is a noble gas like helium: it holds onto its electrons tightly, and
will not accommodate any additional electrons since they would have
to enter the E3 energy level states.
⋄ Fluorine, oxygen, and nitrogen, however, are very willing to accommo-
date additional electrons in their vacant 2p states.
⋄ The eight states ψ2lm↕ are called the “L shell.”

4.9.5 Sodium to argon


Starting with sodium (natrium), the E3 , or “M shell” begins to be filled. Sodium
has a single 3s electron in the outermost shell, which makes it much like lithium,
with a single 2s electron in its outermost shell. Since the outermost electrons are
the critical ones in chemical behavior, sodium is chemically much like lithium.
Both are metals with a “valence” of one; they are willing to sacrifice one electron.
Similarly, the elements following sodium in the third row of the periodic
table 4.1 mirror the corresponding elements in the previous row. Near the end
of the row, the elements are again eager to accept additional electrons in the
still vacant 3p states.
Finally argon, with no 3s and 3p vacancies left, is again inert. This is actually
somewhat of a surprise, because the E3 M-shell also includes 10 ψ32m↕ states.
These states of increased angular momentum are called the “3d” states. (What
else?) According to the approximations made, the 3s, 3p, and 3d states would
all have the same energy. So it might seem that argon could accept additional
electrons into the 3d states.
But it was already noted that the p states in reality have more energy than
the s states, and the d states have even more. The reason is the same: the d
states stay even further away from the nucleus than the p states. Because of
the higher energy of the d states, argon is really not willing to accept additional
electrons.

Key Points
⋄ The next eight elements mirror the properties of the previous eight,
from the metal sodium to the highly electronegative chlorine and the
noble gas argon.
⋄ The states ψ3lm↕ are called the “M shell.”

4.9.6 Potassium to krypton


The logical continuation of the story so far would be that the potassium (kalium)
atom would be the first one to put an electron into a 3d state. However, by now
the shielding approximation starts to fail not just quantitatively, but qualita-
tively. The 3d states actually have so much more energy than the 3s states that
they even exceed the energy of the 4s states. Potassium puts its last electron
into a 4s state, not a 3d one. This makes its outer shell much like the ones of
lithium and sodium, so it starts a new row in the periodic table.
The next element, calcium, fills the 4s shell, putting an end to that game.
Since the six 4p states have more energy, the next ten elements now start filling
the skipped 3d states with electrons, leaving the N-shell with 2 electrons in it.
(Actually, this is not quite precise; the 3d and 4s energies are close together,
and for copper and chromium one of the two 4s electrons turns out to switch
to a 3d state.) In any case, it takes until gallium before the six 4p states start
filling, which is fully accomplished at krypton. Krypton is again a noble gas,
though it can form a weak bond with chlorine.
Continuing to still heavier elements, the energy levels get even more con-
fused. This discussion will stop while it is still ahead.

Key Points
⋄ Unlike what the approximate theory says, in real life the 4s states ψ400↕
have less energy than the ψ32m↕ 3d states, and are filled first.
⋄ After that, the transition metals fill the skipped states before the old
logic resumes.
⋄ The states ψ4lm↕ are called the “N shell.” It all spells KLM Nether-
lands.
⋄ The substates are of course called “s,” “p,” “d,” “f,” . . .

4.10 Pauli Repulsion [Descriptive]


Before proceeding to a description of chemical bonds, one important point must
first be made. While the earlier descriptions of the hydrogen molecular ion
and hydrogen molecule produced many important observations about chemical
bonds, they are highly misleading in one aspect.
In the hydrogen molecule cases, the repulsive force that eventually stops the
atoms from getting together any closer than they do is the electrostatic repulsion
between the nuclei. It is important to recognize that this is the exception,
rather than the norm. Normally, the main repulsion between atoms is not

due to repulsion between the nuclei, but due to the Pauli exclusion principle
for their electrons. Such repulsion is called “exclusion-principle repulsion” or
“Pauli repulsion.”
To understand why the repulsion arises, consider two helium atoms, and as-
sume that you put them right on top of each other. Of course, with the nuclei
right on top of each other, the nuclear repulsion will be infinite, but ignore that
for now. There is another effect, and that is the interesting one here. There are
now 4 electrons in the 1s shell.
Without the Pauli exclusion principle, that would not be a big deal. The
repulsion between the electrons would go up, but the combined nuclear
strength would double too. However, Pauli says that only two electrons may go into the
1s shell. The other two 1s electrons will have to divert to the 2s shell, and that
requires a lot of energy.
Next consider what happens when two helium atoms are not on top of each
other, but are merely starting to intrude on each other’s 1s shell space. Re-
call that the Pauli principle is just the antisymmetrization requirement of the
electron wave function applied to a description in terms of given energy states.
When the atoms get closer together, the energy states get confused, but the
antisymmetrization requirement stays in full force. When the filled shells start
to intrude on each other’s space, the electrons start to divert to increasingly
higher energy to continue to satisfy the antisymmetrization requirement. This
process ramps up much more quickly than the nuclear repulsions and dominates
the net repulsion in almost all circumstances.
In everyday terms, the standard example of repulsion forces that ramp up
very quickly is billiard balls. If billiard balls are a millimeter away from touching,
there is no repulsion between them, but move them closer a millimeter, and
suddenly there is this big repulsive force. The repulsion between filled atom
shells does not ramp up that quickly in relative terms, of course, but it does
ramp up quickly. So describing atoms with closed shells as billiard balls is quite
reasonable if you are just looking for a general idea.

Key Points

⋄ If electron wave functions intrude on each other's space, it can cause


repulsion due to the antisymmetrization requirement.

⋄ This is called Pauli repulsion or exclusion principle repulsion.

⋄ It is the dominant repulsion in almost all cases.



4.11 Chemical Bonds [Descriptive]


The electron states, or “atomic orbitals”, of the elements discussed in section
4.9 form the basis for the “valence bond” description of chemical bonds. This
section summarizes some of the basic ideas involved.

4.11.1 Covalent sigma bonds


As pointed out in section 4.9, helium is chemically inert: its outermost, and
only, shell can hold two electrons, and it is full. But hydrogen has only one
electron, leaving a vacant position for another 1s electron. As discussed earlier
in chapter 4.2, two hydrogen atoms are willing to share their electrons. This
gives each atom in some sense two electrons in its shell, filling it up. The shared
state has lower energy than the two separate atoms, so the H2 molecule stays
together. A sketch of the shared 1s electrons was given in figure 4.2.
Fluorine has one vacant spot for an electron in its outer shell just like hydro-
gen; its outer shell can contain 8 electrons and fluorine has only seven. One of
its 2p states, assume it is the horizontal axial state 2pz , has only one electron in
it instead of two. Two fluorine atoms can share their unpaired electrons much
like hydrogen atoms do and form an F2 molecule. This gives each of the two
atoms a filled shell. The fluorine molecular bond is sketched in figure 4.7 (all
other electrons have been omitted.) This bond between p electrons looks quite

Figure 4.7: Covalent sigma bond consisting of two 2pz states.

different from the H2 bond between s electrons in figure 4.2, but it is again a
covalent one, in which the electrons are shared. In addition, both bonds are
called “sigma” bonds: if you look at either bond from the side, it looks rota-
tionally symmetric, just like an s state. (Sigma is the Greek equivalent of the
letter s; it is written as σ.)

Key Points

⋄ Two fluorine or similar atoms can share their unpaired 2p electrons in


much the same way that two hydrogen atoms can share their unpaired
2s electrons.
⋄ Since such bonds look like s states when seen from the side, they are
called sigma or σ bonds.

4.11.2 Covalent pi bonds


The N2 nitrogen molecule is another case of covalent bonding. Nitrogen atoms
have a total of three unpaired electrons, which can be thought of as one each in
the 2px , 2py , and 2pz states. Two nitrogen atoms can share their unpaired 2pz
electrons in a sigma bond the same way that fluorine does, longitudinally.
However, the 2px and 2py states are normal to the line through the nuclei;
these states must be matched up sideways. Figure 4.8 illustrates this for the
bond between the two vertical 2px states. This covalent bond, and the corre-

Figure 4.8: Covalent pi bond consisting of two 2px states.

sponding one between the 2py states, looks like a p state when seen from the
side, and it is called a “pi” or π bond.
So, the N2 nitrogen molecule is held together by two pi bonds in addition to
a sigma bond, making a triple bond. It is a relatively inert molecule.

Key Points
⋄ Unpaired p states can match up sideways in what are called pi or π
bonds.

4.11.3 Polar covalent bonds and hydrogen bonds


Oxygen, located in between fluorine and nitrogen in the periodic table, has two
unpaired electrons. It can share these electrons with another oxygen atom to
form O2 , the molecular oxygen we breathe. However, it can instead bind with
two hydrogen atoms to form H2 O, the water we drink.
In the water molecule, the lone 2pz electron of oxygen is paired with the 1s
electron of one hydrogen atom, as shown in figure 4.9. Similarly, the lone 2py

Figure 4.9: Covalent sigma bond consisting of a 2pz and a 1s state.

electron is paired with the 1s electron of the other hydrogen atom. Both bonds
are sigma bonds: they are located on the connecting line between the nuclei.
But in this case each bond consists of a 1s and a 2p state, rather than two states
of the same type.
Since the z and y axes are orthogonal, the two hydrogen atoms in water
should be at a 90 degree angle from each other, relative to the oxygen nucleus.
(Without valence bond theory, the most logical guess would surely have been
that they would be at opposite sides of the oxygen atom.) The predicted 90
degree angle is a fair approximation to the experimental value of 105 degrees.
The reason that the actual angle is a bit larger may be understood from the
fact that the oxygen atom has a higher attraction for the shared electrons, or
electronegativity, than the hydrogen atoms. It will pull the electrons partly away
from the hydrogen atoms, giving itself some negative charge, and the hydrogen
atoms a corresponding positive one. The positively charged hydrogen atoms
repel each other, increasing their angle a bit. If you go down one place in the
periodic table below oxygen, to the larger sulfur atom, H2 S has its hydrogen
atoms at about 93 degrees, quite close to 90 degrees.
Bonds like the one in water, where the negative electron charge shifts towards
the more electronegative atom, are called “polar” covalent bonds.

This polarity has significant consequences for water, since the positively charged hy-
drogen atoms can electrostatically attract the negatively charged oxygen atoms
on other molecules. This has the effect of creating bonds between different
molecules called “hydrogen bonds.” While much weaker than typical covalent
bonds, they are strong enough to affect the physical properties of water. For
example, they are the reason that water is normally a liquid instead of a gas,
quite a good idea if you are thirsty, and that ice floats on water instead of sink-
ing to the bottom of the oceans. Hydrogen is particularly efficient at creating
such bonds because it does not have any other electrons to shield its nucleus.

Key Points
⋄ The geometry of the quantum states reflects in the geometry of the
formed molecules.
⋄ When the sharing of electrons is unequal, a bond is called polar.
⋄ A special case is hydrogen, which is particularly effective in also cre-
ating bonds between different molecules, hydrogen bonds, when polar-
ized.
⋄ Hydrogen bonds give water unusual properties that are critical for life
on earth.

4.11.4 Promotion and hybridization


While valence bond theory managed to explain a number of chemical bonds so
far, two important additional ingredients need to be added. Otherwise it will
not be able to explain organic chemistry at all, the chemistry of carbon critical
to life.
Carbon has two unpaired 2p electrons just like oxygen does; the difference
between the atoms is that oxygen has in addition two paired 2p electrons. With
two unpaired electrons, it might seem that carbon should form two bonds like
oxygen.
But that is not what happens; normally carbon forms four bonds instead
of two. In chemical bonds, one of carbon’s paired 2s electrons moves to the
empty 2p state, leaving carbon with four unpaired electrons. It is said that the
2s electron is “promoted” to the 2p state. This requires energy, but the energy
gained by having four bonds more than makes up for it.
Promotion explains why a molecule such as CH4 forms. Including the 4
shared hydrogen electrons, the carbon atom has 8 electrons in its outer shell, so
its shell is full. It has made as many bonds as it can support.
However, promotion is still not enough to explain the molecule. If the CH4
molecule was merely a matter of promoting one of the 2s electrons into the

vacant 2py state, the molecule should have three hydrogen atoms at 90
degrees, sharing the 2px , 2py and 2pz electrons respectively, and one hydrogen
atom elsewhere, sharing the remaining 2s electron. In reality, the CH4 molecule
is shaped like a regular tetrahedron, with angles of 109.5 degrees between all
four hydrogens.
The explanation is that, rather than using the 2px , 2py , 2pz , and 2s states
directly, the carbon atom forms new combinations of the four called “hybrid”
states. (This is not unlike how the torus-shaped ψ211 and ψ21−1 states were
recombined in chapter 3.2 to produce the equivalent 2px and 2py pointer states.)
In case of CH4 , the carbon converts the 2s, 2px , 2py , and 2pz states into four
new states. These are called sp3 states, since they are formed from one s and
three p states. They are given by:

\begin{align*}
|{\rm sp}^3_a\rangle &= \tfrac12\big(|2s\rangle + |2p_x\rangle + |2p_y\rangle + |2p_z\rangle\big)\\
|{\rm sp}^3_b\rangle &= \tfrac12\big(|2s\rangle + |2p_x\rangle - |2p_y\rangle - |2p_z\rangle\big)\\
|{\rm sp}^3_c\rangle &= \tfrac12\big(|2s\rangle - |2p_x\rangle + |2p_y\rangle - |2p_z\rangle\big)\\
|{\rm sp}^3_d\rangle &= \tfrac12\big(|2s\rangle - |2p_x\rangle - |2p_y\rangle + |2p_z\rangle\big)
\end{align*}

where the kets denote the wave functions of the indicated states.
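The claimed properties of these hybrids are easy to check numerically. The sketch below, which assumes only that the basis kets |2s⟩, |2px⟩, |2py⟩, |2pz⟩ are orthonormal, as they are, represents each hybrid by its coefficient vector and verifies that the four hybrids are orthonormal, and that their p-parts, which set the bond directions, are at the tetrahedral angle mentioned below:

    import numpy as np

    # Coefficients of the four sp3 hybrids with respect to the orthonormal
    # basis (|2s>, |2px>, |2py>, |2pz>), straight from the formulas above.
    sp3 = 0.5*np.array([[1,  1,  1,  1],
                        [1,  1, -1, -1],
                        [1, -1,  1, -1],
                        [1, -1, -1,  1]])

    # Orthonormality: the matrix of mutual inner products is the identity.
    print(np.allclose(sp3 @ sp3.T, np.eye(4)))    # True

    # The bond directions follow the p-parts (drop the |2s> coefficient):
    p = sp3[:, 1:]
    cosang = (p[0] @ p[1])/(np.linalg.norm(p[0])*np.linalg.norm(p[1]))
    print(np.degrees(np.arccos(cosang)))          # 109.47..., tetrahedral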
All four sp3 hybrids have the same shape, shown in figure 4.10. The asym-

Figure 4.10: Shape of an sp3 hybrid state.

metrical shape can increase the overlap between the wave functions in the bond.
The four sp3 hybrids are at equal 109.5 degree angles from each other, pro-
ducing the tetrahedral structure of the CH4 molecule. And of diamond, for that
matter. With the atoms bound together in all spatial directions, diamond is an
extremely hard material.

But carbon is a very versatile atom. In graphite and carbon nanotubes,


carbon atoms arrange themselves in layers instead of three dimensional struc-
tures. Carbon achieves this trick by leaving the 2p-state in the direction normal
to the plane, call it px , out of the hybridization. The two 2p states in the plane
plus the 2s state can then be combined into three sp2 states:

\begin{align*}
|{\rm sp}^2_a\rangle &= \tfrac{1}{\sqrt3}\,|2s\rangle + \tfrac{2}{\sqrt6}\,|2p_z\rangle\\
|{\rm sp}^2_b\rangle &= \tfrac{1}{\sqrt3}\,|2s\rangle - \tfrac{1}{\sqrt6}\,|2p_z\rangle + \tfrac{1}{\sqrt2}\,|2p_y\rangle\\
|{\rm sp}^2_c\rangle &= \tfrac{1}{\sqrt3}\,|2s\rangle - \tfrac{1}{\sqrt6}\,|2p_z\rangle - \tfrac{1}{\sqrt2}\,|2p_y\rangle
\end{align*}
Each is shaped as shown in figure 4.11. These planar hybrids are at 120

Figure 4.11: Shapes of the sp2 (left) and sp (right) hybrids.

degree angles from each other, giving graphite its hexagonal structure. The
left-out p electrons normal to the plane can form pi bonds with each other. A

planar molecule formed using sp2 hybridization is ethylene (C2 H4 ); it has all six
nuclei in the same plane. The pi bond normal to the plane prevents out-of-plane
rotation of the nuclei around the line connecting the carbons, keeping the plane
rigid.
Finally, carbon can combine the 2s state with a single 2p state to form two
sp hybrids at 180 degrees from each other:

\begin{align*}
|{\rm sp}_a\rangle &= \tfrac{1}{\sqrt2}\big(|2s\rangle + |2p_z\rangle\big)\\
|{\rm sp}_b\rangle &= \tfrac{1}{\sqrt2}\big(|2s\rangle - |2p_z\rangle\big)
\end{align*}
2
An example of sp hybridization is acetylene (C2 H2 ), which has all four of its nuclei
on a single line.

Key Points
⋄ The chemistry of carbon is critical for life as we know it.
⋄ It involves two additional ideas; one is promotion, where carbon kicks
one of its 2s electrons into a 2p state. This gives carbon one 2s and
three 2p electrons.
⋄ The second idea is hybridization, where carbon combines these four
states in creative new combinations called hybrids.
⋄ In sp3 hybridization, carbon creates four hybrids in a regular tetrahe-
dron combination.
⋄ In sp2 hybridization, carbon creates three hybrids in a plane, spaced
at 120 degree intervals. That leaves a conventional 2p state in the
direction normal to the plane.
⋄ In sp hybridization, carbon creates two hybrids along a line, pointing
in opposite directions. That leaves two conventional 2p states normal
to the line of the hybrids and each other.

4.11.5 Ionic bonds


Ionic bonds are the extreme polar bonds; they occur if there is a big difference
between the electronegativities of the atoms involved.
An example is kitchen salt, NaCl. The sodium atom has only one electron
in its outer shell, a loosely bound 3s one. The chlorine has seven electrons in its
outer shell and needs only one more to fill it. When the two react, the chlorine
does not just share the lone electron of the sodium atom, it simply takes it away.

It makes the chlorine a negatively charged ion. Similarly, it leaves the sodium
as a positively charged ion.
The charged ions are bound together by electrostatic forces. Since these
forces act in all directions, each ion does not just attract the opposite ion it
exchanged the electron with, but all surrounding opposite ions. And since in
salt each sodium ion is surrounded by six chlorine ions and vice versa, the
number of bonds that exists is large.
Since so many bonds must be broken to take an ionic substance apart, their
properties are quite different from those of covalently bonded substances. For example,
salt is a solid with a high melting point, while the covalently bonded Cl2 chlo-
rine molecule is normally a gas, since the bonds between different molecules are
weak. Indeed, the covalently bound hydrogen molecule that has been discussed
much in this chapter remains a gas until especially low cryogenic temperatures.
Chapter 7.2 will give a more quantitative discussion of ionic molecules and
solids.

Key Points
⋄ When a bond is so polar that practically speaking one atom takes the
electron away from the other, the bond is called ionic.
⋄ Ionic substances like salt tend to form strong solids, unlike typical
purely covalently bound molecules like hydrogen that tend to form
gases.

4.11.6 Limitations of valence bond theory


Valence bond theory does a terrific job of describing chemical bonds, producing
a lot of essentially correct, and very nontrivial predictions, but it does have
limitations.
One place it fails is for the O2 oxygen molecule. In the molecule, the atoms
share their unpaired 2px and 2pz electrons. With all electrons symmetrically
paired in the spatial states, the electrons should all be in singlet spin states
having no net spin. However, it turns out that oxygen is strongly paramagnetic,
indicating that there is in fact net spin. The problem in valence bond theory
that causes this error is that it ignores the already paired-up electrons in the
2py states. In the molecule, the filled 2py states of the atoms are next to each
other and they do interact. In particular, one of the total of four 2py electrons
jumps over to the 2px states, where it only experiences repulsion by two other
electrons instead of by three. The spatial state of the electron that jumps over
is no longer equal to that of its twin, allowing them to have equal instead of
opposite spin.

Valence bond theory also has problems with single-electron bonds such as
the hydrogen molecular ion, or with benzene, in which the carbon atoms are
held together with what is essentially 1.5 bonds, or rather, bonds shared as in
a two state system. Excited states produce major difficulties. Various fixes and
improved theories exist.

Key Points
⋄ Valence bond theory is extremely useful. It is conceptually simple and
explains much of the most important chemical bonds.
⋄ However, it does have definite limitations: some types of bonds are not
correctly or not at all described by it.
⋄ Little in life is ideal, isn’t it?
Chapter 5

Time Evolution

The evolution of systems in time is less important in quantum mechanics than


in classical physics, since in quantum mechanics so much can be learned from
the energy eigenvalues and eigenfunctions. Still, time evolution is needed for
such important physical processes as the emission and absorption of radiation
and the decay of unstable atomic nuclei.

5.1 The Schrödinger Equation


In Newtonian mechanics, Newton’s second law states that the linear momentum
changes in time proportional to the applied force: $d(m\vec v)/dt = m\vec a = \vec F$. The
equivalent in quantum mechanics is the Schrödinger equation, which describes
how the wave function evolves. This section discusses this equation, and a few
of its immediate consequences.
The Schrödinger equation says that the time derivative of the wave function
is obtained by applying the Hamiltonian on it. More precisely:
\[
i\hbar\,\frac{\partial\Psi}{\partial t} = H\Psi \tag{5.1}
\]
The solution to the Schrödinger equation can immediately be given for most
cases of interest. The only condition that needs to be satisfied is that the
Hamiltonian depends only on the state the system is in, and not explicitly on
time. This condition is satisfied in all cases discussed so far, including the
particle in a box, the harmonic oscillator, the hydrogen and heavier atoms, and
the molecules, so the following solution applies to them all.
To satisfy the Schrödinger equation, write the wave function Ψ in
terms of the energy eigenfunctions $\psi_{\vec n}$ of the Hamiltonian,
\[
\Psi = c_{\vec n_1}(t)\,\psi_{\vec n_1} + c_{\vec n_2}(t)\,\psi_{\vec n_2} + \ldots = \sum_{\vec n} c_{\vec n}(t)\,\psi_{\vec n} \tag{5.2}
\]


Then the coefficients $c_{\vec n}$ must evolve in time as complex exponentials:
\[
c_{\vec n}(t) = c_{\vec n}(0)\,e^{-iE_{\vec n}t/\hbar} \tag{5.3}
\]
for every combination of quantum numbers $\vec n$.

The initial values $c_{\vec n}(0)$ of the coefficients are not determined from the Schrö-
dinger equation, but from whatever initial condition for the wave function is
given.
As always, the appropriate set of quantum numbers ~n depends on the prob-
lem. For example, the energy eigenfunctions ψnlm ↑ and ψnlm ↓ of the hydrogen
atom are characterized by the set of quantum numbers n, l, m, and ms , where
$m_s = \pm\frac12$ indicates spin up or down. So, for the hydrogen atom the above
solution for the wave function can be written out more explicitly as
\[
\Psi = \sum_{n=1}^{\infty}\sum_{l=0}^{n-1}\sum_{m=-l}^{l}
\Big(c_{nlm+}(0)\,e^{-iE_n t/\hbar}\,\psi_{nlm}(r,\theta,\phi)\!\uparrow
\;+\; c_{nlm-}(0)\,e^{-iE_n t/\hbar}\,\psi_{nlm}(r,\theta,\phi)\!\downarrow\Big)
\]

(This ignores any external disturbances and small errors due to spin and rela-
tivity.)
The given solution in terms of eigenfunctions covers most cases of interest,
but as noted, it is not valid if the Hamiltonian depends explicitly on time. That
possibility arises when there are external influences on the system; in such cases
the energy does not just depend on what state the system itself is in, but also
on what the external influences are like at the time.

5.1.1 Energy conservation


Assuming that there are no external influences, the Schrödinger equation implies
that the energy of a system is conserved. To see why, remember that the coeffi-
cients $c_{\vec n}$ of the energy eigenfunctions determine the probability of the corresponding
energy. While according to the Schrödinger equation these coefficients vary with
time, their square magnitudes do not:

\[
|c_{\vec n}(t)|^2 \equiv c_{\vec n}^{\,*}(t)\,c_{\vec n}(t) = c_{\vec n}^{\,*}(0)e^{iE_{\vec n}t/\hbar}\,c_{\vec n}(0)e^{-iE_{\vec n}t/\hbar} = |c_{\vec n}(0)|^2
\]

So according to the orthodox interpretation, the probability of measuring a given


energy level does not vary with time either. For example, the wave function for
a hydrogen atom at the excited energy level E2 might be of the form:

\[
\Psi = e^{-iE_2 t/\hbar}\,\psi_{210}\!\uparrow
\]

(This corresponds to an assumed initial condition in which all cnlm± are zero
except c210+ = 1.) The square magnitude of the exponential is one, so the

energy of this excited atom will stay E2 with 100% certainty for all time. The
energy is conserved.
This also illustrates that left to itself, an excited atom will maintain its
energy indefinitely. It will not emit a photon and drop back to the unexcited
energy E1 . Excited atoms emit radiation because they are perturbed by an
electromagnetic field. This is true even in vacuum at absolute zero. For one
thing, even in vacuum, the electromagnetic field has a nonzero ground state
energy. However, the radiation that the atom interacts with is its own, through
a weird twilight effect, chapter 10.2.4. In any case, eventually, at a time that is
observed to be random for reasons discussed in chapter 11.6, the perturbation
causes the excited atom to drop back to the lower energy state and emit the
photon.
Returning to the unperturbed atom, it should also be noted that even if the
energy is uncertain, still the probabilities of measuring the various energy levels
do not change with time. As an arbitrary example, the following wave function
describes a case of an undisturbed hydrogen atom where the energy has a 50/50
chance of being measured as E1 (-13.6 eV) or as E2 (-3.4 eV):
\[
\Psi = \frac{1}{\sqrt 2}\,e^{-iE_1 t/\hbar}\,\psi_{100}\!\downarrow + \frac{1}{\sqrt 2}\,e^{-iE_2 t/\hbar}\,\psi_{210}\!\uparrow
\]
The 50/50 probability applies regardless how long the wait is before the mea-
surement is done.
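A quick numerical sanity check of this statement, for the 50/50 state above (a sketch using standard physical constants; the evolution is just equation (5.3)):

    import numpy as np

    hbar = 1.054_571_8e-34            # J s
    eV = 1.602_176_6e-19              # J
    E = np.array([-13.6, -3.4])*eV    # E_1 and E_2 of hydrogen
    c0 = np.array([1, 1])/np.sqrt(2)  # 50/50 initial coefficients

    for t in (0.0, 1e-16, 1e-15, 1e-12):          # seconds
        c = c0*np.exp(-1j*E*t/hbar)               # equation (5.3)
        print(t, np.abs(c)**2, np.angle(c[1]*np.conj(c[0])))

The probabilities print as [0.5, 0.5] at every time; only the relative phase of the two coefficients changes, and that phase by itself does not affect the possible outcomes of an energy measurement.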
How about the other conservation laws, such as conservation of linear or
angular momentum for a system that is left alone? Surprisingly, it turns out
that these conservation laws arise from the symmetries of physics. That is
discussed in chapter 11.4.

5.1.2 Stationary states


The previous subsection examined the time variation of energy, but the Schrö-
dinger equation also determines how the other physical properties, such as po-
sitions and momenta, of a given system vary with time.
The simplest case is that in which the energy is certain, in other words,
states in which the wave function is a single energy eigenfunction:
\[
\Psi = c_{\vec n}(0)\,e^{-iE_{\vec n}t/\hbar}\,\psi_{\vec n}
\]
It turns out, {A.31}, that none of the physical properties of such a state changes
with time. The physical properties may be uncertain, but the probabilities for
their possible values will remain the same. For that reason, states of definite
energy are called “stationary states.”
Hence it is not really surprising that none of the energy eigenfunctions de-
rived so far had any resemblance to the classical Newtonian picture of a particle

moving around. Each energy eigenfunction by itself is a stationary state. There


will be no change in the probability of finding the particle at any given location
regardless of the time you look, so how could it possibly resemble a classical
particle that is at different positions at different times?
Similarly, while classically the linear momentum of a particle that experi-
ences forces will change with time, in energy eigenstates the chances of measur-
ing a given momentum do not change with time.
To get time variations of physical quantities, states of different energy must
be combined. In other words, there must be uncertainty in energy.

5.1.3 Time variations of symmetric two-state systems


The simplest case of physical systems that can have a nontrivial dependence on
time are systems described by two different states. Some examples of such sys-
tems were given in chapter 4.3. One was the hydrogen molecular ion, consisting
of two protons and one electron. In that case, there was a state ψ1 in which the
electron was in the ground state around one proton, and a state ψ2 in which
it was around the other proton. Another example was the ammonia molecule,
where the nitrogen atom was at one side of its ring of hydrogens in state ψ1 ,
and at the other side in state ψ2 . This section examines the time variation of
such systems.
It will be assumed that the states ψ1 and ψ2 are physically equivalent, like
the mentioned examples. In that case, according to chapter 4.3 the ground state
of lowest energy, call it EL , is an equal combination of the two states ψ1 and
ψ2 . The state of highest energy EH is also an equal combination, but with the
opposite sign. The solution of the Schrödinger equation is in terms of these two
combinations of states, {A.32}:
\[
\Psi = c_L e^{-iE_L t/\hbar}\,\frac{\psi_1+\psi_2}{\sqrt 2} + c_H e^{-iE_H t/\hbar}\,\frac{\psi_1-\psi_2}{\sqrt 2}
\]
Consider now the case of the hydrogen molecular ion, and assume that the
electron is around the first proton, so in state ψ1 , at time t = 0. The wave
function must then be:
" #
−iEL t/h̄ ψ1 + ψ2 ψ1 − ψ2
Ψ = cL e √ + e−i(EH −EL )t/h̄ √
2 2
At time zero, this indeed produces state ψ1 , but when the exponential in the last
term becomes -1, the system converts into state ψ2 . The electron has jumped
over to the other proton.
The time this takes is
\[
\frac{\pi\hbar}{E_H - E_L}
\]

since e−iπ = −1, (1.5). After another time interval of the same length the
electron will be back in state ψ1 around the first proton, and so on.
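To get a feel for the numbers, the short sketch below evaluates this exchange period πħ/(EH − EL) for two assumed, order-of-magnitude energy splittings: one of a typical chemical-bond size, 1 eV, and one of 10⁻⁴ eV, the rough order of the splitting in the ammonia molecule discussed below:

    import numpy as np

    hbar = 1.054_571_8e-34   # J s
    eV = 1.602_176_6e-19     # J

    def exchange_period(dE_eV):
        # Time pi*hbar/(E_H - E_L) for the electron (or atom) to hop over.
        return np.pi*hbar/(dE_eV*eV)

    print(exchange_period(1.0))     # ~2.1e-15 s for a 1 eV splitting
    print(exchange_period(1e-4))    # ~2.1e-11 s for a 1e-4 eV splitting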
Note that this time interval for the two protons to exchange the electron
is inversely proportional to the energy difference EH − EL . In chapter 4.3
this energy difference appeared in another context: it is twice the molecular
binding energy produced by the “twilight terms” when the electron is shared.
It is interesting now to see that this binding energy also determines the time it
takes for the electron to be exchanged if it is not shared. The more readily the
protons exchange the non-shared electron, the larger the binding energy of the
shared state will be. It may be one reason that the twilight terms are commonly
referred to as “exchange terms.”
The mathematics for the time evolution of the nitrogen atom in ammonia is
similar. If measurements locate the nitrogen atom at one side of the hydrogen
ring, then after a certain time, it will pop over to the other side. However, the
more interesting thing about the ammonia molecule is the difference in energy
levels itself: transitions from EH to EL produce microwave radiation, allowing
a maser to be constructed.

5.1.4 Time variation of expectation values


The time evolution of more complex systems can be described in terms of the
energy eigenfunctions of the system, just like for the two state systems of the
previous subsection. However, finding the eigenfunctions may not be easy.
Fortunately, it is possible to find the evolution of the expectation value of
physical quantities without solving the energy eigenvalue problem. The expec-
tation value, defined in chapter 3.3, gives the average of the possible values of
the physical quantity.
The Schrödinger equation requires that the expectation value $\langle a\rangle$ of any
physical quantity a with associated operator A evolves in time as:
\[
\frac{d\langle a\rangle}{dt} = \left\langle\frac{i}{\hbar}\,[H,A]\right\rangle + \left\langle\frac{\partial A}{\partial t}\right\rangle \tag{5.4}
\]
The derivation is in note {A.33}. The commutator [H, A] of A with the Hamil-
tonian was defined in chapter 3.4 as HA−AH. The final term in (5.4) is usually
zero, since most (simple) operators do not explicitly depend on time.
The above evolution equation for expectation values does not require the
energy eigenfunctions, but it does require the commutator. Its main applica-
tion is to relate quantum mechanics to Newtonian mechanics, as in the next
section. (Some minor applications that will be left to the notes for the in-
terested are the “virial theorem” {A.34} relating kinetic and potential energy
and the Mandelshtam-Tamm version of the “energy-time uncertainty principle”
$\Delta E\,\Delta t \ge \frac12\hbar$ {A.35}.)

Note that if A commutes with the Hamiltonian, i.e. [H, A] = 0, then the
expectation value of the corresponding quantity a will not vary with time. Such
a quantity has eigenfunctions that are also energy eigenfunctions, so it has the
same time-preserved statistics as energy. Equation (5.4) demonstrates this for
the expectation value, but the standard deviation, etcetera, would not change
with time either.

5.1.5 Newtonian motion


The purpose of this section is to show that even though Newton’s equations do
not apply to very small systems, they are correct for macroscopic systems.
The trick is to note that for a macroscopic particle, the position and mo-
mentum are very precisely defined. Many unavoidable physical effects, such as
incident light, colliding air atoms, earlier history, etcetera, will narrow down po-
sition and momentum of a macroscopic particle to great accuracy. Heisenberg’s
uncertainty relationship says that they must have uncertainties big enough that
$\sigma_x\sigma_{p_x} \ge \frac12\hbar$, but ħ is far too small for that to be noticeable on a macroscopic
scale. Normal light changes the momentum of a rocket ship in space only im-
measurably little, but it is quite capable of locating it to excellent accuracy.
With little uncertainty in position and momentum, both can be approxi-
mated accurately by their expectation values. So the evolution of macroscopic
systems can be obtained from the evolution equation (5.4) for expectation values
given in the previous subsection. Just work out the commutator that appears
in it.
Consider one-dimensional motion of a particle in a potential V (x) (the three-
dimensional case goes exactly the same way). The Hamiltonian H is:

\[
H = \frac{\widehat p_x^{\,2}}{2m} + V(x)
\]
where $\widehat p_x$ is the linear momentum operator and m the mass of the particle.
Now according to evolution equation (5.4), the expectation position $\langle x\rangle$
changes at a rate:
\[
\frac{d\langle x\rangle}{dt} = \left\langle\frac{i}{\hbar}\,[H,\widehat x]\right\rangle
= \left\langle\frac{i}{\hbar}\left[\frac{\widehat p_x^{\,2}}{2m} + V(x),\,\widehat x\right]\right\rangle \tag{5.5}
\]

Recalling the properties of the commutator from chapter 3.4, $[V(x),\widehat x] = 0$, since
multiplication commutes. Further, according to the rules for manipulation of
products and the canonical commutator,
\[
[\widehat p_x^{\,2},\widehat x] = \widehat p_x[\widehat p_x,\widehat x] + [\widehat p_x,\widehat x]\widehat p_x
= -\widehat p_x[\widehat x,\widehat p_x] - [\widehat x,\widehat p_x]\widehat p_x = -2i\hbar\,\widehat p_x
\]

So the rate of change of expectation position becomes:
\[
\frac{d\langle x\rangle}{dt} = \left\langle\frac{p_x}{m}\right\rangle \tag{5.6}
\]
This is exactly the Newtonian expression for the change in position with time,
because Newtonian mechanics defines px /m to be the velocity. However, it is in
terms of expectation values.
To figure out how the expectation value of momentum varies, the commutator
$[H,\widehat p_x]$ is needed. Now $\widehat p_x$ commutes, of course, with itself, but just like it
does not commute with $\widehat x$, it does not commute with the potential energy V(x);
the generalized canonical commutator (3.44) says that $[V,\widehat p_x]$ equals $-\hbar\,\partial V/i\partial x$.
As a result, the rate of change of the expectation value of linear momentum
becomes:
\[
\frac{d\langle p_x\rangle}{dt} = \left\langle -\frac{\partial V}{\partial x}\right\rangle \tag{5.7}
\]
This is Newton’s second law in terms of expectation values: Newtonian me-
chanics defines the negative derivative of the potential energy to be the force,
so the right hand side is the expectation value of the force. The left hand side
is equivalent to mass times acceleration.
The fact that the expectation values satisfy the classical equations is known
as “Ehrenfest’s theorem.”
(For a quantum system, however, it should be cautioned that even the expec-
tation values do not truly satisfy Newtonian equations. Newtonian equations
use the force at the expectation value of position, instead of the expectation
value of the force. If the force varies nonlinearly over the range of possible
positions, it makes a difference.)
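The commutator manipulations above can also be verified numerically, in a truncated matrix representation. The sketch below builds the standard harmonic-oscillator ladder-operator matrices, used here merely as a convenient test case rather than as part of the derivation, and checks that (i/ħ)[H, x] equals p/m, the content of (5.6), away from the truncation edge:

    import numpy as np

    n = 40                           # basis size (truncation)
    m = 1.0; w = 1.0; hbar = 1.0     # convenient units
    a = np.diag(np.sqrt(np.arange(1, n)), 1)   # annihilation operator matrix
    x = np.sqrt(hbar/(2*m*w))*(a + a.T)        # position operator
    p = 1j*np.sqrt(hbar*m*w/2)*(a.T - a)       # linear momentum operator
    H = p@p/(2*m) + 0.5*m*w**2*(x@x)           # Hamiltonian

    lhs = (1j/hbar)*(H@x - x@H)                # (i/hbar)[H, x]
    rhs = p/m
    # Agreement except in the last rows and columns, where truncating the
    # infinite matrices interferes:
    print(np.allclose(lhs[:n-2, :n-2], rhs[:n-2, :n-2]))   # True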

5.1.6 Heisenberg picture


This book follows the formulation of quantum mechanics as developed by Schrö-
dinger. However, there is another, earlier, formulation due to Heisenberg. This
subsection gives a brief description so that you are aware of it when you run
into it in literature.
In the Schrödinger picture, physical observables like position and momentum
are represented by time-independent operators. The time dependence is in the
wave function. This is somewhat counterintuitive because classically position
and momentum are time dependent quantities. The Heisenberg picture removes
the time dependence from the wave function and absorbs it into the operator.
To see how that works out, consider first the wave function. According to
the Schrödinger equation, it can be written as

\[
\Psi(\ldots;t) = e^{-iHt/\hbar}\,\Psi(\ldots;0) \tag{5.8}
\]



where the exponential of an operator is defined through its Taylor series:


\[
e^{-iHt/\hbar} = 1 - i\frac{t}{\hbar}H - \frac{t^2}{2!\,\hbar^2}H^2 + \ldots \tag{5.9}
\]
This exponential form applies assuming that the Hamiltonian is independent of
time. If it is not, the transformation from the initial wave function Ψ(. . . ; 0) to
a later time one Ψ(. . . ; t) still remains a “unitary” one; one that keeps the wave
function normalized.
Now consider an arbitrary Schrödinger operator $\widehat A$. The physical effects of
the operator can be characterized by inner products, as in
\[
\langle\Psi_1(\ldots;t)|\widehat A\,\Psi_2(\ldots;t)\rangle \tag{5.10}
\]

Such a dot product tells you what amount of a wave function Ψ1 is produced
by applying the operator on a wave function Ψ2 . Knowing these inner products
for all wave functions is equivalent to knowing the operator.
If the time-dependent exponentials are now peeled off Ψ1 and Ψ2 and ab-
sorbed into the operator, that operator becomes

\[
\widetilde A \equiv e^{iHt/\hbar}\,\widehat A\,e^{-iHt/\hbar} \tag{5.11}
\]

where the argument of the first exponential changed sign due to being taken to
the other side of the inner product.
The operator $\widetilde A$ depends on time. To see how it evolves, differentiate the
product with respect to time:
\[
\frac{d\widetilde A}{dt} = \frac{i}{\hbar}H\,e^{iHt/\hbar}\widehat A\,e^{-iHt/\hbar}
+ e^{iHt/\hbar}\frac{\partial\widehat A}{\partial t}e^{-iHt/\hbar}
- e^{iHt/\hbar}\widehat A\,e^{-iHt/\hbar}\,\frac{i}{\hbar}H
\]
The first and third terms can be recognized as the commutator of H and $\widetilde A$,
while the middle term is the Heisenberg version of the time derivative of $\widehat A$,
in case $\widehat A$ does depend on time. So the evolution equation for the Heisenberg
operator becomes
\[
\frac{d\widetilde A}{dt} = \frac{i}{\hbar}\left[H,\widetilde A\right] + \widetilde{\frac{\partial\widehat A}{\partial t}}
\qquad
\left[H,\widetilde A\right] = e^{iHt/\hbar}\left[H,\widehat A\right]e^{-iHt/\hbar} \tag{5.12}
\]
(Note that there is no difference between the Hamiltonians $\widehat H$ and $\widetilde H$, because
H commutes with itself, hence with its exponentials.)
For example, consider the x position and linear momentum operators of a
particle. These do not depend on time, and using the commutators as figured
out in the previous subsection, the Heisenberg operators evolve as:

\[
\frac{d\widetilde x}{dt} = \frac{1}{m}\,\widetilde p_x
\qquad
\frac{d\widetilde p_x}{dt} = -\widetilde{\frac{\partial V}{\partial x}}
\]

Those have the same form as the equations for the classical position and mo-
mentum. It is the Ehrenfest theorem on steroids.
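Numerically, the equivalence of the two pictures is easy to confirm for a small system. The sketch below uses scipy's matrix exponential and a random two-state Hamiltonian, chosen just for illustration, to check that the Schrödinger-picture expectation value ⟨Ψ(t)|A Ψ(t)⟩ equals the Heisenberg-picture one ⟨Ψ(0)|Ã(t) Ψ(0)⟩:

    import numpy as np
    from scipy.linalg import expm

    hbar = 1.0
    rng = np.random.default_rng(1)
    M = rng.normal(size=(2, 2)) + 1j*rng.normal(size=(2, 2))
    H = (M + M.conj().T)/2                        # random Hermitian Hamiltonian
    A = np.array([[0, 1], [1, 0]], dtype=complex) # some observable
    psi0 = np.array([1, 0], dtype=complex)

    t = 0.7
    U = expm(-1j*H*t/hbar)         # evolution operator, as in (5.8)
    psi_t = U @ psi0               # Schrodinger picture: the state evolves
    A_t = U.conj().T @ A @ U       # Heisenberg picture: the operator evolves, (5.11)

    print(np.vdot(psi_t, A @ psi_t).real)   # <Psi(t)|A Psi(t)>
    print(np.vdot(psi0, A_t @ psi0).real)   # <Psi(0)|A~(t) Psi(0)>, same number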
In fact, the equivalent of the general equation (5.12) is also found in classi-
cal physics: it is derived in advanced mechanics in this form, with the so-called
“Poisson bracket” taking the place of the commutator. As a simple example,
consider one-dimensional motion of a particle. Any variable a that is some func-
tion of the position and linear momentum of the particle has a time derivative
given by
\[
\frac{da}{dt} = \frac{\partial a}{\partial x}\frac{dx}{dt} + \frac{\partial a}{\partial p_x}\frac{dp_x}{dt}
\]
according to the total differential of calculus. And from the classical Hamiltonian
\[
H = \frac{p_x^2}{2m} + V
\]
it is seen that the time derivatives of position and momentum obey the classical
“Hamiltonian dynamics”
\[
\frac{dx}{dt} = \frac{\partial H}{\partial p_x} \qquad \frac{dp_x}{dt} = -\frac{\partial H}{\partial x}
\]
Substituting this into the time derivative of a gives
\[
\frac{da}{dt} = \frac{\partial a}{\partial x}\frac{\partial H}{\partial p_x} - \frac{\partial a}{\partial p_x}\frac{\partial H}{\partial x}
\]

The negative of the right hand side is by definition the Poisson bracket (H, a).
Note that it, like the commutator, is antisymmetric under exchange of H and a.
And using the so-called Lagrangian formulation usually covered in an engineer-
ing education, and otherwise found in note {A.3}, this story can be generalized
to multiple, possibly non Cartesian coordinates. In that case, in the Pois-
son bracket you must sum over all generalized coordinates and their associated
“canonical” momenta. (For example, an angular coordinate has the correspond-
ing angular momentum as its canonical momentum. In general, canonical mo-
mentum is the derivative of the Lagrangian with respect to the time derivative
of the coordinate.)
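As a concrete check of this classical chain of reasoning, the sketch below uses sympy to evaluate the substituted expression for da/dt for the one-dimensional Hamiltonian above, with an arbitrarily chosen test quantity a = x px and an assumed harmonic potential:

    import sympy as sp

    x, px, m, k = sp.symbols('x p_x m k', positive=True)
    H = px**2/(2*m) + sp.Rational(1, 2)*k*x**2   # example: harmonic oscillator
    a = x*px                                     # arbitrary test quantity

    # da/dt per the substituted expression above:
    dadt = sp.diff(a, x)*sp.diff(H, px) - sp.diff(a, px)*sp.diff(H, x)
    print(sp.simplify(dadt))                     # p_x**2/m - k*x**2

The printed result is indeed px²/m − kx², which is what d(x px)/dt = ẋ px + x ṗx gives directly from Hamilton's equations.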
The bottom line is that the Heisenberg equations are usually not easy to solve
unless you return to the Schrödinger picture by peeling off the time dependence.
In relativistic applications however, time joins space as an additional coordinate,
and the Heisenberg picture becomes more helpful. It can also make it easier to
identify the correspondence between classical equations and the corresponding
quantum operators.

5.1.7 The adiabatic approximation

The adiabatic approximation describes the evolution of systems for which the
Hamiltonian changes nontrivially in time, but slowly. (Note that this use of
the word “adiabatic” is not to be confused with adiabatic in thermodynam-
ics, which normally describes processes that occur sufficiently quickly that heat
transfer with the surroundings can be ignored. The term “quasi-steady” would
be understandable, so physicists could not use that one.)
As a simple example, assume that you have a particle in the ground state
in a box, and you change the volume of the box by a significant amount. The
question is, will the particle still be in the ground state after the volume change?
Normally there is no reason to assume so; after all, either way the energy of the
particle will change significantly. However, the “adiabatic theorem” says that
if the change is performed slowly enough, the particle will indeed remain in the
ground state, even when that state slowly changes into a completely different
one.
If the system is in an energy state other than the ground state, the particle
will stay in that state as it evolves during an adiabatic process. The theorem
does assume that the energy remains at all times non-degenerate, so that the
energy state is unambiguous. More sophisticated versions of the analysis exist
to deal with degeneracy and continuous spectra.
Note {A.36} gives a derivation of the theorem and some additional implica-
tions. The most important practical application of the adiabatic theorem is
without doubt the Born-Oppenheimer approximation, which is discussed sepa-
rately in chapter 6.2.

5.2 Unsteady Perturbations of Systems


This section takes a general look at what happens to a system, say a hydrogen
atom, that can be in two different relevant energy eigenstates and you poke at
it with a perturbation, say an electromagnetic field. A typical application is the
emission and absorption of radiation by atoms.
The energy eigenstates of the unperturbed system will be used to describe the
system both with and without the perturbation. So, let ψL be the unperturbed
lowest energy state and ψH be the unperturbed highest energy, or “excited”,
state. In principle, ψL and ψH can be any two energy eigenstates of a system,
but to be concrete, think of ψL as the ψ100 ground state of a hydrogen atom,
and of ψH as an excited state like ψ210 , with energy E2 > E1 .

5.2.1 Schrödinger equation for a two-state system


By assumption, the wave function can be approximated as a combination of the
two unperturbed eigenstates:
\[
\Psi = a\,\psi_L + b\,\psi_H \qquad |a|^2 + |b|^2 = 1 \tag{5.13}
\]
where |a|2 is the probability that the system can be found at the lower energy
EL , and |b|2 the probability that it can be found at the higher energy EH . The
sum of the two probabilities must be one; the two-state system must be found
in either state, {A.37}.
According to the Schrödinger equation, the time evolution is given by $i\hbar\dot\Psi = H\Psi$, so here
\[
i\hbar\big(\dot a\,\psi_L + \dot b\,\psi_H\big) = H\big(a\,\psi_L + b\,\psi_H\big)
\]
Separate equations for ȧ and ḃ can be obtained by taking inner products with
$\langle\psi_L|$, respectively $\langle\psi_H|$, and using orthonormality in the left hand side:

\[
i\hbar\dot a = H_{LL}\,a + H_{LH}\,b \qquad i\hbar\dot b = H_{HL}\,a + H_{HH}\,b \tag{5.14}
\]


which involves the following “Hamiltonian coefficients”

\[
H_{LL} = \langle\psi_L|H\psi_L\rangle \quad H_{LH} = \langle\psi_L|H\psi_H\rangle \quad
H_{HL} = \langle\psi_H|H\psi_L\rangle \quad H_{HH} = \langle\psi_H|H\psi_H\rangle \tag{5.15}
\]

Note that $H_{LL}$ and $H_{HH}$ are real, (1.16), and that $H_{LH}$ and $H_{HL}$ are complex
conjugates, $H_{HL} = H_{LH}^{\,*}$.
A general analytical solution to the system (5.14) cannot be given, but you
can get rid of half the terms in the right hand sides using the following trick:
define new coefficients ā and b̄ by
\[
a = \bar a\,e^{-i\int H_{LL}\,dt/\hbar} \qquad b = \bar b\,e^{-i\int H_{HH}\,dt/\hbar} \tag{5.16}
\]
The new coefficients ā and b̄ are physically just as good as a and b: the probabil-
ities are given by the square magnitudes of the coefficients, and the exponentials
above have magnitude one, so the square magnitudes of ā and b̄ are exactly the
same as those of a and b. Also, the initial conditions, call them a0 and b0 , are
unchanged assuming you choose the integration constants appropriately.
The equations for ā and b̄ are a lot simpler; substituting the definitions into
(5.14) and simplifying:

\[
i\hbar\dot{\bar a} = \overline H_{LH}\,\bar b \qquad i\hbar\dot{\bar b} = \overline H_{HL}\,\bar a \tag{5.17}
\]


where
\[
\overline H_{LH} = \overline H_{HL}^{\,*} = H_{LH}\,e^{-i\int (H_{HH}-H_{LL})\,dt/\hbar} \tag{5.18}
\]

5.2.2 Stimulated and spontaneous emission


The simplified evolution equations (5.17) that were derived in the previous sec-
tion have a remarkable property: for every solution ā, b̄ there is a second solution
ā2 = b̄∗ , b̄2 = −ā∗ that has the probabilities of the low and high energy states
exactly reversed. It means that

A perturbation that lifts a system out of the ground state will equally
take that system out of the excited state.

It is a consequence of the Hermitian nature of the Hamiltonian; it would not



apply if $\overline H_{LH}$ were not equal to $\overline H_{HL}^{\,*}$.

Figure 5.1: Emission and absorption of radiation by an atom: (a) spontaneous
emission; (b) absorption; (c) stimulated emission.

Consider again the example of the atom. If the atom is in the excited
state, it can spontaneously emit a photon, a quantum of electromagnetic energy,
transitioning back to the ground state, as sketched in figure 5.1(a). That is
spontaneous emission; the emitted photon will have an energy Ephoton and an
electromagnetic frequency ω0 given by:

\[
E_{\rm photon} \equiv \hbar\omega_0 = E_H - E_L \tag{5.19}
\]
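For the running example of the ψ100 and ψ210 hydrogen states, EH − EL = −3.4 − (−13.6) = 10.2 eV, and a short computation, a sketch with standard physical constants, gives the corresponding frequency and wave length:

    import numpy as np

    hbar = 1.054_571_8e-34   # J s
    eV = 1.602_176_6e-19     # J
    c = 2.997_924_58e8       # m/s

    E_photon = 10.2*eV               # E_H - E_L for the example states
    w0 = E_photon/hbar               # ~1.5e16 rad/s
    wavelength = 2*np.pi*c/w0        # ~1.2e-7 m, about 1200 Angstrom
    print(w0, wavelength)

That wave length is the "order of a thousand Å" quoted for the Lyman transitions in subsection 5.2.3.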

The inverse of this process is where you perturb the ground state atom with an
electromagnetic wave of frequency ω0 and the atom absorbs one photon of energy
from that wave, entering the excited state. That is absorption, as sketched in
figure 5.1(b). But according to the reversed solution above, there must then
also be a corresponding process where the same perturbing photon takes the
system out of the excited state back to the ground state, figure 5.1(c). Because
of energy conservation, this process, called “stimulated emission”, will produce
a second photon.
It is the operating principle of the laser: if you have a collection of atoms
all in the excited state, you can create a runaway process where a single photon
stimulates an atom to produce a second photon, and then those two photons go

on to produce two more, and so on. The result will be monochromatic, coherent
light, since all its photons originate from the same source.
Note that you must initially have a “population inversion”: you must have
more excited atoms than ground state ones, because absorption competes with
stimulated emission for photons. Indeed, if you have a 50/50 mixture of ground
state and excited atoms, then the processes of figures 5.1(b) and 5.1(c) exactly
cancel each other’s effects.
Going back to spontaneous emission; as has been mentioned in section 5.1.1,
there is really no such thing. The Schrödinger equation shows that an excited
atom will maintain its energy indefinitely if not perturbed. Spontaneous emis-
sion, figure 5.1(a), is really stimulated emission figure 5.1(c), in which, loosely
speaking, the triggering photon jumps into and out of existence due to the
quantum fluctuations of the electromagnetic field. Details of that process are
in chapter 10.2.4.

5.2.3 Effect of a single wave


This subsection will derive the basic equations (5.22) and (5.24) for the inter-
action between an atom and a single wave. It will be assumed that the atom
is hydrogen, or at least that interactions between the electrons can be ignored.
The perturbing electromagnetic field will be assumed to be a monochromatic
wave that is propagating along the y-axis and is polarized in the z-direction,
(9.28):
\[
\vec E = \hat k\,E_0\cos\big(\omega(t-y/c)-\phi\big) \qquad
\vec B = \hat\imath\,\frac{1}{c}\,E_0\cos\big(\omega(t-y/c)-\phi\big)
\]

where $\vec E$ is the electric field strength, $\vec B$ the magnetic field strength, the constant
E0 is the amplitude of the electric field, ω > 0 the angular frequency of the wave,
c the speed of light, and φ is some phase angle.
But this can be greatly simplified. At nonrelativistic velocities, the charged
electron primarily reacts to the electric field, so the magnetic field can be ig-
nored. Moreover, the atom, supposed to be at the origin, is so small compared
to the typical wave length of an electromagnetic wave, (assuming it is light
and not an X-ray,) that y can be put to zero. Then the electromagnetic field
simplifies to a spatially uniform electric field:
\[
\vec E = \hat k\,E_0\cos(\omega t-\phi) \tag{5.20}
\]

The Lyman-transition wave lengths are of the order of a thousand Å, and the
atom about one Å, so this seems reasonable enough.
Assuming also that the internal time scales of the electron are fast compared
to the time scale 1/ω of the wave, the perturbation Hamiltonian can be found as the potential of a quasi-
steady uniform electric field:

\[
H_1 = e\,E_0\cos(\omega t-\phi)\,z \tag{5.21}
\]

(It is like the mgh potential energy of gravity, with the charge e playing the
part of the mass m, the electric field strength E0 cos(ωt − φ) that of the gravity
strength g, and z that of the height h.)
To this the unperturbed Hamiltonian of the hydrogen atom must be added;
that one was written down in chapter 3.2.1, but its form is not important for
the effect of the perturbation, and it will just be referred to as H0 . So the total
Hamiltonian is
H = H0 + H1
with H1 as above.
Now the Hamiltonian matrix coefficients (5.15) are needed. The first one is
HLL :

HLL = ⟨ψL|H0|ψL⟩ + ⟨ψL|H1|ψL⟩ = EL + eE0 cos(ωt − φ)⟨ψL|z|ψL⟩

It can be seen from the symmetry properties of the eigenfunctions of the hy-
drogen atom as given in chapter 3.2.4 that the final inner product is zero, and
so HLL is just the lower atom energy EL . Similarly, HHH is the higher atom
energy EH .
For HLH , the inner product with H0 is zero, since ψL and ψH are orthogonal
eigenfunctions of H0 , and the inner product with H1 gives:

HLH = E0 ⟨ψL|ez|ψH⟩ cos(ωt − φ)

To get the Hamiltonian of the simplified system (5.17), according to (5.18)
the coefficient HLH needs to be multiplied with

e−i∫(HHH−HLL) dt/h̄ = e−i∫(EH−EL) dt/h̄

By definition
EH − EL = h̄ω0
with ω0 the frequency of the photon released when the atom transitions from
the high energy state to the low one. So the factor to multiply HLH with is
simply e−iω0 t , and the Hamiltonian coefficient of the simplified system becomes

H̄LH = E0 ⟨ψL|ez|ψH⟩ cos(ωt − φ) e−iω0 t

And if you write HHH − HLL in terms of a frequency, you may as well do
the same with the time-independent part of H LH too, and define a “frequency”
ω1 by
E0 ⟨ψL|ez|ψH⟩ ≡ h̄ω1        (5.22)
Note however that unlike ω0, ω1 has no immediate physical meaning as a
frequency; it is just a concise way of writing the effective strength level of the
perturbation. Also, ω1 could be a complex number.
The simplified system (5.17) for the coefficients ā and b̄ of the states ψL and
ψH now becomes:

ā˙ = −iω1 cos(ωt − φ)e−iω0 t b̄ b̄˙ = −iω1 cos(ωt − φ)eiω0 t ā (5.23)

This system governs how the probabilities of the low and high energy states vary
with time under the electromagnetic single wave of frequency ω. Unfortunately,
it does not have a simple solution.
But according to the Euler formula (1.5), the cosine comes apart into two
exponentials:
cos(ωt − φ) = (ei(ωt−φ) + e−i(ωt−φ))/2
If this is substituted into (5.23), one exponential gives rise to a factor e±i(ω+ω0)t.
Now for light, ω0 is extremely large, on the order of 10¹⁵/s, and adding ω makes
it even larger, so this exponential fluctuates extremely rapidly in value and
averages out to zero over any reasonable time interval. For that reason, this
exponential is usually ignored. (If the relevant time interval becomes so short
that ω0 t is no longer large, this is no longer justified.)
Keeping only the exponential with the frequency opposite to ω0 , by approx-
imation the equations (5.23) for the coefficients ā and b̄ of the lower and excited
energy states become:

ā˙ = −(i/2) ω1 e−iφ ei(ω−ω0)t b̄        b̄˙ = −(i/2) ω1 eiφ e−i(ω−ω0)t ā        (5.24)

Recall that ω was the frequency of the perturbation, ω0 the frequency of the
emitted photon, (5.19), and ω1 a measure for the strength of the perturbation,
(5.22). This approximation requires that ω0 t is large, if t is the typical time
over which the perturbation is applied.
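For the computationally inclined, (5.24) is also easy to integrate numerically. The minimal sketch below (Python; the parameter values are made up, in arbitrary units) drives the atom exactly at resonance, ω = ω0; for real ω1 the excited-state probability |b̄|² then oscillates as sin²(ω1 t/2), returning to zero after a full cycle t = 2π/ω1.

    import numpy as np
    from scipy.integrate import solve_ivp

    # made-up values in arbitrary units; resonant drive, w = w0
    w1, w0, w, phi = 0.1, 100.0, 100.0, 0.0

    def rhs(t, y):
        # the two-level system (5.24) for abar and bbar
        a, b = y
        d = (w - w0) * t
        return [-0.5j * w1 * np.exp(-1j * phi) * np.exp(1j * d) * b,
                -0.5j * w1 * np.exp(1j * phi) * np.exp(-1j * d) * a]

    sol = solve_ivp(rhs, (0.0, 2 * np.pi / w1), [1 + 0j, 0j],
                    rtol=1e-8, atol=1e-10)
    print(abs(sol.y[1, -1])**2)   # ~0 after one full Rabi cycle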

5.2.4 Forbidden transitions


According to the results (5.22) and (5.24) of the previous subsection, the effect
of an electromagnetic wave on an atom transition between two states ψL and
ψH is proportional to the inner product ⟨ψL|z|ψH⟩. If that inner product is zero,
the electromagnetic wave cannot cause such a transition, to the approximations
made.
However, the assumed wave had its electric field in the z-direction. A more
general electromagnetic field has electric field components in all three axial
directions. Such a field can cause transitions as long as at least one of the three
inner products of the form ⟨ψL|ri|ψH⟩, with ri equal to x, y, or z, is nonzero.
Because these inner products vaguely resemble integrals that produce the so-
called electric dipole strength of a charge distribution, chapter 9.5, transitions
caused by them are called “electric dipole transitions.” They could not be called
“uniform electric field transitions,” because that would have been meaningful
and understandable.
If all three inner products ⟨ψL|ri|ψH⟩ are zero, then the transition cannot
occur to the approximations made. Such transitions are called “forbidden tran-
sitions.” An example is the hydrogen 2s to 1s transition, i.e. ψH = ψ200 and
ψL = ψ100. Both of these states are spherically symmetric, making the inner
product ⟨ψL|ri|ψH⟩ zero by symmetry: the negative values of ri integrate away
against the positive values. So, with no perturbation effect left, the prediction
must then unavoidably be that the excited 2s state does not decay!
The term is misleading, however: forbidden transitions often take place just
fine, even if the electric dipole approximation says they cannot. Reasons for
them to occur anyway can be the ignored spatial variations in the electric field,
leading to so-called multipole transitions, or the also ignored interaction with
the magnetic field, leading to magnetic dipole or multipole transitions. However,
since these are very small effects, “forbidden” transitions take orders of magnitude
longer to occur than normal electric dipole ones.

5.2.5 Selection rules


Electric dipole transitions between hydrogen atom states cannot occur unless the
quantum numbers of the states ψL and ψH involved satisfy certain conditions.
These conditions are called “selection rules.” The quantum numbers appearing
in the rules are the orbital azimuthal quantum number l, giving the square
orbital angular momentum as l(l + 1)h̄2 , the orbital magnetic quantum number
m, giving the orbital angular momentum in the chosen z-direction as mh̄, and
the spin magnetic quantum number ms = ±½, called spin up respectively down,
giving the spin angular momentum in the z-direction as ms h̄. For electric dipole
transitions to occur, these quantum numbers must satisfy, {A.38},

lH = lL ± 1 mH = mL or mL ± 1 ms,H = ms,L (5.25)

It should be noted that in a more sophisticated analysis of the hydrogen
atom, there is a slight interaction between orbital and spin angular momentum
in the atom, chapter 10.1.6. As a result the correct energy eigenfunctions no
longer have a definite z-component of orbital angular momentum nor of spin,
and the above rules are no longer really right. The energy eigenfunctions do
have definite values for the square magnitude J² = j(j + 1)h̄² of the combined
angular momentum J⃗ = L⃗ + S⃗ and its z-component Jz = mj h̄. In those terms
the appropriate selection rules become

lH = lL ± 1 jH = jL or jL ± 1 mj,H = mj,L or mj,L ± 1 (5.26)

If these selection rules are not satisfied, the transition is called forbidden.
However, the transition may still occur through a different mechanism. One
possibility is a “magnetic dipole transition,” in which the electron interacts with
the magnetic part of the electromagnetic field. That interaction occurs because
an electron has spin and orbital angular momentum. A charged particle with
angular momentum behaves like a little electromagnet and wants to align itself
with an ambient magnetic field, chapter 9.6. The selection rules in this case are,
{A.38},

lH = lL mH = mL or mL ± 1 ms,H = ms,L or ms,L ± 1 (5.27)

In addition either the orbital or the spin magnetic quantum numbers must be
unequal. In the presence of spin-orbit coupling, that becomes

lH = lL jH = jL or jL ± 1 mj,H = mj,L or mj,L ± 1 (5.28)

Transitions may also occur through spatial variations in the electric or mag-
netic fields. The strongest of these are the so-called “electric quadrupole tran-
sitions.” These must satisfy the selection rules, {A.38},

lH = lL or lL ± 2 mH = mL or mL ± 1 or mL ± 2 ms,H = ms,L (5.29)

In addition lH = lL = 0 is not possible for such transitions.
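These rules are straightforward to mechanize. A minimal sketch (Python; the function names are made up for illustration) of the conditions (5.25), (5.27), and (5.29), ignoring spin-orbit coupling:

    def e1_allowed(lL, mL, msL, lH, mH, msH):
        # electric dipole rules (5.25)
        return (lH in (lL - 1, lL + 1) and
                mH in (mL - 1, mL, mL + 1) and
                msH == msL)

    def m1_allowed(lL, mL, msL, lH, mH, msH):
        # magnetic dipole rules (5.27); m or ms must actually change
        return (lH == lL and
                mH in (mL - 1, mL, mL + 1) and
                msH in (msL - 1, msL, msL + 1) and
                (mH, msH) != (mL, msL))

    def e2_allowed(lL, mL, msL, lH, mH, msH):
        # electric quadrupole rules (5.29); lH = lL = 0 is excluded
        return (lH in (lL - 2, lL, lL + 2) and
                abs(mH - mL) <= 2 and
                msH == msL and
                not (lH == lL == 0))

    print(e1_allowed(0, 0, 0.5, 1, 0, 0.5))   # 2p to 1s: True
    print(e1_allowed(0, 0, 0.5, 0, 0, 0.5))   # 2s to 1s: False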

5.2.6 Angular momentum conservation


According to the selection rules of the previous subsection, the 2s, or ψ200 ,
hydrogen atom state cannot decay to the 1s, or ψ100 , ground state through an
electric dipole transition, because lH = lL = 0. That transition is therefore
forbidden. That does not mean it does not take place; just forbid your kids
something. But this particular excited state cannot return to the ground state
through a magnetic dipole transition either, because without orbital angular
momentum, only the spin responds to the magnetic field. It cannot return to
the ground state through an electric quadrupole transition, because these do not
allow lH = lL = 0 transitions.
In fact, the 2s state cannot decay to the ground state through any elec-
tric or magnetic multipole transition. There is a simple argument to see why
not, much simpler than trying to work out the selection rules for all possible
multipole transitions. Both the 2s and 1s states have zero angular momentum.
Conservation of angular momentum then requires that the photon emitted in the
transition must have zero angular momentum too, and a photon cannot have zero
angular momentum, [4].
A photon is a boson that has spin one. Now it is certainly possible for
a particle with spin sp = 1 and orbital angular momentum quantum number
lp = 1 to be in a state that has zero net angular momentum, jp = 0. In fact,
it can be deduced from figure 9.6 that that happens for a linear combination
of the three states that each have zero net angular momentum in the chosen
z-direction:
|jp=0⟩ = √(1/3) |mp=1, ms,p=−1⟩ − √(1/3) |mp=0, ms,p=0⟩ + √(1/3) |mp=−1, ms,p=1⟩
However, a photon is not a normal particle; it is a relativistic particle with rest
mass zero that can only move at the speed of light. It turns out that when
the z-axis is taken in the direction of propagation, the photon can only have
ms,p = 1 or ms,p = −1; the state ms,p = 0 does not exist, chapter 10.2.3. So
the combination above with zero net angular momentum is not possible for a
photon.
So, what does happen to the 2s state? It turns out that in an extremely
high vacuum, in which the state is not messed up by collisions with other atoms
or particles, the dominant decay is by the emission of two photons, rather than
a single one. This takes forever on quantum scales; the 2s state survives about
a tenth of a second rather than maybe a nanosecond for an electric dipole
transition. You see why ψ210 was cited as the typical excited state in this
section, rather than the more obvious ψ200 .
This example illustrates how powerful conservation law arguments can be.
Much of the selection rules of the previous subsection, as well as of many not
listed there, can be understood using a simple rule for the angular momentum
of the emitted photon, [4]: The net angular momentum of the emitted photon
can be taken to be jp = 1 in dipole transitions. It increases by one unit for each
next higher level: quadrupole transitions have jp = 2, octupole ones jp = 3,
etcetera.
Conservation of angular momentum says that the angular momentum of the
excited state ψH must be the sum of that of the lower state ψL plus that of the
photon:
J⃗H = J⃗L + J⃗p
Now if two vectors of lengths JL and Jp are added together in classical physics,
as in figure 5.2, then the length of the resultant vector satisfies the so-called
“triangle inequality”
|JL − Jp | ≤ JH ≤ JL + Jp
JH must be less than JL + Jp or the combined lengths of the sides AB and BC
in figure 5.2 would not be enough to reach point C from point A even if aligned
Figure 5.2: Triangle inequality. (The vectors form a triangle with vertices A,
B, and C: J⃗L runs from A to B, J⃗p from B to C, and J⃗H directly from A to C.)

with AC. Similarly JH has to be long enough to cover the length difference
between JL and Jp . Classical physics also says that the z-component Jp,z of the
photon angular momentum equals JH,z − JL,z, and that the magnitude of that
component cannot be more than the length of the complete vector J⃗p, so

|JH,z − JL,z | ≤ Jp

In quantum mechanics however, angular momentum is quantized. The
length of an angular momentum vector is given in terms of an azimuthal
quantum number j as J = √(j(j + 1)) h̄. In these terms, the triangle inequality
becomes, chapter 9.1,
|jL − jp | ≤ jH ≤ jL + jp (5.30)
Also a z-component of angular momentum is given in terms of a magnetic
quantum number m as Jz = mh̄. It is still true that Jp,z = JH,z − JL,z , but
now it is the magnetic quantum number mj,p whose magnitude is limited by the
azimuthal quantum number jp, so

|mj,H − mj,L | ≤ jp (5.31)

Ignoring spin-orbit interaction and transitions that just flip over the electron
spin, these requirements become

|lL − jp | ≤ lH ≤ lL + jp |mH − mL | ≤ jp

The quantum numbers j and m must be integer or half integer. For orbital
angular momentum, they can only be integer.
Since the emitted photon has jp = 1 in dipole transitions, these expres-
sions limit the changes in azimuthal and magnetic quantum numbers to no
more than one unit each. Similarly, in quadrupole transitions, with jp = 2,
the changes are limited to two units, etcetera. The quantum triangle inequality
also implies that transitions from jH = 0 to jL = 0 are not possible. For higher
order multipole transitions, the corresponding restrictions become stronger. For
a quadrupole transition, jL = 0 requires jH = 2 and jL = 1 requires at least
jH = 1.
Some selection rules are not explained by angular momentum conservation.
For example, how come that l must change one unit in electric dipole transi-
tions, while it must remain the same in magnetic dipole transitions? Angular
momentum conservation allows any change that does not exceed one unit for
either. These gaps are filled in in the next subsection.
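The quantum triangle inequality (5.30) is equally easy to apply in code. A minimal sketch (Python; the function name is made up) that lists the values of jH compatible with given jL and photon jp:

    def allowed_j_high(j_low, j_photon):
        # values of j_H satisfying |j_L - j_p| <= j_H <= j_L + j_p, eq. (5.30)
        lo, hi = abs(j_low - j_photon), j_low + j_photon
        n = int(round(hi - lo))            # j changes in unit steps
        return [lo + k for k in range(n + 1)]

    print(allowed_j_high(0, 1))     # [1]: a j = 0 to j = 0 transition is out
    print(allowed_j_high(1, 1))     # [0, 1, 2]
    print(allowed_j_high(0.5, 1))   # [0.5, 1.5]: half-integer j works too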

5.2.7 Parity
The previous subsection demonstrated the power of conservation laws in figur-
ing out what physical interactions can occur. Physicists such as Wigner have
therefore developed these arguments to a fine art. One reason is that they con-
tinue to work for systems that are not nearly as well understood as a hydrogen
atom, in particular atomic nuclei. Mathematicians discovered that conservation
laws relate to fundamental symmetries of nature, chapter 11.4. For example,
conservation of angular momentum turns out to be a mathematical rephrasing
of the fact that the physics of a system is the same regardless how the system
is oriented compared to the surrounding empty space.
Physicists then identified another helpful symmetry of nature; nature looks
perfectly fine when viewed in a mirror. Your left hand changes into a right hand
when seen in a mirror, but there is no fundamental difference between the two;
lots of people write with their left hand. Physicists figured nature had to behave
exactly the same way when seen in a mirror. And it does, as long as the weak
nuclear force does not play a part. That force will be ignored here.
In three dimensions, a mathematically neat way of doing the mirroring is to
replace r⃗ by −r⃗. This is equivalent to a mirroring followed by a 180° rotation
around the line normal to the mirror. The operator that does that is called the
“inversion operator,” and its eigenvalues must have magnitude one, because the
integrated square magnitude of the wave function does not change when seen in
the mirror. The eigenvalues must also be real, because the inversion operator is
Hermitian; taking it to the other side in an inner product amounts to just a change
in the names of the integration variables. Therefore, the eigenvalues can only
be plus or minus one. If the eigenvalue is one, the parity is called even, if it is
minus one, the parity is odd.
It turns out, {A.15}, that a particle in an orbit with azimuthal quantum
number l has a parity (−1)l : odd if l is odd, and even when l is even. Also,
while the angular momenta of different particles add up to the total angular
momentum, the parities of particles multiply to the total parity. For example,
consider a wave function

Ψ = c ψn1l1m1(r⃗A) ψn2l2m2(r⃗B)

for two particles A and B. When space is inverted, the change of r⃗A into −r⃗A
changes the wave function by a factor (−1)l1, while the change of r⃗B into −r⃗B
produces another factor (−1)l2 , for a total of (−1)l1 (−1)l2 , the product of the
parities. The same holds for a complete Slater determinant of such terms.
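The factor (−1)l can be verified numerically from the spherical harmonics that carry the angular dependence of the orbits. A small check (Python with scipy; note that scipy's sph_harm takes the order m before the degree l, and the azimuthal angle before the polar one):

    import numpy as np
    from scipy.special import sph_harm

    theta, phi = 0.7, 1.1   # arbitrary azimuthal and polar test angles
    for l in range(4):
        for m in range(-l, l + 1):
            y = sph_harm(m, l, theta, phi)
            # inversion r -> -r: azimuthal angle + pi, polar -> pi - polar
            y_inv = sph_harm(m, l, theta + np.pi, np.pi - phi)
            print(l, m, np.round((y_inv / y).real, 6))   # always (-1)**l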
If the weak nuclear force is not a factor, parity must be conserved like angular
momentum is. Now consider hydrogen atom transitions from that point of view.
The parity of the emitted photon can be taken to be odd in electric dipole
transitions [4]. The combined parity of atom and photon after emission is then
the opposite of the parity of the atom alone. If that combined parity is still to
be equal to the parity of the atom before the emission of the photon, the parity
of the atom must have changed. That means the orbital angular momentum
quantum number l must have changed. Conversely, a magnetic dipole transition
emits a photon with even parity that has no effect on the combined parity. So
the atom must still have the same parity after the transition as before. That
requires l to remain unchanged; a unit change, the only other possibility allowed
by angular momentum conservation, would flip over the parity of the atom.
The parity of the photon flips over with every next higher multipole level. For
example, while the photon in electric dipole transitions has odd parity, in elec-
tric quadrupole transitions it has even parity. Therefore, in electric quadrupole
transitions, l must be the same or change by two units; a unit change would
now violate conservation of parity.
It should be pointed out that the given rules for photons apply only to their
interaction with matter. In particular, parity is a somewhat dubious concept
for a photon, because physicists cannot just bring a photon to rest to examine
its properties without the context of motion. A photon has zero rest mass and
must move at the speed of light or it does not exist. And it is easy to see from
a simple example like Ψ = sin(x) that the parity operator does not commute
with linear momentum. In any case, officially the photon is classified as having
odd intrinsic parity.

5.2.8 Absorption of a single weak wave


This subsection discusses what happens to atoms in the ground state if you
perturb them with a single coherent wave of electromagnetic radiation.
To do so, the system of equations (5.24) derived in subsection 5.2.3 must
be solved to give the probabilities |ā|2 and |b̄|2 of the lower and higher energy
states. That can be done; these same equations are solved for nuclear magnetic
resonance in chapter 9.8. But the solution is messy. Therefore, in this section an
approximate analysis called “time-dependent perturbation theory” will be used.


Time-dependent perturbation theory assumes that the level of perturbation,
here given by ω1 , is small.
It will also be assumed that the atom starts in the lower energy state, or
more precisely, that ā0 = 1 and b̄0 = 0. If ā starts at one and its changes, as given by
(5.24), are small, then ā will stay about one. So in the equation for b̄ in (5.24),
the factor ā can just be replaced by one. That allows the equation to be easily solved:

b̄L = ω1 eiφ (e−i(ω−ω0)t − 1)/(2(ω − ω0))        (5.32)

where the superscript L merely indicates that this solution starts from the lower
energy state.
This solution is valid as long as āL remains close to one; that means that the
effective strength ω1 of the applied wave must be weak enough and/or the time
interval t during which the wave is applied short enough. To be more precise,
the formula requires that ω1 t is small, in addition to the requirement from the
previous subsection that ω0 t is large. (The formula should definitely not be
allowed to fail in the critical case that ω = ω0 ; then b̄ is only small if ω1 t is.)
Of course, a real-life atom in a gas would suffer other perturbations than
just the applied electromagnetic wave. Periodically, collisions with neighboring
atoms will occur, and the pre-existing ambient electromagnetic field will disturb
the atoms too. However, if the electromagnetic wave is applied for a very short
time interval, there will not be enough time for any of these effects to act, and
the above expression correctly describes the evolution. The range of application
times in which interactions with the environment are unlikely to occur is called
the “collisionless regime.” To compensate for the very short application time,
the effective field strength ω1 can be jacked way up by using concentrated laser
light. (This remains allowable within the made assumptions as long as the
application time does not need to be so short as to become comparable to
1/ω0 .)
At the time t that the wave is turned off, the atom will have a “transition
probability” that it can be found in the excited state equal to the square mag-
1
nitude of b̄L . To find it, take a factor eiφ− 2 i(ω−ω0 )t out of (5.32), use Euler on
the remainder, and take the square absolute value. That gives the transition
probability from low to high as:
PL→H ≡ |b̄L|² = ¼|ω1|²t² (sin(½(ω − ω0)t) / (½(ω − ω0)t))²        ω1 ≡ E0⟨ψL|ez|ψH⟩/h̄        (5.33)

Now just sit back and allow its surroundings time to “measure” the atom. The
larger the transition probability, the more of these atoms will be found to be
in the elevated energy state and to transition back to the ground state while
emitting a photon of energy h̄ω0 . (The excited atoms have some average “life-
time” dependent on the ambient electromagnetic field. Compare the remaining
subsections.) So the higher the transition probability above, the more photons
you will get.
There is only a decent transition probability for perturbation frequencies ω
in a narrow range around the photon frequency ω0. In particular, the fraction
within parentheses in (5.33) has a maximum of one at ω = ω0 (using l’Hôpital),
and is negligibly small unless (ω − ω0)t is of order one. Even in that range, the
transition probability cannot be more than ¼|ω1|²t², so ω1 t must be of order
one too for a decent probability. True, ω1 t must formally be small because of
the small perturbation assumption, but you want to stretch it as far as you can,
even if you would have to use the exact analysis instead. If both (ω − ω0)t and
ω1 t must be of order one, then the range of frequencies around ω0 for which
there is a decent response must be comparable to |ω1|. The physical meaning of
|ω1| is therefore as a frequency range rather than as a frequency by itself. It is
the typical range of frequencies around the photon frequency ω0 for which there
is a decent response to perturbations at this strength level.
Since a small range of frequencies can be absorbed, the observed line in the
absorption spectrum is not going to be a mathematically thin line, but will have
a small width. Such an effect is known as “spectral line broadening” {A.39}.
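The line shape implied by (5.33) is easily tabulated. A minimal sketch (Python; the values of ω1 and t are made up, in arbitrary units, with ω1 t kept small as required):

    import numpy as np

    w1, t = 0.01, 20.0                # made-up units; w1*t = 0.2 is small
    dw = np.linspace(-2.0, 2.0, 9)    # detuning w - w0
    x = 0.5 * dw * t
    # np.sinc(u) is sin(pi u)/(pi u), so sin(x)/x = np.sinc(x/pi)
    P = 0.25 * w1**2 * t**2 * np.sinc(x / np.pi)**2
    for d, p in zip(dw, P):
        print(f"w - w0 = {d:+5.2f}   P = {p:.2e}")
    # peak (w1*t/2)^2 = 1e-4 at resonance; first zeros at w - w0 = +-2*pi/t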

5.2.9 Absorption of incoherent radiation


Under normal conditions, an atom is not subjected to just a single electromag-
netic wave, but to “broadband” incoherent radiation of all frequencies moving
in all directions. In that case, you have to integrate the effects of all waves
together.
In addition, normally interactions with the environment, like neighboring
atoms, occur frequently instead of rarely. This is called the “collision-dominated
regime.” Collisions between atoms may be modeled as elastic, since in the
absence of interaction with the electromagnetic field, the Schrödinger equation
preserves the probabilities of the global energy states of the particles. However,
they will definitely randomize the coefficients a and b of the states ψL and ψH
of the individual atoms.
Normally the approximations made for a single wave in the previous subsec-
tions, that ω0 t is large but ω1 t is small, continue to hold. While a typical value
for ω0 is of the order of 10¹⁵/s, typical times, like spontaneous decay times and
atom collision times, are considerably larger than 10⁻¹⁵ s, making the typical
value of ω0 t large. And if the radiation level is low, ω1 t will be small.
Since both the electromagnetic field and the collisions are random, a sta-
tistical rather than a determinate treatment is needed. In it, the probability
that a randomly chosen atom can be found in the lower energy state will be
denoted as PL and in the higher energy state by PH . For a single atom, these
probabilities are given by the square coefficients |a|² and |b|². Therefore, PL and
PH will be defined as the averages of |a|² respectively |b|² over all atoms.
In these terms, it turns out that the rates of change in PL and PH are given
by

dPL/dt = −BL→H ρ(ω0)PL + BH→L ρ(ω0)PH + AH→L PH + ...        (5.34)

dPH/dt = +BL→H ρ(ω0)PL − BH→L ρ(ω0)PH − AH→L PH + ...        (5.35)

where ρ(ω) is the energy of the electromagnetic radiation per unit volume and
per unit frequency range, and ω0 is again the frequency of the photon released
in a transition from the higher to the lower energy state. The combination
BL→H ρ(ω0 ) is called the “transition rate” from L to H, and similarly BH→L ρ(ω0 )
is the transition rate from H to L. Further, AH→L is called the “spontaneous
emission rate.”
If there are transitions between more than two states involved, all their
effects should be summed together; that is indicated by the dots in (5.34) and
(5.35).
The constants are called the “Einstein A and B coefficients.” Imagine that
some big shot in engineering was too lazy to select appropriate symbols for the
quantities used in a paper and just called them A and B. Referees and standards
committees would be on his/her back, big shot or not. However, in physics they
still stick with the stupid symbols almost a century later. Also, Einstein in
those pre-Schrödinger equation days treated the atoms as being in the low and
high energy states for certain. So, so do the various textbooks. And these same
textbooks typically define “measurement” as the sort of thing a physicist does
in a lab. The implied notion that physicists would be carefully measuring the
energy of each of countless atoms in the lab and in space (to get the atoms in
energy eigenstates), on a continuing basis, is rather, ahem, interesting?
Anyway, as shown in note {A.40},

BL→H = BH→L = π|⟨ψL|e r⃗|ψH⟩|²/(3h̄²ε0)        (5.36)

where ε0 = 8.85419 10⁻¹² C²/J m is the permittivity of space. The spontaneous
emission rate AH→L is derived in the next subsection.

5.2.10 Spontaneous emission of radiation


Einstein derived the spontaneous emission rate of radiation based on a relatively
simple argument. In particular, he postulated the formula (5.35) for the proba-
bility PH that atoms are at the higher energy level. Then he demanded that in
an equilibrium situation, in which PH is independent of time, the formula must
agree with Planck’s formula for the blackbody electromagnetic radiation energy
ρ.
Solving the steady-state formula (5.35) for ρ(ω0 ) and then equating that to
Planck’s blackbody spectrum as derived in chapter 8.14.5 gives

ρ(ω0) = (AH→L/BH→L) / [(BL→H PL/BH→L PH) − 1] = (h̄/π²c³) ω0³/(eh̄ω0/kBT − 1)

The atoms can be modeled as distinguishable particles, so the ratio PH/PL is
given by the Maxwell-Boltzmann formula of chapter 7.9 as e−(EH−EL)/kBT; that
is e−h̄ω0 /kB T in terms of the photon frequency. It then follows that for the two
expressions for ρ(ω0 ) to be equal,

BL→H = BH→L        AH→L/BH→L = h̄ω0³/π²c³        (5.37)

where BH→L was given in (5.36).


That BL→H must equal BH→L was already mentioned in section 5.2.2. But
it was not self-evident when Einstein wrote the paper; Einstein really invented
stimulated emission here. The valuable new result is the formula for the spon-
taneous emission rate AH→L .
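As a sanity check on (5.36) and (5.37), the numbers can be put in for the hydrogen 2p to 1s (Lyman α) transition. The sketch below (Python) uses the standard hydrogen matrix element ⟨ψ100|z|ψ210⟩ = (128√2/243) a0 ≈ 0.745 a0; the resulting rate, about 6.3 10⁸ /s, corresponds to the well-known 1.6 ns lifetime of the 2p state.

    import numpy as np

    # SI constants
    hbar = 1.054571817e-34; e = 1.602176634e-19
    eps0 = 8.8541878128e-12; c = 2.99792458e8; a0 = 5.29177210903e-11

    w0 = 10.2 * e / hbar                       # hbar*w0 ~ 10.2 eV, Lyman alpha
    d = e * (128 * np.sqrt(2) / 243) * a0      # |<psi_L|e z|psi_H>|

    B = np.pi * d**2 / (3 * hbar**2 * eps0)    # (5.36)
    A = B * hbar * w0**3 / (np.pi**2 * c**3)   # (5.37)
    print(A, 1 / A)                            # ~6.3e8 /s and ~1.6e-9 s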
If you like to think of spontaneous emission as being due to perturbations
by the ground state electromagnetic field, the ratio AH→L /BH→L above would
be the ground state energy density ρgs at frequency ω0 according to classical
quantum mechanics. In terms of the ideas of chapter 8.14.5, it corresponds to
one photon in each radiation mode. That is not quite right: in the ground state
of the electromagnetic field there is half a photon in each mode. It is just like
a harmonic oscillator, which has half an energy quantum h̄ω left in its ground
state. The photon that the excited atom interacts with is its own, through a
twilight effect, chapter 10.2.4.
Still, as (5.37) indicates, the “vacuum energy” of empty space becomes in-
finite like ω³ at infinite frequencies. In the early days of relativistic quantum
mechanics, this was seen as an unexplained error in the theory that had to
be fixed up some way. However, over time physicists came to accept that such
an energy blow-up is really what happens; it gives rise to confirmed theoretical
predictions. Presumably the blow-up describes extremely short scale processes
that cannot be observed. Their nature may remain unknown to us, because
what we observe is the same regardless of the precise details of these short-
scale processes. It is like a thunderstorm: if you are far enough away, it looks
like light and rumble satisfying the normal equations of light propagation and
acoustics through a steady atmosphere; however, inside the thunderstorm itself,
the generation of light and sound is highly nonlinear and complex.
Due to the spontaneous emission rate, atoms pumped up to an excited energy
state ψH will decay to lower energy states over time when left alone, say isolated
in a closed box at absolute zero temperature. The governing equation is
dPH/dt = −[AH→L1 + AH→L2 + AH→L3 + ...] PH
where the sum is over all the lower energy states that exist. The resulting
expression for the number of atoms that can be found in the elevated energy
state at a given time is

PH(t) = PH(0) e−t/τ        τ = 1/(AH→L1 + AH→L2 + AH→L3 + ...)        (5.38)

The constant τ is called the “lifetime” of the excited state. No, it is not really a
lifetime, except in some average sense; 1/τ is the average fraction of excited
atoms that disappears per unit time. The “half-life”

τ1/2 = τ ln 2

is the time it takes for the number of atoms that can be found in the excited
state to decrease by about half.
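In numbers, using the 2p decay rate estimated in the sketch above:

    import numpy as np

    A = 6.3e8                      # estimated 2p to 1s rate, /s
    tau = 1 / A                    # lifetime (5.38), ~1.6 ns
    print(tau, tau * np.log(2))    # lifetime and half-life
    t = np.linspace(0, 5 * tau, 6)
    print(np.exp(-t / tau))        # fraction of atoms still excited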
Isolated atoms in a box that is at room temperature are bathed in thermal
blackbody radiation as well as vacuum energy. So stimulated emission will add
to spontaneous emission. However, at room temperature, blackbody radiation
has negligible energy in the visible light range, and transitions in this range will
not really be affected.

5.3 Position and Linear Momentum


The subsequent sections will be looking at the time evolution of various quantum
systems, as predicted by the Schrödinger equation. However, before that can be
done, first the eigenfunctions of position and linear momentum must be found.
That is something that the exposition so far has been studiously avoiding. The
problem is that the position and linear momentum eigenfunctions have awkward
issues with normalizing them.
These normalization problems have consequences for the coefficients of the
eigenfunctions. In the normal orthodox interpretation, the absolute squares of
the coefficients should be the probabilities of getting the corresponding values of
position, respectively linear momentum. But for position and linear momentum,
this statement must be modified a bit.
One good thing is that unlike the Hamiltonian, which is specific to a given
system, the position operator

r̂ = (x̂, ŷ, ẑ)

and the linear momentum operator


à !
h̄ ∂ ∂ ∂
~pb = (pbx , pby , pbz ) = , ,
i ∂x ∂y ∂z

are the same for all systems. So, you only need to find their eigenfunctions once.

5.3.1 The position eigenfunction


The eigenfunction that corresponds to the particle being at a precise x-position
x̲, y-position y̲, and z-position z̲ will be denoted by Rxyz(x, y, z). The eigenvalue
problem is:

x̂ Rxyz(x, y, z) = x̲ Rxyz(x, y, z)
ŷ Rxyz(x, y, z) = y̲ Rxyz(x, y, z)
ẑ Rxyz(x, y, z) = z̲ Rxyz(x, y, z)

(Note the need in this section to use underlined (x̲, y̲, z̲) for the measurable particle
position, since plain (x, y, z) are already used for the eigenfunction arguments.)
To solve this eigenvalue problem, try again separation of variables, where it
is assumed that Rxyz(x, y, z) is of the form X(x)Y(y)Z(z). Substitution gives
the partial problem for X as

x X(x) = x̲ X(x)

This equation implies that at all points x not equal to x̲, X(x) will have to be
zero, otherwise there is no way that the two sides can be equal. So, function
X(x) can only be nonzero at the single point x̲. At that one point, it can be
anything, though.
To resolve the ambiguity, the function X(x) is taken to be the “Dirac delta
function,”

X(x) = δ(x − x̲)

The delta function is, loosely speaking, sufficiently strongly infinite at the single
point x = x̲ that its integral over that single point is one. More precisely, the
Figure 5.3: The approximate Dirac delta function δε(x − x̲), shown left, is a
spike of width ε and height 1/ε centered on x = x̲. The true delta function
δ(x − x̲) is the limit when ε becomes zero, and is an infinitely high, infinitely
thin spike, shown right. It is the eigenfunction corresponding to a position x̲.

delta function is defined as the limiting case of the function shown in the left
hand side of figure 5.3.
The fact that the integral is one leads to a very useful mathematical property
of delta functions: they are able to pick out one specific value of any arbitrary
given function f(x). Just take an inner product of the delta function δ(x − x̲)
with f(x). It will produce the value of f(x) at the point x̲, in other words, f(x̲):

⟨δ(x − x̲)|f(x)⟩ = ∫−∞∞ δ(x − x̲)f(x) dx = ∫−∞∞ δ(x − x̲)f(x̲) dx = f(x̲)        (5.39)

(Since the delta function is zero at all points except x̲, it does not make a
difference whether f(x) or f(x̲) is used in the integral.)
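This sifting property is easy to see numerically using the finite-width approximation δε of figure 5.3. A minimal sketch (Python):

    import numpy as np

    def delta_eps(x, eps):
        # rectangular spike of figure 5.3: height 1/eps over a width eps
        return (np.abs(x) < eps / 2) / eps

    f, x_eig = np.cos, 0.3                # any smooth function and test point
    x = np.linspace(-1.0, 1.0, 200001)
    dx = x[1] - x[0]
    for eps in (0.1, 0.01, 0.001):
        val = np.sum(delta_eps(x - x_eig, eps) * f(x)) * dx
        print(eps, val)                   # -> f(0.3) = cos(0.3) ~ 0.955336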
The problems for the position eigenfunctions Y and Z are the same as the one
for X, and have a similar solution. The complete eigenfunction corresponding
to a measured position (x̲, y̲, z̲) is therefore:

Rxyz(x, y, z) = δ(x − x̲)δ(y − y̲)δ(z − z̲) ≡ δ³(r⃗ − r̲⃗)        (5.40)

Here δ³(r⃗ − r̲⃗) is the three-dimensional delta function, a spike at position r̲⃗
whose volume integral equals one.
According to the orthodox interpretation, the probability of finding the par-
ticle at (x̲, y̲, z̲) for a given wave function Ψ should be the square magnitude
of the coefficient cxyz of the eigenfunction. This coefficient can be found as an
inner product:

cxyz(t) = ⟨δ(x − x̲)δ(y − y̲)δ(z − z̲)|Ψ⟩

It can be simplified to

cxyz(t) = Ψ(x̲, y̲, z̲; t)        (5.41)

because of the property of the delta functions to pick out the corresponding
function value.
However, the apparent conclusion that |Ψ(x̲, y̲, z̲; t)|² gives the probability of
finding the particle at (x̲, y̲, z̲) is wrong. The reason it fails is that eigenfunctions
should be normalized; the integral of their square should be one. The integral
of the square of a delta function is infinite, not one. That is OK, however; r̲⃗
is a continuously varying variable, and the chances of finding the particle at
(x̲, y̲, z̲) to an infinite number of digits accuracy would be zero. So, the properly
normalized eigenfunctions would have been useless anyway.
In fact, according to Born’s statistical interpretation of chapter 2.1, the
expression |Ψ(x̲, y̲, z̲; t)|² d³r̲⃗ gives the probability of finding the particle in an
infinitesimal volume d³r̲⃗ around (x̲, y̲, z̲). In other words, |Ψ|² is the probability
of finding the particle near that location per unit volume.
Besides the normalization issue, another idea that needs to be somewhat
modified is a strict collapse of the wave function. Any position measurement
that can be done will leave some uncertainty about the precise location of the
particle: it will leave the coefficient cxyz, or in other words Ψ(x̲, y̲, z̲; t), nonzero
over a small range of positions, rather than just one position. Moreover, un-
like energy eigenstates, position eigenstates are not stationary: after a position
measurement, Ψ will again spread out as time increases.

5.3.2 The linear momentum eigenfunction


Turning now to linear momentum, the eigenfunction that corresponds to a pre-
cise linear momentum (px, py, pz) will be indicated as Ppxpypz(x, y, z). If you
again assume that this eigenfunction is of the form X(x)Y(y)Z(z), the partial
problem for X is found to be:

(h̄/i) ∂X(x)/∂x = px X(x)

The solution is a complex exponential:

X(x) = A eipx x/h̄

where A is a constant.
This linear momentum eigenfunction too has a normalization problem: since
it does not become small at large |x|, the integral of its square is infinite, not
one. Again, the solution is to ignore the problem and to just take a nonzero
value for A; the choice that works out best is to take:

A = 1/√(2πh̄)
(However, other books, in particular non-quantum ones, are likely to make a
different choice.)
The problems for the y and z linear momenta have similar solutions, so
the full eigenfunction for linear momentum takes the form:

Ppxpypz(x, y, z) = (1/√(2πh̄)³) ei(px x + py y + pz z)/h̄        (5.42)
Turning now to the coefficient cpxpypz(t) of the eigenfunction, this coefficient
is called the “momentum space wave function” and indicated by the special
symbol Φ(px, py, pz; t). It is again found by taking an inner product of the
eigenfunction with the wave function,

Φ(px, py, pz; t) = (1/√(2πh̄)³) ⟨ei(px x + py y + pz z)/h̄|Ψ⟩        (5.43)
Just like what was the case for position, this coefficient of the linear momen-
tum eigenfunction does not quite give the probability for the momentum to be
(px, py, pz). Instead it turns out that |Φ(px, py, pz; t)|² dpx dpy dpz gives the prob-
ability of finding the linear momentum within a range dpx dpy dpz of (px, py, pz).
In short, the momentum space wave function Φ is in the “momentum space”
(px, py, pz) what the normal wave function Ψ is in normal space (x, y, z).
There is even an inverse relationship to recover Ψ from Φ, and it is easy to
remember:

Ψ(x, y, z; t) = (1/√(2πh̄)³) ⟨e−i(px x + py y + pz z)/h̄|Φ⟩p⃗        (5.44)
where the subscript on the inner product indicates that the integration is over
momentum space rather than physical space.
If this inner product is written out, it reads:

Ψ(x, y, z; t) = (1/√(2πh̄)³) ∫∫∫ Φ(px, py, pz; t) ei(px x + py y + pz z)/h̄ dpx dpy dpz        (5.45)

with each of the three integrals running from −∞ to ∞.

Mathematicians prove this formula under the name “Fourier Inversion Theo-
rem,” {A.41}. But it really is just the same sort of idea as writing Ψ as a sum
of energy eigenfunctions ψn times their coefficients cn, as in Ψ = Σn cn ψn. In
this case, the coefficients are given by Φ and the eigenfunctions by the expo-
nential (5.42). The only real difference is that the sum has become an integral
since p⃗ has continuous values, not discrete ones.
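In one dimension, the transform pair (5.43) and (5.45) can be verified by direct numerical quadrature. A minimal sketch (Python, in units where h̄ = 1, using a Gaussian for Ψ):

    import numpy as np

    hbar = 1.0
    N, L = 512, 40.0
    x = (np.arange(N) - N // 2) * (L / N); dx = L / N
    psi = np.pi**-0.25 * np.exp(-x**2 / 2)      # normalized Gaussian

    p = np.linspace(-20.0, 20.0, 401); dp = p[1] - p[0]
    # (5.43): Phi(p) = <e^{ipx/hbar} | Psi> / sqrt(2 pi hbar)
    phi = np.array([np.sum(np.exp(-1j * pk * x / hbar) * psi) * dx
                    for pk in p]) / np.sqrt(2 * np.pi * hbar)
    # (5.45): transform back to Psi
    psi2 = np.array([np.sum(phi * np.exp(1j * p * xk / hbar)) * dp
                     for xk in x]) / np.sqrt(2 * np.pi * hbar)
    print(np.max(np.abs(psi2 - psi)))           # ~0 up to truncation error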

5.4 Wave Packets in Free Space


This section gives a full description of the motion of a particle according to
quantum mechanics. It will be assumed that the particle is in free space, so
that the potential energy is zero. In addition, to keep the analysis concise and
the results easy to graph, it will be assumed that the motion is only in the
x-direction. The results may easily be extended to three dimensions by using
separation of variables.
The analysis will also show how limiting the uncertainty in both momentum
and position produces the various features of classical Newtonian motion. It may
be recalled that in Newtonian motion through free space, the linear momentum
p is constant. (The subscript x will be dropped from px from now on, since only
the x-dimension will be considered.) Further, since p/m is the velocity v, the
classical particle will move at constant speed:
v = p/m = constant        x = vt + x0        for Newtonian motion in free space

5.4.1 Solution of the Schrödinger equation.


As discussed in section 5.1, the unsteady evolution of a quantum system may be
determined by finding the eigenfunctions of the Hamiltonian and giving them
coefficients that are proportional to e−iEt/h̄ . This will be worked out in this
subsection.
For a free particle, there is only kinetic energy, so in one dimension the
Hamiltonian eigenvalue problem is:

−(h̄²/2m) ∂²ψ/∂x² = Eψ        (5.46)
Solutions to this equation take the form of exponentials

ψE = A e±i√(2mE) x/h̄

where A is a constant.
Note that E must be positive: if the square root were imaginary, the
solution would blow up exponentially at large positive or negative x. Since the
square magnitude of ψ at a point gives the probability of finding the particle
near that position, blow up at infinity would imply that the particle must be at
infinity with certainty.
The energy eigenfunction above is really the same as the eigenfunction of
the x-momentum operator p̂ derived in the previous section:

ψE = (1/√(2πh̄)) eipx/h̄   with p = ±√(2mE)        (5.47)

The reason that the momentum eigenfunctions are also energy eigenfunctions is
that the energy is all kinetic energy, and the kinetic energy operator equals T̂ = p̂²/2m.
So eigenfunctions with precise momentum p have precise energy p²/2m.
As was noted in the previous section, combinations of momentum eigenfunc-
tions take the form of an integral rather than a sum. In this one-dimensional
case that integral is:
Ψ(x, t) = (1/√(2πh̄)) ∫−∞∞ Φ(p, t) eipx/h̄ dp

where Φ(p, t) was called the momentum space wave function.
Whether a sum or an integral, the Schrödinger equation still requires that
the coefficient of each energy eigenfunction varies in time proportional to
e−iEt/h̄. The coefficient is here the momentum space wave function Φ, and the
energy is E = p²/2m, so the solution of the Schrödinger equation must be:

Ψ(x, t) = (1/√(2πh̄)) ∫−∞∞ Φ0(p) eip(x − pt/2m)/h̄ dp        (5.48)

where Φ0(p) ≡ Φ(p, 0) is determined by whatever initial conditions are relevant
to the situation that is to be described. The above integral is the final solution
for a particle in free space.
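For a concrete choice of Φ0, the integral (5.48) can be evaluated by direct numerical quadrature. The sketch below (Python, in units where h̄ = m = 1, with a made-up Gaussian momentum distribution centered on p0 = 2) shows the region of high probability translating at the classical speed p0/m, anticipating the discussion in the next subsections:

    import numpy as np

    hbar = m = 1.0                                # convenient units
    p = np.linspace(-10.0, 10.0, 801); dp = p[1] - p[0]
    p0, sig = 2.0, 0.5                            # made-up packet parameters
    Phi0 = np.exp(-(p - p0)**2 / (2 * sig**2))    # Gaussian momentum amplitude

    def Psi(x, t):
        # direct quadrature of (5.48)
        return np.sum(Phi0 * np.exp(1j * p * (x - p * t / (2 * m)) / hbar)) \
               * dp / np.sqrt(2 * np.pi * hbar)

    xg = np.linspace(-5.0, 30.0, 701)
    for t in (0.0, 5.0, 10.0):
        prob = np.abs([Psi(xk, t) for xk in xg])**2
        print(f"t = {t:4.1f}   peak near x = {xg[np.argmax(prob)]:5.2f}")
    # the peak moves at the classical speed p0/m = 2, not the phase speed p0/2m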

5.4.2 Component wave solutions


Before trying to interpret the complete obtained solution (5.48) for the wave
function of a particle in free space, it is instructive first to have a look at the
component solutions, defined by
ψw ≡ eip(x − pt/2m)/h̄        (5.49)

These solutions will be called component waves; both their real and imaginary
parts are sinusoidal, as can be seen from the Euler formula (1.5).
ψw = cos(p(x − pt/2m)/h̄) + i sin(p(x − pt/2m)/h̄)
In figure 5.4, the real part of the wave (in other words, the cosine), is sketched
as the red curve; also the magnitude of the wave (which is unity) is shown as
the top black line, and minus the magnitude is drawn as the bottom black
line. The black lines enclose the real part of the wave, and will be called the
“envelope.” Since their vertical separation is twice the magnitude of the wave
function, the vertical separation between the black lines at a point is a measure
for the probability of finding the particle near that point.

Figure 5.4: The real part (red) and envelope (black) of an example wave.

The constant separation between the black lines shows that there is abso-
lutely no localization of the particle to any particular region. The particle is
equally likely to be found at every point in the infinite range. This also graphi-
cally demonstrates the normalization problem of the momentum eigenfunctions
discussed in the previous section: the total probability of finding the particle
just keeps getting bigger and bigger, the larger the range you look in. So there
is no way that the total probability of finding the particle can be limited to one
as it should be.
The reason for the complete lack of localization is the fact that the com-
ponent wave solutions have an exact momentum p. With zero uncertainty in
momentum, Heisenberg’s uncertainty relationship says that there must be infi-
nite uncertainty in position. There is.
There is another funny thing about the component waves: when plotted for
different times, it is seen that the real part of the wave moves towards the right
with a speed p/2m = ½v, as illustrated in figure 5.5. This is unexpected, because

The html version of this document has an animation of the motion.

Figure 5.5: The wave moves with the phase speed.

classically the particle moves with speed v, not ½v. The problem is that the
speed with which the wave moves, called the “phase speed,” is not meaningful
physically. In fact, without anything like a location for the particle, there is no
way to define a physical velocity for a component wave.

5.4.3 Wave packets


As the previous section indicated, in order to get some localization of the po-
sition of a particle, some uncertainty must be allowed in momentum. That
means that you must take the momentum space wave function Φ0 in (5.48) to
be nonzero over at least some small interval of different momentum values p.
Such a combination of component waves is called a “wave packet”.

The wave function for a typical wave packet is sketched in figure 5.6. The
red line is again the real part of the wave function, and the black lines are the
envelope enclosing the wave; they equal plus and minus the magnitude of the
wave function.

Figure 5.6: The real part (red) and magnitude or envelope (black) of a wave
packet. (Schematic)

The vertical separation between the black lines is again a measure of the
probability of finding the particle near that location. It is seen that the possible
locations of the particle are now restricted to a finite region, the region in which
the vertical distance between the black lines is nonzero.
If the envelope changes location with time, and it does, then so does the
region where the particle can be found. This then finally is the correct picture
of motion in quantum mechanics: the region in which the particle can be found
propagates through space.
The limiting case of the motion of a macroscopic Newtonian point mass can
now be better understood. As noted in section 5.1.5, for such a particle the
uncertainty in position is negligible. The wave packet in which the particle can
be found, as sketched in figure 5.6, is so small that it can be considered to be
a point. To that approximation the particle then has a point position, which is
the normal classical description.
The classical description also requires that the particle moves with velocity
v = p/m, which is twice the speed p/2m of the wave. So the envelope should
move twice as fast as the wave. This is indicated in figure 5.7 by the length of
the bars, which show the motion of a point on the envelope and of a point on
the wave during a small time interval.
That the envelope does indeed move at speed p/m can be seen if you define
the representative position of the envelope to be the expectation value of posi-
tion. That position must be somewhere in the middle of the wave packet. The
expectation value of position moves according to Ehrenfest’s theorem of section
5.1.5 with a speed hpi/m, where hpi is the expectation value of momentum,
which must be constant since there is no force. Since the uncertainty in mo-
mentum is small for a macroscopic particle, the expectation value of momentum
hpi can be taken to be “the” momentum p.

The html version of this document has an animation of the motion.

Figure 5.7: The velocities of wave and envelope are not equal.

5.4.4 Group velocity


As the previous subsection explained, particle motion in classical mechanics is
equivalent to the motion of wave packets in quantum mechanics. Motion of a
wave packet implies that the region in which the particle can be found changes
position. And that is not just important for understanding where particles in
free space end up. It is also critical for the quantum mechanics of, for example,
solids, in which electrons, photons, and phonons (quanta of crystal vibrations)
move around in an environment that is cluttered with other particles. And it is
also of great importance in classical applications, such as acoustics in solids and
fluids, water waves, stability theory of flows, electromagnetodynamics, etcetera.
This section explains how wave packets move in such more general systems.
Only the one-dimensional case will be considered, but the generalization to
three dimensions is straightforward.
The systems of interest have component wave solutions of the general form

component wave: ψw = ei(kx−ωt) (5.50)

The constant k is called the “wave number,” and ω the “angular frequency.”
For a particle in free space, the wave number k equals p/h̄, so it is then just a
rescaled linear momentum. Also the frequency ω equals p²/2mh̄, so it is then
just a rescaled kinetic energy.
Regardless of what kind of system it is, the relationship between the fre-
quency and the wave number is called the

dispersion relation: ω = ω(k) (5.51)

It really defines the physics of the wave propagation. The wave number and
frequency must be real for the analysis in this section to apply. That means
that the amplitude of the component waves must not change with space nor
time. Such systems are called nondissipative: although a combination of waves
may get dispersed over space, its square magnitude integral will be conserved.
(This is true on account of Parseval’s relation, {A.41}.)
Since the waves are of the form eik(x − ωt/k), the wave is constant at points that
move with the

phase velocity: vp ≡ ω/k        (5.52)
In free space, the phase velocity is half the classical velocity.
However, as noted in the previous subsection, wave packets do not normally
move with the phase velocity. The velocity that they do move with is called
the “group velocity.” For a particle in free space, you can infer that the group
velocity is the same as the classical velocity from Ehrenfest’s theorem, but that
does not work for more general systems. The approach will therefore be to define
the group velocity as

group velocity: vg ≡ dω/dk        (5.53)
and then to explore how the so-defined group velocity relates to the motion of
wave packets.
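For the free-particle dispersion relation, this definition immediately reproduces the classical velocity. A quick symbolic check (Python with sympy):

    import sympy as sp

    hbar, m, k = sp.symbols('hbar m k', positive=True)
    omega = hbar * k**2 / (2 * m)     # free-particle dispersion relation
    print(sp.diff(omega, k))          # group velocity: hbar*k/m = p/m
    print(sp.simplify(omega / k))     # phase velocity: hbar*k/(2*m) = p/(2*m)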
Wave packets are combinations of component waves, and the most general
combination of waves takes the form

Ψ(x, t) = (1/√(2π)) ∫−∞∞ Φ0(k) ei(kx−ωt) dk        (5.54)

Here Φ0 is the complex amplitude of the waves. The combination Φ0 e−iωt is
called the “Fourier transform” of Ψ. The factor √(2π) is just a normalization
factor that might be chosen differently in another book. Wave packets corre-
spond to combinations in which the complex amplitude Φ0 (k) is only nonzero
in a small range of wave numbers k. More general combinations of waves may
of course always be split up into such wave packets.
To describe the motion of wave packets is not quite as straightforward as
it may seem: the envelope of a wave packet extends over a finite region, and
different points on it actually move at somewhat different speeds. So what do
you take as the point that defines the motion if you want to be precise? There is
a trick here: consider very long times. For large times, the propagation distance
is so large that it dwarfs the ambiguity about what point to take as the position
of the envelope.
Finding the wave function Ψ for large time is a messy exercise banished to
note {A.42}. But the conclusions are fairly straightforward. Assume that the
range of waves in the packet is restricted to some small interval k1 < k < k2 .
In particular, assume that the variation in group velocity is relatively small and
monotonic. In that case, for large times the wave function will be negligibly
small except in the region
vg1 t < x < vg2 t

(In case vg1 > vg2, invert these inequalities.) Since the variation in group velocity
is small for the packet, it therefore definitely does move with “the” group velocity.
For a given position and time within the wave packet, there will be one
particular wave number k0 at which the group speed vg0 exactly equals x/t. The
wave function can be written in terms of the value of the complex amplitude
Φ0 (k0 ) at that wave number as
Ψ(x, t) ∼ (e∓iπ/4/√(|v′g0|t)) Φ0(k0) ei(k0 x − ω0 t)        vg0 = x/t

where v′g0 is the derivative of the group speed with respect to k at k0 and ±
stands for the sign of v′g0.
Note in particular that Ψ decreases in magnitude proportional to 1/√t.
Since the wave packet spreads out proportional to time because of its variation
in group velocity, the square magnitude integral of Ψ is conserved.
It may also be noted that if you examine Ψ locally on the scale of a few
oscillations, it looks as if it was a single component wave of wave number k0 .
Only if you look on a bigger scale do you see that it really is a wave packet.
To see why it seems like a simple wave on a local scale, even though k0 is not
constant but a function of x, just look at the differential
d(k0 x − ω0 t) = k0 dx − ω0 dt + xdk0 − tdω0
and observe that the final two terms cancel because dω0 /dk0 is the group veloc-
ity, which equals x/t.
For the particle in free space, the result for the large time wave function can
be written out further to give
Ψ(x, t) ∼ √(m/t) e−iπ/4 Φ0(mx/t) eimx²/2h̄t
Since the group speed p/m in this case is monotonically increasing, the wave
packets have negligible overlap, and this is in fact the large time solution for
any combination of waves, not just narrow wave packets.
In a typical true quantum mechanics case, Φ0 will extend over a range of
wave numbers that is not small, and may include both positive and negative
values of the momentum p. So, there is no longer a meaningful velocity for
the wave function: the wave function spreads out in all directions at velocities
ranging from negative to positive. For example, if the momentum space wave
function Φ0 consists of two narrow nonzero regions, one at a positive value of p,
and one at a negative value, then the wave function in normal space splits into
two separate wave packets. One packet moves with constant speed towards the
left, the other with constant speed towards the right. The same particle is now
going in two completely different directions at the same time. That would be
unheard of in classical Newtonian mechanics.

5.5 Motion near the Classical Limit


This section examines the motion of a particle in the presence of forces. Just like
in the previous section, it will be assumed that the initial position and momen-
tum are narrowed down sufficiently that the particle is restricted to a relatively
small, coherent region. Solutions of this type are called “wave packets.”
In addition, for the examples in this section the forces vary slowly enough
that they are approximately constant over the spatial extent of the wave packet.
Hence, according to Ehrenfest’s theorem, section 5.1.5, the wave packet should
move according to the classical Newtonian equations.
The examples in this section were obtained on a computer, and should be
numerically exact. Details about how they were computed can be found in note
{A.43}, if you want to understand them better, or create some yourself.

5.5.1 Motion through free space


First consider the trivial case that there are no forces; a particle in free space.
This will provide the basis against which the motion with forces in the next
subsections can be compared.
Classically, a particle in free space moves at a constant velocity. In quantum
mechanics, the wave packet does too; figure 5.8 shows it at two different times. If

The html version of this document has an animation of the motion to show
that it is indeed at constant speed.

Figure 5.8: A particle in free space.

you step back far enough that the wave packet in the figures begins to resemble
just a dot, you have classical motion. The blue point indicates the position of
maximum wave function magnitude, as a visual anchor. It provides a reasonable
approximation to the expectation value of position whenever the wave packet
contour is more or less symmetric. A closer examination shows that the wave
packet is actually changing a bit in size in addition to translating.

5.5.2 Accelerated motion


Figure 5.9 shows the motion when the potential energy (shown in green) ramps
down starting from the middle of the plotted range. Physically this corresponds
to a constant accelerating force beyond that point. A classical point particle
would move at constant speed until it encounters the ramp, after which it would
start accelerating at a constant rate. The quantum mechanical solution shows a
corresponding acceleration of the wave packet, but in addition the wave packet
stretches a lot.

The html version of this document has an animation of the motion.

Figure 5.9: An accelerating particle.

5.5.3 Decelerated motion


Figure 5.10 shows the motion when the potential energy (shown in green) ramps
up starting from the center of the plotting range. Physically this corresponds
to a constant decelerating force beyond that point. A classical point particle
would move at constant speed until it encounters the ramp, after which it would
start decelerating until it runs out of kinetic energy; then it would be turned
back, returning to where it came from.
The quantum mechanical solution shows a corresponding reflection of the
wave packet back to where it came from. The black dot on the potential energy
line shows the “turning point” where the potential energy becomes equal to
the nominal energy of the wave packet. That is the point where classically the
particle runs out of kinetic energy and is turned back.

5.5.4 The harmonic oscillator


The harmonic oscillator describes a particle caught in a force field that prevents
it from escaping in either direction. In all three previous examples the particle
could at least escape towards the far left. The harmonic oscillator was the first
real quantum system that was solved, in chapter 2.6, but only now, near the

The html version of this document has an animation of the motion.

Figure 5.10: A decelerating particle.

end of part I, can the classical picture of a particle oscillating back and forward
actually be created.
There are some mathematical differences from the previous cases, because
the energy levels of the harmonic oscillator are discrete, unlike those of the
particles that are able to escape. But if the energy levels are far enough above
the ground state, localized wave packets similar to the ones in free space may
be formed, {A.43}. The animation in figure 5.11 gives the motion of a wave
packet whose nominal energy is a hundred times the ground state energy.

The html version of this document has an animation of the motion.

Figure 5.11: Unsteady solution for the harmonic oscillator. The third picture
shows the maximum distance from the nominal position that the wave packet
reaches.

The wave packet performs a periodic oscillation back and forth just like a
classical point particle would. In addition, it oscillates at the correct classical
frequency ω. Finally, the point of maximum wave function, shown in blue, fairly
closely obeys the classical limits of motion, shown as black dots.

Curiously, the wave function does not return to the same values after one
period: it has changed sign after one period and it takes two periods for the
wave function to return to the same values. It is because the sign of the wave
function cannot be observed physically that classically the particle oscillates at
frequency ω, and not at ½ω like the wave function does.

5.6 WKB Theory of Nearly Classical Motion


WKB theory provides simple approximate solutions for the energy eigenfunc-
tions when the conditions are almost classical, like for the wave packets of the
previous section. The approximation is named after Wentzel, Kramers, and
Brillouin, who refined the ideas of Liouville and Green. The bandit scientist
Jeffreys tried to rob WKB of their glory by doing the same thing two years
earlier, and is justly denied all credit.


Figure 5.12: Harmonic oscillator potential energy V , eigenfunction h50 , and its
energy E50 .

The WKB approximation is based on the rapid spatial variation of energy


eigenfunctions with almost macroscopic energies. As an example, figure 5.12
shows the harmonic oscillator energy eigenfunction h50. Its energy E50 is a
hundred times the ground state energy. That makes the kinetic energy E − V quite
large over most of the range, and that in turn makes the linear momentum large.
In fact, the classical Newtonian linear momentum pc = mv is given by
pc ≡ √(2m(E − V))    (5.55)

In quantum mechanics, the large momentum implies the rapid oscillation of the
wave function since quantum mechanics associates the linear momentum with
the operator (h̄/i)d/dx that denotes spatial variation.
The WKB approximation is most appealing in terms of the classical mo-
mentum pc as defined above. To find its form, in the Hamiltonian eigenvalue
problem
−(h̄²/2m) d²ψ/dx² + V ψ = Eψ

take the V ψ term to the other side and then rewrite E − V in terms of the
classical linear momentum. That produces
d²ψ/dx² = −(pc²/h̄²) ψ    (5.56)
Now under almost classical conditions, a single period of oscillation of the
wave function is so short that normally pc is almost constant over it. Then by
approximation the solution of the eigenvalue problem over a single period is
simply an arbitrary combination of two exponentials,

ψ ∼ cf e^{ipc x/h̄} + cb e^{−ipc x/h̄},

where the constants cf and cb are arbitrary. (The subscripts denote whether the
wave speed of the corresponding term is forward or backward.) It turns out,
{A.44}, that to make the above expression work over more than one period,
it is necessary to replace pc x by the antiderivative ∫ pc dx; furthermore, the
“constants” cf and cb must be allowed to vary from period to period proportional
to 1/√pc. In short, the WKB approximation of the wave function is:

classical WKB:  ψ ≈ (1/√pc) [Cf e^{iθ} + Cb e^{−iθ}]    θ ≡ (1/h̄) ∫ pc dx    (5.57)

where Cf and Cb are now true constants.


If you ever glanced at notes such as {A.12}, {A.15}, and {A.17}, in which
the eigenfunctions for the harmonic oscillator and hydrogen atom were found,
you recognize what a big simplification the WKB approximation is. Just do the
integral for θ and that is it. No elaborate transformations and power series to
grind down. And the WKB approximation can often be used where no exact
solutions exist at all.
In many applications, it is more convenient to write the WKB approximation
in terms of a sine and a cosine. That can be done by taking the exponentials
apart using the Euler formula (1.5). It produces

rephrased WKB:  ψ ≈ (1/√pc) [Cc cos θ + Cs sin θ]    θ ≡ (1/h̄) ∫ pc dx    (5.58)

The constants Cc and Cs are related to the original constants Cf and Cb as

Cc = Cf + Cb    Cs = iCf − iCb    Cf = ½(Cc − iCs)    Cb = ½(Cc + iCs)    (5.59)

which allows you to convert back and forward between the two formulations as
needed. Do note that either way, the constants depend on what you choose for
the integration constant in the θ integral.
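If you want to verify the conversions (5.59), a few lines of code will do it; the constants below are arbitrary complex test values:

    import numpy as np

    # Quick check of the conversions (5.59), for arbitrary complex constants.
    Cf, Cb = 0.3 + 0.7j, -1.2 + 0.1j                # arbitrary test values
    Cc, Cs = Cf + Cb, 1j*Cf - 1j*Cb                 # exponential to sine/cosine
    Cf2, Cb2 = 0.5*(Cc - 1j*Cs), 0.5*(Cc + 1j*Cs)   # and back again
    print(np.isclose(Cf, Cf2), np.isclose(Cb, Cb2)) # True True: round trip works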

As an application, consider a particle stuck between two impenetrable walls


at positions x1 and x2 . An example would be the particle in a pipe that was
studied way back in chapter 2.5. The wave function ψ must become zero at
both x1 and x2 , since there is zero possibility of finding the particle outside the
impenetrable walls. It is now smart to choose the integration constant in θ so
that θ1 = 0. In that case, Cc must be zero for ψ to be zero at x1. The wave
function must be just the sine term. Next, for ψ also to be zero at x2, θ2 must
be a whole multiple n of π, because those are the only places where sines are
zero. So θ2 − θ1 = nπ, which means that

particle between impenetrable walls:  (1/h̄) ∫_{x1}^{x2} pc(x) dx = nπ    (5.60)

Recall that pc was √(2m(E − V)), so this is just an equation for the energy
eigenvalues. It is an equation involving just an integral; it does not even require
you to find the corresponding eigenfunctions!
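For a potential without a closed-form answer, you would solve (5.60) by root-finding on a numerical quadrature. The sketch below does exactly that for the pipe, in assumed units h̄ = m = ℓ = 1, and confirms that it recovers the exact levels n²h̄²π²/2mℓ² of chapter 2.5:

    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import brentq

    # Sketch: solve (5.60) numerically for a particle in a pipe of length ell,
    # where V = 0 inside, and compare with the exact levels of chapter 2.5.
    # Units with hbar = m = ell = 1 are assumed for illustration.
    hbar = m = ell = 1.0

    def theta(E):
        # (1/hbar) times the integral of pc = sqrt(2m(E - V)); here V = 0
        val, _ = quad(lambda x: np.sqrt(2.0*m*E), 0.0, ell)
        return val/hbar

    for n in (1, 2, 3):
        E_wkb = brentq(lambda E: theta(E) - n*np.pi, 1e-8, 1e4)
        E_exact = n**2*hbar**2*np.pi**2/(2.0*m*ell**2)
        print(n, E_wkb, E_exact)    # identical: pc really is constant here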
It does get a bit more tricky for a case like the harmonic oscillator where
the particle is not caught between impenetrable walls, but merely prevented
from escaping by a gradually increasing potential. Classically, such a particle would
still be rigorously constrained between the so-called “turning points” where the
potential energy V becomes equal to the total energy E, like the points 1 and 2
in figure 5.12. But as the figure shows, in quantum mechanics the wave function
does not become zero at the turning points; there is some chance for the particle
to be found somewhat beyond the turning points.
A further complication arises since the WKB approximation becomes in-
accurate in the immediate vicinity of the turning points. The problem is the
requirement that the classical momentum can be approximated as a nonzero
constant on a small scale. At the turning points the momentum becomes zero
and that approximation fails. However, it is possible to solve the Hamiltonian
eigenvalue problem near the turning points assuming that the potential energy
is not constant, but varies approximately linearly with position, {A.45}. Doing
so and fixing up the WKB solution away from the turning points produces a
simple result. The classical WKB approximation remains a sine, but at the
turning points, sin θ stays an angular amount π/4 short of becoming zero. (Or
to be precise, it just seems to stay π/4 short, because the classical WKB ap-
proximation is no longer valid at the turning points.) Assuming that there are
turning points with gradually increasing potential at both ends of the range, like
for the harmonic oscillator, the total angular range will be short by an amount
π/2, turning (5.60) into

particle trapped between turning points:  (1/h̄) ∫_{x1}^{x2} pc(x) dx = (n − ½)π    (5.61)
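Numerically, (5.61) is handled the same way as (5.60). A sketch for the harmonic oscillator, in assumed units h̄ = m = ω = 1, with the turning points where ½mω²x² = E:

    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import brentq

    # Sketch: apply the turning point rule (5.61) to the harmonic oscillator,
    # V = (1/2) m w^2 x^2.  Units hbar = m = w = 1 assumed for illustration.
    hbar = m = w = 1.0

    def theta12(E):
        x2 = np.sqrt(2.0*E/(m*w**2))    # turning point, where V = E
        pc = lambda x: np.sqrt(max(2.0*m*(E - 0.5*m*w**2*x**2), 0.0))
        val, _ = quad(pc, -x2, x2)
        return val/hbar

    for n in (1, 2, 3):
        E = brentq(lambda En: theta12(En) - (n - 0.5)*np.pi, 1e-8, 100.0)
        print(n, E)    # 0.5, 1.5, 2.5: the exact levels (n - 1/2) hbar w

The printed values are the exact harmonic oscillator levels; the review questions at the end of this section show the same thing analytically.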

The WKB approximation works fine in regions where the total energy E is
less than the potential energy V. The classical momentum pc = √(2m(E − V))
is imaginary in such regions, reflecting the fact that classically the particle does
not have enough energy to enter them. But, as the nonzero wave function
beyond the turning points in figure 5.12 shows, quantum mechanics does allow
some possibility for the particle to be found in regions where E is less than V .
It is loosely said that the particle can “tunnel” through, after a popular way for
criminals to escape from jail. To use the WKB approximation in these regions,
just rewrite it in terms of the magnitude |pc| = √(2m(V − E)) of the classical
momentum:

tunneling WKB:  ψ ≈ (1/√|pc|) [Cp e^γ + Cn e^{−γ}]    γ ≡ (1/h̄) ∫ |pc| dx    (5.62)

Note that γ is the equivalent of the angle θ in the classical approximation.

Key Points
⋄ The WKB approximation applies to situations of almost macroscopic
energy.
⋄ The WKB solution is described in terms of the classical momentum
pc ≡ √(2m(E − V)) and in particular its antiderivative θ = ∫ pc dx/h̄.
⋄ The wave function can be written as (5.57) or (5.58), whatever is more
convenient.
⋄ For a particle stuck between impenetrable walls, the energy eigenvalues
can be found from (5.60).
⋄ For a particle stuck between a gradually increasing potential at both
sides, the energy eigenvalues can be found from (5.61).
⋄ The “tunneling” wave function in regions that classically the particle
is forbidden to enter can be approximated as (5.62). It is in terms of
the antiderivative γ = ∫ |pc| dx/h̄.

5.6 Review Questions


1 Use the equation

(1/h̄) ∫_{x1}^{x2} pc(x) dx = nπ

to find the WKB approximation for the energy levels of a particle
stuck in the pipe of chapter 2.5.5. The potential V is zero inside the
pipe, given by 0 ≤ x ≤ ℓx.

In this case, the WKB approximation produces the exact result, since
the classical momentum really is constant. If there was a force field
in the pipe, the solution would only be approximate.
2 Use the equation

(1/h̄) ∫_{x1}^{x2} pc(x) dx = (n − ½)π

to find the WKB approximation for the energy levels of the harmonic
oscillator. The potential energy is ½mω²x² where the constant ω is the
classical natural frequency. So the total energy, expressed in terms of
the turning points x2 = −x1 at which E = V, is E = ½mω²x2².
In this case too, the WKB approximation produces the exact en-
ergy eigenvalues. That, however, is just a coincidence; the classical
WKB wave functions are certainly not exact; they become infinite at
the turning points. As the example h50 above shows, the true wave
functions most definitely do not.

5.7 Scattering
The motion of the wave packets in section 5.5 approximated that of classical
Newtonian particles. However, if the potential starts varying nontrivially over
distances short enough to be comparable to a quantum wave length, much more
interesting behavior results, for which there is no classical equivalent. This
section gives a couple of important examples.

5.7.1 Partial reflection


A classical particle entering a region of changing potential will keep going as
long as its total energy exceeds the potential energy. Consider the potential
shown in green in figure 5.13; it drops off to a lower level and then stays there.
A classical particle would accelerate to a higher speed in the region of drop off
and maintain that higher speed from there on.
However, the potential in this example varies so rapidly on quantum scales
that the classical Newtonian picture is completely wrong. What actually hap-
pens is that the wave packet splits into two, as shown in the bottom figure. One
part returns to where the packet came from, the other keeps on going.
One hypothetical example used in chapter 2.1 was that of sending a single
particle both to Venus and to Mars. As this example shows, a scattering setup
gives a very real way of sending a single particle in two different directions at
the same time.

The html version of this document has an animation of the motion.

Figure 5.13: A partial reflection.

Partial reflections are the norm for potentials that vary nontrivially on quan-
tum scales, but this example adds a second twist. Classically, a decelerating force
is needed to turn a particle back, but here the force is everywhere accelerating
only! As an actual physical example of this weird behavior, neutrons trying to
enter nuclei experience attractive forces that come on so quickly that they may
be repelled by them.

5.7.2 Tunneling
A classical particle will never be able to progress past a point at which the
potential energy exceeds its total energy. It will be turned back. However, the
quantum mechanical truth is, if the region in which the potential energy exceeds
the particle’s energy is narrow enough on a quantum scale, the particle can go
right through it. This effect is called “tunneling.”
As an example, figure 5.14 shows part of the wave packet of a particle passing
right through a region where the peak potential exceeds the particle’s expecta-
tion energy by a factor three.

The html version of this document has an animation of the motion.

Figure 5.14: A tunneling particle.

Of course, the energy values have some uncertainty, but it is small. The
reason the particle can pass through is not because it has a chance of having
three times its nominal energy. It absolutely does not; the simulation set the

probability of having more than twice the nominal energy to zero exactly. The
particle has a chance of passing through because its motion is governed by the
Schrödinger equation, instead of the equations of classical physics.
And if that is not convincing enough, consider the case of a delta function
barrier in figure 5.15; the limit of an infinitely high, infinitely narrow barrier.
Being infinitely high, classically nothing can get past it. But since it is also
infinitely narrow, a quantum particle will hardly notice a weak-enough delta
function barrier. In figure 5.15, the strength of the delta function was chosen
just big enough to split the wave function into equal reflected and transmitted
parts. If you look for the particle afterwards, you have a 50/50 chance of finding
it at either side of this “impenetrable” barrier.

The html version of this document has an animation of the motion.

Figure 5.15: Penetration of an infinitely high potential energy barrier.

Curiously enough, a delta function well (with the potential going down
instead of up) reflects the same amount as the barrier version.
Tunneling has consequences for the mathematics of bound energy states.
Classically, you can confine a particle by sticking it in between, say two delta
function potentials, or between two other potentials that have a maximum po-
tential energy V that exceeds the particle’s energy E. But such a particle trap
does not work in quantum mechanics, because given time, the particle would
tunnel through a local potential barrier. In quantum mechanics, a particle is
bound only if its energy is less than the potential energy at infinite distance.
Local potential barriers only work if they have infinite potential energy, and
that over a larger range than a delta function.
Note however that in many cases, the probability of a particle tunneling out
is so infinitesimally small that it can be ignored. For example, since the electron
in a hydrogen atom has a binding energy of 13.6 eV, a 110 or 220 V ordinary
household voltage should in principle be enough for the electron to tunnel out
of a hydrogen atom. But don’t wait for it; it is likely to take much more than
the total life time of the universe. You would have to achieve such a voltage
drop within an atom-scale distance to get some action.
One major practical application of tunneling is the scanning tunneling mi-
croscope. Tunneling can also explain alpha decay of nuclei, and it is a critical

part of much advanced electronics, including current leakage problems in VLSI


devices.

5.8 Reflection and Transmission Coefficients


Scattering and tunneling can be described in terms of so-called “reflection and
transmission coefficients.” This section explains the underlying ideas.

Far left: Cfl e^{ipl x/h̄} + Cbl e^{−ipl x/h̄}    Far right: Cfr e^{ipr x/h̄}

Figure 5.16: Schematic of a scattering potential and the asymptotic behavior of


an example energy eigenfunction for a wave packet coming in from the far left.

Consider an arbitrary scattering potential like the one in figure 5.16. To the
far left and right, it is assumed that the potential takes a constant value. In
such regions the energy eigenfunctions take the form

Cf e^{ipx/h̄} + Cb e^{−ipx/h̄}

where p = √(2m(E − V)) is the classical momentum and Cf and Cb are constants.
When eigenfunctions of slightly different energies are combined together, the
terms Cf e^{ipx/h̄} produce wave packets that move forwards in x, graphically from
left to right, and the terms Cb e^{−ipx/h̄} produce packets that move backwards. So
the subscripts indicate the direction of motion.
This section will discuss what happens to a wave packet that comes in from
the far left and is scattered by the nontrivial potential in the center region. To
describe this, the coefficient Cb must be zero in the far right region, as indicated
in figure 5.16, because otherwise it would produce a second wave packet coming
in from the far right.
The term Cbl e^{−ipx/h̄} produces the part of the incoming wave packet that is
reflected back towards the far left. The relative amount of the wave packet that
is reflected back is called the “reflection coefficient” R. It gives the probability
that the particle can be found to the left of the scattering region after the inter-
action with the scattering potential. It can be computed from the coefficients
of the energy eigenfunction in the left region as, {A.47},

R = |Cbl|²/|Cfl|²    (5.63)

Similarly, the relative fraction of the wave packet that passes through the
scattering region is called the “transmission coefficient” T . It gives the proba-
bility that the particle can be found at the other side of the scattering region
afterwards. It is most simply computed as T = 1 − R: whatever is not reflected
must pass through. Alternatively, it can be computed as

T = (pr |Cfr|²)/(pl |Cfl|²)    pl = √(2m(E − Vl))    pr = √(2m(E − Vr))    (5.64)

where pl and pr are the values of the classical momentum in the far left
and right regions.
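As a concrete illustration, the sketch below evaluates (5.63) and (5.64) for the simplest scatterer there is, a single abrupt step from Vl up to Vr. For such a step the coefficient ratios follow from demanding that ψ and dψ/dx match at the jump; that standard matching result is assumed here rather than derived:

    import numpy as np

    # Sketch: R and T from (5.63) and (5.64) for an abrupt potential step,
    # V = Vl for x < 0 and V = Vr for x > 0, with E above both levels.
    # Units hbar = m = 1 assumed for illustration.
    m = 1.0
    E, Vl, Vr = 1.0, 0.0, 0.75

    pl = np.sqrt(2.0*m*(E - Vl))          # classical momentum far left
    pr = np.sqrt(2.0*m*(E - Vr))          # classical momentum far right

    Cbl_over_Cfl = (pl - pr)/(pl + pr)    # from matching psi, dpsi/dx at x = 0
    Cfr_over_Cfl = 2.0*pl/(pl + pr)

    R = abs(Cbl_over_Cfl)**2              # reflection coefficient, (5.63)
    T = (pr/pl)*abs(Cfr_over_Cfl)**2      # transmission coefficient, (5.64)
    print(R, T, R + T)                    # R + T = 1, as it must be

Even though the particle has plenty of energy, E = 1 versus Vr = 0.75, the step reflects an appreciable fraction of the wave packet, much like the rapidly varying potentials of section 5.7.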
Note that a coherent wave packet requires a small amount of uncertainty in
energy. Using the eigenfunction at the nominal value of energy in the above
expressions for the reflection and transmission coefficients will involve a small
error. It can be made to go to zero by reducing the uncertainty in energy, but
then the size of the wave packet will expand correspondingly.
In the case of tunneling through a high and wide barrier, the WKB ap-
proximation may be used to derive a simplified expression for the transmission
coefficient, {A.45}. It is

T ≈ e^{−2γ12}    γ12 = (1/h̄) ∫_{x1}^{x2} |pc| dx    |pc| = √(2m(V − E))    (5.65)

where x1 and x2 are the “turning points” in figure 5.16, in between which the
potential energy exceeds the total energy of the particle.
For similar considerations in three-dimensional scattering, see note {A.46}.

5.9 Alpha Decay of Nuclei


So far, the focus in this book has been mostly on electrons. That is normal
because electrons are important like nothing else for the physical properties of
matter. Nuclei appear in the story only as massive anchors for the electrons,
holding onto them with their positive charge Ze, in which the “atomic number”
Z is the number of protons in the nucleus.
But then there is nuclear energy. Here the nuclei call the shots. The number
of neutrons in the nucleus now also becomes a primary factor. The total number
of “nucleons,” protons and neutrons, is called the “mass number” A. It and the
atomic number Z characterize nuclei. For example, the normal helium nucleus
contains two protons and two neutrons, making Z = 2 and A = 4. Such a
nucleus is commonly written as ⁴₂He. The left subscript indicates the atomic
number and is redundant, since the atom name already implies the number of
protons. Therefore it is often left out. A nucleus with the same atomic number

Z but a different mass number A, (so a different number of neutrons), is called


an “isotope.” For example, in the atmosphere one in a million helium atoms
has a ³₂He nucleus, with just a single neutron, instead of a ⁴₂He one. It behaves
chemically virtually the same as the normal helium atom since the nuclear
charge is the same. (However, helium-3 is a fermion while normal helium-4 is
a boson: each neutron and proton has spin ½ just like the electrons. It makes
their cryogenic behavior quite different.)
Many isotopes are unstable and fall apart spontaneously, liberating energy.
While ³₂He is stable like ⁴₂He, a ⁶₂He nucleus will emit an electron in about a
second; that produces ⁶₃Li lithium because the electron carries off one unit of
negative charge, turning a neutron into a proton. For historical reasons, a decay
process of this type is called “beta decay” instead of “electron emission;” initially
it was not recognized that the emitted radiation was simply electrons. More
precisely, the process is called beta-minus decay: some nuclei emit positively
charged positrons instead of electrons, and that is called beta-plus decay. (A
neutrino or antineutrino is also emitted, but it is almost impossible to detect:
solar neutrinos will readily travel all the way through the earth with only a
miniscule chance of being captured.)
Nuclear decay is governed by chance. It is impossible to tell exactly when
any specific nucleus will decay. However, if a large number J of unstable nuclei
are examined, then the change dJ in the number of nuclei in an infinitesimally
small time interval dt is found to be

dJ = −J dt/τ    (5.66)

where τ is a constant that is called the “lifetime” of the nucleus. No, it is not
really a lifetime, except in some average sense, but its inverse 1/τ is the relative
fraction that disappears per unit time. In any case, usually the “half-life” τ1/2
is reported for nuclei. That is the time after which only about half of the nuclei
are left. It is shorter than the lifetime by a factor ln 2:

τ1/2 = τ ln 2 (5.67)

Note that ln 2 is less than one. The ⁶₂He nucleus mentioned before has a
half-life of 0.8 seconds. After 0.8 s, only half of them are left, after 1.6 s, only one
quarter, after 2.4 s only one eighth, etcetera. Don’t wait a couple of minutes to
see whether you have any left.
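In code, the exponential decay law that follows from (5.66), J = J0 e^{−t/τ}, with the ⁶₂He numbers:

    import numpy as np

    # The decay law following from (5.66) is J = J0 exp(-t/tau).  A quick
    # check with the helium-6 half-life of 0.8 s quoted above.
    t_half = 0.8                     # seconds
    tau = t_half/np.log(2)           # lifetime, from inverting (5.67)
    J0 = 1.0e6                       # some assumed initial number of nuclei
    for t in (0.0, 0.8, 1.6, 2.4):
        print(t, J0*np.exp(-t/tau))  # 1e6, then half, quarter, eighth, ...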
There are a variety of other ways in which nuclei may decay; for example,
protons or neutrons may be emitted or the nucleus may spontaneously fall
apart, “fission,” into two or more smaller nuclei and other particles. Or instead
emitting a positron, a nucleus might achieve beta-plus decay by capturing an

electron. This “electron capture” process is sometimes referred to as “inverse
beta decay.” It leads to collapsed “neutron stars” in astrophysics.
Another process, “gamma decay,” is much like spontaneous decay of excited
electron levels; in gamma decay an excited nucleus transitions to a lower energy
state and emits the released energy as electromagnetic radiation. The nucleus
remains of the same type. Because nuclear energies are very large, the emitted
photons of the radiation are highly energetic, much more so than typical x-rays,
and are said to be in the gamma-ray range. However, the distinction between
the gamma-ray and x-ray ranges is not sharp; the use of the term gamma-
ray or x-ray tends to be based more on the source of the radiation (nuclei or
high-energy electrons) than its energy.
So, if gamma decay is electromagnetic radiation and beta decay is electrons,
then what is the third of the famous trio, “alpha decay?” In alpha decay a
nucleus emits an “alpha particle,” later identified to be simply a ⁴₂He nucleus.
Since the escaping alpha particle consists of two protons plus two neutrons, the
atomic number Z of the nucleus decreases by two and the mass number A by
four. Like in other nuclear processes, the energy released is in the typical range
of MeV. That is a million times higher than chemical binding energies, which
are in the range of eV. It is the devastating difference between a nuclear bomb
and a stick of dynamite. Or between the almost limitless power that can be
obtained from peaceful nuclear reactors and the limited supply of fossil fuels.
Consider figure 5.17, where the existing data for the nuclei that decay exclu-
sively through alpha decay is plotted. Nuclei are much like cherries: they have
a variable size that depends mainly on their mass number A, and a charge Z
that can be shown as different shades of red. You can even define a “stem” for
them, as explained later. Nuclei with the same atomic number Z are joined by
branches.
Not shown in figure 5.17 is the unstable beryllium isotope ⁸₄Be, which has a
half-life of only 67 as (i.e. 67 10⁻¹⁸ s), and a decay energy of only 0.092 MeV.
As you can see from the graph, these numbers are wildly different from the
other, much heavier, alpha-decay nuclei, and inclusion would make the graph
very messy. However, do not think that ⁸₄Be, essentially just two alpha particles
pushed together, is not important. Without it there would be no life on earth.
Because of the absence of stable intermediaries, the Big Bang produced no
elements heavier than beryllium (and only trace amounts of that), including no
carbon. As Hoyle pointed out, the carbon of life is formed in the interior of
aging stars when ⁸₄Be captures a third alpha particle, to produce ¹²₆C, which is
stable. This is called the “triple alpha process.” Under the extreme conditions
in the interior of collapsing stars, given time this process produces significant
amounts of carbon despite the extremely short half-life of ⁸₄Be. But it is far too
slow to have occurred in the Big Bang.
Note the tremendous range of half-lives in figure 5.17, from mere nanoseconds


Figure 5.17: Half-life versus energy release for the atomic nuclei marked in
NUBASE 2003 as showing pure alpha decay with unqualified energies. Top:
only the even values of the mass and atomic numbers cherry-picked. Inset:
really cherry-picking, only a few even mass numbers for thorium and uranium!
Bottom: all the nuclei except one.

to quintillions of years. And that excludes beryllium’s attoseconds. In the early


history of alpha decay, it seemed very hard to explain how nuclei that do not
seem that different with respect to their values of Z and A could have such
dramatically different half-lives. The values of the energy that is released in the
decay process do not vary that much, as figure 5.17 also shows.
To add to the mystery in those early days of quantum mechanics, if an alpha
particle was shot back at the nucleus with the same energy that it came out,
it would not go back in! It was reflected by the electrostatic repulsion of the
positively charged nucleus. So, it did not have enough energy to pass through the
region of high energy surrounding the nucleus, yet it did pass through it when
it came out.
Gamow, and independently Gurney & Condon, recognized that the expla-
nation was quantum tunneling. It explained how the alpha particle could get
out without having enough energy to do so. Also, because the approximate ex-
pression T = e−2γ12 for the tunneling probability derived in the previous section
involves an exponential, it can explain the tremendous range: exponentials can
vary greatly in magnitude for relatively modest changes in their argument.


Figure 5.18: Schematic potential for an alpha particle that tunnels out of a
nucleus.

The next step was obviously to put in some ballpark numbers and check
whether the experimental data could be explained. Consider the schematic
for the potential energy of the alpha particle in figure 5.18. Near but not in
the nucleus the potential energy of the alpha particle is very large because of
the Coulomb repulsion by the positively charged nucleus. However, inside the
nucleus the nuclear binding forces are so strong that the potential is low despite
the very large Coulomb repulsion. Nuclear forces are strong but very short
range and can be ignored outside the nucleus.
Now imagine an alpha particle wave packet “rattling around” in the nucleus
trying to escape. If it has a typical velocity vα and the nucleus has a radius r1 ,

it will hit the perimeter of the nucleus about vα /2r1 times per second. That is
sure to be a very large number of times per second, the nucleus being so small,
but each time it hits the perimeter, it only has a miniscule e−2γ12 chance of
escaping. So it may well take trillions of years before it is successful anyway.
Even so, among a very large number of nuclei a few will get out every time.
Remember that a mol of atoms represents on the order of 10²³ nuclei; among
that many nuclei, a few alpha particles are going to succeed whatever the odds
against. The relative fraction of successful escape attempts per unit time is by
definition the inverse of the lifetime τ:

(vα/2r1) e^{−2γ12} = 1/τ    (5.68)
The idea is now to put in some rough numbers and then compare the computed
half-life with the experimental value.
Starting with the velocity vα of the wave packet, it should be possible to
ballpark it from its kinetic energy E − Vn in the nucleus as √(2(E − Vn)/mα),
ignoring relativistic corrections. Unfortunately, even to this day, the nuclear
forces are poorly understood quantitatively, and it is not clear what to make of
Vn . But have another look at figure 5.17. Forget about engineering ideas about
acceptable accuracy. A 50% error in half-life would be invisible on the
tremendous range of figure 5.17. Being wrong by a factor 10, or even a factor
100, two orders of magnitude, is ho-hum on the scale that the half-life varies.
So, for lack of a better idea, just put the potential
energy Vn inside the nucleus equal to zero; then compute the number of times
the alpha particle hits the perimeter from that. Next, the nuclear radius r1
can be ballparked reasonably well from the number of nucleons; the nucleus
can be modelled as a “liquid drop” whose volume is proportional to the number of
nucleons A. A frequently given estimate is that the radius of the nucleus is

r1 = 1.2 A^{1/3} fm    (5.69)

where f, femto, is 10⁻¹⁵. That is a lot smaller than the typical Bohr radius
over which electrons are spread out. Electrons are “far away” and not really
relevant.
That leaves the value of γ12 to be modelled. Now, assuming that the alpha-
particle wave packet is small enough that three-dimensional effects can be ig-
nored, WKB analysis has it that
γ12 = (1/h̄) ∫_{r1}^{r2} √(2mα(V − E)) dr
where the integration is in between the turning points in figure 5.18. Because
the nuclear forces are so short-range, they should be negligible over most of

the integration range, so it seems reasonable to simply substitute the Coulomb


potential everywhere for V in doing the integral. This potential is inversely
proportional to the radial position r, and it equals E at r2 , so V must equal
V = E r2/r. Substituting this in, and doing the integral by making the change
of integration variable r = r2 sin²u, produces
γ12 = (√(2mα E)/h̄) r2 [π/2 − √((r1/r2)(1 − r1/r2)) − arcsin √(r1/r2)]

The last two terms within the square brackets are typically relatively small
compared to the first one, because r1 is usually fairly small compared to r2.
Then γ12 is about proportional to √E r2. But r2 itself is inversely proportional
to E, because the total energy of the alpha particle equals its potential energy
at r2 ;
E = (Z − 2)e · 2e/(4πǫ0 r2).

That makes γ12 about proportional to 1/√E for a given atomic number Z. So
if you plot the half-life on a logarithmic scale, and the energy E on an inverse
square root scale, as done in figure 5.17, they should vary linearly with each
other for a given atomic number. The predicted slope of linear variation is
indicated by the “stems” on the cherries in figure 5.17. Ideally, all cherries
connected by branches should fall on a single line with this slope. However,
the other two terms within the square brackets in the expression for γ12 are not
really that small, and the factor vα /2r1 in the lifetime varies too. Then there
are modelling errors such as the unknown nuclear potential, the quasi-classical
description of the wave packet, and the ignored potential of the surrounding
electrons. Considering all that, the agreement does not seem that bad.
The bottom line question is whether the theory, rough as it may be, can
produce meaningful values for the experimental half-lives, within reason. To
compute actual half-lives, the energy E of the alpha particle may be found from
Einstein’s famous expression E = mc², {A.4}. Just find the difference between
the rest mass of the original nucleus and the sum of that of the final nucleus and
the alpha particle, and multiply by the square of the speed of light. The other quantity
that is needed is the effective radius of the nucleus r1 . The results below followed
web sources that estimate r1 as the sum of the radius of the nucleus having lost
the alpha particle plus the radius of the alpha particle sitting right next to it.
Both radii were estimated from their mass number A using (5.69). It should be
pointed out that the results are quite sensitive to the value of r1 , and there is
no really good way to give it a more precise value.
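Here is a sketch of the complete ballpark for one nucleus, ²³⁸₉₂U, which alpha decays with E of about 4.27 MeV; the nuclear data and constants, in MeV and fm units, are assumed inputs:

    import numpy as np

    # Sketch of the Gamow / Gurney & Condon estimate for uranium-238,
    # which alpha decays to thorium-234.  All modelling choices follow the
    # text and are rough by construction.
    hbar_c = 197.327       # MeV fm
    m_alpha = 3727.38      # alpha particle rest mass energy, MeV
    coulomb = 1.44         # e^2/4 pi eps0, MeV fm
    c = 2.998e23           # speed of light, fm/s

    Z, A, E = 92, 238, 4.27

    # effective radius (5.69): daughter nucleus plus alpha side by side
    r1 = 1.2*((A - 4)**(1/3) + 4**(1/3))
    # outer turning point, where the Coulomb potential has dropped to E
    r2 = 2*(Z - 2)*coulomb/E

    x = r1/r2
    gamma12 = (np.sqrt(2*m_alpha*E)/hbar_c)*r2*(
        np.pi/2 - np.sqrt(x*(1 - x)) - np.arcsin(np.sqrt(x)))

    v_alpha = c*np.sqrt(2*E/m_alpha)           # ballpark speed, fm/s
    rate = v_alpha/(2*r1)*np.exp(-2*gamma12)   # escapes per second, (5.68)
    print(np.log(2)/rate/3.156e7, "years")     # half-life, from (5.67)

The printed estimate lands within an order of magnitude of the true 4.5 10⁹ year half-life; as figure 5.19 below shows, being within a couple of orders of magnitude is more typical.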
Figure 5.19 shows predicted half-lives versus the actual ones. Cherries on
the black line indicate that the correct value is predicted. It is clear that there
is no real accuracy to the predictions in any normal sense; they are easily off by


Figure 5.19: Half-life predicted by the Gamow / Gurney & Condon theory
versus the true value.

several orders of magnitude. What can you expect without an accurate model
of the nucleus itself? However, the predictions do successfully reproduce the
tremendous range of half-lives and they do not deviate from the correct value
that much compared to that tremendous range. It is hard to imagine any other
theory besides tunneling that could do the same.
For ⁸₄Be, the predicted half-life is 167 fs (167 10⁻¹⁵ s). While that is indeed an
extremely short half-life, it is more than 2 000 times longer than the true value
of 67 as, the worst performance of the theory among all nuclei. (But not much
worse than that for the ²⁰⁹₈₃Bi bismuth isotope whose half-life of 19 Ey, 19 10¹⁸
years, is underestimated by a similar factor. Then again, since the universe has
only existed about 14 10⁹ years, who is going to live long enough to complain
about it?)
Part II

Advanced Topics

Chapter 6

Numerical Procedures

Since analytical solutions in quantum mechanics are extremely limited, numeri-


cal solution is essential. This chapter outlines some of the most important ideas.
The most glaring omission at this time is DFT (Density Functional Theory).
A writer needs a sabbatical.

6.1 The Variational Method


Solving the equations of quantum mechanics is typically difficult, so approxi-
mations must usually be made. One very effective way to find an approximate
ground state is the variational principle. This section gives some of the basic
ideas, including ways to apply it best, and how to find eigenstates of higher
energy in similar ways.

6.1.1 Basic variational statement


The variational method is based on the observation that the ground state is the
state among all allowable wave functions that has the lowest expectation value
of energy:

⟨E⟩ is minimal for the ground state wave function.    (6.1)

The variational method has already been used to find the ground states for
the hydrogen molecular ion, chapter 3.5, and the hydrogen molecule, chapter
4.2. The general procedure is to guess an approximate form of the wave function,
invariably involving some parameters whose best values you are unsure about.
Then search for the parameters that give you the lowest expectation value of
the total energy; those parameters will give your best possible approximation
to the true ground state {A.22}. In particular, you can be confident that the
true ground state energy is no higher than what you compute, {A.24}.
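As a minimal analytical example, take the hydrogen atom in atomic units, where the exact ground state energy is −½, and guess a Gaussian e^{−αr²} instead of the true exponential. The expectation energy then works out to ⟨E⟩ = (3/2)α − 2√(2α/π), a standard integral that is assumed here rather than rederived, and the search over the single parameter α is a few lines:

    import numpy as np
    from scipy.optimize import minimize_scalar

    # Sketch: variational ground state of the hydrogen atom in atomic units
    # (exact energy -0.5), with the deliberately imperfect one-parameter
    # trial function exp(-alpha r^2).  Its expectation energy is the
    # standard integral <E> = 1.5 alpha - 2 sqrt(2 alpha/pi), assumed here.
    def expectation_E(alpha):
        return 1.5*alpha - 2.0*np.sqrt(2.0*alpha/np.pi)

    best = minimize_scalar(expectation_E, bounds=(1e-3, 10.0), method="bounded")
    print(best.x)      # 8/(9 pi) = 0.2829..., the best possible alpha
    print(best.fun)    # -4/(3 pi) = -0.4244..., above the exact -0.5

The best Gaussian stays well above the exact −½, never below it, just as advertised.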


To get the second lowest energy state, you could search for the lowest energy
among all wave functions orthogonal to the ground state. But since you would
not know the exact ground state, you would need to use your approximate one
instead. That would involve some error, and it is no longer sure that the true
second-lowest energy level is no higher than what you compute, but anyway.
If you want to get more accurate values, you will need to increase the num-
ber of parameters. The molecular example solutions were based on the atom
ground states, and you could consider adding some excited states to the mix. In
general, a procedure using appropriate guessed functions is called a Rayleigh-
Ritz method. Alternatively, you could just chop space up into little pieces, or
elements, and use a simple polynomial within each piece. That is called a finite
element method. In either case, you end up with a finite, but relatively large
number of unknowns; the parameters and/or coefficients of the functions, or the
coefficients of the polynomials.
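A sketch of the chop-space-into-pieces idea in its crudest form, a finite difference grid, applied to the harmonic oscillator of chapter 2.6 in assumed units h̄ = m = ω = 1. Every grid value of ψ becomes one unknown, and the Hamiltonian eigenvalue problem becomes a matrix eigenvalue problem:

    import numpy as np

    # Sketch: chop space into N little pieces and solve the harmonic
    # oscillator, V = x^2/2, as a matrix eigenvalue problem.  A crude
    # finite difference grid stands in for a true finite element method;
    # units hbar = m = omega = 1, so the exact levels are n + 1/2.
    N, L = 1000, 20.0
    x = np.linspace(-L/2, L/2, N)
    h = x[1] - x[0]

    # -(1/2) psi'' by central differences, plus the potential on the diagonal
    H = (np.diag(1.0/h**2 + 0.5*x**2)
         + np.diag(-0.5/h**2*np.ones(N - 1), 1)
         + np.diag(-0.5/h**2*np.ones(N - 1), -1))

    print(np.linalg.eigvalsh(H)[:4])    # close to 0.5, 1.5, 2.5, 3.5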

6.1.2 Differential form of the statement


You might by now wonder about the wisdom of trying to find the minimum
energy by searching through the countless possible combinations of a lot of pa-
rameters. Brute-force search worked fine for the hydrogen molecule examples
since they really only depended nontrivially on the distance between the nuclei.
But if you add some more parameters for better accuracy, you quickly get into
trouble. Semi-analytical approaches like Hartree-Fock even leave whole func-
tions unspecified. In that case, simply put, every single function value is an
unknown parameter, and a function has infinitely many of them. You would
be searching in an infinite-dimensional space, and could search forever. Maybe
you could try some clever genetic algorithm.
Usually it is a much better idea to write some equations for the minimum
energy first. From calculus, you know that if you want to find the minimum of
a function, the sophisticated way to do it is to note that the partial derivatives
of the function must be zero at the minimum. Less rigorously, but a lot more
intuitive, at the minimum of a function the changes in the function due to small
changes in the variables that it depends on must be zero. In the simplest possible
example of a function f (x) of one variable x, a rigorous mathematician would
say that at a minimum, the derivative f ′ (x) must be zero. Instead a typical
physicist would say that the change δf , (or df ,) in f due to a small change δx
in x must be zero. It is the same thing, since δf = f ′ δx, so that if f ′ is zero,
then so is δf . But mathematicians do not like the word small, since it has no
rigorous meaning. On the other hand, in physics you may not like to talk about
derivatives, for if you say derivative, you must say with respect to what variable;
you must say what x is as well as what f is, and there is often more than one
possible choice for x, with none preferred under all circumstances. (And in

practice, the word “small” does have an unambiguous meaning: it means that
you must ignore everything that is of square magnitude or more in terms of the
“small” quantities.)
In physics terms, the fact that the expectation energy must be minimal in
the ground state means that you must have:

δ⟨E⟩ = 0 for all acceptable small changes in wave function    (6.2)

The changes must be acceptable; you cannot allow that the changed wave func-
tion is no longer normalized. Also, if there are boundary conditions, the changed
wave function should still satisfy them. (There may be exceptions permitted
to the latter under some conditions, but these will be ignored here.) So, in
general you have “constrained minimization;” you cannot make your changes
completely arbitrary.

6.1.3 Example application using Lagrangian multipliers


As an example of how you can apply the variational formulation of the previous
subsection analytically, and how it can also describe eigenstates of higher energy,
this subsection will work out a very basic example. The idea is to figure out
what you get if you truly zero the changes in the expectation value of energy
⟨E⟩ = ⟨ψ|H|ψ⟩ over all acceptable wave functions ψ. (Instead of just over all
possible versions of a numerical approximation, say.) It will illustrate how you
can deal with the constraints.
The differential statement is:

δ⟨ψ|H|ψ⟩ = 0 for all acceptable changes δψ in ψ

But “acceptable” is not a mathematical concept. What does it mean? Well, if it


is assumed that there are no boundary conditions, (like the harmonic oscillator,
but unlike the particle in a pipe,) then acceptable just means that the wave
function must remain normalized under the change. So the change in ⟨ψ|ψ⟩
must be zero, and you can write more specifically:

δ⟨ψ|H|ψ⟩ = 0 whenever δ⟨ψ|ψ⟩ = 0.

But how do you crunch a statement like that down mathematically? Well,
there is a very important mathematical trick to simplify this. Instead of rigor-
ously trying to enforce that the changed wave function is still normalized, just
allow any change in wave function. But add “penalty points” to the change in
expectation energy if the change in wave function goes out of allowed bounds:

δ⟨ψ|H|ψ⟩ − ǫ δ⟨ψ|ψ⟩ = 0

Here ǫ is the penalty factor; such factors are called “Lagrangian multipliers”
after a famous mathematician who probably watched a lot of soccer. For a
change in wave function that does not go out of bounds, the second term is
zero, so nothing changes. And if the penalty factor is carefully tuned, the
second term can cancel any erroneous gain or decrease in expectation energy
due to going out of bounds, {A.48}.
You do not, however, have to explicitly tune the penalty factor yourself. All
you need to know is that a proper one exists. In actual application, all you
do in addition to ensuring that the penalized change in expectation energy is
zero is ensure that at least the unchanged wave function is normalized. It is
really a matter of counting equations versus unknowns. Compared to simply
setting the change in expectation energy to zero with no constraints on the wave
function, one additional unknown has been added, the penalty factor. And quite
generally, if you add one more unknown to a system of equations, you need one
more equation to still have a unique solution. As the one-more equation, use
the normalization condition. With enough equations to solve, you will get the
correct solution, which means that the implied value of the penalty factor should
be OK too.
So what does this variational statement now produce? Writing out the
differences explicitly, you must have
(⟨ψ + δψ|H|ψ + δψ⟩ − ⟨ψ|H|ψ⟩) − ǫ (⟨ψ + δψ|ψ + δψ⟩ − ⟨ψ|ψ⟩) = 0

Multiplying out, canceling equal terms and ignoring terms that are quadratically
small in δψ, you get
⟨δψ|H|ψ⟩ + ⟨ψ|H|δψ⟩ − ǫ (⟨δψ|ψ⟩ + ⟨ψ|δψ⟩) = 0

That is not yet good enough to say something specific about. But remember
that you can exchange the sides of an inner product if you add a complex
conjugate, so
⟨δψ|H|ψ⟩ + ⟨δψ|H|ψ⟩* − ǫ (⟨δψ|ψ⟩ + ⟨δψ|ψ⟩*) = 0

Also remember that you can allow any change δψ you want, including the δψ
you are now looking at times i. That means that you also have:
⟨iδψ|H|ψ⟩ + ⟨iδψ|H|ψ⟩* − ǫ (⟨iδψ|ψ⟩ + ⟨iδψ|ψ⟩*) = 0

or using the fact that numbers come out of the left side of an inner product as
complex conjugates
−i⟨δψ|H|ψ⟩ + i⟨δψ|H|ψ⟩* − ǫ (−i⟨δψ|ψ⟩ + i⟨δψ|ψ⟩*) = 0

If you divide out a −i and then average with the original equation, you get rid
of the complex conjugates:

⟨δψ|H|ψ⟩ − ǫ⟨δψ|ψ⟩ = 0

You can now combine them into one inner product with δψ on the left:

⟨δψ|Hψ − ǫψ⟩ = 0

If this is to be zero for any change δψ, then the right hand side of the inner
product must unavoidably be zero. For example, just take δψ equal to a small
number ε times the right hand side, you will get ε times the square norm of
the right hand side, and that can only be zero if the right hand side is. So
Hψ − ǫψ = 0, or
Hψ = ǫψ.

So you see that you have recovered the Hamiltonian eigenvalue problem from
the requirement that the variation of the expectation energy is zero. Unavoid-
ably then, ǫ will have to be an energy eigenvalue E. It often happens that
Lagrangian multipliers have a physical meaning beyond being merely penalty
factors. But note that there is no requirement for this to be the ground state.
Any energy eigenstate would satisfy the equation; the variational principle works
for them all.
Indeed, you may remember from calculus that the derivatives of a function
may be zero at more than one point. For example, a function might also have a
maximum, or local minima and maxima, or stationary points where the function
is neither a maximum nor a minimum, but the derivatives are zero anyway. This
sort of thing happens here too: the ground state is the state of lowest possible
energy, but there will be other states for which δhEi is zero, and these will
correspond to energy eigenstates of higher energy, {A.49}.

6.2 The Born-Oppenheimer Approximation


Exact solutions in quantum mechanics are hard to come by. In almost all cases,
approximation is needed. The Born-Oppenheimer approximation in particular
is a key part of real-life quantum analysis of atoms and molecules and the like.
The basic idea is that the uncertainty in the nuclear positions is too small to
worry about when you are trying to find the wave function for the electrons.
That was already assumed in the earlier approximate solutions for the hydrogen
molecule and molecular ion. This section discusses the approximation, and how
it can be used, in more depth.

6.2.1 The Hamiltonian


The general problem to be discussed in this section is that of a number of
electrons around a number of nuclei. You first need to know what is the true
problem to be solved, and for that you need the Hamiltonian.
This discussion will be restricted to the strictly nonrelativistic case. Cor-
rections for relativistic effects on energy, including those involving spin, can in
principle be added later, though that is well beyond the scope of this book. The
physical problem to be addressed is that there are a finite number I of electrons
around a finite number J of nuclei in otherwise empty space. That describes
basic systems of atoms and molecules, but modifications would have to be made
for ambient electric and magnetic fields and electromagnetic waves, or for the
infinite systems of electrons and nuclei used to describe solids.
The electrons will be numbered using an index i, and whenever there is a
second electron involved, its index will be called ī. Similarly, the nuclei will be
numbered with an index j, or j̄ where needed. The nuclear charge of nucleus
number j, i.e. the number of protons in that nucleus, will be indicated by Zj,
and the mass of the nucleus by mjn . Roughly speaking, the mass mjn will be the
sum of the masses of the protons and neutrons in the nucleus; however, internal
nuclear energies are big enough that there are noticeable relativistic deviations
in total nuclear rest mass from what you would think. All the electrons have
the same mass me since relativistic mass changes due to motion are ignored.
Under the stated assumptions, the Hamiltonian of the system consists of a
number of contributions that will be looked at one by one. First there is the
kinetic energy of the electrons, the sum of the kinetic energy operators of the
individual electrons:
T̂^E = − Σ_{i=1}^{I} (h̄²/2me) ∇i² = − Σ_{i=1}^{I} (h̄²/2me) (∂²/∂r1i² + ∂²/∂r2i² + ∂²/∂r3i²).    (6.3)

where r⃗i = (r1i, r2i, r3i) is the position of electron number i. Note the use of
(r1 , r2 , r3 ) as the notation for the components of position, rather than (x, y, z).
For more elaborate mathematics, the index notation (r1 , r2 , r3 ) is often more
convenient, since you can indicate any generic component by the single expres-
sion rα , (with the understanding that α = 1, 2, or 3,) instead of writing them
out all three separately.
Similarly, there is the kinetic energy of the nuclei,
T̂^N = − Σ_{j=1}^{J} (h̄²/2mjn) ∇jn² = − Σ_{j=1}^{J} (h̄²/2mjn) (∂²/∂r1jn² + ∂²/∂r2jn² + ∂²/∂r3jn²).    (6.4)

where r⃗jn = (r1jn, r2jn, r3jn) is the position of nucleus number j.



Next there is the potential energy due to the attraction of the I electrons
by the J nuclei. That potential energy is, summing over all electrons and over
all nuclei:
V^NE = − Σ_{i=1}^{I} Σ_{j=1}^{J} (Zj e²/4πǫ0) (1/rij)    (6.5)

where rij ≡ |r⃗i − r⃗jn| is the distance between electron number i and nucleus
number j, and ǫ0 = 8.85 10⁻¹² C²/J m is the permittivity of space.
Next there is the potential energy due to the electron-electron repulsions:
V^EE = ½ Σ_{i=1}^{I} Σ_{ī≠i} (e²/4πǫ0) (1/riī)    (6.6)

where riī ≡ |r⃗i − r⃗ī| is the distance between electron number i and electron
number ī. Half of this repulsion energy will be attributed to electron i and half
to electron ī, accounting for the factor ½.
Finally, there is the potential energy due to the nucleus-nucleus repulsions,
V^NN = ½ Σ_{j=1}^{J} Σ_{j̄≠j} (Zj Zj̄ e²/4πǫ0) (1/rjj̄),    (6.7)

where rjj̄ ≡ |r⃗jn − r⃗j̄n| is the distance between nucleus number j and nucleus
number j̄.
Solving the full quantum problem for this system of electrons and nuclei
exactly would involve finding the eigenfunctions ψ to the Hamiltonian eigenvalue
problem

[T̂^E + T̂^N + V^NE + V^EE + V^NN] ψ = Eψ    (6.8)
Here ψ is a function of the position and spin coordinates of all the electrons and
all the nuclei, in other words:

ψ = ψ(r⃗1, Sz1, r⃗2, Sz2, . . . , r⃗I, SzI, r⃗1n, Sz1n, r⃗2n, Sz2n, . . . , r⃗Jn, SzJn)    (6.9)

You might guess solving this problem is a tall order, and you would be
perfectly right. It can only be done analytically for the very simplest case
of one electron and one nucleus. That is the hydrogen atom solution, using
an effective electron mass to include the nuclear motion. For any decent size
system, an accurate numerical solution is a formidable task too.

6.2.2 The basic Born-Oppenheimer approximation


The general idea of the Born-Oppenheimer approximation is simple. First note
that the nuclei are thousands of times heavier than the electrons. A proton is

almost two thousand times heavier than an electron, and that does not even
count any neutrons in the nuclei.
So, if you take a look at the kinetic energy operators of the two,
T̂^E = − Σ_{i=1}^{I} (h̄²/2me) (∂²/∂r1i² + ∂²/∂r2i² + ∂²/∂r3i²)

T̂^N = − Σ_{j=1}^{J} (h̄²/2mjn) (∂²/∂r1jn² + ∂²/∂r2jn² + ∂²/∂r3jn²)

then what would seem more reasonable than to ignore the kinetic energy T̂^N of
the nuclei? It has those heavy masses in the bottom.
An alternative, and better, way of phrasing the assumption that T̂^N can be
ignored is to say that you ignore the uncertainty in the positions of the nuclei.
For example, visualize the hydrogen molecule, figure 4.2. The two protons, the
nuclei, have pretty well defined positions in the molecule, while the electron wave
function extends over the entire region like a big blob of possible measurable
positions. So how important could the uncertainty in position of the nuclei
really be?
Assuming that the nuclei do not suffer from quantum uncertainty in position
is really equivalent to putting h̄ to zero in their kinetic energy operator above,
making the operator disappear, because h̄ is nature’s measure of uncertainty.
And without a kinetic energy term for the nuclei, there is nothing left in the
mathematics to force them to have uncertain positions. Indeed, you can now just
guess numerical values for the positions of the nuclei, and solve the approximated
eigenvalue problem Hψ = Eψ for those assumed values.
That thought is the Born-Oppenheimer approximation in a nutshell. Just do
the electrons, assuming suitable positions for the nuclei a priori. The solutions
that you get doing so will be called ψ E to distinguish them from the true solu-
tions ψ that do not use the Born-Oppenheimer approximation. Mathematically
ψ E will still be a function of the electron and nuclear positions:
ψ^E = ψ^E(r⃗1, Sz1, r⃗2, Sz2, . . . , r⃗I, SzI; r⃗1n, Sz1n, r⃗2n, Sz2n, . . . , r⃗Jn, SzJn).    (6.10)
But physically it will be a quite different thing: it describes the probability
of finding the electrons, given the positions of the nuclei. That is why there
is a semi-colon between the electron positions and the nuclear positions. The
nuclear positions are here assumed positions, while the electron positions are
potential positions, for which the square magnitude of the wave function ψ E
gives the probability. This is an electron wave function only.
In application, it is usually most convenient to write the Hamiltonian eigen-
value problem for the electron wave function as
[T̂^E + V^NE + V^EE + V^NN] ψ^E = (E^E + V^NN) ψ^E,

which just means that the eigenvalue is called E^E + V^NN instead of simply E^E.


The reason is that you can then get rid of V NN , and obtain the electron wave
function eigenvalue problem in the more concise form
[T̂^E + V^NE + V^EE] ψ^E = E^E ψ^E    (6.11)

After all, for given nuclear coordinates, V NN is just a bothersome constant in


the solution of the electron wave function that you may just as well get rid of.
Of course, after you compute your electron eigenfunctions, you want to get
something out of the results. Maybe you are looking for the ground state of a
molecule, like was done earlier for the hydrogen molecule and molecular ion. In
that case, the simplest approach is to try out various nuclear positions and for
each likely set of nuclear positions compute the electronic ground state energy
E^E_gs, the lowest eigenvalue of the electronic problem (6.11) above.
For different assumed nuclear positions, you will get different values for the
electronic ground state energy, and the nuclear positions corresponding to the
actual ground state of the molecule will be the ones for which the total energy
is least:
nominal ground state condition:  E^E_gs + V^NN is minimal    (6.12)

This is what was used to solve the hydrogen molecule cases discussed in
earlier chapters; a computer program was written to print out the energy
E^E_gs + V^NN for a lot of different spacings between the nuclei, allowing the spacing that
had the lowest total energy to be found by skimming down the print-out. That
identified the ground state. The biggest error in those cases was not in using
the Born-Oppenheimer approximation or the nominal ground state condition
above, but in the crude way in which the electron wave function for given
nuclear positions was approximated.
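In sketch form, that procedure looks as follows. Since the electronic structure computations themselves are not reproduced here, a made-up Morse-style curve with hypothetical parameters stands in for the computed values of E^E_gs + V^NN, purely to show the search over spacings:

    import numpy as np

    # Sketch of the nominal ground state condition (6.12): tabulate the
    # total energy E^E_gs + V^NN for many nuclear spacings d and keep the
    # lowest.  A real run would solve (6.11) at each d; the Morse-style
    # curve below is a made-up stand-in, for illustration only.
    def total_energy(d, D=4.7, a=1.0, d0=1.4):   # hypothetical parameters
        return D*((1.0 - np.exp(-a*(d - d0)))**2 - 1.0)

    spacings = np.linspace(0.5, 10.0, 2000)
    energies = total_energy(spacings)
    i = np.argmin(energies)
    print("ground state spacing:", spacings[i])  # recovers d0 = 1.4
    print("total energy there:", energies[i])    # about -D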
For more accurate work, the nominal ground state condition (6.12) above
does have big limitations, so the next subsection discusses a more advanced
approach.

6.2.3 Going one better


Solving the wave function for electrons only, given positions of the nuclei is
definitely a big simplification. But identifying the ground state as the position
of the nuclei for which the electron energy plus nuclear repulsion energy is
minimal is much less than ideal.
Such a procedure ignores the motion of the nuclei, so it is no use for figuring
out any molecular dynamics beyond the ground state. And even for the ground
state, it is really wrong to say that the nuclei are at the position of minimum
energy, because the uncertainty principle does not allow certain positions for
the nuclei.
Instead, the nuclei behave much like the particle in a harmonic oscillator.
They are stuck in an electron blob that wants to push them to their nominal
positions. But uncertainty does not allow that, and the wave function of the
nuclei spreads out a bit around the nominal positions, adding both kinetic and
potential energy to the molecule. One example effect of this “zero point energy”
is to lower the required dissociation energy a bit from what you would expect
otherwise.
It is not a big effect, maybe on the order of tenths of electron volts, compared
to typical electron energies described in terms of multiple electron volts (and
much more for the inner electrons in all but the lightest atoms.) But it is not
as small as might be guessed based on the fact that the nuclei are at least
thousands of times heavier than the electrons.
Moreover, though relatively small in energy, the motion of the nuclei may
actually be the one that is physically the important one. One reason is that
the electrons tend to get stuck in single energy states. That may be because
the differences between electron energy levels tend to be so large compared to
a typical unit $\frac12 kT$ of thermal energy, about one hundredth of an electron volt,
or otherwise because they tend to get stuck in states for which the next higher
energy levels are already filled with other electrons. The interesting physical
effects then become due to the seemingly minor nuclear motion.
For example, the heat capacity of typical diatomic gases, like the hydrogen
molecule or air under normal conditions, is not in any direct sense due to the
electrons; it is kinetic energy of translation of the molecules plus a comparable
energy due to angular momentum of the molecule; read, angular motion of the
nuclei around their mutual center of gravity. The heat capacity of solids too is
largely due to nuclear motion, as is the heat conduction of non-metals.
For all those reasons, you would really, really, like to actually compute the
motion of the nuclei, rather than just claim they are at fixed points. Does
that mean that you need to go back and solve the combined wave function for
the complete system of electrons plus nuclei anyway? Throw away the Born-
Oppenheimer approximation results?
Fortunately, the answer is mostly no. It turns out that nature is quite coop-
erative here, for a change. After you have done the electronic structure compu-
tations for all relevant positions of the nuclei, you can proceed with computing
the motion of nuclei as a separate problem. For example, if you are interested in
the ground state nuclear motion, it is governed by the Hamiltonian eigenvalue
problem
$$\left[\hat T^N + V^{NN} + E^E_1\right]\psi^N_1 = E\,\psi^N_1$$
where $\psi^N_1$ is a wave function involving the nuclear coordinates only, not any
electronic ones. The trick is in the potential energy to use in such a computation;
it is not just the potential energy of nucleus to nucleus repulsions, but you must
include an additional energy $E^E_1$.
So, what is this $E^E_1$? Easy, it is the electronic ground state energy $E^E_{\rm gs}$ that
you computed for assumed positions of the nuclei. So it will depend on where
the nuclei are, but it does not depend on where the electrons are. You can just
compute $E^E_1$ for a sufficient number of relevant nuclear positions, tabulate the results somehow, and interpolate them as needed. $E^E_1$ is then a known function of the nuclear positions, and so is $V^{NN}$. Proceed to solve for the wave function for the nuclei $\psi^N_1$ as a problem not directly involving any electrons.
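In Python, the tabulate-and-interpolate step might look like the sketch below. The tabulated numbers are purely illustrative placeholders, not computed values; in a real computation every entry would come from solving the electronic problem at that spacing.

    import numpy as np

    # Illustrative (made-up) table of E_1^E versus internuclear spacing d,
    # both in atomic units; real values come from solving (6.11) per spacing.
    d_table = np.array([1.0, 1.2, 1.4, 1.6, 2.0, 3.0, 5.0])
    E1_table = np.array([-1.60, -1.67, -1.70, -1.69, -1.63, -1.53, -1.48])

    def E1(d):
        # Interpolate the tabulated electronic energy to an arbitrary spacing.
        return np.interp(d, d_table, E1_table)

    def potential_surface(d, Z1=1, Z2=1):
        # Effective potential V^NN + E_1^E that the nuclei move in.
        return Z1 * Z2 / d + E1(d)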
And it does not necessarily have to be just to compute the ground state.
You might want to study thermal motion or whatever. As long as the electrons
are not kicked strongly enough to raise them to the next energy level, you can
assume that they are in their ground state, even if the nuclei are not. The usual
way to explain this is to say something like that the electrons “move so fast
compared to the slow nuclei that they have all the time in the world to adjust
themselves to whatever the electronic ground state is for the current nuclear
positions.”
You might even decide to use classical molecular dynamics based on the
potential V NN + E1E instead of quantum mechanics. It would be much faster
and easier, and the results are often good enough.
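A minimal sketch of that classical alternative for a diatomic: integrate Newton's equation for the internuclear distance on the potential $V^{NN} + E^E_1$ with the standard velocity Verlet scheme, getting the force by numerical differentiation. The potential function V is assumed given, for example the potential_surface sketched above, and mass should be the reduced mass of the two nuclei.

    import numpy as np

    def md_diatomic(V, d0, v0, mass, dt=1.0, steps=1000, h=1e-4):
        # Classical motion of the internuclear distance d on the
        # Born-Oppenheimer potential V(d), using velocity Verlet.
        force = lambda d: -(V(d + h) - V(d - h)) / (2 * h)  # F = -dV/dd
        d, v = d0, v0
        a = force(d) / mass
        trajectory = []
        for _ in range(steps):
            d += v * dt + 0.5 * a * dt**2      # position update
            a_new = force(d) / mass            # force at the new position
            v += 0.5 * (a + a_new) * dt        # velocity update
            a = a_new
            trajectory.append(d)
        return np.array(trajectory)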
So what if you are interested in what your molecule is doing when the elec-
trons are at an elevated energy level, instead of in their ground state? Can
you still do it? Sure. If the electrons are in an elevated energy level EnE , (for
simplicity, it will be assumed that the electron energy levels are numbered with
a single index n,) just solve
$$\left[\hat T^N + V^{NN} + E^E_n\right]\psi^N_n = E\,\psi^N_n \qquad (6.13)$$
or equivalent.
Note that for a different value of n, this is truly a different motion problem
for the nuclei, since the potential energy will be different. If you are a visual
sort of person, you might vaguely visualize the potential energy for a given value
of n plotted as a surface in some high-dimensional space, and the state of the
nuclei moving like a roller-coaster along that potential energy surface, speeding
up when the surface goes down, slowing down if it goes up. There is one such
surface for each value of n. Anyway. The bottom line is that people refer
to these different potential energies as “potential energy surfaces.” They are
also called “adiabatic surfaces” because “adiabatic” normally means processes
sufficiently fast that heat transfer can be ignored. So, some quantum physicists
figured that it would be a good idea to use the same term for quantum processes
that are so slow that quasi-equilibrium conditions persist throughout, and that
have nothing to do with heat transfer.
Of course, any approximation can fail. It is possible to get into trouble
solving your problem for the nuclei as explained above. The difficulties arise if
two electron energy levels, call them $E^E_n$ and $E^E_{\underline n}$, become almost equal, and in
particular when they cross. In simple terms, the difficulty is that if energy levels
are equal, the energy eigenfunctions are not unique, and the slightest thing can
throw you from one eigenfunction to the completely different one.
You might now get alarmed, because for example the hydrogen molecular ion
does have two different ground state solutions with the same energy. Its single
electron can be in either the spin-up state or the spin down state, and it does
not make any difference for the energy because the assumed Hamiltonian does
not involve spin. In fact, all systems with an odd number of electrons will have
a second solution with all spins reversed and the same energy {A.50}. There is
no need to worry, though; these reversed-spin solutions go their own way and
do not affect the validity of (6.13). It is spatial, rather than spin nonuniqueness
that is a concern.
There is a derivation of the nuclear eigenvalue problem (6.13) in note {A.51},
showing what the ignored terms are and why they can usually be ignored.

6.3 The Hartree-Fock Approximation


Many of the most important problems that you want to solve in quantum me-
chanics are all about atoms and/or molecules. These problems involve a number
of electrons around a number of atomic nuclei. Unfortunately, a full quantum
solution of such a system of any nontrivial size is very difficult. However, ap-
proximations can be made, and as section 6.2 explained, the real skill you need
to master is solving the wave function for the electrons given the positions of
the nuclei.
But even given the positions of the nuclei, a brute-force solution for any
nontrivial number of electrons turns out to be prohibitively laborious. The
Hartree-Fock approximation is one of the most important ways to tackle that
problem, and has been so since the early days of quantum mechanics. This
section explains some of the ideas.

6.3.1 Wave function approximation


The key to the basic Hartree-Fock method is the assumptions it makes about
the form of the electron wave function. It will be assumed that there are a total
of I electrons in orbit around a number of nuclei. The wave function describing
the set of electrons then has the general form:

$$\Psi(\vec r_1, S_{z1}, \vec r_2, S_{z2}, \ldots, \vec r_i, S_{zi}, \ldots, \vec r_I, S_{zI})$$

where $\vec r_i$ is the position of electron number $i$, and $S_{zi}$ its spin in a chosen $z$-direction, with measurable values $\frac12\hbar$ and $-\frac12\hbar$. Of course, what answer you
get for the wave function will also depend on where the nuclei are, but in this
section, the nuclei are supposed to be at given positions, so to reduce the clutter,
the dependence of the electron wave function on the nuclear positions will not
be explicitly shown.
Hartree-Fock approximates the wave function in terms of a set of single-
electron functions, each a product of a spatial function and a spin state:

$$\psi^e_1(\vec r)l_1(S_z),\ \psi^e_2(\vec r)l_2(S_z),\ \psi^e_3(\vec r)l_3(S_z),\ \ldots$$

where $l$ stands for either spin-up, $\uparrow$, or spin-down, $\downarrow$. (By definition, function $\uparrow(S_z)$ equals one if the spin $S_z$ is $\frac12\hbar$, and zero if it is $-\frac12\hbar$, while function $\downarrow(S_z)$ equals zero if $S_z$ is $\frac12\hbar$ and one if it is $-\frac12\hbar$.) These single-electron functions
are called “orbitals” or “spin orbitals.” The reason is that you tend to think
of them as describing a single electron being in orbit around the nuclei with a
particular spin. Wrong, of course: the electrons do not have reasonably defined
positions on these scales. But you do tend to think of them that way anyway.
The spin orbitals are taken to be an orthonormal set. Note that any two spin
orbitals are automatically orthogonal if they have opposite spins: spin states
are orthonormal, so $\langle\uparrow|\downarrow\rangle = 0$. If they have the same spin, their spatial orbitals
will need to be orthogonal.
Single-electron functions can be combined into multi-electron functions by
forming products of them of the form

an1 ,n2 ,...,nI ψne 1 (~r1 )ln1 (Sz1 ) ψne 2 (~r2 )ln2 (Sz2 ) . . . ψne I (~rI )lnI (SzI )

where n1 is the number of the single-electron function used for electron 1, n2


the number of the single-electron function used for electron 2, and so on, and
an1 ,n2 ,...,nI is a suitable numerical constant. Such a product wave function is
called a “Hartree product.”
Now if you use enough single-electron functions, with all their Hartree prod-
ucts, you can approximate any multi-electron wave function to arbitrarily high
accuracy. Unfortunately, using many of them produces a problem much too big
to be solved on even the most powerful computer. So you really want to use
as few of them as possible. But you cannot use too few either; as chapter 4.7
explained, nature imposes an “antisymmetrization” requirement: the complete
wave function that you write must change sign whenever any two electrons are
exchanged, in other words when you replace $\vec r_i, S_{zi}$ by $\vec r_{\underline i}, S_{z\underline i}$ and vice-versa for
any pair of electrons numbered $i$ and $\underline i$. That is only possible if you use at least
I different single-electron functions for your I electrons. This is known as the
Pauli exclusion principle: any group of I − 1 electrons occupying the minimum
of I − 1 single-electron functions “exclude” an additional I-th electron from
simply entering the same functions. The I-th electron will have to find its own
single-electron function to add to the mix.
The basic Hartree-Fock approximation uses the absolute minimum that is
possible, just I different single-electron functions for the I electrons. In that
case, the wave function Ψ can be written as a single “Slater determinant:”
$$\frac{a}{\sqrt{I!}}
\begin{vmatrix}
\psi^e_1(\vec r_1)l_1(S_{z1}) & \psi^e_2(\vec r_1)l_2(S_{z1}) & \cdots & \psi^e_n(\vec r_1)l_n(S_{z1}) & \cdots & \psi^e_I(\vec r_1)l_I(S_{z1}) \\
\psi^e_1(\vec r_2)l_1(S_{z2}) & \psi^e_2(\vec r_2)l_2(S_{z2}) & \cdots & \psi^e_n(\vec r_2)l_n(S_{z2}) & \cdots & \psi^e_I(\vec r_2)l_I(S_{z2}) \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
\psi^e_1(\vec r_i)l_1(S_{zi}) & \psi^e_2(\vec r_i)l_2(S_{zi}) & \cdots & \psi^e_n(\vec r_i)l_n(S_{zi}) & \cdots & \psi^e_I(\vec r_i)l_I(S_{zi}) \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
\psi^e_1(\vec r_I)l_1(S_{zI}) & \psi^e_2(\vec r_I)l_2(S_{zI}) & \cdots & \psi^e_n(\vec r_I)l_n(S_{zI}) & \cdots & \psi^e_I(\vec r_I)l_I(S_{zI})
\end{vmatrix} \qquad (6.14)$$
where a is a constant of magnitude one. As chapter 4.7 explained, a Slater
determinant is really equivalent to a sum of I! Hartree products, each with the
single-electron functions in a different order. It is the one wave function obtain-
able from the I single-electron functions that is antisymmetric with respect to
exchanging any two of the I electrons.
Displaying the Slater determinant fully as above may look impressive, but
it is a lot to read. Therefore, from now on it will be abbreviated as
$$\Psi = \frac{a}{\sqrt{I!}}\left|\det(\psi^e_1 l_1, \psi^e_2 l_2, \ldots, \psi^e_n l_n, \ldots, \psi^e_I l_I)\right\rangle. \qquad (6.15)$$
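To see the antisymmetry in action, the sketch below evaluates a Slater determinant wave function numerically for one configuration of electron positions and spins, taking the constant $a$ equal to one. The spin orbital functions themselves are assumed given.

    import numpy as np
    from math import factorial

    def slater_determinant(orbitals, positions, spins):
        # Psi = det(M) / sqrt(I!) for one electron configuration, where
        # row i of M evaluates every orbital at electron i's coordinates.
        # orbitals: I functions f(r, s); positions: I vectors; spins: I values.
        I = len(orbitals)
        M = np.array([[orbitals[n](positions[i], spins[i]) for n in range(I)]
                      for i in range(I)])
        return np.linalg.det(M) / np.sqrt(factorial(I))

    # Swapping two electrons swaps two rows of M, so the determinant, and
    # hence Psi, changes sign: the antisymmetrization requirement is automatic.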

It is important to realize that using the minimum number of single-electron


functions will unavoidably produce an error that is mathematically speaking not
small {A.52}. To get a vanishingly small error, you would need a large number
of different Slater determinants, not just one. Still, the results you get with the
basic Hartree-Fock approach may be good enough to satisfy your needs. Or you
may be able to improve upon them enough with “post-Hartree-Fock methods.”
But none of that would be likely if you just selected the single-electron func-
tions $\psi^e_1 l_1, \psi^e_2 l_2, \ldots$ at random. The cleverness in the Hartree-Fock approach
will be in writing down equations for these single-electron functions that produce
the best approximation possible with a single Slater determinant.
This section will reserve the term “orbitals” specifically for the single-elec-
tron functions that provide the best single-determinant approximation. In those
terms, if the Hartree-Fock orbitals provide the best single-determinant approxi-
mation, their results will certainly be better than the solutions that were written
down for the atoms in chapter 4.9, because those were really single Slater de-
terminants. In fact, you could find much more accurate ways to average out
the effects of the neighboring electrons than just putting them in the nucleus
like the section on atoms essentially did. You could smear them out over some
optimal area, say. But the solution you will get doing so will be no better than
you could get using Hartree-Fock.
That assumes of course that the spins are taken the same way. Consider that
problem for a second. Typically, a nonrelativistic approach is used, in which
spin effects on the energy are ignored. Spin then really only directly affects the
antisymmetrization requirements.
Things are straightforward if you try to solve, say, a helium atom. In the
exact ground state, the two electrons are in the spatial wave function that has
the absolutely lowest energy, regardless of any antisymmetrization concerns.
This spatial wave function is symmetric under electron exchange since the two
electrons are identical. The antisymmetrization requirement is met since the
electrons assume the singlet configuration,

$$\frac{\uparrow(S_{z1})\downarrow(S_{z2}) - \downarrow(S_{z1})\uparrow(S_{z2})}{\sqrt 2},$$

for their combined spins.


The approximate Hartree-Fock wave function for helium you would corre-
spondingly take to be
$$\frac{1}{\sqrt 2}\left|\det(\psi^e_1\uparrow, \psi^e_2\downarrow)\right\rangle$$
and then you would make things easier for yourself by postulating a priori that
the spatial orbitals are the same, $\psi^e_1 = \psi^e_2$. Lo and behold, when you multiply
out the Slater determinant,
$$\frac{1}{\sqrt 2}
\begin{vmatrix}
\psi^e_1(\vec r_1)\uparrow(S_{z1}) & \psi^e_1(\vec r_1)\downarrow(S_{z1}) \\
\psi^e_1(\vec r_2)\uparrow(S_{z2}) & \psi^e_1(\vec r_2)\downarrow(S_{z2})
\end{vmatrix}
= \psi^e_1(\vec r_1)\psi^e_1(\vec r_2)\,\frac{\uparrow(S_{z1})\downarrow(S_{z2}) - \downarrow(S_{z1})\uparrow(S_{z2})}{\sqrt 2},$$

it automagically reproduces the correct singlet spin state. And you only need
to find one spatial orbital instead of two.
As discussed in chapter 4.9, a beryllium atom has two electrons with op-
posite spins in the “1s” shell like helium, and two more in the “2s” shell. An appropriate Hartree-Fock wave function would be $\left|\det(\psi^e_1\uparrow, \psi^e_1\downarrow, \psi^e_3\uparrow, \psi^e_3\downarrow)\right\rangle/\sqrt{4!}$,
in other words, two pairs of orbitals with the same spatial states and opposite
spins. Similarly, neon has an additional six paired electrons in a closed “2p”
shell, and you could use 3 more pairs of orbitals with the same spatial states
and opposite spins. The number of spatial orbitals that must be found in such
solutions is only half the number of electrons. This is called the closed shell
“Restricted Hartree-Fock (RHF)” method. It restricts the form of the spatial


states to be pair-wise equal.
But now look at lithium. Lithium has two paired 1s electrons like helium,
and an unpaired 2s electron. For the third orbital in the Hartree-Fock determi-
nant, you will now have to make a choice whether to take it of the form $\psi^e_3\uparrow$ or $\psi^e_3\downarrow$. Let's assume you take $\psi^e_3\uparrow$, so the wave function is
$$\frac{1}{\sqrt{3!}}\left|\det(\psi^e_1\uparrow, \psi^e_2\downarrow, \psi^e_3\uparrow)\right\rangle$$
You have introduced a bias in the determinant: there is now a real difference between $\psi^e_1\uparrow$ and $\psi^e_2\downarrow$: $\psi^e_1\uparrow$ has the same spin as the third spin orbital, and $\psi^e_2\downarrow$ opposite.
If you find the best approximation among all possible orbitals $\psi^e_1\uparrow$, $\psi^e_2\downarrow$, and $\psi^e_3\uparrow$, you will end up with spatial orbitals $\psi^e_1$ and $\psi^e_2$ that are not the
same. Allowing for them to be different is called the “Unrestricted Hartree-
Fock (UHF)” method. In general, you no longer require that equivalent spatial
orbitals are the same in their spin-up and spin down versions. For a bigger
system, you will end up with one set of orthonormal spatial orbitals for the
spin-up orbitals and a different set of orthonormal spatial orbitals for the spin-
down ones. These two sets of orthonormal spatial orbitals are not mutually
orthogonal; the only reason the complete spin orbitals are still orthonormal is
because the two spins are orthogonal, $\langle\uparrow|\downarrow\rangle = 0$.
If instead of using unrestricted Hartree-Fock, you insist on demanding that
the spatial orbitals for spin up and down do form a single set of orthonormal
functions, it is called “open shell” restricted Hartree-Fock. In the case of lithium,
you would then demand that $\psi^e_2$ equals $\psi^e_1$. Since the best (in terms of energy)
solution has them different, your solution is then no longer the best possible.
You pay a price, but you now only need to find two spatial orbitals rather than
three. The spin orbital $\psi^e_3\uparrow$ without a matching opposite-spin orbital counts as
an open shell. For nitrogen, you might want to use three open shells to represent
the three different spatial states $2p_x$, $2p_y$, and $2p_z$, each with an unpaired electron in it.
If you use unrestricted Hartree-Fock instead, you will need to compute more
spatial functions, and you pay another price, spin. Since all spin effects in the
Hamiltonian are ignored, it commutes with the spin operators. So, the exact
energy eigenfunctions are also, or can be taken to be also, spin eigenfunctions.
Restricted Hartree-Fock has the capability of producing approximate energy
eigenstates with well defined spin. Indeed, as you saw for helium, in restricted
Hartree-Fock all the paired spin-up and spin-down states combine into zero-spin
singlet states. If any additional unpaired states are all spin up, say, you get an
energy eigenstate with a net spin equal to the sum of the spins of the unpaired
states.
But a true unrestricted Hartree-Fock solution does not have correct, defi-
nite, spin. For two electrons to produce states of definite combined spin, the
coefficients of spin up and spin down must come in specific ratios. As a sim-
ple example, an unrestricted Slater determinant of $\psi^e_1\uparrow$ and $\psi^e_2\downarrow$ with unequal spatial orbitals multiplies out to
$$\frac{1}{\sqrt 2}\left|\det(\psi^e_1\uparrow, \psi^e_2\downarrow)\right\rangle = \frac{\psi^e_1(\vec r_1)\psi^e_2(\vec r_2)\uparrow(S_{z1})\downarrow(S_{z2}) - \psi^e_2(\vec r_1)\psi^e_1(\vec r_2)\downarrow(S_{z1})\uparrow(S_{z2})}{\sqrt 2}$$
or, writing the spin combinations in terms of singlets and triplets,
$$\frac{\psi^e_1(\vec r_1)\psi^e_2(\vec r_2) + \psi^e_2(\vec r_1)\psi^e_1(\vec r_2)}{2}\,
\frac{\uparrow(S_{z1})\downarrow(S_{z2}) - \downarrow(S_{z1})\uparrow(S_{z2})}{\sqrt 2} +
\frac{\psi^e_1(\vec r_1)\psi^e_2(\vec r_2) - \psi^e_2(\vec r_1)\psi^e_1(\vec r_2)}{2}\,
\frac{\uparrow(S_{z1})\downarrow(S_{z2}) + \downarrow(S_{z1})\uparrow(S_{z2})}{\sqrt 2}$$

So, the spin will be some combination of zero spin (the singlet) and spin one
(the triplet), and the combination will be different at different locations of the
electrons to boot. However, it may be noted that unrestricted wave functions
are commonly used as first approximations of doublet and triplet states anyway
[18, p. 105].
To show that all this can make a real difference, take the example of the
hydrogen molecule, chapter 4.2, when the two nuclei are far apart. The correct
electronic ground state is

$$\frac{\psi_L(\vec r_1)\psi_R(\vec r_2) + \psi_R(\vec r_1)\psi_L(\vec r_2)}{\sqrt 2}\;
\frac{\uparrow(S_{z1})\downarrow(S_{z2}) - \downarrow(S_{z1})\uparrow(S_{z2})}{\sqrt 2}$$

where $\psi_L(\vec r_1)\psi_R(\vec r_2)$ is the state in which electron 1 is around the left proton and electron 2 around the right one, and $\psi_R(\vec r_1)\psi_L(\vec r_2)$ is the same state but
with the electrons reversed. Note that the spin state is a singlet one with zero
net spin.
Now try to approximate it with a restricted closed shell Hartree-Fock wave function of the form $\left|\det(\psi^e_1\uparrow, \psi^e_1\downarrow)\right\rangle/\sqrt 2$. The determinant multiplies out to
$$\psi^e_1(\vec r_1)\psi^e_1(\vec r_2)\,\frac{\uparrow(S_{z1})\downarrow(S_{z2}) - \downarrow(S_{z1})\uparrow(S_{z2})}{\sqrt 2}$$

Now $\psi^e_1$ will be something like $(\psi_L + \psi_R)/\sqrt 2$; the energy of the electrons is lowest when they are near the nuclei. But if $\psi^e_1(\vec r_1)$ is appreciable when electron 1 is near say the left nucleus, then $\psi^e_1(\vec r_2)$ is also appreciable when electron 2 is near the same nucleus, since it is the exact same function. So there is a
big chance of finding both electrons together near the same nucleus. That is
all wrong, since the electrons repel each other: if one electron is around the
left nucleus, the other should be around the right one. The computed energy,
which should be that of two neutral hydrogen atoms far apart, will be much too
high. Note however that you do get the correct spin state. Also, at the nuclear
separation distance corresponding to the ground state of the complete molecule,
the errors are much less, [18, p. 166]. Only when you are “breaking the bond”
(dissociating the molecule, i.e. taking the nuclei apart) do you get into major
trouble.
If instead you would use unrestricted Hartree-Fock, $\left|\det(\psi^e_1\uparrow, \psi^e_2\downarrow)\right\rangle/\sqrt 2$, you should find $\psi^e_1 = \psi_L$ and $\psi^e_2 = \psi_R$ (or vice versa), which would produce a wave function
$$\frac{\psi_L(\vec r_1)\psi_R(\vec r_2)\uparrow(S_{z1})\downarrow(S_{z2}) - \psi_R(\vec r_1)\psi_L(\vec r_2)\downarrow(S_{z1})\uparrow(S_{z2})}{\sqrt 2}.$$
This would produce the correct energy, though the spin would now be all wrong.
Little in life is ideal, is it?
All of the above may be much more than you ever wanted to hear about the
wave function. The purpose was mainly to indicate that things are not as simple
as you might initially suppose. As the examples showed, some understanding of
the system that you are trying to model definitely helps. Or experiment with
different approaches.
Let’s go on to the next step: how to get the equations for the spatial orbitals $\psi^e_1, \psi^e_2, \ldots$ that give the most accurate approximation of a multi-electron problem.
lem. The expectation value of energy will be needed for that, and to get that,
first the Hamiltonian is needed. That will be the subject of the next subsection.

6.3.2 The Hamiltonian


The non-relativistic Hamiltonian of the system of I electrons consists of a num-
ber of contributions. First there is the kinetic energy of the electrons; the sum
of the kinetic energy operators of the individual electrons:
à !
I
X h̄2 I
X h̄2 ∂2 ∂2 ∂2
Tb E = − ∇2i =− + + . (6.16)
i=1 2me i=1 2me ∂x2i ∂yi2 ∂zi2
Next there is the potential energy due to the ambient electric field that the
electrons move in. It will be assumed that this field is caused by J nuclei,
numbered using an index j, and having charge Zj e (i.e. there are Zj protons
in nucleus number j). In that case, the total potential energy due to nucleus-
electron attractions is, summing over all electrons and over all nuclei:
$$V^{NE} = -\sum_{i=1}^I \sum_{j=1}^J \frac{Z_j e^2}{4\pi\epsilon_0}\,\frac{1}{r_{ij}} \qquad (6.17)$$
where $r_{ij} \equiv |\vec r_i - \vec r^{\,n}_j|$ is the distance between electron number $i$ and nucleus number $j$, and $\epsilon_0 = 8.85\times10^{-12}\ {\rm C^2/J\,m}$ is the permittivity of space.
And now for the black plague of quantum mechanics, the electron to electron
repulsions. The potential energy for those repulsions is
$$V^{EE} = \tfrac12 \sum_{i=1}^I \sum_{\substack{\underline i=1\\ \underline i\neq i}}^I \frac{e^2}{4\pi\epsilon_0}\,\frac{1}{r_{i\underline i}} \qquad (6.18)$$

where $r_{i\underline i} \equiv |\vec r_i - \vec r_{\underline i}|$ is the distance between electron number $i$ and electron number $\underline i$. Half of this repulsion energy will be blamed on electron $i$ and half on electron $\underline i$, accounting for the factor $\frac12$.
Without this interaction between different electrons, you could solve for each
electron separately, and all would be nice. But you do have it, and so you really
need to solve for all electrons at once, usually an impossible task. You may
recall that when chapter 4.9 examined the atoms heavier than hydrogen, those
with more than one electron, the discussion cleverly threw out the electron to
electron repulsion terms, by assuming that the effect of each neighboring electron
is approximately like canceling out one proton in the nucleus. And you may also
remember how this outrageous assumption led to all those wrong predictions
that had to be corrected by various excuses. The Hartree-Fock approximation
tries to do better than that.
It is helpful to split the Hamiltonian into the single electron terms and the
troublesome interactions, as follows,
$$H = \sum_{i=1}^I h^e_i + \tfrac12 \sum_{i=1}^I \sum_{\substack{\underline i=1\\ \underline i\neq i}}^I v^{ee}_{i\underline i} \qquad (6.19)$$

where $h^e_i$ is the single-electron Hamiltonian of electron $i$,
$$h^e_i = -\frac{\hbar^2}{2m_e}\nabla_i^2 - \sum_{j=1}^J \frac{Z_j e^2}{4\pi\epsilon_0}\,\frac{1}{r_{ij}} \qquad (6.20)$$

and $v^{ee}_{i\underline i}$ is the electron $i$ to electron $\underline i$ repulsion potential energy
$$v^{ee}_{i\underline i} = \frac{e^2}{4\pi\epsilon_0}\,\frac{1}{r_{i\underline i}}. \qquad (6.21)$$

Note that $h^e_1, h^e_2, \ldots, h^e_I$ all take the same general form; the difference is just in which electron you are talking about. That is not surprising because the electrons all have the same properties. Similarly, the difference between $v^{ee}_{12}, v^{ee}_{13}, \ldots, v^{ee}_{I(I-2)}, v^{ee}_{I(I-1)}$ is just in which pair of electrons you talk about.
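In atomic units, where $e^2/4\pi\epsilon_0 = 1$, the potential energies (6.17) and (6.18) are easy to evaluate for given point positions. A small sketch:

    import numpy as np

    def V_NE(r_elec, r_nuc, Z):
        # Nucleus-electron attraction (6.17): sum over all electron-nucleus
        # pairs, attractive hence negative (atomic units).
        return -sum(Zj / np.linalg.norm(ri - rj)
                    for ri in r_elec for rj, Zj in zip(r_nuc, Z))

    def V_EE(r_elec):
        # Electron-electron repulsion (6.18); the double sum visits each
        # pair twice, which the factor 1/2 compensates for.
        I = len(r_elec)
        return 0.5 * sum(1.0 / np.linalg.norm(r_elec[i] - r_elec[k])
                         for i in range(I) for k in range(I) if k != i)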
6.3.3 The expectation value of energy


As was discussed in more detail in section 6.1, to find the best possible Hartree-
Fock approximation, the expectation value of energy will be needed. For exam-
ple, the best approximation to the ground state is the one that has the smallest
expectation value of energy.
The expectation value of energy is defined as hEi = hΨ|HΨi. There is a
problem with using this expression as it stands, though. Look once again at the
arsenic atom example. There are 33 electrons in it, so you could try to choose
33 promising single-electron functions to describe it. You could then try to
multiply out the Slater determinant for Ψ, integrate the inner products of the
individual terms on a computer, and add it all together. However, the inner
product of each pair of terms involves an integration over 99 scalar coordinates.
Taking 10 locations along each axis to perform the integration, you would need
to compute $10^{99}$ values for each pair of terms. And there are $33!$ terms in the Slater determinant, or $(33!)^2 = 7.5\times10^{73}$ pairs of terms. . . A computer that could
do that is unimaginable.
Fortunately, it turns out that almost all of those integrations are trivial since
the single-electron functions are orthonormal. If you sit down and identify what
is really left, you find that only a few three-dimensional and six-dimensional
inner products survive the weeding-out process.
In particular, the single-electron Hamiltonians produce only single-electron
energy expectation values of the general form
$$E^e_n \equiv \left\langle \psi^e_n(\vec r)\middle| h^e \middle| \psi^e_n(\vec r)\right\rangle \qquad (6.22)$$
You might think there should be an index $i$ on $\vec r_i$ and $h^e_i$ to indicate which electron it is. But remember that an inner product is really an integral; this one is
$$\int_{{\rm all}\ \vec r_i} \psi^e_n(\vec r_i)^*\, h^e_i\, \psi^e_n(\vec r_i)\, {\rm d}^3\vec r_i,$$
and that the name of the integration variable $\vec r_i$ does not make any difference: you get the exact same value for electron 1 as for electron $I$ or any other. So the value of $i$ does not make a difference, and it will just be left away.
If there were just one electron and it was in single-electron state $\psi^e_n l_n$, $E^e_n$ would be its expectation value of energy. Actually, of course, there are $I$ electrons, each partly present in state $\psi^e_n l_n$ because of the way the Slater determinant writes out, and each electron turns out to contribute an equal share $E^e_n/I$ to the total energy $E^e_n$ associated with single-electron state $\psi^e_n l_n$.
The pair-repulsion Hamiltonians produce six-dimensional inner products
that come in two types. The inner products of the first type will be indicated
by Jnn , and they are
Jnn ≡ hψne (~r)ψne (~r)|v ee |ψne (~r)ψne (~r)i (6.23)
Again, what electrons $\vec r$ and $\vec{\underline r}$ refer to is of no consequence. But written out as an integral for a specific set of electrons, using the expression for the pair energy $v^{ee}_{i\underline i}$ of the previous section, you get
$$\int_{{\rm all}\ \vec r_i}\int_{{\rm all}\ \vec r_{\underline i}} \frac{e^2}{4\pi\epsilon_0}\,\frac{1}{r_{i\underline i}}\,\left|\psi^e_n(\vec r_i)\right|^2 \left|\psi^e_{\underline n}(\vec r_{\underline i})\right|^2 {\rm d}^3\vec r_i\, {\rm d}^3\vec r_{\underline i}$$
If all you had was one electron $i$ in single-electron state $\psi^e_n l_n$ and a second electron $\underline i$ in single-electron state $\psi^e_{\underline n} l_{\underline n}$, this would be the expectation potential energy of their interaction. It would be the probability of electron $i$ being near $\vec r_i$ and electron $\underline i$ being near $\vec r_{\underline i}$ times the Coulomb potential energy at those positions. For that reason these integrals are called “Coulomb integrals.”
The second type of integrals will be indicated by $K_{n\underline n}$, and they are
$$K_{n\underline n} \equiv \left\langle \psi^e_n(\vec r)\psi^e_{\underline n}(\vec{\underline r})\middle| v^{ee} \middle| \psi^e_{\underline n}(\vec r)\psi^e_n(\vec{\underline r})\right\rangle \qquad (6.24)$$

These integrals are called the “exchange integrals.” Now don’t start thinking
that they are there because the wave function must be antisymmetric under
electron exchange. They, and others, would show up in any reasonably general
wave function. You can think of them instead as Coulomb integrals with the
electrons in the right hand side of the inner product exchanged.
The exchange integrals are a reflection of nature doing business in terms of
an unobservable wave function, rather than the observable probabilities that
appear in the Coulomb integrals. They are the equivalent of the twilight terms
that have appeared before in two-state systems. Written out as integrals, you
get
$$\int_{{\rm all}\ \vec r_i}\int_{{\rm all}\ \vec r_{\underline i}} \frac{e^2}{4\pi\epsilon_0}\,\frac{1}{r_{i\underline i}}\,\psi^e_n(\vec r_i)^*\psi^e_{\underline n}(\vec r_{\underline i})^*\,\psi^e_{\underline n}(\vec r_i)\psi^e_n(\vec r_{\underline i})\, {\rm d}^3\vec r_i\, {\rm d}^3\vec r_{\underline i}.$$
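Purely for illustration, the sketch below estimates one Coulomb integral (6.23) and one exchange integral (6.24) by brute-force Monte Carlo over the six-dimensional region, in atomic units. This is far too crude for real work, but it shows exactly what the formulas mean. The orbital functions psi_n and psi_m are assumed to accept arrays of positions.

    import numpy as np

    def J_and_K(psi_n, psi_m, samples=200_000, L=10.0, seed=0):
        # Sample both electron positions uniformly in a box of side L.
        rng = np.random.default_rng(seed)
        r1 = rng.uniform(-L/2, L/2, (samples, 3))
        r2 = rng.uniform(-L/2, L/2, (samples, 3))
        inv_r12 = 1.0 / np.linalg.norm(r1 - r2, axis=1)
        vol = L**6  # volume of the six-dimensional sampling region
        J = vol * np.mean(np.abs(psi_n(r1))**2 * np.abs(psi_m(r2))**2 * inv_r12)
        K = vol * np.mean((np.conj(psi_n(r1)) * np.conj(psi_m(r2))
                           * psi_m(r1) * psi_n(r2) * inv_r12).real)
        return J, K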
Going back to the original question of the expectation energy of the complete
system of I electrons, it turns out that it can be written in terms of the various
inner products above as
$$\langle E\rangle = \sum_{n=1}^I E^e_n + \tfrac12\sum_{n=1}^I\sum_{\underline n=1}^I J_{n\underline n} - \tfrac12\sum_{n=1}^I\sum_{\underline n=1}^I \langle l_n|l_{\underline n}\rangle^2\, K_{n\underline n} \qquad (6.25)$$

The spin inner product $\langle l_n|l_{\underline n}\rangle$ is one if the orbitals have the same spin, and zero if they have opposite spin, so the square is somewhat superfluous. Consider it a reminder that if you want, you can shove the spins into $K_{n\underline n}$ to make it a spin, rather than spatial, orbital inner product.
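Once the ingredients are available, assembling (6.25) is mechanical. A sketch, with the spin inner products supplied as a matrix of zeros and ones:

    import numpy as np

    def expectation_energy(E_single, J, K, same_spin):
        # <E> of (6.25).  E_single[n] holds E_n^e of (6.22); J and K are the
        # Coulomb and exchange matrices (6.23), (6.24); same_spin[n, m] is 1
        # where orbitals n and m have equal spin (the squared spin product).
        return (np.sum(E_single)
                + 0.5 * np.sum(J)
                - 0.5 * np.sum(same_spin * K))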
If you want to see where all this comes from, the derivations are in note
{A.53}. There are also some a priori things you can say about the Coulomb
and exchange integrals, {A.54}; they are real, and additionally
$$J_{nn} = K_{nn} \qquad J_{n\underline n} \ge K_{n\underline n} \ge 0 \qquad J_{n\underline n} = J_{\underline n n} \qquad K_{n\underline n} = K_{\underline n n} \qquad (6.26)$$
So in terms of linear algebra, they are real symmetric matrices with nonnegative
coefficients and the same main diagonal.
The analysis can easily be extended to generalized orbitals that take the
form
$$\psi^e_n(\vec r, S_z) = \psi^e_{n+}(\vec r)\uparrow(S_z) + \psi^e_{n-}(\vec r)\downarrow(S_z).$$
However, the normal unrestricted spin-up or spin-down orbitals, in which either $\psi^e_{n+}$ or $\psi^e_{n-}$ is zero, already satisfy the variational requirement $\delta\langle E\rangle = 0$ even if generalized variations in the orbitals are allowed, {A.55}.
In any case, the expectation value of energy has been found.

6.3.4 The canonical Hartree-Fock equations


The previous section found the expectation value of energy for any electron wave
function described by a single Slater determinant. The final step is to find the
orbitals that produce the best approximation of the true wave function using
such a single determinant. For the ground state, the best single determinant
would be the one with the lowest expectation value of energy. But surely you
would not want to guess spatial orbitals at random until you find some with
really, really, low energy.
What you would like to have is specific equations for the best spatial orbitals
that you can then solve in a methodical way. And you can have them using the
methods of section 6.1, {A.56}. In unrestricted Hartree-Fock, for every spatial
orbital $\psi^e_n(\vec r)$ there is an equation of the form:
$$h^e\psi^e_n(\vec r) + \sum_{\underline n=1}^I \left\langle \psi^e_{\underline n}(\vec{\underline r})\middle| v^{ee}\middle| \psi^e_{\underline n}(\vec{\underline r})\right\rangle \psi^e_n(\vec r)
- \sum_{\underline n=1}^I \langle l_n|l_{\underline n}\rangle^2 \left\langle \psi^e_{\underline n}(\vec{\underline r})\middle| v^{ee}\middle| \psi^e_n(\vec{\underline r})\right\rangle \psi^e_{\underline n}(\vec r) = \epsilon_n \psi^e_n(\vec r) \qquad (6.27)$$

These are called the “canonical Hartree-Fock equations.” For equations valid
for the restricted closed-shell and single-determinant open-shell approximations,
see the derivation in {A.56}.
Recall that $h^e$ is the single-electron Hamiltonian consisting of its kinetic energy and its potential energy due to nuclear attractions, and that $v^{ee}$ is the potential energy of repulsion between two electrons at given locations:
$$h^e = -\frac{\hbar^2}{2m_e}\nabla^2 - \sum_{j=1}^J \frac{Z_j e^2}{4\pi\epsilon_0}\,\frac{1}{r_j}, \quad r_j \equiv |\vec r - \vec r^{\,n}_j| \qquad\qquad v^{ee} = \frac{e^2}{4\pi\epsilon_0}\,\frac{1}{\underline r}, \quad \underline r \equiv |\vec{\underline r} - \vec r|$$

So, if there were no electron-electron repulsions, i.e. $v^{ee} = 0$, the canonical equations above would turn into single-electron Hamiltonian eigenvalue problems of the form $h^e \psi^e_n = \epsilon_n \psi^e_n$, where $\epsilon_n$ would be the energy of the single-electron orbital. This is really what happened in the approximate analysis of atoms in
chapter 4.9: the electron to electron repulsions were ignored there in favor of
nuclear strength reductions, and the result was single-electron hydrogen-atom
orbitals.
In the presence of electron to electron repulsions, the equations for the or-
bitals can still symbolically be written as if they were single-electron eigenvalue
problems,
$$\mathcal F\, \psi^e_n(\vec r) l_n(S_z) = \epsilon_n\, \psi^e_n(\vec r) l_n(S_z)$$
where $\mathcal F$ is called the “Fock operator,” and is written out further as:
$$\mathcal F = h^e + v^{HF}.$$
The first term in the Fock operator is the single-electron Hamiltonian. The
mischief is in the innocuous-looking second term $v^{HF}$. Supposedly, this is the potential energy related to the repulsion by the other electrons. What is it? Well, it will have to be the terms in the canonical equations (6.27) not described by the single-electron Hamiltonian $h^e$:
$$v^{HF}\psi^e(\vec r)l(S_z) = \sum_{\underline n=1}^I \left\langle \psi^e_{\underline n}(\vec{\underline r})\middle| v^{ee}\middle| \psi^e_{\underline n}(\vec{\underline r})\right\rangle \psi^e(\vec r)l(S_z)
- \sum_{\underline n=1}^I \left\langle \psi^e_{\underline n}(\vec{\underline r})l_{\underline n}(S_{z1})\middle| v^{ee}\middle| \psi^e(\vec{\underline r})l(S_{z1})\right\rangle \psi^e_{\underline n}(\vec r)l_{\underline n}(S_z)$$

To recover the canonical equations (6.27) from the Fock form, take an inner
product with the spin $l(S_z)$. The definition of the Fock operator is unavoidably
in terms of spin rather than spatial single-electron functions: the spin of the
state on which it operates must be known to evaluate the final term.
Note that the above expression did not give an expression for $v^{HF}$ by itself, but only for $v^{HF}$ applied to an arbitrary single-electron function $\psi^e l$. The reason is that $v^{HF}$ is not a normal potential at all: the second term, the one due to the exchange integrals, does not multiply $\psi^e l$ by a potential function, it shoves it into an inner product! The Hartree-Fock “potential” $v^{HF}$ is an operator, not a normal potential energy. Given a single-electron function, it produces another
function.
Actually, even that is not quite true. The Hartree-Fock “potential” is only
an operator after you have found the orbitals $\psi^e_1 l_1, \psi^e_2 l_2, \ldots, \psi^e_n l_n, \ldots, \psi^e_I l_I$ ap-
pearing in it. While you are still trying to find them, the Fock “operator” is
not even an operator, it is just a “thing.” However, given the orbitals, at least
the Fock operator is a Hermitian one, one that can be taken to the other side
if it appears in an inner product, and that has real eigenvalues and a complete
set of eigenfunctions, {A.57}.
So how do you solve the canonical Hartree-Fock equations for the orbitals $\psi^e_n$? If the Hartree-Fock potential $v^{HF}$ were a known operator, you would have only linear, single-electron eigenvalue problems to solve. That would be relatively easy, as far as those things come. But since the operator $v^{HF}$ contains the
unknown orbitals, you do not have a linear problem at all; it is a system of
coupled cubic equations in infinitely many unknowns. The usual way to solve
it is iteratively: you guess an approximate form of the orbitals and plug it
into the Hartree-Fock potential. With this guessed potential, the orbitals may
then be found from solving linear eigenvalue problems. If all goes well, the
obtained orbitals, though not perfect, will at least be better than the ones
that you guessed at random. So plug those improved orbitals into the Hartree-
Fock potential and solve the eigenvalue problems again. Still better orbitals
should result. Keep going until you get the correct solution to within acceptable
accuracy.
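Stripped of everything that makes it hard, the iteration might be sketched as below. Here build_fock is a hypothetical helper that assembles the Fock matrix in some fixed basis from the current guess for the occupied orbitals; writing it is where the real work hides.

    import numpy as np

    def scf_loop(build_fock, orbitals, n_occupied, tol=1e-8, max_iter=100):
        # Iterate: freeze v^HF at the current guess, solve the resulting
        # linear eigenvalue problem, and repeat until self-consistent.
        eps_old = None
        for _ in range(max_iter):
            F = build_fock(orbitals)             # Fock matrix for this guess
            eps, vectors = np.linalg.eigh(F)     # linear eigenvalue problem
            orbitals = vectors[:, :n_occupied]   # keep the I lowest orbitals
            if eps_old is not None and np.max(np.abs(eps - eps_old)) < tol:
                return eps, orbitals             # potential is now consistent
            eps_old = eps
        raise RuntimeError("SCF iteration did not converge")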

You will know when you have got the correct solution since the Hartree-Fock
potential will no longer change; the potential that you used to compute the final
set of orbitals is really the potential that those final orbitals produce. In other
words, the final Hartree-Fock potential that you compute is consistent with the
final orbitals. Since the potential would be a field if it was not an operator, that
explains why such an iterative method to compute the Hartree-Fock solution is
called a “self-consistent field method.” It is like calling an iterative scheme for
the Laplace equation on a mesh a “self-consistent neighbors method,” instead
of “point relaxation.” Surely the equivalent for Hartree-Fock, like “iterated
potential” or “potential relaxation” would have been much clearer to a general
audience?

6.3.5 Additional points

This brief section was not by any means a tutorial of the Hartree-Fock method.
The purpose was only to explain the basic ideas in terms of the notations and
coverage of this book. If you actually want to apply the method, you will need
to take up a book written by experts who know what they are talking about.
The book by Szabo and Ostlund [18] was the main reference for this section,
and is recommended as a well written introduction. Below are some additional
concepts you may want to be aware of.
Meaning of the orbital energies


In the single electron case, the “orbital energy” $\epsilon_n$ in the canonical Hartree-Fock equation
$$h^e\psi^e_n(\vec r) + \sum_{\underline n=1}^I \left\langle \psi^e_{\underline n}(\vec{\underline r})\middle| v^{ee}\middle| \psi^e_{\underline n}(\vec{\underline r})\right\rangle \psi^e_n(\vec r)
- \sum_{\underline n=1}^I \langle l_n|l_{\underline n}\rangle^2 \left\langle \psi^e_{\underline n}(\vec{\underline r})\middle| v^{ee}\middle| \psi^e_n(\vec{\underline r})\right\rangle \psi^e_{\underline n}(\vec r) = \epsilon_n \psi^e_n(\vec r)$$

represents the actual energy of the electron. It also represents the ionization
energy, the energy required to take the electron away from the nuclei and leave
it far away at rest. This subsubsection will show that in the multiple electron
case, the “orbital energies” ǫn are not orbital energies in the sense of giving the
contributions of the orbitals to the total expectation energy. However, they can
still be taken to be approximate ionization energies. This result is known as
“Koopmans' theorem.”
To verify the theorem, a suitable equation for $\epsilon_n$ is needed. It can be found by taking an inner product of the canonical equation above with $\psi^e_n(\vec r)$, i.e. by putting $\psi^e_n(\vec r)^*$ to the left of both sides and integrating over $\vec r$. That produces
$$\epsilon_n = E^e_n + \sum_{\underline n=1}^I J_{n\underline n} - \sum_{\underline n=1}^I \langle l_n|l_{\underline n}\rangle^2 K_{n\underline n} \qquad (6.28)$$

which consists of the single-electron energy $E^e_n$, Coulomb integrals $J_{n\underline n}$ and exchange integrals $K_{n\underline n}$ as defined in subsection 6.3.3. It can already be seen that if all the $\epsilon_n$ are summed together, it does not produce the total expectation energy (6.25), because that one includes a factor $\frac12$ in front of the Coulomb and exchange integrals. So, $\epsilon_n$ cannot be seen as the part of the system energy associated with orbital $\psi^e_n l_n$ in any meaningful sense.
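With the same arrays as in the expectation-energy sketch of subsection 6.3.3, the orbital energies (6.28) and the double-counting mismatch are one line each:

    import numpy as np

    def orbital_energies(E_single, J, K, same_spin):
        # eps_n of (6.28): note the missing factor 1/2 compared to (6.25),
        # so sum(eps) exceeds <E> by half the net electron-electron energy.
        return E_single + J.sum(axis=1) - (same_spin * K).sum(axis=1)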
However, $\epsilon_n$ can still be viewed as an approximate ionization energy. Assume that the electron is removed from orbital $\psi^e_n l_n$, leaving the electron at infinite distance at rest. No, scratch that; all electrons share orbital $\psi^e_n l_n$, not just one. Assume that one electron is removed from the system and that the remaining $I-1$ electrons stay out of the orbital $\psi^e_n l_n$. Then, if it is assumed that the other orbitals do not change, the new system’s Slater determinant is the same as the original system’s, except that column $n$ and a row have been removed. The expectation energy of the new state then equals the original expectation energy, except that $E^e_n$ and the $n$-th column plus the $n$-th row of the Coulomb and exchange integral matrices have been removed. The energy removed is then exactly $\epsilon_n$ above. (While $\epsilon_n$ only involves the $n$-th row of the matrices, not the $n$-th column, it does not have the factor $\frac12$ in front of them like the expectation
energy does. And rows equal columns in the matrices, so half the row in $\epsilon_n$ counts as the half column in the expectation energy and the other half as the half row. This counts the element $\underline n = n$ twice, but that is zero anyway since $J_{nn} = K_{nn}$.)
So by the removal of the electron “from” (read: and) orbital $\psi^e_n l_n$, an amount of energy $\epsilon_n$ has been removed from the expectation energy. Better put, a positive amount of energy $-\epsilon_n$ has been added to the expectation energy. So the ionization energy is $-\epsilon_n$ if the electron is removed from orbital $\psi^e_n l_n$ according to this story.
Of course, the assumption that the other orbitals do not change after the
removal of one electron and orbital is dubious. If you were a lithium electron in
the expansive 2s state, and someone removed one of the two inner 1s electrons,
would you not want to snuggle up a lot more closely to the now much less
shielded three-proton nucleus? On the other hand, in the more likely case that
someone removed the 2s electron, it would probably not seem like that much of
an event to the remaining two 1s electrons near the nucleus, and the assumption
that the orbitals do not change would appear more reasonable. And normally,
when you say ionization energy, you are talking about removing the electron
from the highest energy state.
But still, you should really recompute the remaining two orbitals from the
canonical Hartree-Fock equations for a two-electron system to get the best,
lowest, energy for the new I − 1 electron ground state. The energy you get by
not doing so and just sticking with the original orbitals will be too high. Which
means that all else being the same, the ionization energy will be too high too.
However, there is another error of importance here, the error in the Hartree-
Fock approximation itself. If the original and final system would have the same
Hartree-Fock error, then it would not make a difference and ǫn would overes-
timate the ionization energy as described above. But Szabo and Ostlund [18,
p. 128] note that Hartree-Fock tends to overestimate the energy for the original
larger system more than for the final smaller one. The difference in Hartree-
Fock error tends to compensate for the error you make by not recomputing
the final orbitals, and in general the orbital energies provide reasonable first
approximations to the experimental ionization energies.
The opposite of ionization energy is “electron affinity,” the energy with which
the atom or molecule will bind an additional free electron [in its valence shell],
{A.60}. It is not to be confused with electronegativity, which has to do with
willingness to take on electrons in chemical bonds, rather than free electrons.
To compute the electron affinity of an atom or molecule with I electrons us-
ing the Hartree-Fock method, you can either recompute the I + 1 orbitals with
the additional electron from scratch, or much easier, just use the Fock operator of the $I$ electrons to compute one more orbital $\psi^e_{I+1} l_{I+1}$. In the latter case, however, the energy of the final system will again be higher than Hartree-Fock, and
it being the larger system, the Hartree-Fock energy will be too high compared
to the I-electron system already. So now the errors add up, instead of subtract
as in the ionization case. If the final energy is too high, then the computed
binding energy will be too low, so you would expect $\epsilon_{I+1}$ to underestimate the
electron affinity relatively badly. That is especially so since affinities tend to
be relatively small compared to ionization energies. Indeed Szabo and Ostlund
[18, p. 128] note that while many neutral molecules will take up and bind a free
electron, producing a stable negative ion, the orbital energies almost always
predict negative binding energy, hence no stable ion.

Asymptotic behavior
The exchange terms in the Hartree-Fock potential are not really a potential,
but an operator. It turns out that this makes a major difference in how the
probability of finding an electron decays with distance from the system.
Consider again the Fock eigenvalue problem, but with the single-electron
Hamiltonian identified in terms of kinetic energy and nuclear attraction,

h̄2 2 e XI D ¯ ¯ E
− ∇ ψn (~r) + v Ne ψne (~r) + ψne ¯¯v ee ¯¯ψne ψne (~r)
2me n=1
I
X D ¯ ¯ E
− hln |ln i2 ψne ¯¯v ee ¯¯ψne ψne (~r) = ǫn ψne (~r)
n=1

Now consider the question which of these terms dominate at large distance from
the system and therefore determine the large-distance behavior of the solution.
The first term that can be thrown out is $v^{Ne}$, the Coulomb potential due to
the nuclei; this potential decays to zero approximately inversely proportional to
the distance from the system. (At large distance from the system, the distances
between the nuclei can be ignored, and the potential is then approximately the
one of a single point charge with the combined nuclear strengths.) Since ǫn in
the right hand side does not decay to zero, the nuclear term cannot survive
compared to it.
Similarly the third term, the Coulomb part of the Hartree-Fock potential,
cannot survive since it too is a Coulomb potential, just with a charge distribution
given by the orbitals in the inner product.
However, the final term in the left hand side, the exchange part of the
Hartree-Fock potential, is more tricky, because the various parts of this sum
have other orbitals outside of the inner product. This term can still be ignored
for the slowest-decaying spin-up and spin-down states, because for them none
of the other orbitals is any larger, and the multiplying inner product still decays
like a Coulomb potential (faster, actually). Under these conditions the kinetic
energy will have to match the right hand side, implying



$$\text{slowest decaying orbitals:}\quad \psi^e_n(\vec r) \sim \exp\left(-\sqrt{-2m_e\epsilon_n}\; r/\hbar + \ldots\right)$$

From this expression, it can also be seen that the ǫn values must be negative,
or else the slowest decaying orbitals would not have the exponential decay with
distance of a bound state.
The other orbitals, however, cannot be less than the slowest decaying one
of the same spin by more than algebraic factors: the slowest decaying orbital
with the same spin appears in the exchange term sum and will have to be
matched. So, with the exchange terms included, all orbitals normally decay
slowly, raising the chances of finding electrons at significant distances. The
decay can be written as
$$\psi^e_n(\vec r) \sim \exp\left(-\sqrt{2m_e\,|\epsilon_m|_{\rm min,\ same\ spin,\ no\ ss}}\; r/\hbar + \ldots\right) \qquad (6.29)$$

where ǫm is the ǫ value of smallest magnitude (absolute value) among all the
orbitals with the same spin.
However, in the case that $\psi^e_n(\vec r)$ is spherically symmetric, (i.e. an s state),
exclude other s-states as possibilities for ǫm . The reason is a peculiarity of the
Coulomb potential that makes the inner product appearing in the exchange term
exponentially small at large distance for two orthogonal, spherically symmetric
states. (For the incurably curious, it is a result of Maxwell’s first equation
applied to a spherically symmetric configuration like figure 9.7, but with multiple
spherically distributed charges rather than one, and the net charge being zero.)

Hartree-Fock limit
The Hartree-Fock approximation greatly simplifies finding a many-dimensional
wave function. But really, solving the “eigenvalue problems” (6.27) for the
orbitals iteratively is not that easy either. Typically, what one does is to write
the orbitals $\psi^e_n$ as sums of chosen single-electron functions $f_1, f_2, \ldots$. You can
then pre-compute various integrals in terms of those functions. Of course, the
number of chosen single-electron functions will have to be a lot more than the
number of orbitals I; if you are only using I chosen functions, it really means
that you are choosing the orbitals $\psi^e_n$ rather than computing them.
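A sketch of the expansion itself, with two hypothetical 1s-type functions placed on two nuclei; all positions and coefficients below are illustrative only.

    import numpy as np

    def orbital_from_basis(coeffs, basis_functions):
        # psi(r) = sum_k c_k f_k(r): the orbital is not solved for directly;
        # only the coefficients c_k are adjusted during the iteration.
        def psi(r):
            return sum(c * f(r) for c, f in zip(coeffs, basis_functions))
        return psi

    # Two made-up exponential basis functions on nuclei at z = -0.7 and +0.7:
    f1 = lambda r: np.exp(-np.linalg.norm(r - np.array([0.0, 0.0, -0.7])))
    f2 = lambda r: np.exp(-np.linalg.norm(r - np.array([0.0, 0.0, +0.7])))
    psi = orbital_from_basis([0.5, 0.5], [f1, f2])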
But you do not want to choose too many functions either, because the re-
quired numerical effort will go up. So there will be an error involved; you will
not get as close to the true best orbitals as you can. One thing this means is
that the actual error in the ground state energy will be even larger than true
Hartree-Fock would give. For that reason, the Hartree-Fock value of the ground
state energy is called the “Hartree-Fock limit:” it is how close you could come to
the correct energy if you were able to solve the Hartree-Fock equations exactly.
Configuration interaction
According to the previous subsubsection, to compute the Hartree-Fock solution
accurately, you want to select a large number of single-electron functions to
represent the orbitals. But don’t start using zillions of them. The bottom line
is that the Hartree-Fock solution still has a finite error, because a wave function
cannot in general be described accurately using only a single Slater determinant.
So what is the point in computing the wrong numbers to ten digits accuracy?
You might think that the error in the Hartree-Fock approximation would
be called something like “Hartree-Fock error,” “single determinant error,” or
“representation error,” since it is due to an incomplete representation of the true wave function. However, the error is called “correlation energy” because there is an energizing correlation between the more impenetrable and poorly defined your
jargon, and the more respect you will get for doing all that incomprehensible
stuff, {A.58}.
Anyway, in view of the fact that even an exact solution to the Hartree-Fock
problem has a finite error, trying to get it exactly right is futile. At some stage,
you would be much better off spending your efforts trying to reduce the inherent
error in the Hartree-Fock approximation itself by including more determinants.
As noted in section 4.7, if you include enough orthonormal basis functions,
using all their possible Slater determinants, you can approximate any function
to arbitrary accuracy.
After the $I$, (or $I/2$ in the restricted closed-shell case,) orbitals $\psi^e_1 l_1, \psi^e_2 l_2, \ldots$ have been found, the Hartree-Fock operator becomes just a Hermitian operator, and can be used to compute further orthonormal orbitals $\psi^e_{I+1} l_{I+1}, \psi^e_{I+2} l_{I+2}, \ldots$. You can add these to the stew, say to get a better approximation to the true
You can add these to the stew, say to get a better approximation to the true
ground state wave function of the system.
You might want to try to start small. If you compute just one more orbital $\psi^e_{I+1} l_{I+1}$, you can already form $I$ more Slater determinants: you can replace any of the $I$ orbitals in the original determinant by the new function $\psi^e_{I+1} l_{I+1}$. So you can now approximate the true wave function by the more general expression
$$\Psi = a_0\Big(\left|\det(\psi^e_1 l_1, \psi^e_2 l_2, \psi^e_3 l_3, \ldots, \psi^e_I l_I)\right\rangle
+ a_1\left|\det(\psi^e_{I+1} l_{I+1}, \psi^e_2 l_2, \psi^e_3 l_3, \ldots, \psi^e_I l_I)\right\rangle
+ a_2\left|\det(\psi^e_1 l_1, \psi^e_{I+1} l_{I+1}, \psi^e_3 l_3, \ldots, \psi^e_I l_I)\right\rangle
+ \ldots
+ a_I\left|\det(\psi^e_1 l_1, \psi^e_2 l_2, \psi^e_3 l_3, \ldots, \psi^e_{I+1} l_{I+1})\right\rangle\Big).$$

where the coefficients $a_1, a_2, \ldots$ are to be chosen to approximate the ground state energy more closely and $a_0$ is a normalization constant.
The additional $I$ Slater determinants are called “excited determinants”. For example, the first excited state $\left|\det(\psi^e_{I+1} l_{I+1}, \psi^e_2 l_2, \psi^e_3 l_3, \ldots, \psi^e_I l_I)\right\rangle$ is like a state where you excited an electron out of the lowest state $\psi^e_1 l_1$ into an elevated energy state $\psi^e_{I+1} l_{I+1}$. (However, note that if you really wanted to satisfy the variational requirement $\delta\langle E\rangle = 0$ for such a state, you would have to recompute the orbitals from scratch, using $\psi^e_{I+1} l_{I+1}$ in the Fock operator instead of $\psi^e_1 l_1$. That is not what you want to do here; you do not want to create totally new orbitals, just more of them.)
It may seem that this must be a winner: as many as $I$ more determinants to further minimize the energy. Unfortunately, now you pay the price for doing
such a great job with the single determinant. Since, hopefully, the Slater deter-
minant is the best single determinant that can be formed, any changes that are
equivalent to simply changing the determinant’s orbitals will do no good. And
it turns out that the $I+1$-determinant wave function above is equivalent to the single-determinant wave function
$$\Psi = a_0\left|\det(\psi^e_1 l_1 + a_1\psi^e_{I+1} l_{I+1},\; \psi^e_2 l_2 + a_2\psi^e_{I+1} l_{I+1},\; \ldots,\; \psi^e_I l_I + a_I\psi^e_{I+1} l_{I+1})\right\rangle$$

as you can check with some knowledge of the properties of determinants. Since
you already have the best single determinant, all your efforts are going to be
wasted if you try this.
You might try forming another set of $I$ excited determinants by replacing one of the orbitals in the original Hartree-Fock determinant by $\psi^e_{I+2} l_{I+2}$ instead of $\psi^e_{I+1} l_{I+1}$, but the fact is that the variational condition $\delta\langle E\rangle = 0$ is still going to be satisfied when the wave function is the original Hartree-Fock one. For small changes in wave function, the additional determinants can still be pushed inside the Hartree-Fock one. To ensure a decrease in energy, you want to include determinants that allow a nonzero decrease in energy even for small changes from the original determinant, and that requires “doubly” excited determinants, in which two different original states are replaced by excited ones like $\psi^e_{I+1} l_{I+1}$ and $\psi^e_{I+2} l_{I+2}$.
Note that you can form $I(I-1)$ such determinants; the number of deter-
minants rapidly explodes when you include more and more orbitals. And a
mathematically convergent process would require an asymptotically large set of
orbitals, compare chapter 4.7. How big is your computer?
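The explosion is easy to quantify: distributing $I$ electrons over $K$ available spin orbitals gives “$K$ choose $I$” distinct determinants, as the short check below illustrates for ten electrons.

    from math import comb

    def n_determinants(K, I):
        # Number of distinct Slater determinants from K spin orbitals.
        return comb(K, I)

    for K in (10, 20, 30, 40):
        print(K, n_determinants(K, 10))
    # prints: 10 1, 20 184756, 30 30045015, 40 847660528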
Most people would probably call improving the wave function representa-
tion using multiple Slater determinants something like “multiple-determinant
representation,” or maybe “excited-determinant correction” or so. However,
it is called “configuration interaction,” because every non-expert will wonder
whether the physicist is talking about the configuration of the nuclei or the
electrons, (actually, it refers to the practitioner “configuring” all those determi-
nants, no kidding,) and what it is interacting with (with the person bringing
in the coffee, of course. OK.) If you said that you were performing a “config-
uration interaction” while actually doing, say, some finite difference or finite
element computation, just because it requires you to specify a configuration of
mesh points, some people might doubt your sanity. But in physics, the stan-
dards are not so high.
Chapter 7

Solids

After the quantum mechanics of single atoms, and then of multiple atoms com-
bined into molecules, the next step up is to have a look at the quantum me-
chanics of the countless atoms that make up a macroscopic solid.
The discussion will remain restricted to solids that have a “crystal structure,”
where on atomic scales the atoms are packed together in a regular manner.
Some important materials, like glass and plastic, are amorphous, they do not
have such a regular crystal structure, and neither do liquids, so not all the ideas
will apply to them.

7.1 Molecular Solids [Descriptive]


The hydrogen molecule is the most basic example in quantum mechanics of how
atoms can combine into molecules in order to share electrons. So the question suggests itself: if hydrogen molecules are brought close together in a solid, will the atoms start sharing their electrons not just with one other atom, but with all surrounding atoms? The answer under normal conditions is no.
Metals do that, but hydrogen under normal conditions does not. Hydrogen
atoms are very happy when combined in pairs, and have no desire to reach
out to further atoms and weaken the strong bond they have already created.
Normally hydrogen is a gas, not a metal.
However, if you cool hydrogen way down to 20 K, it will eventually condense
into a liquid, and if you cool it down even further to 14 K, it will then freeze into
a solid. That solid still consists of hydrogen molecules, so it is called a molecular
solid. (Note that solidified noble gases, say frozen neon, are called molecular
solids too, even though they are made up of atoms rather than molecules.)
The forces that glue the hydrogen molecules together in the liquid and solid
phases are called Van der Waals forces, and more specifically, they are called
London forces. (Van der Waals forces are often understood to be all intermolec-


ular forces, not just London forces.) London forces are also the only forces that
can glue noble gas atoms together. These forces are weak.
It is exactly because these forces are so weak that hydrogen must be cooled
down so much to condense it into liquid and finally freeze it. At the time of this
writing, that is a significant issue in the “hydrogen economy.” Unless you go
to very unusual temperatures and pressures, hydrogen is a very thin gas, hence
extremely bulky.
Helium is even worse; it must be cooled down to 4 K to condense it into a
liquid, and under normal pressure it will not freeze into a solid at all. These
two, helium and hydrogen, are the worst elements of them all, and the reason is
that their atoms are so small. Van der Waals forces increase with size.
To explain why the London forces occur is easy; there are in fact two expla-
nations that can be given. There is a simple, logical, and convincing explanation
that can easily be found on the web, and that is also completely wrong. And
there is a weird quantum explanation that is also correct, {A.59}.
If you are the audience that this book is primarily intended for, you may
already know the London forces under the guise of the Lennard-Jones potential.
London forces produce an attractive potential between atoms that is propor-
tional to $1/d^6$, where $d$ is a scaled distance between the molecules. So the
Lennard-Jones potential is taken to be

$$V_{\rm LJ} = C\left(d^{-12} - d^{-6}\right) \qquad\qquad (7.1)$$
where C is a constant. The second term represents the London forces.
The first term in the Lennard-Jones potential is there to model the fact
that when the atoms get close enough, they rapidly start repelling instead of
attracting each other. (See section 4.10 for more details.) The power 12 is
computationally convenient, since it makes the first term just the square of the
second one. However, theoretically it is not very justifiable. A theoretically
more reasonable repulsion would be one of the form $\bar{C}e^{-d/c}/d^n$, with $\bar{C}$, $c$, and
n suitable constants, since that reflects the fact that the strength of the electron
wave functions ramps up exponentially when you get closer to an atom. But
practically, the Lennard-Jones potential works very well; the details of the first
term make no big difference as long as the potential ramps up quickly.
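If you want to see numbers, the short Python sketch below evaluates (7.1); the value C = 1 and the sampled range of scaled distances are arbitrary choices made here for illustration, not anything from the physics. It confirms that the potential has its minimum at $d = 2^{1/6} \approx 1.12$, where $V_{\rm LJ} = -C/4$:

import numpy as np

def lennard_jones(d, C=1.0):
    # Lennard-Jones potential (7.1); d is the scaled distance, C a constant.
    return C * (d**-12 - d**-6)

d = np.linspace(0.95, 3.0, 2000)       # sampled range: an arbitrary choice
V = lennard_jones(d)
print("numerical minimum near d =", d[np.argmin(V)])
print("analytical minimum: d =", 2**(1.0/6), "V =", lennard_jones(2**(1.0/6)))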
Molecular solids may be held together by other Van der Waals forces besides
London forces. Some molecules have an electron distribution that is shifted
towards one side of the molecule or the other. It means that the average position
of the negative electron charges is different from that of the positive nuclei, and
such a molecule is said to have a “dipole strength.” The molecules can arrange
themselves so that the negative sides of the molecules are close to the positive
sides of other molecules and vice versa, producing attraction.
Chemguide [2] notes: “Surprisingly dipole-dipole attractions are fairly mi-
nor compared with dispersion [London] forces, and their effect can only really
be seen if you compare two molecules with the same number of electrons and
the same size.” One reason is that thermal motion tends to kill off the dipole
attractions by messing up the alignment between molecules. But note that the
dipole forces act on top of the London ones, so everything else being the same,
the molecules with a dipole strength will be bound together more strongly.
When more than one molecular species is around, species with inherent
dipoles can induce dipoles in other molecules that normally do not have them.
Another way molecules can be kept together in a solid is by what are called
“hydrogen bonds.” In a sense, they too are dipole-dipole forces. In this case, the
molecular dipole is created when the electrons are pulled away from hydrogen
atoms. This leaves a partially uncovered nucleus, since a hydrogen atom does
not have any other electrons to shield it. Since it allows neighboring molecules
to get very close to a nucleus, hydrogen bonds can be strong. They remain a
lot weaker than a typical chemical bond, though.

Key Points
⋄ Even neutral molecules that do not want to create other bonds can be
glued together by various “Van der Waals forces.”
⋄ These forces are weak, though hydrogen bonds are much less so.
⋄ The London type of Van der Waals force affects all molecules, even noble
gas atoms.
⋄ London forces can be modeled using the Lennard-Jones potential.
⋄ London forces are one of these weird quantum effects. Molecules with
inherent dipole strength feature a more classically understandable ver-
sion of such forces.

7.2 Ionic Solids [Descriptive]


A typical example of an ionic solid is ordinary salt, NaCl. There is little quanti-
tative quantum mechanics required to describe either the salt molecule or solid
salt. Still, there are some important qualitative points, so it seems useful to
include a discussion in this book. Both molecule and solid will be described in
this section, since the ideas are very similar.
To form a NaCl salt molecule, a chlorine atom takes the loosely bound lone
3s electron away from a natrium (sodium) atom and puts it in its single still
vacant 3p position. That leaves a negative chlorine ion with filled K, L, and
M shells and a positive natrium ion with just filled K and L shells. Since the
combined electron distribution of filled shells is spherically symmetric, you can
reasonably think of the two ions as somewhat soft billiard balls. Since they have
opposite charge, they stick together into a salt molecule as sketched in figure
7.1. The natrium ion is a bit less than two Å in diameter, the chlorine one a bit
less than four.

Figure 7.1: Billiard-ball model of the salt molecule: an Na$^+$ ion next to a Cl$^-$ ion.

The energetics of this process is rather interesting. Assume that you start
out with a neutral natrium atom and a neutral chlorine atom that are far apart.
To take the lone 3s electron out of the natrium atom, and leave it at rest at a
position far from either the natrium or the chlorine atom, takes an amount of
energy called the “ionization energy” of natrium. Its value is 5.14 eV (electron
volts).
To take that free electron at rest and put it into the vacant 3p position of
the chlorine atom gives back an amount of energy called the “electron affinity” of
chlorine. Its value is 3.62 eV.
(Electron affinity, the willingness to take on free electrons, is not to be con-
fused with “electronegativity,” the willingness to take on electrons in chemical
bonds. Unlike electronegativity, electron affinity varies wildly from element to
element in the periodic table. There is some system in it, still, especially within
single columns. It may also be noted that there seems to be some disagreement
about the definition of electronegativity, in particular for atoms or molecules
that cannot stably bind a free electron, {A.60}.)
Anyway, since it takes 5.14 eV to take the electron out of natrium, and you
get only 3.62 eV back by putting it into chlorine, you may wonder how a salt
molecule could ever be stable. But the described picture is very misleading.
It does not really take 5.14 eV to take the electron out of natrium; most of
that energy is used to pull the liberated electron and positive ion far apart. In
the NaCl molecule, they are not pulled far apart; the positive natrium ion and
negative chlorine ion stick together as in figure 7.1.
In other words, to create the widely separated positive natrium ion and
negative chlorine ion took 5.14 − 3.62 eV, but watch the energy that is recov-
ered when the two ions are brought together to their correct 2.36 Å separation
distance in the molecule. It is approximately given by the Coulomb expression

$$\frac{e^2}{4\pi\epsilon_0}\,\frac{1}{d}$$
where $\epsilon_0 = 8.85\times10^{-12}$ C$^2$/J m is the permittivity of space and $d$ is the 2.36
Å distance between the nuclei. Putting in the numbers, dropping an $e$ to get
the result in eV, this energy is 6.1 eV. That gives the total binding energy as
−5.14 + 3.62 + 6.1, or 4.58 eV. That is not quite right, but it is close; the true
value is 4.26 eV.
There are a few reasons why it is slightly off, but one is that the Coulomb
expression above is only correct if the ions were billiard balls that would move
unimpeded towards each other until they hit. Actually, the atoms are somewhat
softer than billiard balls; their mutual repulsion force ramps up quickly, but not
instantaneously. That means that the repulsion force will do a small amount
of negative work during the final part of the approach of the ions. Also, the
uncertainty principle does not allow the localized ions to have exactly zero
kinetic energy. But as you see, these are small effects. It may also be noted
that the repulsion between the ions is mostly Pauli repulsion, as described in
section 4.10.
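You can redo this bookkeeping in a few lines of Python; the constants below are generic handbook values rather than anything special to this book, and rounding explains the small differences from the numbers above:

from math import pi

e    = 1.602e-19   # proton charge [C]
eps0 = 8.854e-12   # permittivity of space [C^2/J m]
d    = 2.36e-10    # separation of the nuclei in the molecule [m]

# Coulomb energy e^2/(4 pi eps0 d); dividing by e once gives electron volts:
coulomb_eV = e / (4 * pi * eps0 * d)
print("Coulomb energy: %.2f eV" % coulomb_eV)     # about 6.1 eV
# Ionization energy out, electron affinity and Coulomb energy back in:
print("binding energy: %.2f eV" % (-5.14 + 3.62 + coulomb_eV))   # about 4.58 eV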

Figure 7.2: Billiard-ball model of a salt crystal.

Now the electrostatic force that keeps the two ions together in the molecule is
omni-directional. That means that if you bring a lot of salt molecules together,
the chlorine ions will also attract the natrium ions of other molecules and vice
versa. As a result, under normal conditions, salt molecules pack together into
solid salt crystals, as shown in figure 7.2. The ions arrange themselves very
neatly into a pattern that allows each ion to be surrounded by as many attracting
ions of the opposite kind as possible. In fact, as figure 7.2 indicates, each ion
is surrounded by six ions of the opposite kind: four in the same vertical plane,
a fifth behind it, and a sixth in front of it. A more detailed description of the
crystal structure will be given next, but first consider what it means for the
energy.

Since when the molecules pack into a solid, each ion gets next to six ions
of the opposite type, the simplest guess would be that the 6.1 eV Coulomb
attraction of the ions in the molecule would increase by a factor 6 in the solid.
But that is a bad approximation: in the solid, each ion is not just surrounded by
six attracting ions of the opposite kind, but also by twelve repelling ions of the
same kind that are only slightly further away, then again eight attracting ions
still a bit further away, etcetera. The net effect is that the Coulomb attraction
is only 1.75 times higher in the solid than the lone molecules would have. The
factor 1.75 is called the “Madelung constant.” So, all else being the same, by
forming a salt crystal the salt molecules would raise their Coulomb attraction
to 1.75 × 6.1 or 10.7 eV.
That is still not quite right, because in the solid, the ions are farther apart
than in the molecule. Recall that in the solid, each attracting ion is surrounded
by repelling ions of the opposite kind, reducing the attraction between pairs.
In the solid, opposite ions are 2.82 Å apart instead of 2.36, so the Coulomb
energy reduces to 10.7 × 2.36/2.82 or 8.93 eV. Still, the bottom line is that
the molecules pick up about 2.8 eV more Coulomb energy by packing together
into salt crystals, and that is quite a bit of energy. So it should not come as a
surprise that salt must be heated as high as 801◦ C to melt it, and as high as
1465◦ C to boil it.
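The corresponding arithmetic for the solid is equally short; the 6.1 eV molecular value is just the estimate obtained above:

coulomb_molecule_eV = 6.10        # molecular Coulomb energy estimated above [eV]
madelung = 1.75                   # Madelung constant of the NaCl lattice
d_molecule, d_solid = 2.36, 2.82  # ion spacings [Angstrom]

coulomb_solid_eV = madelung * coulomb_molecule_eV * d_molecule / d_solid
print("Coulomb energy in the solid: %.2f eV" % coulomb_solid_eV)   # about 8.9 eV
print("gained by crystallizing: %.1f eV" %
      (coulomb_solid_eV - coulomb_molecule_eV))                    # about 2.8 eV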
Finally, consider the crystal structure that the molecules combine into. One
way of thinking of it is as a three-dimensional chess board structure. In figure
7.2, think of the frontal plane as a chess board of black and white cubes, with
a natrium nucleus in the center of each white cube and a chlorine nucleus in the
center of each black one. The next plane of atoms can similarly be considered
to consist of black and white cubes, where the black cubes are behind the white
cubes of the frontal plane and vice-versa. And the same way for further planes.
However, this is not how a material scientist would think about the structure.
A material scientist likes to describe a crystal in terms of copies of a simple unit,
called the “basis,” that are stacked together in a regular manner. One possible
choice for the basis in salt is a single natrium ion plus a single chlorine ion to
the right of it, like the molecule of figure 7.1. In figure 7.3 the ions of the salt
crystal have been moved far apart to make the actual structure visible, and the
two atoms of the basis units have been joined by a blue line. Note that the
entire structure consists of these basis units.
But also note that the molecules lose their identity in an ionic solid. You
could just as well build up the crystal from vertical “molecules,” say, instead of
horizontal ones. In fact, there are six reasonable choices of basis, depending on
which of its six surrounding chlorine ions you want to associate each natrium
ion with. There are of course always countless unreasonable ones. . .
The regular way in which the bases are stacked together to form the complete
crystal structure is called the “lattice.” You can think of the volume of the salt
Figure 7.3: The salt crystal disassembled to show its structure.

crystal as consisting of little cubes called “unit cells” indicated by the red frames
in figure 7.3. There are chlorine atoms at the corners of the cubes as well as at
the center points of the faces of the cubes. That is the reason the salt lattice is
called the “face centered cubic” (fcc) lattice. Also note that if you shift the unit
cells half a cell to the left, it will be the natrium ions that are at the corners
and face centers of the cubes. In general, every point of a basis is arranged in
the crystal according to the same lattice.
You will agree that it sounds much more professional to say that you have
studied the face-centered cubic arrangement of the basis in a NaCl crystal than
to say that you have studied the three-dimensional chess board structure of salt.

Key Points
⋄ In a fully ionic bond like NaCl, one atom takes an electron away from
another.
⋄ The positive and negative ions stick together by electrostatic force,
creating a molecule.
⋄ Because of the same electrostatic force, molecules clump together into
strong ionic solids.
⋄ The crystal structure of NaCl consists of copies of a two-atom NaCl
basis arranged in a face-centered cubic lattice.

7.3 Introduction to Band Structure [Descriptive]
Quantum mechanics is essential to describe the properties of solid materials, just
as it was for atoms and molecules. One well known example is superconductivity,
in which current flows without any resistance. The complete absence of any
resistance cannot be explained by classical physics, just like superfluidity cannot
for fluids.
But even normal electrical conduction simply cannot be explained without
quantum theory. Consider the fact that at ordinary temperatures, metals have
electrical resistivities of a few times $10^{-8}$ ohm-m (and up to a hundred thousand
times less at very low temperatures), while Wikipedia lists a resistivity for Teflon
of up to $10^{24}$ ohm-m. (Teflon’s “one-minute” resistivity can be up to $10^{19}$ ohm-m.)
That is a difference in resistance between the best conductors and the best
insulators by over thirty orders of magnitude!
There is simply no way that classical physics could even begin to explain it.
As far as classical physics is concerned, all of these materials are quite similar
combinations of positive nuclei and negative electrons. Consider an ordinary
sewing needle. You would have as little trouble supporting its tiny 60 µg weight
as a metal has conducting electricity. But multiply it by $10^{30}$, well, don’t worry
about supporting its weight. Worry about the entire earth coming up over your
ears and engulfing you, because the needle now has ten times the mass of the
earth. That is how widely different the electrical conductivities of solids are.

Figure 7.4: Sketch of electron energy spectra in solids, for a metal (left) and an insulator (right), distinguishing occupied and unoccupied states.

Only quantum mechanics can explain why it is possible, by making the


electron energy levels discrete, and more importantly, by grouping them together
in “bands.” Sketches of electron energy spectra in solids are shown in figure
7.4. Since macroscopic solids have so many electrons, they have large numbers
of discrete energy states, many more than could ever be shown in a figure like
figure 7.4. But these energy levels cluster into bands with gaps in between
them. (As shown in section 7.4, and from another perspective in section 7.8,
the formation of bands arises from the interaction of the valence electrons with
the crystal structure.)


For metals, this banding has no great consequences. For them, it requires
only a small amount of energy to excite electrons from the normally occupied
states to slightly higher energy levels in which they can achieve a net motion
of electric charge. Or of heat, incidentally; the electrons in metals are also very
good at conducting heat.
But in insulators, the electrons completely fill up a band, called the “valence
band,” while the next higher energy band, called the “conduction band,” is
empty and starts at an energy that is much higher. This jump in energy is
called the “band gap.” To
create a combination of slightly excited energy states that describe net electron
motion is no longer possible for an insulator, since there are no such states. The
electrons are stuck into doing nothing.
Of course, if the electrons are somehow given enough additional energy to
cross the band gap, conduction is again possible. In fact, then both the electrons
in the conduction band and the holes they leave behind in the valence band will
allow conduction to occur. A small number of electrons may get such energy
through random heat motion, especially if the band gap is relatively small. Also,
stray atoms of the wrong element may be present. Stray atoms with too few
valence electrons can create vacancies in the valence band. On the other hand,
stray atoms with too many valence electrons can put these electrons into the
conduction band. In either case, the strays will allow some conduction.
Such changes in electrical properties can also be made deliberately for various
purposes, such as in semi-conductor applications. Energy can be provided in
the form of light, heat, or voltage, stray atoms can deliberately be added by
“doping” the material with another one, and materials with holes in their valence
bands can physically be joined to materials with electrons in their conduction
band, to create various very interesting effects at the contact surface.

Key Points
⋄ Even excluding superconductivity, the electrical conductivities of solids
vary enormously.
⋄ Quantum mechanics allows only discrete energy levels for the electrons,
and these levels group together in bands with gaps in between them.
⋄ If in the ground state the electrons fill the spectrum right up to a gap
between bands, the electrons are stuck. It will require a large amount
of energy to activate them to conduct electricity or heat. Such a solid
is an insulator, or if the band gap is small enough, a semi-conductor.
⋄ For semi-conductors, conduction can occur because some electrons
from the valence band are thermally excited to the conduction band.
Impurities can also cause conduction.

⋄ Semi-conductor applications manipulate the electrons at the band gap.

7.4 Metals [Descriptive]


Metals are unique in the sense that there is no true molecular equivalent to the
way the atoms are bound together in metals. In a metal, the valence electrons
are shared on crystal scales, rather than between pairs of atoms. This and
subsequent sections will discuss what this really means in terms of quantum
mechanics.

7.4.1 Lithium
The simplest metal is lithium. Before examining solid lithium, first consider
once more the free lithium atom. Figure 7.5 gives a more realistic picture of the
atom than the simplistic analysis of chapter 4.9 did. The atom is really made
up of two tightly bound electrons in $|1s\rangle$ states very close to the nucleus, plus
a loosely bound third “valence” electron in an expansive $|2s\rangle$ state. The core,
consisting of the nucleus and the two closely bound 1s electrons, resembles a
helium atom that has picked up an additional proton in its nucleus. It will be
referred to as the “atom core.” As far as the 2s electron is concerned, this entire
atom core is not that much different from a hydrogen nucleus: it is compact
and has a net charge equivalent to one proton.

Figure 7.5: The lithium atom, scaled more correctly than in chapter 4.9.

One obvious question is then why under normal circumstances lithium is


a solid metal and hydrogen is a thin gas. The quantitative difference is that
a single-charge core has a favorite distance at which it would like to hold its
electron, the Bohr radius. In the hydrogen atom, the electron is about at the
Bohr radius, and hydrogen holds onto it tightly. It is willing to share electrons
with one other hydrogen atom, but after that, it is satisfied. It is not looking
for any other hydrogen molecules to share electrons with; that would weaken
the bond it already has. On the other hand, the 2s electron in the lithium atom
is only loosely attached and readily given up or shared among multiple atoms.

Figure 7.6: Body-centered-cubic (bcc) structure of lithium.

Now consider solid lithium. The perfect lithium crystal would look as
sketched in figure 7.6. The atom cores arrange themselves in a regular, re-
peating, pattern called the “crystal structure.” As indicated in the figure by
the thick red lines, you can think of the total crystal volume as consisting of
many identical little cubes called “(unit) cells.” There are atom cores at all
eight corners of these cubes and there is an additional core in the center of the
cubic cell. In solid mechanics, this arrangement of positions is referred to as the
“body-centered cubic” (bcc) lattice. The crystal “basis” for lithium is a single
lithium atom, (or atom core, really); if you put a single lithium atom at every
point of the bcc lattice, you get the complete lithium crystal.
Around the atom cores, the 2s electrons form a fairly homogeneous electron
density distribution. In fact, the atom cores get close enough together that a
typical 2s electron is no closer to the atom core to which it supposedly “belongs”
than to the surrounding atom cores. Under such conditions, the model of the
2s electrons being associated with any particular atom core is no longer really
meaningful. It is better to think of them as belonging to the solid as a whole,


moving freely through it like an electron “gas.”
Under normal conditions, bulk lithium is “poly-crystalline,” meaning that
it consists of many microscopically small crystals, or “grains,” each with the
above bcc structure. The “grain boundaries” where different crystals meet
are crucial to understanding the mechanical properties of the material, but not so
much to understanding its electrical or heat properties, and their effects will be
ignored. Only perfect crystals will be discussed.

Key Points
⋄ Lithium can meaningfully be thought of as an atom core, with a net
charge of one proton, and a 2s valence electron around it.
⋄ In the solid, the cores arrange themselves into a “body-centered cubic”
(bcc) lattice.
⋄ The 2s electrons form an “electron gas” around the cores.
⋄ Normally the solid, like other solids, does not have the same crystal
lattice throughout, but consists of microscopic grains, each crystalline,
(i.e. with its lattice oriented its own way).
⋄ The grain structure is critical for mechanical properties like strength
and plasticity. But that is another book.

7.4.2 One-dimensional crystals


Even the quantum mechanics of a perfect crystal like the lithium one described
above is not very simple. So it is a good idea to start with an even simpler
crystal. The easiest example would be a “crystal” consisting of only two atoms,
but two lithium atoms do not make a lithium crystal, they make a lithium
molecule.
Fortunately, there is a dirty trick to get a “crystal” with only two atoms:
assume that nature keeps repeating itself as indicated in figure 7.7. Mathe-
matically, this is called “using periodic boundary conditions.” It assumes that
after moving towards the left over a distance called the period, you are back at
the same point as you started, as if you are walking around in a circle and the
period is the circumference.
Of course, this is an outrageous assumption. If nature repeats itself at all,
and that is doubtful at the time of this writing, it would be on a cosmological
scale, not on the scale of two atoms. But the fact remains that if you make
the assumption that nature repeats, the two-atom model gives a much better
description of the mathematics of a true crystal than a two-atom molecule would.
And if you add more and more atoms, the point where nature repeats itself
Figure 7.7: Fully periodic wave function of a two-atom lithium “crystal.”

moves further and further away from the typical atom, making it less and less
of an issue for the local quantum mechanics.

Key Points
⋄ Periodic boundary conditions are very artificial.
⋄ Still, for crystal lattices, periodic boundary conditions often work very
well.
⋄ And nobody is going to put any real grain boundaries into any basic
model of solids anyway.

7.4.3 Wave functions of one-dimensional crystals


To describe the energy eigenstates of the electrons in one-dimensional crystals in
simple terms, a further assumption must be made: that the detailed interactions
between the electrons can be ignored, except for the exclusion principle. Trying
to correctly describe the complex interactions between the large numbers of
electrons found in a macroscopic solid is simply impossible. And it is not really
such a bad assumption as it may appear. In a metal, electron wave functions
overlap greatly, and when they do, electrons see other electrons in all directions,
and effects tend to cancel out. The equivalent in classical gravity is where you
go down far below the surface of the earth. You would expect that gravity would
become much more important now that you are surrounded by big amounts of
mass at all sides. But they tend to cancel each other out, and gravity is actually
reduced. Little gravity is left at the center of the earth. It is not recommended
as a vacation spot anyway due to excessive pressure and temperature.
In any case, it will be assumed that for any single electron, the net effect of
the atom cores and smeared-out surrounding 2s electrons produces a periodic
potential that near every core resembles that of an isolated core. In particular, if
the atoms are spaced far apart, the potential near each core is exactly the one of
a free lithium atom core. For an electron in this two-atom “crystal,” the intuitive
eigenfunctions would then be where it is around either the first or the second
core in the 2s state, (or rather, taking the periodicity into account, around every
first or every second core in each period.) Alternatively, since these two states
are equivalent, quantum mechanics allows the electron to hedge its bets and to
be about each of the two cores at the same time with some probability.

Figure 7.8: Flip-flop wave function of a two-atom lithium “crystal.”

But as soon as the atoms are close enough to start noticeably affecting
each other, only two true energy eigenfunctions remain, and they are ones in
which the electron is around both cores with equal probability. There is one
eigenfunction that is exactly the same around both of the atom cores. This
eigenfunction is sketched in figure 7.7; it is periodic from core to core, rather
than merely from pair of cores to pair of cores. The second eigenfunction is the
same from core to core except for a change of sign, call it a flip-flop eigenfunction.
It is shown in figure 7.8. Since the grey-scale electron probability distribution
only shows the magnitude of the wave function, it looks periodic from atom to
atom, but the actual wave function is only the same after moving along two
atoms.
To avoid the grey fading away, the shown wave functions have not been
normalized; the darkness level is as if the 2s electrons of both the atoms are in
that state.
As long as the atoms are far apart, the wave functions around each atom
closely resemble the isolated-atom $|2s\rangle$ state. But when the atoms get closer
together, differences start to show up. Note for example that the flip-flop wave
function is exactly zero half way in between two cores, while the fully periodic
one is not. To indicate the deviations from the true free-atom |2si wave function,
parenthetical superscripts will be used.
Figure 7.9: Wave functions of a four-atom lithium “crystal.” From atom to atom, the four modes change by factors $1$, $i$, $-i$, and $-1$ respectively; the actual picture is that of the fully periodic mode.

A one-dimensional crystal made up from four atoms is shown in figure 7.9.


Now there are four energy eigenstates. The energy eigenstate that is the same
from atom to atom is still there, as is the flip-flop one. But there is now also an
energy eigenstate that changes by a factor i from atom to atom, and one that
changes by a factor −i. They change more slowly from atom to atom than the
flip-flop one: it takes two atom distances for them to change sign. Therefore it
takes a distance of four atoms, rather than two, for them to return to the same
values.

Key Points
⋄ The electron energy eigenfunctions in a metal like lithium extend over
the entire crystal.
⋄ If the cores are relatively far apart, near each core the energy eigen-
function of an electron still resembles the 2s state of the free lithium
atom.
⋄ However, the magnitude near each core is of course much less, since
the electron is spread out over the entire crystal.
⋄ Also, from core to core, the wave function changes by a factor of mag-
nitude one.
⋄ The extreme cases are the fully periodic wave function that changes
by a factor one (stays the same) from core to core, versus the flip-flop
mode that changes sign completely from one core to the next.
⋄ The other eigenfunctions change by an amount in between these two
extremes from core to core.

7.4.4 Analysis of the wave functions


There is a pattern to the wave functions of one-dimensional crystals as discussed
in the previous subsection. First of all, while the spatial energy eigenfunctions
of the crystal are different from those of the individual atoms, their number is
the same. Four free lithium atoms would each have one $|2s\rangle$ spatial state to put
their one 2s electron in. Put them in a crystal, and there are still four spatial
states to put the four 2s electrons in. But the four spatial states in the crystal
are no longer single atom states; each now extends over the entire crystal. The
atoms share all the electrons. If there were eight atoms, the eight atoms would
share the eight 2s electrons in eight possible crystal-wide states. And so on.
To be very precise, a similar thing is true of the inner 1s electrons. But
since the $|1s\rangle$ states remain well apart, the effects of sharing the electrons are
trivial, and describing the 1s electrons as belonging pair-wise to a single lithium
nucleus is fine. In fact, you may recall that the antisymmetrization requirement
of electrons requires every electron in the universe to be slightly present in every
occupied state around every atom. Obviously, you would not want to consider
that in the absence of a non-trivial need.
The reason that the energy eigenfunctions take the form shown in figure 7.9
is relatively simple. It follows from the fact that the Hamiltonian commutes
with the “shift operator” that shifts the entire wave function over one atom
spacing $\vec d$. After all, because the potential energy is exactly the same after such
a shift, it does not make a difference whether you evaluate the energy before or
after you shift the wave function over.
Now commuting operators have a common set of eigenfunctions, so the en-
ergy eigenfunctions can be taken to be also eigenfunctions of the shift operator.
The shift eigenvalue must have magnitude one, since periodic wave functions
cannot change in overall magnitude when shifted. So the eigenvalue describing
the effect of an atom-spacing shift on an energy eigenfunction can be written
as $e^{i2\pi\nu}$ with $\nu$ a real number. (The factor $2\pi$ does nothing except rescale the
value of ν. Apparently, crystallographers do not even put it in. This book does
so that you do not feel short-changed because other books have factors 2π and
yours does not.)
This can be verified for the example energy eigenfunctions shown in figure
7.9. For the fully periodic eigenfunction $\nu = 0$, making the shift eigenvalue
$e^{i2\pi\nu}$ equal to one. So this eigenfunction is multiplied by one under a shift by
one atom spacing $d$: it is the same after such a shift. For the flip-flop mode,
$\nu = \frac12$; this mode changes by $e^{i\pi} = -1$ under a shift over an atom spacing $d$.
That means that it changes sign when shifted over an atom spacing $d$. For the
two intermediate eigenfunctions $\nu = \pm\frac14$, so, using the Euler formula (1.5), they
change by factors $e^{\pm i\pi/2} = \pm i$ for each shift over a distance $d$.


In general, for a J-atom periodic crystal, there will be J values of $\nu$ in the
range $-\frac12 < \nu \le \frac12$. In particular for an even number of atoms J:

$$\nu = \frac{j}{J} \qquad \mbox{for } j = -\frac{J}{2}+1,\; -\frac{J}{2}+2,\; -\frac{J}{2}+3,\; \ldots,\; \frac{J}{2}-1,\; \frac{J}{2}$$

Note that for these values of $\nu$, if you move over J atom spacings, $e^{i2\pi\nu J} = 1$
as it should; according to the imposed periodic boundary conditions, the wave
functions must be the same after J atoms. Also note that it suffices for j to be
restricted to the range $-J/2 < j \le J/2$, hence $-\frac12 < \nu \le \frac12$: if j is outside that
range, you can always add or subtract a whole multiple of J to bring it back in
that range. And changing j by a whole multiple of J does absolutely nothing
to the shift eigenvalue $e^{i2\pi\nu}$ since $e^{i2\pi J/J} = e^{i2\pi} = 1$.
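These claims are easy to check numerically. The little Python sketch below lists the allowed $\nu$ values and shift eigenvalues for the four-atom crystal of figure 7.9, and verifies that J shifts return every eigenfunction to itself:

import numpy as np

def shift_eigenvalues(J):
    # nu = j/J for j = -J/2+1, ..., J/2, so that -1/2 < nu <= 1/2.
    j = np.arange(-J // 2 + 1, J // 2 + 1)
    nu = j / J
    return nu, np.exp(1j * 2 * np.pi * nu)

nu, lam = shift_eigenvalues(4)
print(nu)                       # [-0.25  0.    0.25  0.5 ]
print(np.round(lam, 12))        # -i, 1, +i, -1: the factors of figure 7.9
print(np.allclose(lam**4, 1))   # True: same wave function after J = 4 shifts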

7.4.5 Floquet (Bloch) theory


Mathematically it is awkward to describe the energy eigenfunctions piecewise,
as figure 7.9 does. To arrive at a better way, it is helpful first to replace the
axial Cartesian coordinate z by a new “crystal coordinate” u defined by

$$z\,\hat k = u\,\vec d \qquad\qquad (7.2)$$

where $\vec d$ is the vector shown in figure 7.9 that has the length of one atom spacing
d. Material scientists call this vector the “primitive translation vector” of the
crystal lattice. Primitive vector for short.
The advantage of the crystal coordinate u is that if it changes by one unit,
it changes the z-position by exactly one atom spacing. As noted in the previous
subsection, such a shift should multiply an energy eigenfunction by a factor
$e^{i2\pi\nu}$. A continuous function that does that is the exponential $e^{i2\pi\nu u}$. And that
means that if you factor out that exponential from the energy eigenfunction,
what is left does not change under the shift; it will be periodic on atom scale.
In other words, the energy eigenfunctions can be written in the form

$$\psi^e = e^{i2\pi\nu u}\,\psi^e_p$$

where $\psi^e_p$ is a function that is periodic on the atom scale $d$; it is the same in
each successive interval d.
This result is part of what is called “Floquet theory:”

If the Hamiltonian is periodic of period d, the energy eigenfunctions


are not in general periodic of period d, but they do take the form of
exponentials times functions that are periodic of period d.

In physics, this result is known as “Bloch’s theorem,” and the Floquet-type wave
function solutions are called “Bloch functions” or “Bloch waves,” because Flo-
quet was just a mathematician, and the physicists’ hero is Bloch, the physicist
who succeeded in doing it too, half a century later. {A.61}.
The periodic part $\psi^e_p$ of the energy eigenfunctions is not the same as the
$|2s\rangle^{(.)}$ states of figure 7.9, because $e^{i2\pi\nu u}$ varies continuously with the crystal
position $z = ud$, unlike the factors shown in figure 7.9. However, since the
magnitude of $e^{i2\pi\nu u}$ is one, the magnitudes of $\psi^e_p$ and the $|2s\rangle^{(.)}$ states are the
same, and therefore, so are their grey scale electron probability pictures.
It is often more convenient to have the energy eigenfunctions in terms of the
Cartesian coordinate z instead of the crystal coordinate u, writing them in the
form
$$\psi^e_k = e^{ikz}\,\psi^e_{p,k} \qquad \mbox{with } \psi^e_{p,k} \mbox{ periodic on the atom scale } d \qquad\qquad (7.3)$$

The constant $k$ in the exponential is called the wave number, and subscripts
$k$ have been added to $\psi^e$ and $\psi^e_p$ just to indicate that they will be different
for different values of this wave number. Since the exponential must still equal
$e^{i2\pi\nu u}$, clearly the wave number $k$ is proportional to $\nu$. Indeed, substituting
$z = ud$ into $e^{ikz}$, $k$ can be traced back to be

$$k = \nu D \qquad D = \frac{2\pi}{d} \qquad -\tfrac{1}{2} < \nu \le \tfrac{1}{2} \qquad\qquad (7.4)$$

7.4.6 Fourier analysis


As the previous subsection explained, the energy eigenfunctions in a crystal take
the form of a Floquet exponential times a periodic function $\psi^e_{p,k}$. This periodic
part is not normally an exponential. However, it is generally possible to write
it as an infinite sum of exponentials:

$$\psi^e_{p,k} = \sum_{m=-\infty}^{\infty} c_{k_m} e^{ik_m z} \qquad k_m = mD \mbox{ for } m \mbox{ an integer} \qquad\qquad (7.5)$$

where the $c_{k_m}$ are constants whose values will depend on $x$ and $y$, as well as on
$k$ and the integer $m$.
Writing the periodic function $\psi^e_{p,k}$ as such a sum of exponentials is called
“Fourier analysis,” after another French mathematician. That it is possible
follows from the fact that these exponentials are the atom-scale-periodic eigen-
functions of the $z$-momentum operator $p_z = \hbar\partial/i\partial z$, as is easily verified by
straight substitution. Since the eigenfunctions of an Hermitian operator like $p_z$
are complete, any atom-scale-periodic function, including $\psi^e_{p,k}$, can be written
as a sum of them. See also {A.5}.
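Numerically, this decomposition is a one-liner. In the Python sketch below, the periodic test function is made up purely for the occasion; the FFT then returns the coefficients $c_{k_m}$ of (7.5) from samples of the function on one period d:

import numpy as np

d = 1.0                     # atom spacing (arbitrary units)
D = 2 * np.pi / d           # spacing k_m = m D of the Fourier wave numbers
N = 64                      # samples per period
z = np.arange(N) * d / N
psi_p = 1.0 + 0.5 * np.cos(D * z) + 0.2 * np.sin(2 * D * z)  # made-up example

c = np.fft.fft(psi_p) / N   # c[m] multiplies exp(i k_m z)
print(np.round(c[0], 6))                      # 1: the m = 0 coefficient
print(np.round(c[1], 6), np.round(c[-1], 6))  # 0.25 each: the cosine splits evenly
print(np.round(c[2], 6), np.round(c[-2], 6))  # -0.1j and +0.1j: the sine is odd in m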

7.4.7 The reciprocal lattice


As the previous two subsections discussed, the energy eigenfunctions in a one-
dimensional crystal take the form of a Floquet exponential eikz times a periodic
e
function ψp,k . That periodic function can be written as a sum of Fourier expo-
nentials eikm z . It is a good idea to depict all those k-values graphically, to keep
them apart. That is done in figure 7.10.

Figure 7.10: Reciprocal lattice of a one-dimensional crystal. The wave numbers $k_{-3}, \ldots, k_3$ are spaced $D$ apart; the first Brillouin zone is the interval around the origin, with fragments of the second and third zones beyond it.

The Fourier $k$ values, $k_m = mD$ with $m$ an integer, form a lattice of points


spaced a distance D apart. This lattice is called the “reciprocal lattice.” The
spacing of the reciprocal lattice, D = 2π/d, is proportional to the reciprocal
of the atom spacing d in the physical lattice. Since on a macroscopic scale the
atom spacing d is very small, the spacing of the reciprocal lattice is very large.
The Floquet $k$ value, $k = \nu D$ with $-\frac12 < \nu \le \frac12$, is somewhere in the
grey range in figure 7.10. This range is called the first “Brillouin zone.” It is
an interval, a unit cell if you want, of length D around the origin. The first
Brillouin zone is particularly important in the theory of solids. The fact that
the Floquet k value may be assumed to be in it is but one reason.
To be precise, the Floquet k value could in principle be in an interval of
length $D$ around any wave number $k_m$, not just the origin, but if it is, you can
shift it to the first Brillouin zone by splitting off a factor $e^{ik_m z}$ from the Floquet
exponential $e^{ikz}$. The $e^{ik_m z}$ can be absorbed in a redefinition of the Fourier series
for the periodic part $\psi^e_{p,k}$ of the wave function, and what is left of the Floquet $k$
value is in the first zone. Often it is good to do so, but not always. For example,
in the analysis of the free-electron gas done later, it is critical not to shift the
k value to the first zone because you want to keep the (there trivial) Fourier
series intact.
The first Brillouin zone consists of the points that are closest to the origin on the
$k$-axis, and similarly the second zone consists of the points that are second closest
to the origin. The points in the interval of length $D/2$ in between $k_{-1}$ and the
first Brillouin zone make up half of the second Brillouin zone: they are closest
to $k_{-1}$, but second closest to the origin. Similarly, the other half of the second
Brillouin zone is given by the points in between k1 and the first Brillouin zone.
In one dimension, the boundaries of the Brillouin zone fragments are called the
“Bragg points.” They are either reciprocal lattice points or points half way in
between those.
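The splitting-off of a factor $e^{ik_m z}$ amounts to subtracting a whole multiple of $D$ from $k$. A minimal Python sketch of that folding operation (the function name and the test value are made up here):

import numpy as np

def fold_to_first_zone(k, d=1.0):
    # Split off the nearest reciprocal lattice point k_m = m D, leaving a
    # wave number in the first Brillouin zone. (The half-open boundary case
    # is glossed over; round() picks one side arbitrarily there.)
    D = 2 * np.pi / d
    return k - np.round(k / D) * D

k = 3.7 * 2 * np.pi            # some Floquet wave number, in units where d = 1
print(round(float(fold_to_first_zone(k)) / (2 * np.pi), 6))   # -0.3, i.e. nu = -0.3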

7.4.8 The energy levels


Valence band. Conduction band. Band gap. Crystal. Lattice. Basis. Unit cell.
Primitive vector. Bloch wave. Fourier analysis. Reciprocal lattice. Brillouin
zones. These are the jargon of solid-state physics; now they have all been defined.
(Though certainly not fully discussed.) But jargon is not physics. The physically
interesting question is what are the energy levels of the energy eigenfunctions.

Figure 7.11: Schematic of energy bands. Each graph shows the 2s energy levels, running from the fully periodic mode at the bottom to the flip-flop mode at the top, against the separation $d$ between the atoms; the marker indicates the actual separation.

For the two-atom crystal of figures 7.7 and 7.8, the answer is much like that
for the hydrogen molecular ion of chapter 3.5 and hydrogen molecule of chapter
4.2. In particular, when the atom cores are far apart, the $|2s\rangle^{(.)}$ states are the
same as the free lithium atom wave function $|2s\rangle$. In either the fully periodic
or the flip-flop mode, the electron is with 50% probability in that state around
each of the two cores. That means that at large spacing d between the cores,
the energy is the 2s free lithium atom energy, whether it is the fully periodic or
flip-flop mode. That is shown in the left graph of figure 7.11.
When the distance d between the atoms decreases so that the 2s wave func-
tions start to noticeably overlap, things change. As the same left graph in figure
7.11 shows, the energy of the flip-flop state increases, but that of the fully pe-
riodic state initially decreases. The reasons for the latter are similar to those
that gave the symmetric hydrogen molecular ion and hydrogen molecule states
lower energy. In particular, the electrons pick up more effective space to move
in, decreasing their uncertainty-principle demanded kinetic energy. Also, when
the electron clouds start to merge, the repulsion between electrons is reduced,
allowing the electrons to lose potential energy by getting closer to the nuclei of
the neighboring atoms. (Note however that the simple model used here would
not faithfully reproduce that since the repulsion between the electrons is not
correctly modeled.)

Next consider the case of a four-atom crystal, as shown in the second graph
of figure 7.11. The fully periodic and flip-flop states are unchanged, and so
are their energies. But there are now two additional states. Unlike the fully
periodic state, these new states vary from atom to atom, but less rapidly than
the flip-flop mode. As you would then guess, their energy is somewhere in between
that of the fully periodic and flip-flop states. Since the two new states have equal
energy, they are shown as a double line in figure 7.11. The third graph in that
figure shows the energy levels of an 8-atom crystal, and the final graph those of
a 24-atom crystal. When the number of atoms increases, the energy levels become
denser and denser. By the time you reach a one-million-atom one-dimensional
crystal, a hundredth of an inch long, you can safely assume that the energy levels within the
band have a continuous, rather than discrete distribution.
Now recall that the Pauli exclusion principle allows up to two electrons in
a single spatial energy state. Since there are an equal number of spatial states
and electrons, that means that the electrons can pair up in the lowest half of the
states. The upper states will then be unoccupied. Further, the actual separation
distance between the atoms will be the one for which the total energy of the
crystal is smallest. The energy spectrum at this actual separation distance is
found inside the vanishingly thin vertical frame in the rightmost graph of figure
7.11. It shows that lithium forms a metal with a partially-filled band.
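Figure 7.11 can be mimicked with a caricature. The nearest-neighbor coupling form below, $E = E_{2s} + 2t\cos(2\pi\nu)$ with $t < 0$, is a standard “tight-binding” assumption brought in here purely for illustration; this section does not derive it, but it does reproduce the fanning-out of the levels between the fully periodic and flip-flop extremes:

import numpy as np

def band_levels(J, E2s=0.0, t=-1.0):
    # Energies of the J crystal states; nu = j/J as in subsection 7.4.4.
    j = np.arange(-J // 2 + 1, J // 2 + 1)
    return np.sort(E2s + 2 * t * np.cos(2 * np.pi * j / J))

for J in (2, 4, 8, 24):
    E = band_levels(J)
    print("%2d atoms: %d levels from %+.2f to %+.2f" % (J, len(E), E[0], E[-1]))
# The band edges stay put at E_2s - 2|t| (fully periodic) and E_2s + 2|t|
# (flip-flop), while the levels in between become denser and denser.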

7.4.9 Electrical conduction


What exciting things happen to the energy levels of the previous subsection when
a voltage is applied and a current starts to run? That question, for once, is easy
to answer. Nothing happens to them. Nothing at all. Electron energies are in
terms of electron volts. For an applied electric field to have any effect on them,
it would need to drop say one volt per atom. An electric cord would require
billions of volts to affect its electron energy levels, not the 100 or 200 volts found in
a household electrical outlet.
To understand why a current runs, recall the Floquet/Fourier representation
of the energy eigenfunctions. The Floquet and Fourier exponentials that the en-
ergy eigenfunctions consist of are eigenfunctions of the $\hat p_z$ momentum operator:
these energy eigenfunctions have in general $z$-momentum. (The exceptions are the
fully periodic and flip-flop modes, for which the expectation linear momentum is
zero, like it is for the eigenfunctions of isolated atoms. But for the other states,
the z-momentum is not zero when the spacing between the atoms is finite.)
Linear momentum means in classical terms velocity; these wave functions have
the capability of describing nontrivial motion in the z-direction.
Figure 7.12 gives a sketch of what the energy levels of a 48 atom one-
dimensional lithium crystal might look like at a given atom spacing if they
are horizontally shifted apart to indicate their amount of expectation linear
Figure 7.12: Energy versus linear momentum $\langle p_z \rangle$ (left: no current; right: net current in the positive $z$-direction).

momentum. At the left, the occupied states are shown for the case that no
electric current is flowing, and at the right the occupied states when there is
current in the positive z-direction. Note that the energy states themselves do
not change; what changes is which ones are occupied. The distribution shifts
towards a positive average momentum $\langle p_z \rangle$. In classical terms, net $z$-momentum
means a net velocity of the electrons towards positive z.
As the figure shows, the net change in energy is that a few of the neg-
ative momentum states at the highest energy become vacated, and positive
momentum states of only slightly higher energy become occupied. So the en-
ergy requirements for this process are very small. Of course, the electrons pick
up this energy when moving towards lower potential. When they get scattered
by impurities or crystal vibrations, they lose their energy again, meaning that
there will be a slight but continued energy requirement to keep up the motion,
reflected in a nonzero electrical resistance.
There is a lot more involved if you want to describe actual conduction, but
the shift in occupation levels is the basic quantum mechanics of it. You will
probably find much of the rest of it described in terms of classical physics, since
that is so much easier.

7.4.10 Merging and splitting bands


The description of electrical conduction given in the previous subsections seems
to show that beryllium, (and similarly other metals of valence two,) is an insu-
lator! Two valence electrons per atom will completely fill up all 2s states, and
with all states filled, there would be no possibility for the average momentum
to shift away from zero. All states would be red in figure 7.12, so nothing could
change. What is missing here is consideration of the 2p atom states. When the
atoms are far enough apart not to affect each other, the 2p energy levels are a bit
higher than the 2s ones and not involved. However, as figure 7.13 shows, when
the atom spacing decreases to the actual one in a crystal, the widening bands
merge together. With this influx of 300% more states, valence-two metals have
plenty of free states to shift towards nonzero average momentum. Beryllium is
actually a better conductor than lithium.

Figure 7.13: Schematic of merging bands. The 2s and 2p levels are shown against the separation $d$ between the atoms; the marker indicates the actual separation.

Hydrogen is a more complicated story. Solid hydrogen consists of molecules


and the attractions between different molecules are weak. The proper model of
hydrogen is not a series of equally spaced atoms, but a series of pairs of atoms
joined into molecules, and with wide gaps between the molecules. When the two
atoms in a single molecule are brought together, the energy varies with distance
between the atoms much like the left graph in figure 7.11. The wave function
that is the same for the two atoms in the current simple model corresponds
to the normal covalent bond in which the electrons are symmetrically shared;
the flip-flop function that changes sign describes the “anti-bonding” state in
which the two electrons are anti-symmetrically shared. In the ground state,
both electrons go into the state corresponding to the covalent bond, and the
anti-bonding state stays empty. For multiple molecules, each of the two states
turns into a band, but since the interactions between the molecules are weak,
these two bands do not fan out much. So the energy spectrum of solid hydrogen
remains much like the left graph in figure 7.11, with the bottom curve becoming
a filled band and the top curve an empty one. An equivalent way to think of
this is that the 1s energy level of hydrogen does not fan out into a single band
like the 2s level of lithium, but into two half bands, since there are two spacings
involved; the spacing between the atoms in a molecule and the spacing between
molecules. In any case, because of the band gap energy required to reach the
empty upper half 1s band, hydrogen is an insulator.

7.4.11 Three-dimensional metals


The ideas of the previous subsections generalize towards three-dimensional crys-
tals in a relatively straightforward way.

Figure 7.14: A primitive cell and primitive translation vectors $\vec d_1$, $\vec d_2$, and $\vec d_3$ of lithium.

As the lithium crystal of figure 7.14 illustrates, in a three-dimensional crystal


there are three “primitive translation vectors.” The three dimensional Cartesian
position $\vec r$ can be written as

$$\vec r = u_1 \vec d_1 + u_2 \vec d_2 + u_3 \vec d_3 \qquad\qquad (7.6)$$

where if any of the “crystal coordinates” $u_1$, $u_2$, or $u_3$ changes by exactly one


unit, it produces a physically completely equivalent position.
Note that the vectors $\vec d_1$ and $\vec d_2$ are two bottom sides of the “cubic unit cell”
defined earlier in figure 7.6. However, $\vec d_3$ is not the vertical side of the cube.
The reason is that primitive translation vectors must be chosen to allow you to
reach any point of the crystal from any equivalent point in whole steps. Now $\vec d_1$
and $\vec d_2$ allow you to step from any point in a horizontal plane to any equivalent
point in the same plane. But if $\vec d_3$ was vertically upwards like the side of the
cubic unit cell, stepping with $\vec d_3$ would miss every second horizontal plane. With
$\vec d_1$ and $\vec d_2$ defined as in figure 7.14, $\vec d_3$ must point to an equivalent point in an
immediately adjacent horizontal plane, not a horizontal plane farther away.
Despite this requirement, there are still many ways of choosing the primitive
translation vectors other than the one shown in figure 7.14. The usual way is
to choose all three to extend towards adjacent cube centers. However, then it
gets more difficult to see that no lattice point is missed when stepping around
with them.
The parallelepiped shown in figure 7.14, with sides given by the primitive
translation vectors, is called the “primitive cell.” It is the smallest building
block that can be stacked together to form the total crystal. The cubic unit cell
from figure 7.6 is not a primitive cell since it has twice the volume. The cubic
unit cell is instead called the “conventional cell.”
Since the primitive vectors are not unique, the primitive cell they define is not
either. These primitive cells are purely mathematical quantities; an arbitrary
choice for the smallest single volume element from which the total crystal volume
can be built up. The question suggests itself whether it would not be possible
to define a primitive cell that has some physical meaning; whose definition is
unique, rather than arbitrary. The answer is yes, and the unambiguously defined
primitive cell is called the “Wigner-Seitz cell.” The Wigner-Seitz cell around a
lattice point is the vicinity of locations that are closer to that lattice point than
to any other lattice point.

Figure 7.15: Wigner-Seitz cell of the bcc lattice.

Figure 7.15 shows the Wigner-Seitz cell of the bcc lattice. To the left, it is
shown as a wire frame, and to the right as an opaque volume element. To put it
within context, the atom around which this Wigner-Seitz cell is centered was also
put in the center of a conventional cubic unit cell. Note how the Wigner-Seitz
primitive cell is much more spherical than the parallelepiped-shaped primitive
cell shown in figure 7.14. The outside surface of the Wigner-Seitz cell consists
of hexagonal planes on which the points are just on the verge of getting closer
to a corner atom of the conventional unit cell than to the center atom, and of
squares on which the points are just on the verge of getting closer to the center
atom of an adjacent conventional unit cell. The squares are located within the
faces of the conventional unit cell.
The reason that the entire crystal volume can be built up from Wigner-Seitz
cells is simple: every point must be closest to some lattice point, so it must be
in some Wigner-Seitz cell. When a point is equally close to two nearest lattice
points, it is on the boundary where adjacent Wigner-Seitz cells meet.


Turning to the energy eigenfunctions, they can now be taken to be eigen-
functions of three shift operators; they will change by some factor $e^{i2\pi\nu_1}$ when
shifted over $\vec d_1$, by $e^{i2\pi\nu_2}$ when shifted over $\vec d_2$, and by $e^{i2\pi\nu_3}$ when shifted over
$\vec d_3$. All that just means that they must take the Floquet (Bloch) function form

$$\psi^e = e^{i2\pi(\nu_1 u_1 + \nu_2 u_2 + \nu_3 u_3)}\,\psi^e_p,$$

where $\psi^e_p$ is periodic on atom scales, exactly the same after one unit change in
any of the crystal coordinates $u_1$, $u_2$, or $u_3$.
It is again often convenient to write the Floquet exponential in terms of nor-
mal Cartesian coordinates. To do so, note that the relation giving the physical
position $\vec r$ in terms of the crystal coordinates $u_1$, $u_2$, and $u_3$,

$$\vec r = u_1 \vec d_1 + u_2 \vec d_2 + u_3 \vec d_3$$

can be inverted to give the crystal coordinates in terms of the physical position,
as follows:

$$u_1 = \frac{1}{2\pi}\,\vec D_1 \cdot \vec r \qquad u_2 = \frac{1}{2\pi}\,\vec D_2 \cdot \vec r \qquad u_3 = \frac{1}{2\pi}\,\vec D_3 \cdot \vec r \qquad\qquad (7.7)$$

(Again, factors $2\pi$ have been thrown in merely to fully satisfy even the most
demanding quantum mechanics reader.) To find the vectors $\vec D_1$, $\vec D_2$, and $\vec D_3$,
simply solve the expression for $\vec r$ in terms of $u_1$, $u_2$, and $u_3$ using linear algebra
procedures. In particular, they turn out to be the rows of the inverse of the matrix
$(\vec d_1, \vec d_2, \vec d_3)$.
If you do not know linear algebra, it can be done geometrically: if you dot
the expression for $\vec r$ above with $\vec D_1/2\pi$, you must get $u_1$; for that to be true, the
first three conditions below are required:

$$\begin{array}{lll}
\vec d_1 \cdot \vec D_1 = 2\pi, & \vec d_2 \cdot \vec D_1 = 0, & \vec d_3 \cdot \vec D_1 = 0, \\
\vec d_1 \cdot \vec D_2 = 0, & \vec d_2 \cdot \vec D_2 = 2\pi, & \vec d_3 \cdot \vec D_2 = 0, \\
\vec d_1 \cdot \vec D_3 = 0, & \vec d_2 \cdot \vec D_3 = 0, & \vec d_3 \cdot \vec D_3 = 2\pi.
\end{array} \qquad\qquad (7.8)$$

The second set of three equations is obtained by dotting with $\vec D_2/2\pi$ to get $u_2$
and the third by dotting with $\vec D_3/2\pi$ to get $u_3$. From the last two equations in
the first row, it follows that vector $\vec D_1$ must be orthogonal to both $\vec d_2$ and $\vec d_3$.
That means that you can get $\vec D_1$ by first finding the vectorial cross product of
vectors $\vec d_2$ and $\vec d_3$ and then adjusting the length so that $\vec d_1 \cdot \vec D_1 = 2\pi$. In similar
ways, $\vec D_2$ and $\vec D_3$ may be found.
If the expressions for the crystal coordinates are substituted into the expo-
nential part of the Bloch functions, the result is
$$\psi^e_{\vec k} = e^{i\vec k \cdot \vec r}\,\psi^e_{p,\vec k} \qquad \vec k = \nu_1 \vec D_1 + \nu_2 \vec D_2 + \nu_3 \vec D_3 \qquad\qquad (7.9)$$

So, in three dimensions, a wave number $k$ becomes a "wave number vector" $\vec k$.

Just like for the one-dimensional case, the periodic function $\psi^{\rm e}_{{\rm p},\vec k}$ too can be written in terms of exponentials. The appropriate Fourier exponentials are the eigenfunctions of the momentum operators in the three primitive directions. Converted to physical coordinates, it gives:
$$
\psi^{\rm e}_{{\rm p},\vec k} = \sum_{m_1}\sum_{m_2}\sum_{m_3} c_{{\rm p},\vec k_{\vec m}}\, e^{i\vec k_{\vec m}\cdot\vec r}
\qquad
\vec k_{\vec m} = m_1 \vec D_1 + m_2 \vec D_2 + m_3 \vec D_3
\;\;\mbox{for } m_1,\ m_2, \mbox{ and } m_3 \mbox{ integers} \qquad (7.10)
$$
If these wave number vectors $\vec k_{\vec m}$ are plotted three-dimensionally, they again form a lattice called the "reciprocal lattice," and its primitive vectors are $\vec D_1$, $\vec D_2$, and $\vec D_3$. Remarkably, the reciprocal lattice to lithium's bcc physical lattice turns out to be the fcc lattice of NaCl fame!
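You can verify this remarkable fact with the cross product recipe just described. The sketch below (an illustrative addition, not part of the original text) uses one common, assumed, choice of bcc primitive vectors:

import numpy as np

a = 1.0  # side of the conventional bcc cube (arbitrary units)

# One common choice of primitive vectors for the bcc lattice:
d1 = a * np.array([-0.5,  0.5,  0.5])
d2 = a * np.array([ 0.5, -0.5,  0.5])
d3 = a * np.array([ 0.5,  0.5, -0.5])

def reciprocal_vectors(d1, d2, d3):
    """Compute D1, D2, D3 so that di . Dj = 2 pi delta_ij, per the cross product recipe."""
    vol = np.dot(d1, np.cross(d2, d3))   # (signed) volume of the primitive cell
    D1 = 2 * np.pi * np.cross(d2, d3) / vol
    D2 = 2 * np.pi * np.cross(d3, d1) / vol
    D3 = 2 * np.pi * np.cross(d1, d2) / vol
    return D1, D2, D3

D1, D2, D3 = reciprocal_vectors(d1, d2, d3)

# Check the relations (7.8): d_i . D_j = 2 pi delta_ij
assert np.allclose([[np.dot(d, D) for D in (D1, D2, D3)] for d in (d1, d2, d3)],
                   2 * np.pi * np.eye(3))

print(D1 * a / (2 * np.pi))   # [0. 1. 1.]
print(D2 * a / (2 * np.pi))   # [1. 0. 1.]
print(D3 * a / (2 * np.pi))   # [1. 1. 0.]
# Apart from the scale factor 2 pi / a, these are exactly the primitive vectors
# of an fcc lattice (one with a conventional cube of side 4 pi / a).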
And now note the beautiful symmetry in the relations (7.8) between the primitive vectors $\vec D_1$, $\vec D_2$, and $\vec D_3$ of the reciprocal lattice and the primitive vectors $\vec d_1$, $\vec d_2$, and $\vec d_3$ of the physical lattice. Because these relations involve both sets of primitive vectors in exactly the same way, if a physical lattice with primitive vectors $\vec d_1$, $\vec d_2$, and $\vec d_3$ has a reciprocal lattice with primitive vectors $\vec D_1$, $\vec D_2$, and $\vec D_3$, then a physical lattice with primitive vectors $\vec D_1$, $\vec D_2$, and $\vec D_3$ has a reciprocal lattice with primitive vectors $\vec d_1$, $\vec d_2$, and $\vec d_3$. That means that since NaCl's fcc lattice is the reciprocal to lithium's bcc lattice, lithium's bcc lattice is the reciprocal to NaCl's fcc lattice. You now see where the word "reciprocal" in reciprocal lattice comes from. Lithium and NaCl borrow each other's lattice to serve as their lattice of wave number vectors.
Finally, how about the definition of the "Brillouin zones" in three dimensions? In particular, how about the first Brillouin zone to which you often prefer to move the Floquet wave number vector $\vec k$? Well, it is the magnitude of the wave number vector that is important, so the first Brillouin zone is defined to be the Wigner-Seitz cell around the origin in the reciprocal lattice. Note that this means that in the first Brillouin zone, $\nu_1$, $\nu_2$, and $\nu_3$ are not simply numbers in the range from $-\frac12$ to $\frac12$ as in one dimension; that would give a parallelepiped-shaped primitive cell instead.

Solid state physicists may tell you that the other Brillouin zones are also reciprocal lattice Wigner-Seitz cells, [11, p. 38], but if you look closer at what they are actually doing, the higher zones consist of fragments of reciprocal lattice Wigner-Seitz cells that can be assembled together to produce a Wigner-Seitz cell shape. Like for the one-dimensional crystal, the second zone again consists of the points that are second closest to the origin, etcetera.
The boundaries of the Brillouin zone fragments are now planes called "Bragg planes." Each is a perpendicular bisector of the connection between a reciprocal lattice point and the origin. That is so because the locations where points stop being first, second, third, ... closest to the origin and become first, second, third, ... closest to some other reciprocal lattice point must be on the bisector between that lattice point and the origin. Sections 7.7 and 7.8 will give Bragg planes and Brillouin zones for a simple cubic lattice.
The qualitative story for the valence electron energy levels is the same in
three dimensions as in one. Sections 7.6 and 7.8 will look a bit closer at them
quantitatively.

7.5 Covalent Materials [Descriptive]

In covalent materials, the atoms are held together by covalent chemical bonds. Such bonds are strong. Note that the classification is somewhat vague; many crystals, like quartz (silicon dioxide), have partly ionic, partly covalent binding. Another ambiguity occurs for graphite, the stable form of carbon under normal conditions. Graphite consists of layers of carbon atoms arranged in a hexagonal pattern. There are four covalent bonds binding each carbon to its three neighboring atoms in the layer: three sp2 hybrid bonds in the plane and a fourth pi-bond normal to it. The pi-electrons are delocalized and will conduct electricity. (When rolled into carbon nanotubes, this becomes a bit more complicated.) As far as the binding of the solid is concerned, however, the point is that different layers of graphite are only held together with weak Van der Waals forces, rather than covalent bonds. This makes graphite one of the softest solids known.
Under pressure, carbon atoms can form diamond rather than graphite, and
diamond is one of the hardest substances known. The diamond structure is
a very clean example of purely covalent bonding, and this section will have a
look at its nature. Other group IV elements in the periodic table, in particular
silicon, germanium, and grey tin also have the diamond structure. All these, of
course, are very important for engineering applications.
One question that suggests itself in view of the earlier discussion of metals
is why these materials are not metals. Consider carbon for example. Compared
to beryllium, it has four rather than two electrons in the second, L, shell. But
the merged 2s and 2p bands can hold eight electrons, so that cannot be the
explanation. In fact, tin comes in two forms under normal conditions: covalent grey tin is stable below 13 °C, while above that temperature metallic white tin is the stable form. It is often difficult to guess whether a particular element will
form a metallic or covalent substance near the middle of the periodic table.
Figure 7.16 gives a schematic of the energy band structure for a diamond-type crystal when the spacing between the atoms is artificially changed. When the atoms are far apart, i.e. d is large, the only difference from beryllium is that carbon has two electrons in 2p states versus beryllium none.

Figure 7.16: Schematic of crossing bands. (Sketched: the energies of the 2s and 2p bands against the atom spacing d, with the band gap at the actual separation between the atoms.)

But when the carbon atoms start coming closer, they have a group meeting and hit upon
the bright idea to reduce their energy even more by converting their one 2s and three 2p spatial states into four hybrid sp3 states. This allows them to share pairs of electrons symmetrically in as many as four strong covalent bonds. And it does indeed work very well for lowering the energy of these states, filled to the gills with electrons. But it does not work well at all for the "anti-bonding" states that share the electrons antisymmetrically, (as discussed for the hydrogen molecule in chapter 4.2.4), and which do not have a single electron to support their case at the meeting. So a new energy gap now opens up.
At the actual atom spacing of diamond, this band gap has become as big as
5.4 eV, making it an electric insulator (unlike graphite, which is a semi-metal).
For silicon, however, the gap is a much smaller 1.1 eV, similar to the 0.7 eV one for germanium; for grey tin the gap is considerably smaller still, and recent authoritative sources list it as zero. These smaller band gaps allow noticeable numbers of electrons to get into the empty conduction band by thermal excitation, so these materials are semiconductors at room temperature.
The crystal structure of these materials is rather interesting. It must allow
each atom core to connect to 4 others to form the hybrid covalent bonds. That
requires the rather spacious structure sketched in figure 7.17. For simplicity and
clarity, the four hybrid bonds that attach each atom core to its four neighbors
are shown as blue or black sticks rather than as a distribution of grey tones.
To understand the figure beyond that, first note that it turns out to be
impossible to create the diamond crystal structure from a basis of a single atom.
It is simply not possible to distribute clones of one carbon atom around using a
single set of three primitive vectors, and produce all the atoms in the diamond
crystal. A basis of a pair of atoms is needed. The choice of which pair is quite
arbitrary, but in figure 7.17 the clones of the chosen pair are linked by blue
lines. Notice how the entire crystal is built up from such clones. (Physically, the choice of basis is artificial, and the blue sticks indicate hybrid bonds just like the black ones.)

Figure 7.17: Ball and stick schematic of the diamond crystal.

Now notice that the lower members of these pairs are located at the corners
and face centers of the cubic volume elements indicated by the fat red lines. Yes,
diamond is another example of a face-centered cubic lattice. What is different from the NaCl case is the basis; two carbon atoms at some weird angle, instead of a sodium and a chlorine ion sensibly next to each other. Actually, if you
look a bit closer, you will notice that in terms of the half-size cubes indicated
by thin red frames, the structure is not that illogical. It is again that of a
three-dimensional chess board, where the centers of the black cubes contain the
upper carbon of a basis clone, while the centers of the white cubes are empty.
But of course, you would not want to tell people that. They might think you
spend your time playing games, and terminate your support.

If you look at the massively cross-linked diamond structure, it may not come as that much of a surprise that diamond is the hardest substance to occur naturally. Under normal conditions, diamond will supposedly degenerate extremely slowly into graphite, but without doubt, diamonds are forever.
7.6 Confined Free Electrons
Heisenberg’s uncertainty relationship implies that the more you try to confine
a set of particles spatially, the more linear momentum they have to have. Such
increased momentum means increased kinetic energy.
Confined fermions, such as the valence electrons in metals, add another twist.
They cannot all just go into whatever is the state of lowest energy: the Pauli
exclusion principle, (or antisymmetrization requirement), forces them to spread
out to higher energy states. The resulting large kinetic energy of the electrons
creates an internal pressure, called “degeneracy pressure”, that allows solids to
withstand high external pressures without collapsing.
The kinetic energy of the valence electrons is the primary reason that metals do not get any denser than they do. If they were to get a bit denser, the kinetic energy increase of the electrons would be more than the potential energy benefits due to the electrons getting closer to the nuclei.
This section analyzes the confined electrons problem using a highly simpli-
fied, but surprisingly effective, model due to Sommerfeld. In this model it is
assumed that the electrons do not experience any forces, which is the reason
why it is called a “free electron gas.”
Of course, electrons should repel each other, but the assumption is that since the electron wave functions overlap so much, this force comes from all directions and averages away. Just like you would be weightless if you were at the center of the earth. Similarly, valence electrons moving through a crystal structure should experience forces from the nuclei they pass, but this too will be ignored. If an electron is very close to a nucleus, it could definitely not fail to notice it, but generally speaking, there are nuclei in all directions.
In accordance with the title of this section, confined electrons, it will be
assumed that the electrons are trapped in an impenetrable box. Clearly that
is not a good assumption if you are trying to describe electrical conduction in
a piece of wire, but it is appropriate for an isolated block of metal. While the
valence electrons are free to move around inside a metal, if they try to get off
the block, the nuclei interfere. The details are not that simple, but electrons
that try to escape repel other, easily displaced, electrons that might aid in their
escape, leaving the nuclei unopposed to pull them back. Obviously, electrons
are not very smart.
In any case, when the final results for confined electrons are examined more
closely at the end of this section, it will turn out that they are really no different
from the solutions for the periodic boundary conditions that you may want
to use to describe electrons moving in and out of the metal during electrical
conduction.
7.6.1 The Hamiltonian eigenvalue problem

The first step to solve the problem of confined electrons is to write down the Hamiltonian. Under the stated crude assumption that there are no forces on the electrons, the potential is a constant, and this constant can be taken to be zero. (A nonzero value would merely shift the energy levels by that amount without changing the physics.)

In that case, the total energy Hamiltonian is just the kinetic energy operator $\widehat T$ of chapter 2.3, and the Hamiltonian eigenvalue problem for each electron is
$$
-\frac{\hbar^2}{2m_{\rm e}}\left[\frac{\partial^2}{\partial x^2}+\frac{\partial^2}{\partial y^2}+\frac{\partial^2}{\partial z^2}\right]\psi^{\rm e} = E^{\rm e}\,\psi^{\rm e} \qquad\qquad (7.11)
$$
It will further be assumed that the electrons are confined to a rectangular box of dimensions $\ell_x \times \ell_y \times \ell_z$:
$$
0 \le x \le \ell_x \qquad 0 \le y \le \ell_y \qquad 0 \le z \le \ell_z \qquad\qquad (7.12)
$$
The boundary condition on the surface of the box is that $\psi^{\rm e} = 0$ there. Physically, if electrons attempt to escape from the solid, their potential energy increases rapidly because the atom nuclei pull them back. This means that the wave function beyond the surface must be vanishingly small, and becomes zero on the surface in the case of perfect confinement.

7.6.2 Solution by separation of variables

The Hamiltonian eigenvalue problem derived in the previous section is mathematically equivalent to that of the particle in the pipe of chapter 2.5.

Assuming that each eigenfunction takes the form $\psi^{\rm e} = X(x)Y(y)Z(z)$, with $X$, $Y$, and $Z$ still to be determined functions of their single argument, the eigenvalue problem falls apart into partial problems in each of the three coordinate directions. In particular, the partial problem in the $x$ direction is
$$
-\frac{\hbar^2}{2m_{\rm e}}\frac{\partial^2 X}{\partial x^2} = E^{\rm e}_x X
$$
where $E^{\rm e}_x$ is the measurable value of the kinetic energy in the $x$-direction.

The normalized solutions of this equation are all of the form
$$
\sqrt{\frac{2}{\ell_x}}\,\sin(k_x x)
$$
in which $k_x$ is a constant that is called the "wave number in the $x$-direction." The higher the value of this wave number, the more rapidly the sine oscillates up and down when $x$ changes. To avoid counting equivalent eigenfunctions twice, $k_x$ must be taken positive.
The sinusoidal solution above may be checked by simple substitution in the partial problem. Doing so produces the following important relationship between the wave number and the partial energy eigenvalue:
$$
E^{\rm e}_x = \frac{\hbar^2}{2m_{\rm e}}\,k_x^2
$$
So, the square of the wave number $k_x$ is a direct measure for the energy $E^{\rm e}_x$ of the state.

To satisfy the boundary condition that $\psi^{\rm e} = 0$ at $x = \ell_x$, $\sin(k_x \ell_x)$ must be zero, which is only true for discrete values of the wave number $k_x$:
$$
k_x = n_x\,\frac{\pi}{\ell_x} \quad\mbox{with } n_x \mbox{ a natural number}
$$
Note that the wave numbers are equally spaced:
$$
k_{x1} = \frac{\pi}{\ell_x}, \quad k_{x2} = 2\frac{\pi}{\ell_x}, \quad k_{x3} = 3\frac{\pi}{\ell_x}, \quad k_{x4} = 4\frac{\pi}{\ell_x}, \ldots
$$
Each value is a constant amount $\pi/\ell_x$ greater than the previous one. Since the square wave number is a measure of the energy, the values for the wave number also fix the energy eigenvalues:
$$
E^{\rm e}_{x1} = \frac{\hbar^2\pi^2}{2m_{\rm e}\ell_x^2}, \quad
E^{\rm e}_{x2} = 4\,\frac{\hbar^2\pi^2}{2m_{\rm e}\ell_x^2}, \quad
E^{\rm e}_{x3} = 9\,\frac{\hbar^2\pi^2}{2m_{\rm e}\ell_x^2}, \quad
E^{\rm e}_{x4} = 16\,\frac{\hbar^2\pi^2}{2m_{\rm e}\ell_x^2}, \ldots
$$
The problems in the $y$- and $z$-directions are equivalent to the one in the $x$-direction, and they have similar solutions. The final three-dimensional combined energy eigenfunctions depend therefore on the values of a so-called "wave number vector" $\vec k = (k_x, k_y, k_z)$, and they are, properly normalized:
$$
\psi^{\rm e}_{\vec k} = \sqrt{\frac{8}{\ell_x\ell_y\ell_z}}\,\sin(k_x x)\sin(k_y y)\sin(k_z z) \qquad\qquad (7.13)
$$
The corresponding energy eigenvalues only depend on the square magnitude $k^2$ of the wave number vector:
$$
E^{\rm e}_k = \frac{\hbar^2}{2m_{\rm e}}(k_x^2 + k_y^2 + k_z^2) \equiv \frac{\hbar^2}{2m_{\rm e}}\,k^2 \qquad\qquad (7.14)
$$
The possible wave number vector values are
$$
k_x = n_x\frac{\pi}{\ell_x} \qquad k_y = n_y\frac{\pi}{\ell_y} \qquad k_z = n_z\frac{\pi}{\ell_z}
\quad\mbox{with } n_x,\ n_y, \mbox{ and } n_z \mbox{ natural numbers} \qquad (7.15)
$$
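For those who like numbers, here is a little Python sketch (an illustrative addition, not part of the original text) that evaluates the lowest energy eigenvalues (7.14) for the allowed wave numbers (7.15). The nanometer-sized cubic box is an assumed value, chosen small so that the energies are not immeasurably close together:

import numpy as np
from itertools import product

hbar = 1.054_571_8e-34   # J s
m_e  = 9.109_383_7e-31   # kg
eV   = 1.602_176_6e-19   # J
ell  = 1e-9              # assumed box size: a 1 nm cube

# Energies (7.14) for the wave numbers (7.15), lowest quantum numbers only:
levels = sorted(
    (hbar**2 / (2 * m_e)) * sum((n * np.pi / ell) ** 2 for n in nnn)
    for nnn in product(range(1, 4), repeat=3)
)
for E in levels[:5]:                 # degenerate levels appear repeated
    print(f"{E / eV:.3f} eV")
# The lowest level has (nx, ny, nz) = (1, 1, 1).  For a macroscopic box the
# levels would be spaced immeasurably closely together instead.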
7.6.3 Discussion of the solution

This section examines the physical interpretation of the energy eigenfunctions obtained in the previous subsection.

Each solution turned out to be in terms of a wave number vector $\vec k$, so to understand it, the possible values of this vector must be examined. Since the possible values of its components $(k_x, k_y, k_z)$ are equally spaced in each individual direction, (7.15), the possible wave number vectors form an infinite lattice of points as illustrated in figure 7.18.

Figure 7.18: Allowed wave number vectors.
Each point in this "wave number space" represents one set of wave number values $(k_x, k_y, k_z)$, corresponding to one eigenfunction
$$
\psi^{\rm e}_{\vec k} = \sqrt{\frac{8}{\ell_x\ell_y\ell_z}}\,\sin(k_x x)\sin(k_y y)\sin(k_z z)
$$
with an energy
$$
E^{\rm e}_k = \frac{\hbar^2}{2m_{\rm e}}(k_x^2 + k_y^2 + k_z^2) \equiv \frac{\hbar^2}{2m_{\rm e}}\,k^2
$$
This energy is simply the square distance $k^2$ of the point from the origin in wave number space, times a simple numerical factor $\hbar^2/2m_{\rm e}$. So the wave number space figure 7.18 also graphically illustrates the possible energy levels by means of the distances of the points to the origin. In particular the lowest energy state
available to the electrons occurs for the wave number vector point closest to the origin of wave number space.
That point corresponds to the lowest energy state in the energy spectrum sketched in figure 7.19. Similarly the points farther from the origin in wave number space have correspondingly higher energy values in the spectrum. It should be pointed out that the energy levels are not actually quite as equally spaced as the shown spectrum figure 7.19 suggests. But it is quite true that the spectrum is practically continuous; there are no energy bands in a free electron gas. The states are closely spaced everywhere, assuming the box has macroscopic dimensions.

Figure 7.19: Schematic energy spectrum of the free electron gas, with occupied states below and unoccupied states above.
The most interesting question is again the ground state of lowest energy,
corresponding to absolute zero temperature. If the electrons had been nice docile bosons, in the ground state they would all be willing to pile into the bottom state of lowest energy in the spectrum. But, just like for the electrons of
the atoms in section 4.9, the Pauli exclusion principle allows no more than two
electrons for each spatial energy state, one with spin up and one with spin down.
So, only two electrons can go into the bottom energy state. The more electrons
there are, the more different states must be occupied, and hence, the further the
occupied states in the spectrum extend upwards towards higher energy levels.
This can raise the energy greatly, since the number of electrons in a macroscopic
solid is huge, much more than could possibly be shown in a figure like figure
7.19.
Seen in wave number space figure 7.18, the number of wave number points
occupied must be one half the number of electrons. Within that constraint, the
lowest energy occurs when the states squeeze as closely to the origin as possible.
As a result, the occupied states will cluster around the origin in an eighth of a
sphere, as shown in figure 7.20.
The spherical outside surface of the occupied energy states is called the
“Fermi surface.” The corresponding energy, the highest occupied energy level,
is called the “Fermi energy.” Fermi surfaces are of critical importance in un-
derstanding the properties of metals. Indeed, Mackintosh suggested that the
most meaningful definition that can be given of a metal is “a solid with a Fermi
surface.” For example, recall from figure 7.12 that conduction has to do with shuffling the electrons of highest energy around. Those are the electrons at the Fermi surface.

Figure 7.20: Occupied wave number states and Fermi surface in the ground state.

7.6.4 A numerical example
As an example of the energies involved, consider a 1 cm³ block of copper. The block will contain $8.5\times10^{22}$ valence electrons, and with up to two electrons allowed per energy state, at least $4.25\times10^{22}$ different energy states must be occupied.

As shown in figure 7.20, in the ground state of lowest energy, these states form an octant of a sphere in wave number space. But since there are so many electrons, the sphere extends far from the origin, enclosing a lot more points than could possibly be shown. And remember that the distance from the origin gives the energy of the states. With the states extending so far from the origin, the average kinetic energy is as much as a factor $10^{15}$ larger than what it would have been if the electrons were all in the state of lowest energy right next to the origin. In fact, the average kinetic energy becomes so large that it dwarfs normal heat motion. Copper stays effectively in this ground state until it melts, shrugging off temperature changes.
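The Fermi energy itself is easy to estimate. The sketch below (an illustrative addition, not part of the original text) inverts the state count (7.16) of the next subsection to get it from the 8.5 10^22 electrons per cubic centimeter quoted above; the degeneracy-pressure formula at the end is a standard free electron gas result that is not derived in this text:

import numpy as np

hbar = 1.054_571_8e-34   # J s
m_e  = 9.109_383_7e-31   # kg
eV   = 1.602_176_6e-19   # J

n = 8.5e22 * 1e6         # valence electrons per m^3 for copper

# Invert (7.16): with two electrons per spatial state, n = 2 C E_F^(3/2),
# where C = (1/6 pi^2)(2 m_e / hbar^2)^(3/2).  Solving for E_F gives the
# textbook free-electron Fermi energy formula:
E_F = (hbar**2 / (2 * m_e)) * (3 * np.pi**2 * n) ** (2.0 / 3.0)
print(f"Fermi energy:        {E_F / eV:.1f} eV")       # about 7 eV

# The degeneracy pressure of a free electron gas is P = (2/5) n E_F
# (a standard result, assumed here without derivation):
print(f"Degeneracy pressure: {0.4 * n * E_F:.2e} Pa")  # of order 1e10 Pa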
Macroscopically, the large kinetic energy of the electrons leads to a "degeneracy pressure" on the outside surfaces of the region containing the electrons. This pressure is quite large, of order $10^{10}$ Pa; it is balanced by the nuclei pulling on the electrons trying to escape, keeping them in the solid. Note that it is not mutual repulsion of the electrons that causes the degeneracy pressure; all forces on the electrons were ignored. It is the uncertainty relationship that requires spatially confined electrons to have momentum, and the exclusion principle that explodes the resulting amount of kinetic energy, creating fast electrons that are as hard to contain as students on the last day of classes.

Compared to a $10^{10}$ Pa degeneracy pressure, the normal atmospheric pressure of about $10^5$ Pa hardly adds any additional compression. Pauli's exclusion principle makes liquids and solids quite incompressible under normal pressures.

7.6.5 The density of states and confinement
The free electron gas is a simple model, but it illustrates a lot of what is different in quantum mechanics compared to classical mechanics. This section provides more insight into what the solution really says.

First, of course, it says that the possible energy states are discrete, and only two electrons can go into a single spatial state. The Fermi energy and the degeneracy pressure result.
Which brings up the first question: given an energy $E^{\rm e}$, like the Fermi energy, how many states are there with energy no more than $E^{\rm e}$? Assuming the state points are densely spaced in $k$-space, this is easy to answer. Consider again figure 7.20. Each point represents a little block, of "volume" (in $k$-space)
$$
\Delta k_x \times \Delta k_y \times \Delta k_z = \frac{\pi}{\ell_x} \times \frac{\pi}{\ell_y} \times \frac{\pi}{\ell_z}
$$
compare (7.15). Now consider the octant of the sphere bounded by the energy level $E^{\rm e}$; that has a "volume"
$$
\frac18\,\frac{4\pi}{3}\left(\frac{2m_{\rm e}E^{\rm e}}{\hbar^2}\right)^{3/2}
$$
since its square radius equals $2m_{\rm e}E^{\rm e}/\hbar^2$. To figure out the number $N$ of the little blocks that are contained within the octant of the sphere, just take the ratio of the two "volumes":
$$
N = \frac18\,\frac{4\pi}{3}\left(\frac{2m_{\rm e}E^{\rm e}}{\hbar^2}\right)^{3/2} \Bigg/ \left(\frac{\pi}{\ell_x}\times\frac{\pi}{\ell_y}\times\frac{\pi}{\ell_z}\right) = C\,\ell_x\ell_y\ell_z\,E^{\rm e}\sqrt{E^{\rm e}}
$$
where $C$ is just shorthand for a collection of constants that are not of interest for the story. Since each little block represents one state, the number of energy states with energy less than $E^{\rm e}$ is also $N$.

Note that $\ell_x\ell_y\ell_z$ is the physical volume of the box in which the electrons are contained, (rather than the mathematical "volumes" in $k$-space that were
manipulated above.) So the formula gets even simpler if you define $N$ to be the number of spatial states per unit volume of the box:
$$
N = C\,E^{\rm e}\sqrt{E^{\rm e}} \quad\mbox{with}\quad C = \frac{1}{6\pi^2}\left(\frac{2m_{\rm e}}{\hbar^2}\right)^{3/2} \qquad\qquad (7.16)
$$
That is the number of states with energy less than a given value $E^{\rm e}$. (The value of $C$ is included for future reference.)
But physicists are also interested in knowing how many energy states there are with energy approximately equal to $E^{\rm e}$. To express the latter more precisely, the "Density Of States" $D$ will be defined as the number of single-electron states with energies in a narrow range about $E^{\rm e}$, per unit volume and per unit energy range. That makes $D$ just the derivative of the number of states per unit volume $N$:
$$
D = 1.5\,C\sqrt{E^{\rm e}} \quad\mbox{with}\quad C = \frac{1}{6\pi^2}\left(\frac{2m_{\rm e}}{\hbar^2}\right)^{3/2} \qquad\qquad (7.17)
$$
This function is plotted in figure 7.21. One thing it shows is that at higher energy levels, there are more states available per unit energy range.

Figure 7.21: Density of states for the free electron gas.
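In code, (7.16) and (7.17) are one-liners. A small sketch follows (an illustrative addition, not part of the original text); the sample energy is an arbitrary assumption:

import numpy as np

hbar = 1.054_571_8e-34   # J s
m_e  = 9.109_383_7e-31   # kg
eV   = 1.602_176_6e-19   # J

C = (1.0 / (6 * np.pi**2)) * (2 * m_e / hbar**2) ** 1.5

def states_below(E):
    """(7.16): number of spatial states per unit volume with energy below E."""
    return C * E * np.sqrt(E)

def density_of_states(E):
    """(7.17): D = dN/dE = 1.5 C sqrt(E)."""
    return 1.5 * C * np.sqrt(E)

E = 7.0 * eV   # an assumed sample energy, of the order of a metal's Fermi energy
print(f"N(E) = {states_below(E):.2e} states/m^3")       # about 4e28, like copper
print(f"D(E) = {density_of_states(E):.2e} states/(m^3 J)")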
That would finish the analysis, except that there is a problem. Remember, the number of states $N$ below a given energy level $E^{\rm e}$ was found by computing how many little state volumes are contained within the octant of the sphere. That is all very fine when the energy states are densely spaced together in $k$-space, but it starts to unravel when they get farther apart. An energy state either has less than a given energy $E^{\rm e}$ or it does not: even if half of a state's little block volume is inside the sphere octant, the state itself will still be outside it, not halfway in.

That makes a difference when you, for example, squeeze down on the $y$-dimension of the box to confine the electrons severely in the $y$-direction in order to create a "quantum well". Since the spacing of the energy states $\Delta k_y$ equals $\pi/\ell_y$, making $\ell_y$ small spreads the states well apart in the $k_y$-direction, as shown in figure 7.22. Compare this to the normal case figure 7.20.
Figure 7.22: Energy states, top, and density of states, bottom, when there is severe confinement in the y-direction, as in a quantum well.

A look at figure 7.22 shows that now there are no energy states at all, hence
no density of states, until the energy, indicated by the size of the red sphere, hits
the level of the smaller blue sphere which signifies the start of the first plane
of states. When the energy gets a bit above that threshold level, the energy
sphere initially gobbles up quite a few states relative to the much reduced box
size, and the density of states jumps up. But after that jump, the density of
states does not grow like the normal case in figure 7.21 did: while the normal
case keeps adding more and more circles of states, here there is only one circle
until the energy eventually hits the level of the second blue sphere. The density of states remains constant before that happens, reflecting the fact that both the area of the circle and the partial energies $E^{\rm e}_x + E^{\rm e}_z$ increase in proportion to the square of the radius of the circle.
When the energy does hit the level of the larger blue sphere, states from the
second plane of states are added, and the density of states jumps up once more.
By jumps like that, the growth of the normal density of states of figure 7.21 will
be approximated when the energy gets big enough.
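The staircase just described can be written out explicitly. The sketch below (an illustrative addition, not part of the original text) uses the standard quantum-well result, assumed here without derivation, that each plane of states that has opened up contributes a constant amount $m_{\rm e}/2\pi\hbar^2\ell_y$ to the density of spatial states per unit volume; the well width is an assumed value:

import numpy as np

hbar = 1.054_571_8e-34   # J s
m_e  = 9.109_383_7e-31   # kg
eV   = 1.602_176_6e-19   # J

ell_y = 2e-9   # assumed well width: severe confinement in the y-direction

def well_density_of_states(E):
    """Staircase density of spatial states per unit volume for a quantum well.

    Each plane of states n, with threshold energy E_n = (hbar n pi / ell_y)^2 / 2 m_e,
    that has opened up adds the constant amount m_e / (2 pi hbar^2 ell_y)."""
    D, n = 0.0, 1
    while True:
        E_n = (hbar * n * np.pi / ell_y) ** 2 / (2 * m_e)
        if E_n > E:
            return D
        D += m_e / (2 * np.pi * hbar**2 * ell_y)
        n += 1

for E in (0.05, 0.1, 0.4, 1.0):   # sample energies in eV
    print(f"E = {E:4.2f} eV:  D = {well_density_of_states(E * eV):.2e} states/(m^3 J)")
# The staircase approximates the bulk result 1.5 C sqrt(E) of (7.17) once
# many planes of states have been passed.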
You can limit the size of the electron-containing box in both the y and z
directions to create a “quantum wire” where there is full freedom of motion only
in the x-direction. This case is shown in figure 7.23. Now the states separate
into individual lines of states. There are no energy states, hence no density
of states, until the energy exceeds the level of the smaller blue sphere which
just reaches the line of states closest to the origin. Just above that level, a lot
of states are encountered relative to the small box volume, and the density of
states jumps way up. When the energy increases further, however, the density
of states comes down again: compared to the less confined cases, no new lines
of states are added until the energy hits the level of the larger blue sphere, at
which time the density of states jumps way up once again. Mathematically, the
density of states of each line is proportional to the inverse square root of the
excess energy above the one needed to reach the line.
Finally, if you make the box small in all three directions, you create a “quan-
tum dot” or “artificial atom”. Now each energy state is a separate point, figure
7.24. The density of states is now zero unless the energy sphere exactly hits one
of the individual points, in which case the density of states is infinite. So, the
density of states is a set of vertical spikes. Mathematically, the contribution of
each state to the density of states is proportional to a delta function located at
that energy.
(It may be pointed out that strictly speaking, every density of states is in
reality a set of delta functions. It is only if you average the delta functions over
a small energy range, chosen based on how dense the points are in k-space, that
you get the smooth mathematical functions of the previous three examples as
approximations.)
Figure 7.23: Energy states, top, and density of states, bottom, when there is severe confinement in both the y- and z-directions, as in a quantum wire.
Figure 7.24: Energy states, top, and density of states, bottom, when there is severe confinement in all three directions, as in a quantum dot or artificial atom.
7.6.6 Relation to Bloch functions

You may wonder how all this relates to Floquet theory, (or, according to physicists, Bloch's theorem,) which says that the energy eigenfunctions in a crystalline solid should take the form of exponentials times functions that are periodic on atom scale. Here we have a crystalline solid, a trivial one to be sure, but still a crystalline solid, and the energy eigenfunctions of the valence electrons were found to be
$$
\psi^{\rm e}_{\vec k} = \sqrt{\frac{8}{\ell_x\ell_y\ell_z}}\,\sin(k_x x)\sin(k_y y)\sin(k_z z)
$$
The relation to Floquet theory comes from the Euler formula (1.5), which implies that the sines can be taken apart into complex exponentials as follows:
$$
\psi^{\rm e}_{\vec k} = \sqrt{\frac{8}{\ell_x\ell_y\ell_z}}\;
\frac{e^{ik_x x} - e^{-ik_x x}}{2i}\;
\frac{e^{ik_y y} - e^{-ik_y y}}{2i}\;
\frac{e^{ik_z z} - e^{-ik_z z}}{2i}
$$
and multiplying out shows that every eigenfunction consists of eight complex exponentials, each of the form $e^{i(\pm k_x x \pm k_y y \pm k_z z)}$. Those are the Floquet exponential parts of the Bloch waves; the periodic parts are trivial constants, reflecting the fact that the periodic potential itself is trivially constant for a free electron gas.

The reason that the exponentials group together into sines is the imposed boundary condition that the electrons cannot escape from the box. By pairing each state of positive momentum (a positive-$k$ exponential) with the corresponding negative momentum one, the solutions achieve that the net momentum is zero.

The $e^{i(\pm k_x x \pm k_y y \pm k_z z)}$ Floquet exponentials have obviously not been shifted to any first Brillouin zone. In fact, since the electrons experience no forces, as far as they are concerned, there is no crystal structure, hence no Brillouin zones.
7.7 Free Electrons in a Lattice

As far as the mathematics of free electrons is concerned, the box in which they are confined may as well be empty. However, it is useful to put the results in the context of a surrounding crystal lattice anyway. That will allow some of the basic concepts of the solid mechanics of crystals to be defined within a simple setting.

The starting point will be the Floquet exponentials
$$
\psi^{\rm e}_{\vec k,{\rm Floquet}} = \frac{1}{\sqrt{8\ell_x\ell_y\ell_z}}\, e^{i(k_x x + k_y y + k_z z)}
$$
as the electron energy eigenfunctions. One important reason is that working with exponentials is mathematically easier than working with the sinusoidal eigenfunctions
$$
\psi^{\rm e}_{\vec k} = \sqrt{\frac{8}{\ell_x\ell_y\ell_z}}\,\sin(k_x x)\sin(k_y y)\sin(k_z z)
$$
derived in the previous section. However, it is not quite the same.

The physical difference between the two is in the boundary conditions. The sinusoidal solutions were zero on the boundaries of the box $0 < x < \ell_x$, $0 < y < \ell_y$, $0 < z < \ell_z$, thus describing electrons confined to the inside of that box. It can be seen that the Floquet exponentials are instead periodic, over periods $-\ell_x < x < \ell_x$, $-\ell_y < y < \ell_y$, and $-\ell_z < z < \ell_z$.

Periodic boundary conditions may be just what you want: often the interest is not so much in confined electrons, but in, say, electrical conduction, and periodic boundary conditions are ideal for a simple description of that.

Figure 7.25: Wave number vectors seen in a cross section of constant kz. Top: sinusoidal solutions. Bottom: exponential solutions.
There is one other important difference between the Floquet solutions and the sinusoidal ones besides boundary conditions. The sinusoidal ones only take positive values for kx, ky, and kz, but to get a complete set of Floquet exponentials, both positive and negative values must be included. So, when the wave number vectors for the sinusoidal solutions are plotted in a cross section of given kz, as at the top in figure 7.25, they fill only a quarter plane, while the wave number vectors for the exponentials fill the entire plane, bottom.

In three dimensions that means that while the wave number vectors of the sinusoidal solutions fill only an octant of space, as in figure 7.18, the Floquet wave number vectors spread out over all of space. However, to keep the plots simple and readable, in this section the wave number vectors will only be shown in a cross section, the one corresponding to kz = 0, like in figure 7.25.
7.7.1 The lattice structure

If a crystal lattice is to be added to the free electron gas solutions, an appropriate choice for it must be made. The plan here is to keep the lattice as simple as possible. That is no significant restriction, because the ideas are really the same for other lattices.

Also, the plan is to keep the Floquet wave number vectors as derived for the free electrons in a rectangular box in the previous section. Therefore, it is best to figure out a suitable reciprocal lattice first. To do so, compare the general expression for the Fourier $\vec k_{\vec m}$ values that make up the reciprocal lattice,
$$
\vec k_{\vec m} = m_1 \vec D_1 + m_2 \vec D_2 + m_3 \vec D_3
$$
in which $m_1$, $m_2$, and $m_3$ are integers, with the Floquet $\vec k$-values,
$$
\vec k = \nu_1 \vec D_1 + \nu_2 \vec D_2 + \nu_3 \vec D_3
$$
(compare (7.9) in section 7.4). Now $\nu_1$ is of the form $\nu_1 = j_1/J_1$, where $j_1$ is an integer just like $m_1$ is an integer, and $J_1$ is the number of lattice cells in the direction of the first primitive vector. For a macroscopic crystal, $J_1$ will be a very large number, so the conclusion must be that the Floquet wave numbers are spaced much more closely together than the Fourier ones. And so they are in the other two directions.

In particular, if it is assumed that there are an equal number of cells in each primitive direction, $J_1 = J_2 = J_3 = J$, then the Fourier wave numbers are spaced farther apart than the Floquet ones by a factor $J$ in each direction. Such a reciprocal lattice is shown as fat black dots in figure 7.26. A lattice like the one shown is called a "simple cubic lattice," and it is the easiest lattice that you can define. The primitive vectors are orthogonal, each just a multiple of a Cartesian unit vector $\hat\imath$, $\hat\jmath$, or $\hat k$. Each lattice point can be taken to be the center of a primitive cell that is a cube, and this cubic primitive cell just happens to be the Wigner-Seitz cell too.
Figure 7.26: Assumed simple cubic reciprocal lattice, shown as black dots, in cross-section. The boundaries of the surrounding primitive cells are shown as thin red lines.
It is of course not that strange that the simple cubic lattice would work here,
because the assumed wave number vectors were derived for electrons confined
in a rectangular box.
How about the physical lattice? That is easy too. The simple cubic lattice
is its own reciprocal. So the physical crystal too consists of cubic cells stacked
together. (Atomic scale ones, of course, for a physical lattice.) In particular, the
wave numbers as shown in figure 7.26 correspond to a crystal that is macroscop-
ically a cube with equal sides 2ℓ, and that on atomic scale consists of J × J × J
identical cubic cells of size d = 2ℓ/J. Here J, the number of atom-scale cells in
each direction, will be a very large number, so d will be very small.
In k-space, J is the number of Floquet points in each direction within a unit
cell. Figure 7.26 would correspond to a physical crystal that has only 40 atoms
in each direction. A real crystal would have many thousands, and the Floquet
points would be much more densely spaced than could be shown in a figure like
figure 7.26.
It should be pointed out that the simple cubic lattice, while definitely simple,
is not that important physically unless you happen to be particularly interested
in polonium or compounds like cesium chloride or beta brass. But the math-
ematics is really no different for other crystal structures, just messier, so the
simple cubic lattice makes a good example.
7.7.2 Occupied states and Brillouin zones
The previous subsection chose the reciprocal lattice in wave number space to
be the simple cubic one. The next question is how the occupied states show
up in it. As usual, it will be assumed that the crystal is in the ground state,
corresponding to zero absolute temperature.
Since for the Floquet exponentials, the wave numbers can be both positive and negative, in the ground state the occupied energy levels form a full sphere in wave number space, instead of an octant of a sphere like in figure 7.20. That reflects the fact that the exponentials correspond to periodic boundary conditions for crystals twice the size in each direction; so, there should be $2^3 = 8$ times as many states. The Fermi surface is still the bounding surface of the occupied states, but is now a complete spherical surface.
Figure 7.27 shows the occupied states in cross section if there are one, two, and three valence electrons per physical lattice cell. (In other words, if there are $J^3$, $2J^3$, and $3J^3$ electrons.) For one valence electron per lattice cell, the spherical region of occupied states stays within the first Brillouin zone, i.e. the Wigner-Seitz cell around the origin, though just barely. There are $J^3$ spatial states in a Wigner-Seitz cell, the same number as the number of physical lattice cells, and each can hold two electrons, (one spin up and one spin down,) so half the states in the first Brillouin zone are filled. For two electrons per lattice cell, there are just as many occupied spatial states as there are states within the first Brillouin zone. But since in the ground state, the occupied free electron states form a spherical region, rather than a cubic one, the occupied states spill over into immediately adjacent Wigner-Seitz cells. For three valence electrons per lattice cell, the occupied states spill over into still more neighboring Wigner-Seitz cells. (It is hard to see, but the diameter of the occupied sphere is slightly larger than the diagonal of the Wigner-Seitz cell cross-section.)
However, these results may show up presented in a different way in literature. The reason is that a Bloch-wave representation is not unique. In terms of Bloch waves, the free-electron exponential solutions as used here can be represented in the form
$$
\psi^{\rm e}_{\vec k} = e^{i\vec k\cdot\vec r}\,\psi^{\rm e}_{{\rm p},\vec k}
$$
where the atom-scale periodic part $\psi^{\rm e}_{{\rm p},\vec k}$ of the solution is a trivial constant. In addition, the Floquet wave number $\vec k$ can be in any Wigner-Seitz cell, however far away from the origin. Such a description is called an "extended zone scheme".

Figure 7.27: Occupied states for one, two, and three free electrons per physical lattice cell.

This free-electron way of thinking about the solutions is often not the best way to understand the physics. Seen within a single physical lattice cell, a solution with a Floquet wave number in a Wigner-Seitz cell far from the origin looks like an extremely rapidly varying exponential. However, all of that atom-scale physics is in the crystal-scale Floquet exponential; the lattice-cell scale part $\psi^{\rm e}_{{\rm p},\vec k}$ is a trivial constant. It may be better to shift the Floquet wave number to the Wigner-Seitz cell around the origin, the first Brillouin zone. That will turn the crystal-scale Floquet exponential into one that varies relatively slowly over the physical lattice cell; the rapid variation will now be absorbed into the lattice-cell part $\psi^{\rm e}_{{\rm p},\vec k}$. This idea is called the "reduced zone scheme." As long as the Floquet wave number vector is shifted to the first Brillouin zone by whole amounts of the primitive vectors of the reciprocal lattice, $\psi^{\rm e}_{{\rm p},\vec k}$ will remain an atom-scale-periodic function; it will just become nontrivial. This shifting of the Floquet wave numbers to the first Brillouin zone is illustrated in figures 7.28a and 7.28b. The figures are for the case of three valence electrons per lattice cell, but with a slightly increased radius of the sphere to avoid visual ambiguity.
Figure 7.28: Redefinition of the occupied wave number vectors into Brillouin zones. (Panels a and b show the shift of the occupied sphere fragments to the first Brillouin zone; panel c sorts the fragments into the first, second, third, and fourth zones.)
Now each Floquet wave number vector in the first Brillouin zone no longer corresponds to just one spatial energy eigenfunction like in the extended zone scheme. There will now be multiple spatial eigenfunctions, distinguished by different lattice-scale variations $\psi^{\rm e}_{{\rm p},\vec k}$. Compare that with the earlier approximation of one-dimensional crystals as widely separated atoms. That was in terms of different atomic wave functions, like the 2s and 2p ones, not a single one, that were modulated by Floquet exponentials that varied relatively slowly over an atomic cell. In other words, the reduced zone scheme is the natural one
for widely spaced atoms: the lattice-scale parts $\psi^{\rm e}_{{\rm p},\vec k}$ correspond to the different atomic energy eigenfunctions. And since they take care of the nontrivial variations within each lattice cell, the Floquet exponentials become slowly varying ones.
But you might rightly feel that the critical Fermi surface is messed up pretty
badly in the reduced zone scheme figure 7.28b. That does not seem to be
such a hot idea, since the electrons near the Fermi surface are critical for the
properties of metals. However, the picture can now be taken apart again to
produce separate Brillouin zones. There is a construction credited to Harrison
that is illustrated in figure 7.28c. For points that are covered by at least one
fragment of the original sphere, (which means all points, here,) the first covering
is moved into the first Brillouin zone. For points that are covered by at least two
fragments of the original sphere, the second covering is moved into the second
Brillouin zone. And so on.

Figure 7.29: Second, third, and fourth Brillouin zones seen in the periodic zone scheme.

Remember that in, say, electrical conduction, the electrons change occupied states near the Fermi surfaces. To simplify talking about that, physicists like to extend the pictures of the Brillouin zones periodically, as illustrated in figure 7.29. This is called the "periodic zone scheme." In this scheme, the boundaries of the Wigner-Seitz cells, which are normally not Fermi surfaces, are no longer a distracting factor. It may be noted that a bit of a lattice potential will round off the sharp corners in figure 7.29, increasing the esthetics.
7.8 Nearly-Free Electrons
The free-electron energy spectrum does not have bands. Bands only form when
some of the forces that the ambient solid exerts on the electrons are included.
In this section, some of the mechanics of that process will be explored. The only
force considered will be one given by a periodic lattice potential. The discussion
will still ignore true electron-electron interactions, time variations of the lattice
potential, lattice defects, etcetera.
In addition, to simplify the mathematics it will be assumed that the lattice
potential is weak. That makes the approach here diametrically opposite to
the one followed in the discussion of the one-dimensional crystals. There the
starting point was electrons tightly bound to widely spaced atoms; the atom
energy levels then corresponded to infinitely concentrated bands that fanned
out when the distance between the atoms was reduced. Here the starting idea
is free electrons in closely packed crystals for which the bands are completely
fanned out so that there are no band gaps left. But it will be seen that when a
bit of nontrivial lattice potential is added, energy gaps will appear.
The analysis will again be based on the Floquet energy eigenfunctions for
the electrons. As noted in the previous section, they correspond to periodic
boundary conditions for periods 2ℓx , 2ℓy , and 2ℓz . In case that the energy
eigenfunctions for confined electrons are desired, they can be obtained from
the Bloch solutions to be derived in this section in the following way: Take
a Bloch solution and flip it over around the x = 0 plane, i.e. replace x by
−x. Subtract that from the original solution, and you have a solution that is
zero at x = 0. And because of periodicity and odd symmetry, it will also be
zero at x = ℓx . Repeat these steps in the y and z directions. It will produce
energy eigenfunctions for electrons confined to a box 0 < x < ℓx , 0 < y < ℓy ,
0 < z < ℓz . This method works as long as the lattice potential has enough
symmetry that it does not change during the flip operations.
The approach will be to start with the solutions for force-free electrons and see how they change if a small, but nonzero, lattice potential is added to the motion. It will be a "nearly-free electron model." Consider a sample Floquet wave number as shown by the red dot in the wave number space figure 7.30. If there is no lattice potential, the corresponding energy eigenfunction is the free-electron one,
$$
\psi^{\rm e}_{\vec k,0} = \frac{1}{\sqrt{8\ell_x\ell_y\ell_z}}\, e^{i(k_x x + k_y y + k_z z)}
$$
where the subscript zero merely indicates that the lattice potential is zero. (This section will use the extended zone scheme because it is mathematically easiest.)

Figure 7.30: The red dot shows the wave number vector of a sample free electron wave function. It is to be corrected for the lattice potential.
If there is a lattice potential, the eigenfunction will change into a Bloch one of the form
$$
\psi^{\rm e}_{\vec k} = \psi^{\rm e}_{{\rm p},\vec k}\, e^{i(k_x x + k_y y + k_z z)}
$$
where $\psi^{\rm e}_{{\rm p},\vec k}$ is periodic on an atomic scale. If the lattice potential is weak, as assumed here,
$$
\psi^{\rm e}_{{\rm p},\vec k} \approx \frac{1}{\sqrt{8\ell_x\ell_y\ell_z}}
$$
Also, the energy will be almost the free-electron one:
$$
E^{\rm e}_{\vec k} \approx E^{\rm e}_{\vec k,0} = \frac{\hbar^2}{2m_{\rm e}}\,k^2
$$
However, that is not good enough. The interest here is in the changes in the energy due to the lattice potential, even if they are weak. So the first thing will be to figure out these energy changes.
7.8.1 Energy changes due to a weak lattice potential

Finding the energy changes due to a small change in a Hamiltonian can be done by a mathematical technique called "perturbation theory." A full description and derivation are in chapter 10.1 and {A.95}. This subsection will simply state the needed results.

The effects of a small change in a Hamiltonian, here being the weak lattice potential, are given in terms of the so-called "Hamiltonian perturbation coefficients" defined as
$$
H_{\vec k'\vec k} \equiv \langle \psi^{\rm e}_{\vec k',0} | V \psi^{\rm e}_{\vec k,0} \rangle \qquad\qquad (7.18)
$$
where $V$ is the lattice potential, and the $\psi^{\rm e}_{\vec k,0}$ are the free-electron energy eigenfunctions.

In those terms, the energy of the eigenfunction $\psi^{\rm e}_{\vec k}$ with Floquet wave number $\vec k$ is
$$
E^{\rm e}_{\vec k} \approx E^{\rm e}_{\vec k,0} + H_{\vec k\vec k}
- \sum_{\vec k' \ne \vec k} \frac{|H_{\vec k'\vec k}|^2}{E^{\rm e}_{\vec k',0} - E^{\rm e}_{\vec k,0}} + \ldots \qquad\qquad (7.19)
$$
Here $E^{\rm e}_{\vec k,0}$ is the free-electron energy. The dots stand for contributions that can be ignored for sufficiently weak potentials.
The first correction to the free-electron energy is the Hamiltonian perturbation coefficient $H_{\vec k\vec k}$. However, by writing out the inner product, it is seen that this perturbation coefficient is just the average lattice potential. Such a constant energy change is of no particular physical interest; it can be eliminated by redefining the zero level of the potential energy.
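The remaining sum in (7.19) is simple to evaluate once the perturbation coefficients are known. The sketch below (an illustrative addition, not part of the original text) does it for completely made-up coefficients and energies, just to show the mechanics:

import numpy as np

def energy_change(E0, E0_others, H_others):
    """Second-order sum in (7.19): the change in the energy of a state with
    unperturbed energy E0, coupled by coefficients H_others to other states
    with unperturbed energies E0_others (degenerate terms excluded)."""
    E0_others = np.asarray(E0_others, dtype=float)
    H_others = np.asarray(H_others, dtype=complex)
    return -np.sum(np.abs(H_others) ** 2 / (E0_others - E0))

# Made-up numbers, in arbitrary energy units: a state at E0 = 1.0 coupled
# weakly to two far-away states and one nearby one.
print(energy_change(1.0, [5.0, 9.0, 1.1], [0.05, 0.05, 0.05]))
# The nearby state (energy 1.1) dominates and pushes the level down,
# just as in the Bragg-plane discussion of the next subsection.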
Figure 7.31: The grid of nonzero Hamiltonian perturbation coefficients and the problem sphere in wave number space.
That makes the sum in (7.19) the physically interesting change in energy. Now, unlike it seems from the given expression, it is not really necessary to sum over all free-electron energy eigenfunctions $\psi^{\rm e}_{\vec k',0}$. The only Hamiltonian perturbation coefficients that are nonzero occur for the $\vec k'$ values shown in figure 7.31 as blue stars. They are spaced apart by amounts $J$ times the Floquet wave number spacing in each direction, where $J$ is the large number of physical lattice cells in that direction. These claims can be verified by writing the lattice potential as a Fourier series and then integrating the inner product. More elegantly, you can use the observation from chapter 10.1.3 that the only eigenfunctions that need to be considered are those with the same eigenvalues under displacement over the primitive vectors of the lattice. (Since the periodic lattice potential is the same after such displacements, these displacement operators commute with the Hamiltonian.)
The correct expression for the energy change has therefore now been identified. There is one caveat in the whole story, though. The above analysis is not justified if there are eigenfunctions $\psi^{\rm e}_{\vec k',0}$ on the grid of blue stars that have the same free-electron energy $E^{\rm e}_{\vec k',0}$ as the eigenfunction $\psi^{\rm e}_{\vec k,0}$. You can infer the problem from (7.19); you would be dividing by zero if that happened. You would have to fix the problem by using so-called "singular perturbation theory," which is much more elaborate.

Fortunately, since the grid is so widely spaced, the problem occurs only for relatively few energy eigenfunctions $\psi^{\rm e}_{\vec k}$. In particular, since the free-electron energy $E^{\rm e}_{\vec k',0}$ equals $\hbar^2 k'^2/2m_{\rm e}$, the square magnitude of $\vec k'$ would have to be the same as that of $\vec k$. In other words, $\vec k'$ would have to be on the same spherical surface around the origin as point $\vec k$. So, as long as the grid has no points other than $\vec k$ itself on the spherical surface, all is OK.
7.8.2 Discussion of the energy changes

The previous subsection determined how the energy changes from the free electron gas values due to a small lattice potential. It was found that an energy level $E^{\rm e}_{\vec k,0}$ without lattice potential changes due to the lattice potential by an amount
$$
\Delta E^{\rm e}_{\vec k} = - \sum_{\vec k' \ne \vec k} \frac{|H_{\vec k'\vec k}|^2}{E^{\rm e}_{\vec k',0} - E^{\rm e}_{\vec k,0}} \qquad\qquad (7.20)
$$
where the $H_{\vec k'\vec k}$ are coefficients that depend on the details of the lattice potential; $\vec k$ is the wave number vector of the considered free electron gas solution, shown as a red dot in the wave number space figure 7.31, $\vec k'$ is a summation index over the blue grid points of that figure, and $E^{\rm e}_{\vec k,0}$ and $E^{\rm e}_{\vec k',0}$ are proportional to the square distances from the origin to the points $\vec k$, respectively $\vec k'$. $E^{\rm e}_{\vec k,0}$ is also the energy level of the eigenfunction without lattice potential.

The expression above for the energy change is not valid when $E^{\rm e}_{\vec k',0} = E^{\rm e}_{\vec k,0}$, in which case it would incorrectly give an infinite change in energy. However, it does apply when $E^{\rm e}_{\vec k',0} \approx E^{\rm e}_{\vec k,0}$, in which case it predicts unusually large changes
in energy. The condition $E^{\rm e}_{\vec k',0} \approx E^{\rm e}_{\vec k,0}$ means that a blue star $\vec k'$ on the grid in figure 7.31 is almost the same distance from the origin as the red point $\vec k$ itself.

Figure 7.32: Tearing apart of the wave number space energies.
One case for which this happens is when the wave number vector $\vec k$ is right next to one of the boundaries of the Wigner-Seitz cell around the origin. Whenever a $\vec k$ is on the verge of leaving this cell, one of its lattice points is on the verge of getting in. As an example, figure 7.32 shows two neighboring states $\vec k$ straddling the right-hand vertical plane of the cell, as well as their lattice $\vec k'$-values that cause the unusually large energy changes.

For the left of the two states, $E^{\rm e}_{\vec k',0}$ is just a bit larger than $E^{\rm e}_{\vec k,0}$, so the energy change (7.20) due to the lattice potential is large and negative. All energy decreases will be represented graphically by moving the points towards the origin, in order that the distance from the origin continues to indicate the energy of the state. That means that the left state will move strongly towards the origin. Consider now the other state, just to the right; $E^{\rm e}_{\vec k',0}$ for that state is just a bit less than $E^{\rm e}_{\vec k,0}$, so the energy change of this state will be large and positive; graphically, this point will move strongly away from the origin. The result is that the energy levels are torn apart along the surface of the Wigner-Seitz cell.
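You can watch the tearing happen by evaluating (7.20) near a Bragg plane. The sketch below (an illustrative addition, not part of the original text) takes a one-dimensional cut in which only one grid point is close in energy; the coefficient magnitude H and the nondimensional units are assumptions:

import numpy as np

# Work in units with hbar^2 / 2 m_e = 1 and reciprocal lattice spacing K = 2,
# so the Bragg plane sits at k = K/2 = 1.
K = 2.0
H = 0.05   # assumed magnitude of the one relevant perturbation coefficient

def delta_E(k):
    """Dominant term of (7.20) for a state k near the Bragg plane:
    only the grid point at k' = k - K is close in energy."""
    E_k  = k ** 2            # free-electron energy of the state itself
    E_kp = (k - K) ** 2      # free-electron energy of the coupled grid point
    return -abs(H) ** 2 / (E_kp - E_k)

for k in (0.90, 0.95, 0.99, 1.01, 1.05, 1.10):
    print(f"k = {k:4.2f}:  dE = {delta_E(k):+.4f}")
# Just inside the Brillouin zone (k < 1) the denominator is positive and the
# energy is pushed down; just outside (k > 1) it is negative and the energy
# is pushed up: the levels tear apart at the Bragg plane.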
That is illustrated for an arbitrarily chosen example lattice potential in figure 7.33. It is another reason why the Wigner-Seitz cell around the origin, i.e. the first Brillouin zone, is particularly important. For different lattices than the simple cubic one considered here, it is still the distance from the origin that is the deciding factor, so in general, it is the Wigner-Seitz cell, rather than some parallelepiped-shaped primitive cell, along whose surfaces the energies get torn apart.

Figure 7.33: Effect of a lattice potential on the energy. The energy is represented by the square distance from the origin, and is relative to the energy at the origin.
But notice in figure 7.33 that the energy levels get torn apart along many
more surfaces than just the surface of the first Brillouin zone. In general, it
can be seen that tears occur in wave number space along all the perpendicular
bisector planes, or Bragg planes, between the points of the reciprocal lattice and
the origin. Figure 7.34 shows their intersections with the cross section kz = 0
as thin black lines. The kx and ky axes were left out to clarify that they do not hide any lines.
Recall that the Bragg planes are also the boundaries of the fragments that
make up the various Brillouin zones. In fact the first Brillouin zone is the
cube or Wigner-Seitz cell around the origin; (the square around the origin in
the cross section figure 7.34). The second zone consists of six pyramid-shaped
regions whose bases are the faces of the cube; (the four triangles sharing a side
with the square in the cross section figure 7.34). They can be pushed into the
first Brillouin zone using the fundamental translation vectors to combine into a
Wigner-Seitz cell shape.
For a sufficiently strong lattice potential like the one in figure 7.33, the energy levels in the first Brillouin zone, the center patch, are everywhere lower than in the remaining areas. Electrons will then occupy these states first, and since there are $J \times J \times J$ spatial states in the zone, two valence electrons per physical lattice cell will just fill it, figure 7.35. That produces an insulator whose electrons are stuck in a filled valence band.

Figure 7.34: Bragg planes seen in wave number space cross section.

Figure 7.35: Occupied states for the energies of figure 7.33 if there are two valence electrons per lattice cell. Left: energy. Right: wave numbers.

Figure 7.36: Smaller lattice potential. From top to bottom: one, two, and three valence electrons per lattice cell. Left: energy. Right: wave numbers.

The electrons must jump a finite
energy gap to reach the outlying regions if they want to do anything nontrivial.
Since no particular requirements were put onto the lattice potential, the forming
of bands is self-evidently a very general process.
The wave number space in the right half of figure 7.35 also illustrates that a
lattice potential can change the Floquet wave number vectors that get occupied.
For the free electron gas, the occupied states formed a spherical region in terms
of the wave number vectors, as shown in the middle of figure 7.27, but here the
occupied states have become a cube, the Wigner-Seitz cell around the origin.
The Fermi surface seen in the extended zone scheme is now no longer a spherical
surface, but consists of the six faces of this cell.
But do not take this example too literally: the small-perturbation analysis is
invalid for the strong potential required for an insulator, and a real picture would
look very different. The example is given just to illustrate that the nearly-free
electron model can indeed describe band gaps if taken far enough.
The nearly-free electron model is more reasonable for the smaller lattice
forces experienced by valence electrons in metals. For example, at reduced
strength, the same potential as before produces figure 7.36. Now the electrons
have no trouble finding states of slightly higher energy, as it should be for a
metal. Note, incidentally, that the Fermi surfaces in the right-hand graphs
seem to meet the Bragg planes much more normally than the spherical free
electron surface. That leads to smoothing out of the corners of the surface seen
in the periodic zone scheme. For example, imagine the center zone of the one
valence electron wave number space periodically continued.

7.9 Quantum Statistical Mechanics
As various observations in previous sections indicate, it is not possible to
solve the equations of quantum mechanics exactly and completely except for a
very small number of particles under very simple conditions. Even then,
"exactly" probably means "numerically exactly," not analytically. Fortunately, there is
good news: statistical mechanics can make meaningful predictions about the
behavior of large numbers of particles without trying to write down the solution
for every single particle.
A complete coverage is beyond the scope of this section, but some key results
for systems of weakly interacting particles should be mentioned. At absolute
zero temperature a system of identical fermions such as electrons completely
fills the lowest available energy states, as shown in figure 7.37. There will be
one electron per state in the lowest states, (assuming that spin is included in
the state count, otherwise two electrons per spatial state.) The higher energy
states remain unoccupied. In other words, there is one electron per state below
a dividing energy level and zero electrons per state above that energy level. The
dividing energy level between occupied and empty states is the Fermi energy.

Figure 7.37: Sketch of electron energy spectra in solids at absolute zero
temperature. From left to right: free electron gas, metal, insulator.

Figure 7.38: Sketch of electron energy spectra in solids at a nonzero
temperature. From left to right: free electron gas, metal, insulator.

As figure 7.38 shows, for temperatures T greater than absolute zero, thermal
energy allows at least some electrons to move to higher energy levels. The law
of statistical mechanics that tells you how many, on average, is the so-called
“Fermi-Dirac distribution”; it predicts the average number ι^f of fermions per
single-particle state to be

    Fermi-Dirac distribution:   \iota^f = \frac{1}{e^{(E^p - \mu)/k_B T} + 1}   (7.21)
where E^p is the energy level of the state (a superscript p, for particle, is
used instead of e since the formula applies to all fermions, not just electrons),
k_B = 1.38 × 10^{-23} J/K is the Boltzmann constant, and µ is a function of
the temperature T and particle density that is called the “chemical potential.”
Derivations of this distribution may be found in chapter 8, three of them to be
precise. You cannot be sure enough.
Consider the algebra of the formula. First of all, ι^f cannot be more than
one, because the exponential is greater than zero. This is as it should be:
according to the Pauli exclusion principle there can be at most one electron in
each state, so the average per state ι^f must be one or less. Next, at absolute
zero the chemical potential µ equals the Fermi energy. As a result, at absolute
zero temperature, energy levels E^p above the Fermi energy have an infinite
argument of the exponential (a positive number divided by an infinitesimally
small positive number), hence an infinite exponential, hence ι^f = 0; there are
zero particles in those states. On the other hand, for energy states below the
Fermi level, the argument of the exponential is minus infinity, hence the
exponential is zero, hence ι^f = 1; those states each hold one electron. That is just
what it should be according to figure 7.37.
At a temperature slightly higher than zero, not much changes in this math,
except for states close to the Fermi energy. In particular, as indicated in figure
7.38, on average states just below the Fermi energy will lose some electrons to
states just above the Fermi energy. Simply put, the electrons in states just
below the Fermi level are close enough to the empty states to jump to them.
The affected energy levels extend over a range of roughly k_B T around the
Fermi energy. Even at normal temperatures, this affected range is still small
compared to the Fermi energy itself. Hence only a few electrons pick up thermal
energy, on the order of k_B T each. And that in turn means that at normal
temperatures, the electrons do not contribute much to the heat capacity.
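To make these magnitudes concrete, here is a minimal Python sketch of (7.21).
The 5 eV chemical potential is just an assumed value, typical of a metal Fermi
energy:

    import numpy as np

    kB = 1.380649e-23          # Boltzmann constant, J/K
    eV = 1.602176634e-19       # J per eV

    T = 300.0                  # temperature, K
    mu = 5.0 * eV              # assumed chemical potential, a typical Fermi energy

    def fermi_dirac(E):
        """Average fermions per state, equation (7.21)."""
        return 1.0 / (np.exp((E - mu) / (kB * T)) + 1.0)

    # Occupation at a few energies around the Fermi level:
    for dE_eV in (-0.2, -0.05, 0.0, 0.05, 0.2):
        E = mu + dE_eV * eV
        print(f"E - mu = {dE_eV:+.2f} eV -> occupation {fermi_dirac(E):.4f}")

Only the states within a few k_B T (about 0.025 eV at 300 K) of µ come out
noticeably different from 0 or 1; everything else is still effectively frozen.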
For an insulator, the band gap is so large compared to the k_B T-size thermal
excitations that virtually no electrons reach the unoccupied states. The
electrons remain frozen into the same states as they have at zero temperature.
It may be noted that for an insulator, the Fermi energy level is located in the
middle of the band gap. That follows from the requirement that the number of
electrons that leave the valence band, however few there might be, must equal
the number that show up in the conduction band. A similar idea determines how
µ changes with temperature for metals.
Identical bosons satisfy quite different statistics than fermions. For bosons,
the number of particles outside the ground state satisfies the “Bose-Einstein
distribution:”
    Bose-Einstein distribution:   \iota^b = \frac{1}{e^{(E^p - \mu)/k_B T} - 1}   (7.22)
Note that bosons do not satisfy the exclusion principle; there can be multiple
bosons per state. Also note that the chemical potential µ in Bose-Einstein
distributions cannot exceed the lowest energy level, because the average number
ι^b of particles in an energy state cannot be negative.
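A small sketch, at an arbitrarily assumed ultracold temperature, shows how the
occupation of the lowest state blows up as µ creeps up to it:

    import numpy as np

    kB = 1.380649e-23    # Boltzmann constant, J/K
    T = 1e-7             # assumed ultracold temperature, K; kB*T is about 1.4e-30 J
    E0 = 0.0             # take the lowest single-particle energy as the zero level

    def bose_einstein(E, mu):
        """Average bosons per state, equation (7.22)."""
        return 1.0 / np.expm1((E - mu) / (kB * T))

    for gap in (1e-30, 1e-32, 1e-34):     # E0 - mu, in J
        print(f"E0 - mu = {gap:.0e} J -> occupation {bose_einstein(E0, E0 - gap):.3e}")

The ground-state occupation grows roughly like k_B T/(E0 − µ) once the gap is
small compared to k_B T, so it becomes arbitrarily large.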
In fact, when the chemical potential does reach the lowest energy level,
(something that has been observed to occur at extremely low temperatures),
the Bose-Einstein distribution indicates that the number of particles in the
lowest energy state becomes arbitrarily large; too large to still be described
correctly by the distribution, in fact. If this happens, it is called “Bose-Einstein
condensation.” Below the temperature at which it happens, an increasing, finite,
fraction of the bosons then pile up together into the ground state of lowest
energy.
As various texts point out, this is presumably what happens to normal liquid
helium, called helium I, when it becomes “superfluid” helium II below 2.17 K.
Superfluid helium II moves freely through the tiniest pores without the internal
friction that other fluids have. In fact, if you want to keep it inside a
container, you had better make sure the container is really sealed. In a related
effect, superconductivity, the
electrons link up into pairs that behave like bosons and move without resistance.
However, there are some problems with the hypothesis that superfluidity is due
to Bose-Einstein condensation {A.62}.
Turning to more established uses of the Bose-Einstein distribution, the aver-
age energy per state, relative to the chemical potential, follows from the Bose-
Einstein expression by simply multiplying by E^p − µ. But watch what happens
to the resulting expression when kB T is large compared to the energy of the
states:
    \bar{E}^p = \frac{E^p - \mu}{e^{(E^p - \mu)/k_B T} - 1} \sim k_B T
    \quad\text{when } k_B T \gg E^p - \mu   (7.23)
from Taylor series expansion of the exponential. Now classical analysis would
be expected to apply when Planck's constant is no longer relevant. Since
Planck's constant is reflected in the discrete energy levels E^p, classical
analysis apparently applies when k_B T ≫ E^p − µ and the energy becomes
proportional to k_B T. Conversely, then, where classical analysis would predict
an amount k_B T of energy for bosons, quantum mechanics makes that
(E^p − µ)/(e^{(E^p − µ)/k_B T} − 1), with E^p the energy per state.
This is exactly what happens in the case of “blackbody radiation,” the elec-
tromagnetic radiation emitted by an idealized black body at a nonzero tem-
perature. Before the advent of quantum mechanics, Rayleigh and Jeans had
computed using classical physics that the energy emitted would vary with elec-
tromagnetic frequency ω and temperature T as

ω2
I(ω) = kB T
4π 2 c2
where kB = 1.38 10−23 J/K is the Boltzmann constant and c the speed of light.
That was clearly all wrong except at very low frequencies. For one thing, the
radiation would become infinite at infinite frequencies!
It was this very problem that led to the beginning of quantum mechan-
ics, because to fix it, Planck in 1900 made the unprecedented assumption that
energy would not come in arbitrary amounts, but only in discrete chunks of
size h̄ω. More specifically, Einstein, in his 1905 explanation of the
photoelectric effect, proposed that that relation should hold for photons, the
particles of electromagnetic radiation. Photons are bosons (relativistic ones,
to be sure, but still bosons), so quantum mechanics replaces k_B T in the
classical expression by E^p/(e^{E^p/k_B T} − 1) with E^p = h̄ω. (The chemical
potential is a consequence of conservation of the number of bosons and does not
apply to photons, whose number is not conserved. See chapter 8 for more.) The
replacement
gives Planck’s expression for blackbody radiation:
    I(\omega) = \frac{\omega^2}{4\pi^2 c^2}\, \frac{\hbar\omega}{e^{\hbar\omega/k_B T} - 1}

The then unknown constant h̄ could be found from comparing with the ex-
perimentally measured radiation. For low frequencies, the final ratio is about
kB T , giving the Rayleigh-Jeans result, but for high frequencies it is much less
because of the rapid growth of the exponential for large values of its argument.
See chapter 8.14.5 for a derivation of these formulae.
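A quick numerical comparison of the two expressions, at an assumed temperature
of 300 K, makes the same point:

    import numpy as np

    kB   = 1.380649e-23      # Boltzmann constant, J/K
    hbar = 1.054571817e-34   # reduced Planck constant, J s
    c    = 2.99792458e8      # speed of light, m/s
    T    = 300.0             # assumed temperature, K

    def rayleigh_jeans(omega):
        return omega**2 / (4 * np.pi**2 * c**2) * kB * T

    def planck(omega):
        return omega**2 / (4 * np.pi**2 * c**2) \
               * hbar * omega / np.expm1(hbar * omega / (kB * T))

    # Low frequencies agree; high frequencies are cut off exponentially:
    for omega in (1e12, 1e14, 1e15):
        print(f"omega = {omega:.0e} rad/s: Planck/RJ = "
              f"{planck(omega) / rayleigh_jeans(omega):.3e}")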
For high-enough energy levels, or whenever the number of particles is small
enough compared to the number of available energy states, both the Fermi-Dirac
and Bose-Einstein distributions simplify to the classical “Maxwell-Boltzmann
distribution:”
    Maxwell-Boltzmann distribution:   \iota^d = e^{-(E^p - \mu)/k_B T}   (7.24)
which applies to distinguishable particles of the same type, and which was,
essentially, derived well before the advent of quantum mechanics. It says that
the number of particles in a state decreases exponentially with the energy of
the state. (The factor e^{µ/k_B T} is the same for all energies.) A classical
example is the decrease of density with height in the atmosphere. In an
equilibrium (i.e. isothermal) atmosphere, the number of molecules per unit
volume at a given height h is proportional to e^{−mgh/k_B T}, where mgh is the
potential energy of the molecules. (Note that the kinetic energy is the same at
all heights, because the temperature is, and plays no part here.)
However, quantum mechanics adds the notion of discrete energy states. If
there are more energy states at a given energy, there are going to be more
particles at that energy, because (7.24) is per state. For example, the ratio of
the number I2 of hydrogen atoms thermally excited to energy E2 to the number
I1 in the ground state E1 is, modeled as distinguishable particles, [16],
    \frac{I_2}{I_1} = \frac{8}{2}\, e^{-(E_2 - E_1)/k_B T}
since there are eight 2s and 2p states, and only two 1s states. Note that kB T is
about 0.025 eV at room temperature and E2 − E1 is 10.2 eV, so there are not
going to be any thermally excited atoms at room temperature.
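The arithmetic is easily checked; a one-line sketch:

    import numpy as np

    kB_eV = 8.617333e-5    # Boltzmann constant in eV/K
    T = 300.0              # room temperature, K
    dE = 10.2              # E2 - E1 for hydrogen, eV

    ratio = (8 / 2) * np.exp(-dE / (kB_eV * T))
    print(f"I2/I1 at {T:.0f} K: {ratio:.1e}")   # about 2e-171: effectively zero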
How about that common factor e^{µ/k_B T}? Consider once more the classical
example of the isothermal atmosphere, with the number of particles at a given
height proportional to e^{µ/k_B T} e^{−mgh/k_B T}. Suppose you confine a bunch of molecules at
a given height to a box with a few small holes in it and then give them a chemical
potential µ that is too high. There will then be too many molecules in the box
for that height, and they will diffuse away through the small holes. Apparently,
then, too high a chemical potential promotes particle diffusion away from a site,
just like too high a temperature promotes energy diffusion away from a site.
The chemical potential is an important quantity; it is also related to the
work a device can produce and to phase and chemical equilibria. See chapter 8
for more information on this and many other topics in thermodynamics.

7.10 Additional Points [Descriptive]


This section mentions some additional very basic issues in the quantum me-
chanics of solids. The purpose is not to provide a real discussion, but just to
point out that they exist, as you are likely to run into them eventually.

7.10.1 Thermal properties


This section will look at some of the thermal properties of solids, and in partic-
ular their specific heat. The specific heat at constant volume, Cv , of a substance
is the thermal energy that gets stored internally in the substance per unit tem-
perature rise and per unit amount of substance.
Before examining solids, first consider simple monatomic ideal gases, and
in particular noble gases. Basic physics classes show that for an ideal gas,
the molecules have ½k_B T of translational kinetic energy in each of the three
directions of a Cartesian coordinate system, where k_B = 1.38 × 10^{-23} J/K is
Boltzmann's constant. So the specific heat per molecule is (3/2)k_B. For a kmol
(6.02 × 10^{26}) of molecules instead of one, k_B becomes the "universal gas
constant" R_u = 8.31 kJ/kmol K. Hence for a

    monatomic ideal gas:   \bar{C}_v = \frac{3}{2} R_u = 12.5 kJ/kmol K   (7.25)

on a kmol basis. This is very accurate for all the noble gases, including helium.
(To get the more usual specific heat C_v per kilogram instead of per kmol,
divide by the molecular mass M. For example, for helium, with two protons and
two neutrons in its nucleus, the molecular mass is about 4 kg/kmol, so divide by 4.)
Many important ideal gases, such as hydrogen, as well as the oxygen and
nitrogen that make up air, are diatomic. Classical physics, in particular the
"equipartition theorem," would then predict (7/2)k_B as the specific heat per
molecule: (3/2)k_B of kinetic energy for each of the two atoms, plus ½k_B of
potential energy in
the internal vibration of the pairs of atoms towards and away from each other.
However, experimental values do not at all agree. (And this assumes a simplified
analysis in terms of the dynamics of atoms with all their mass in their nuclei,
and the vibrations between the pairs of atoms modeled as a harmonic oscillator.
As Maxwell noted, if you really take classical theory at face value, things get
much worse still, since every internal part of the atoms would have to absorb
their own thermal energy too.)

Figure 7.39: Specific heat at constant volume of gases (in units of R_u), from
absolute zero to 1200 K. Shown are H2, F2, N2, O2, Cl2, Br2, and the noble
gases He, Ne, Ar, Kr, .... Data from NIST-JANAF and AIP.

Hydrogen in particular was a mystery before the advent of quantum mechanics:
at low temperatures it would behave as a monatomic gas, with a specific heat
of (3/2)k_B per molecule, figure 7.39. That meant that the molecule had to be
translating only, like a monatomic gas. How could the random thermal motion
not cause any angular rotation of the two atoms around their mutual center of
gravity, nor vibration of the atoms towards and away from each other?

Quantum mechanics solved this problem. In quantum mechanics the angular
momentum of the molecule, as well as the harmonic oscillation energy levels,
are quantized. For hydrogen at low temperatures, the typical available thermal
energy ½k_B T is not enough to reach the next level for either. No energy can
therefore be put into rotation of the molecule, nor into increased internal
vibration. So hydrogen does indeed have the specific heat of monatomic gases at
low temperatures, weird as it may seem. The rotational and vibrational motions
are "frozen out."
At normal temperatures, there is enough thermal energy to reach nonzero
angular momentum states, but not higher vibrational ones, and the specific heat
becomes

    typical diatomic ideal gas:   \bar{C}_v = \frac{5}{2} R_u = 20.8 kJ/kmol K.   (7.26)

Actual values for hydrogen, nitrogen and oxygen at room temperature are 2.47,
2.50, and 2.53 R_u.

For high enough temperatures, the vibrational modes will start becoming
active, and the specific heats will start inching up towards 3.5 R_u (and
beyond), figure 7.39. But it takes temperatures of about 1000 K (hydrogen),
600 K (nitrogen), or 400 K (oxygen) before there is a 5% deviation from the
2.5 R_u value.
These differences may be understood from the solution of the harmonic
oscillator derived in chapter 2.6. The energy levels of a harmonic oscillator
are spaced apart by an amount h̄ω, where ω is the angular frequency. Modeled as
a simple spring-mass system, ω = √(c/m), where c is the equivalent spring
stiffness and m the equivalent mass. So light atoms that are bound tightly will
require a lot of energy to reach the second vibrational state. Hydrogen is much
lighter than nitrogen or oxygen, explaining the higher temperature before
vibrations become important for it. The molecular masses of nitrogen and oxygen
are similar, but nitrogen is bound with a triple bond, and oxygen only with a
double one. So nitrogen has the higher stiffness of the two and vibrates less readily.
Following this reasoning, you would expect fluorine, which is held together
with only a single covalent bond, to have a higher specific heat still, and figure
7.39 confirms it. And chlorine and bromine, also held together by a single
covalent bond, but heavier than fluorine, approach the classical value 3.5 Ru
fairly closely at normal temperatures: Cl2 has 3.08 Ru and Br2 3.34 Ru .
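The freeze-out can be made quantitative: the harmonic oscillator partition
function gives a vibrational contribution to the specific heat of
C_vib/R_u = x² eˣ/(eˣ − 1)², with x = h̄ω/k_B T. That is a standard result, not
derived here; the sketch below evaluates it with approximate textbook values of
the characteristic vibrational temperature h̄ω/k_B, assumed purely for
illustration:

    import numpy as np

    def c_vib_over_Ru(theta_vib, T):
        """Vibrational specific heat per molecule, in units of Ru,
        from the harmonic oscillator partition function."""
        x = theta_vib / T
        return x**2 * np.exp(x) / np.expm1(x)**2

    # Characteristic vibrational temperatures hbar*omega/kB (approximate values):
    theta = {"H2": 6300.0, "N2": 3390.0, "O2": 2270.0, "Cl2": 810.0}

    T = 300.0
    for gas, th in theta.items():
        print(f"{gas}: C_vib = {c_vib_over_Ru(th, T):.3f} Ru at {T:.0f} K")

At room temperature the contribution is essentially zero for hydrogen,
nitrogen, and oxygen, but already above half an R_u for chlorine, consistent
with the values just quoted.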
For solids, the basic classical idea in terms of atomic motion would be that
there would be (3/2)R_u per atom in kinetic energy and (3/2)R_u in potential energy:

    law of Dulong and Petit:   \bar{C}_v = 3 R_u = 25 kJ/kmol K.   (7.27)

Not only is this a nice round number, it actually works well for a lot of relatively
simple solids at room temperature. For example, aluminum is 2.91 Ru , copper
2.94, gold 3.05, iron 3.02. (Note that typically for solids C̄p , the heat added per
unit temperature change at constant pressure is given instead of C̄v . However,
unlike for gases, the difference between C̄p and C̄v is small for solids and will be
ignored here.)
Dulong and Petit also works for liquid water if you take it per kmol of
atoms, rather than kmol of molecules, but not for ice. Ice has 4.6 Ru per kmol
of molecules and 1.5 Ru per kmol of atoms. For molecules, certainly there is
an obvious problem in deciding how many pieces you need to count as indepen-
dently moving units. A value of 900 Ru for paraffin wax (per molecule) found at
Wikipedia may sound astonishing, until you find elsewhere at Wikipedia that
its chemical formula is C₂₅H₅₂. It is still quite capable of storing a lot of
heat per unit weight too, in any case, but nowhere close to hydrogen. Putting
(5/2)k_B T in a molecule with the tiny molecular mass of just about two protons
is the real way to get a high heat content per unit mass.
Complex molecules may be an understandable problem for the law of Dulong
and Petit, but how come diamond has about 0.73 R_u, and graphite 1.02 R_u,
instead of 3 as it should? No molecules are involved there. The values of
boron, at 1.33 R_u, and beryllium, at 1.98 R_u, are much too low too, though
not as bad as diamond or graphite.

Figure 7.40: Specific heat at constant pressure of solids (in units of R_u),
from absolute zero to 1200 K. Shown are Fe, Cu, Au, Li, Be, Al, Si, Pb, B, C,
and H2O. Carbon is diamond; graphite is similar. Water is ice and liquid. Data
from NIST-JANAF, CRC, AIP, Rohsenow et al.

Actually, it turns out, figure 7.40, that at much higher temperatures diamond
does agree nicely with the Dulong and Petit value. Conversely, if the elements
that agree well with Dulong and Petit at room temperature are cooled to low
temperatures, they too have a specific heat that is much lower than the Dulong
and Petit value. For example, at 77 K, aluminum has 1.09 Ru , copper 1.5, and
diamond 0.01.
It turns out that for all of them a characteristic temperature can be found
above which the specific heat is about the Dulong and Petit value, but below
which the specific heat starts dropping precipitously. This characteristic
temperature is called the Debye temperature. For example, aluminum, copper,
gold, and iron have Debye temperatures of 394, 315, 170, and 460 K, all near or
below room temperature, and their room temperature specific heats agree
reasonably with the Dulong and Petit value. Conversely, diamond, boron, and
beryllium have Debye temperatures of 1860, 1250, and 1000 K, and their specific
heats are much too low at room temperature.


The lack of heat capacity below the Debye temperature is again a matter of
"frozen out" vibrational modes, like the freezing out of the vibrational modes
that gave common diatomic ideal gases a heat capacity of only (5/2)R_u instead
of (7/2)R_u. Note for example that carbon, boron and beryllium are light atoms,
and that the diamond structure is particularly stiff, just the properties that
froze out the vibrational modes in diatomic gas molecules too. However, the
actual description is more complex than for a gas: if all vibrations were
frozen out in a solid, there would be nothing left.
Atoms in a solid cannot be considered independent harmonic oscillators like
the pairs of atoms in diatomic molecules. If an atom in a solid moves, its neigh-
bors are affected. The proper way to describe the motion of the atoms is in terms
of crystal-wide vibrations, such as those that in normal continuum mechanics
describe acoustical waves. There are three variants of such waves, correspond-
ing to the three independent directions the motion of the atoms can take with
respect to the propagation direction of the wave. The atoms can move in the
same direction, like in the acoustics of air in a pipe, or in a direction normal
to it, like surface waves in water. Those are called longitudinal and transverse
waves respectively. If there is more than one atom in the basis from which the
solid crystal is formed, the atoms in a basis can also vibrate relative to each
other’s position in high-frequency vibrations called optical modes. However,
after such details are accounted for, the classical internal energy of a solid is
still the Dulong and Petit value.
Enter quantum mechanics. Just like quantum mechanics says that the energy
of vibrating electromagnetic fields of frequency ω comes in discrete units called
photons, with energy h̄ω, it says that the energy of crystal vibrations comes in
discrete units called “phonons” with energy h̄ω. As long as the typical amount
of heat energy, k_B T, is larger than the largest of such phonon energies, the
fact that the energy levels are discrete makes no real difference, and
classical analysis works fine. But for lower temperatures, there is not enough
energy to create the high-energy phonons, and the specific heat will be less.
The representative temperature T_D at which the heat energy k_B T_D becomes
equal to the highest phonon energies h̄ω is the Debye temperature. (The Debye
analysis is not exact except at low energies, and the definitions of the Debye
temperature vary somewhat. See chapter 8.14.6 for more details.)
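For what it is worth, a rough numerical sketch of the Debye model, using the
standard Debye integral for the specific heat (a result only discussed in
chapter 8.14.6, so take it on faith here), reproduces the trends quoted above:

    import numpy as np
    from scipy.integrate import quad

    def c_debye_over_3Ru(T, T_debye):
        """Debye-model specific heat in units of the Dulong-Petit value 3Ru."""
        xD = T_debye / T
        integral, _ = quad(lambda x: x**4 * np.exp(x) / np.expm1(x)**2, 0.0, xD)
        return 3.0 / xD**3 * integral

    # Debye temperatures as quoted in the text:
    for name, TD in (("aluminum", 394.0), ("copper", 315.0), ("diamond", 1860.0)):
        for T in (77.0, 300.0):
            print(f"{name:8s} T = {T:3.0f} K: C/3Ru = {c_debye_over_3Ru(T, TD):.3f}")

For aluminum at 77 K, for example, the result comes out close to the 1.09 R_u
(about a third of the Dulong and Petit value) mentioned earlier.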
Quantum mechanics did not just solve the low temperature problems for
heat capacity; it also solved the electron problem. That problem was that
classically the electrons, at least in metals, should also have (3/2)k_B T of
kinetic energy each, since electrical conduction meant that they moved
independently of the atoms. But observations showed that this energy was simply
not there. The quantum mechanical explanation was the Fermi-Dirac distribution
of figure 7.38: only a small fraction of the electrons have free energy states
above them within a distance of order
kB T , and only these can take on heat energy. Since so few electrons are involved,
the amount of energy they absorb is negligible except at very low temperatures.
At very low temperatures, the energy in the phonons becomes very small, and
the conduction electrons in metals then do make a difference.
Also, when the heat capacity due to the atom vibrations levels off to the Du-
long and Petit value, that of the valence electrons keeps growing. Furthermore,
at higher temperatures the increased vibrations lead to increased deviations in
potential from the harmonic oscillator relationship. Wikipedia (Debye model)
says anharmonicity causes the heat capacity to rise further; other, apparently
authoritative, sources say that it can either increase or decrease the heat capacity.
In any case, typical solids do show an increase of the heat capacity above the
Dulong and Petit value at higher temperatures, figure 7.40.
As far as heat conduction is concerned, phonons as well as valence electrons
in metals can conduct heat. For example, diamond, an electric insulator, is a
supreme conductor of heat. Since heat conduction is no longer a monopoly of
the electrons, but the atoms can do it too, there are no thermal insulators that
are anywhere near as effective as electric insulators. Practical thermal insulators
are highly porous materials whose volume consists largely of voids.

7.10.2 Ferromagnetism
Magnetism in all its myriad forms and complexity is far beyond the scope of
this book. But there is one very important fundamental quantum mechanics
issue associated with ferromagnetism that has not yet been introduced.
Ferromagnetism is the plain variety of magnetism, like in refrigerator mag-
nets. Ferromagnetic solids like iron are of great engineering interest. They can
significantly increase a magnetic field and can stay permanently magnetized
even in the absence of a field. The fundamental quantum mechanics issue has
to do with why they produce magnetic fields in the first place.
The source of the ferromagnetic field is the electrons. Electrons have spin,
and just like a classical charged particle that is spinning around in a circle
produces a magnetic field, so do electrons act as little magnets. A free iron
atom has 26 electrons, each with spin 21 . But two of these electrons are in the
1s states, the K shell, where they combine into a singlet state with zero net spin
which produces no magnetic field. Nor do the two 2s electrons and the six 2p
electrons in the L shell, and the two 3s electrons and six 3p electrons in the M
shell and the two 4s electrons in the N shell produce net spin. All of that lack of
net spin is a result of the Pauli exclusion principle, which says that if electrons
want to go two at a time into the lowest available energy states, they must do it
as singlet spin states. And these filled subshells produce no net orbital angular
momentum either, having just as many positive as negative orbital momentum
states filled in whatever way you look at it.
However, iron has a final six electrons in 3d states, and the 3d states can
accommodate ten electrons, five for each spin direction. So only two out of
the six electrons need to enter the same spatial state as a zero spin singlet.
The other four electrons can each go into their private spatial state. And the
electrons do want to do so, since by going into different spatial states, they can
stay farther away from each other, minimizing their mutual Coulomb repulsion
energy.
According to the simplistic model of non-interacting electrons that was used
to describe atoms in chapter 4.9, these last four electrons can then have equal
or opposite spin, whatever they like. But that is wrong. The four electrons
interact through their Coulomb repulsion, and it turns out that they achieve
the smallest energy when their spatial wave function is antisymmetric under
particle exchange.
(This is just the opposite of the conclusion for the hydrogen molecule, where
the symmetric spatial wave function had the lowest energy. The difference is that
for the hydrogen molecule, the dominant effect is the reduction of the kinetic
energy that the symmetric state achieves, while for the single-atom states, the
dominant effect is the reduction in electron to electron Coulomb repulsion that
the antisymmetric wave function achieves. In the antisymmetric spatial wave
function, the electrons stay further apart on average.)
If the spatial wave function of the four electrons takes care of the antisym-
metrization requirement, then their spin state cannot change under particle
exchange; they all must have the same spin. This is known as “Hund’s first
rule:” electron interaction makes the net spin as big as the exclusion principle
allows. The four unpaired 3d electrons in iron minimize their Coulomb energy
at the price of having to align all four of their spins. Which means their spin
magnetic moments add up rather than cancel each other. {A.63}.
Hund’s second rule says that the electrons will next maximize their orbital
angular momentum as much as is still possible. And according to Hund’s third
rule, this orbital angular momentum will add to the spin angular momentum
since the ten 3d states are more than half full. It turns out that iron’s 3d
electrons have the same amount of orbital angular momentum as spin, however,
orbital angular momentum is only about half as effective at creating a magnetic
dipole.
Also, the magnetic properties of orbital angular momentum are readily
messed up when atoms are brought together in a solid, and more so for transi-
tion metals like iron than for the lanthanoid series, whose unfilled 4f states are
buried much deeper inside the atoms. In most of the common ferromagnets, the
orbital contribution is negligibly small, though in some rare earths there is
an appreciable orbital contribution.
Guessing just the right amounts of net spin angular momentum, net orbital
angular momentum, and net combined angular momentum for an atom can
be tricky. So, in an effort to make quantum mechanics as readily accessible as
possible, physicists provide the data in an intuitive hieroglyph. For example,

    ⁵D₄
gives the angular momentum of the iron atom. The 5 indicates that the spin
angular momentum is 2. To arrive at 5, the physicists multiply by 2, since spin
can be half integer and it is believed that many people doing quantum mechanics
have difficulty with fractions. Next 1 is added to keep people from cheating and
mentally dividing by 2 – you must subtract 1 first. (Another quick way of getting
the actual spin: write down all possible values for the spin in increasing order,
and then count until the fifth value. Start counting from 1, of course, because
counting from 0 is so computer science.) The D intimates that the orbital
angular momentum is 2. To arrive at D, physicists write down the intuitive
sequence of letters S, P, D, F, G, H, I, K, . . . and then count, starting from zero,
to the orbital angular momentum. Unlike for spin, here it is not the count,
but the object being counted that is listed in the hieroglyph; unfortunately
the object being counted is letters, not angular momentum. Physicists assume
that after having practiced counting spin states and letters, your memory is
refreshed about fractions, and the combined angular momentum is simply listed
by value, 4 for iron. Listing spin and combined angular momentum in two
different formats ensures that the class won't notice the error if the physics
professor misstates the spin or combined angular momentum for an atom with
zero orbital momentum. Also, combined angular momentum is not all that
meaningful as a number by itself, so stating the correct amount will not give
away too much of a secret.
On to the solid. The atoms act as little magnets because of their four aligned
electron spins and net orbital angular momentum, but why would different
atoms want to align their magnetic poles in the same direction in a solid? If
they don’t, there is not going to be any macroscopically significant magnetic
field. The logical reason for the electron spins of different atoms to align would
seem to be that it minimizes the magnetic energy. However, if the numbers
are examined, any such aligning force is far too small to survive random heat
motion at normal temperatures.
The primary reason is without doubt again the same weird quantum me-
chanics as for the single atom. Nature does not care about magnetic align-
ment or not; it is squirming to minimize its Coulomb energy under the massive
constraints of the antisymmetrization requirement. By aligning electron spins
globally, it achieves that electrons can stay farther apart spatially. {A.64}.
It is a fairly small effect; among the pure elements, it really only works
under normal operating temperatures for cobalt and its immediate neighbors
in the periodic table, iron and nickel. And alignment is normally not achieved
throughout a bulk solid, but only in microscopic zones, with different zones
having different alignment. But any electrical engineer will tell you it is a
very important effect anyway. For one, the zones can be manipulated with a
magnetic field.
And it clarifies that nature does not necessarily select singlet states of op-
posite spin to minimize the energy, despite what the hydrogen molecule and
helium atom might suggest. Much of the time, aligned spins are preferred.

7.10.3 X-ray diffraction


You may wonder how so much is known about the crystal structure of solids,
given that the atoms are much too small to be seen with visible light. In
addition, because the energy levels get smeared out into bands, like in figure
7.13, solids do not have the tell-tale line spectra that are so useful for
analyzing atoms and molecules.
To be precise, while the energy levels of the outer electrons of the atoms get
smeared out, those of the inner electrons do not do so significantly, and these
do produce line spectra. But since the energy levels of the inner electrons are
very high, transitions involving inner electrons do not produce visible light, but
X-rays.
There is another very powerful technique for studying the crystal structure of
solids, however, and it also involves X-rays. In this technique, called X-ray
diffraction, an X-ray is trained on a crystal from various angles, and the way
the crystal scatters the X-ray is determined.
There is no quantum mechanics needed to describe how this works, but
a brief description may be of value anyway. If you want to work in nano-
technology, you will inevitably run up against experimental work, and X-ray
diffraction is a key technique. Having some idea of how it works and what it
can do can be useful.
First a very basic understanding is needed of what is an X-ray. An X-ray
is a propagating wave of electromagnetic radiation just like a beam of visible
light. The only difference between them is that an X-ray is much more ener-
getic. Whether it is light or an X-ray, an electromagnetic wave is physically a
combination of electric and magnetic fields that propagate in a given direction
with the speed of light.
Figure 7.41 gives a sketch of how the strength of the electric field varies along
the propagation direction of a simple monochromatic wave; the magnetic field
is similar, but 90 degrees out of phase. Above that, a sketch is given how such
rays will be visualized in this subsection: the positive maxima will be indicated
by encircled plus signs, and the negative minima by encircled minus signs. Both
these maxima and minima propagate along the line with the speed of light; the
picture is just a snapshot at an arbitrary time.
Figure 7.41: Depiction of an electromagnetic ray.

The distance between two successive maxima is called the wave length λ. If
the wave length is in the narrow range from about 4,000 to 7,000 Å, it is visible
light. But such a wave length is much too large to distinguish atoms, since
atom sizes are on the order of a few Å. Electromagnetic waves with the required
wave lengths of a few Å fall in what is called the X-ray range.
The wave number κ is the reciprocal of the wave length within a normaliza-
tion factor 2π: κ = 2π/λ. The wave number vector ~κ has the magnitude of the
wave number κ and points in the direction of propagation of the wave.

Figure 7.42: Law of reflection in elastic scattering from a plane. An incoming
wave κ⃗ strikes a crystal plane at angle θ; detector A catches the wave κ⃗′
reflected at the same angle, detector B a wave κ⃗′′ at a different angle.

Next consider a plane of atoms in a crystal, and imagine that it forms a


perfectly flat mirror, as in figure 7.42. No, there are no physical examples of flat
atoms known to science. But just imagine there were, OK? Now shine an X-ray
from the left onto this crystal layer and examine the diffracted wave that
comes back from it. Assume Huygens' principle, that the scattered rays come
off in all directions, and that the scattering is elastic, meaning that the
energy, hence the wave length, stays the same.


Under those conditions, a detector A, placed at a position to catch the rays
scattered to the same angle as the angle θ of the incident beam, will observe a
strong signal. All the maxima in the electric field of the rays arrive at detector A
at the same time, reinforcing each other. They march in lock-step. So a strong
positive signal will exist at detector A at their arrival. Similarly, the minima
march in lock-step, arriving at A at the same time and producing a strong
signal, now negative. Detector A will record a strong, fluctuating, electric field.
Detector B, at a position where the angle of reflection is unequal to the angle
of incidence, receives similar rays, but both positive and negative values of the
electric field arrive at B at the same time, killing each other off. So detector B
will not see an observable signal. That is the law of reflection: there is only a
detectable diffracted wave at a position where the angle of reflection equals the
angle of incidence. (Those angles are usually measured from the normal to the
surface instead of from the surface itself, but not in Bragg diffraction.)
For visible light, this is actually a quite reasonable analysis of a mirror,
since an atom-size surface roughness is negligible compared to the wave length
of visible light. For X-rays, it is not so hot, partly because a layer of atoms is
not flat on the scale of the wave length of the X-ray. But worse, a single layer of
atoms does not reflect an X-ray by any appreciable amount. That is the entire
point of medical X-rays; they can penetrate millions of layers of atoms to show
what is below. A single layer is nothing to them.

Figure 7.43: Scattering from multiple "planes of atoms", a spacing d apart.

For X-rays to be diffracted in an appreciable amount, it must be done by


many parallel layers of atoms, not just one, as in figure 7.43. The layers must
furthermore have a very specific spacing d for the maxima and minima from
different layers to arrive at the detector at the same time. Note that the angular
position of the detector is already determined by the law of reflection, in order
to get whatever little there can be gotten from each plane separately. (Also
note that whatever variations in phase there are in the signals arriving at the
detector in figure 7.43 are artifacts: for graphical reasons the detector is much
closer to the specimen than it should be. The spacing between planes should
be on the order of Å, while the detector should be a macroscopic distance away
from the specimen.)
The spacing between planes needed to get a decent combined signal strength
at the detector is known to satisfy the Bragg law:

2d sin θ = nλ (7.28)

where n is a natural number. A derivation will be given below. One immediate


consequence is that to get X-ray diffraction, the wave length λ of the X-ray
cannot be more than twice the spacing between the planes of atoms. That
requires wave lengths no longer than on the order of Ångstroms. Visible light
does not qualify.
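To get a feel for the numbers, the sketch below evaluates (7.28) for an assumed
plane spacing of 2 Å and the common copper Kα X-ray wave length of about 1.54 Å:

    import numpy as np

    lam = 1.54   # X-ray wave length, Angstrom (Cu K-alpha, a common choice)
    d   = 2.0    # assumed spacing between the planes of atoms, Angstrom

    # Bragg's law (7.28): 2 d sin(theta) = n lambda
    for n in (1, 2, 3):
        s = n * lam / (2 * d)
        if s <= 1.0:
            print(f"n = {n}: theta = {np.degrees(np.arcsin(s)):.1f} degrees")
        else:
            print(f"n = {n}: no diffraction; n*lambda exceeds 2d")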
The above story is, of course, not very satisfactory. For one, layers of atoms
are not flat planes on the scale of the required X-ray wave lengths. And how
come in one direction the atoms have continuous positions and in another
discrete? Furthermore, it is not obvious what to make of the results. Observing
a diffracted X-ray at some angular location may suggest that there is some
reflecting plane in the crystal at an angle deducible from the law of
reflection, but many different planes of atoms exist in a crystal. If a large
number of measurements are done, typically by surrounding the specimen by
detectors and rotating it while shining an X-ray on it, how is the crystal
structure to be deduced from that overwhelming amount of information?
Clearly, a mathematical analysis is needed, and actually it is not very com-
plicated. First a mathematical expression is needed for the signal along the ray;
it can be taken to be a complex exponential

    e^{i\kappa(s - ct)},

where s is the distance traveled along the ray from a suitable chosen starting
position, t the time, and c the speed of light. The real part of the exponential
can be taken as the electric field, with a suitable constant, and the imaginary
part as the magnetic field, with another constant. The only important point
here is that if there is a difference in travel distance ∆s between two rays, their
signals at the detector will be out of phase by a factor eiκ∆s . Unless this factor
is one, which requires κ∆s to be zero or a whole multiple of 2π, there will be at
least some cancellation of signals at the detector.
So, how much is the phase factor eiκ∆s ? Figure 7.44 shows one ray that is
scattered at a chosen reference point O in the crystal, and another ray that is
scattered at another point P. The position vector of P relative to origin O is ~r.
Figure 7.44: Difference in travel distance when scattered from P rather than O.

Now the difference in travel distance for the second ray to reach P versus the
first one to reach O is given by the component of vector ~r in the direction of the
incoming wave vector ~κ. This component can be found as a dot product with
the unit vector in the direction of ~κ:

    \Delta s_1 = \vec r \cdot \frac{\vec\kappa}{\kappa}
    \quad\text{so}\quad e^{i\kappa\,\Delta s_1} = e^{i\vec\kappa\cdot\vec r}.

The difference in travel distance for the second ray to reach the detector from
point P versus the first from O is similarly given as

    \Delta s_2 = -\vec r \cdot \frac{\vec\kappa\,'}{\kappa}
    \quad\text{so}\quad e^{i\kappa\,\Delta s_2} = e^{-i\vec\kappa'\cdot\vec r}
assuming that the detector is sufficiently far away from the crystal that the rays
can be assumed to travel to the detector in parallel.
The net result is then that the phase factor with which the ray from P arrives
at the detector compared to the ray from O is

    e^{i(\vec\kappa - \vec\kappa')\cdot\vec r}.

This result may be used to check the law of reflection and Bragg’s law above.
First of all, for the law of reflection of figure 7.42, the positions of the scat-
tering points P vary continuously through the horizontal plane. That means
that the phase factor of the rays received at the detector will normally also
vary continuously from positive to negative back to positive etcetera, leading
to large-scale cancellation of the net signal. The one exception is when ~κ − ~κ′
happens to be normal to the reflecting plane, since a dot product with a nor-
mal vector is always zero. For ~κ − ~κ′ to be normal to the plane, its horizontal
component must be zero, meaning that the horizontal components of ~κ and ~κ′
must be equal, and for that to be true, their angles with the horizontal plane
must be equal, since the vectors have the same length. So the law of reflection
is obtained.
Next for Bragg’s law of figure 7.43, the issue is the phase difference between
successive crystal planes. So the vector ~r in this case can be assumed to point
from one crystal plane to the next. Since from the law of reflection, it is already
known that ~κ −~κ′ is normal to the planes, the only component of ~r of importance
is the vertical one, and that is the crystal plane spacing d. It must be
multiplied by the vertical component of κ⃗ − κ⃗′ (its only component), which
according to basic trig equals −2κ sin θ. The phase factor between successive
planes is therefore e^{−2iκd sin θ}. The argument of the exponential is
obviously negative, and then the only possibility for the phase factor to be
one is if the argument is a whole multiple n of −2πi. So for signals from
different crystal planes to arrive at the detector in phase,

    2\kappa d \sin\theta = 2\pi n.

Substitute κ = 2π/λ and you have Bragg’s law.
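A quick numerical check of this phase condition, with the same assumed numbers
as in the earlier Bragg-law sketch:

    import numpy as np

    lam = 1.54                        # X-ray wave length, Angstrom (assumed)
    d   = 2.0                         # plane spacing, Angstrom (assumed)
    kappa = 2 * np.pi / lam           # wave number
    theta = np.arcsin(lam / (2 * d))  # first-order Bragg angle, n = 1

    # Phase argument between successive planes, divided by 2 pi:
    print(-2 * kappa * d * np.sin(theta) / (2 * np.pi))   # -1.0, a whole multiple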


Now how about diffraction from a real crystal? Well, assume that every
location in the crystal elastically scatters the incoming wave by a small amount
that is proportional to the electron density n at that point. (This n is not to
be confused with the n in Bragg's law.) Then the total signal D received by the
detector can be written as

    D = C \int_{\text{all } \vec r} n(\vec r)\, e^{i(\vec\kappa - \vec\kappa')\cdot\vec r} \,{\rm d}^3\vec r

where C is some constant. Now the electron density is periodic on the crystal
lattice scale, so according to section 7.4.11 it can be written as a Fourier
series, giving the signal as

    D = C \sum_{\text{all } \vec k_{\vec n}} n_{\vec k_{\vec n}} \int_{\text{all } \vec r} e^{i(\vec k_{\vec n} + \vec\kappa - \vec\kappa')\cdot\vec r} \,{\rm d}^3\vec r

where the \vec k_{\vec n} wave number vectors form the reciprocal lattice and
the numbers n_{\vec k_{\vec n}} are constants. Because the volume integration
above extends over countless lattice cells, there will be massive cancellation
of signal unless the exponential is constant, which requires that the factor
multiplying the position coordinate is zero:

    \vec k_{\vec n} = \vec\kappa' - \vec\kappa   (7.29)
So the changes in the X-ray wave number vector κ⃗ for which there is a
detectable signal tell you the reciprocal lattice vectors. (Or at least the
ones for which the coefficient n_{k⃗ₙ} does not happen to be zero because of
some symmetry.) After you infer the reciprocal lattice vectors, it is easy to
figure out the primitive vectors of the physical crystal you are analyzing.
Furthermore, the relative strength of the received signal
tells you the magnitude of the Fourier coefficient n_{k⃗ₙ} of the electron density.
Obviously, all of this is very specific and powerful information, far above trying
to make some sense out of mere collections of flat planes and their spacings.
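As a sketch of that last inference step, run in the forward direction: the
standard crystallographic relations below produce the reciprocal lattice
primitive vectors from assumed simple cubic primitive vectors. In an actual
experiment you would go the other way, from the measured k⃗ₙ back to the crystal:

    import numpy as np

    # Assumed primitive translation vectors of a simple cubic lattice, Angstrom:
    a = 4.0
    a1, a2, a3 = np.eye(3) * a

    # Standard reciprocal lattice relations:
    V = np.dot(a1, np.cross(a2, a3))           # primitive cell volume
    b1 = 2 * np.pi * np.cross(a2, a3) / V
    b2 = 2 * np.pi * np.cross(a3, a1) / V
    b3 = 2 * np.pi * np.cross(a1, a2) / V
    print(b1, b2, b3)   # each of length 2*pi/a, about 1.57 per Angstrom here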
One interesting additional issue has to do with what incoming wave vectors
~κ are diffracted, regardless of where the diffracted wave ends up. To answer it,
just eliminate κ⃗′ from the above equation by squaring it and noting that
κ⃗′ · κ⃗′ equals κ² since the magnitude of the wave number does not change in
elastic scattering. That produces

    \vec\kappa \cdot \vec k_{\vec n} = -\tfrac{1}{2}\, \vec k_{\vec n} \cdot \vec k_{\vec n}   (7.30)
For this equation to be satisfied, the X-ray wave number vector ~κ must be in
the Bragg plane between −~k~n and the origin. For example, for a simple cubic
crystal, ~κ must be in one of the Bragg planes shown in cross section in figure
7.34. One general consequence is that the wave number vector κ must at least
be long enough to reach the surface of the first Brillouin zone for any Bragg
diffraction to occur. That determines the maximum wave length of usable X-
rays according to λ = 2π/κ. You may recall that the Bragg planes are also the
surfaces of the Brillouin zone segments and the surfaces along which the electron
energy states develop discontinuities if there is a lattice potential. They sure
get around.
Historically, Bragg diffraction was important to show that particles are in-
deed associated with wave functions, as de Broglie had surmised. When Davis-
son and Germer bombarded a crystal with a beam of single-momentum elec-
trons, they observed Bragg diffraction just like for electromagnetic waves. As-
suming for simplicity that the momentum of the electrons is in the z-direction
and that uncertainty in momentum can be ignored, the eigenfunctions of the
momentum operator \hat p_z = (h̄/i)∂/∂z are proportional to e^{iκz}, where h̄κ
is the z-momentum eigenvalue. From the known momentum of the electrons, Davisson
and Germer could compute the wave number κ and verify that the electrons
suffered Bragg diffraction according to that wave number. (The value of h̄ was
already known from Planck’s blackbody spectrum, and from the Planck-Einstein
relation that the energy of the photons of electromagnetic radiation equals h̄ω
with ω the angular frequency.)
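The corresponding arithmetic takes a few lines; 54 eV is the electron energy in
the historical experiment:

    import numpy as np

    h  = 6.62607015e-34    # Planck constant, J s
    me = 9.1093837e-31     # electron mass, kg
    eV = 1.602176634e-19   # J per eV

    E = 54.0 * eV                 # electron energy
    p = np.sqrt(2 * me * E)       # nonrelativistic momentum
    lam = h / p                   # de Broglie wave length
    kappa = 2 * np.pi / lam       # the wave number in the Bragg analysis
    print(f"lambda = {lam * 1e10:.2f} Angstrom, kappa = {kappa:.3e} /m")

The result, about 1.67 Å, is comparable to atomic plane spacings, so Bragg
diffraction of such electrons from a crystal is indeed to be expected.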
Chapter 8

Basic and Quantum Thermodynamics

Chapter 7.9 mentioned the Maxwell-Boltzmann, Fermi-Dirac, and Bose-Einstein


energy distributions of systems of weakly interacting particles. This chapter
explains these results and then goes on to put quantum mechanics and thermo-
dynamics in context.
It is assumed that you have had a course in basic thermodynamics. If not,
rejoice, you are going to get one now. The exposition depends relatively strongly
upon the material in chapters 4.7–4.9 and 7.6.
This chapter will be restricted to systems of particles that are all the same.
Such a system is called a “pure substance.” Water would be a pure substance,
but air not really; air is mostly nitrogen, but the 20% oxygen can probably not
be ignored. That would be particularly important under cryogenic conditions
in which the oxygen condenses out first.
The primary quantum system to be studied in detail will be a macroscopic
number of weakly interacting particles, especially particles in a box. Non-trivial
interactions between even a few particles are very hard to account for correctly,
and for a macroscopic system, that becomes much more so: just a millimol
has well over 10^{20} particles. By ignoring particle interactions, the system can
be described in terms of single-particle energy eigenstates, allowing some real
analysis to be done.
However, a system of strictly non-interacting unperturbed particles would be
stuck into the initial energy eigenstate, or the initial combination of such states,
according to the Schrödinger equation. To get such a system to settle down
into a physically realistic configuration, it is necessary to include the effects
of the unavoidable real life perturbations, (molecular motion of the containing
box, ambient electromagnetic field, cosmic rays, whatever.) The effects of such
small random perturbations will be accounted for using reasonable assumptions.
In particular, it will be assumed that they tend to randomly stir up things a

bit over time, taking the system out of any physically unlikely state it may be
stuck in and making it settle down into the macroscopically stable one, called
“thermal equilibrium.”

8.1 Temperature
This book frequently uses the word “temperature,” but what does that really
mean? It is often said that temperature is some measure of the kinetic energy
of the molecules, but that is a dubious statement. It is OK for a thin noble
gas, where the kinetic energy per atom is (3/2)k_B T, with k_B = 1.38065 ×
10^{-23} J/K the Boltzmann constant and T the (absolute) temperature in degrees
Kelvin. But the valence electrons in a metal typically have kinetic energies
many times greater than (3/2)k_B T. And when the absolute temperature becomes
zero, the
kinetic energy of a system of particles does not normally become zero, since the
uncertainty principle does not allow that.
In reality, the temperature of a system is not a measure of its thermal kinetic
energy, but of its “hotness.” So, to understand temperature, you first have to
understand hotness. A system A is hotter than a system B, (and B is colder
than A,) if heat energy flows from A to B if they are brought into thermal
contact. If no heat flows, A and B are equally hot. Temperature is a numerical
value defined so that, if two systems A and B are equally hot, they have the
same value for the temperature.
The so-called “zeroth law of thermodynamics” ensures that this definition
makes sense. It says that if systems A and B have the same temperature, and
systems B and C have the same temperature, then systems A and C have the
same temperature. Otherwise system B would have two temperatures: A and
C would have different temperatures, and B would have the same temperature
as each of them.
The systems are supposed to be in thermal equilibrium. For example, a
solid chunk of matter that is hotter on its inside than its outside simply does
not have a (single) temperature, so there is no point in talking about it.
The requirement that systems that are equally hot must have the same value
of the temperature does not say anything about what that value must be. Def-
initions of the actual values have historically varied. A good one is to compute
the temperature of a system A using an ideal gas B at equal temperature as
system A. Then (3/2)k_B T can simply be defined to be the mean translational
kinetic energy of the molecules of ideal gas B. That kinetic energy, in turn, can
be computed from the pressure and density of the gas. With this definition
of the temperature scale, the temperature is zero in the ground state of ideal
gas B. The reason is that a highly accurate ideal gas means very few atoms
or molecules in a very roomy box. With the vast uncertainty in position that
the roomy box provides to the ground-state, the uncertainty-demanded kinetic


energy is vanishingly small. So kB T will be zero.
It then follows that all ground states are at absolute zero temperature, re-
gardless how large their kinetic energy. The reason is that all ground states must
have the same temperature: if two systems in their ground states are brought in
thermal contact, no heat can flow: neither ground state can sacrifice any more
energy, the ground state energy cannot be reduced.
However, the “ideal gas thermometer” is limited by the fact that the temper-
atures it can describe must be positive. There are some unstable systems that
in a technical and approximate, but meaningful, sense have negative absolute
temperatures [3]. Unlike what you might expect, (aren’t negative numbers less
than positive ones?) such systems are hotter than any normal system. Systems
of negative temperature will give off heat regardless of how searingly hot the
normal system that they are in contact with is.
In this chapter a definition of temperature scale will be given based on the
quantum treatment. Various equivalent definitions will pop up. Eventually,
section 8.14.4 will establish it is the same as the ideal gas temperature scale.
You might wonder why the laws of thermodynamics are numbered from zero.
The reason is historical; the first, second, and third laws were already firmly
established before it was belatedly recognized, in the early twentieth century,
that an explicit statement of the zeroth law was really needed. If you are already
familiar with the second law, you might think it implies the zeroth, but things
are not quite that simple.
What about these other laws? The “first law of thermodynamics” is simply
stolen from general physics; it states that energy is preserved. The second and
third laws will be described in sections 8.8 through 8.10.

8.2 Single-Particle and System Eigenfunctions


The purpose of this section is to describe the generic form of the energy eigen-
functions of a system of weakly interacting particles.
The total number of particles will be indicated by I. If the interactions
between the I particles are ignored, the energy eigenfunctions of the complete
system of I particles, call them ψ^I, can be written in terms of single-particle
energy eigenfunctions ψ^p_1(r⃗, S_z), ψ^p_2(r⃗, S_z), ....
The basic case is that of noninteracting particles in a box, like in the free-
electron gas of chapter 7.6. While these results were derived in the context of
electrons, it remains true for any type of weakly interacting particles in a box
that the single-particle eigenfunctions take the spatial form
    \psi^p_n = \sqrt{\frac{8}{\ell_x \ell_y \ell_z}}\, \sin(k_x x)\sin(k_y y)\sin(k_z z)
where k_x, k_y, k_z are constants, called the "wave numbers" or "quantum
numbers". Different values for these constants correspond to different
single-particle eigenfunctions, with single-particle energy

    E^p_n = \frac{\hbar^2}{2m}\left(k_x^2 + k_y^2 + k_z^2\right).
In chapter 7.6, these single particle energy eigenfunctions were numbered using
the vector of wave numbers ~k = (kx , ky , kz ), but here they will simply be num-
bered as 1, 2, 3, . . . in order of increasing energy. The generic index n will be used
to indicate the number of the single-particle eigenfunction, higher values of n
corresponding to eigenfunctions of higher energy. Of course, the single-particle
eigenfunctions do not have to correspond to a particle in a box. For example,
particles caught in a magnetic trap, like in the Bose-Einstein condensation ex-
periments of 1995, might be better described as harmonic oscillators. Or the
particles might be restricted to move in a lower-dimensional space. But a lot
of the formulae you can find in literature and this chapter are in fact derived
assuming the simplest case of noninteracting particles in a roomy box.
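For concreteness, here is a minimal sketch that lists the lowest single-particle
energies in order for an assumed 1 nm cubic box holding electrons; the
numbering 1, 2, 3, ... by increasing energy is exactly the convention just
described:

    import numpy as np
    from itertools import product

    hbar = 1.054571817e-34   # reduced Planck constant, J s
    m    = 9.1093837e-31     # electron mass, kg (an arbitrary example particle)
    L    = 1e-9              # assumed box side, m
    eV   = 1.602176634e-19   # J per eV

    # Wave numbers k = n*pi/L with n = 1, 2, 3, ... in each direction.
    def energy(nx, ny, nz):
        return hbar**2 * (np.pi / L)**2 * (nx**2 + ny**2 + nz**2) / (2 * m)

    states = sorted(energy(*n) for n in product(range(1, 4), repeat=3))
    for i, E in enumerate(states[:6], start=1):
        print(f"single-particle state {i}: E = {E / eV:.3f} eV")

Note how degenerate states (the second through fourth here, for instance)
simply get consecutive numbers under this convention.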
That is all that will be said about the single-particle energy eigenfunctions for
now. On to the energy eigenfunctions ψ^I_q of the complete system of I particles.
It will be assumed that these are numbered using a counter q, but the way they
are numbered really does not make a difference to the analysis.
As long as the interactions between the particles are weak, energy eigen-
functions of the complete system can be found as products of the single-particle
ones. As an important example, at absolute zero temperature, all I particles
will be in the single-particle ground state ψ1p , and the system will be in its
ground state
\[
\psi^I_1 = \psi^p_1(\vec r_1, S_{z1})\,\psi^p_1(\vec r_2, S_{z2})\,\psi^p_1(\vec r_3, S_{z3})\,\psi^p_1(\vec r_4, S_{z4})\,\psi^p_1(\vec r_5, S_{z5})\cdots\psi^p_1(\vec r_I, S_{zI}).
\]
This does assume that the single-particle ground state ψ1p is not degenerate.
More importantly, it assumes that the I particles are not identical fermions, so
that the exclusion principle does not apply.
Statistical thermodynamics, in any case, is much more interested in tem-
peratures that are not zero. Then the system will not be in the ground state,
but in some combination of system eigenfunctions of higher energy. As a com-
pletely arbitrary example of such a system eigenfunction, take the following one,
describing I = 95 different particles:
\[
\psi^I_q = \psi^p_7(\vec r_1, S_{z1})\,\psi^p_{66}(\vec r_2, S_{z2})\,\psi^p_{86}(\vec r_3, S_{z3})\,\psi^p_{40}(\vec r_4, S_{z4})\,\psi^p_7(\vec r_5, S_{z5})\cdots\psi^p_{59}(\vec r_{95}, S_{z95})
\]
This system eigenfunction has an energy that is the sum of the 95 single-particle
eigenstate energies involved:
\[
E^I_q = E^p_7 + E^p_{66} + E^p_{86} + E^p_{40} + E^p_7 + \cdots + E^p_{59}
\]
Figure 8.1: Graphical depiction of an arbitrary system energy eigenfunction for 95 distinguishable particles.

To understand the arguments in this chapter, it is very helpful to visualize


the system energy eigenfunctions as in figure 8.1. In this figure the different
types of single-particle states are shown as boxes, and the particles that are
in those particular single-particle states are shown inside the boxes. In the
example, particle 1 is inside the ψ7p box, particle 2 is inside the ψ66p one, etcetera.
It is just the reverse from the mathematical expression above: the mathematical
expression shows for each particle in turn what the single-particle eigenstate of
that particle is. The figure shows for each type of single-particle eigenstate in
turn what particles are in that eigenstate.
For reasons of simplifying mathematical analysis still to come, in the figure
single-particle eigenstates of about the same energy have been grouped together
into “buckets.” (As a consequence, from now on a subscript to a single-particle
energy E p may refer either to a single-particle eigenfunction number or to a
bucket number, depending on context.) The “skyline” of the buckets is intended
to roughly simulate the density of states of the particles in a box as described
in chapter 7.6.5. The larger the energy, the more single-particle states there
are at that energy; it increases like the square root of the energy. This may
not be true for other situations, such as when the particles are confined to a
lower-dimensional space, compare chapter 7.6.5. Various formulae given here
and in literature may need to be adjusted then.
Of course, in normal non-nano applications, the number of particles will be
astronomically larger than 95 particles; the example is just a small illustration.
And unless the temperature is incredibly low, those particles will extend to
many more buckets than the nine shown in the figure.
Next, note that particles of the same kind are not really going to be distin-
guishable; that is just a simplification that can be made if their wave functions
do not overlap nontrivially. If the particles are closer together, the symmetriza-
tion requirements of the system wave function can no longer be ignored.
Consider first the case that the I particles are all identical bosons, like plain
helium atoms. In that case the wave function must be symmetric, unchanged,
under the exchange of any two of the bosons, and the example wave function
above is not. If, for example, particles 2 and 4 are exchanged, it turns the
example wave function

\[
\psi^I_q = \psi^p_7(\vec r_1, S_{z1})\,\psi^p_{66}(\vec r_2, S_{z2})\,\psi^p_{86}(\vec r_3, S_{z3})\,\psi^p_{40}(\vec r_4, S_{z4})\,\psi^p_7(\vec r_5, S_{z5})\cdots\psi^p_{59}(\vec r_{95}, S_{z95})
\]

into

\[
\psi^I_{q'} = \psi^p_7(\vec r_1, S_{z1})\,\psi^p_{40}(\vec r_2, S_{z2})\,\psi^p_{86}(\vec r_3, S_{z3})\,\psi^p_{66}(\vec r_4, S_{z4})\,\psi^p_7(\vec r_5, S_{z5})\cdots\psi^p_{59}(\vec r_{95}, S_{z95})
\]

and that is simply a different wave function, because the states are different,
independent functions. In terms of the pictorial representation figure 8.1, swap-
ping the numbers “2” and “4” in the particles changes the picture.
Figure 8.2: Graphical depiction of an arbitrary system energy eigenfunction for 95 identical bosons.

As chapter 4.7 explained, to eliminate the problem that exchanging particles
2 and 4 changes the wave function, the original and exchanged wave functions
must be combined together. And to eliminate the problem for any two particles,
all wave functions that can be obtained by merely swapping numbers must be
combined together equally into a single wave function multiplied by a single
undetermined coefficient. In terms of figure 8.1, we need to combine the wave
functions with all possible permutations of the numbers inside the particles into
one. And if all permutations of the numbers are equally included, then those
numbers no longer add any nontrivial additional information; they may as well
be left out. That makes the pictorial representation of an example system wave
function for identical bosons as shown in figure 8.2.

Figure 8.3: Graphical depiction of an arbitrary system energy eigenfunction for 31 identical fermions.

For identical fermions, the situation is similar, except that the different wave
functions must be combined with equal or opposite sign, depending on whether
it takes an odd or even number of particle swaps to turn one into the other. And
such wave functions only exist if the I single-particle wave functions involved are
all different. That is the Fermi exclusion principle. The pictorial representation
figure 8.2 for bosons is totally unacceptable for fermions since it uses many of
the single-particle states for more than one particle. There can be at most one
fermion in each type of single-particle state. An example of a wave function
that is acceptable for a system of identical fermions is shown in figure 8.3.
Looking at the example pictorial representations for systems of bosons and
fermions, it may not be surprising that such particles are often called “indistin-
guishable.” Of course, in classical quantum mechanics, there is still an electron
1, an electron 2, etcetera; they are mathematically distinguished. Still, it is
convenient to use the term “distinguishable” for particles for which the sym-
metrization requirements can be ignored.
The prime example is the atoms of an ideal gas in a box; almost by defini-
tion, the interactions between such atoms are negligible. And that allows the
quantum results to be referred back to the well-understood properties of ideal
gases obtained in classical physics. Probably you would like to see all results
follow naturally from quantum mechanics, not classical physics, and that would
be very nice indeed. But it would be very hard to follow up on. As Baierlein [3,
p. 109] notes, real-life physics adopts whichever theoretical approach offers the
easiest calculation or the most insight. This book’s approach really is to for-
mulate as much as possible in terms of the quantum-mechanical ideas discussed
here. But do be aware that it is a much more messy world when you go out
there.

8.3 How Many System Eigenfunctions?


The fundamental question from which all of quantum statistics springs is a very
basic one: How many system energy eigenstates are there with given generic
properties? This section will address that question.
Of course, by definition each system energy eigenfunction is unique. Figures
8.1–8.3 give examples of such unique energy eigenfunctions for systems of dis-
tinguishable particles, bosons, and fermions. But trying to get accurate data
on each individual eigenfunction just does not work. That is much too big a
challenge.
Quantum statistics must content itself with figuring out the probabilities of
groups of system eigenfunctions with similar properties. To do so, the single-
particle energy eigenstates are best grouped together into buckets of similar
energy, as illustrated in figures 8.1–8.3. Doing so allows for more answerable
questions such as: “How many system energy eigenfunctions ψqI have I1 out of
the I total particles in bucket 1, another I2 in bucket 2, etcetera?” In other
words, if I~ stands for a given set of bucket occupation numbers (I1 , I2 , I3 , . . .),
then what is the number QI~ of system eigenfunctions ψqI that have those bucket
occupation numbers?
That question is answerable with some clever mathematics; it is a big thing
in various textbooks. However, the suspicion is that this is more because of
the “neat” mathematics than because of the actual physical insight that these
derivations provide. In this book, the derivations are shoved away into note
{A.65}. But here are the results. (Drums please.) The system eigenfunction
counts for distinguishable particles, bosons, and fermions are:


\[
Q^d_{\vec I} = I!\,\prod_{\text{all } b} \frac{N_b^{I_b}}{I_b!} \tag{8.1}
\]
\[
Q^b_{\vec I} = \prod_{\text{all } b} \frac{\left(I_b + N_b - 1\right)!}{I_b!\,\left(N_b - 1\right)!} \tag{8.2}
\]
\[
Q^f_{\vec I} = \prod_{\text{all } b} \frac{N_b!}{I_b!\,\left(N_b - I_b\right)!} \tag{8.3}
\]

where Π means the product of all the terms of the form shown to its right that
can be obtained by substituting in every possible value of the bucket number.
That is just like Σ would mean the sum of all these terms. For example, for
distinguishable particles

\[
Q^d_{\vec I} = I!\,\frac{N_1^{I_1}}{I_1!}\,\frac{N_2^{I_2}}{I_2!}\,\frac{N_3^{I_3}}{I_3!}\,\frac{N_4^{I_4}}{I_4!}\cdots
\]

where N1 is the number of single-particle energy states in bucket 1 and I1 the
number of particles in that bucket, N2 the number of single-particle energy
states in bucket 2 and I2 the number of particles in that bucket, etcetera. Also
an exclamation mark indicates the factorial function, defined as
\[
n! = \prod_{\bar n = 1}^{n} \bar n.
\]

For example, 5! = 1 × 2 × 3 × 4 × 5 = 120. The eigenfunction counts may also
involve 0!, which is defined to be 1, and n! for negative n, which is defined to
be infinity. The latter is essential to ensure that the eigenfunction count is zero
as it should be for fermion eigenfunctions that try to put more particles in a
bucket than there are states in it.
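In code, these counts are one-liners. The sketch below (an illustration, not from the book; the bucket lists N and I_b are hypothetical inputs) evaluates (8.1) through (8.3) with exact integer arithmetic; the binomial coefficient comb(n, i) returns zero for i > n, which automatically enforces the fermion exclusion rule just mentioned.

from math import comb, factorial, prod

def Q_dist(N, I_b):
    """Distinguishable particles, (8.1); N[b] states, I_b[b] particles per bucket."""
    return (factorial(sum(I_b)) * prod(n**i for n, i in zip(N, I_b))
            // prod(factorial(i) for i in I_b))

def Q_bose(N, I_b):
    """Identical bosons, (8.2): (I_b + N_b - 1)! / (I_b! (N_b - 1)!) per bucket."""
    return prod(comb(n + i - 1, i) for n, i in zip(N, I_b))

def Q_fermi(N, I_b):
    """Identical fermions, (8.3): N_b! / (I_b! (N_b - I_b)!) per bucket."""
    return prod(comb(n, i) for n, i in zip(N, I_b))

print(Q_bose((2,), (3,)), Q_fermi((2,), (3,)))  # 4 and 0: three bosons fit
                                                # in two states, fermions do not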
This section is mainly concerned with explaining qualitatively why these
system eigenfunction counts matter physically. And to do so, a very simple
model system having only three buckets will suffice.
The first example is illustrated in quantum-mechanical terms in figure 8.4.
Like the other examples, it has only three buckets, and it has only I = 4
distinguishable particles. Bucket 1 has N1 = 1 single-particle state with energy
E p1 = 1 (arbitrary units), bucket 2 has N2 = 3 single-particle states with energy
E p2 = 2 (note that 3 ≈ 2√2), and bucket 3 has N3 = 4√4 = 8 single-particle
states with energy E p3 = 4. One major deficiency of this model is the small
number of particles and states, but that will be fixed in the later examples.
Figure 8.4: Illustrative small model system having 4 distinguishable particles. The particular eigenfunction shown is arbitrary.

More serious is that there are no buckets with energies above E p3 = 4. To
mitigate that problem, for the time being the average energy per particle of the
system eigenfunctions will be restricted to no more than 2.5. This will leave
bucket 3 largely empty, reducing the effects of the missing buckets of still higher
energy.

Figure 8.5: The number of system energy eigenfunctions for a simple model system with only three energy buckets. Positions of the squares indicate the numbers of particles in buckets 2 and 3; darkness of the squares indicates the relative number of eigenfunctions with those bucket numbers. Left: system with 4 distinguishable particles, middle: 16, right: 64.

Now the question is, how many energy eigenfunctions are there for a given
set of bucket occupation numbers I~ = (I1 , I2 , I3 )? The answer, as given by (8.1),
is shown graphically in the left graph of figure 8.5. Darker squares indicate more
eigenfunctions with those bucket occupation numbers. The oblique line in figure
8.5 is the line above which the average energy per particle exceeds the chosen
limit of 2.5.
Some example observations about the figure may help to understand it. For
example, there is only one system eigenfunction with all 4 particles in bucket 1,
i.e. with I1 = 4 and I2 = I3 = 0; it is

\[
\psi^I_1 = \psi^p_1(\vec r_1, S_{z1})\,\psi^p_1(\vec r_2, S_{z2})\,\psi^p_1(\vec r_3, S_{z3})\,\psi^p_1(\vec r_4, S_{z4}).
\]

This is represented by the white square at the origin in the left graph of figure
8.5.
As another example, the darkest square in the left graph of figure 8.5 repre-
sents system eigenfunctions that have bucket numbers I~ = (1, 2, 1), i.e. I1 = 1,
I2 = 2, I3 = 1: one particle in bucket 1, two particles in bucket 2, and one
particle in bucket 3. A completely arbitrary example of such a system energy
eigenfunction,

\[
\psi^p_3(\vec r_1, S_{z1})\,\psi^p_1(\vec r_2, S_{z2})\,\psi^p_4(\vec r_3, S_{z3})\,\psi^p_8(\vec r_4, S_{z4}),
\]

is the one depicted in figure 8.4. It has particle 1 in single-particle state ψ3p ,
which is in bucket 2, particle 2 in ψ1p , which is in bucket 1, particle 3 in ψ4p
which is in bucket 2, and particle 4 in ψ8p , which is in bucket 3. But there are
a lot more system eigenfunctions with the same bucket occupation numbers; in
fact, there are
4 × 3 × 8 × 3 × 3 = 864
such eigenfunctions, since there are 4 possible choices for the particle that goes
in bucket 1, times a remaining 3 possible choices for the particle that goes
into bucket 3, times 8 possible choices ψ5p through ψ12p for the single-particle
eigenfunction in bucket 3 that that particle can go into, times 3 possible choices
ψ2p through ψ4p that each of the remaining two particles in bucket 2 can go into.
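That direct count agrees with formula (8.1), as a quick check (illustrative only) confirms:

from math import factorial, prod

N, I_b = (1, 3, 8), (1, 2, 1)   # states per bucket; bucket occupation numbers
Q = (factorial(sum(I_b)) * prod(n**i for n, i in zip(N, I_b))
     // prod(factorial(i) for i in I_b))
print(Q)  # 864, matching the direct 4 x 3 x 8 x 3 x 3 count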
Next, consider a system four times as big. That means that there are four
times as many particles, so I = 16 particles, in a box that has four times the
volume. If the volume of the box becomes 4 times as large, there are four times
as many single-particle states in each bucket, since the number of states per unit
volume at a given single-particle energy is constant, compare (7.16). Bucket 1
now has 4 states, bucket 2 has 12, and bucket 3 has 32. The number of energy
states for given bucket occupation numbers is shown as grey tones in the middle
graph of figure 8.5. Now the number of system energy eigenfunctions that have
all particles in bucket 1 is not one, but 4^16 = 4 294 967 296, since there are 4
different states in bucket 1 that each of the 16 particles can go into. That is
obviously quite a lot of system eigenfunctions, but it is dwarfed by the darkest
square, states with bucket occupation numbers I~ = (4, 6, 6). There are about
1.4 10^24 system energy eigenfunctions with those bucket occupation numbers.
So the I~ = (16, 0, 0) square at the origin stays lily-white despite having over 4
billion energy eigenfunctions.
If the system size is increased by another factor 4, to 64 particles, the num-
ber of states with occupation numbers I~ = (64, 0, 0), all particles in bucket 1, is
1.2 10^77, a tremendous number, but totally humiliated by the 2.7 10^138 eigenfunc-
tions that have occupation numbers I~ = (14, 27, 23). Taking the ratio of these
two numbers shows that there are 2.3 10^61 energy eigenfunctions with bucket
numbers (14, 27, 23) for each eigenfunction with bucket numbers (64, 0, 0). By
the time the system reaches, say, 10^20 particles, still less than a millimol, the
number of system energy eigenstates for each set of occupation numbers is as-
tronomical, but so are the differences between the bucket numbers that have
the most and those that have less. The tick marks in figure 8.5 indicate that
for large numbers of particles, the darkest square will have 40% of the particles
in bucket 2, 37% in bucket 3, and the remaining 23% in bucket 1.
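These astronomical numbers are easy to reproduce exactly, since Python integers do not overflow. The sketch below (an illustration, not from the book) redoes the 64-particle counts; the bucket sizes N = (16, 48, 128) are sixteen times those of the 4-particle model, as described above.

from math import factorial, prod

def Q_dist(N, I_b):
    # distinguishable-particle count (8.1), exact integer arithmetic
    return (factorial(sum(I_b)) * prod(n**i for n, i in zip(N, I_b))
            // prod(factorial(i) for i in I_b))

N = (16, 48, 128)
Q_origin = Q_dist(N, (64, 0, 0))    # 16**64, about 1.2 10^77
Q_peak   = Q_dist(N, (14, 27, 23))  # about 2.7 10^138
print(f"{Q_origin:.1e}  {Q_peak:.1e}  ratio {Q_peak / Q_origin:.1e}")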
These general trends do not just apply to this simple model system; they
are typical:

The number of system energy eigenfunctions for a macroscopic sys-


tem is astronomical, and so are the differences in numbers.

Another trend illustrated by figure 8.5 is also typical. The system energy
of an energy eigenfunction is given in terms of its bucket numbers by E I =
I1 E p1 + I2 E p2 + I3 E p3 , so all eigenfunctions with the same bucket numbers have
the same system energy. In particular, the squares just below the oblique cut-off
line in figure 8.5 have the highest system energy. It is seen that these bucket
numbers also have by far the most energy eigenfunctions:

The number of system energy eigenfunctions with a higher energy


typically dwarfs the number of system eigenfunctions with a lower
energy.

Figure 8.6: Number of energy eigenfunctions on the oblique energy line in figure 8.5. (The curves are mathematically interpolated to allow a continuously varying fraction of particles in bucket 2.) Left: 4 particles, middle: 64, right: 1024.

Next assume that the system has exactly the energy of the oblique cut-off
line in figure 8.5, with zero uncertainty. The number of energy eigenstates QI~ on
that oblique line is plotted in figure 8.6 as a function of the fraction of particles
I2 /I in bucket 2. (To get a smooth continuous curve, the values have been
mathematically interpolated in between the integer values of I2 . The continuous
function that interpolates n! is called the gamma function; see the notations
section under “!” for details.) The maximum number of energy eigenstates
occurs at about I2 /I = 40%, corresponding to I3 = 37% and I1 = 23%. This
set of occupation numbers, (I1 , I2 , I3 ) = (0.23, 0.40, 0.37)I, is called the most
probable set of occupation numbers. If you pick an eigenfunction at random,
you have more chance of getting one with that set of occupation numbers than
one with a different given set of occupation numbers.
To be sure, if the number of particles is large, the chances of picking any
eigenfunction with an exact set of occupation numbers is small. But note how
the “spike” in figure 8.6 becomes narrower with increasing number of particles.
You may not pick an eigenfunction with exactly the most probable set of bucket
numbers, but you are quite sure to pick one with bucket numbers very close to
it. By the time the system size reaches, say, 10^20 particles, the spike becomes for
all practical purposes a mathematical line. Then essentially all eigenfunctions
have very precisely 23% of their particles in bucket 1 at energy E p1 , 40% in
bucket 2 at energy E p2 , and 37% in bucket 3 at energy E p3 .
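The narrowing of the spike is easy to see numerically. The sketch below (illustrative only) evaluates ln Q along the constant-energy line, using lgamma(n+1) for ln n!, the gamma-function interpolation just mentioned; the bucket energies (1, 2, 4) and the scaling N = (I/4)(1, 3, 8) of the bucket sizes with system size are taken from the example above.

from math import lgamma, log

def lnQ(I, frac2):
    """ln of the distinguishable count (8.1) with I2 = frac2*I particles in
    bucket 2, on the line of average energy 2.5 per particle."""
    I2 = frac2 * I
    I3 = 0.5 * I - I2 / 3.0          # from I1+I2+I3 = I, I1+2*I2+4*I3 = 2.5*I
    I1 = I - I2 - I3
    if min(I1, I2, I3) < 0.0:
        return float("-inf")
    N = [I / 4.0 * n for n in (1.0, 3.0, 8.0)]
    return lgamma(I + 1) + sum(i * log(n) - lgamma(i + 1)
                               for n, i in zip(N, (I1, I2, I3)))

for I in (4, 64, 1024):
    fracs = [f / 1000.0 for f in range(1, 750)]
    peak = max(fracs, key=lambda f: lnQ(I, f))
    drop = lnQ(I, peak) - lnQ(I, peak + 0.05)
    print(f"I = {I:4}: peak near I2/I = {peak:.2f}; ln Q drops {drop:.1f} at +5%")

The growing drop-off at a fixed 5% distance from the peak is the spike sharpening with system size.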
Since there is only an incredibly small fraction of eigenfunctions that do not
have very accurately the most probable occupation numbers, it seems intuitively
obvious that in thermal equilibrium, the physical system must have the same
distribution of particle energies. Why would nature prefer one of those extremely
rare eigenfunctions that do not have these occupation numbers, rather than one
of the vast majority that do? In fact, {A.66},

It is a fundamental assumption of statistical mechanics that in ther-


mal equilibrium, all system energy eigenfunctions with the same en-
ergy have the same probability.

So the most probable set of bucket numbers, as found from the count of eigen-
functions, gives the distribution of particle energies in thermal equilibrium.
This then is the final conclusion: the particle energy distribution of a macro-
scopic system of weakly interacting particles at a given energy can be obtained
by merely counting the system energy eigenstates. It can be done without doing
any physics. Whatever physics may want to do, it is just not enough to offset
the vast numerical superiority of the eigenfunctions with very accurately the
most probable bucket numbers.

8.4 Particle-Energy Distribution Functions


The objective in this section is to relate the Maxwell-Boltzmann, Bose-Einstein,
and Fermi-Dirac particle energy distributions of chapter 7.9 to the conclusions
obtained in the previous section. The three distributions give the number of
particles that have given single-particle energies.

In terms of the picture developed in the previous sections, they describe how
many particles are in each energy bucket relative to the number of single-particle
states in the bucket. The distributions also assume that the number of buckets
is taken large enough that their energy can be assumed to vary continuously.
According to the conclusion of the previous section, for a system with given
energy it is sufficient to find the most probable set of energy bucket occupation
numbers, the set that has the highest number of system energy eigenfunctions.
That gives the number of particles in each energy bucket that is the most prob-
able. As the previous section demonstrated by example, the fraction of eigen-
functions that have significantly different bucket occupation numbers than the
most probable ones is so small for a macroscopic system that it can be ignored.
Therefore, the basic approach to find the three distribution functions is to
first identify all sets of bucket occupation numbers I~ that have the given energy,
and then among these pick out the set that has the most system eigenfunctions
QI~. There are some technical issues with that, {A.67}, but they can be worked
out, as in note {A.68}.
The final result is, of course, the particle energy distributions from chapter
7.9:
\[
\iota^b = \frac{1}{e^{(E^p-\mu)/k_BT} - 1}
\qquad
\iota^d = \frac{1}{e^{(E^p-\mu)/k_BT}}
\qquad
\iota^f = \frac{1}{e^{(E^p-\mu)/k_BT} + 1}.
\]
Here ι indicates the number of particles per single-particle state, more precisely,
ι = Ib /Nb . This ratio is independent of the precise details of how the buckets
are selected, as long as their energies are closely spaced. However, for identical
bosons it does assume that the number of single-particle states in a bucket is
large. If that assumption is problematic, the more accurate formulae in note
{A.68} should be consulted. The main case for which there is a real problem is
for the ground state in Bose-Einstein condensation.
It may be noted that “T ” in the above distribution laws is a temperature, but
the derivation in the note did not establish it is the same temperature scale that
you would get with an ideal-gas thermometer. That will be shown in section
8.14.4. For now note that T will normally have to be positive. Otherwise
the derived energy distributions would have the number of particles become
infinity at infinite bucket energies. For some weird system for which there is
an upper limit to the possible single-particle energies, this argument does not
apply, and negative temperatures cannot be excluded. But for particles in a
box, arbitrarily large energy levels do exist, see chapter 7.6, and the temperature
must be positive.
The derivation also did not show that µ in the above distributions is the
chemical potential as is defined in general thermodynamics. That will eventually
be shown in note {A.73}. Note that for particles like photons that can be
readily created or annihilated, there is no chemical potential; µ entered into the
derivation in note {A.68} through the constraint that the number of particles
of the system is a given. A look at the note shows that the formulae still apply
for such transient particles if you simply put µ = 0.
For permanent particles, increasingly large negative values of the chemical
potential µ decrease the number of particles at all energies. Therefore large neg-
ative µ corresponds to systems of very low particle densities. If µ is sufficiently
negative that e^{(E^p−µ)/kB T} is large even for the single-particle ground state, the
±1 that characterize the Fermi-Dirac and Bose-Einstein distributions can be
ignored compared to the exponential, and the three distributions become equal:

The symmetrization requirements for bosons and fermions can be


ignored under conditions of very low particle densities.

These are ideal gas conditions; see section 8.14.4.
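A small numeric sketch (illustrative only; the energies, µ values, and kB T are arbitrary units chosen for the demonstration) of the three distributions above, showing how they merge when (E^p − µ)/kB T is large:

from math import exp

def iota_bose(E, mu, kT):  return 1.0 / (exp((E - mu) / kT) - 1.0)
def iota_dist(E, mu, kT):  return 1.0 / exp((E - mu) / kT)
def iota_fermi(E, mu, kT): return 1.0 / (exp((E - mu) / kT) + 1.0)

kT = 1.0
for mu in (0.9, -10.0):     # high versus very low particle density
    for E in (1.0, 2.0, 4.0):
        print(f"mu = {mu:5}: E = {E}: "
              f"{iota_bose(E, mu, kT):.3e} {iota_dist(E, mu, kT):.3e} "
              f"{iota_fermi(E, mu, kT):.3e}")

At µ = 0.9 the three occupations differ strongly; at µ = −10 they agree to several digits.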


Decreasing the temperature will primarily thin out the particle numbers at
high energies. In this sense, yes, temperature reductions are indeed to some
extent associated with (kinetic) energy reductions.

8.5 The Canonical Probability Distribution


The particle energy distribution functions in the previous section were derived
assuming that the energy is given. In quantum-mechanical terms, it was as-
sumed that the energy was certain. However, that cannot really be right, for
one because of the energy-time uncertainty principle.
Assume for a second that a lot of boxes of particles are carefully prepared, all
with a system energy as certain as it can be made. And that all these boxes are
then stacked together into one big system. In the combined system of stacked
boxes, the energy is presumably quite certain, since the random errors are likely
to cancel each other, rather than add up systematically. In fact, simplistic
statistics would expect the relative error in the energy of the combined system
to decrease like the square root of the number of boxes.
But for the carefully prepared individual boxes, the future of their energy cer-
tainty is much bleaker. Surely a single box in the stack may randomly exchange
a bit of energy with the other boxes. If the exchange of energy is completely
random, a single box is likely to acquire an uncertainty in its energy equal to
the typical exchanged amount times the square root of the number of boxes.
That would be an unlimited amount of uncertainty if the number of boxes is
made larger and larger. Of course, when a box acquires much more energy than
the others, the exchange will no longer be random, but almost certainly go from
the hotter box to the cooler ones. Still, it seems unavoidable that quite a lot of
uncertainty in the energy of the individual boxes would result. The boxes still
have a precise temperature, being in thermal equilibrium with the larger system,
but no longer a precise energy.
Then the appropriate way to describe the individual boxes is no longer in
terms of given energy, but in terms of probabilities. The proper expression
for the probabilities is “deduced” in note {A.69}. It turns out that when the
temperature T of a system is certain, but not its energy, the system energy
eigenfunctions ψqI can be assigned probabilities of the form

\[
P_q = \frac{1}{Z}\, e^{-E^I_q/k_BT} \tag{8.4}
\]
where kB = 1.38065 10^−23 J/K is the Boltzmann constant. This equation for the
probabilities is called the Gibbs “canonical probability distribution.” Feynman
[7, p. 1] calls it the summit of statistical mechanics.
The exponential by itself is called the “Boltzmann factor.” The normaliza-
tion factor Z, which makes sure that the probabilities all together sum to one,
is called the “partition function.” It equals
\[
Z = \sum_{\text{all } q} e^{-E^I_q/k_BT} \tag{8.5}
\]

You might wonder why a mere normalization factor warrants its own name. It
turns out that if an analytical expression for the partition function Z(T, V, I) is
available, various quantities of interest may be found from it by taking suitable
partial derivatives. Examples will be given in subsequent sections.
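As an illustration of (8.4) and (8.5), the sketch below assigns canonical probabilities to a short, hypothetical list of system energy eigenvalues; the spectrum and the value kB T = 1.85 (the temperature of the example figures, in the same arbitrary units) are assumptions for demonstration only.

from math import exp

E_sys = [4.0, 5.0, 5.0, 6.0, 7.0, 7.0, 7.0, 8.0]  # hypothetical eigenvalues
kT = 1.85                                          # k_B T, same units

Z = sum(exp(-E / kT) for E in E_sys)               # partition function (8.5)
P = [exp(-E / kT) / Z for E in E_sys]              # probabilities (8.4)

print(sum(P))        # 1.0: the Boltzmann factors are properly normalized
print(P[0], P[-1])   # equal energies get equal P; higher energy gets less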
The canonical probability distribution conforms to the fundamental assump-
tion of quantum statistics that eigenfunctions of the same energy have the same
probability. However, it adds that for system eigenfunctions with different ener-
gies, the higher energies are less likely. Massively less likely, to be sure, because
the system energy E Iq is a macroscopic energy, while the energy kB T is a micro-
scopic energy level, roughly the kinetic energy of a single atom in an ideal gas
at that temperature. So the Boltzmann factor decays extremely rapidly with
energy.
So, what happens to the simple model system from section 8.3 when the en-
ergy is no longer certain, and instead the probabilities are given by the canonical
probability distribution? The answer is in the middle graphic of figure 8.7. Note
that there is no longer a need to limit the displayed energies; the strong expo-
nential decay of the Boltzmann factor takes care of killing off the high energy
eigenfunctions. The rapid growth of the number of eigenfunctions does remain
evident at lower energies where the Boltzmann factor has not yet reached enough
strength.
There is still an oblique energy line in figure 8.7, but it is no longer limiting
energy; it is merely the energy at the most probable bucket occupation numbers.
Figure 8.7: Probabilities for bucket-number sets for the simple 64 particle model system if there is uncertainty in energy. More probable bucket-number distributions are shown darker. Left: identical bosons, middle: distinguishable particles, right: identical fermions. The temperature is the same as in figure 8.5.

Equivalently, it is the “expectation energy” of the system, defined following the
ideas of chapter 3.3.1 as
\[
\langle E\rangle \equiv \sum_{\text{all } q} P_q E^I_q \equiv E
\]

because for a macroscopic system size, the most probable and expectation values
are the same. That is a direct result of the black blob collapsing towards a single
point for increasing system size: in a macroscopic system, essentially all system
eigenfunctions have the same macroscopic properties.
In thermodynamics, the expectation energy is called the “internal energy”
and indicated by E or U . This book will use E, dropping the angular brack-
ets. The difference from the single-particle/bucket/system energies is that the
internal energy is plain E with no subscripts or superscripts.
Figure 8.7 also shows the bucket occupation number probabilities if the
example 64 particles are not distinguishable, but identical bosons or identical
fermions. The most probable bucket numbers are not the same, since bosons and
fermions have different numbers of eigenfunctions than distinguishable particles,
but as the figure shows, the effects are not dramatic at the shown temperature,
kB T = 1.85 in the arbitrary energy units.

8.6 Low Temperature Behavior


The three-bucket simple model used to illustrate the basic ideas of quantum
statistics qualitatively can also be used to illustrate the low temperature behav-
ior that was discussed in chapter 7.9. To do so, however, the first bucket must
be taken to contain just a single, non-degenerate ground state.
Figure 8.8: Probabilities of bucket-number sets for the simple 64 particle model system if bucket 1 is a non-degenerate ground state. Left: identical bosons, middle: distinguishable particles, right: identical fermions. The temperature is the same as in figure 8.7.

In that case, figure 8.7 of the previous section turns into figure 8.8. None
of the three systems sees much reason to put any measurable amount of particles
in the first bucket; why would they, it contains only one single-particle state out
of 177? In particular, the most probable bucket numbers are right at the 45◦
limiting line through the points I2 = I, I3 = 0 and I2 = 0, I3 = I on which
I1 = 0. Actually, the mathematics of the system of bosons would like to put a
negative number of bosons in the first bucket, and must be constrained to put
zero in it.

Figure 8.9: Like figure 8.8, but at a lower temperature.

If the temperature is lowered, however, as in figure 8.9, things change, especially for the system of bosons. Now the mathematics of the most probable state
wants to put a positive number of bosons in bucket one, and a large fraction
of them to boot, considering that it is only one state out of 177. The most
probable distribution drops way below the 45◦ limiting line. The mathematics
for distinguishable particles and fermions does not yet see any reason to panic,
and still leaves bucket 1 largely empty.

Figure 8.10: Like figure 8.8, but at a still lower temperature.

When the temperature is lowered still much lower, as shown in figure 8.10,
almost all bosons drop into the ground state and the most probable state is right
next to the origin I2 = I3 = 0. In contrast, while the system of distinguishable
particles does recognize that high-energy bucket 3 becomes quite unreachable
with the available amount of thermal energy, it still has a quite significant
fraction of the particles in bucket 2. And the system of fermions will never drop
into bucket 1, however low the temperature. Because of the Pauli exclusion
principle, only one fermion out of the 64 can ever go into bucket 1, and only
48, or 75%, can go into bucket 2. The remaining 23% will stay in the high-energy
bucket however low the temperature goes.
If you still need convincing that temperature is a measure of hotness, and
not of thermal kinetic energy, there it is. The three systems of figure 8.10 are all
at the same temperature, but there are vast differences in their kinetic energy.
In thermal contact at very low temperatures, the system of fermions runs off
with almost all the energy, leaving a small morsel of energy for the system of
distinguishable particles, and the system of bosons gets practically nothing.
It is really weird. Any distribution of bucket numbers that is valid for dis-
tinguishable particles is exactly as valid for bosons and vice versa; it is just the
number of eigenfunctions with those bucket numbers that is different. But when
the two systems are brought into thermal contact at very low temperatures, the
distinguishable particles get all the energy. It is just as possible from an energy
conservation and quantum mechanics point of view that all the energy goes to
the bosons instead of to the distinguishable particles. But it becomes astronom-
ically unlikely because there are so few eigenfunctions like that. (Do note that
it is assumed here that the temperature is so low that almost all bosons have
dropped into the ground state. As long as the temperatures do not become much
smaller than the one of Bose-Einstein condensation, the energies of systems of
bosons and distinguishable particles remain quite comparable, as in figure 8.9.)

8.7 The Basic Thermodynamic Variables


This section introduces the most important basic players in thermodynamics.
The primary thermodynamic property introduced so far is the temperature.
Recall that temperature is a measure of the hotness of the substance, a measure
of how eager it is to dump energy onto other systems. Temperature is called an
“intensive variable;” it is the same for two systems that differ only in size.
The total number of particles I or the total volume of their box V are not
intensive variables; they are “extensive variables,” variables that increase in
value proportional to the system size. Often, however, you are only interested
in the properties of your substance, not the amount. In that case, intensive
variables can be created by taking ratios of the extensive ones; in particular,
I/V is an intensive variable called the “particle density.” It is the number of
particles per unit volume. If you restrict your attention to only one half of your
box with particles, the particle density is still the same, with half the particles
in half the volume.
Note that under equilibrium conditions, it suffices to know the temperature
and particle density to fully fix the state that a given system is in. More
generally, the rule is that:

Two intensive variables must be known to fully determine the inten-


sive properties of a simple substance in thermal equilibrium.

(To be precise, in a two-phase equilibrium like a liquid-vapor mixture, pressure
and temperature are related, and would not be sufficient to determine something
like net specific volume. They do still suffice to determine the specific volumes of
the liquid and vapor parts individually, in any case.) If the amount of substance
is also desired, knowledge of at least one extensive variable is required, making
three variables that must be known in total.
Since the number of particles will have very large values, for macroscopic
work the particle density is often not very convenient, and somewhat differently
defined, but completely equivalent variables are used. The most common are the
(mass) “density” ρ, found by multiplying the particle density with the single-
particle mass m, ρ ≡ mI/V , or its inverse, the “specific volume” v ≡ V /mI.
The density is the system mass per unit system volume, and the specific volume
is the system volume per unit system mass.
Alternatively, to keep the values for the number of particles in check, they
may be expressed in “moles,” multiples of Avogadro’s number

IA = 6.0221 10^23

That produces the “molar density” ρ̄ ≡ I/(IA V) and “molar specific volume”
v̄ ≡ V IA /I. In thermodynamic textbooks, the use of kilo mol (kmol) instead of
mol has become quite standard (but then, so has the use of kilo Newton instead
of Newton.) The conversion factor between molar and non-molar specific quan-
tities is called the “molecular mass” M ; it is applied according to its dimensions
of kg/kmol. The numerical value of the molecular mass is roughly the total
number of protons and neutrons in the nuclei of a single molecule; in fact, the
weird number of particles given by Avogadro’s number was chosen to achieve
this.
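The conversions between these variables are simple enough to spell out in a few lines. The sketch below uses a hypothetical sample of helium atoms, with Avogadro’s number expressed per kmol to match the convention just mentioned; all the input values are assumptions for illustration.

I = 2.0e24          # number of particles (assumed sample)
V = 0.1             # volume in m^3 (assumed)
m = 6.64e-27        # mass of one helium atom, kg
I_A = 6.0221e26     # Avogadro's number, per kmol

rho = m * I / V           # (mass) density, kg/m^3
v = V / (m * I)           # specific volume, m^3/kg
rho_bar = I / (I_A * V)   # molar density, kmol/m^3
v_bar = V * I_A / I       # molar specific volume, m^3/kmol
M = m * I_A               # molecular mass, kg/kmol (about 4 for helium)
print(rho, v, rho_bar, v_bar, M)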
So what else is there? Well, there is the energy of the system. In view
of the uncertainty in energy, the appropriate system energy is defined as the
expectation value,
\[
E = \sum_{\text{all } q} P_q E^I_q \tag{8.6}
\]

where Pq is the canonical probability of (8.4), (8.5). Quantity E is called the


“internal energy.” In engineering thermodynamics books, it is usually indicated
by U , but this is physics. The intensive equivalent e is found by dividing by the
system mass; e = E/mI. Note the convention of indicating extensive variables
by a capital and their intensive value per unit mass with the corresponding
lower case letter. A specific quantity on a molar basis is lower case with a bar
above it.
As a demonstration of the importance of the partition function mentioned in
the previous section, if the partition function (8.5) is differentiated with respect
to temperature, you get
\[
\left(\frac{\partial Z}{\partial T}\right)_{V \text{ constant}} = \frac{1}{k_B T^2} \sum_{\text{all } q} E^I_q\, e^{-E^I_q/k_BT}.
\]

(The volume of the system should be held constant in order that the energy
eigenfunctions do not change.) Dividing both sides by Z turns the derivative
in the left hand side into that of the logarithm of Z, and the sum in the right
hand side into the internal energy E, and you get
\[
E = k_B T^2 \left(\frac{\partial \ln Z}{\partial T}\right)_{V \text{ constant}} \tag{8.7}
\]
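This identity is easy to check numerically. Reusing the hypothetical toy spectrum from section 8.5 (with kB = 1 in these units, an assumption of the sketch), the snippet below compares the direct expectation value with a finite-difference derivative of ln Z:

from math import exp, log

E_sys = [4.0, 5.0, 5.0, 6.0, 7.0, 7.0, 7.0, 8.0]  # hypothetical, k_B = 1

def lnZ(T):
    return log(sum(exp(-E / T) for E in E_sys))

T, dT = 1.85, 1.0e-6
Z = sum(exp(-E / T) for E in E_sys)
E_direct = sum(E * exp(-E / T) for E in E_sys) / Z         # E = sum Pq Eq
E_from_Z = T * T * (lnZ(T + dT) - lnZ(T - dT)) / (2 * dT)  # eq. (8.7)
print(E_direct, E_from_Z)   # agree to finite-difference accuracy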

Next there is the “pressure” P , being the force with which the substance
pushes on the surfaces of the box it is in per unit surface area. To identify P
quantum mechanically, first consider a system in a single energy eigenfunction
ψqI for certain. If the volume of the box is slightly changed, there will be
a corresponding slight change in the energy eigenfunction ψqI (the boundary
conditions of the Hamiltonian eigenvalue problem will change), and in particular
its energy will slightly change. Energy conservation requires that the change in
energy dE Iq is offset by the work done by the containing walls on the substance.
Now the work done by the wall pressure on the substance equals
−P dV.
(The force is pressure times area and is normal to the area; the work is force
times displacement in the direction of the force; combining the two, area times
displacement normal to that area gives change in volume. The minus sign is be-
cause the displacement must be inwards for the pressure force on the substance
to do positive work.) So for the system in a single eigenstate, the pressure
equals P = −dE Iq /dV . For a real system with uncertainty in energy and energy
eigenfunction, the pressure is defined as the expectation value:

\[
P = -\sum_{\text{all } q} P_q \frac{dE^I_q}{dV} \tag{8.8}
\]

It may be verified by simple substitution that this, too, may be obtained from
the partition function, now by differentiating with respect to volume keeping
temperature constant:
\[
P = k_B T \left(\frac{\partial \ln Z}{\partial V}\right)_{T \text{ constant}} \tag{8.9}
\]
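As a sketch of (8.8) in action: for noninteracting particles in a box that is scaled isotropically, every system energy varies as V^(−2/3) (the wave numbers scale as one over the box length), so dE^I_q/dV = −(2/3)E^I_q/V and (8.8) reduces to P = (2/3)E/V. The toy numbers below (illustrative assumptions, not from the book) check this:

from math import exp

E0 = [4.0, 5.0, 5.0, 6.0, 7.0]   # hypothetical system energies at V = 1
V, kT = 2.0, 1.85

Es = [E * V**(-2.0 / 3.0) for E in E0]      # energies scale as V^(-2/3)
Z = sum(exp(-E / kT) for E in Es)
probs = [exp(-E / kT) / Z for E in Es]

# (8.8) with dEq/dV = -(2/3) Eq / V:
P_direct = sum(p * (2.0 / 3.0) * E / V for p, E in zip(probs, Es))
E_mean = sum(p * E for p, E in zip(probs, Es))
print(P_direct, (2.0 / 3.0) * E_mean / V)   # identical: P V = (2/3) E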

While the final quantum mechanical definition of the pressure is quite sound,
it should be pointed out that the original definition in terms of force was very
artificial. And not just because force is a poor quantum variable. Even if a
system in a single eigenfunction could be created, the walls of the system would
have to be idealized to assume that the energy change equals the work −P dV .
For example, if the walls of the box would consist of molecules that were hotter
than the particles inside, the walls too would add energy to the system, and
take it out of its single energy eigenstate to boot. And even macroscopically,
for pressure times area to be the force requires that the system is in thermal
equilibrium. It would not be true for a system evolving in a violent way.
Often a particular combination of the variables defined above is very conve-
nient; the “enthalpy” H is defined as
H = E + PV (8.10)
Enthalpy is not a fundamentally new variable, just a combination of existing
ones.
Assuming that the system evolves while staying at least approximately in
thermal equilibrium, the “first law of thermodynamics” can be stated macro-
scopically as follows:
dE = δQ − P dV (8.11)
8.7. THE BASIC THERMODYNAMIC VARIABLES 359

In words, the internal energy of the system changes by the amount δQ of heat
added plus the amount −P dV of work done on the system. It is just energy
conservation expressed in thermodynamic terms. (And it assumes that other
forms of energy than internal energy and work done while expanding can be
ignored.)
Note the use of a straight d for the changes in internal energy E and volume
V , but a δ for the heat energy added. It reflects that dE and dV are changes
in properties of the system, but δQ is not; δQ is a small amount of energy
exchanged between systems, not a property of any system. Also note that
while popularly you might talk about the heat within a system, it is standard
in thermodynamics to refer to the thermal energy within a system as internal
energy, and reserve the term “heat” for exchanged thermal energy.
Just two more variables. The “specific heat at constant volume” Cv is defined
as the heat that must be added to the substance for each degree temperature
change, per unit mass and keeping the volume constant. In terms of the first
law on a unit mass basis,
de = δq − P dv,

it means that Cv is defined as δq/dT when dv = 0. So Cv is the derivative


of the specific internal energy e with respect to temperature. To be specific,
since specifying e normally requires two intensive variables, Cv is the partial
derivative of e keeping specific volume constant:
\[
C_v \equiv \left(\frac{\partial e}{\partial T}\right)_v \tag{8.12}
\]

Note that in thermodynamics the quantity being held constant while taking the
partial derivative is shown as a subscript to parentheses enclosing the deriva-
tive. You did not see that in calculus, but that is because in mathematics, they
tend to choose a couple of independent variables and stick with them. In ther-
modynamics, two independent variables are needed, (assuming the amount of
substance is a given), but the choice of which two changes all the time. Therefore,
listing what is held constant in the derivatives is crucial.
The specific heat at constant pressure Cp is defined similarly to Cv, except
that pressure, instead of volume, is being held constant. According to the first
law above, the heat added is now de + P dv and that is the change in enthalpy
h = e + P v. There is the first practical application of the enthalpy already! It
follows that
\[
C_p \equiv \left(\frac{\partial h}{\partial T}\right)_P \tag{8.13}
\]
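For a sketch of how such a derivative is evaluated in practice, the snippet below differentiates the internal energy of the hypothetical toy spectrum used earlier with respect to temperature at fixed volume (fixed spectrum). This gives the extensive heat capacity at constant volume; dividing by the system mass would give the specific heat Cv of (8.12).

from math import exp

E_sys = [4.0, 5.0, 5.0, 6.0, 7.0, 7.0, 7.0, 8.0]  # fixed V: fixed spectrum

def internal_energy(T):   # k_B = 1 in these units (assumed)
    Z = sum(exp(-E / T) for E in E_sys)
    return sum(E * exp(-E / T) for E in E_sys) / Z

T, dT = 1.85, 1.0e-6
heat_capacity_V = (internal_energy(T + dT) - internal_energy(T - dT)) / (2 * dT)
print(heat_capacity_V)    # extensive dE/dT at constant volume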

8.8 Introduction to the Second Law


Take a look around you. You are surrounded by air molecules. They are all over
the place. Isn’t that messy? Suppose there were water all over the room,
wouldn’t you do something about it? Wouldn’t it be much neater to compress
all those air atoms together and put them in a glass? (You may want to wear
a space suit while doing this.)
The reality, of course, is that if you put all the air atoms in a glass, the high
pressure would cause the air to explode out of the glass and it would scatter all
over the room again. All your efforts would be for naught. It is like the clothes of
a ten-year old. Nature likes messiness. In fact, if messiness is properly defined,
and it will be in section 8.10, nature will always increase messiness as much as
circumstances and the laws of physics allow. The properly defined messiness is
called “entropy.” It is not to be confused with enthalpy, which is a completely
different concept altogether.
Entropy provides an unrelenting arrow of time. If you take a movie and run
it backwards, it simply does not look right, since you notice messiness getting
smaller, rather than larger. The movie of a glass of water slipping out of your
hand and breaking on the floor becomes, if run backwards, a spill of water and
pieces of glass combining together and jumping into your hand. It does not
happen. Messiness always increases. Even if you mop up the water and glue
the pieces of broken glass back together, it does not work. While you reduce
the messiness of the glass of water, you need to perform effort, and it turns out
that this always increases messiness elsewhere more than the messiness of the
glass of water is reduced.
It has big consequences. Would it not be nice if your car could run without
using gas? After all, there is lots of random kinetic energy in the air molecules
surrounding your car. Why not scoop up some of that kinetic energy out of
the air and use it to run your car? It does not work because it would decrease
messiness in the universe, that’s why. It would turn messy random molecular
motion into organized motion of the engine of your car, and nature refuses to
do it. And there you have it, the second law of thermodynamics, or at least the
version of it given by Kelvin and Planck:
You cannot just take random thermal energy out of a substance and
turn it into useful work.
You expected a physical law to be a formula, instead of a verbal statement like
that? Well, you are out of luck for now.
To be sure, if the air around your car is hotter than the ground below it, then
it is possible with some ingenuity to set up a flow of heat from the air to the
ground, and you can then divert some of this flow of heat and turn it into useful
work. But that is not an unlimited free supply of energy; it stops as soon as the
temperatures of air and ground have become equal. The temperature difference
is an expendable energy source, much like oil in the ground is; you are not
simply scooping up random thermal energy out of a substance. If that sounds
like a feeble excuse, consider the following: after the temperature difference is
gone, the air molecules still have almost exactly the same thermal energy as
before, and the ground molecules have more. But you cannot get any of it out
anymore as usable energy. Zero. (Practically speaking, the amount of energy
you would get out of the temperature difference is not going to get you to work
in time anyway, but that is another matter.)
Would it not be nice if your fridge would run without electricity? It would
really save on the electricity bill. But it cannot be done; that is the Clausius
statement of the second law:

You cannot move heat the wrong way, from cold to hot, without doing
work.

It is the same thing as the Kelvin-Planck statement, of course. If you could


really have a fridge that ran for free, you could use it to create a temperature
difference, and you could use that temperature difference to run your car. So
your car would run for free. Conversely, if your car could run for free, you could
use the cigarette lighter socket to run your fridge for free.
As patent offices all over the world can confirm, the second law has been
solidly verified by countless masses of clever inventors all over the centuries
doing everything possible to get around it. All have failed, however ingenious
their tricks trying to fool nature. And don’t forget about the most brilliant
scientists of the last few centuries who have also tried wistfully and failed mis-
erably, usually by trying to manipulate nature on the molecular level. The two
verbal statements of the second law may not seem to have much mathematical
precision, but they do. If you find a kink in either one’s armor, however small,
the fabric of current science and technology comes apart. Fabulous riches will
be yours, and you will also be the most famous scientist of all time.

8.9 The Reversible Ideal


The statements of the previous section describing the second law are clearly
common sense: yes, you still need to plug in your fridge, and no, you cannot
skip the periodic stop at a gas station. What a surprise!
They seem to be fairly useless beyond that. For example, they say that it
takes electricity to run our fridge, but they do not say it how much. It might
be a megawatt, it might be a nanowatt.
Enter human ingenuity. With some cleverness, the two simple statements
of the second law can be greatly leveraged, allowing an entire edifice to be
constructed upon their basis.


A first insight is that if we are limited by nature’s unrelenting arrow of time,
then it should pay to study devices that almost ignore that arrow. If you make
a movie of a device, and it looks almost exactly right when run backwards, the
device is called (almost exactly) “reversible.” An example is a mechanism that
is carefully designed to move with almost no friction. If set into motion, the
motion will slow down only a negligible amount during a short movie. When
that movie is run backwards in time, at first glance it seems perfectly fine. If
you look more carefully, you will see a slight problem: in the backward movie,
the device is speeding up slightly, instead of slowing down due to friction as it
should. But it is almost right: it would require only a very small amount of
additional energy to speed up the actual device running backwards as it does
in the reversed movie.
Dollar signs may come in front of your eyes upon reading that last sentence:
it suggests that almost reversible devices may require very little energy to run.
In the context of the second law it suggests that it may be worthwhile to study
refrigeration devices and engines that are almost reversible.
The second major insight is to look where there is light. Why not study, say,
a refrigeration device that is simple enough that it can be analyzed in detail?
At the very minimum it will give a standard against which other refrigeration
devices can be compared. And so it will be done.
Figure 8.11: Schematic of the Carnot refrigeration cycle.

The theoretically simple refrigeration device is called a “Carnot cycle” refrigeration device, or Carnot heat pump. A schematic is shown in figure 8.11.
A substance, the refrigerant, is circulating through four devices, with the ob-
jective of transporting heat out of the fridge, dumping it into the kitchen. In
the discussed device, the refrigerant will be taken to be some ideal gas with a
constant specific heat like maybe helium. You would not really want to use an
ideal gas as refrigerant in a real refrigerator, but the objective here is not to
make a practical refrigerator that you can sell for a profit. The purpose here is
to create a device that can be analyzed precisely, and an ideal gas is described
by simple mathematical formulae discussed in basic physics classes.
Consider the details of the device. The refrigerant enters the fridge at a
temperature colder than the inside of the fridge. It then moves through a long
piping system, allowing heat to flow out of the fridge into the colder refrigerant
inside the pipes. This piping system is called a heat exchanger. The first
reversibility problem arises: heat flow is most definitely irreversible. Heat flow
seen backwards would be flow from colder to hotter, and that is wrong. The
only thing that can be done to minimize this problem as much as possible is to
minimize the temperature differences. The refrigerant can be sent in just slightly
colder than the inside of the fridge. Of course, if the temperature difference is
small, the surface through which the heat flows into the refrigerant will have to
be very large to take any decent amount of heat away. One impractical aspect
of Carnot cycles is that they are huge; that piping system cannot be small. Be
that as it may, the theoretical bottom line is that the heat exchange in the fridge
can be approximated as (almost) isothermal.
After leaving the inside of the refrigerator, the refrigerant is compressed to
increase its temperature to slightly above that of the kitchen. This requires
an amount WC of work to be done, indicating the need for electricity to run
the fridge. To avoid irreversible heat conduction in the compression process,
the compressor is thermally carefully insulated to eliminate any heat exchange
with its surroundings. Also, the compressor is very carefully designed to be
almost frictionless. It has expensive bearings that run with almost no friction.
Additionally, the refrigerant itself has “viscosity;” it experiences internal fric-
tion if there are significant gradients in its velocity. That would make the work
required to compress it greater than the ideal −P dV , and to minimize that
effect, the velocity gradients can be minimized by using lots of refrigerant. This
also has the effect of minimizing any internal heat conduction within the refrig-
erant that may arise. Viscosity is also an issue in the heat exchangers, because
the pressure differences cause velocity increases. With lots of refrigerant, the
pressure changes over the heat exchangers are also minimized.
Now the refrigerant is sent to a heat exchanger open to the kitchen air. Since
it enters slightly hotter than the kitchen, heat will flow out of the refrigerant into
the kitchen. Again, the temperature difference must be small for the process
to be almost reversible. Finally, the refrigerant is allowed to expand, which
reduces its temperature to below that inside the fridge. The expansion occurs
within a carefully designed turbine, because the substance does an amount of
work WT while expanding reversibly, and the turbine captures that work. It is
used to run a high-quality generator and recover some of the electric power WC
needed to run the compressor. Then the refrigerant reenters the fridge and the
cycle repeats.
If this Carnot refrigerator is analyzed theoretically, {A.70}, a very simple
result is found. The ratio of the heat QH dumped by the device into the kitchen
to the heat QL removed from the refrigerator is exactly the same as the ratio of
the temperature of the kitchen TH to that of the fridge TL :

For an ideal cycle:
\[
\frac{Q_H}{Q_L} = \frac{T_H}{T_L} \qquad (8.14)
\]

That is a very useful result, because the net work W = WC − WT that must
go into the device is, by conservation of energy, the difference between QH and
QL . A “coefficient of performance” can be defined that is the ratio of the heat
QL removed from the fridge to the required power input W :

For an ideal refrigeration cycle:
\[
\beta \equiv \frac{Q_L}{W} = \frac{T_L}{T_H - T_L} \qquad (8.15)
\]

Actually, some irreversibility is unavoidable in real life, and the true work re-
quired will be more. The formula above gives the required work if everything is
truly ideal.
The same device can be used in winter to heat the inside of your house.
Remember that heat was dumped into the kitchen. So, just cross out “kitchen”
at the high temperature side in figure 8.11 and write in “house.” And cross
out “fridge“ and write in “outside.” The device removes heat from the outside
and dumps it into your house. It is the exact same device, but it is used for a
different purpose. That is the reason that it is no longer called a “refrigeration
cycle” but a “heat pump.” For a heat pump, the quantity of interest is the
amount of heat dumped at the high temperature side, into your house. So an
alternate coefficient of performance is now defined as

For an ideal heat pump:
\[
\beta' \equiv \frac{Q_H}{W} = \frac{T_H}{T_H - T_L} \qquad (8.16)
\]

The formula above is ideal. Real-life performance will be less, so the work
required will be more.
It is interesting to note that if you take an amount W of electricity and
dump it into a simple resistance heater, it adds exactly an amount W of heat
to your house. If you dump that same amount of electricity into a Carnot heat
pump that uses it to pump in heat from the outside, the amount of heat added
to your house will be much larger than W . For example, if it is 300 K (27◦ C)
inside and 275 K (2◦ C) outside, the amount of heat added is (300/25)W = 12W,
twelve times the amount you got from the resistance heater!
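To put numbers in quickly, here is a minimal Python sketch of the two ideal
coefficients of performance; the function names are invented for this
illustration, and nothing in it goes beyond equations (8.15) and (8.16).
\begin{verbatim}
# Ideal (Carnot) coefficients of performance, equations (8.15) and (8.16).
def cop_refrigeration(T_high, T_low):
    # beta = Q_L / W = T_L / (T_H - T_L)
    return T_low / (T_high - T_low)

def cop_heat_pump(T_high, T_low):
    # beta' = Q_H / W = T_H / (T_H - T_L)
    return T_high / (T_high - T_low)

# The example from the text: 300 K inside the house, 275 K outside.
print(cop_heat_pump(300.0, 275.0))      # 12.0: heat delivered is 12 times W
print(cop_refrigeration(300.0, 275.0))  # 11.0: note beta = beta' - 1
\end{verbatim}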
Figure 8.12: Schematic of the Carnot heat engine. (The same loop with all
arrows reversed: heat QH flows in from the high temperature side TH, the fuel
combustion; the compressor absorbs work WC and the turbine delivers work WT;
the leftover heat QL is dumped into the low temperature side TL, the
environment.)

If you run the Carnot refrigeration cycle in reverse, as in figure 8.12, all
arrows reverse and it turns into a “heat engine.” The device now takes in
heat at the high temperature side and outputs a net amount of work. The
high temperature side is the place where you are burning the fuel. The low
temperature may be cooling water from the local river. The Kelvin-Planck
statement says that the device will not run unless some of the heat from the
combustion is dumped to a lower temperature. In a car engine, it is the
exhaust and radiator that take much of the heat away. Since the device is almost
reversible, the numbers for transferred heats and net work do not change much
from the non-reversed version. But the purpose is now to create work, so the
“thermal efficiency” of a heat engine is defined as
For an ideal heat engine:
\[
\eta_{\rm th} \equiv \frac{W}{Q_H} = \frac{T_H - T_L}{T_H} \qquad (8.17)
\]
Unfortunately, this is always less than one. And to get close to that, the engine
must operate hot; the temperature at which the fuel is burned must be very
hot.
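A matching sketch for the heat engine, evaluating (8.17) for some made-up but
representative temperatures:
\begin{verbatim}
# Ideal heat engine efficiency, equation (8.17); always less than one.
def carnot_efficiency(T_high, T_low):
    return (T_high - T_low) / T_high

print(carnot_efficiency(1500.0, 300.0))  # 0.8: hot combustion, river cooling
print(carnot_efficiency(600.0, 300.0))   # 0.5: running cooler costs efficiency
\end{verbatim}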
(Note that slight corrections to the strictly reversed refrigeration process
are needed; in particular, for the heat engine process to work, the substance
must now be slightly colder than TH at the high temperature side, and slightly
hotter than TL at the low temperature side. Heat cannot flow from colder to
hotter. But since these are small changes, the mathematics is almost the same.
In particular, the numerical values for QH and QL will be almost unchanged,
though the heat now goes the opposite way.)
The final issue to be resolved is whether other devices could not be better
than the Carnot ones. For example, could not a generic heat pump be more
Figure 8.13: A generic heat pump next to a reversed Carnot one with the same
heat delivery. (Between the high temperature side TH, the house, and the low
temperature side TL, the outside: the generic heat pump delivers heat QH into
the house using work WGeneric, while next to it the reversed Carnot machine
removes the same heat QH, recovering work WCarnot; the two exchange heats
QL,Generic and QL,Carnot with the outside air.)

efficient than the reversible Carnot version in heating a house? Well, put them
into different windows, and see. (The Carnot one will need the big window.)
Assume that both devices are sized to produce the same heat flow into the
house. On second thought, since the Carnot machine is reversible, run it in
reverse; that can be done without changing its numbers for the heat fluxes and
net work noticeably, and it will show up the differences between the devices.
The idea is shown in figure 8.13. Note that the net heat flow into the house
is now zero, confirming that running the Carnot in reverse really shows the
differences between the devices. Net heat is exchanged with the outside air
and there is net work. Enter Kelvin-Planck. According to Kelvin-Planck, heat
cannot simply be taken out of the outside air and converted into useful net work.
The net work being taken out of the air will have to be negative. So the work
required for the generic heat pump will need to be greater than that recovered
by the reversed Carnot one, the excess ending up as heat in the outside air. So,
the generic heat pump requires more work than a Carnot one running normally.
No device can therefore be more efficient than the Carnot one. The best case
is that the generic device, too, is reversible. In that case, neither device can
win, because the generic device can be made to run in reverse instead of the
Carnot one. That is the case where both devices are so perfectly constructed
that whatever work goes into the generic device is almost 100% recovered by
the reversed Carnot machine, with negligible amounts of work being turned into
heat by friction or other irreversibility and ending up in the outside air.
The conclusion is that:
All reversible devices exchanging heat at a given high temperature
TH and low temperature TL , (and nowhere else,) have the same
efficiency. Irreversible devices have less.
To see that it is true for refrigeration cycles too, just note that because of con-
servation of energy, QL = QH − W . It follows that, considered as a refrigeration
cycle, not only does the generic heat pump above require more work, it also
removes less heat from the cold side. To see that it applies to heat engines too,
just place a generic heat engine next to a reversed Carnot one producing the
same power. The net work is then zero, and the heat flow QH of the generic
device better be greater than that of the Carnot cycle, because otherwise net
heat would flow from cold to hot, violating the Clausius statement. The heat
flow QH is a measure of the amount of fuel burned, so the non-reversible generic
device uses more fuel.
Practical devices may exchange heat at more than two temperatures, and
can be compared to a set of Carnot cycles doing the same. It is then seen that
it is bad news; for maximum theoretical efficiency of a heat engine, you prefer
to exchange heat at the highest available temperature and the lowest available
temperature, and for heat pumps and refrigerators, at the lowest available high
temperature and the highest available low temperature. But real-life and theory
are of course not the same.
Since the efficiency of the Carnot cycle has a unique relation to the tempera-
ture ratio between the hot and cold sides, it is possible to define the temperature
scale using the Carnot cycle. The only thing it takes is to select a single ref-
erence temperature to compare with, like water at its triple point. This was
in fact proposed by Kelvin as a conceptual definition, to be contrasted with
earlier definitions based on thermometers containing mercury or a similar fluid
whose volume expansion is read off. While a substance like mercury expands in
volume very much linearly with the (Kelvin) temperature, it does not expand
exactly linearly with it. So slight variations in temperature would occur based
on which substance is arbitrarily selected for the reference thermometer. On
the other hand, the second law requires that all substances used in the Carnot
cycle will give the same Carnot temperature, with no deviation allowed. It may
be noted that the definition of temperature used in this chapter is completely
consistent with the Kelvin one, because “all” substances includes ideal gasses.

8.10 Entropy
With the cleverest inventors and the greatest scientists relentlessly trying to
fool nature and circumvent the second law, how come nature never once gets
confused, not even by the most complicated, convoluted, unusual, ingenious
schemes? Nature does not outwit them by out-thinking them, but by maintain-
ing an accounting system that cannot be fooled. Unlike human accounting sys-
tems, this accounting system does not assign a monetary value to each physical
system, but a measure of messiness called “entropy.” Then, in any transaction
within or between systems, nature simply makes sure that this entropy is not
being reduced; whatever entropy one system gives up must always be less than
what the other system receives.
So what can this numerical grade of messiness called entropy be? Surely, it
must be related somehow to the second law as stated by Clausius and Kelvin
and Planck, and to the resulting Carnot engines that cannot be beat. Note
that the Carnot engines relate heat added to temperature. In particular an
infinitesimally small Carnot engine would take in an infinitesimal amount δQH
of heat at a temperature TH and give up an infinitesimal amount δQL at a
temperature TL . This is done so that δQH /δQL = TH /TL , or separating the
two ends of the device, δQH /TH = δQL /TL . The quantity δQ/T is the same at
both sides, except that one is going in and the other out. Might this, then, be
the change in messiness? After all, for the ideal reversible machine no messiness
can be created, otherwise in the reversed process, messiness would be reduced.
Whatever increase in messiness one side receives, the other side must give up,
and δQ/T fits the bill for that.
If δQ/T gives the infinitesimal change in messiness, excuse, entropy, then it
should be possible to find the entropy of a system by integration. In particular,
choosing some arbitrary state of the system as reference, the entropy of a system
in thermal equilibrium can be found as:
\[
S \equiv S_{\rm ref} + \int_{\rm reference\ state}^{\rm desired\ state} \frac{\delta Q}{T}
\quad \hbox{along any reversible path} \qquad (8.18)
\]

Figure 8.14: Comparison of two different integration paths for finding the en-
tropy of a desired state. The two different integration paths are in black and
the yellow lines are reversible adiabatic process lines. (The diagram is a
pressure-volume plot; both paths run from the reference state to the desired
state of unknown entropy S, and points A, B and C, D mark corresponding
segments on the two paths.)

The entropy as defined above is a specific number for a system in thermal
equilibrium, just like its pressure, temperature, particle density, and internal
energy are specific numbers. You might think that you could get a different
value for the entropy by following a different process path from the reference
state to the desired state. But the second law prevents that. To see why,
consider the pressure-volume diagram in figure 8.14. Two different reversible
processes are shown leading from the reference state to a desired state. A
bundle of reversible adiabatic process lines is also shown; those are graphical
representations of processes in which there is no heat exchange between the
system and its surroundings. The bundle of adiabatic lines chops the two process
paths into small pieces, of almost constant temperature, that pairwise have the
same value of δQ/T . For, if a piece like AB would have a lower value for
δQ/T than the corresponding piece CD, then a heat engine running the cycle
CDBAC would lose less of the heat δQH at the low temperature side than
the Carnot ideal, hence have a higher efficiency than Carnot and that is not
possible. Conversely, if AB would have a higher value for δQ/T than CD, then
a refrigeration device running the cycle ABDCA would remove more heat from
the low side than Carnot, again not possible. So all the little segments pairwise
have the same value for δQ/T , which means the complete integrals must also
be the same. It follows that the entropy for a system in thermal equilibrium is
uniquely defined.
So what happens if the reference and final states are still the same, but
there is a slight glitch for a single segment AB, making the process over that
one segment irreversible? In that case, the heat engine argument no longer
applies, since it runs through the segment AB in reversed order, and irreversible
processes cannot be reversed. The refrigeration cycle argument says that the
amount of heat δQ absorbed by the system will be less; more of the heat δQ
going out at the high temperature side CD will come from the work done, and
less from the heat removed at the cold side. The final entropy is still the same,
because it only depends on the final state, not on the path to get there. So
during the slight glitch, the entropy of the system increased more than δQ/T .
In general:
\[
dS \ge \frac{\delta Q}{T} \qquad (8.19)
\]
where the = applies if the change is reversible and > if it is not.
Note that the above formula is only valid if the system has an unambiguous
temperature, as in this particular example. Typically this is simply not true
in irreversible processes; for example, the interior of the system might be hot-
ter than the outside. The real importance of the above formula is to confirm
that the defined entropy is indeed a measure of messiness and not of order;
reversible processes merely shuffle entropy around from one system to the next,
but irreversible processes increase the net entropy content in the universe.
So what about the entropy of a system that is not in thermal equilibrium?
Equation (8.18) only applies for systems in thermal equilibrium. In order for
nature not to become confused in its entropy accounting system, surely entropy
must still have a numerical value for non equilibrium systems. If the problem
is merely temperature or pressure variations, where the system is still in ap-
proximate thermal equilibrium locally, you could just integrate the entropy per
unit volume over the volume. But if the system is not in thermal equilibrium
even on macroscopically small scales, it gets much more difficult. For example,
air crossing a typical shock wave (sonic boom) experiences a significant increase
in pressure over an extremely short distance. Better bring out the quantum
mechanics trick box. Or at least molecular dynamics.
Still, some important general observations can be made without running
to a computer. An “isolated” system is a system that does not interact with
its surroundings in any way. Remember the example where the air inside a
room was collected and neatly put inside a glass? That was an example of an
isolated system. Presumably, the doors of the room were hermetically sealed.
The walls of the room are stationary, so they do not perform work on the air
in the room. And the air comes rushing back out of the glass so quickly that
there is really no time for any heat conduction through the walls. If there is no
heat conduction with the outside, then there is no entropy exchange with the
outside. So the entropy of the air can only increase due to irreversible effects.
And that is exactly what happens: the air exploding out of the glass is highly
irreversible, (no, it has no plans to go back in), and its entropy increases rapidly.
Quite quickly however, the air spreads out again over the entire room and settles
down. Beyond that point, the entropy remains constant.

An isolated system evolves to the state of maximum possible entropy
and then stays there.

The state of maximum possible entropy is the thermodynamically stable state
a system will assume if left alone.
A more general system is an “adiabatic” or “insulated” system. Work may
be performed on such a system, but there is still no heat exchange with the
surroundings. That means that the entropy of such a system can again only
increase due to irreversibility. A simple example is a thermos bottle with a cold
drink inside. If you continue shaking this thermos bottle violently, the cold
drink will heat up due to its viscosity, its internal friction, and it will not stay
a cold drink for long. Its entropy will increase while you are shaking it.

The entropy of adiabatic systems can only increase.

But, of course, that of an open system need not. That is the recipe of life, {A.71}.
You might wonder why this book on quantum mechanics included a concise,
but still very lengthy classical description of the second law. It is because the
evidence for the second law is so much more convincing based on the macro-
scopic evidence than on the microscopic one. Macroscopically, the most complex
systems can be accurately observed, microscopically, the quantum mechanics of
only the most simplistic systems can be rigorously solved. And whether we can
observe the solution is still another matter.
However, given the macroscopic fact that there really is an accounting mea-
sure of messiness called entropy, the question becomes what is its actual mi-
croscopic nature? Surely, it must have a relatively simple explanation in terms
of the basic microscopic physics? For one, nature never seems to get confused
about what it is, and for another, you really would expect something that is
clearly so fundamental to nature to be relatively esthetic when expressed in
terms of mathematics.
And that thought is all that is needed to guess the true microscopic nature
of entropy. And guessing is good, because it gives a lot of insight why entropy
is what it is. And to ensure that the final result is really correct, it can be cross
checked against the macroscopic definition (8.18) and other known facts about
entropy.
The first guess is about what physical microscopic quantity would be in-
volved. Now microscopically, a simple system is described by energy eigenfunc-
tions $\psi^{\rm I}_q$, and there is nothing messy about those. They are the systematic
solutions of the Hamiltonian eigenvalue problem. But these eigenfunctions have
probabilities Pq , being the square magnitudes of their coefficients, and they are
a different story. A system of a given energy could in theory exist neatly as a
single energy eigenfunction with that energy. But according to the fundamen-
tal assumption of quantum statistics, this simply does not happen. In thermal
equilibrium, every single energy eigenfunction of the given energy achieves about
the same probability. Instead of nature neatly leaving the system in the single
eigenfunction it may have started out with, it gives every Johnny-come-lately
state about the same probability, and it becomes a mess.
If the system is in a single eigenstate for sure, the probability Pq of that one
eigenstate is one, and all others are zero. But if the probabilities are equally
spread out over a large number, call it N , of eigenfunctions, then each eigenfunc-
tion receives a probability Pq = 1/N . So your simplest thought would be that
maybe entropy is the average value of the probability. In particular, just like the
average energy is $\sum P_q E^{\rm I}_q$, the average probability would be $\sum P_q^2$. It is always
the sum of the values for which you want the average times their probability.
Your second thought would be that since $\sum P_q^2$ is one for the single eigenfunction
case, and $1/N$ for the spread out case, maybe the entropy should be $-\sum P_q^2$ in
order that the single eigenfunction case has the lower value of messiness. But
macroscopically it is known that you can keep increasing entropy indefinitely
by adding more and more heat, and the given expression starts at minus one
and never gets above zero.
So try a slightly more general possibility, that the entropy is the average of
some function of the probability, as in $S = \sum P_q f(P_q)$. The question is then,
what function? Well, macroscopically it is also known that entropy is additive,
the values of the entropies of two systems simply add up. It simplifies nature’s
task of maintaining a tight accounting system on messiness. For two systems
with probabilities Pq and Pr ,
\[
S = \sum_q P_q f(P_q) + \sum_r P_r f(P_r)
\]

This can be rewritten as
\[
S = \sum_q \sum_r P_q P_r f(P_q) + \sum_q \sum_r P_q P_r f(P_r).
\]
since probabilities by themselves must sum to one. On the other hand, if you
combine two systems, the probabilities multiply, just like the probability of
throwing a 3 with your red dice and a 4 with your black dice is $\frac{1}{6} \times \frac{1}{6}$. So the
combined entropy should also be equal to
\[
S = \sum_q \sum_r P_q P_r f(P_q P_r)
\]

Comparing this with the previous equation, you see that f (Pq Pr ) must equal
f (Pq ) + f (Pr ). The function that does that is the logarithmic function. More
precisely, you want minus the logarithmic function, since the logarithm of a
small probability is a large negative number, and you need a large positive
messiness if the probabilities are spread out over a large number of states. Also,
you will need to throw in a factor to ensure that the units of the microscopically
defined entropy are the same as the ones in the macroscopic definition. The
appropriate factor turns out to be the Boltzmann constant $k_{\rm B} = 1.38065 \times 10^{-23}$
J/K; note that this factor has absolutely no effect on the physical meaning of
entropy; it is just a matter of agreeing on units.
The microscopic definition of entropy has been guessed:
\[
S = -k_{\rm B} \sum P_q \ln(P_q) \qquad (8.20)
\]

That wasn’t too bad, was it? Note that if the system is in a single eigenstate, the
entropy is zero, because ln(1) = 0. The most important example is if the system
is in the ground state; that is known as the “third law of thermodynamics:” the
entropy is zero at absolute zero. (Even if the ground state is degenerate, the
number of states would be so small compared to normal numbers of states that
the entropy is zero for all practical purposes.)
If more than one eigenfunction has a nonzero probability the entropy is
positive, because logarithms of numbers less than one are negative. (It should
be noted that $P_q \ln P_q$ becomes zero when $P_q$ becomes zero; the blow up of
$\ln P_q$ is no match for the reduction in magnitude of $P_q$. So highly improbable
states will not contribute significantly to the entropy despite their relatively
large values of the logarithm.)
To put the definition of entropy on a less abstract basis, assume that you
schematize the system of interest into unimportant eigenfunctions that you give
zero probability, and a remaining N important eigenfunctions that all have the
same average probability 1/N . Sure, it is crude, but it is just to get an idea.
In this simple model, the entropy is kB ln(N ), proportional to the logarithm of
the number of quantum states that have an important probability. The more
states, the higher the entropy. This is what you will find in popular expositions.
And it would actually be correct for systems with zero indeterminacy in energy,
if they existed.
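For a concrete feel for (8.20), here is a small numerical sketch; the function
is ours, and the Boltzmann constant is the value quoted above. It confirms
that a single eigenstate gives zero entropy, and that an even spread over N
states gives $k_{\rm B}\ln(N)$:
\begin{verbatim}
import math

k_B = 1.38065e-23  # J/K, the value quoted above

def entropy(probabilities):
    # Equation (8.20): S = -k_B sum_q P_q ln(P_q); P ln(P) -> 0 as P -> 0.
    return -k_B * sum(P * math.log(P) for P in probabilities if P > 0.0)

print(entropy([1.0]))          # single eigenstate: 0.0
N = 10**6
print(entropy([1.0/N] * N))    # even spread over N states: k_B ln(N)
print(k_B * math.log(N))       # the same number
\end{verbatim}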
The next step is to check the expression. Derivations are given in note
{A.72}, but here are the results. For systems in thermal equilibrium, is the
entropy the same as the one given by the classical integration (8.18)? Check.
Does the entropy exist even for systems that are not in thermal equilibrium?
Check, quantum mechanics still applies. For a system of given energy, is the
entropy smallest when the system is in a single energy eigenfunction? Check,
it is zero then. For a system of given energy, is the entropy the largest when
all eigenfunctions of that energy have the same probability, as the fundamental
assumption of quantum statistics suggests? Check. For a system with given
expectation energy but uncertainty in energy, is the entropy highest when the
probabilities are given by the canonical probability distribution? Check. For
two systems in thermal contact, is the entropy greatest when their temperatures
have become equal? Check.
Feynman [7, p. 8] gives an argument to show that the entropy of an isolated
system always increases with time. Taking the time derivative of (8.20),

\[
\frac{dS}{dt} = -k_{\rm B} \sum_q [\ln(P_q) + 1] \frac{dP_q}{dt}
= -k_{\rm B} \sum_q \sum_r [\ln(P_q) + 1] R_{qr} [P_r - P_q],
\]

the final equality being from time-dependent perturbation theory, with Rqr =
Rrq > 0 the transition rate from state q to state r. In the double summation, a
typical term with indices q and r combines with the term having the reversed
indices as
\[
k_{\rm B} [\ln(P_r) + 1 - \ln(P_q) - 1] R_{qr} [P_r - P_q]
\]
and that is always greater than zero because the terms in the square brackets
have the same sign: if Pq is greater/less than Pr then so is ln(Pq ) greater/less
than ln(Pr ). However, given the dependence of time-dependent perturbation
theory on linearization and, worse, the “measurement” wild card of chapter 5.2,
you might consider this more a validation of time-dependent perturbation theory
than of the expression for entropy.
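If you want to watch the increase happen, the following toy Python computation
evolves invented probabilities under symmetric transition rates $R_{qr} = R_{rq}$ and
prints the entropy; the rates, step size, and system size are all made up for
illustration only.
\begin{verbatim}
import math, random

# Toy master equation dP_q/dt = sum_r R_qr (P_r - P_q), rates R_qr = R_rq > 0.
random.seed(1)
n = 5
R = [[0.0] * n for _ in range(n)]
for q in range(n):
    for r in range(q + 1, n):
        R[q][r] = R[r][q] = random.random()

def S(P):
    # Entropy (8.20) in units of k_B.
    return -sum(p * math.log(p) for p in P if p > 0.0)

P = [1.0, 0.0, 0.0, 0.0, 0.0]  # start in a single eigenstate, S = 0
dt = 0.01
for step in range(1, 1001):
    dP = [sum(R[q][r] * (P[r] - P[q]) for r in range(n)) for q in range(n)]
    P = [p + dt * d for p, d in zip(P, dP)]
    if step % 250 == 0:
        print(S(P))  # never decreases; creeps up to ln 5 as P evens out
\end{verbatim}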
In any case, it may be noted that the checks on the expression for entropy,
as given above, cut both ways. If you accept the expression for entropy, the
canonical probability distribution follows. They are consistent, and in the end,
it is just a matter of which of the two postulates you are more willing to accept
as true.

8.11 The Big Lie of Distinguishable Particles


If you try to find the entropy of the system of distinguishable particles that
produces the Maxwell-Boltzmann distribution, you are in for an unpleasant
surprise. It just cannot be done. The problem is that the number of eigen-
functions for I distinguishable particles is typically roughly I! larger than for
I identical bosons or fermions. If the typical number of states becomes larger
by a factor I!, the logarithm of the number of states increases by I ln I, (using
the Stirling formula), which is no longer proportional to the size of the system
I, but much larger than that. The specific entropy would blow up with system
size.
What gives? Now the truth must be revealed. The entire notion of distin-
guishable particles is a blatant lie. You are simply not going to have $10^{23}$ distin-
guishable particles in a box. Assume they would be $10^{23}$ different molecules. It
would take a chemistry handbook of $10^{21}$ pages to list them, one line for each.
Make your system size 1,000 times as big, and the handbook gets 1,000 times
thicker still. That would be really messy! When identical bosons or fermions
are far enough apart that their wave functions do no longer overlap, the sym-
metrization requirements are no longer important for most practical purposes.
But if you start counting energy eigenfunctions, as entropy does, it is a different
story. Then there is no escaping the fact that the particles really are, after all,
indistinguishable forever.

8.12 The New Variables


The new kid on the block is the entropy S. For an adiabatic system the entropy
is always increasing. That is highly useful information, if you want to know
what thermodynamically stable final state an adiabatic system will settle down
into. No need to try to figure out the complicated time evolution leading to the
final state. Just find the state that has the highest possible entropy S, that will
be the stable final state.
But a lot of systems of interest are not well described as being adiabatic.
A typical alternative case might be a system in a rigid box in an environment
that is big enough, and conducts heat well enough, that it can at all times
be taken to be at the same temperature Tsurr . Also assume that initially the
system itself is in some state 1 at the ambient temperature Tsurr , and that it
ends up in a state 2 again at that temperature. In the evolution from 1 to 2,
however, the system temperature could be different from the surroundings,
or even undefined, no thermal equilibrium is assumed. The first law, energy
conservation, says that the heat Q12 added to the system from the surroundings
equals the change in internal energy E2 − E1 of the system. Also, the entropy
change in the isothermal environment will be −Q12 /Tsurr , so the system entropy
change S2 − S1 must be at least Q12 /Tsurr in order for the net entropy in the
universe not to decrease. From that it can be seen by simply writing it out that
the “Helmholtz free energy”

F = E − TS (8.21)

is smaller for the final system 2 than for the starting one 1. In particular, if the
system ends up in a stable final state that can no longer change, it will be
the state of smallest possible Helmholtz free energy. So, if you want to know
what will be the final fate of a system in a rigid, heat conducting, box in an
isothermal environment, just find the state of lowest possible Helmholtz energy.
That will be the one.
A slightly different version occurs even more often in real applications. In
these the system is not in a rigid box, but instead its surface is at all times
exposed to ambient atmospheric pressure. Energy conservation now says that
the heat added Q12 equals the change in internal energy E2 − E1 plus the work
done expanding against the atmospheric pressure, which is Psurr (V2 − V1 ). As-
suming that both the initial state 1 and final state 2 are at ambient atmospheric
pressure, as well as at ambient temperature as before, then it is seen that the
quantity that decreases is the “Gibbs free energy”

G = H − TS (8.22)

in terms of the enthalpy H defined as H = E + P V . As an example, phase
equilibria are at the same pressure and temperature. In order for them to be
stable, the phases need to have the same specific Gibbs energy. Otherwise all
particles would end up in whatever phase has the lower Gibbs energy. Similarly,
chemical equilibria are often posed at an ambient pressure and temperature.
There are a number of differential expressions that are very useful in doing
thermodynamics. The primary one is obtained by combining the differential
first law (8.11) with the differential second law (8.19) for reversible processes:

dE = T dS − P dV (8.23)
This no longer involves the heat transferred from the surroundings, just state
variables of the system itself. The equivalent one using the enthalpy H instead
of the internal energy E is

dH = T dS + V dP (8.24)

The differentials of the Helmholtz and Gibbs free energies are, after cleaning
up with the two expressions immediately above:

dF = −S dT − P dV (8.25)

and
dG = −S dT + V dP (8.26)
Expression (8.25) shows that the work obtainable in an isothermal reversible
process is given by the decrease in Helmholtz free energy. That is why Helmholtz
called it “free energy” in the first place. The Gibbs free energy is applicable
to steady flow devices such as compressors and turbines; the first law for these
devices must be corrected for the “flow work” done by the pressure forces on
the substance entering and leaving the device. The effect is to turn P dV into
−V dP as the differential for the actual work obtainable from the device. (This
assumes that the kinetic and/or potential energy that the substance picks up
while going through the device is a not a factor.)
Maxwell noted that, according to the total differential of calculus, the coef-
ficients of the differentials in the right hand sides of (8.23) through (8.26) must
be the partial derivatives of the quantity in the left hand side:
\[
\left(\frac{\partial E}{\partial S}\right)_{\!V} = T \qquad
\left(\frac{\partial E}{\partial V}\right)_{\!S} = -P \qquad
\left(\frac{\partial T}{\partial V}\right)_{\!S} = -\left(\frac{\partial P}{\partial S}\right)_{\!V} \qquad (8.27)
\]
\[
\left(\frac{\partial H}{\partial S}\right)_{\!P} = T \qquad
\left(\frac{\partial H}{\partial P}\right)_{\!S} = V \qquad
\left(\frac{\partial T}{\partial P}\right)_{\!S} = \left(\frac{\partial V}{\partial S}\right)_{\!P} \qquad (8.28)
\]
\[
\left(\frac{\partial F}{\partial T}\right)_{\!V} = -S \qquad
\left(\frac{\partial F}{\partial V}\right)_{\!T} = -P \qquad
\left(\frac{\partial S}{\partial V}\right)_{\!T} = \left(\frac{\partial P}{\partial T}\right)_{\!V} \qquad (8.29)
\]
\[
\left(\frac{\partial G}{\partial T}\right)_{\!P} = -S \qquad
\left(\frac{\partial G}{\partial P}\right)_{\!T} = V \qquad
\left(\frac{\partial S}{\partial P}\right)_{\!T} = -\left(\frac{\partial V}{\partial T}\right)_{\!P} \qquad (8.30)
\]

The final equation in each line can be verified by substituting in the previous
two and noting that the order of differentiation does not make a difference.
Those are called the “Maxwell relations.” They have a lot of practical uses. For
example, either of the final equations in the last two lines allows the entropy
to be found if the relationship between the “normal” variables P , V , and T is
known, assuming that at least one data point at every temperature is already
available. Even more important from an applied point of view, the Maxwell
relations allow whatever data you find about a substance in literature to be
stretched thin. Approximate the derivatives above with difference quotients,
and you can compute a host of information not initially in your table or graph.
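As a sketch of that difference-quotient idea, the following Python fragment
uses the Maxwell relation (8.29) to turn pressure values into an entropy
change; the “data” is generated from the ideal gas law as a stand-in for a
real table, with units chosen so that $Ik_{\rm B}$ is one.
\begin{verbatim}
# Maxwell relation (8.29), (dS/dV)_T = (dP/dT)_V, by difference quotients.
def P(T, V):
    return T / V   # stand-in equation of state, units chosen so I k_B = 1

def dS_dV_const_T(T, V, dT=1.0e-3):
    # central difference quotient for (dP/dT)_V, reused as (dS/dV)_T
    return (P(T + dT, V) - P(T - dT, V)) / (2.0 * dT)

# Entropy change when the volume doubles at constant T = 300:
T, V0, V1, n = 300.0, 1.0, 2.0, 1000
dV = (V1 - V0) / n
dS = sum(dS_dV_const_T(T, V0 + (i + 0.5) * dV) * dV for i in range(n))
print(dS)   # about 0.693 = ln 2, in units of I k_B
\end{verbatim}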
There are two even more remarkable relations along these lines. They follow
from dividing (8.23) and (8.24) by T and rearranging so that S becomes the
quantity differenced. That produces
\[
\left(\frac{\partial S}{\partial T}\right)_{\!V} = \frac{1}{T}\left(\frac{\partial E}{\partial T}\right)_{\!V} \qquad
\left(\frac{\partial S}{\partial V}\right)_{\!T} = \frac{1}{T}\left(\frac{\partial E}{\partial V}\right)_{\!T} + \frac{P}{T}
\]
\[
\left(\frac{\partial E}{\partial V}\right)_{\!T} = T^2 \left(\frac{\partial\, P/T}{\partial T}\right)_{\!V} \qquad (8.31)
\]
\[
\left(\frac{\partial S}{\partial T}\right)_{\!P} = \frac{1}{T}\left(\frac{\partial H}{\partial T}\right)_{\!P} \qquad
\left(\frac{\partial S}{\partial P}\right)_{\!T} = \frac{1}{T}\left(\frac{\partial H}{\partial P}\right)_{\!T} - \frac{V}{T}
\]
\[
\left(\frac{\partial H}{\partial P}\right)_{\!T} = -T^2 \left(\frac{\partial\, V/T}{\partial T}\right)_{\!P} \qquad (8.32)
\]
What is so remarkable is the final equation in each case: they do not involve
entropy in any way, just the “normal” variables P , V , T , H, and E. Merely be-
cause entropy exists, there must be relationships between these variables which
seemingly have absolutely nothing to do with the second law.
As an example, consider an ideal gas, more precisely, any substance that
satisfies the ideal gas law

\[
Pv = RT \quad\hbox{with}\quad R = \frac{k_{\rm B}}{m} = \frac{R_u}{M}
\qquad R_u = 8.314472\ \frac{\rm kJ}{\rm kmol\ K} \qquad (8.33)
\]

The constant R is called the specific gas constant; it can be computed from
the ratio of the Boltzmann constant kB and the mass of a single molecule m.
Alternatively, it can be computed from the “universal gas constant” Ru = IA kB
and the molecular mass M = IA m. For an ideal gas like that, the equations
above show that the internal energy and enthalpy are functions of temperature
only. And then so are the specific heats Cv and Cp , because those are their
temperature derivatives:

\[
\hbox{For ideal gases:}\quad e, h, C_v, C_p = e, h, C_v, C_p(T) \qquad C_p = C_v + R \qquad (8.34)
\]

(The final relation is because $C_p = dh/dT = d(e + Pv)/dT$ with $de/dT = C_v$
and $Pv = RT$.) Ideal gas tables can therefore be tabulated by temperature
only; there is no need to include a second independent variable. You might
think that entropy should be tabulated against both varying temperature and
varying pressure, because it does depend on both pressure and temperature.
However, the Maxwell equation (8.30) may be used to find the entropy at any
pressure as long as it is listed for just one pressure, say for one bar.
There is a sleeper among the Maxwell equations; the very first one, in (8.27).
Turned on its head, it says that
\[
\frac{1}{T} = \left(\frac{\partial S}{\partial E}\right)_{V{\rm\ and\ other\ external\ parameters\ fixed}} \qquad (8.35)
\]

This can be used as a definition of temperature. Note that in taking the deriva-
tive, the volume of the box, the number of particles, and other external pa-
rameters, like maybe an external magnetic field, must be held constant. To
understand qualitatively why the above derivative defines a temperature, con-
sider two systems A and B for which A has the larger temperature according
to the definition above. If these two systems are brought into thermal contact,
then net messiness increases when energy flows from high temperature system
A to low temperature system B, because system B, with the higher value of the
derivative, increases its entropy more than A decreases its.
Of course, this new definition of temperature is completely consistent with
the ideal gas one; it was derived from it. However, the new definition also works
fine for negative temperatures. Assume a system A has a negative tempera-
ture according to the definition above. Then its messiness (entropy) increases
if it gives up heat. That is in stark contrast to normal substances at positive
temperatures that increase in messiness if they take in heat. So assume that
system A is brought into thermal contact with a normal system B at a positive
temperature. Then A will give off heat to B, and both systems increase their
messiness, so everyone is happy. It follows that A will give off heat however hot
is the normal system it is brought into contact with. While the temperature
of A may be negative, it is hotter than any substance with a normal positive
temperature!
And now the big question: what is that “chemical potential” you hear so
much about? Nothing new, really. For a pure substance with a single constituent
like this chapter is supposed to discuss, the chemical potential is just the specific
Gibbs free energy on a molar basis, µ̄ = ḡ. More generally, if there is more than
one constituent the chemical potential µ̄c of each constituent c is best defined
as
\[
\bar\mu_c \equiv \left(\frac{\partial G}{\partial \bar\imath_c}\right)_{\!P,T} \qquad (8.36)
\]

(If there is only one constituent, then G = ı̄ḡ and the derivative does indeed
produce ḡ. Note that an intensive quantity like ḡ, when considered to be a
function of P , T , and ı̄, only depends on the two intensive variables P and T ,
not on the amount of particles ı̄ present.) If there is more than one constituent,
and assuming that their Gibbs free energies simply add up, as in
\[
G = \bar\imath_1 \bar g_1 + \bar\imath_2 \bar g_2 + \ldots = \sum_c \bar\imath_c \bar g_c ,
\]

then the chemical potential µ̄c of each constituent is simply the molar specific
Gibbs free energy ḡc of that constituent.
The partial derivatives described by the chemical potentials are important for
figuring out the stable equilibrium state a system will achieve in an isothermal,
isobaric, environment, i.e. in an environment that is at constant temperature
and pressure. As noted earlier in this section, the Gibbs free energy must be
as small as it can be in equilibrium at a given temperature and pressure. Now
according to calculus, the full differential for a change in Gibbs free energy is

\[
dG(P, T, \bar\imath_1, \bar\imath_2, \ldots) = \frac{\partial G}{\partial T}\, dT
+ \frac{\partial G}{\partial P}\, dP
+ \frac{\partial G}{\partial \bar\imath_1}\, d\bar\imath_1
+ \frac{\partial G}{\partial \bar\imath_2}\, d\bar\imath_2 + \ldots
\]

The first two partial derivatives, which keep the number of particles fixed, were
identified in the discussion of the Maxwell equations as −S and V ; also the
partial derivatives with respect to the numbers of particles of the constituent
have been defined as the chemical potentials µ̄c . Therefore, more briefly,
\[
dG = -S\, dT + V\, dP + \bar\mu_1\, d\bar\imath_1 + \bar\mu_2\, d\bar\imath_2 + \ldots
= -S\, dT + V\, dP + \sum_c \bar\mu_c\, d\bar\imath_c \qquad (8.37)
\]
At equilibrium at given temperature and pressure, the Gibbs energy must be
minimal. It means that dG must be zero whenever dT = dP = 0, regardless
of any infinitesimal changes in the amounts of the constituents. That gives a
condition on the fractions of the constituents present.
Note that there are typically constraints on the changes dı̄c in the amounts
of the constituents. For example, in a liquid-vapor “phase equilibrium,” any ad-
ditional amount of particles dı̄f that condenses to liquid must equal the amount
−dı̄g of particles that disappears from the vapor phase. (The subscripts fol-
low the unfortunate convention liquid=fluid=f and vapor=gas=g. Don’t ask.)
Putting this relation in (8.37) it can be seen that the liquid and vapor phase
must have the same chemical potential, µ̄f = µ̄g . Otherwise the Gibbs free
energy would get smaller when more particles enter whatever is the phase of
lowest chemical potential and the system would collapse completely into that
phase alone.
The equality of chemical potentials suffices to derive the famous Clausius-
Clapeyron equation relating pressure changes under two-phase, or “saturated,”
conditions to the corresponding temperature changes. For, the changes in chem-
ical potentials must be equal too, $d\bar\mu_f = d\bar\mu_g$, and substituting in the differential
(8.26) for the Gibbs free energy, taking it on a molar basis since µ̄ = ḡ,

\[
-\bar s_f\, dT + \bar v_f\, dP = -\bar s_g\, dT + \bar v_g\, dP
\]

and rearranging gives the Clausius-Clapeyron equation:


\[
\frac{dP}{dT} = \frac{s_g - s_f}{v_g - v_f}
\]
Note that since the right-hand side is a ratio, it does not make a difference
whether you take the entropies and volumes on a molar basis or on a mass
basis. The mass basis is shown since that is how you will typically find the
entropy and volume tabulated. Typical engineering thermodynamic textbooks
will also tabulate sf g = sg − sf and vf g = vg − vf , making the formula above
very convenient.
In case your tables do not have the entropies of the liquid and vapor phases,
they often still have the “latent heat of vaporization,” also known as “enthalpy
of vaporization” or similar, and in engineering thermodynamics books typically
indicated by hf g . That is the difference between the enthalpy of the saturated
liquid and vapor phases, hf g = hg − hf . If saturated liquid is turned into
saturated vapor by adding heat under conditions of constant pressure and tem-
perature, (8.24) shows that the change in enthalpy hg − hf equals T (sg − sf ).
So the Clausius-Clapeyron equation can be rewritten as

\[
\frac{dP}{dT} = \frac{h_{fg}}{T (v_g - v_f)} \qquad (8.38)
\]

Because T ds is the heat added, the physical meaning of the latent heat of
vaporization is the heat needed to turn saturated liquid into saturated vapor
while keeping the temperature and pressure constant.
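As a numerical illustration of (8.38), consider water boiling at atmospheric
pressure; the property values below are typical steam-table numbers quoted
from memory, so check an actual table before relying on them.
\begin{verbatim}
# Clausius-Clapeyron, equation (8.38), for water boiling at 1 atm.
T    = 373.15      # K, saturation temperature at 1 atm
h_fg = 2257e3      # J/kg, latent heat of vaporization
v_f  = 0.001044    # m^3/kg, saturated liquid specific volume
v_g  = 1.673       # m^3/kg, saturated vapor specific volume

dP_dT = h_fg / (T * (v_g - v_f))
print(dP_dT)  # about 3.6e3 Pa/K: boiling point rises ~1 K per 3.6 kPa
\end{verbatim}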
For chemical reactions, like maybe

2H2 + O2 ⇐⇒ 2H2 O,

the changes in the amounts of the constituents are related as

\[
d\bar\imath_{{\rm H}_2} = -2\, d\bar r \qquad
d\bar\imath_{{\rm O}_2} = -1\, d\bar r \qquad
d\bar\imath_{{\rm H_2O}} = 2\, d\bar r
\]

where dr̄ is the additional number of times the forward reaction takes place from
the starting state. The constants −2, −1, and 2 are called the “stoichiometric
coefficients.” They can be used when applying the condition that at equilibrium,
the change in Gibbs energy due to an infinitesimal amount of further reactions
dr̄ must be zero.
However, chemical reactions are often posed in a context of constant volume
rather than constant pressure, for one because it simplifies the reaction kine-
matics. For constant volume, the Helmholtz free energy must be used instead
of the Gibbs one. Does that mean that a second set of chemical potentials is
needed to deal with those problems? Fortunately, the answer is no, the same
chemical potentials will do for Helmholtz problems. To see why, note that by
definition F = G − P V , so dF = dG − P dV − V dP , and substituting for dG
from (8.37), that gives
\[
dF = -S\, dT - P\, dV + \bar\mu_1\, d\bar\imath_1 + \bar\mu_2\, d\bar\imath_2 + \ldots
= -S\, dT - P\, dV + \sum_c \bar\mu_c\, d\bar\imath_c \qquad (8.39)
\]
Under isothermal and constant volume conditions, the first two terms in the
right hand side will be zero and F will be minimal when the differentials with
respect to the amounts of particles add up to zero.
Does this mean that the chemical potentials are also specific Helmholtz free
energies, just like they are specific Gibbs free energies? Of course the answer
is no, and the reason is that the partial derivatives of F represented by the
chemical potentials keep extensive volume V , instead of intensive molar specific
volume $\bar v$ constant. A single-constituent molar specific Helmholtz energy $\bar f$ can
be considered to be a function $\bar f(T,\bar v)$ of temperature and molar specific volume,
two intensive variables, and then $F = \bar\imath\, \bar f(T,\bar v)$, but $\left(\partial\, \bar\imath \bar f(T, V/\bar\imath)\big/\partial \bar\imath\right)_{T\,V}$ does not
simply produce $\bar f$, even if $\left(\partial\, \bar\imath \bar g(T, P)\big/\partial \bar\imath\right)_{T\,P}$ produces $\bar g$.

8.13 Microscopic Meaning of the Variables


The new variables introduced in the previous section assume the temperature to
be defined, hence there must be thermodynamic equilibrium in some meaningful
sense. That is important for identifying their microscopic descriptions, since
the canonical expression $P_q = e^{-E^{\rm I}_q/k_{\rm B}T}/Z$ can be used for the probabilities of the
energy eigenfunctions.
Consider first the Helmholtz free energy:
\[
F = E - TS = \sum_q P_q E^{\rm I}_q + T k_{\rm B} \sum_q P_q \ln\left(e^{-E^{\rm I}_q/k_{\rm B}T}\big/ Z\right)
\]
This can be simplified by taking apart the logarithm, and noting that the prob-
abilities must sum to one, $\sum_q P_q = 1$, to give

\[
F = -k_{\rm B} T \ln Z \qquad (8.40)
\]
That makes strike three for the partition function Z, since it already was able
to produce the internal energy E, (8.7), and the pressure P , (8.9). Knowing
Z as a function of volume V , temperature T , and number of particles I is all
that is needed to figure out the other variables. Indeed, knowing F is just as
good as knowing the entropy S, since F = E − T S. It illustrates why the
partition function is much more valuable than you might expect from a mere
normalization factor of the probabilities.
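A quick numerical sanity check, for an invented two-level system with energies
0 and ε: compute F directly from (8.40), and compare with E − TS built from
the canonical probabilities and the entropy (8.20). The two numbers agree.
\begin{verbatim}
import math

k_B = 1.38065e-23  # J/K

def F_and_EminusTS(eps, T):
    # Two energy eigenfunctions, energies 0 and eps (illustrative values).
    Z = 1.0 + math.exp(-eps / (k_B * T))
    F = -k_B * T * math.log(Z)                      # equation (8.40)
    P1 = math.exp(-eps / (k_B * T)) / Z             # canonical probability
    E = P1 * eps                                    # expectation energy
    S = -k_B * (P1 * math.log(P1)
                + (1.0 - P1) * math.log(1.0 - P1))  # equation (8.20)
    return F, E - T * S

print(F_and_EminusTS(1.0e-21, 300.0))  # the two values coincide
\end{verbatim}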
For the Gibbs free energy, add P V from (8.9):
" Ã ! #
∂ ln Z
G = −kB T ln Z − V (8.41)
∂V T

Dividing by the number of moles gives the molar specific Gibbs energy ḡ, equal
to the chemical potential µ̄.
How about showing that this chemical potential is the same one as in the
Maxwell-Boltzmann, Fermi-Dirac, and Bose-Einstein distribution functions for
weakly interacting particles? It is surprisingly difficult to show it; in fact, it
cannot be done for distinguishable particles for which the entropy does not exist.
It further appears that the best way to get the result for bosons and fermions
is to elaborately re-derive the two distributions from scratch, each separately,
using a new approach. Note that they were already derived twice earlier, once
for given system energy, and once for the canonical probability distribution. So
the dual derivations in note {A.73} make three. Please note that whatever this
book tells you thrice is absolutely true.

8.14 Application to Particles in a Box


This section applies the ideas developed in the previous sections to weakly in-
teracting particles in a box. This allows some of the details of the “buckets” in
figures 8.1 through 8.3 to be filled in for a concrete case.
For particles in a macroscopic box, the single-particle energy levels E p are
so closely spaced that they can be taken to be continuously varying. The one
exception is the ground state when Bose-Einstein condensation occurs; that will
be ignored for now. In continuum approximation, the number of single-particle
energy states in a macroscopically small energy range dE p is approximately,
following (7.17),
\[
dN = V n_s \mathcal{D}\, dE^{\rm p}
= V \frac{n_s}{4\pi^2}\left(\frac{2m}{\hbar^2}\right)^{3/2} \sqrt{E^{\rm p}}\; dE^{\rm p} \qquad (8.42)
\]
Here ns = 2s + 1 is the number of spin states.
Now according to the derived distributions, the number of particles in a
single energy state at energy $E^{\rm p}$ is
\[
\iota = \frac{1}{e^{(E^{\rm p}-\mu)/k_{\rm B}T} \pm 1}
\]
where the plus sign applies for fermions and the minus sign for bosons. The
$\pm 1$ term can be ignored completely for distinguishable particles.
To get the total number of particles, just integrate the particles per state ι
over all states dN :
\[
I = \int_{E^{\rm p}=0}^{\infty} \iota\, V n_s \mathcal{D}\, dE^{\rm p}
= V \frac{n_s}{4\pi^2}\left(\frac{2m}{\hbar^2}\right)^{3/2}
\int_{E^{\rm p}=0}^{\infty} \frac{\sqrt{E^{\rm p}}\; dE^{\rm p}}{e^{(E^{\rm p}-\mu)/k_{\rm B}T} \pm 1}
\]
p B ±1
and to get the total energy, integrate the energy of each single-particle state
times the number of particles in that state over all states:
Z ∞ µ ¶ √
p p ns 2m 3/2 Z ∞ Ep Ep p
E = p E ιns V D dE = V 2 2 (E p
−µ)/k T
dE
E =0 4π h̄ E =0 e
p B ±1
The expression for the number of particles can be nondimensionalized by
rearranging and taking a root to give
\[
\frac{\dfrac{\hbar^2}{2m}\left(\dfrac{I}{V}\right)^{2/3}}{k_{\rm B}T}
= \left(\frac{n_s}{4\pi^2} \int_{u=0}^{\infty} \frac{\sqrt{u}\; du}{e^{u-u_0} \pm 1}\right)^{2/3}
\qquad u \equiv \frac{E^{\rm p}}{k_{\rm B}T} \qquad u_0 \equiv \frac{\mu}{k_{\rm B}T} \qquad (8.43)
\]

Note that the left hand side is a nondimensional ratio of a typical quantum
microscopic energy, based on the average particle spacing $\sqrt[3]{V/I}$, to the typical
classical microscopic energy $k_{\rm B}T$. This ratio is a key nondimensional number
governing weakly interacting particles in a box. To put the typical quantum
energy into context, a single particle in its own volume of size $V/I$ would have
a ground state energy $3\pi^2\hbar^2/2m(V/I)^{2/3}$.
Some references, [3], define a “thermal de Broglie wavelength” λth by writing
the classical microscopic energy kB T in a quantum-like way:
\[
k_{\rm B}T \equiv 4\pi \frac{\hbar^2}{2m} \frac{1}{\lambda_{\rm th}^2}
\]
In some simple cases, you can think of this as roughly the quantum wavelength
corresponding to the momentum of the particles. It allows various results that
depend on the nondimensional ratio of energies to be reformulated in terms of
a nondimensional ratio of lengths, as in
\[
\frac{\dfrac{\hbar^2}{2m}\left(\dfrac{I}{V}\right)^{2/3}}{k_{\rm B}T}
= \frac{1}{4\pi}\left[\frac{\lambda_{\rm th}}{(V/I)^{1/3}}\right]^2
\]

Since the ratio of energies is fully equivalent, and has an unambiguous meaning,
this book will refrain from making theory harder than needed by defining su-
perfluous quantities. But in practice, thinking in terms of numerical values that
are lengths is likely to be more intuitive than energies, and then the numerical
value of the thermal wavelength would be the one to keep in mind.
Note that (8.43) provides a direct relationship between the ratio of typical
quantum/classical energies on one side, and u0 , the ratio of atomic chemical
potential µ to typical classical microscopic energy kB T on the other side. While
the two energy ratios are not the same, (8.43) makes them equivalent for systems
of weakly interacting particles in boxes. Know one and you can in principle
compute the other.
The expression for the system energy may be nondimensionalized in a similar
way to get

\[
\frac{E}{I k_{\rm B}T}
= \left. \int_{u=0}^{\infty} \frac{u\sqrt{u}\; du}{e^{u-u_0} \pm 1} \right/
\int_{u=0}^{\infty} \frac{\sqrt{u}\; du}{e^{u-u_0} \pm 1}
\qquad u \equiv \frac{E^{\rm p}}{k_{\rm B}T} \qquad u_0 \equiv \frac{\mu}{k_{\rm B}T} \qquad (8.44)
\]

The integral in the bottom arises when getting rid of the ratio of energies that
forms using (8.43).
The quantity in the left hand side is the nondimensional ratio of the actual
system energy over the system energy if every particle had the typical classical
energy kB T . It too is a unique function of u0 , and as a consequence, also of the
ratio of typical microscopic quantum and classical energies.
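These integrals have no antiderivatives among the normal functions, but
brute-force numerical integration is easy. A crude midpoint-sum sketch, with
the cutoff and step count picked ad hoc:
\begin{verbatim}
import math

def bf_integral(power, u0, sign, u_max=200.0, n=200000):
    # Midpoint sum of u**power / (exp(u - u0) + sign) du over 0..u_max;
    # sign = +1 for fermions, -1 for bosons. Crude but adequate here.
    du = u_max / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * du
        total += u**power / (math.exp(min(u - u0, 700.0)) + sign) * du
    return total

u0 = -2.0  # classical regime: mu well below k_B T
num = bf_integral(1.5, u0, +1)   # numerator of (8.44), fermions
den = bf_integral(0.5, u0, +1)   # integral of (8.43), also denominator of (8.44)
print(num / den)  # E/(I k_B T); close to 3/2, the ideal gas limit
\end{verbatim}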

8.14.1 Bose-Einstein condensation


Bose-Einstein condensation is said to have occurred when in a macroscopic
system the number of bosons in the ground state becomes a finite fraction of the
number of particles I. It happens when the temperature is lowered sufficiently
or the particle density is increased sufficiently or both.
According to note {A.68}, the number of particles in the ground state is
given by
\[
I_1 = \frac{N_1}{e^{(E^{\rm p}_1-\mu)/k_{\rm B}T} - 1}. \qquad (8.45)
\]
In order for this to become a finite fraction of the large number of particles I of a
macroscopic system, the denominator must become extremely small, hence the
exponential must become extremely close to one, hence µ must come extremely
close to the lowest energy level $E^{\rm p}_1$. To be precise, $E^{\rm p}_1 - \mu$ must be small of order
$k_{\rm B}T/I$; smaller than the classical microscopic energy by the humongous factor
I. In addition, for a macroscopic system of weakly interacting particles in a
box, $E^{\rm p}_1$ is extremely close to zero, (it is smaller than the microscopic quantum
energy defined above by a factor $I^{2/3}$.) So condensation occurs when $\mu \approx E^{\rm p}_1 \approx 0$,
the approximations being extremely close. If the ground state is unique, $N_1 = 1$,
Bose-Einstein condensation simply occurs when $\mu = E^{\rm p}_1 \approx 0$.
You would therefore expect that you can simply put $u_0 = \mu/k_{\rm B}T$ to zero in
the integrals (8.43) and (8.44). However, if you do so, (8.43) fails to describe the
number of particles in the ground state; it only gives the number of particles
$I - I_1$ not in the ground state:
\[
\frac{\dfrac{\hbar^2}{2m}\left(\dfrac{I-I_1}{V}\right)^{2/3}}{k_{\rm B}T}
= \left(\frac{n_s}{4\pi^2} \int_{u=0}^{\infty} \frac{\sqrt{u}\; du}{e^{u} - 1}\right)^{2/3}
\qquad \hbox{for BEC} \qquad (8.46)
\]
To see that the number of particles in the ground state is indeed not included
in the integral, note that while the integrand does become infinite when $u \downarrow 0$,
it becomes infinite proportionally to $1/\sqrt{u}$, which integrates as proportional to
$\sqrt{u}$, and $\sqrt{u_1} = \sqrt{E^{\rm p}_1/k_{\rm B}T}$ is vanishingly small, not finite. Arguments given in
note {A.68} do show that the only significant error occurs for the ground state;
the above integral does correctly approximate the number of particles not in
the ground state when condensation has occurred.
The value of the integral can be found in mathematical handbooks, [15, p. 201,
with typo], as $\frac{1}{2}!\,\zeta\!\left(\frac{3}{2}\right)$ with $\zeta$ the so-called Riemann zeta function, due
to, who else, Euler. Euler showed that it is equal to a product of terms ranging
over all prime numbers, but you do not want to know that. All you want to
know is that $\zeta\!\left(\frac{3}{2}\right) \approx 2.612$ and that $\frac{1}{2}! = \frac{1}{2}\sqrt{\pi}$.
The Bose-Einstein temperature TB is the temperature at which Bose-Ein-
stein condensation starts. That means it is the temperature for which I1 = 0 in
the expression above, giving
\[
\frac{\dfrac{\hbar^2}{2m}\left(\dfrac{I-I_1}{V}\right)^{2/3}}{k_{\rm B}T}
= \frac{\dfrac{\hbar^2}{2m}\left(\dfrac{I}{V}\right)^{2/3}}{k_{\rm B}T_B}
= \left(\frac{n_s}{8\pi^{3/2}}\,\zeta\!\left(\tfrac{3}{2}\right)\right)^{2/3}
\qquad T \le T_B \qquad (8.47)
\]
It implies that for a given system of bosons, at Bose-Einstein condensation there
is a fixed numerical ratio between the microscopic quantum energy based on par-
ticle density and the classical microscopic energy kB TB . That also illustrates the
point made at the beginning of this subsection that both changes in temperature
and changes in particle density can produce Bose-Einstein condensation.
The first equality in the equation above can be cleaned up to give the fraction
of bosons in the ground state as:
\[
\frac{I_1}{I} = 1 - \left(\frac{T}{T_B}\right)^{3/2} \qquad T \le T_B \qquad (8.48)
\]
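To attach a number to TB, the sketch below evaluates (8.47) for helium-4
treated, crudely, as an ideal Bose gas at the density of the liquid; the
physical constants are standard values.
\begin{verbatim}
import math

# Bose-Einstein temperature from (8.47), helium-4 as an ideal Bose gas.
hbar = 1.054571e-34   # J s
k_B  = 1.38065e-23    # J/K, the value quoted in this chapter
m    = 6.646e-27      # kg, mass of a helium-4 atom
n    = 2.18e28        # particles per m^3, about liquid helium density
n_s  = 1              # spinless bosons
zeta_3_2 = 2.612

T_B = (hbar**2 / (2*m)) * n**(2/3) \
      / (k_B * (n_s * zeta_3_2 / (8 * math.pi**1.5))**(2/3))
print(T_B)  # about 3 K; the measured superfluid transition is 2.17 K
\end{verbatim}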

8.14.2 Fermions at low temperatures


Another application of the integrals (8.43) and (8.44) is to find the Fermi en-
ergy $E^{\rm p}_F$ and internal energy E of a system of weakly interacting fermions for
vanishing temperature.
For low temperatures, the nondimensional energy ratio $u_0 = \mu/k_{\rm B}T$ blows
up, since $k_{\rm B}T$ becomes zero and the chemical potential µ does not; µ becomes
the Fermi energy $E^{\rm p}_F$, chapter 7.9. To deal with the blow up, the integrals can
be rephrased in terms of $u/u_0 = E^{\rm p}/\mu$, which does not blow up.
In particular, the ratio (8.43) involving the typical microscopic quantum
energy can be rewritten by taking a factor $u_0^{3/2}$ out of the integral and root,
and to the other side, to give:
\[
\frac{\dfrac{\hbar^2}{2m}\left(\dfrac{I}{V}\right)^{2/3}}{\mu}
= \left(\frac{n_s}{4\pi^2}
\int_{u/u_0=0}^{\infty} \frac{\sqrt{u/u_0}\; d(u/u_0)}{e^{u_0[(u/u_0)-1]} + 1}\right)^{2/3}
\]

Now since $u_0$ is large, the exponential in the denominator becomes extremely
large for $u/u_0 > 1$, making the integrand negligibly small. Therefore the upper
limit of integration can be limited to u/u0 = 1. In that range, the exponential
is vanishingly small, except for a negligibly small range around u/u0 = 1, so it
can be ignored. That gives
\[
\frac{\dfrac{\hbar^2}{2m}\left(\dfrac{I}{V}\right)^{2/3}}{\mu}
= \left(\frac{n_s}{4\pi^2} \int_{u/u_0=0}^{1} \sqrt{u/u_0}\; d(u/u_0)\right)^{2/3}
= \left(\frac{n_s}{6\pi^2}\right)^{2/3}
\]

It follows that the Fermi energy is
\[
E^{\rm p}_F = \left.\mu\right|_{T=0}
= \left(\frac{6\pi^2}{n_s}\right)^{2/3} \frac{\hbar^2}{2m}\left(\frac{I}{V}\right)^{2/3}
\]

Physicists like to define a “Fermi temperature” as the temperature where the
classical microscopic energy $k_{\rm B}T$ becomes equal to the Fermi energy. It is
\[
T_F = \frac{1}{k_{\rm B}} \left(\frac{6\pi^2}{n_s}\right)^{2/3} \frac{\hbar^2}{2m}\left(\frac{I}{V}\right)^{2/3} \qquad (8.49)
\]

It may be noted that except for the numerical factor, the expression for the
Fermi temperature TF is the same as that for the Bose-Einstein condensation
temperature TB given in the previous subsection.
Electrons have $n_s = 2$. For the valence electrons in typical metals, the Fermi
temperatures are of the order of tens of thousands of degrees Kelvin. The metal
will melt before such a temperature is reached. The valence electrons are pretty
much the same at room temperature as they are at absolute zero.
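For a concrete number, the sketch below evaluates (8.49) for the conduction
electrons in copper; the electron density is a typical handbook figure, not
something derived here.
\begin{verbatim}
import math

# Fermi temperature (8.49) for copper's conduction electrons.
hbar = 1.054571e-34   # J s
k_B  = 1.38065e-23    # J/K
m_e  = 9.109e-31      # kg, electron mass
n    = 8.47e28        # electrons per m^3, a standard figure for copper
n_s  = 2              # two spin states

T_F = (1/k_B) * (6 * math.pi**2 / n_s)**(2/3) * hbar**2 / (2*m_e) * n**(2/3)
print(T_F)  # roughly 8e4 K, far above copper's melting point
\end{verbatim}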
The integral (8.44) can be integrated in the same way and then shows that
$E = \frac{3}{5} I\mu = \frac{3}{5} I E^{\rm p}_F$. In short, at absolute zero, the average energy per particle is
$\frac{3}{5}$ times $E^{\rm p}_F$, the maximum single-particle energy.
It should be admitted that both of the results in this subsection could have
been obtained directly from the analysis in chapter 7.6. However, the analysis in
this subsection can be used to find the corrected expressions when the temper-
ature is fairly small but not zero, {A.74}, or for any temperature by brute-force
numerical integration.

8.14.3 A generalized ideal gas law


While the previous subsections produced a lot of interesting information about
weakly interacting particles near absolute zero, how about some info about
conditions that you can check in a T-shirt? And how about something math-
ematically simple, instead of elaborate integrals that have no anti-derivatives
among the normal functions?
Well, there is at least one. By definition (8.8), the pressure is the expectation value of $-dE^I_q/dV$, where the $E^I_q$ are the system energy eigenvalues. For weakly interacting particles in a box, chapter 7.6.2 found that the single-particle energies are inversely proportional to the squares of the linear dimensions of the box, which means proportional to $V^{-2/3}$. Then so are the system energy eigenvalues, since they are sums of single-particle ones: $E^I_q = \text{const}\;V^{-2/3}$. Differentiating produces $dE^I_q/dV = -\frac{2}{3}E^I_q/V$, and taking the expectation value gives

$$PV = \tfrac{2}{3}E \tag{8.50}$$

This expression is valid for weakly interacting bosons and fermions even if
the (anti)symmetrization requirements cannot be ignored.

8.14.4 The ideal gas


The weakly interacting particles in a box can be approximated as an ideal gas if
the number of particles is so small, or the box so large, that the average number
of particles in an energy state is much less than one.
Since the number of particles per energy state is given by

$$\iota = \frac{1}{e^{(E^p-\mu)/k_BT} \pm 1}$$

ideal gas conditions imply that the exponential must be much greater than
one, and then the ±1 can be ignored. That means that the difference between
fermions and bosons, which accounts for the ±1, can be ignored for an ideal
gas. Both can be approximated by the distribution derived for distinguishable
particles.

The energy integral (8.44) can now easily be done; the $e^{u_0}$ factor divides away and an integration by parts in the numerator produces $E = \frac{3}{2}Ik_BT$. Plug it into the generalized ideal gas law (8.50) to get the normal "ideal gas law"

$$PV = Ik_BT \quad\Longleftrightarrow\quad Pv = RT \qquad R \equiv \frac{k_B}{m} \tag{8.51}$$

Also, following (8.34),

$$e = \frac{3}{2}\frac{k_B}{m}T = C_vT \qquad h = \frac{5}{2}\frac{k_B}{m}T = C_pT \qquad C_v = \tfrac{3}{2}R \quad C_p = \tfrac{5}{2}R$$
but note that these formulae are specific to the simplistic ideal gases described by the model (like noble gases). For ideal gases with more complex molecules,
like air, the specific heats are not constants, but vary with temperature, as
discussed in chapter 7.10.1.
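As a quick check of the formulae above, a short sketch for helium, one of those simplistic noble gases; the numbers come out close to the tabulated values:

    kB = 1.380_649e-23    # J/K
    m = 6.646e-27         # kg, mass of a helium-4 atom

    R = kB / m            # the specific gas constant of (8.51)
    print(f"R  = {R:.0f} J/kg K")          # about 2077
    print(f"Cv = {1.5 * R:.0f} J/kg K")    # about 3116
    print(f"Cp = {2.5 * R:.0f} J/kg K")    # about 5193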
The ideal gas equation is identical to the one derived in classical physics.
That is important since it establishes that what was defined to be the temper-
ature in this chapter is in fact the ideal gas temperature that classical physics
defines.
The integral (8.43) can be done using integration by parts and a result found in the notations under "!". It gives an expression for the single-particle chemical potential $\mu$:

$$-\frac{\mu}{k_BT} = \frac{3}{2}\ln\left[\,k_BT \left/\, 4\pi n_s^{-2/3}\,\frac{\hbar^2}{2m}\left(\frac{I}{V}\right)^{2/3}\right.\right]$$
Note that the argument of the logarithm is essentially the ratio between the
classical microscopic energy and the quantum microscopic energy based on av-
erage particle spacing. This ratio has to be big for an accurate ideal gas, to get
the exponential in the particle energy distribution ι to be big.
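To put in some numbers, the sketch below evaluates the ratio and the chemical potential for helium gas at assumed standard conditions; the ratio is indeed in the thousands:

    from math import pi, log

    hbar = 1.054_571_8e-34
    kB = 1.380_649e-23
    m = 6.646e-27      # kg, helium-4
    ns = 1

    T = 300.0          # K, assumed room temperature
    n = 2.45e25        # per m^3, helium at 300 K and one atmosphere

    quantum = 4 * pi * ns**(-2 / 3) * hbar**2 / (2 * m) * n**(2 / 3)
    print(f"ratio = {kB * T / quantum:.0f}")                  # several thousand
    print(f"mu/kB T = {-1.5 * log(kB * T / quantum):.1f}")    # about -13

With $\mu/k_BT$ around $-13$, the exponential in $\iota$ is on the order of $10^5$, so ignoring the $\pm1$ is indeed an excellent approximation here.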
Next is the specific entropy $s$. Recall that the chemical potential is just the Gibbs free energy per particle. By the definition of the Gibbs free energy, the specific entropy $s$ equals $(h-g)/T$. Now the specific Gibbs energy is just the Gibbs energy per unit mass, in other words $\mu/m$, while $h/T = C_p$ as above. So

$$s = C_v\ln\left[\,k_BT \left/\, 4\pi n_s^{-2/3}\,\frac{\hbar^2}{2m}\left(\frac{I}{V}\right)^{2/3}\right.\right] + C_p \tag{8.52}$$

In terms of classical thermodynamics, $V/I$ is $m$ times the specific volume $v$. So classical thermodynamics takes the logarithm above apart as

$$s = C_v\ln(T) + R\ln(v) + \text{some combined constant}$$

and then promptly forgets about the constant, damn units.



8.14.5 Blackbody radiation


Blackbody radiation is the basic model for absorption and emission of electromagnetic radiation. Electromagnetic radiation includes light and a wide range of other radiation, like radio waves, microwaves, and X-rays. All surfaces absorb and emit radiation; otherwise we would not see anything. But "black" surfaces are the easiest to understand theoretically.
No, a black body need not look black. If its temperature is high enough, it
could look like the sun. What defines an ideal black body is that it absorbs,
(internalizes instead of reflects,) all radiation that hits it. But it may be emitting
its own radiation at the same time. And that makes a difference. If the black
body is cool, you will need your infrared camera to see it; it would look really
black to the eye. It is not reflecting any radiation, and it is not emitting any
visible amount either. But if it is at the temperature of the sun, better take out
your sunglasses. It is still absorbing all radiation that hits it, but it is emitting
large amounts of its own too, and lots of it in the visible range.
So where do you get a nearly perfectly black surface? Matte black paint? A
piece of blackboard? Soot? Actually, pretty much all materials will reflect in
some range of wave lengths. You get the blackest surface by using no material
at all. Take a big box and paint its interior the blackest you can. Close the box,
then drill a very tiny hole in its side. From the outside, the area of the hole will
be truly, absolutely black. Whatever radiation enters there is gone. Still, when
you heat the box to very high temperatures, the hole will shine bright.
While any radiation entering the hole will most surely be absorbed some-
where inside, the inside of the box itself is filled with electromagnetic radiation,
like a gas of photons, produced by the hot inside surface of the box. And some
of those photons will manage to escape through the hole, making it shine.
The number of photons in the box may be computed using the Bose-Einstein distribution, with a few caveats. The first is that there is no limit on the number
of photons; photons will be created or absorbed by the box surface to achieve
thermal equilibrium at whatever level is most probable at the given temperature.
This means the chemical potential µ of the photons is zero, as you can check
from the derivations in notes {A.68} and {A.69}.
The second caveat is that the density of states (7.17) was computed under the assumption that $E^p = \hbar^2k^2/2m$, with $k$ the wave number of the eigenfunction. That is a nonrelativistic expression that simply does not apply to photons, particles with zero rest mass. The energy of a photon is $\hbar\omega$, where the angular frequency $\omega$ equals $kc$, with $c$ the speed of light. To get the density of states in terms of the wave number $k$, rather than classical energy, just restore for $E^p$ in (8.42) its original value $\hbar^2k^2/2m$:

$$dN = Vn_s\mathcal{D}_k\,dk = V\frac{n_s}{2\pi^2}k^2\,dk$$

or in terms of frequency

$$dN = Vn_s\mathcal{D}_\omega\,d\omega = V\frac{n_s}{2\pi^2c^3}\omega^2\,d\omega \tag{8.53}$$
Some references refer to $\mathcal{D}_\omega$ as the "density of modes," but it simply is not. It is the density of states on a frequency basis, just like $\mathcal{D}_k$ is the density of states on a mode basis.
The third caveat is that there are only two independent spin states for a photon, $n_s = 2$, not 3. For a spin-one particle you would expect that photons would have the spin values 0 and $\pm1$, but the zero value does not occur in the direction of propagation, chapter 10.2.3, and therefore the number of independent states that exist is two, not three. A different way to understand that $n_s = 2$ is classical: the electric field can only oscillate in the two independent directions normal to the direction of propagation, (9.28); oscillation in the direction of propagation itself is not allowed by Maxwell's laws because it would make the divergence of the electric field nonzero.
The energy per unit volume and unit frequency range is called Planck's "blackbody spectrum." It is found by multiplying the number of states $dN$ by the Bose-Einstein distribution $\iota$ of particles per state, and that by the energy $\hbar\omega$ per particle, and then dividing by the volume $V$ of the box and by $d\omega$:

$$\rho(\omega) \equiv \frac{d(E/V)}{d\omega} = \frac{\hbar}{\pi^2c^3}\,\frac{\omega^3}{e^{\hbar\omega/k_BT}-1} \tag{8.54}$$
The expression for the total internal energy per unit volume is called the "Stefan-Boltzmann formula." It is found by integration of (8.54) over all frequencies; that can be done using the substitution $u = \hbar\omega/k_BT$ and tables like [15, 18.80, p. 132]:

$$\frac{E}{V} = \frac{\pi^2}{15}\frac{(k_BT)^4}{\hbar^3c^3} \tag{8.55}$$
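As a sanity check, (8.54) is easy to integrate numerically and compare with (8.55); a minimal sketch:

    from math import pi, exp

    hbar = 1.054_571_8e-34
    kB = 1.380_649e-23
    c = 2.997_924_58e8

    T = 6000.0    # K, an assumed temperature near that of the solar surface

    # substitute u = hbar omega / kB T in (8.54) and integrate numerically;
    # the midpoint rule on 0 < u < 40 is plenty, the tail is dead
    du = 1e-3
    total = sum((du * (i + 0.5))**3 / (exp(du * (i + 0.5)) - 1) * du
                for i in range(40_000))
    print(total, pi**4 / 15)    # both about 6.494
    print(total * (kB * T)**4 / (pi**2 * (hbar * c)**3), "J/m^3")

The numerical integral reproduces the exact value $\pi^4/15$ of the tables, and at 6,000 K the radiation energy density comes out at about 1 J/m$^3$.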
The number of particles may be found similarly to the energy, by dropping the $\hbar\omega$ energy per particle from the integral; it is, [15, 36.24, with typo]:

$$\frac{I}{V} = \frac{2\zeta(3)}{\pi^2\hbar^3c^3}(k_BT)^3 \qquad \zeta(3) \approx 1.202 \tag{8.56}$$
Taking the ratio with (8.55), the average energy per photon may be found:

$$\frac{E}{I} = \frac{\pi^4}{30\zeta(3)}k_BT \approx 2.7\,k_BT \tag{8.57}$$
The temperature has to be roughly 9,000 K for the average photon to become visible light. That is one reason a black body will look black at a room temperature of about 300 K. The solar surface has a temperature of about 6,000 K, so the visible-light photons it emits are more energetic than the average, but there are still plenty of them.

The entropy $S$ of the photon gas follows from integrating $\int dE/T$ using (8.55), starting from absolute zero and keeping the volume constant:

$$\frac{S}{V} = \frac{4\pi^2}{45\hbar^3c^3}k_B(k_BT)^3 \tag{8.58}$$

Dividing by (8.56) shows the average entropy per photon to be

$$\frac{S}{I} = \frac{2\pi^4}{45\zeta(3)}k_B \tag{8.59}$$

independent of temperature.
The generalized ideal gas law (8.50) does not apply to the pressure exerted by the photon gas, because the energy of a photon is $\hbar ck$, which is proportional to the wave number instead of its square. The corrected expression is:

$$PV = \tfrac{1}{3}E \tag{8.60}$$

Now the question you have been waiting for: how much radiation comes out of the perfectly black hole? Well, assume its area is $A$. If all the photons next to the hole moved towards it with speed $c$, the volume that would go out of the hole in a time interval $dt$ would be $dV = Ac\,dt$, and it would carry along an energy $(E/V)dV$, with the energy per unit volume $E/V$ given by (8.55). But the photons move in all directions, so only half of them move towards the hole, and those with an average velocity component towards it of $\frac12c$. (The correct average can be found by integration in spherical coordinates, assuming all directions of motion are equally likely.) So the bottom line is that the energy leaving per unit time and unit area is

$$\frac{dE_{\rm emitted}}{A\,dt} = \frac{\pi^2}{60\hbar^3c^2}(k_BT)^4 = \sigma_BT^4 \qquad \sigma_B = \frac{\pi^2k_B^4}{60\hbar^3c^2} \approx 5.67\times10^{-8}\ {\rm W/m^2\,K^4} \tag{8.61}$$
This is called the "Stefan-Boltzmann law" and $\sigma_B$ the Stefan-Boltzmann constant. At 9,000 K, the radiation is not only primarily visible light, it also carries $30^4$, or about a million, times more energy than at room temperature.
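A quick evaluation of $\sigma_B$ from the fundamental constants, plus the emitted power at an assumed solar surface temperature:

    from math import pi

    hbar = 1.054_571_8e-34
    kB = 1.380_649e-23
    c = 2.997_924_58e8

    sigmaB = pi**2 * kB**4 / (60 * hbar**3 * c**2)
    print(f"sigma_B = {sigmaB:.3e} W/m^2 K^4")             # 5.670e-08
    T = 5800.0    # K, an assumed solar surface temperature
    print(f"emitted: {sigmaB * T**4 / 1e6:.0f} MW/m^2")    # about 64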
The same argument as for the Stefan-Boltzmann law shows that the energy going out per unit area, unit time, and unit frequency range is, from (8.54),

$$I(\omega) = \frac{dE_{\rm emitted}}{A\,dt\,d\omega} = \frac{\hbar}{4\pi^2c^2}\,\frac{\omega^3}{e^{\hbar\omega/k_BT}-1}$$

which was already noted in chapter 7.9.

Finally, it may be noted that a black, or any, surface will radiate as much
as it absorbs at that temperature. For, just suspend a test item with the given
surface inside the box of thermal radiation and let the entire system come into
equilibrium. At thermal equilibrium, whatever radiation the surface does not
reflect, it must emit or the item would still be heating up. In particular, a
perfectly black surface must radiate at the full blackbody rate, since it absorbs
the full blackbody rate. A non-black surface that absorbs a fraction a of the
incident blackbody radiation will absorb aσB T 4 and emit the same amount. The
constant a is called the “absorptivity.” But since the surface also emits aσB T 4 ,
a is also the “emissivity.” Note that the surface will still emit this amount of
radiation even if it is not absorbing it. For by definition, whatever incident
radiation the surface does not internalize as thermal energy is called reflected
energy. What is called absorbed is what disappears into the surface, and the
generation of fresh emitted radiation is a separate physical process.

8.14.6 The Debye model


To explain the heat capacity of simple solids, Debye modeled the energy in the
crystal vibrations very much the same way as the photon gas of the previous
subsection. This subsection briefly outlines the main ideas.
For electromagnetic waves propagating with the speed of light c, substitute
acoustical waves propagating with the speed of sound cs . For photons with
energy h̄ω, substitute phonons with energy h̄ω. Since unlike electromagnetic
waves, sound waves can vibrate in the direction of wave propagation, for the
number of spin states substitute ns = 3 instead of 2; in other words, just
multiply the various expressions for photons by 1.5.
The critical difference for solids is that the number of modes, hence the
frequencies, is not infinitely large. Since each individual atom has three degrees
of freedom (it can move in three individual directions), there are 3I degrees
of freedom, and reformulating the motion in terms of acoustic waves does not
change the number of degrees of freedom. The shortest wave lengths will be
comparable to the atom spacing, and no waves of shorter wave length will exist.
As a result, there will be a highest frequency $\omega_{\rm max}$. The "Debye temperature" $T_D$ is defined as the temperature at which the typical classical microscopic energy $k_BT$ becomes equal to the maximum quantum microscopic energy $\hbar\omega_{\rm max}$:

$$k_BT_D = \hbar\omega_{\rm max} \tag{8.62}$$

The expression for the internal energy becomes, from (8.54) times 1.5:

$$\frac{E}{V} = \int_0^{\omega_{\rm max}} \frac{3\hbar}{2\pi^2c_s^3}\,\frac{\omega^3}{e^{\hbar\omega/k_BT}-1}\,d\omega \tag{8.63}$$

If the temperature is very low, the exponential will make the integrand zero except for very small frequencies. Then the upper limit is essentially infinite compared to the range of integration. That makes the energy proportional to $T^4$, just like for the photon gas, and the heat capacity is therefore proportional to $T^3$. At the other extreme, when the temperature is large, the exponential in the bottom can be expanded in a Taylor series and the energy becomes proportional to $T$, making the heat capacity constant.
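Both limits are easy to confirm numerically. The sketch below evaluates the standard heat-capacity form of the Debye integral, as found by differentiating (8.63) with respect to temperature and fixing the mode count at $3I$; the Debye temperature for aluminum is an assumed literature value, and the midpoint-rule integration is crude but adequate:

    from math import exp

    def debye_heat_capacity(T, TD, N=100_000):
        # heat capacity per atom in units of kB: 9 (T/TD)^3 times the
        # integral of x^4 e^x / (e^x - 1)^2 from 0 to TD/T (midpoint rule)
        xmax = TD / T
        dx = xmax / N
        total = sum((dx * (i + 0.5))**4 * exp(dx * (i + 0.5))
                    / (exp(dx * (i + 0.5)) - 1)**2 * dx for i in range(N))
        return 9 * (T / TD)**3 * total

    TD = 428.0    # K, an often-quoted Debye temperature for aluminum
    for T in (10, 20, 100, 300, 1000):
        print(f"T = {T:4d} K   C = {debye_heat_capacity(T, TD):.4f} kB/atom")

Doubling the temperature from 10 K to 20 K multiplies the heat capacity by about a factor eight, the $T^3$ behavior; by 1,000 K the value has leveled off near the classical constant of $3k_B$ per atom.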
The maximum frequency, hence the Debye temperature, can be found from
the requirement that the number of modes is 3I, to be applied by integrating
(8.53), or an empirical value can be used to improve the approximation for
whatever temperature range is of interest. Literature values are often chosen to
approximate the low temperature range accurately, since the model works best
for low temperatures. If integration of (8.53) is used at high temperatures, the
law of Dulong and Petit results, as described in chapter 7.10.1.
More sophisticated versions of the analysis exist to account for some of the
very nontrivial differences between crystal vibrations and electromagnetic waves.
They will need to be left to literature.
Chapter 9

Electromagnetism

The main objective of this chapter is to discuss electromagnetic effects. However,


these effects are closely tied to more advanced concepts in angular momentum
and relativity, so these will be discussed first.

9.1 All About Angular Momentum


The quantum mechanics of angular momentum is fascinating. It is also very
basic to much of quantum mechanics, so you may want to browse through this
section to get an idea of what is there.
In chapter 4.4, it was already mentioned that angular momentum comes in
two basic kinds: orbital angular momentum, which is a result of the motion of
particles, and the “built-in” angular momentum called spin.
The eigenfunctions of orbital angular momentum are the so-called "spherical harmonics" of chapter 3.1, and they show that the orbital angular momentum in any arbitrarily chosen direction, taken as the $z$-direction from now on, comes in whole multiples $m$ of Planck's constant $\hbar$:

$$L_z = m\hbar \quad\text{with } m \text{ an integer for orbital angular momentum}$$

Integers are whole numbers, such as $0, \pm1, \pm2, \pm3, \ldots$. The square orbital angular momentum $L^2 = L_x^2 + L_y^2 + L_z^2$ comes in values

$$L^2 = l(l+1)\hbar^2 \quad\text{with } l \ge 0\text{; for orbital angular momentum } l \text{ is an integer.}$$

The numbers $l$ and $m$ are called the azimuthal and magnetic quantum numbers.

When spin angular momentum is included, it is conventional to still write $L_z$ as $m\hbar$ and $L^2$ as $l(l+1)\hbar^2$; there is nothing wrong with that, but then $m$ and $l$ are no longer necessarily integers. The spin of common particles, such as electrons, neutrons, and protons, instead has $m = \pm\frac12$ and $l = \frac12$. But while $m$ and $l$ can be half-integers, this section will find that they can never be anything more arbitrary than that, regardless of what sort of angular momentum it is. A particle with, say, spin $\frac13\hbar$ cannot exist according to the theory.
In order to have a consistent notation, from now on every angular momentum eigenstate with quantum numbers $l$ and $m$ will be indicated as $|l\,m\rangle$, whether it is a spherical harmonic $Y_l^m$, a particle spin state, or a combination of angular momenta from more than one source.

9.1.1 The fundamental commutation relations


Analyzing non-orbital angular momentum is a challenge. How can you say any-
thing sensible about angular momentum, the dynamic motion of masses around
a given point, without a mass moving around a point? For, while a particle like
an electron has spin angular momentum, trying to explain it as angular motion
of the electron about some internal axis leads to gross contradictions such as
the electron exceeding the speed of light [10, p. 172]. Spin is definitely part
of the law of conservation of angular momentum, but it does not seem to be
associated with any familiar idea of some mass moving around some axis as far
as is known.
There goes the Newtonian analogy, then. Something other than classical physics is needed to analyze spin.
Now, the complex discoveries of mathematics are routinely deduced from
apparently self-evident simple axioms, such as that a straight line will cross
each of a pair of parallel lines under the same angle. Actually, such axioms are
not as obvious as they seem, and mathematicians have deduced very different
answers from changing the axioms into different ones. Such answers may be just
as good or better than others depending on circumstances, and you can invent
imaginary universes in which they are the norm.
Physics has no such latitude to invent its own universes; its mission is to
describe ours as well as it can. But the idea of mathematics is still a good one:
try to guess the simplest possible basic “law” that nature really seems to obey,
and then reconstruct as much of the complexity of nature from it as you can.
The more you can deduce from the law, the more ways you have to check it
against a variety of facts, and the more confident you can become in it.
Physicists have found that the needed equations for angular momentum are given by the following "fundamental commutation relations:"

$$[\hat L_x, \hat L_y] = i\hbar\hat L_z \qquad [\hat L_y, \hat L_z] = i\hbar\hat L_x \qquad [\hat L_z, \hat L_x] = i\hbar\hat L_y \tag{9.1}$$

They can be derived for orbital angular momentum (see chapter 3.4.4), but
must be postulated to also apply to spin angular momentum {A.75}.
At first glance, these commutation relations do not look like a promising starting point for much analysis. All they say on their face is that the angular momentum operators $\hat L_x$, $\hat L_y$, and $\hat L_z$ do not commute, so that they cannot have a full set of eigenstates in common. That is hardly impressive.

But if you read the following sections, you will be astonished by what knowledge can be teased out of them. For starters, one thing that immediately follows is that the only eigenstates that $\hat L_x$, $\hat L_y$, and $\hat L_z$ have in common are states $|0\,0\rangle$ of no angular momentum at all {A.76}. No other common eigenstates exist.
One assumption will be implicit in the use of the fundamental commutation relations, namely that they can be taken at face value. It is certainly possible to imagine that say $\hat L_x$ would turn an eigenfunction of say $\hat L_z$ into some singular object for which angular momentum would be ill-defined. That would of course make application of the fundamental commutation relations improper. It will be assumed that the operators are free of such pathological nastiness.

9.1.2 Ladders
This section starts the quest to figure out everything that the fundamental commutation relations mean for angular momentum. It will first be verified that any angular momentum can always be described using $|l\,m\rangle$ eigenstates with definite values of square angular momentum $L^2$ and $z$-angular momentum $L_z$. Then it will be found that these angular momentum states occur in groups called "ladders".

To start with the first one, the mathematical condition for a complete set of eigenstates $|l\,m\rangle$ to exist is that the angular momentum operators $\hat L^2$ and $\hat L_z$ commute. They do; using the commutator manipulations of chapter 3.4.4, it is easily found that:

$$[\hat L^2, \hat L_x] = [\hat L^2, \hat L_y] = [\hat L^2, \hat L_z] = 0 \qquad\text{where } \hat L^2 = \hat L_x^2 + \hat L_y^2 + \hat L_z^2$$

So mathematics says that eigenstates $|l\,m\rangle$ of $\hat L_z$ and $\hat L^2$ exist satisfying

$$\hat L_z|l\,m\rangle = L_z|l\,m\rangle \qquad\text{where by definition } L_z = m\hbar \tag{9.2}$$

$$\hat L^2|l\,m\rangle = L^2|l\,m\rangle \qquad\text{where by definition } L^2 = l(l+1)\hbar^2 \text{ and } l \ge 0 \tag{9.3}$$

and that are complete in the sense that any state can be described in terms of these $|l\,m\rangle$.
Unfortunately the eigenstates $|l\,m\rangle$, except for $|0\,0\rangle$ states, do not satisfy relations like (9.2) for $\hat L_x$ or $\hat L_y$. The problem is that $\hat L_x$ and $\hat L_y$ do not commute with $\hat L_z$. But $\hat L_x$ and $\hat L_y$ do commute with $\hat L^2$, and you might wonder if that is still worth something. To find out, multiply, say, the zero commutator $[\hat L^2, \hat L_x]$ by $|l\,m\rangle$:

$$[\hat L^2, \hat L_x]|l\,m\rangle = (\hat L^2\hat L_x - \hat L_x\hat L^2)|l\,m\rangle = 0$$
398 CHAPTER 9. ELECTROMAGNETISM

Now take the second term to the right-hand side of the equation, noting that $\hat L^2|l\,m\rangle = L^2|l\,m\rangle$ with $L^2$ just a number that can be moved up front, to get:

$$\hat L^2\left(\hat L_x|l\,m\rangle\right) = L^2\left(\hat L_x|l\,m\rangle\right)$$

Looking a bit closer at this equation, it shows that the combination $\hat L_x|l\,m\rangle$ satisfies the same eigenvalue problem for $\hat L^2$ as $|l\,m\rangle$ itself. In other words, the multiplication by $\hat L_x$ does not affect the square angular momentum $L^2$ at all.

To be picky, that is not quite true if $\hat L_x|l\,m\rangle$ would be zero, because zero is not an eigenstate of anything. However, such a thing only happens if there is no angular momentum; (it would make $|l\,m\rangle$ an eigenstate of $\hat L_x$ with eigenvalue zero in addition to an eigenstate of $\hat L_z$ {A.76}). Except for that trivial case, $\hat L_x$ does not affect square angular momentum. And neither does $\hat L_y$, or any combination of the two.
Angular momentum in the $z$-direction is affected by $\hat L_x$ and by $\hat L_y$, since they do not commute with $\hat L_z$ like they do with $\hat L^2$. Nor is it possible to find any linear combination of $\hat L_x$ and $\hat L_y$ that does commute with $\hat L_z$. What is the next best thing? Well, it is possible to find two combinations, to wit

$$\hat L^+ \equiv \hat L_x + i\hat L_y \qquad\text{and}\qquad \hat L^- \equiv \hat L_x - i\hat L_y, \tag{9.4}$$

that satisfy the "commutator eigenvalue problems":

$$[\hat L_z, \hat L^+] = \hbar\hat L^+ \qquad\text{and}\qquad [\hat L_z, \hat L^-] = -\hbar\hat L^-.$$

These two turn out to be quite remarkable operators.


Like $\hat L_x$ and $\hat L_y$, their combinations $\hat L^+$ and $\hat L^-$ leave $L^2$ alone. To examine what the operator $\hat L^+$ does with the angular momentum in the $z$-direction, multiply its commutator relation above by an eigenstate $|l\,m\rangle$:

$$(\hat L_z\hat L^+ - \hat L^+\hat L_z)|l\,m\rangle = \hbar\hat L^+|l\,m\rangle$$

Or, taking the second term to the right-hand side of the equation and noting that by definition $\hat L_z|l\,m\rangle = m\hbar|l\,m\rangle$,

$$\hat L_z\left(\hat L^+|l\,m\rangle\right) = (m+1)\hbar\left(\hat L^+|l\,m\rangle\right)$$

That is a stunning result, as it shows that $\hat L^+|l\,m\rangle$ is an eigenstate with $z$-angular momentum $L_z = (m+1)\hbar$ instead of $m\hbar$. In other words, $\hat L^+$ adds exactly one unit $\hbar$ to the $z$-angular momentum, turning an $|l\,m\rangle$ state into a $|l\,m{+}1\rangle$ one!

If you apply $\hat L^+$ another time, you get a state of still higher $z$-angular momentum $|l\,m{+}2\rangle$, and so on, like the rungs on a ladder. This is graphically illustrated for some examples in figures 9.1 and 9.2. The process eventually
Figure 9.1: Example bosonic ladders. [The figure shows $|l\,m\rangle$ rungs connected by $\hat L^+$ and $\hat L^-$ steps: a ladder of a set of $Y_2^m$ spherical harmonics states, ladders of two different sets of $Y_3^m$ spherical harmonics states, the zero-step ladder of a $\pi$-meson, the ladder of a photon, and the ladder of a graviton?]

Figure 9.2: Example fermionic ladders. [The figure shows the two-rung ladder of an electron, proton, or neutron, the four-rung ladder of a $\Delta$ baryon, and a ladder of the net $z$-angular momentum of an electron in an $l = 1$ orbit with $l_{\rm net} = \frac32$.]



comes to a halt at some top rung $m = m_{\rm max}$ where $\hat L^+|l\,m_{\rm max}\rangle = 0$. It has to, because the angular momentum in the $z$-direction cannot just keep growing forever: the square angular momentum in the $z$-direction alone must stay less than the total square angular momentum in all three directions {A.77}.

The second "ladder operator" $\hat L^-$ works in much the same way, but it goes down the ladder; it deducts one unit $\hbar$ from the angular momentum in the $z$-direction at each application. $\hat L^-$ provides the second stile to the ladders, and must terminate at some bottom rung $m_{\rm min}$.

9.1.3 Possible values of angular momentum


The fact that the angular momentum ladders of the previous section must have
a top and a bottom rung restricts the possible values that angular momentum
can take. This section will show that the azimuthal quantum number l can
either be a nonnegative whole number or half of one, but nothing else. And it
will show that the magnetic quantum number m must range from −l to +l in
unit increments. In other words, the bosonic and fermionic example ladders in
figures 9.1 and 9.2 are representative of all that is possible.
b+
To start, in order for a ladder to end at a top rung¯ mmax , L ¯ |l mi has to
be zero for m = mmax . More specifically, its magnitude ¯¯L b + |l mi¯¯ must be zero.
The square magnitude is given by the inner product with itself:
¯ ¯2 ¿ ¯ À
¯ b+ ¯ ¯ +
¯L |l mi¯ = Lb + |l mi¯Lb |l mi = 0.
¯

Now because of the complex conjugate that is used in the left-hand side of an inner product (see chapter 1.3), $\hat L^+ = \hat L_x + i\hat L_y$ goes to the other side of the product as $\hat L^- = \hat L_x - i\hat L_y$, and you must have

$$\bigl|\hat L^+|l\,m\rangle\bigr|^2 = \bigl\langle |l\,m\rangle\big|\hat L^-\hat L^+|l\,m\rangle\bigr\rangle$$

That operator product can be multiplied out:

$$\hat L^-\hat L^+ \equiv (\hat L_x - i\hat L_y)(\hat L_x + i\hat L_y) = \hat L_x^2 + \hat L_y^2 + i(\hat L_x\hat L_y - \hat L_y\hat L_x),$$

but $\hat L_x^2 + \hat L_y^2$ is the square angular momentum $\hat L^2$ except for $\hat L_z^2$, and the term within the parentheses is the commutator $[\hat L_x, \hat L_y]$, which is according to the fundamental commutation relations equal to $i\hbar\hat L_z$, so

$$\hat L^-\hat L^+ = \hat L^2 - \hat L_z^2 - \hbar\hat L_z \tag{9.5}$$

The effect of each of the operators in the left-hand side on a state $|l\,m\rangle$ is known, and the inner product can be figured out:

$$\bigl|\hat L^+|l\,m\rangle\bigr|^2 = l(l+1)\hbar^2 - m^2\hbar^2 - m\hbar^2 \tag{9.6}$$

The question where angular momentum ladders end can now be answered:

$$l(l+1)\hbar^2 - m_{\rm max}^2\hbar^2 - m_{\rm max}\hbar^2 = 0$$

There are two possible solutions to this quadratic equation for $m_{\rm max}$, to wit $m_{\rm max} = l$ or $m_{\rm max} = -l-1$. The second solution is impossible since it already would have the square $z$-angular momentum exceed the total square angular momentum. So unavoidably,

$$m_{\rm max} = l$$

That is one of the things this section was supposed to show.
The lowest rung on the ladder goes the same way; you get

$$\hat L^+\hat L^- = \hat L^2 - \hat L_z^2 + \hbar\hat L_z \tag{9.7}$$

and then

$$\bigl|\hat L^-|l\,m\rangle\bigr|^2 = l(l+1)\hbar^2 - m^2\hbar^2 + m\hbar^2 \tag{9.8}$$

and the only acceptable solution for the lowest rung on the ladders is

$$m_{\rm min} = -l$$
It is nice and symmetric; ladders run from m = −l up to m = l, as the examples
in figures 9.1 and 9.2 already showed.
And in fact, it is more than that; it also limits what the quantum numbers $l$ and $m$ can be. For, since each step on a ladder increases the magnetic quantum number $m$ by one unit, you have for the total number of steps up from bottom to top:

$$\text{total number of steps} = m_{\rm max} - m_{\rm min} = 2l$$

But the number of steps is a whole number, and so the azimuthal quantum number $l$ must either be a nonnegative integer, such as $0, 1, 2, \ldots$, or half of one, such as $\frac12, \frac32, \ldots$. Integer $l$ values occur, for example, for the spherical harmonics of
orbital angular momentum and for the spin of bosons like photons. Half-integer
values occur, for example, for the spin of fermions such as electrons, protons,
neutrons, and ∆ particles.
Note that if $l$ is a half-integer, then so are the corresponding values of $m$, since $m$ starts from $-l$ and increases in unit steps. See again figures 9.1 and 9.2 for some examples. Also note that ladders terminate just before the $z$-momentum would exceed the total momentum.

It may also be noted that ladders are distinct. It is not possible to go up one ladder, like the first $Y_3^m$ one in figure 9.1, with $\hat L^+$ and then come down the second one using $\hat L^-$. The reason is that the states $|l\,m\rangle$ are eigenstates of the operator products $\hat L^-\hat L^+$, (9.5), and $\hat L^+\hat L^-$, (9.7), so going up with $\hat L^+$ and then down again with $\hat L^-$, or vice versa, returns to the same state. For similar reasons, if the tops of two ladders are orthonormal, then so are the rest of their rungs.
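As a quick arithmetic illustration, added here for concreteness, take an $l = 2$ ladder. Formula (9.6) gives

$$\bigl|\hat L^+|2\,1\rangle\bigr|^2 = (2\cdot3 - 1^2 - 1)\hbar^2 = 4\hbar^2 \qquad \bigl|\hat L^+|2\,2\rangle\bigr|^2 = (6 - 4 - 2)\hbar^2 = 0$$

so $\hat L^+$ turns $|2\,1\rangle$ into $2\hbar$ times a normalized $|2\,2\rangle$ state, and it annihilates the top rung $|2\,2\rangle$, just as the ladder picture requires.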

9.1.4 A warning about angular momentum


Normally, eigenstates are indeterminate by a complex number of magnitude
one. If you so desire, you can multiply any normalized eigenstate by a number
of unit magnitude of your own choosing, and it is still a normalized eigenstate.
It is important to remember that in analytical expressions involving angular
momentum, you are not allowed to do this.
As an example, consider a pair of spin $\frac12$ particles, call them $a$ and $b$, in the "singlet state", in which their spins cancel and there is no net angular momentum. It was noted in chapter 4.5.6 that this state takes the form

$$|0\,0\rangle_{ab} = \frac{|\frac12\,\frac12\rangle_a|\frac12\,{-}\frac12\rangle_b - |\frac12\,{-}\frac12\rangle_a|\frac12\,\frac12\rangle_b}{\sqrt2}$$

(This section will use kets rather than arrows for spin states.) But if you were allowed to arbitrarily change the definition of, say, the spin state $|\frac12\,\frac12\rangle_a$ by a minus sign, then the minus sign in the singlet state above would turn into a plus sign. The given expression for the singlet state, with its minus sign, is only correct if you use the right normalization factors for the individual states.
It all has to do with the ladder operators $\hat L^+$ and $\hat L^-$. They are very convenient for analysis, but to make that easiest, you would like to know exactly what they do to the angular momentum states $|l\,m\rangle$. What you have seen so far is that $\hat L^+|l\,m\rangle$ produces a state with the same square angular momentum, and with angular momentum in the $z$-direction equal to $(m+1)\hbar$. In other words, $\hat L^+|l\,m\rangle$ is some multiple of a suitably normalized eigenstate $|l\,m{+}1\rangle$;

$$\hat L^+|l\,m\rangle = C|l\,m{+}1\rangle$$

where the number $C$ is the multiple. What is that multiple? Well, from the magnitude of $\hat L^+|l\,m\rangle$, derived earlier in (9.6), you know that its square magnitude is

$$|C|^2 = l(l+1)\hbar^2 - m^2\hbar^2 - m\hbar^2.$$
But that still leaves C indeterminate by a factor of unit magnitude. Which
would be very inconvenient in the analysis of angular momentum.
To resolve this conundrum, restrictions are put on the normalization factors of the angular momentum states $|l\,m\rangle$ in ladders. It is required that the normalization factors are chosen such that the ladder operator constants are positive real numbers. That really leaves only one normalization factor in an entire ladder freely selectable, say the one of the top rung.
Most of the time, this is not a big deal. Only if you start trying to get too clever with angular momentum normalization factors do you need to remember that you cannot really choose them to your own liking.

The good news is that in this convention, you know precisely what the ladder operators do {A.78}:

$$\hat L^+|l\,m\rangle = \hbar\sqrt{l(l+1) - m(1+m)}\;|l\,m{+}1\rangle \tag{9.9}$$

$$\hat L^-|l\,m\rangle = \hbar\sqrt{l(l+1) + m(1-m)}\;|l\,m{-}1\rangle \tag{9.10}$$
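In a program these relations are trivial to encode; a minimal sketch (the function names are of course made up):

    from math import sqrt

    def cplus(l, m):
        # coefficient C in  L+ |l m> = C hbar |l m+1>,  from (9.9)
        return sqrt(l * (l + 1) - m * (1 + m))

    def cminus(l, m):
        # coefficient C in  L- |l m> = C hbar |l m-1>,  from (9.10)
        return sqrt(l * (l + 1) + m * (1 - m))

    for m in (-1, 0, 1):
        print(m, cplus(1, m), cminus(1, m))
    # cplus vanishes at the top rung m = l, cminus at the bottom m = -l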

9.1.5 Triplet and singlet states


With the ladder operators, you can determine how different angular momenta add up to net angular momentum. As an example, this section will examine what net spin values can be produced by two particles, each with spin $\frac12$. They may be the proton and electron in a hydrogen atom, or the two electrons in the hydrogen molecule, or whatever. The actual result will be to rederive the triplet and singlet states described in chapter 4.5.6, but it will also be an example for how more complex angular momentum states can be combined.

The particles involved will be denoted as $a$ and $b$. Since each particle can have two different spin states $|\frac12\,\frac12\rangle$ and $|\frac12\,{-}\frac12\rangle$, there are four different combined "product" states:

$$|\tfrac12\,\tfrac12\rangle_a|\tfrac12\,\tfrac12\rangle_b, \quad |\tfrac12\,\tfrac12\rangle_a|\tfrac12\,{-}\tfrac12\rangle_b, \quad |\tfrac12\,{-}\tfrac12\rangle_a|\tfrac12\,\tfrac12\rangle_b, \quad\text{and}\quad |\tfrac12\,{-}\tfrac12\rangle_a|\tfrac12\,{-}\tfrac12\rangle_b.$$

In these product states, each particle is in a single individual spin state. The question is, what is the combined angular momentum of these four product states? And what combination states have definite net values for square and $z$ angular momentum?
The angular momentum in the $z$-direction is simple; it is just the sum of those of the individual particles. For example, the $z$-momentum of the $|\frac12\,\frac12\rangle_a|\frac12\,\frac12\rangle_b$ state follows from

$$\left(\hat L_{za} + \hat L_{zb}\right)|\tfrac12\,\tfrac12\rangle_a|\tfrac12\,\tfrac12\rangle_b = \tfrac12\hbar|\tfrac12\,\tfrac12\rangle_a|\tfrac12\,\tfrac12\rangle_b + |\tfrac12\,\tfrac12\rangle_a\tfrac12\hbar|\tfrac12\,\tfrac12\rangle_b = \hbar\,|\tfrac12\,\tfrac12\rangle_a|\tfrac12\,\tfrac12\rangle_b$$

which makes the net angular momentum in the $z$-direction $\hbar$, or $\frac12\hbar$ from each particle. Note that the $z$-angular momentum operators of the two particles simply add up and that $\hat L_{za}$ only acts on particle $a$, and $\hat L_{zb}$ only on particle $b$ {A.79}. In terms of quantum numbers, the magnetic quantum number $m_{ab}$ is the sum of the individual quantum numbers $m_a$ and $m_b$; $m_{ab} = m_a + m_b = 1$.
The net total angular momentum is not so obvious; you cannot just add total angular momenta. To figure out the total angular momentum of $|\frac12\,\frac12\rangle_a|\frac12\,\frac12\rangle_b$ anyway, there is a trick: multiply it with the combined step-up operator

$$\hat L^+_{ab} = \hat L^+_a + \hat L^+_b$$

Each part returns zero: $\hat L^+_a$ because particle $a$ is at the top of its ladder and $\hat L^+_b$ because particle $b$ is. So the combined state $|\frac12\,\frac12\rangle_a|\frac12\,\frac12\rangle_b$ must be at the top of the ladder too; there is no higher rung. That must mean $l_{ab} = m_{ab} = 1$; the combined state must be a $|1\,1\rangle$ state. It can be defined as the combination $|1\,1\rangle$ state:

$$|1\,1\rangle_{ab} \equiv |\tfrac12\,\tfrac12\rangle_a|\tfrac12\,\tfrac12\rangle_b \tag{9.11}$$

You could just as well have defined $|1\,1\rangle_{ab}$ as $-|\frac12\,\frac12\rangle_a|\frac12\,\frac12\rangle_b$ or $i|\frac12\,\frac12\rangle_a|\frac12\,\frac12\rangle_b$, say. But why drag along a minus sign or $i$ if you do not have to? The first triplet state has been found.
Here is another trick: multiply $|1\,1\rangle_{ab} = |\frac12\,\frac12\rangle_a|\frac12\,\frac12\rangle_b$ by $\hat L^-_{ab}$: that will go one step down the combined states ladder and produce a combination state $|1\,0\rangle_{ab}$:

$$\hat L^-_{ab}|1\,1\rangle_{ab} = \hbar\sqrt{1(1+1) + 1(1-1)}\,|1\,0\rangle_{ab} = \hat L^-_a|\tfrac12\,\tfrac12\rangle_a|\tfrac12\,\tfrac12\rangle_b + \hat L^-_b|\tfrac12\,\tfrac12\rangle_a|\tfrac12\,\tfrac12\rangle_b$$

or

$$\hbar\sqrt2\,|1\,0\rangle_{ab} = \hbar|\tfrac12\,\tfrac12\rangle_a|\tfrac12\,{-}\tfrac12\rangle_b + \hbar|\tfrac12\,{-}\tfrac12\rangle_a|\tfrac12\,\tfrac12\rangle_b$$

where the effects of the ladder-down operators were taken from (9.10). (Note that this requires that the individual particle spin states are normalized consistent with the ladder operators.) The second triplet state is therefore:
$$|1\,0\rangle_{ab} \equiv \sqrt{1/2}\,|\tfrac12\,\tfrac12\rangle_a|\tfrac12\,{-}\tfrac12\rangle_b + \sqrt{1/2}\,|\tfrac12\,{-}\tfrac12\rangle_a|\tfrac12\,\tfrac12\rangle_b \tag{9.12}$$
But this gives only one $|l\,m\rangle$ combination state for the two product states $|\frac12\,\frac12\rangle_a|\frac12\,{-}\frac12\rangle_b$ and $|\frac12\,{-}\frac12\rangle_a|\frac12\,\frac12\rangle_b$ with zero net $z$-momentum. If you want to describe unequal combinations of them, like $|\frac12\,\frac12\rangle_a|\frac12\,{-}\frac12\rangle_b$ by itself, it cannot be just a multiple of $|1\,0\rangle_{ab}$. This suggests that there may be another $|l\,0\rangle_{ab}$ combination state involved here. How do you get this second state?

Well, you can reuse the first trick. If you construct a combination of the two product states that steps up to zero, it must be a state with zero $z$-angular momentum that is at the end of its ladder, a $|0\,0\rangle_{ab}$ state. Consider an arbitrary combination of the two product states with as yet unknown numerical coefficients $C_1$ and $C_2$:

$$C_1|\tfrac12\,\tfrac12\rangle_a|\tfrac12\,{-}\tfrac12\rangle_b + C_2|\tfrac12\,{-}\tfrac12\rangle_a|\tfrac12\,\tfrac12\rangle_b$$
For this combination to step up to zero,

$$\left(\hat L^+_a + \hat L^+_b\right)\left(C_1|\tfrac12\,\tfrac12\rangle_a|\tfrac12\,{-}\tfrac12\rangle_b + C_2|\tfrac12\,{-}\tfrac12\rangle_a|\tfrac12\,\tfrac12\rangle_b\right) = \hbar C_1|\tfrac12\,\tfrac12\rangle_a|\tfrac12\,\tfrac12\rangle_b + \hbar C_2|\tfrac12\,\tfrac12\rangle_a|\tfrac12\,\tfrac12\rangle_b$$

must be zero, which requires $C_2 = -C_1$, leaving $C_1$ undetermined. $C_1$ must be chosen such that the state is normalized, but that still leaves a constant of

magnitude one undetermined. To fix it, $C_1$ is taken to be real and positive, and so the singlet state becomes

$$|0\,0\rangle_{ab} = \sqrt{1/2}\,|\tfrac12\,\tfrac12\rangle_a|\tfrac12\,{-}\tfrac12\rangle_b - \sqrt{1/2}\,|\tfrac12\,{-}\tfrac12\rangle_a|\tfrac12\,\tfrac12\rangle_b. \tag{9.13}$$

To find the remaining triplet state, just apply $\hat L^-_{ab}$ once more, to $|1\,0\rangle_{ab}$ above. It gives:

$$|1\,{-}1\rangle_{ab} = |\tfrac12\,{-}\tfrac12\rangle_a|\tfrac12\,{-}\tfrac12\rangle_b \tag{9.14}$$

Of course, the normalization factor of this bottom state had to turn out to be one; all three step-down operators produce only positive real factors.
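You can check the states above numerically by building the combined operators as matrices. The sketch below represents each single-particle operator with the Pauli spin matrices of subsection 9.1.7 below, and uses Kronecker products to let each act on its own particle:

    import numpy as np

    up, dn = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    # spin matrices in units of hbar (Pauli matrices over two, section 9.1.7)
    sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
    sy = np.array([[0, -1j], [1j, 0]]) / 2
    sz = np.array([[1, 0], [0, -1]], dtype=complex) / 2
    I2 = np.eye(2)

    # each operator acts on its own particle via Kronecker products
    Sx = np.kron(sx, I2) + np.kron(I2, sx)
    Sy = np.kron(sy, I2) + np.kron(I2, sy)
    Sz = np.kron(sz, I2) + np.kron(I2, sz)
    S2 = Sx @ Sx + Sy @ Sy + Sz @ Sz

    singlet = (np.kron(up, dn) - np.kron(dn, up)) / np.sqrt(2)
    triplet0 = (np.kron(up, dn) + np.kron(dn, up)) / np.sqrt(2)
    print(np.allclose(S2 @ singlet, 0 * singlet))    # True: l(l+1) = 0
    print(np.allclose(S2 @ triplet0, 2 * triplet0))  # True: l(l+1) = 2
    print(np.allclose(Sz @ triplet0, 0 * triplet0))  # True: m = 0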
Figure 9.3: Triplet and singlet states in terms of ladders. [The four product states $\uparrow\uparrow$, $\uparrow\downarrow$, $\downarrow\uparrow$, and $\downarrow\downarrow$ of the spins of particles $a$ and $b$, each with $l_a = l_b = \frac12$, combine into the three-rung triplet ladder $1\,\uparrow\uparrow = |1\,1\rangle$, $\sqrt{1/2}\,\uparrow\downarrow + \sqrt{1/2}\,\downarrow\uparrow = |1\,0\rangle$, $1\,\downarrow\downarrow = |1\,{-}1\rangle$ with $l_{ab} = 1$, and the single-rung singlet ladder $\sqrt{1/2}\,\uparrow\downarrow - \sqrt{1/2}\,\downarrow\uparrow = |0\,0\rangle$ with $l_{ab} = 0$.]

Figure 9.3 shows the results graphically in terms of ladders. The two possible spin states of each of the two electrons produce four combined product states, indicated using up and down arrows. These product states are then combined to produce triplet and singlet states that have definite values for both $z$ and total net angular momentum, and can be shown as rungs on ladders.

Note that a product state like $|\frac12\,\frac12\rangle_a|\frac12\,{-}\frac12\rangle_b$ cannot be shown as a rung on a ladder. In fact, from adding (9.12) and (9.13) it is seen that

$$|\tfrac12\,\tfrac12\rangle_a|\tfrac12\,{-}\tfrac12\rangle_b = \sqrt{1/2}\,|1\,0\rangle_{ab} + \sqrt{1/2}\,|0\,0\rangle_{ab}$$

which makes it a combination of the middle rungs of the triplet and singlet ladders, rather than a single rung.

9.1.6 Clebsch-Gordan coefficients


In classical physics, combining angular momentum from different sources is
easy; the net components in the x, y, and z directions are simply the sum of the

individual components. In quantum mechanics, things are trickier, because if


the component in the z-direction exists, those in the x and y directions do not.
But the previous subsection showed how to the spin angular momenta of two spin
1/ particles could be combined. In similar ways, the angular momentum states
2
of any two ladders, whatever their origin, can be combined into net angular
momentum ladders. And then those ladders can in turn be combined with still
other ladders, allowing net angular momentum states to be found for systems
of arbitrary complexity.
The key is to be able to combine the angular momentum ladders from two different sources into net angular momentum ladders. To do so, the net angular momentum can in principle be described in terms of product states in which each source is on a single rung of its ladder. But as the example of the last section illustrated, such product states give incomplete information about the net angular momentum; they do not tell you what the square net angular momentum is. You need to know what combinations of product states produce rungs on the ladders of the net angular momentum, like the ones illustrated in figure 9.3. In particular, you need to know the coefficients that multiply the product states in those combinations.
Figure 9.4: Clebsch-Gordan coefficients of two spin one half particles. [Three small tables, one per net $m_{ab}$ level: the $|1\,1\rangle_{ab}$ state equals $1$ times $|\frac12\,\frac12\rangle_a|\frac12\,\frac12\rangle_b$; the $|0\,0\rangle_{ab}$ and $|1\,0\rangle_{ab}$ states are the combinations of $|\frac12\,\frac12\rangle_a|\frac12\,{-}\frac12\rangle_b$ and $|\frac12\,{-}\frac12\rangle_a|\frac12\,\frac12\rangle_b$ with coefficients $\sqrt{1/2}$, $-\sqrt{1/2}$ respectively $\sqrt{1/2}$, $\sqrt{1/2}$; the $|1\,{-}1\rangle_{ab}$ state equals $1$ times $|\frac12\,{-}\frac12\rangle_a|\frac12\,{-}\frac12\rangle_b$.]

These coefficients are called "Clebsch-Gordan" coefficients. Figure 9.4 shows the ones from figure 9.3 tabulated. Note that there are really three tables of numbers; one for each rung level. The top, single number, "table" says that the $|1\,1\rangle$ net momentum state is found in terms of product states as:

$$|1\,1\rangle_{ab} = 1 \times |\tfrac12\,\tfrac12\rangle_a|\tfrac12\,\tfrac12\rangle_b$$

The second table gives the states with zero net angular momentum in the $z$-direction. For example, the first column of the table says that the $|0\,0\rangle$ singlet state is found as:

$$|0\,0\rangle_{ab} = \sqrt{1/2}\,|\tfrac12\,\tfrac12\rangle_a|\tfrac12\,{-}\tfrac12\rangle_b - \sqrt{1/2}\,|\tfrac12\,{-}\tfrac12\rangle_a|\tfrac12\,\tfrac12\rangle_b$$

Similarly the second column gives the middle rung $|1\,0\rangle$ on the triplet ladder. The bottom "table" gives the bottom rung of the triplet ladder.

You can also read the tables horizontally {A.80}. For example, the first row of the middle table says that the $|\frac12\,\frac12\rangle_a|\frac12\,{-}\frac12\rangle_b$ product state equals

$$|\tfrac12\,\tfrac12\rangle_a|\tfrac12\,{-}\tfrac12\rangle_b = \sqrt{1/2}\,|0\,0\rangle_{ab} + \sqrt{1/2}\,|1\,0\rangle_{ab}$$

That in turn implies that if the net square angular momentum of this product state is measured, there is a 50/50 chance of it turning out to be either zero, or the $l = 1$ (i.e. $2\hbar^2$) value. The $z$-momentum will always be zero.
Figure 9.5: Clebsch-Gordan coefficients for $l_b$ equal to one half. [Tables of the coefficients for combining a ladder with $l_a = \frac12$, $1$, $\frac32$, or $2$ with an $l_b = \frac12$ ladder.]



Figure 9.6: Clebsch-Gordan coefficients for $l_b$ equal to one. [Tables of the coefficients for combining a ladder with $l_a = 1$, $\frac32$, or $2$ with an $l_b = 1$ ladder.]



How about the Clebsch-Gordan coefficients to combine other ladders than the spins of two spin $\frac12$ particles? Well, the same procedures used in the previous section work just as well to combine the angular momenta of any two angular momentum ladders, whatever their size. Just the thing for a long winter night. Or, if you live in Florida, you just might want to write a little computer program that does it for you {A.81} and outputs the tables in human-readable form {A.82}, like figures 9.5 and 9.6.
From the figures you may note that when two states with total angular momentum quantum numbers $l_a$ and $l_b$ are combined, the combinations have total angular momentum quantum numbers ranging from $l_a + l_b$ to $|l_a - l_b|$. This is similar to the fact that when in classical mechanics two angular momentum vectors are combined, the combined total angular momentum $L_{ab}$ is at most $L_a + L_b$ and at least $|L_a - L_b|$. (The so-called "triangle inequality" for combining vectors.) But of course, $l$ is not quite a proportional measure of $L$ unless $L$ is large; in fact, $L = \sqrt{l(l+1)}\,\hbar$ {A.83}.
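If you do want the computer to do the work, the sympy library happens to come with a Clebsch-Gordan evaluator; a minimal sketch, assuming sympy is installed, with arguments ordered $l_a, m_a, l_b, m_b, l_{ab}, m_{ab}$:

    from sympy import Rational
    from sympy.physics.quantum.cg import CG

    half = Rational(1, 2)
    # coefficient of |1/2 1/2>_a |1/2 -1/2>_b in the combined |1 0> state
    print(CG(half, half, half, -half, 1, 0).doit())   # sqrt(2)/2
    # and in the singlet |0 0> state, per the first column of figure 9.4
    print(CG(half, half, half, -half, 0, 0).doit())   # sqrt(2)/2
    print(CG(half, -half, half, half, 0, 0).doit())   # -sqrt(2)/2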

9.1.7 Pauli spin matrices


This subsection returns to the simple two-rung spin ladder (doublet) of an elec-
tron, or any other spin 1/2 particle for that matter, and tries to tease out some
more information about the spin. While the analysis so far has made statements
about the angular momentum in the arbitrarily chosen z-direction, you often
also need information about the spin in the corresponding x and y directions.
This subsection will find it.
But before getting at it, a matter of notations. It is customary to indicate
angular momentum that is due to spin not by a capital L, but by a capital S.
Similarly, the azimuthal quantum number is then indicated by s instead of l.
This subsection will follow this convention.
Now, suppose you know that the particle is in the "spin-up" state with $S_z = \frac12\hbar$ angular momentum in a chosen $z$-direction; in other words, that it is in the $|\frac12\,\frac12\rangle$, or $\uparrow$, state. You want the effect of the $\hat S_x$ and $\hat S_y$ operators on this state. In the absence of a physical model for the motion that gives rise to the spin, this may seem like a hard question indeed. But again the faithful ladder operators $\hat S^+$ and $\hat S^-$ clamber up and down to your rescue!

Assuming that the normalization factor of the $\downarrow$ state is chosen in terms of the one of the $\uparrow$ state consistent with the ladder relations (9.9) and (9.10), you have:

$$\hat S^+\uparrow\, = (\hat S_x + i\hat S_y)\uparrow\, = 0 \qquad \hat S^-\uparrow\, = (\hat S_x - i\hat S_y)\uparrow\, = \hbar\downarrow$$

By adding or subtracting the two equations, you find the effects of $\hat S_x$ and $\hat S_y$ on the spin-up state:

$$\hat S_x\uparrow\, = \tfrac12\hbar\downarrow \qquad \hat S_y\uparrow\, = \tfrac12 i\hbar\downarrow$$

It works the same way for the spin-down state $\downarrow\, = |\frac12\,{-}\frac12\rangle$:

$$\hat S_x\downarrow\, = \tfrac12\hbar\uparrow \qquad \hat S_y\downarrow\, = -\tfrac12 i\hbar\uparrow$$

You now know the effect of the $x$ and $y$ angular momentum operators on the $z$-direction spin states. Chalk one up for the ladder operators.
Next, assume that you have some spin state that is an arbitrary combination of spin-up and spin-down:

$$a\uparrow + b\downarrow$$

Then, according to the expressions above, application of the $x$-spin operator $\hat S_x$ will turn it into:

$$\hat S_x(a\uparrow + b\downarrow) = a\left(0\uparrow + \tfrac12\hbar\downarrow\right) + b\left(\tfrac12\hbar\uparrow + 0\downarrow\right)$$

while the operator $\hat S_y$ turns it into

$$\hat S_y(a\uparrow + b\downarrow) = a\left(0\uparrow + \tfrac12\hbar i\downarrow\right) + b\left(-\tfrac12\hbar i\uparrow + 0\downarrow\right)$$

And of course, since $\uparrow$ and $\downarrow$ are the eigenstates of $\hat S_z$,

$$\hat S_z(a\uparrow + b\downarrow) = a\left(\tfrac12\hbar\uparrow + 0\downarrow\right) + b\left(0\uparrow - \tfrac12\hbar\downarrow\right)$$

If you put the coefficients in the formulae above, except for the common factor $\frac12\hbar$, in little $2\times2$ tables, you get the so-called "Pauli spin matrices":

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \qquad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{9.15}$$

where the convention is that $a$ multiplies the first column of the matrices and $b$ the second. Also, the top rows in the matrices produce the spin-up part of the result and the bottom rows the spin-down part. In linear algebra, you also put the coefficients $a$ and $b$ together in a vector:

$$a\uparrow + b\downarrow \equiv \begin{pmatrix} a \\ b \end{pmatrix}$$

You can now go further and find the eigenstates of the $\hat S_x$ and $\hat S_y$ operators in terms of the eigenstates $\uparrow$ and $\downarrow$ of the $\hat S_z$ operator. You can use the techniques of linear algebra, or you can guess. For example, if you guess $a = b = 1$,

$$\hat S_x\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \tfrac12\hbar\,\sigma_x\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \tfrac12\hbar\begin{pmatrix} 0\times1 + 1\times1 \\ 1\times1 + 0\times1 \end{pmatrix} = \tfrac12\hbar\begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

so $a = b = 1$ is an eigenstate of $\hat S_x$ with eigenvalue $\frac12\hbar$; call it a $\rightarrow$, "spin-right", state. To normalize the state, you still need to divide by $\sqrt2$:

$$\rightarrow\; = \frac{1}{\sqrt2}\uparrow + \frac{1}{\sqrt2}\downarrow$$

Similarly, you can guess the other eigenstates, and come up with:

$$\rightarrow\; = \frac{1}{\sqrt2}\uparrow + \frac{1}{\sqrt2}\downarrow \qquad \leftarrow\; = -\frac{i}{\sqrt2}\uparrow + \frac{i}{\sqrt2}\downarrow \qquad \otimes = \frac{1}{\sqrt2}\uparrow + \frac{i}{\sqrt2}\downarrow \qquad \odot = \frac{1}{\sqrt2}\uparrow - \frac{i}{\sqrt2}\downarrow \tag{9.16}$$
Note that the square magnitudes of the coefficients of the states are all one half, giving a 50/50 chance of finding the $z$-momentum up or down. Since the choice of the axis system is arbitrary, this can be generalized to mean that if the spin in a given direction has a definite value, then there will be a 50/50 chance of the spin in any orthogonal direction turning out to be $\frac12\hbar$ or $-\frac12\hbar$.

You might wonder about the choice of normalization factors in the spin states (9.16). For example, why not leave out the common factor $i$ in the $\leftarrow$, (negative $x$-spin, or spin-left), state? The reason is to ensure that the $x$-direction ladder operators $\hat S_y \pm i\hat S_z$ and the $y$-direction ones $\hat S_z \pm i\hat S_x$, as obtained by cyclic permutation of the ones for $z$, produce real, positive multiplication factors. This allows relations valid in the $z$-direction (like the expressions for triplet and singlet states) to also apply in the $x$ and $y$ directions. In addition, with this choice, if you do a simple change in the labeling of the axes, from $xyz$ to $yzx$ or $zxy$, the form of the Pauli spin matrices remains unchanged. The $\rightarrow$ and $\otimes$ states of positive $x$-, respectively $y$-momentum were chosen a different way: if you rotate the axis system $90^\circ$ around the $y$ or $x$ axis, these are the spin-up states along the new $z$-axes, the $x$ or $y$ axis in the system you are looking at now {A.84}.
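All of these claims are easy to check numerically; a short sketch:

    import numpy as np

    sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
    sigma_y = np.array([[0, -1j], [1j, 0]])
    sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

    # [Sx, Sy] = i hbar Sz  becomes  [sigma_x, sigma_y] = 2 i sigma_z
    print(np.allclose(sigma_x @ sigma_y - sigma_y @ sigma_x, 2j * sigma_z))

    # eigenvalues of Sx are +/- hbar/2; the +1 eigenvector of sigma_x is
    # the spin-right state (1, 1)/sqrt(2), up to an overall constant
    vals, vecs = np.linalg.eigh(sigma_x)
    print(vals)          # [-1.  1.]
    print(vecs[:, 1])    # [0.707...  0.707...]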

9.1.8 General spin matrices


The arguments that produced the Pauli spin matrices for a system with spin $\frac12$ work equally well for systems with larger square angular momentum.

In particular, from the definition of the ladder operators

$$\hat L^+ \equiv \hat L_x + i\hat L_y \qquad \hat L^- \equiv \hat L_x - i\hat L_y$$

it follows by taking the sum, respectively difference, that

$$\hat L_x = \tfrac12\hat L^+ + \tfrac12\hat L^- \qquad \hat L_y = -i\tfrac12\hat L^+ + i\tfrac12\hat L^- \tag{9.17}$$

Therefore, the effect of either $\hat L_x$ or $\hat L_y$ is to produce multiples of the states with the next higher and the next lower magnetic quantum number. The multiples can be determined using (9.9) and (9.10).
412 CHAPTER 9. ELECTROMAGNETISM

If you put these multiples again in matrices, after ordering the states by magnetic quantum number, you get Hermitian tridiagonal matrices with nonzero sub- and superdiagonals and zero main diagonal, where $\hat L_x$ is real symmetric while $\hat L_y$ is purely imaginary, equal to $i$ times a real skew-symmetric matrix. Be sure to tell all your friends that you heard it here first. Do watch out for the well-informed friend who may be aware that forming such matrices is bad news anyway, since they are almost all zeros. If you want to use canned matrix software, at least use the kind for tridiagonal matrices.

9.2 The Relativistic Dirac Equation


Relativity threw up some roadblocks when quantum mechanics was first formulated, especially for the electrically charged particles physicists wanted to look at most: electrons. This section explains some of the ideas. You will need a good understanding of linear algebra to really follow the reasoning.
For zero-spin particles, including relativity appears to be simple. The classical kinetic energy Hamiltonian for a particle in free space,

$$H = \frac{1}{2m}\sum_{i=1}^3 \hat p_i^2 \qquad \hat p_i = \frac{\hbar}{i}\frac{\partial}{\partial r_i}$$

can be replaced by Einstein's relativistic expression

$$H = \sqrt{(m_0c^2)^2 + \sum_{i=1}^3(\hat p_ic)^2}$$

where $m_0$ is the rest mass of the particle and $m_0c^2$ is the energy this mass is equivalent to. You can again write $H\psi = E\psi$, or square the operators in both sides to get rid of the square root:

$$\left[(m_0c^2)^2 + \sum_{i=1}^3(\hat p_ic)^2\right]\psi = E^2\psi$$

This is the "Klein-Gordon" relativistic version of the Hamiltonian eigenvalue problem. With a bit of knowledge of partial differential equations, you can check that the unsteady version, chapter 5.1, obeys the speed of light as the maximum propagation speed, as you would expect, chapter 11.6.
Unfortunately, throwing a dash of spin into this recipe simply does not seem to work in a convincing way. Apparently, that very problem led Schrödinger to limit himself to the nonrelativistic case. It is hard to formulate simple equations with an ugly square root in your way, and surely, you will agree, the relativistic equation for something so very fundamental as an electron in free space should be simple and beautiful like other fundamental equations in physics. (Can you be more concise than $\vec F = m\vec a$ or $E = mc^2$?)
So P.A.M. Dirac boldly proposed that for a particle like an electron, (and other spin $\frac12$ elementary particles like quarks, it turned out,) the square root produces a simple linear combination of the individual square root terms:

$$\sqrt{(m_0c^2)^2 + \sum_{i=1}^3(\hat p_ic)^2} = \alpha_0m_0c^2 + \sum_{i=1}^3\alpha_i\hat p_ic \tag{9.18}$$

for suitable coefficients $\alpha_0$, $\alpha_1$, $\alpha_2$ and $\alpha_3$. Now, if you know a little bit of algebra, you will quickly recognize that there is absolutely no way this can be true. The teacher will have told you that, say, a function like $\sqrt{x^2 + y^2}$ is definitely not the same as the function $\sqrt{x^2} + \sqrt{y^2} = x + y$, otherwise the Pythagorean theorem would look a lot different, and adding coefficients as in $\alpha_1x + \alpha_2y$ does not do any good at all.
But here is the key: while this does not work for plain numbers, Dirac showed it is possible if you are dealing with matrices, tables of numbers. In particular, it works if the coefficients are given by

$$\alpha_0 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \quad \alpha_1 = \begin{pmatrix} 0 & \sigma_x \\ \sigma_x & 0 \end{pmatrix} \quad \alpha_2 = \begin{pmatrix} 0 & \sigma_y \\ \sigma_y & 0 \end{pmatrix} \quad \alpha_3 = \begin{pmatrix} 0 & \sigma_z \\ \sigma_z & 0 \end{pmatrix}$$

This looks like $2\times2$ size matrices, but actually they are $4\times4$ matrices since all elements are $2\times2$ matrices themselves: the ones stand for $2\times2$ unit matrices, the zeros for $2\times2$ zero matrices, and the $\sigma_x$, $\sigma_y$ and $\sigma_z$ are the so-called $2\times2$ Pauli spin matrices that also pop up in the theory of spin angular momentum, section 9.1.7. The square root cannot be eliminated with matrices smaller than $4\times4$ in actual size. A derivation is in note {A.86}.
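Dirac's claim is easy to check numerically: squaring both sides of (9.18) shows that it works exactly when the coefficient matrices satisfy $\alpha_i\alpha_j + \alpha_j\alpha_i = 2\delta_{ij}$ times the unit matrix. A minimal sketch:

    import numpy as np

    one = np.eye(2, dtype=complex)
    zero = np.zeros((2, 2), dtype=complex)
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]])
    sz = np.array([[1, 0], [0, -1]], dtype=complex)

    alphas = [np.block([[one, zero], [zero, -one]]),   # alpha_0
              np.block([[zero, sx], [sx, zero]]),      # alpha_1
              np.block([[zero, sy], [sy, zero]]),      # alpha_2
              np.block([[zero, sz], [sz, zero]])]      # alpha_3

    # anticommutation makes all cross terms in the square of the right-hand
    # side of (9.18) cancel, leaving (m0 c^2)^2 plus the sum of (p_i c)^2
    for i, ai in enumerate(alphas):
        for j, aj in enumerate(alphas):
            target = (2 if i == j else 0) * np.eye(4)
            assert np.allclose(ai @ aj + aj @ ai, target)
    print("the four coefficient matrices anticommute as required")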
Now if the Hamiltonian is a 4 × 4 matrix, the wave function at any point
must have four components. As you might guess from the appearance of the
spin matrices, half of the explanation of the wave function splitting into four is
the two spin states of the electron. How about the other half? It turns out that
the Dirac equation brings with it states of negative total energy, in particular
negative rest mass energy.
That was of course a curious thing. Consider an electron in what otherwise is
an empty vacuum. What prevents the electron from spontaneously transitioning
to the negative rest mass state, releasing twice its rest mass in energy? Dirac
concluded that what is called empty vacuum should in the mathematics of
quantum mechanics be taken to be a state in which all negative energy states are
already filled with electrons. Clearly, that requires the Pauli exclusion principle
to be valid for electrons, otherwise the electron could still transition into such
a state. According to this idea, nature really does not have a free choice in
414 CHAPTER 9. ELECTROMAGNETISM

whether to apply the exclusion principle to electrons if it wants to create a


universe as we know it.
But now consider the vacuum without the electron. What prevents you from
adding a big chunk of energy and lifting an electron out of a negative rest-mass
state into a positive one? Nothing, really. It will produce a normal electron and
a place in the vacuum where an electron is missing, a “hole”. And here finally
Dirac’s boldness appears to have deserted him; he shrank from proposing that
this hole would physically show up as the exact antithesis of the electron, its
anti-particle, the positively charged positron. Instead Dirac weakly pointed the
finger at the proton as a possibility. “Pure cowardice,” he called it later. The
positron that his theory really predicted was subsequently discovered anyway.
(It had already been observed earlier, but was not recognized.)
The reverse of the production of an electron/positron pair is pair annihi-
lation, in which a positron and an electron eliminate each other, creating two
gamma-ray photons. There must be two, because viewed from the combined
center of mass, the net momentum of the pair is zero, and momentum conser-
vation says it must still be zero after the collision. A single photon would have
nonzero momentum, you need two photons coming out in opposite directions.
However, pairs can be created from a single photon with enough energy if it
happens in the vicinity of, say, a heavy nucleus: a heavy nucleus can absorb
the momentum of the photon without picking up much velocity, so without
absorbing too much of the photon’s energy.
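As a quick worked example (standard numbers, added here for illustration): for a pair annihilating essentially at rest, each of the two photons carries half the total rest mass energy,
$$E_\gamma = m_e c^2 \approx (9.11\times10^{-31}\ \mbox{kg})(3\times10^8\ \mbox{m/s})^2 \approx 8.2\times10^{-14}\ \mbox{J} \approx 0.511\ \mbox{MeV}$$
which is the characteristic gamma-ray energy by which positron annihilation is recognized.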
The Dirac equation also gives a very accurate prediction of the magnetic
moment of the electron, section 9.6, though the quantum electromagnetic field
affects the electron and introduces a correction of about a tenth of a percent.
But the importance of the Dirac equation was much more than that: it was
the clue to our understanding how quantum mechanics can be reconciled with
relativity, where particles are no longer absolute, but can be created out of
nothing or destroyed according to the mass-energy relation E = mc2 , {A.4}.
Dirac was a theoretical physicist at Cambridge University, but he moved to
Florida in his later life to be closer to his elder daughter, and was a professor
of physics at the Florida State University when I got there. So it gives me
some pleasure to include the Dirac equation in my text as the corner stone of
relativistic quantum mechanics.

9.3 The Electromagnetic Hamiltonian


This section describes very basically how electromagnetism fits into quantum
mechanics. However, electromagnetism is fundamentally relativistic; its carrier,
the photon, readily emerges or disappears. To describe electromagnetic effects
fully requires quantum electrodynamics, and that is far beyond the scope of this
text. (However, see chapter 10.2 for some of the ideas.)


In classical electromagnetics, the force on a particle with charge q in a field with electric strength $\vec E$ and magnetic strength $\vec B$ is given by the Lorentz force law
$$m\frac{d\vec v}{dt} = q\left(\vec E + \vec v\times\vec B\right) \qquad (9.19)$$
where $\vec v$ is the velocity of the particle and for an electron, the charge is $q = -e$.
Unfortunately, quantum mechanics uses neither forces nor velocities. In
fact, the earlier analysis of atoms and molecules in this book used the fact
that the electric field is described by the corresponding potential energy V , see
for example the Hamiltonian of the hydrogen atom. The magnetic field must
appear differently in the Hamiltonian; as the Lorentz force law shows, it couples
with velocity. You would expect that still the Hamiltonian would be relatively
simple, and the simplest idea is then that any potential corresponding to the
magnetic field moves in together with momentum. Since the momentum is a
vector quantity, then so must be the magnetic potential. So, your simplest guess
would be that the Hamiltonian takes the form
$$H = \frac{1}{2m}\left(\hat{\vec p} - q\vec A\right)^2 + q\varphi \qquad (9.20)$$

where $\varphi = V/q$ is the “electric potential” per unit charge, and $\vec A$ is the “magnetic vector potential” per unit charge. And this simplest guess is in fact right.
The relationship between the vector potential $\vec A$ and the magnetic field strength $\vec B$ will now be found from requiring that the classical Lorentz force law is obtained in the classical limit that the quantum uncertainties in position and momentum are small. In that case, expectation values can be used to describe position and velocity, and the field strengths $\vec E$ and $\vec B$ will be constant on the small quantum scales. That means that the derivatives of $\varphi$ will be constant, (since $\vec E$ is the negative gradient of $\varphi$), and presumably the same for the derivatives of $\vec A$.
Now according to chapter 5.1.4, the evolution of the expectation value of position is found as
$$\frac{d\langle\vec r\rangle}{dt} = \left\langle\frac{i}{\hbar}[H,\vec r]\right\rangle$$
Working out the commutator with the Hamiltonian above, {A.87}, you get,
$$\frac{d\langle\vec r\rangle}{dt} = \frac{1}{m}\left\langle\hat{\vec p} - q\vec A\right\rangle$$
This is unexpected; it shows that $\hat{\vec p}$, i.e. $\hbar\nabla/i$, is no longer the operator of the normal momentum $m\vec v$ when there is a magnetic field; $\hat{\vec p} - q\vec A$ gives the normal momentum. The momentum represented by $\hat{\vec p}$ by itself is called “canonical” momentum to distinguish it from normal momentum:

    The canonical momentum $\hbar\nabla/i$ only corresponds to normal momentum if there is no magnetic field involved.
(Actually, it was not that unexpected to physicists, since the same happens
in the classical description of electromagnetics using the so-called Lagrangian
approach, {A.4.10}.)
Next, Newton's second law says that the time derivative of the linear momentum $m\vec v$ is the force. Since according to the above, the linear momentum operator is $\hat{\vec p} - q\vec A$, then
$$m\frac{d\langle\vec v\rangle}{dt} = \frac{d\langle\hat{\vec p} - q\vec A\rangle}{dt} = \left\langle\frac{i}{\hbar}[H,\hat{\vec p} - q\vec A]\right\rangle - q\left\langle\frac{\partial\vec A}{\partial t}\right\rangle$$

The objective is now to ensure that the right hand side is the correct Lorentz force (9.19) for the assumed Hamiltonian, by a suitable definition of $\vec B$ in terms of $\vec A$.
After a lot of grinding down commutators, {A.87}, it turns out that indeed the Lorentz force is obtained,
$$m\frac{d\langle\vec v\rangle}{dt} = q\left(\vec E + \langle\vec v\rangle\times\vec B\right)$$
provided that:
$$\vec E = -\nabla\varphi - \frac{\partial\vec A}{\partial t} \qquad \vec B = \nabla\times\vec A \qquad (9.21)$$
So the magnetic field is found as the curl of the vector potential $\vec A$. And the
electric field is no longer just the negative gradient of the scalar potential ϕ if
the vector potential varies with time.
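As an aside, a computer can verify such vector identities for you. The sketch below (Python with sympy; not part of the original argument) uses a standard choice of vector potential for a uniform field of strength $B_0$ in the z-direction, $\vec A = \frac{1}{2}\vec B\times\vec r$, and checks that its curl indeed returns the field:

```python
import sympy as sp

x, y, z, B0 = sp.symbols('x y z B_0')
# A = (1/2) B x r with B = (0, 0, B0) and r = (x, y, z)
A = sp.Matrix([-B0*y/2, B0*x/2, 0])
curl_A = sp.Matrix([
    sp.diff(A[2], y) - sp.diff(A[1], z),
    sp.diff(A[0], z) - sp.diff(A[2], x),
    sp.diff(A[1], x) - sp.diff(A[0], y),
])
print(curl_A)  # Matrix([[0], [0], [B_0]]): the uniform field B, as (9.21) requires
```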
These results are not new. The electric scalar potential ϕ and the magnetic
vector potential $\vec A$ are the same in classical physics, though they are a lot less
easy to guess than done here. Moreover, in classical physics they are just conve-
nient mathematical quantities to simplify analysis. In quantum mechanics they
appear as central to the formulation.
And it can make a difference. Suppose you do an experiment where you
pass electron wave functions around both sides of a very thin magnet: you will
get a wave interference pattern behind the magnet. The classical expectation is
that this interference pattern will be independent of the magnet strength: the
magnetic field $\vec B$ outside a very thin and long ideal magnet is zero, so there is no force on the electron. But the magnetic vector potential $\vec A$ is not zero outside the
magnet, and Aharonov and Bohm argued that the interference pattern would
therefore change with magnet strength. So it turned out to be in experiments
done subsequently. The conclusion is clear; nature really goes by the vector potential $\vec A$ and not the magnetic field $\vec B$ in its actual workings.

9.4 Maxwell’s Equations [Descriptive]


Maxwell’s equations are commonly not covered in a typical engineering program.
While these laws are not directly related to quantum mechanics, they do tend
to pop up in nanotechnology. This section intends to give you some of the ideas.
The description is based on the divergence and curl spatial derivative operators,
and the related Gauss and Stokes theorems commonly found in calculus courses
(Calculus III in the US system.)
Skipping the first equation for now, the second of Maxwell’s equations comes
directly out of the quantum mechanical description of the previous section.
Consider the expression for the magnetic field $\vec B$ “derived” (guessed) there, (9.21). If you take its divergence, (premultiply by $\nabla\cdot$), you get rid of the vector potential $\vec A$, since the divergence of any curl is always zero, so you get

$$\mbox{Maxwell's second equation:}\qquad \nabla\cdot\vec B = 0 \qquad (9.22)$$

and that is the second of Maxwell’s four beautifully concise equations. (The
compact modern notation using divergence and curl is really due to Heaviside
and Gibbs, though.)
The first of Maxwell’s equations is a similar expression for the electric field
$\vec E$, but its divergence is not zero:
$$\mbox{Maxwell's first equation:}\qquad \nabla\cdot\vec E = \frac{\rho}{\epsilon_0} \qquad (9.23)$$
where ρ is the electric charge per unit volume that is present and the constant
$\epsilon_0 = 8.85\times10^{-12}$ C$^2$/J m is called the permittivity of space.
What does it all mean? Well, the first thing to verify is that Maxwell’s first
equation is just a very clever way to write Coulomb’s law for the electric field
of a point charge. Consider therefore an electric point charge of strength q, and imagine this charge surrounded by a translucent sphere of radius r, as shown in figure 9.7. By symmetry, the electric field at all points on the spherical surface is radial, and everywhere has the same magnitude $E = |\vec E|$; figure 9.7 shows it for eight selected points.
Now watch what happens if you integrate both sides of Maxwell’s first equa-
tion (9.23) over the interior of this sphere. Starting with the right hand side,
since the charge density is the charge per unit volume, by definition its integral
over the volume is the charge q. So the right hand side integrates simply to
q/ǫ0 . How about the left hand side? Well, the Gauss, or divergence, theorem of

Figure 9.7: Relationship of Maxwell’s first equation to Coulomb’s law.

calculus says that the divergence of any vector, $\vec E$ in this case, integrated over the volume of the sphere, equals the radial electric field E integrated over the surface of the sphere. Since E is constant on the surface, and the surface of a sphere is just $4\pi r^2$, that surface integral evaluates to $4\pi r^2 E$. So in total, you get for the integrated first Maxwell's equation that $4\pi r^2 E = q/\epsilon_0$. Take the $4\pi r^2$ to the other side and there you have the Coulomb electric field of a point charge:
$$\mbox{Coulomb's law:}\qquad E = \frac{q}{4\pi r^2\epsilon_0} \qquad (9.24)$$
Multiply by −e and you have the electrostatic force on an electron in that field
according to the Lorentz equation (9.19). Integrate with respect to r and you
have the potential energy $V = -qe/4\pi\epsilon_0 r$ that has been used earlier to analyze
atoms and molecules.
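As a rough worked example (standard constants, added here for scale): for the proton charge $q = e \approx 1.6\times10^{-19}$ C at the Bohr radius $r = a_0 \approx 0.53\times10^{-10}$ m, Coulomb's law (9.24) gives
$$E = \frac{e}{4\pi\epsilon_0 a_0^2} \approx 5\times10^{11}\ \mbox{V/m}$$
an indication of the enormous field strengths at work inside atoms.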
Of course, all this raises the question, why bother? If Maxwell’s first equa-
tion is just a rewrite of Coulomb’s law, why not simply stick with Coulomb’s
law in the first place? Well, to describe the electric field at a given point using
Coulomb’s law requires you to consider every charge everywhere else. In con-
trast, Maxwell’s equation only involves local quantities at the given point, to wit,
the derivatives of the local electric field and the local charge per unit volume.
It so happens that in numerical or analytical work, most of the time it is much
more convenient to deal with local quantities, even if those are derivatives, than
with global ones.
Of course, you can also integrate Maxwell’s first equation over more general
regions than a sphere centered around a charge. For example figure 9.8 shows
a sphere with an off-center charge. But the electric field strength is no longer
constant over the surface, and the divergence theorem now requires you to integrate
Figure 9.8: Maxwell’s first equation for a more arbitrary region. The figure to
the right includes the field lines through the selected points.

the component of the electric field normal to the surface over the surface.
Clearly, that does not have much intuitive meaning. However, if you are willing
to loosen up a bit on mathematical preciseness, there is a better way to look at
it. It is in terms of the “electric field lines”, the lines that everywhere trace the
direction of the electric field. The left figure in figure 9.8 shows the field lines
through the selected points; a single charge has radial field lines.
Assume that you draw the field lines densely, more like figure 9.9 say, and
moreover, that you make the number of field lines coming out of a charge pro-
portional to the strength of that charge. In that case, the local density of field
lines at a point becomes a measure of the strength of the electric field at that
point, and in those terms, Maxwell’s integrated first equation says that the net
number of field lines leaving a region is proportional to the net charge inside
that region. That remains true when you add more charges inside the region.
In that case the field lines will no longer be straight, but the net number going
out will still be a measure of the net charge inside.
Now consider the question why Maxwell’s second equation says that the
divergence of the magnetic field is zero. For the electric field you can shove,
say, some electrons in the region to create a net negative charge, or you can
shove in some ionized molecules to create a net positive charge. But the mag-
netic equivalents to such particles, called “magnetic monopoles”, being separate
magnetic north pole particles or magnetic south pole particles, simply do not
exist, {A.88}. It might appear that your bar magnet has a north pole and a
south pole, but if you take it apart into little pieces, you do not end up with
north pole pieces and south pole pieces. Each little piece by itself is still a little
magnet, with equally strong north and south poles. The only reason the combined
Figure 9.9: The net number of field lines leaving a region is a measure for the
net charge inside that region.

magnet seems to have a north pole is that all the microscopic magnets of
which it consists have their north poles preferentially pointed in that direction.
If all microscopic magnets have equal strength north and south poles, then
the same number of magnetic field lines that come out of the north poles go back
into the south poles, as figure 9.10 illustrates. So the net magnetic field lines
leaving a given region will be zero; whatever goes out comes back in. True, if you
enclose the north pole of a long bar magnet by an imaginary sphere, you can get
a pretty good magnetic approximation of the electrical case of figure 9.7. But
even then, if you look inside the magnet where it sticks through the spherical
surface, the field lines will be found to go in towards the north pole, instead of
away from it. You see why Maxwell’s second equation is also called “absence of
magnetic monopoles.” And why, say, electrons can have a net negative charge,
but have zero magnetic pole strength; their spin and orbital angular momenta
produce equally strong magnetic north and south poles, a magnetic “dipole” (di
meaning two.)
You can get Maxwell’s third equation from the electric field “derived” in the
previous section. If you take its curl, (premultiply by ∇×), you get rid of the
potential $\varphi$, since the curl of any gradient is always zero, and the curl of $\vec A$ is the magnetic field. So the third of Maxwell's equations is:
$$\mbox{Maxwell's third equation:}\qquad \nabla\times\vec E = -\frac{\partial\vec B}{\partial t} \qquad (9.25)$$
Figure 9.10: Since magnetic monopoles do not exist, the net number of magnetic
field lines leaving a region is always zero.

The “curl”, ∇×, is also often indicated as “rot”.


Now what does that one mean? Well, the first thing to verify in this case
is that this is just a clever rewrite of Faraday’s law of induction, governing
electric power generation. Assume that you want to create a voltage to drive
some load (a bulb or whatever, don’t worry what the load is, just how to get
the voltage for it.) Just take a piece of copper wire and bend it into a circle,
as shown in figure 9.11. If you can create a voltage difference between the ends
of the wire you are in business; just hook your bulb or whatever to the ends
of the wire and it will light up. But to get such a voltage, you will need an
electric field as shown in figure 9.11 because the voltage difference between the
ends is the integral of the electric field strength along the length of the wire.
Now Stokes’ theorem of calculus says that the electric field strength along the
wire integrated over the length of the wire equals the integral of the curl of the
electric field strength integrated over the inside of the wire, in other words over
the imaginary translucent circle in figure 9.11. So to get the voltage, you need
a nonzero curl of the electric field on the translucent circle. And Maxwell’s
third equation above says that this means a time-varying magnetic field on the
translucent circle. Moving the end of a strong magnet closer to the circle should
do it, as suggested by figure 9.11. You better not make that a big bulb unless

Figure 9.11: Electric power generation.

you wrap the wire around a lot more times to form a spool, but anyway. {A.89}.
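To get a feel for the numbers (an illustrative estimate, not from the original text): Faraday's law says the voltage around the loop equals the rate of change of the magnetic flux through it. For a single loop of radius 5 cm in a field changing at 0.1 T/s,
$$V = \pi r^2\frac{dB}{dt} \approx \pi(0.05\ \mbox{m})^2(0.1\ \mbox{T/s}) \approx 0.8\ \mbox{mV}$$
which is why practical generators use many turns of wire and rapidly moving strong magnets.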
Maxwell’s fourth and final equation is a similar expression for the curl of the
magnetic field:
$$\mbox{Maxwell's fourth equation:}\qquad c^2\nabla\times\vec B = \frac{\vec\jmath}{\epsilon_0} + \frac{\partial\vec E}{\partial t} \qquad (9.26)$$
where ~ is the “electric current density,” the charge flowing per unit cross sec-
~ by a factor c
tional area, and c is the speed of light. (It is possible to rescale B
to get the speed of light to show up equally in the equations for the curl of E ~
~
and the curl of B, but then the Lorentz force law must be adjusted too.)
The big difference from the third equation is the appearance of the current
density ~. So, there are two ways to create a circulatory magnetic field, as shown
in figure 9.12: (1) pass a current through the enclosed circle (the current density
integrates over the area of the circle into the current through the circle), and
(2) by creating a varying electric field over the circle, much like was done for
the electric field in figure 9.11.
The fact that a current creates a surrounding magnetic field was already
known as Ampere’s law when Maxwell did his analysis. Maxwell himself how-
ever added the time derivative of the electric field to the equation to have the

Figure 9.12: Two ways to generate a magnetic field: using a current (left) or
using a varying electric field (right).

mathematics make sense. The problem was that the divergence of any curl must
be zero, and by itself, the divergence of the current density in the right hand
side of the fourth equation is not zero. Just like the divergence of the electric
field is the net field lines coming out of a region per unit volume, the divergence
of the current density is the net current coming out. And it is perfectly OK for a
net charge to flow out of a region: it simply reduces the charge remaining within
the region by that amount. This is expressed by the “continuity equation:”

$$\mbox{Maxwell's continuity equation:}\qquad \nabla\cdot\vec\jmath = -\frac{\partial\rho}{\partial t} \qquad (9.27)$$
So Maxwell’s fourth equation without the time derivative of the electric field is
mathematically impossible. But after he added it, if you take the divergence of
the total right hand side then you do indeed get zero as you should. To check
that, use the continuity equation above and the first equation.
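Explicitly (a one-line check spelled out here for clarity): taking the divergence of (9.26) gives
$$0 = c^2\nabla\cdot(\nabla\times\vec B) = \frac{\nabla\cdot\vec\jmath}{\epsilon_0} + \frac{\partial(\nabla\cdot\vec E)}{\partial t} = \frac{1}{\epsilon_0}\left(\nabla\cdot\vec\jmath + \frac{\partial\rho}{\partial t}\right)$$
where the first equation (9.23) was used; the right hand side indeed vanishes by the continuity equation (9.27).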
In empty space, Maxwell’s equations simplify: there are no charges so both
the charge density $\rho$ and the current density $\vec\jmath$ will be zero. In that case, the
solutions of Maxwell’s equations are simply combinations of “traveling waves.”
A traveling wave takes the form
$$\vec E = \hat k\,E_0\cos\left(\omega(t - y/c) - \phi\right) \qquad \vec B = \hat\imath\,\frac{1}{c}E_0\cos\left(\omega(t - y/c) - \phi\right) \qquad (9.28)$$
where for simplicity, the y-axis of the coordinate system has been aligned with
the direction in which the wave travels, and the z-axis with the amplitude k̂E0
of the electric field of the wave. The constant ω is the angular frequency of the
wave, equal to 2π times its frequency ν in cycles per second, and is related to
its wave length λ by ωλ/c = 2π. The constant φ is just a phase angle. For these
simple waves, the magnetic and electric field must be normal to each other, as
well as to the direction of wave propagation.
You can plug the above wave solution into Maxwell’s equations and so verify
that it satisfies them all. With more effort and knowledge of Fourier analysis,
you can show that they are the most general possible solutions that take this
traveling wave form, and that any arbitrary solution is a combination of these
waves (if all directions of the propagation direction and of the electric field
relative to it, are included.)
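Here is one way to carry out that check by computer rather than by hand (a sketch in Python with sympy; it is not part of the original text):

```python
import sympy as sp

t, x, y, z, E0, omega, c, phi = sp.symbols('t x y z E_0 omega c phi')
arg = omega*(t - y/c) - phi
E = sp.Matrix([0, 0, E0*sp.cos(arg)])    # electric field along the z-axis
B = sp.Matrix([E0*sp.cos(arg)/c, 0, 0])  # magnetic field along the x-axis

def div(F):
    return sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)

def curl(F):
    return sp.Matrix([sp.diff(F[2], y) - sp.diff(F[1], z),
                      sp.diff(F[0], z) - sp.diff(F[2], x),
                      sp.diff(F[1], x) - sp.diff(F[0], y)])

print(sp.simplify(div(E)))                        # 0: first equation, rho = 0
print(sp.simplify(div(B)))                        # 0: second equation
print(sp.simplify(curl(E) + sp.diff(B, t)))       # zero vector: third equation
print(sp.simplify(c**2*curl(B) - sp.diff(E, t)))  # zero vector: fourth, j = 0
```

All four printed results come out zero, confirming that (9.28) is a vacuum solution.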
The point is that the waves travel with the speed c. When Maxwell wrote
down his equations, c was just a constant to him, but when the propagation
speed of electromagnetic waves matched the experimentally measured speed of
light, it was just too much of a coincidence and he correctly concluded that light
must be traveling electromagnetic waves.
It was a great victory of mathematical analysis. Long ago, the Greeks had
tried to use mathematics to make guesses about the physical world, and it was an
abysmal failure. You do not want to hear about it. Only when the Renaissance
started measuring how nature really works, the correct laws were discovered
for people like Newton and others to put into mathematical form. But here,
Maxwell successfully amends Ampere’s measured law, just because the math-
ematics did not make sense. Moreover, by deriving how fast electromagnetic
waves move, he discovers the very fundamental nature of the then mystifying
physical phenomenon humans call light.
You will usually not find Maxwell’s equations in the exact form described
here. To explain what is going on inside materials, you would have to account
for the electric and magnetic fields of every electron and proton (and neutron!)
of the material. That is just an impossible task, so physicists have developed
ways to average away all those effects by messing with Maxwell’s equations. But
then the messed-up $\vec E$ in one of Maxwell's equations is no longer the same as the messed-up $\vec E$ in another, and the same for $\vec B$. So physicists rename one messed-up $\vec E$ as, maybe, the “electric flux density” $\vec D$, and a messed up magnetic field
as, maybe, “the auxiliary field”. And they define many other symbols, and even
refer to the auxiliary field as being the magnetic field, all to keep engineers out of
nanotechnology. Don’t let them! When you need to understand the messed-up
Maxwell’s equations, Wikipedia has a list of the countless definitions.

9.5 Example Static Electromagnetic Fields


In this section, some basic solutions of Maxwell’s equations are described. They
will be of interest in chapter 10.1.6 for understanding relativistic effects on the
hydrogen atom (though certainly not essential). They are also of considerable
practical importance for a lot of non-quantum applications.
It is assumed throughout this subsection that the electric and magnetic fields
do not change with time. All solutions also assume that the ambient medium
is vacuum.
For easy reference, Maxwell’s equations and various results to be obtained
in this section are collected together in tables 9.1 and 9.2. While the existence
of magnetic monopoles is unverified, it is often convenient to compute as if they
do exist. It allows you to apply ideas from the electric field to the magnetic field
and vice-versa. So, the tables include magnetic monopoles with strength $q_m$, in addition to electric charges with strength q, and a magnetic current density $\vec\jmath_m$ in addition to an electric current density $\vec\jmath$. The table uses the permittivity of space $\epsilon_0$ and the speed of light c as basic physical constants; the permeability of space $\mu_0 = 1/\epsilon_0 c^2$ is just an annoyance in quantum mechanics and is avoided.
The table has been written in terms of $c\vec B$ and $\vec\jmath_m/c$ because in terms of those combinations Maxwell's equations have a very pleasing symmetry. It allows you to easily convert between expressions for the electric and magnetic fields. You wish that physicists would have defined the magnetic field as $c\vec B$ instead of $\vec B$ in SI units, but no such luck.

9.5.1 Point charge at the origin


A point charge is a charge concentrated at a single point. It is a very good
model for the electric field of the nucleus of an atom, since the nucleus is so
small compared to the atom. A point charge of strength q located at the origin
has a charge density
$$\mbox{point charge at the origin:}\qquad \rho(\vec r) = q\,\delta^3(\vec r) \qquad (9.29)$$
where δ 3 (~r) is the three dimensional delta function. A delta function is a spike
at a single point that integrates to one, so the charge density above integrates
to the total charge q.
The electric field lines of a point charge are radially outward from the charge;
see for example figure 9.9 in the previous subsection. According to Coulomb’s
law, the electric field of a point charge is
$$\mbox{electric field of a point charge:}\qquad \vec E = \frac{q}{4\pi\epsilon_0 r^2}\,\hat\imath_r \qquad (9.30)$$
where r is the distance from the charge, $\hat\imath_r$ is the unit vector pointing straight away from the charge, and $\epsilon_0 = 8.85\times10^{-12}$ C$^2$/J m is the permittivity of space. Now for static electric charges the electric field is minus the gradient of a potential $\varphi$,
$$\vec E = -\nabla\varphi \qquad \nabla \equiv \hat\imath\frac{\partial}{\partial x} + \hat\jmath\frac{\partial}{\partial y} + \hat k\frac{\partial}{\partial z}$$
Physical constants:
$$\epsilon_0 = 8.854\,187\,817\ldots\times10^{-12}\ \mbox{C}^2/\mbox{N m}^2 \qquad c = 299\,792\,458 \approx 3\times10^8\ \mbox{m/s}$$
Lorentz force law:
$$\vec F = q\left(\vec E + \frac{\vec v}{c}\times c\vec B\right) + \frac{q_m}{c}\left(c\vec B - \frac{\vec v}{c}\times\vec E\right)$$
Maxwell's equations:
$$\nabla\cdot\vec E = \frac{1}{\epsilon_0}\rho \qquad \nabla\cdot c\vec B = \frac{1}{\epsilon_0}\frac{\rho_m}{c}$$
$$\nabla\times\vec E = -\frac{1}{c}\frac{\partial c\vec B}{\partial t} - \frac{1}{\epsilon_0 c}\frac{\vec\jmath_m}{c} \qquad \nabla\times c\vec B = \frac{1}{c}\frac{\partial\vec E}{\partial t} + \frac{1}{\epsilon_0 c}\vec\jmath$$
$$\nabla\cdot\vec\jmath + \frac{\partial\rho}{\partial t} = 0 \qquad \nabla\cdot\vec\jmath_m + \frac{\partial\rho_m}{\partial t} = 0$$
Existence of a potential:
$$\vec E = -\nabla\varphi \mbox{ iff } \nabla\times\vec E = 0 \qquad \vec B = -\nabla\varphi_m \mbox{ iff } \nabla\times\vec B = 0$$
Point charge at the origin:
$$\varphi = \frac{q}{4\pi\epsilon_0}\frac{1}{r} \quad \vec E = \frac{q}{4\pi\epsilon_0}\frac{\vec r}{r^3} \qquad c\varphi_m = \frac{q_m}{4\pi\epsilon_0 c}\frac{1}{r} \quad c\vec B = \frac{q_m}{4\pi\epsilon_0 c}\frac{\vec r}{r^3}$$
Point charge at the origin in 2D:
$$\varphi = \frac{q'}{2\pi\epsilon_0}\ln\frac{1}{r} \quad \vec E = \frac{q'}{2\pi\epsilon_0}\frac{\vec r}{r^2} \qquad c\varphi_m = \frac{q_m'}{2\pi\epsilon_0 c}\ln\frac{1}{r} \quad c\vec B = \frac{q_m'}{2\pi\epsilon_0 c}\frac{\vec r}{r^2}$$
Charge dipoles:
$$\varphi = \frac{q}{4\pi\epsilon_0}\left[\frac{1}{|\vec r-\vec r_\oplus|} - \frac{1}{|\vec r-\vec r_\ominus|}\right] \qquad c\varphi_m = \frac{q_m}{4\pi\epsilon_0 c}\left[\frac{1}{|\vec r-\vec r_\oplus|} - \frac{1}{|\vec r-\vec r_\ominus|}\right]$$
$$\vec E = \frac{q}{4\pi\epsilon_0}\left[\frac{\vec r-\vec r_\oplus}{|\vec r-\vec r_\oplus|^3} - \frac{\vec r-\vec r_\ominus}{|\vec r-\vec r_\ominus|^3}\right] \qquad c\vec B = \frac{q_m}{4\pi\epsilon_0 c}\left[\frac{\vec r-\vec r_\oplus}{|\vec r-\vec r_\oplus|^3} - \frac{\vec r-\vec r_\ominus}{|\vec r-\vec r_\ominus|^3}\right]$$
$$\vec\wp = q(\vec r_\oplus - \vec r_\ominus) \quad E_{ext} = -\vec\wp\cdot\vec E_{ext} \qquad \vec\mu = q_m(\vec r_\oplus - \vec r_\ominus) \quad E_{ext} = -\vec\mu\cdot\vec B_{ext}$$
Charge dipoles in 2D:
$$\varphi = \frac{q'}{2\pi\epsilon_0}\left[\ln\frac{1}{|\vec r-\vec r_\oplus|} - \ln\frac{1}{|\vec r-\vec r_\ominus|}\right] \qquad c\varphi_m = \frac{q_m'}{2\pi\epsilon_0 c}\left[\ln\frac{1}{|\vec r-\vec r_\oplus|} - \ln\frac{1}{|\vec r-\vec r_\ominus|}\right]$$
$$\vec E = \frac{q'}{2\pi\epsilon_0}\left[\frac{\vec r-\vec r_\oplus}{|\vec r-\vec r_\oplus|^2} - \frac{\vec r-\vec r_\ominus}{|\vec r-\vec r_\ominus|^2}\right] \qquad c\vec B = \frac{q_m'}{2\pi\epsilon_0 c}\left[\frac{\vec r-\vec r_\oplus}{|\vec r-\vec r_\oplus|^2} - \frac{\vec r-\vec r_\ominus}{|\vec r-\vec r_\ominus|^2}\right]$$
$$\vec\wp\,' = q'(\vec r_\oplus - \vec r_\ominus) \quad E_{ext}' = -\vec\wp\,'\cdot\vec E_{ext} \qquad \vec\mu\,' = q_m'(\vec r_\oplus - \vec r_\ominus) \quad E_{ext}' = -\vec\mu\,'\cdot\vec B_{ext}$$

Table 9.1: Electromagnetics I: Fundamental equations and basic solutions.


Distributed charges (the underlined $\underline{\vec r}$ is the integration point):
$$\varphi = \frac{1}{4\pi\epsilon_0}\int_{\mbox{all }\underline{\vec r}}\frac{1}{|\vec r-\underline{\vec r}|}\,\rho(\underline{\vec r})\,d^3\underline{\vec r} \qquad c\varphi_m = \frac{1}{4\pi\epsilon_0}\int_{\mbox{all }\underline{\vec r}}\frac{1}{|\vec r-\underline{\vec r}|}\,\frac{\rho_m(\underline{\vec r})}{c}\,d^3\underline{\vec r}$$
$$\vec E = \frac{1}{4\pi\epsilon_0}\int_{\mbox{all }\underline{\vec r}}\frac{\vec r-\underline{\vec r}}{|\vec r-\underline{\vec r}|^3}\,\rho(\underline{\vec r})\,d^3\underline{\vec r} \qquad c\vec B = \frac{1}{4\pi\epsilon_0}\int_{\mbox{all }\underline{\vec r}}\frac{\vec r-\underline{\vec r}}{|\vec r-\underline{\vec r}|^3}\,\frac{\rho_m(\underline{\vec r})}{c}\,d^3\underline{\vec r}$$
$$\varphi \sim \frac{q}{4\pi\epsilon_0}\frac{1}{r} + \frac{1}{4\pi\epsilon_0}\frac{\vec\wp\cdot\vec r}{r^3} \qquad c\varphi_m \sim \frac{q_m}{4\pi\epsilon_0 c}\frac{1}{r} + \frac{1}{4\pi\epsilon_0}\frac{\vec\mu\cdot\vec r}{cr^3}$$
$$\vec E \sim \frac{q}{4\pi\epsilon_0}\frac{\vec r}{r^3} + \frac{1}{4\pi\epsilon_0}\frac{3(\vec\wp\cdot\vec r)\vec r - \vec\wp r^2}{r^5} \qquad c\vec B \sim \frac{q_m}{4\pi\epsilon_0 c}\frac{\vec r}{r^3} + \frac{1}{4\pi\epsilon_0}\frac{3(\vec\mu\cdot\vec r)\vec r - \vec\mu r^2}{cr^5}$$
$$q = \int\rho(\underline{\vec r})\,d^3\underline{\vec r} \quad \vec\wp = \int\underline{\vec r}\,\rho(\underline{\vec r})\,d^3\underline{\vec r} \qquad q_m = \int\rho_m(\underline{\vec r})\,d^3\underline{\vec r} \quad \vec\mu = \int\underline{\vec r}\,\rho_m(\underline{\vec r})\,d^3\underline{\vec r}$$
Ideal charge dipoles:
$$\varphi = \frac{1}{4\pi\epsilon_0}\frac{\vec\wp\cdot\vec r}{r^3} \qquad c\varphi_m = \frac{1}{4\pi\epsilon_0}\frac{\vec\mu\cdot\vec r}{cr^3}$$
$$\vec E = \frac{1}{4\pi\epsilon_0}\frac{3(\vec\wp\cdot\vec r)\vec r - \vec\wp r^2}{r^5} - \frac{\vec\wp}{3\epsilon_0}\delta^3(\vec r) \qquad c\vec B = \frac{1}{4\pi\epsilon_0}\frac{3(\vec\mu\cdot\vec r)\vec r - \vec\mu r^2}{cr^5} - \frac{\vec\mu}{3\epsilon_0 c}\delta^3(\vec r)$$
Biot-Savart law for current densities and currents:
$$\vec E = \frac{1}{4\pi\epsilon_0 c}\int_{\mbox{all }\underline{\vec r}}\frac{\vec r-\underline{\vec r}}{|\vec r-\underline{\vec r}|^3}\times\frac{\vec\jmath_m(\underline{\vec r})}{c}\,d^3\underline{\vec r} \qquad c\vec B = -\frac{1}{4\pi\epsilon_0 c}\int_{\mbox{all }\underline{\vec r}}\frac{\vec r-\underline{\vec r}}{|\vec r-\underline{\vec r}|^3}\times\vec\jmath(\underline{\vec r})\,d^3\underline{\vec r}$$
$$\vec E = \frac{1}{4\pi\epsilon_0 c}\int_{\mbox{all }\underline{\vec r}}\frac{\vec r-\underline{\vec r}}{|\vec r-\underline{\vec r}|^3}\times\frac{I_m(\underline{\vec r})}{c}\,d\underline{\vec r} \qquad c\vec B = -\frac{1}{4\pi\epsilon_0 c}\int_{\mbox{all }\underline{\vec r}}\frac{\vec r-\underline{\vec r}}{|\vec r-\underline{\vec r}|^3}\times I(\underline{\vec r})\,d\underline{\vec r}$$
2D field due to a straight current along the z-axis:
$$\varphi = \frac{I_m}{2\pi\epsilon_0 c^2}\theta \quad \vec E = -\frac{I_m}{2\pi\epsilon_0 c^2}\frac{1}{r}\hat\imath_\theta \qquad c\varphi_m = -\frac{I}{2\pi\epsilon_0 c}\theta \quad c\vec B = \frac{I}{2\pi\epsilon_0 c}\frac{1}{r}\hat\imath_\theta$$
Current dipole moment:
$$\vec\wp = -\frac{1}{2c}\int_{\mbox{all }\underline{\vec r}}\underline{\vec r}\times\frac{\vec\jmath_m(\underline{\vec r})}{c}\,d^3\underline{\vec r} \qquad \vec\mu = \frac{1}{2}\int_{\mbox{all }\underline{\vec r}}\underline{\vec r}\times\vec\jmath(\underline{\vec r})\,d^3\underline{\vec r} = \frac{q_c}{2m_c}\vec L$$
$$\vec M = \vec\wp\times\vec E_{ext} \quad E_{ext} = -\vec\wp\cdot\vec E_{ext} \qquad \vec M = \vec\mu\times\vec B_{ext} \quad E_{ext} = -\vec\mu\cdot\vec B_{ext}$$
Ideal current dipoles:
$$\varphi = \frac{1}{4\pi\epsilon_0}\frac{\vec\wp\cdot\vec r}{r^3} \qquad c\varphi_m = \frac{1}{4\pi\epsilon_0}\frac{\vec\mu\cdot\vec r}{cr^3}$$
$$\vec E = \frac{1}{4\pi\epsilon_0}\frac{3(\vec\wp\cdot\vec r)\vec r - \vec\wp r^2}{r^5} + \frac{2\vec\wp}{3\epsilon_0}\delta^3(\vec r) \qquad c\vec B = \frac{1}{4\pi\epsilon_0}\frac{3(\vec\mu\cdot\vec r)\vec r - \vec\mu r^2}{cr^5} + \frac{2\vec\mu}{3\epsilon_0 c}\delta^3(\vec r)$$

Table 9.2: Electromagnetics II: Electromagnetostatic solutions.


In everyday terms the potential $\varphi$ is called the “voltage.” It follows by integration of the electric field strength with respect to r that the potential of a point
charge is
$$\mbox{electric potential of a point charge:}\qquad \varphi = \frac{q}{4\pi\epsilon_0 r} \qquad (9.31)$$
Multiply by −e and you get the potential energy V of an electron in the field of
the point charge. That was used in writing the Hamiltonians of the hydrogen
and heavier atoms.
Delta functions are often not that easy to work with analytically, since they
are infinite and infinity is a tricky mathematical thing. It is often easier to do
the mathematics by assuming that the charge is spread out over a small sphere
of radius ε, rather than concentrated at a single point. If it is assumed that the
charge distribution is uniform within the radius ε, then it is

$$\mbox{spherical charge around the origin:}\qquad \rho = \left\{\begin{array}{cl}\displaystyle\frac{q}{\frac{4}{3}\pi\varepsilon^3} & \mbox{if } r \le \varepsilon \\[2ex] 0 & \mbox{if } r > \varepsilon\end{array}\right. \qquad (9.32)$$

Since the charge density is the charge per unit volume, the charge density times the volume $\frac{4}{3}\pi\varepsilon^3$ of the little sphere that holds it must be the total charge q. The expression above makes it so.


Figure 9.13: Electric field and potential of a charge that is distributed uniformly
within a small sphere. The dotted lines indicate the values for a point charge.

Figure 9.13 shows that outside the region with charge, the electric field and
potential are exactly like those of a point charge with the same net charge q.
But inside the region of charge distribution, the electric field varies linearly with
radius, and becomes zero at the center. It is just like the gravity of earth: going
above the surface of the earth out into space, gravity decreases like 1/r2 if r
is the distance from the center of the earth. But if you go down below the
surface of the earth, gravity decreases also and becomes zero at the center of
the earth. If you want, you can derive the electric field of the spherical charge
from Maxwell’s first equation; it goes much in the same way that Coulomb’s
law was derived from it in the previous section.
If magnetic monopoles exist, they would create a magnetic field much like an
electric charge creates an electric field. As table 9.1 shows, the only difference
is the square of the speed of light c popping up in the expressions. (And that is
really just a matter of definitions, anyway.) In real life, these expressions give
an approximation for the magnetic field near the north or south pole of a very
long thin magnet as long as you do not look inside the magnet.

Figure 9.14: Electric field of a two-dimensional line charge.

A homogeneous distribution of charges along an infinite straight line is called
a line charge. As shown in figure 9.14, it creates a two-dimensional field in the
planes normal to the line. The line charge becomes a point charge within such
a plane. The expression for the field of a line charge can be derived in much the
same way as Coulomb’s law was derived for a three-dimensional point charge in
the previous section. In particular, where that derivation surrounded the point
charge by a spherical surface, surround the line charge by a cylinder. (Or by a
circle, if you want to think of it in two dimensions.) The resulting expressions
are given in table 9.1; they are in terms of the charge per unit length of the line
q ′ . Note that in this section a prime is used to indicate that a quantity is per
unit length.
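For the record, the cylinder version of the argument runs as follows (a short step added here; the result is the one in table 9.1): the field lines cross a cylinder of radius r and length $\ell$ only through its curved side, of area $2\pi r\ell$, so Maxwell's first equation integrates to $2\pi r\ell\,E = q'\ell/\epsilon_0$, giving
$$E = \frac{q'}{2\pi\epsilon_0 r}$$
Note the 1/r decay, slower than the $1/r^2$ of a three-dimensional point charge.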

9.5.2 Dipoles
A point charge can describe a single charged particle like an atom nucleus or
electron. But much of the time in physics, you are dealing with neutral atoms or
molecules. For those, the net charge is zero. The simplest model for a system
with zero net charge is called the “dipole.” It is simply a combination of a
positive point charge q and a negative one −q, making the net charge zero.

Figure 9.15: Field lines of a vertical electric dipole.

Figure 9.15 shows an example of a dipole in which the positive charge is
straight above the negative one. Note the distinctive egg shape of the biggest
electric field lines. The “electric dipole moment” $\vec\wp$ is defined as the product of the charge strength q times the connecting vector from negative to positive charge:
$$\mbox{electric dipole moment:}\qquad \vec\wp = q(\vec r_\oplus - \vec r_\ominus) \qquad (9.33)$$
where $\vec r_\oplus$ and $\vec r_\ominus$ are the positions of the positive and negative charges respectively.
The potential of a dipole is simply the sum of the potentials of the two
charges:
$$\mbox{potential of an electric dipole:}\qquad \varphi = \frac{q}{4\pi\epsilon_0}\frac{1}{|\vec r-\vec r_\oplus|} - \frac{q}{4\pi\epsilon_0}\frac{1}{|\vec r-\vec r_\ominus|} \qquad (9.34)$$
Note that to convert the expressions for a charge at the origin to one not at the
origin, you need to use the position vector measured from the location of the
charge.
The electric field of the dipole can be found from either taking minus the
gradient of the potential above, or from adding the fields of the individual point
charges, and is

$$\mbox{field of an electric dipole:}\qquad \vec E = \frac{q}{4\pi\epsilon_0}\frac{\vec r-\vec r_\oplus}{|\vec r-\vec r_\oplus|^3} - \frac{q}{4\pi\epsilon_0}\frac{\vec r-\vec r_\ominus}{|\vec r-\vec r_\ominus|^3} \qquad (9.35)$$
To obtain that result from taking the gradient of the potential, remember the following important formula for the gradient of $|\vec r-\vec r_0|^n$ with n an arbitrary power:

$$\frac{\partial|\vec r-\vec r_0|^n}{\partial r_i} = n|\vec r-\vec r_0|^{n-2}(r_i - r_{0,i}) \qquad \nabla_{\vec r}|\vec r-\vec r_0|^n = n|\vec r-\vec r_0|^{n-2}(\vec r-\vec r_0) \qquad (9.36)$$
The first expression gives the gradient in index notation and the second gives it
in vector form. The subscript on ∇ merely indicates that the differentiation is
with respect to ~r, not ~r0 . These formulae will be used routinely in this section.
Using them, you can check that minus the gradient of the dipole potential does
indeed give its electric field above.
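For instance (spelling out one term of that check): applying (9.36) with $n = -1$,
$$-\nabla_{\vec r}\,\frac{q}{4\pi\epsilon_0}\frac{1}{|\vec r-\vec r_\oplus|} = \frac{q}{4\pi\epsilon_0}\frac{\vec r-\vec r_\oplus}{|\vec r-\vec r_\oplus|^3}$$
which is exactly the first term of (9.35); the $\vec r_\ominus$ term goes the same way.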
Similar expressions apply for magnetic dipoles. The field outside a thin bar
magnet can be approximated as a magnetic dipole, with the north and south
poles of the magnet as the positive and negative magnetic point charges. The
magnetic field lines are then just like the electric field lines in figure 9.15.


Figure 9.16: Electric field of a two-dimensional dipole.

Corresponding expressions can also be written down in two dimensions, for
opposite charges distributed along parallel straight lines. Figure 9.16 gives an
example. In two dimensions, all field lines are circles passing through both
charges.
A particle like an electron has an electric charge and no known size. It
can therefore be described as an ideal point charge. But an electron also has
a magnetic moment: it acts as a magnet of zero size. Such a magnet of zero
size will be referred to as an “ideal magnetic dipole.” More precisely, an ideal
magnetic dipole is defined as the limit of a magnetic dipole when the two poles
are brought vanishingly close together. Now if you just let the two poles ap-
proach each other without doing anything else, their opposite fields will begin to
increasingly cancel each other, and there will be no field left when the poles are
on top of each other. When you make the distance between the poles smaller,
you also need to increase the strengths qm of the poles to ensure that the
$$\mbox{magnetic dipole moment:}\qquad \vec\mu = q_m(\vec r_\oplus - \vec r_\ominus) \qquad (9.37)$$
remains finite. So you can think of an ideal magnetic dipole as infinitely strong
magnetic poles infinitely close together.

Figure 9.17: Field of an ideal magnetic dipole.

The field lines of a vertical ideal magnetic dipole are shown in figure 9.17.
Their egg shape is in spherical coordinates described by, {A.90},
$$r = r_{max}\sin^2\theta \qquad \phi = \mbox{constant} \qquad (9.38)$$
To find the magnetic field itself, start with the magnetic potential of a non-ideal
dipole,
$$\varphi_m = \frac{q_m}{4\pi\epsilon_0 c^2}\left[\frac{1}{|\vec r-\vec r_\oplus|} - \frac{1}{|\vec r-\vec r_\ominus|}\right]$$
Now take the negative pole at the origin, and allow the positive pole to approach
it vanishingly close. Then the potential above takes the generic form
$$\varphi_m = f(\vec r - \vec r_\oplus) - f(\vec r) \qquad f(\vec r) = \frac{q_m}{4\pi\epsilon_0 c^2}\frac{1}{|\vec r|}$$
Now according to the total differential of calculus, (or the multi-dimensional
Taylor series theorem, or the definition of directional derivative), for small $\vec r_\oplus$ an expression of the form $f(\vec r - \vec r_\oplus) - f(\vec r)$ can be approximated as
$$f(\vec r - \vec r_\oplus) - f(\vec r) \sim -\vec r_\oplus\cdot\nabla f \qquad \mbox{for } \vec r_\oplus \to 0$$
From this the magnetic potential of an ideal dipole at the origin can be found by using the expression (9.36) for the gradient of $1/|\vec r|$ and then substituting the magnetic dipole strength $\vec\mu$ for $q_m\vec r_\oplus$. The result is
$$\mbox{potential of an ideal magnetic dipole:}\qquad \varphi_m = \frac{1}{4\pi\epsilon_0 c^2}\frac{\vec\mu\cdot\vec r}{r^3} \qquad (9.39)$$
The corresponding magnetic field can be found as minus the gradient of the potential, using again (9.36) and the fact that the gradient of $\vec\mu\cdot\vec r$ is just $\vec\mu$:
$$\vec B = \frac{1}{4\pi\epsilon_0 c^2}\frac{3(\vec\mu\cdot\vec r)\vec r - \vec\mu r^2}{r^5} \qquad (9.40)$$
Similar expressions can be written down for ideal electric dipoles and in two-
dimensions. They are listed in tables 9.1 and 9.2. (The delta functions will be
discussed in the next subsection.)

Figure 9.18: Electric field of an almost ideal two-dimensional dipole.

Figure 9.18 shows an almost ideal two-dimensional electric dipole. The spac-
ing between the charges has been reduced significantly compared to that in figure
9.16, and the strength of the charges has been increased. For two-dimensional
ideal dipoles, the field lines in a cross-plane are circles that all touch each other
at the dipole.

9.5.3 Arbitrary charge distributions


Modeling electric systems like atoms and molecules and their ions as singular
point charges or dipoles is not very accurate, except in a detailed quantum
solution. In a classical description, it is more reasonable to assume that the
charges are “smeared out” over space into a distribution. In that case, the
charges are described by the charge per unit volume, called the charge density
ρ. The integral of the charge density over volume then gives the net charge,
$$q_{region} = \int_{region}\rho(\vec r)\,d^3\vec r \qquad (9.41)$$

As far as the potential is concerned, each little piece $\rho(\underline{\vec r})\,d^3\underline{\vec r}$ of the charge distribution acts like a point charge at the point $\underline{\vec r}$. The expression for the potential of such a point charge is like that of a point charge at the origin, but with $\vec r$ replaced by $\vec r - \underline{\vec r}$. The total potential results from integrating over all the point charges. So, for a charge distribution,
$$\varphi(\vec r) = \frac{1}{4\pi\epsilon_0}\int_{\mbox{all }\underline{\vec r}}\frac{1}{|\vec r-\underline{\vec r}|}\,\rho(\underline{\vec r})\,d^3\underline{\vec r} \qquad (9.42)$$
(the underlined $\underline{\vec r}$ is the integration point, and the plain $\vec r$ the point at which the potential is evaluated).

The electric field and similar expressions for magnetic charge distributions and in two dimensions may be found in table 9.2.
Note that when the integral expression for the potential is differentiated to
find the electric field, as in table 9.2, the integrand becomes much more singular at the point of integration where $\underline{\vec r} = \vec r$. This may be of importance in numerical
work, where the more singular integrand can lead to larger errors. It may then
be a better idea not to differentiate under the integral, but instead put the
derivative of the charge density in the integral, like in

$$E_x = -\frac{\partial\varphi}{\partial x} = -\frac{1}{4\pi\epsilon_0}\int_{\mbox{all }\underline{\vec r}}\frac{1}{|\vec r-\underline{\vec r}|}\,\frac{\partial\rho(\underline{\vec r})}{\partial\underline x}\,d^3\underline{\vec r}$$

and similar for the y and z components. That you can do that may be verified by noting that differentiating $\vec r - \underline{\vec r}$ with respect to x is within a minus sign the same as differentiating with respect to $\underline x$, and then you can use integration by parts to move the derivative to $\rho$.
Now consider the case that the charge distribution is restricted to a very
small region around the origin, or equivalently, that the charge distribution
is viewed from a very large distance. For simplicity, assume the case that the
charge distribution is restricted to a small region around the origin. In that case,
$\underline{\vec r}$ is small wherever there is charge; the integrand can therefore be approximated by a Taylor series in terms of $\underline{\vec r}$ to give:
$$\varphi = \frac{1}{4\pi\epsilon_0}\int_{\mbox{all }\underline{\vec r}}\left[\frac{1}{|\vec r|} + \frac{\vec r}{|\vec r|^3}\cdot\underline{\vec r} + \ldots\right]\rho(\underline{\vec r})\,d^3\underline{\vec r}$$

where (9.36) was used to evaluate the gradient of $1/|\vec r-\underline{\vec r}|$ with respect to $\underline{\vec r}$.
Since the fractions no longer involve $\underline{\vec r}$, they can be taken out of the integrals and so the potential simplifies to
$$\varphi = \frac{q}{4\pi\epsilon_0 r} + \frac{1}{4\pi\epsilon_0}\frac{\vec\wp\cdot\vec r}{r^3} + \ldots \qquad q \equiv \int_{\mbox{all }\underline{\vec r}}\rho(\underline{\vec r})\,d^3\underline{\vec r} \qquad \vec\wp \equiv \int_{\mbox{all }\underline{\vec r}}\underline{\vec r}\,\rho(\underline{\vec r})\,d^3\underline{\vec r} \qquad (9.43)$$

The leading term shows that a distributed charge distribution will normally look
like a point charge located at the origin when seen from a sufficient distance.
However, if the net charge q is zero, like happens for a neutral atom or molecule,
it will look like an ideal dipole, the second term, when seen from a sufficient
distance.
The expansion (9.43) is called a “multipole expansion.” It allows the effect
of a complicated charge distribution to be described by a few simple terms,
assuming that the distance from the charge distribution is sufficiently large
that its small scale features can be ignored. If necessary, the accuracy of the
expansion can be improved by using more terms in the Taylor series. Now
recall from the previous section that one advantage of Maxwell’s equations over
Coulomb’s law is that they allow you to describe the electric field at a point using
purely local quantities, rather than having to consider the charges everywhere.
But using a multipole expansion, you can simplify the effects of distant charge
distributions. Then Coulomb’s law can become competitive with Maxwell’s
equations, especially in cases where the charge distribution is restricted to a
relatively limited fraction of the total space.
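To see the expansion converge in practice, consider the following small numerical illustration (a Python sketch; the charge values and positions are made-up numbers for illustration only). It evaluates the exact potential of a +q/−q pair and the dipole term of (9.43) at increasing distances:

```python
import numpy as np

eps0 = 8.854e-12                        # C^2/(N m^2), permittivity of space
q = 1.602e-19                           # C; a +q/-q pair, so the net charge is zero
r_plus = np.array([0.0, 0.0, 5e-11])    # m, assumed position of the positive charge
r_minus = np.array([0.0, 0.0, -5e-11])  # m, assumed position of the negative charge
p = q*(r_plus - r_minus)                # dipole moment, as in (9.33)

def phi_exact(r):
    # sum of the two point charge potentials (9.31)
    return (q/np.linalg.norm(r - r_plus) - q/np.linalg.norm(r - r_minus))/(4*np.pi*eps0)

def phi_dipole(r):
    # dipole term of the multipole expansion (9.43)
    rn = np.linalg.norm(r)
    return p.dot(r)/(4*np.pi*eps0*rn**3)

for dist in (2e-10, 1e-9, 1e-8):        # observation points along a diagonal
    r = dist*np.array([0.6, 0.0, 0.8])
    print(dist, phi_exact(r), phi_dipole(r))
# the two values agree ever better as the distance grows
```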
The previous subsection discussed how an ideal dipole could be created by
decreasing the distance between two opposite charges with a compensating in-
crease in their strength. The multipole expansion above shows that the same
ideal dipole is obtained for a continuous charge distribution, provided that the
net charge q is zero.
The electric field of this ideal dipole can be found as minus the gradient of
the potential. But caution is needed; the so-obtained electric field may not be
sufficient for your needs. Consider the following ballpark estimates. Assume
that the charge distribution has been contracted to a typical small size ε. Then
the net positive and negative charges will have been increased by a corresponding
factor 1/ε. The electric field within the contracted charge distribution will then
have a typical magnitude $\frac{1}{\varepsilon}\frac{1}{|\vec r-\underline{\vec r}|^2}$, and that means $1/\varepsilon^3$, since the typical
size of the region is ε. Now a quantity of order 1/ε3 can integrate to a finite
amount even if the volume of integration is small of order ε3 . In other words,
there seems to be a possibility that the electric field may have a delta function
hidden within the charge distribution when it is contracted to a point. And so
it does. The correct delta function is derived in note {A.90} and shown in table
9.2. It is important in applications in quantum mechanics where you need some
integral of the electric field; if you forget about the delta function, you will get
the wrong result.
9.5.4 Solution of the Poisson equation


The previous subsections stumbled onto the solution of an important mathe-
matical problem, the Poisson equation. The Poisson equation is

$$\nabla^2\varphi = f \qquad (9.44)$$

where f is a given function and ϕ is the unknown one to be found. The Laplacian
∇2 is also often found written as ∆.
The reason that the previous subsection stumbled on to the solution of this
equation is that the electric potential ϕ satisfies it. In particular, minus the
gradient of ϕ gives the electric field; also, the divergence of the electric field
gives according to Maxwell’s first equation the charge density ρ divided by ǫ0 .
Put the two together and it says that ∇2 ϕ = −ρ/ǫ0 . So, identify the function
f in the Poisson equation with −ρ/ǫ0 , and there you have the solution of the
Poisson equation.
Because it is such an important problem, it is a good idea to write out the
abstract mathematical solution without the “physical entourage” of (9.42):
$$\nabla^2\varphi = f \quad\Longrightarrow\quad \varphi = \int_{\mbox{all }\underline{\vec r}}G(\vec r-\underline{\vec r})\,f(\underline{\vec r})\,d^3\underline{\vec r} \qquad G(\vec r) = -\frac{1}{4\pi|\vec r|} \qquad (9.45)$$

The function $G(\vec r-\underline{\vec r})$ is called the Green's function of the Laplacian. It is the solution for $\varphi$ if the function f is a delta function at point $\underline{\vec r}$. The integral solution of the Poisson equation can therefore be understood as dividing function f up into spikes $f(\underline{\vec r})\,d^3\underline{\vec r}$; for each of these spikes the contribution to $\varphi$ is given by the corresponding Green's function.
It also follows that applying the Laplacian on the Green’s function produces
the three-dimensional delta function,
$$\nabla^2 G(\vec r) = \delta^3(\vec r) \qquad G(\vec r) = -\frac{1}{4\pi|\vec r|} \qquad (9.46)$$

with $|\vec r| = r$ in spherical coordinates. That sometimes pops up in quantum
mechanics, in particular in perturbation theory. You might object that the
Green’s function is infinite at ~r = 0, so that its Laplacian is undefined there,
rather than a delta function spike. And you would be perfectly right; just
saying that the Laplacian of the Green’s function is the delta function is not
really justified. However, if you slightly round the Green’s function near ~r = 0,
say like ϕ was rounded in figure 9.13, its Laplacian does exist everywhere. The
Laplacian of this rounded Green’s function is a spike confined to the region of
rounding, and it integrates to one. (You can see the latter from applying the
divergence theorem on a sphere enclosing the region of rounding.) If you then
contract the region of rounding to zero, this spike becomes a delta function in
9.5. EXAMPLE STATIC ELECTROMAGNETIC FIELDS 437

the limit of no rounding. Understood in this way, the Laplacian of the Green’s
function is indeed a delta function.
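As a supporting computation (added here for clarity): the gradient of the Green's function is $\nabla G = \vec r/4\pi|\vec r|^3$ by (9.36), so its flux through a sphere of any radius R around the origin is
$$\oint\nabla G\cdot\hat n\,dS = 4\pi R^2\,\frac{1}{4\pi R^2} = 1$$
which is exactly the unit integral that the divergence theorem attributes to the spike.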
The multipole expansion for a charge distribution can also be converted to
purely mathematical terms:
$$\varphi = -\frac{1}{4\pi r}\int_{\mbox{all }\underline{\vec r}}f(\underline{\vec r})\,d^3\underline{\vec r} - \frac{\vec r}{4\pi r^3}\cdot\int_{\mbox{all }\underline{\vec r}}\underline{\vec r}\,f(\underline{\vec r})\,d^3\underline{\vec r} + \ldots \qquad (9.47)$$
(Of course, delta functions are infinite objects, and you might wonder at the
mathematical rigor of the various arguments above. However, there are solid
arguments based on “Green’s second integral identity” that avoid the infinities
and produce the same final results.)

9.5.5 Currents
Streams of moving electric charges are called currents. The current strength I
through an electric wire is defined as the amount of charge flowing through a
cross section per unit time. It equals the amount of charge q ′ per unit length
times its velocity v:
$$I \equiv q'v \qquad (9.48)$$
The current density $\vec\jmath$ is defined as the current per unit cross-sectional area, and equals the charge density times the charge velocity. Integrating the current density over
the cross section of a wire gives its current.

Figure 9.19: Magnetic field lines around an infinite straight electric wire.

As shown in figure 9.19, electric wires are encircled by magnetic field lines.
The strength of this magnetic field may be computed from Maxwell’s fourth
equation. To do so, take an arbitrary field line circle. The field strength is
constant on the line by symmetry. So the integral of the field strength along
the line is just 2πrB; the perimeter of the field line times its magnetic strength.
Now the Stokes’ theorem of calculus says that this integral is equal to the curl
of the magnetic field integrated over the interior of the field line circle. And
Maxwell’s fourth equation says that that is 1/ǫ0 c2 times the current density
integrated over the circle. And the current density integrated over the circle is
just the current through the wire. Put it all together to get
$$\mbox{magnetic field of an infinite straight wire:}\qquad B = \frac{I}{2\pi\epsilon_0 c^2 r} \qquad (9.49)$$
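For scale (a standard textbook estimate, not from this text): a current of 1 A at a distance of 1 cm gives
$$B = \frac{I}{2\pi\epsilon_0 c^2 r} = \frac{1}{2\pi(8.85\times10^{-12})(3\times10^8)^2(0.01)} \approx 2\times10^{-5}\ \mbox{T}$$
roughly half the strength of the earth's magnetic field.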

Figure 9.20: An electromagnet consisting of a single wire loop. The generated magnetic field lines are in blue.

An infinite straight wire is of course not a practical way to create a magnetic
field. In a typical electromagnet, the wire is spooled around an iron bar. Figure
9.20 shows the field produced by a single wire loop, in vacuum. To find the
fields produced by curved wires, use the so-called “Biot-Savart law” listed in
table 9.2 and derived in {A.90}. You need it when you end up writing a book
on quantum mechanics and have to plot the field.
Of course, while figure 9.20 does not show it, you will also need a lead from
your battery to the electromagnet and a second lead back to the other pole of
the battery. These two leads form a two-dimensional “current dipole,” as shown
in figure 9.21, and they produce a magnetic field too. However, the currents in
the two leads are opposite; one coming from the battery and the other returning to it, so the magnetic fields that they create are opposite. Therefore, if you strand
the wires very closely together, their magnetic fields will cancel each other, and
not mess up that of your electromagnet.

Figure 9.21: A current dipole.

It may be noted that if you bring the wires close together, whatever is left
of the field has circular field lines that touch at the dipole. In other words,
a horizontal ideal current dipole produces the same field as a two-dimensional
vertical ideal charge dipole. Similarly, the horizontal wire loop, if small enough,
produces the same field lines as a three-dimensional vertical ideal charge dipole.
(However, the delta functions are different, {A.90}.)

9.5.6 Principle of the electric motor


The previous section discussed how Maxwell’s third equation allows electric
power generation using mechanical means. The converse is also possible; electric
power allows mechanical power to be generated; that is the principle of the
electric motor.
It is possible because of the Lorentz force law, which says that a charge q moving with velocity $\vec v$ in a magnetic field $\vec B$ experiences a force pushing it sideways equal to
$$\vec F = q\vec v\times\vec B$$

Consider the wire loop in an external magnetic field sketched in figure 9.22. The
sideways forces on the current carriers in the wire produce a net moment $\vec M$ on the wire loop that allows it to perform useful work.
To be more precise, the forces caused by the component of the magnetic
field normal to the wire loop are radial and produce no net force nor moment.
However, the forces caused by the component of the magnetic field parallel to
the loop produce forces normal to the plane of the loop that do generate a net
moment. Using spherical coordinates aligned with the wire loop as in figure
Figure 9.22: Electric motor using a single wire loop. The Lorentz forces (black
vectors) exerted by the external magnetic field on the electric current carriers
in the wire produce a net moment M on the loop. The self-induced magnetic
field of the wire and the corresponding radial forces are not shown.


Figure 9.23: Variables for the computation of the moment on a wire loop in a
magnetic field.
9.23, the component of the magnetic field parallel to the loop equals $B_{ext}\sin\theta$. It causes a sideways force on each element $r\,d\phi$ of the wire equal to
$$dF = \underbrace{q'\,r d\phi}_{dq}\;\underbrace{v B_{ext}\sin\theta}_{|\vec v\times\vec B_{parallel}|}\,\sin\phi$$

where $q'$ is the net charge of current carriers per unit length and v their velocity. The corresponding net force integrates to zero. However the moment does not; integrating
$$dM = \underbrace{r\sin\phi}_{\mbox{arm}}\;\underbrace{q'\,r d\phi\,v B_{ext}\sin\theta\sin\phi}_{\mbox{force}}$$
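(The only $\phi$-dependence left is $\sin^2\phi$, and $\int_0^{2\pi}\sin^2\phi\,d\phi = \pi$; this added step is where the factor $\pi$ below comes from.)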

produces
$$M = \pi r^2 q' v B_{ext}\sin\theta$$
If the work M dθ done by this moment is formulated as a change in energy of
the loop in the magnetic field, that energy is

$$E_{ext} = -\pi r^2 q' v B_{ext}\cos\theta$$

The magnetic dipole moment $\vec\mu$ is defined as the factor that only depends on the wire loop, independent of the magnetic field. In particular $\mu = \pi r^2 q' v$ and it is taken to be in the axial direction. So the moment and energy can be written more concisely as
$$\vec M = \vec\mu\times\vec B_{ext} \qquad E_{ext} = -\vec\mu\cdot\vec B_{ext}$$

Yes, ~µ also governs how the magnetic field looks at large distances; feel free to
approximate the Biot-Savart integral for large distances to check.
A book on electromagnetics would typically identify q ′ v with the current
through the wire I and πr2 with the area of the loop, so that the magnetic
dipole moment is just IA. This is then valid for a flat wire loop of any shape,
not just a circular one.
But this is a book on quantum mechanics, and for electrons in orbits about
nuclei, currents and areas are not very useful. In quantum mechanics the more
meaningful quantity is angular momentum. So identify 2πrq ′ as the total electric
charge going around in the wire loop, and multiply that with the ratio mc /qc
of mass of the current carrier to its charge to get the total mass going around.
Then multiply with rv to get the angular momentum L. In those terms, the
magnetic dipole moment is
$$\vec\mu = \frac{q_c}{2m_c}\vec L \qquad (9.50)$$
Usually the current carrier is an electron, so qc = −e and mc = me .
These results apply to any arbitrary current distribution, not just a circular
wire loop. Formulae are in table 9.2 and general derivations in note {A.90}.
9.6 Particles in Magnetic Fields


Maxwell's equations are fun, but back to real quantum mechanics. The serious question in this section is how a magnetic field $\vec B$ affects a quantum system, like say an electron in a hydrogen atom.
Well, if the Hamiltonian (9.20) for a charged particle is written out and
cleaned up, {A.91}, it is seen that a constant magnetic field adds two terms.
The most important of the two is

$$H_{BL} = -\frac{q}{2m}\vec B\cdot\hat{\vec L} \qquad (9.51)$$

where q is the charge of the particle, m its mass, $\vec B$ the external magnetic field, assumed to be constant on the scale of the atom, and $\hat{\vec L}$ is the orbital angular momentum of the particle.
In terms of classical physics, this can be understood as follows: a particle with angular momentum $\vec L$ can be pictured to be circling around the axis through $\vec L$. Now according to Maxwell's equations, a charged particle going
around in a circle acts as a little electromagnet. Think of a version of figure
9.12 using a circular path. And a little magnet wants to align itself with an
ambient magnetic field, just like a magnetic compass needle aligns itself with
the magnetic field of earth.
In electromagnetics, the effective magnetic strength of a circling charged particle is described by the so-called orbital “magnetic dipole moment” $\vec\mu_L$, defined as
$$\vec\mu_L \equiv \frac{q}{2m}\vec L. \qquad (9.52)$$
In terms of this magnetic dipole moment, the energy is
$$H_{BL} = -\vec\mu_L\cdot\vec B. \qquad (9.53)$$

which is the lowest when the magnetic dipole moment is in the same direction
as the magnetic field.
The scalar part of the magnetic dipole moment, to wit,
$$\gamma_L = \frac{q}{2m} \qquad (9.54)$$
is called the “gyromagnetic ratio.” But since in quantum mechanics the orbital
angular momentum comes in chunks of size h̄, and the particle is usually an
electron with charge q = −e, much of the time you will find instead the “Bohr
magneton”
$$\mu_B = \frac{e\hbar}{2m_e} \approx 9.274\times10^{-24}\ \mbox{J/T} \qquad (9.55)$$
used. Here T stands for Tesla, the kg/C-s unit of magnetic field strength.
Please, all of this is serious; this is not a story made up by this book to put
physicists in a bad light. Note that the original formula had four variables in it: $q$, $m$, $\vec B$, and $\hat{\vec L}$, and the three new names they want you to remember are less than that.
The big question now is: since electrons have spin, built-in angular momentum, do they still act like little magnets even if not going around in a circle? The answer is yes; there is an additional term in the Hamiltonian due to spin. Astonishingly, the energy involved pops out of Dirac's relativistic description of the electron, {A.92}. The energy that an electron picks up in a magnetic field due to its inherent spin is:
$$H_{BS} = -g_e\frac{q}{2m_e}\vec B\cdot\hat{\vec S} \qquad g_e \approx 2 \qquad q = -e \qquad (9.56)$$

(This section uses again S rather than L to indicate spin angular momentum.)
The constant g is called the “g-factor”. Since its value is 2, electron spin produces twice the magnetic dipole strength that the same amount of orbital angular momentum would. That is called the “magnetic spin anomaly,” [20, p. 222].
It should be noted that really the g-factor of an electron is about 0.1% larger
than 2 because of interaction with the quantized electromagnetic field ignored
in the Dirac equation. This quantized electromagnetic field, whose particle is
the photon, has a ground state energy that is nonzero even in vacuum, much
like a harmonic oscillator has a nonzero ground state energy. You can think
of it qualitatively as virtual photons popping up and disappearing continuously
according to the energy-time uncertainty ∆E∆t ≈ h̄, allowing particles with
energy ∆E to appear as long as they don’t stay around longer than a very
brief time $\Delta t$. “Quantum electrodynamics” says that to a better approximation $g \approx 2 + \alpha/\pi$ where $\alpha = e^2/4\pi\epsilon_0\hbar c \approx 1/137$ is called the fine structure constant.
This correction to g, due to the possible interaction of the electron with a vir-
tual photon, [6], is called the “anomalous magnetic moment,” [10, p. 273]. (The
fact that physicists have not yet defined potential deviations from the quan-
tum electrodynamics value to be “magnetic spin anomaly anomalous magnetic
moment anomalies” is an anomaly.) The prediction of the g-factor of the elec-
tron is a test for the accuracy of quantum electrodynamics, and so this g-factor
has been measured to exquisite precision. At the time of writing, (2008), the
experimental value is 2.002319304362, to that many correct digits. Quantum
electrodynamics has managed to get things right to more than ten digits by
including more and more, increasingly complex interactions with virtual pho-
tons and virtual electron/positron pairs, [6], one of the greatest achievements
of twentieth century physics.
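As a quick sanity check on the claim that $g \approx 2 + \alpha/\pi$ already captures
most of the anomaly, a two-line Python computation (the value of α is a standard
one, simply assumed here):

import math

alpha = 1 / 137.035_999        # fine structure constant (standard value)
print(2 + alpha / math.pi)     # ~2.002323, versus the measured 2.002319304362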
You might think that the above formula for the energy of an electron in a
magnetic field should also apply to protons and neutrons, since they too are
spin 1/2 particles. However, this turns out to be untrue. Protons and neutrons are not
elementary particles, but consist of three “quarks.” Still, for both electron and
proton spin the gyromagnetic ratio can be written as
$$\gamma_S = g\frac{q}{2m} \qquad (9.57)$$
but while the g-factor of the electron is 2, the measured one for the proton is
5.59.
Do note that due to the much larger mass of the proton, its actual magnetic
dipole moment is much less than that of an electron despite its larger g-factor.
Still, under the right circumstances, like in nuclear magnetic resonance, the
magnetic dipole moment of the proton is crucial despite its relatively small size.
For the neutron, the charge is zero, but the magnetic moment is not, which
would make its g-factor infinite! The problem is that the quarks that make up
the neutron do have charge, and so the neutron can interact with a magnetic
field even though its net charge is zero. When the proton mass and charge are
arbitrarily used in the formula, the neutron's g-factor is −3.83. More generally,
nuclear magnetic moments are expressed in terms of the “nuclear magneton”

$$\mu_N = \frac{e\hbar}{2m_p} \approx 5.05078\times 10^{-27}\mbox{ J/T} \qquad (9.58)$$

that is based on proton charge and mass.


At the start of this subsection, it was noted that the Hamiltonian for a
charged particle has another term. So, how about it? It is called the “diamag-
netic contribution,” and it is given by

$$H_{BD} = \frac{q^2}{8m}\left(\vec B\times\hat{\vec r}\right)^2 \qquad (9.59)$$
Note that a system, like an atom, minimizes this contribution by staying away
from magnetic fields: it is positive and proportional to $B^2$.
The diamagnetic contribution can usually be ignored if there is net orbital or
spin angular momentum. To see why, consider the following numerical values:
$$\mu_B = \frac{e\hbar}{2m_e} \approx 5.788\times 10^{-5}\mbox{ eV/T} \qquad\qquad \frac{e^2a_0^2}{8m_e} \approx 6.1565\times 10^{-11}\mbox{ eV/T}^2$$
The first number gives the magnetic dipole energy, for a quantum of angular
momentum, per Tesla, while the second number gives the diamagnetic energy,
for a Bohr-radius spread around the magnetic axis, per square Tesla.
It follows that it takes about a million Tesla for the diamagnetic energy
to become comparable to the dipole one. Now at the time of this writing,
(2008), the world record magnet that can operate continuously is right here at
the Florida State University. It produces a field of 45 Tesla, taking in 33 MW
of electricity and 4,000 gallons of cooling water per minute. The world record
magnet that can produce even stronger brief magnetic pulses is also here, and
it produces 90 Tesla, going on 100. (Still stronger magnetic fields are possible
if you allow the magnet to blow itself to smithereens during the fraction of a
second that it operates, but that is so messy.) Obviously, these numbers are
way below a million Tesla. Also note that since atom energies are in electron
volts or more, none of these fields are going to blow an atom apart.
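The "about a million Tesla" estimate follows directly from the two numbers
above. A short Python check using those same values:

mu_B_dipole = 5.788e-5     # dipole energy per quantum of momentum, eV/T
diamagnetic = 6.1565e-11   # diamagnetic energy, Bohr-radius spread, eV/T^2

print(mu_B_dipole / diamagnetic)   # ~9.4e5 T: the crossover field strength
print(mu_B_dipole * 45)            # dipole energy at the 45 T record, ~2.6e-3 eV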

9.7 Stern-Gerlach Apparatus [Descriptive]


A constant magnetic field will exert a torque, but no net force on a magnetic
dipole like an electron; if you think of the dipole as a magnetic north pole and
south pole close together, the magnetic forces on north pole and south pole will
be opposite and produce no net force on the dipole. However, if the magnetic
field strength varies with location, the two forces will be different and a net
force will result.
The Stern-Gerlach apparatus exploits this process by sending a beam of
atoms through a magnetic field with spatial variation, causing the atoms to
deflect upwards or downwards depending on their magnetic dipole strength.
The magnetic dipole strengths of the atoms will be proportional to the relevant
electron angular momenta, (the nucleus can be ignored because of the large
mass in its gyromagnetic ratio), and that will be quantized. So the incoming
beam will split into distinct beams corresponding to the quantized values of the
electron angular momentum.
The experiment was a great step forward in the development of quantum me-
chanics, because there is really no way that classical mechanics can explain the
splitting into separate beams; classical mechanics just has to predict a smeared-
out beam. Angular momentum in classical mechanics can have any value, not
just the values mh̄ of quantum mechanics. Moreover, by capturing one of the
split beams, you have a source of particles all in the same state without uncer-
tainty, to use for other experiments or practical applications such as masers.
Stern and Gerlach used a beam of silver atoms in their experiment, and the
separated beams deposited this silver on a plate. Initially, Gerlach had difficulty
seeing any deposited silver on those plates because the layer was extremely thin.
But fortunately for quantum mechanics, Stern was puffing his usual cheap cigars
when he had a look, and the large amount of sulphur in the smoke was enough
to turn some of the silver into jet-black silver sulfide, making it show clearly.
An irony is that Stern and Gerlach assumed that they had verified
Bohr's orbital momentum. But actually, they had discovered spin. The net
magnetic moment of silver’s inner electrons is zero, and the lone valence electron
is in a 5s orbit with zero orbital angular momentum. It was the spin of the
valence electron that caused the splitting. While spin has half the strength of
orbital angular momentum, its magnetic moment is about the same due to its
g-factor being two rather than one.
To use the Stern-Gerlach procedure with charged particles such as lone
electrons, a transverse electric field must be provided to counteract the large
Lorentz force that the magnet imparts on the moving electrons.

9.8 Nuclear Magnetic Resonance


Nuclear magnetic resonance, or NMR, is a valuable tool for examining nuclei, for
probing the structure of molecules, in particular organic ones, and for medical
diagnosis, as MRI. This section will give a basic quantum description of the
idea. Linear algebra will be used.

9.8.1 Description of the method


First demonstrated independently by Bloch and Purcell in 1946, NMR probes
nuclei with net spin, in particular hydrogen nuclei or other nuclei with spin 1/2.
Various common nuclei, like carbon and oxygen, do not have net spin; this can
be a blessing since they cannot mess up the signals from the hydrogen nuclei, or
a limitation, depending on how you want to look at it. In any case, if necessary
isotopes such as carbon 13 can be used which do have net spin.
It is not actually the spin, but the associated magnetic dipole moment of the
nucleus that is relevant, for that allows the nuclei to be manipulated by magnetic
fields. First the sample is placed in an extremely strong steady magnetic field.
Typical field strengths are of the order of Teslas. (A Tesla is about 20,000 times the strength
of the magnetic field of the earth.) In the field, the nucleus has two possible
energy states; a ground state in which the spin component in the direction of
the magnetic field is aligned with it, and an elevated energy state in which the
spin is opposite {A.93}. (Despite the large field strength, the energy difference
between the two states is extremely small compared to the thermal kinetic
energy at room temperature. The number of nuclei in the ground state may
only exceed those in the elevated energy state by say one in 100,000, but that
is still a large absolute number of nuclei in a sample.)
Now perturb the nuclei with a second, much smaller, radio-frequency
magnetic field. If the radio frequency is just right, the excess ground state
nuclei can be lifted out of the lowest energy state, absorbing energy that can
be observed. The “resonance” frequency at which this happens then gives in-
formation about the nuclei. In order to observe the resonance frequency very
accurately, the perturbing rf field must be very weak compared to the primary
steady magnetic field.
In Continuous Wave NMR, the perturbing frequency is varied and the ab-
sorption examined to find the resonance. (Alternatively, the strength of the
primary magnetic field can be varied, that works out to the same thing using
the appropriate formula.)
In Fourier Transform NMR, the perturbation is applied in a brief pulse just
long enough to fully lift the excess nuclei out of the ground state. Then the
decay back towards the original state is observed. An experienced operator can
then learn a great deal about the environment of the nuclei. For example, a
nucleus in a molecule will be shielded a bit from the primary magnetic field by
the rest of the molecule, and that leads to an observable frequency shift. The
amount of the shift gives a clue about the molecular structure at the nucleus,
so information about the molecule. Additionally, neighboring nuclei can cause
resonance frequencies to split into several through their magnetic fields. For
example, a single neighboring perturbing nucleus will cause a resonance fre-
quency to split into two, one for spin up of the neighboring nucleus and one
for spin down. It is another clue about the molecular structure. The time for
the decay back to the original state to occur is another important clue about
the local conditions the nuclei are in, especially in MRI. The details are beyond
this author’s knowledge; the purpose here is only to look at the basic quantum
mechanics behind NMR.

9.8.2 The Hamiltonian


The magnetic fields will be assumed to be of the form
$$\vec B = B_0\hat k + B_1\left(\hat\imath\cos\omega t - \hat\jmath\sin\omega t\right) \qquad (9.60)$$
where B0 is the Tesla-strength primary magnetic field, B1 the very weak per-
turbing field strength, and ω is the frequency of the perturbation.
The component of the magnetic field in the xy-plane, B1 , rotates around
the z-axis at angular velocity ω. Such a rotating magnetic field can be achieved
using a pair of properly phased coils placed along the x and y axes. (In Fourier
Transform NMR, a single perturbation pulse actually contains a range of differ-
ent frequencies ω, and Fourier transforms are used to take them apart.) Since
the apparatus and the wavelength of a radio frequency field are very large on
the scale of a nucleus, spatial variations in the magnetic field can be ignored.
Now suppose you place a spin 1/2 nucleus in the center of this magnetic field.
As discussed in section 9.6, a particle with spin will act as a little compass
needle, and its energy will be lowest if it is aligned with the direction of the
ambient magnetic field. In particular, the energy is given by
$$H = -\vec\mu\cdot\vec B$$
where $\vec\mu$ is called the magnetic dipole strength of the nucleus. This dipole
strength is proportional to its spin angular momentum $\hat{\vec S}$:
$$\vec\mu = \gamma\hat{\vec S}$$
where the constant of proportionality γ is called the gyromagnetic ratio. The
numerical value of the gyromagnetic ratio can be found as
$$\gamma = \frac{gq}{2m}$$
In case of a hydrogen nucleus, a proton, the mass mp and charge qp = e can be
found in the notations section, and the proton’s experimentally found g-factor
is gp = 5.59.
The bottom line is that you can write the Hamiltonian of the interaction of
the nucleus with the magnetic field in terms of a numerical gyromagnetic ratio
value, spin, and the magnetic field:
$$H = -\gamma\,\hat{\vec S}\cdot\vec B \qquad (9.61)$$
Now turning to the wave function of the nucleus, it can be written as a
combination of the spin-up and spin-down states,
$$\Psi = a\uparrow + b\downarrow,$$
where $\uparrow$ has spin $\hbar/2$ in the z-direction, along the primary magnetic field, and $\downarrow$
has $-\hbar/2$. Normally, a and b would describe the spatial variations, but spatial
variations are not relevant to the analysis, and a and b can be considered to be
simple numbers.
You can use the concise notations of linear algebra by combining a and b in
a two-component column vector (more precisely, a spinor),
$$\Psi = \begin{pmatrix} a\\ b \end{pmatrix}$$
In those terms, the spin operators become matrices, the so-called Pauli spin
matrices of section 9.1.7,
$$\hat S_x = \frac{\hbar}{2}\begin{pmatrix}0&1\\1&0\end{pmatrix} \qquad \hat S_y = \frac{\hbar}{2}\begin{pmatrix}0&-i\\i&0\end{pmatrix} \qquad \hat S_z = \frac{\hbar}{2}\begin{pmatrix}1&0\\0&-1\end{pmatrix} \qquad (9.62)$$
Substitution of these expressions for the spin, and (9.60) for the magnetic
field into (9.61) gives after cleaning up the final Hamiltonian:
$$H = -\frac{\hbar}{2}\begin{pmatrix}\omega_0 & \omega_1 e^{i\omega t}\\ \omega_1 e^{-i\omega t} & -\omega_0\end{pmatrix} \qquad \omega_0 = \gamma B_0 \qquad \omega_1 = \gamma B_1 \qquad (9.63)$$
The constants ω0 and ω1 have the dimensions of a frequency; ω0 is called the
“Larmor frequency.” As far as ω1 is concerned, the important thing to remember
is that it is much smaller than the Larmor frequency ω0 because the perturbation
magnetic field is small compared to the primary one.
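If you want to double-check the cleanup that produced (9.63), the following
Python sketch builds $-\gamma\hat{\vec S}\cdot\vec B$ directly from the spin matrices (9.62) and the
field (9.60), and compares with (9.63); h̄ is set to one and the parameter values
are arbitrary illustrations:

import numpy as np

hbar = 1.0
Sx = hbar/2 * np.array([[0, 1], [1, 0]], dtype=complex)
Sy = hbar/2 * np.array([[0, -1j], [1j, 0]])
Sz = hbar/2 * np.array([[1, 0], [0, -1]], dtype=complex)

omega0, omega1, omega, t = 1.0, 0.01, 0.9, 0.7   # gamma*B0, gamma*B1, etc.

# H = -gamma S.B with B from (9.60); gamma*B0, gamma*B1 absorbed into omegas
H_direct = -(omega1*(np.cos(omega*t)*Sx - np.sin(omega*t)*Sy) + omega0*Sz)

H_963 = -hbar/2 * np.array([[omega0, omega1*np.exp(1j*omega*t)],
                            [omega1*np.exp(-1j*omega*t), -omega0]])
print(np.allclose(H_direct, H_963))   # True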
9.8.3 The unperturbed system


Before looking at the perturbed case, it helps to first look at the unperturbed
solution. If there is just the primary magnetic field affecting the nucleus, with
no radio-frequency perturbation ω1 , the Hamiltonian derived in the previous
subsection simplifies to
$$H = -\frac{\hbar}{2}\begin{pmatrix}\omega_0 & 0\\ 0 & -\omega_0\end{pmatrix}$$
The energy eigenstates are the spin-up state, with energy $-\frac12\hbar\omega_0$, and the spin-down state, with energy $\frac12\hbar\omega_0$.
The difference in energy is in relativistic terms exactly the energy of a photon
with the Larmor frequency ω0. While the treatment of the electromagnetic field
in this discussion will be classical, rather than relativistic, it seems clear that
the Larmor frequency must play more than a superficial role.
The unsteady Schrödinger equation tells you that the wave function evolves
in time like ih̄Ψ̇ = HΨ, so if Ψ = a↑ + b↓,
$$i\hbar\begin{pmatrix}\dot a\\ \dot b\end{pmatrix} = -\frac{\hbar}{2}\begin{pmatrix}\omega_0 & 0\\ 0 & -\omega_0\end{pmatrix}\begin{pmatrix}a\\ b\end{pmatrix}$$
The solution for the coefficients a and b of the spin-up and -down states is:
$$a = a_0 e^{i\omega_0 t/2} \qquad b = b_0 e^{-i\omega_0 t/2}$$
if a0 and b0 are the values of these coefficients at time zero.
Since $|a|^2 = |a_0|^2$ and $|b|^2 = |b_0|^2$ at all times, the probabilities of measuring
spin-up or spin-down do not change with time. This was to be expected, since
spin-up and spin-down are energy states for the steady system. To get more
interesting physics, you really need the unsteady perturbation.
But first, to understand the quantum processes better in terms of the ideas of
nonquantum physics, it will be helpful to write the unsteady quantum evolution
in terms of the expectation values of the angular momentum components. The
expectation value of the z-component of angular momentum is
$$\langle S_z\rangle = \frac{\hbar}{2}|a|^2 - \frac{\hbar}{2}|b|^2$$
To more clearly indicate that the value must be in between −h̄/2 and h̄/2,
you can write the magnitude of the coefficients in terms of an angle α, the
“precession angle”,
$$|a| = |a_0| \equiv \cos(\alpha/2) \qquad |b| = |b_0| \equiv \sin(\alpha/2)$$
In terms of the so-defined α, you simply have, using the half-angle trig formulae,

$$\langle S_z\rangle = \frac{\hbar}{2}\cos\alpha$$
The expectation values of the angular momenta in the x- and y-directions
can be found as the inner products $\langle\Psi|\hat S_x\Psi\rangle$ and $\langle\Psi|\hat S_y\Psi\rangle$, chapter 3.3.3. Substituting the representation in terms of spinors and Pauli spin matrices, and
cleaning up using the Euler formula (1.5), you get
$$\langle S_x\rangle = \frac{\hbar}{2}\sin\alpha\cos(\omega_0 t+\phi) \qquad \langle S_y\rangle = -\frac{\hbar}{2}\sin\alpha\sin(\omega_0 t+\phi)$$
where φ is some constant phase angle that is further unimportant.
The first thing that can be seen from these results is that the length of the
expectation angular momentum vector is $\hbar/2$. Next, its component along the
z-axis, the direction of the primary magnetic field, is at all times $\frac12\hbar\cos\alpha$. That
implies that the expectation angular momentum vector is under a constant angle
α with the primary magnetic field.

Figure 9.24: Larmor precession of the expectation spin (or magnetic moment) vector around the magnetic field.

The component in the x,y-plane is $\frac12\hbar\sin\alpha$, and this component rotates
around the z-axis, as shown in figure 9.24, causing the end point of the ex-
pectation angular momentum vector to sweep out a circular path around the
magnetic field $\vec B$. This rotation around the z-axis is called "Larmor precession."
Since the magnetic dipole moment is proportional to the spin, it traces out the
same conical path.
Caution should be used against attaching too much importance to this clas-
sical picture of a precessing magnet. The expectation angular momentum vector
is not a physically measurable quantity. One glaring inconsistency in the expec-
tation angular momentum vector versus the true angular momentum is that the
square magnitude of the expectation angular momentum vector is $\hbar^2/4$, three
times smaller than the true square magnitude of angular momentum.
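The precession formulas above are easy to verify numerically. A minimal
Python sketch, with h̄ set to one and an arbitrarily chosen precession angle;
taking a0 and b0 real makes the phase angle φ come out zero:

import numpy as np

hbar, omega0, alpha = 1.0, 1.0, 0.6        # illustrative values
a0, b0 = np.cos(alpha/2), np.sin(alpha/2)  # real coefficients => phi = 0

Sx = hbar/2 * np.array([[0, 1], [1, 0]], dtype=complex)
Sy = hbar/2 * np.array([[0, -1j], [1j, 0]])
Sz = hbar/2 * np.array([[1, 0], [0, -1]], dtype=complex)

for t in np.linspace(0.0, 6.0, 4):
    psi = np.array([a0*np.exp(1j*omega0*t/2), b0*np.exp(-1j*omega0*t/2)])
    S = [np.real(psi.conj() @ M @ psi) for M in (Sx, Sy, Sz)]
    # <Sz> stays at (hbar/2)cos(alpha); <Sx>, <Sy> rotate at omega0;
    # the length of the expectation vector stays hbar/2
    print(S, np.linalg.norm(S))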

9.8.4 Effect of the perturbation


In the presence of the perturbing magnetic field, the unsteady Schrödinger equa-
tion ih̄Ψ̇ = HΨ becomes
$$i\hbar\begin{pmatrix}\dot a\\ \dot b\end{pmatrix} = -\frac{\hbar}{2}\begin{pmatrix}\omega_0 & \omega_1 e^{i\omega t}\\ \omega_1 e^{-i\omega t} & -\omega_0\end{pmatrix}\begin{pmatrix}a\\ b\end{pmatrix} \qquad (9.64)$$

where ω0 is the Larmor frequency, ω is the frequency of the perturbation, and
ω1 is a measure of the strength of the perturbation and small compared to ω0.
The above equations can be solved exactly using standard linear algebra
procedures, though the algebra is fairly stifling {A.94}. The analysis brings
in an additional quantity that will be called the “resonance factor”
$$f = \sqrt{\frac{\omega_1^2}{(\omega-\omega_0)^2+\omega_1^2}} \qquad (9.65)$$

Note that f has its maximum value, one, at “resonance,” i.e. when the pertur-
bation frequency ω equals the Larmor frequency ω0 .
The analysis finds the coefficients of the spin-up and spin-down states to be:
" Ã µ ¶ µ ¶! µ ¶#
ω1 t ω − ω0 ω1 t ω1 t
a = a0 cos − if sin + b0 if sin eiωt/2 (9.66)
2f ω1 2f 2f
" Ã µ ¶ µ ¶! µ ¶#
ω1 t ω − ω0 ω1 t ω1 t
b = b0 cos + if sin + a0 if sin e−iωt/2 (9.67)
2f ω1 2f 2f
where a0 and b0 are the initial coefficients of the spin-up and spin-down states.
This solution looks pretty forbidding, but it is not that bad in application.
The primary interest is in nuclei that start out in the spin-up ground state, so
you can set |a0 | = 1 and b0 = 0. Also, the primary interest is in the probability
that the nuclei may be found at the elevated energy level, which is
$$|b|^2 = f^2\sin^2\left(\frac{\omega_1 t}{2f}\right) \qquad (9.68)$$
That is a pretty simple result. When you start out, the nuclei you look at are in
the ground state, so $|b|^2$ is zero, but with time the rf perturbation field increases
the probability of finding the nuclei in the elevated energy state eventually to a
maximum of $f^2$ when the sine becomes one.
Continuing the perturbation beyond that time is bad news; it decreases the
probability of elevated states again. As figure 9.25 shows, over extended times,
Figure 9.25: Probability of being able to find the nuclei at elevated energy versus time for a given perturbation frequency ω.

there is a flip-flop between the nuclei being with certainty in the ground state,
and having a probability of being in the elevated state. The frequency at which
the probability oscillates is called the “Rabi flopping frequency”. The author’s
sources differ about the precise definition of this frequency, but the one that
seems to be most logical is ω1 /f .
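It is straightforward to verify (9.68), and with it the flopping behavior, by
integrating (9.64) numerically. A minimal Python sketch with illustrative
parameter values, assuming scipy is available:

import numpy as np
from scipy.integrate import solve_ivp

omega0, omega1, omega = 1.0, 0.02, 1.01   # omega1 << omega0, off resonance
f = np.sqrt(omega1**2 / ((omega - omega0)**2 + omega1**2))

def rhs(t, y):
    # i (da/dt, db/dt) = H (a, b) with H from (9.64), hbar = 1
    a, b = y[0] + 1j*y[1], y[2] + 1j*y[3]
    da = 0.5j * (omega0*a + omega1*np.exp(1j*omega*t)*b)
    db = 0.5j * (omega1*np.exp(-1j*omega*t)*a - omega0*b)
    return [da.real, da.imag, db.real, db.imag]

t_flip = np.pi * f / omega1   # first time the sine in (9.68) reaches one
sol = solve_ivp(rhs, (0.0, t_flip), [1.0, 0.0, 0.0, 0.0],
                rtol=1e-10, atol=1e-12)
b2 = sol.y[2, -1]**2 + sol.y[3, -1]**2
print(b2, f**2)   # the numerical |b|^2 agrees with the maximum f^2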

Figure 9.26: Maximum probability of finding the nuclei at elevated energy.

Anyway, by keeping up the perturbation for the right time you can raise the
probability of elevated energy to a maximum of $f^2$. A plot of $f^2$ against the
perturbing frequency ω is called the "resonance curve," shown in figure 9.26.
For the perturbation to have maximum effect, its frequency ω must equal the
nuclei’s Larmor frequency ω0 . Also, for this frequency to be very accurately
observable, the “spike” in figure 9.26 must be narrow, and since its width is
proportional to ω1 = γB1 , that means the perturbing magnetic field must be
very weak compared to the primary magnetic field.
There are two qualitative ways to understand the need for the frequency
of the perturbation to equal the Larmor frequency. One is geometrical and
classical: as noted in the previous subsection, the expectation magnetic moment
precesses around the primary magnetic field with the Larmor frequency. In
order for the small perturbation field to exert a long-term downward “torque”
on this precessing magnetic moment as in figure 9.27, it must rotate along with
it. If it rotates at any other frequency, the torque will quickly reverse direction
compared to the magnetic moment, and the vector will start going up again.
The other way to look at it is from a relativistic quantum perspective: if the
Figure 9.27: A perturbing magnetic field, rotating at precisely the Larmor frequency, causes the expectation spin vector to come cascading down out of the ground state.

magnetic field frequency equals the Larmor frequency, its photons have exactly
the energy required to lift the nuclei from the ground state to the excited state.
At the Larmor frequency, it would naively seem that the optimum time
to maintain the perturbation is until the expectation spin vector is vertically
down; then the nucleus is in the excited energy state with certainty. If you
then allow nature the time to probe its state, every nucleus will be found to
be in the excited state, and will emit a photon. (If not messed up by some
collision or whatever, little in life is ideal, is it?) However, according to actual
descriptions of NMR devices, it is better to stop the perturbation earlier, when
the expectation spin vector has become horizontal, rather than fully down. In
that case, nature will only find half the nuclei in the excited energy state after
the perturbation, presumably decreasing the radiation yield by a factor 2. The
classical explanation that is given is that when the (expectation) spin vector
is precessing at the Larmor frequency in the horizontal plane, the radiation is
most easily detected by the coils located in that same plane. And that closes
this discussion.
Chapter 10

Some Additional Topics

Below are some additional topics that are intended to make the coverage of this
book fairly complete. They may not be that important to most engineers, (but
that depends on what you work on), or they may require a much more advanced
description than can be given in a single book.

10.1 Perturbation Theory


Most of the time in quantum mechanics, exact solution of the Hamiltonian
eigenvalue problem of interest is not possible. To deal with that, approximations
are made.
Perturbation theory can be used when the Hamiltonian H consists of two
parts H0 and H1 , where the problem for H0 can be solved and where H1 is
small. The idea is then to adjust the found solutions for the “unperturbed
Hamiltonian” H0 so that they become approximately correct for H0 + H1 .

10.1.1 Basic perturbation theory


To use perturbation theory, the eigenfunctions and eigenvalues of the unper-
turbed Hamiltonian H0 must be known. These eigenfunctions will here be
indicated as $\psi_{\vec n,0}$ and the corresponding eigenvalues by $E_{\vec n,0}$. Note the use of
the generic $\vec n$ to indicate the quantum numbers of the eigenfunctions. If the
basic system is a hydrogen atom, as is often the case in textbook examples,
and spin is unimportant, $\vec n$ would likely stand for the set of quantum numbers
n, l, and m. But for a three-dimensional harmonic oscillator, $\vec n$ might stand for
the quantum numbers nx , ny , and nz . In a three-dimensional problem with one
spinless particle, it takes three quantum numbers to describe an energy eigen-
function. However, which three depends on the problem and your approach to
it. The additional subscript 0 in $\psi_{\vec n,0}$ and $E_{\vec n,0}$ indicates that they ignore the
perturbation Hamiltonian $H_1$. They are called the unperturbed wave functions
and energies.
The key to perturbation theory is the set of "Hamiltonian perturbation
coefficients," defined as
$$H_{\vec n\underline{\vec n},1} \equiv \langle\psi_{\vec n,0}|H_1\psi_{\underline{\vec n},0}\rangle \qquad (10.1)$$

If you can evaluate these for every pair of energy eigenfunctions, you should
be OK. Note that evaluating inner products is just summation or integra-
tion; it is generally a lot simpler than trying to solve the eigenvalue problem
(H0 + H1 ) ψ = Eψ.
In the application of perturbation theory, the idea is to pick one unperturbed
eigenfunction $\psi_{\vec n,0}$ of $H_0$ of interest and then correct it to account for $H_1$, and
especially correct its energy $E_{\vec n,0}$. Caution! If the energy $E_{\vec n,0}$ is degenerate,
i.e. there is more than one unperturbed eigenfunction $\psi_{\vec n,0}$ of $H_0$ with that
energy, you must use a "good" eigenfunction to correct the energy. How to do
that will be discussed in subsection 10.1.3.
For now just assume that the energy is not degenerate or that you picked a
good eigenfunction $\psi_{\vec n,0}$. Then a first correction to the energy $E_{\vec n,0}$ to account
for the perturbation H1 is very simple, {A.95}; just add the corresponding
Hamiltonian perturbation coefficient:

$$E_{\vec n} = E_{\vec n,0} + H_{\vec n\vec n,1} + \ldots \qquad (10.2)$$

This is a quite user-friendly result, because it only involves the selected energy
eigenfunction $\psi_{\vec n,0}$. The other energy eigenfunctions are not involved. In a
numerical solution, you might only have computed one state, say the ground
state of H0 . Then you can use this result to correct the ground state energy for
a perturbation even if you do not have data about any other energy states of
H0 .
Unfortunately, it does happen quite a lot that the above correction $H_{\vec n\vec n,1}$ is
zero because of some symmetry or the other. Or it may simply not be accurate
enough. In that case, to find the energy change you have to use what is called
“second order perturbation theory:”

X |H~n~n,1 |2
E~n = E~n,0 + H~n~n,1 − + ... (10.3)
E~n,0 6=E~n,0 E~n,0 − E~n,0

Now all eigenfunctions of H0 will be needed, which makes second order theory a
lot nastier. Then again, even if the “first order” correction H~n~n,1 to the energy is
nonzero, the second order formula will likely give a much more accurate result.
Sometimes you may also be interested in what happens to the energy eigen-
functions, not just the energy eigenvalues. The corresponding formula is

X H~n~n,1 X
ψ~n = ψ~n,0 − ψ~n,0 + c~n ψ~n,0 + . . . (10.4)
E~n,0 6=E~n,0 E~n,0 − E~n,0 E~n,0 =E~n,0
n6=~
~ n

That is the first order result. The second sum is zero if the problem is not
degenerate. Otherwise its coefficients $c_{\underline{\vec n}}$ are determined by considerations found
in note {A.95}.
In some cases, instead of using second order theory as above, it may be
simpler to compute the first order wave function perturbation and the second
order energy change from

$$(H_0 - E_{\vec n,0})\psi_{\vec n,1} = -(H_1 - E_{\vec n,1})\psi_{\vec n,0} \qquad E_{\vec n,2} = \langle\psi_{\vec n,0}|(H_1-E_{\vec n,1})\psi_{\vec n,1}\rangle \qquad (10.5)$$

Eigenfunction $\psi_{\vec n,0}$ must be good. The good news is that this does not require
all the unperturbed eigenfunctions. The bad news is that it requires solution
of a nontrivial equation involving the unperturbed Hamiltonian instead of just
integration. It may be the best way to proceed for a perturbation of a numerical
solution.
One application of perturbation theory is the “Hellmann-Feynman theo-
rem.” Here the perturbation Hamiltonian is an infinitesimal change ∂H in the
unperturbed Hamiltonian caused by an infinitesimal change in some parameter
that it depends on. If the parameter is called λ, perturbation theory says that
the first order energy change is
$$\frac{\partial E_{\vec n}}{\partial\lambda} = \left\langle \psi_{\vec n,0}\,\middle|\, \frac{\partial H}{\partial\lambda}\,\psi_{\vec n,0}\right\rangle \qquad (10.6)$$

when divided by the change in parameter ∂λ. If you can figure out the inner
product, you can figure out the change in energy. But more important is the re-
verse: if you can find the derivative of the energy with respect to the parameter,
you have the inner product. For example, the Hellmann-Feynman theorem is
helpful for finding the expectation value of $1/r^2$ for the hydrogen atom, a nasty
problem, {A.99}. Of course, always make sure the eigenfunction $\psi_{\vec n,0}$ is a good
one for the derivative of the Hamiltonian.
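Matrix eigenvalue problems make a convenient sandbox for these formulas,
since the exact answer is available for comparison. A minimal Python sketch of
(10.2) and (10.3) for a nondegenerate diagonal H0 and a made-up small
perturbation H1:

import numpy as np

H0 = np.diag([1.0, 2.0, 4.0])            # unperturbed, nondegenerate
H1 = 0.05 * np.array([[0.2, 1.0, 0.5],   # small symmetric perturbation
                      [1.0, 0.3, 1.0],
                      [0.5, 1.0, -0.2]])

n = 0                                    # correct the lowest state
first = H1[n, n]                         # the coefficient H_nn,1 in (10.2)
second = -sum(abs(H1[m, n])**2 / (H0[m, m] - H0[n, n])
              for m in range(3) if m != n)    # the sum in (10.3)

exact = np.linalg.eigvalsh(H0 + H1)[0]
# second order closes most of the remaining gap to the exact eigenvalue
print(H0[n, n] + first, H0[n, n] + first + second, exact)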

10.1.2 Ionization energy of helium


One prominent deficiency in the approximate analysis of the heavier atoms in
chapter 4.9 was the poor ionization energy that it gave for helium. The purpose
of this example is to derive a much more reasonable value using perturbation
theory.
Exactly speaking, the ionization energy is the difference between the energy
of the helium atom with both its electrons in the ground state and the helium
ion with its second electron removed. Now the energy of the helium ion with
electron 2 removed is easy; the Hamiltonian for the remaining electron 1 is

$$H_{\mbox{\scriptsize He ion}} = -\frac{\hbar^2}{2m_e}\nabla_1^2 - 2\,\frac{e^2}{4\pi\epsilon_0}\frac{1}{r_1}$$
where the first term represents the kinetic energy of the electron and the second
its attraction to the two-proton nucleus. The helium nucleus normally also
contains two neutrons, but they do not attract the electron.
This Hamiltonian is exactly the same as the one for the hydrogen atom in
chapter 3.2, except that it has 2e2 where the hydrogen one, with just one proton
in its nucleus, has e2 . So the solution for the helium ion is simple: just take the
hydrogen solution, and everywhere where there is an e2 in that solution, replace
it by 2e2 . In particular, the Bohr radius a for the helium ion is half the Bohr
radius a0 for hydrogen,
$$a = \frac{4\pi\epsilon_0\hbar^2}{m_e\,2e^2} = \tfrac12 a_0$$
and so its energy and wave function become

$$E_{\mbox{\scriptsize gs,ion}} = -\frac{\hbar^2}{2m_e a^2} = 4E_1 \qquad \psi_{\mbox{\scriptsize gs,ion}}(\vec r) = \frac{1}{\sqrt{\pi a^3}}\,e^{-r/a}$$
where E1 = −13.6 eV is the energy of the hydrogen atom.
It is interesting to see that the helium ion has four times the energy of the
hydrogen atom. The reasons for this much higher energy are both that the
nucleus is twice as strong, and that the electron is twice as close to it: the
Bohr radius is half the size. More generally, in heavy atoms the electrons that
are poorly shielded from the nucleus, which means the inner electrons, have
energies that scale with the square of the nuclear strength. For such electrons,
relativistic effects are much more important than they are for the electron in a
hydrogen atom.
The neutral helium atom is not by far as easy to analyze as the ion. Its
Hamiltonian is, from (4.32):

$$H_{\mbox{\scriptsize He}} = -\frac{\hbar^2}{2m_e}\nabla_1^2 - 2\,\frac{e^2}{4\pi\epsilon_0}\frac{1}{r_1} - \frac{\hbar^2}{2m_e}\nabla_2^2 - 2\,\frac{e^2}{4\pi\epsilon_0}\frac{1}{r_2} + \frac{e^2}{4\pi\epsilon_0}\frac{1}{|\vec r_2-\vec r_1|}$$

The first two terms are the kinetic energy and nuclear attraction of electron
1, and the next two the same for electron 2. The final term is the electron
to electron repulsion, the curse of quantum mechanics. This final term is the
reason that the ground state of helium cannot be found analytically.
Note however that the repulsion term is qualitatively similar to the nuclear
attraction terms, except that there are four of these nuclear attraction terms
versus a single repulsion term. So maybe then, it may work to treat the repulsion
term as a small perturbation, call it H1 , to the Hamiltonian H0 given by the
first four terms? Of course, if you ask mathematicians whether 25% is a small
amount, they are going to vehemently deny it; but then, so they would for any
amount if there is no limit process involved, so just don’t ask them, OK?
The solution of the eigenvalue problem H0 ψ = Eψ is simple: since the
electrons do not interact with this Hamiltonian, the ground state wave function
is the product of the ground state wave functions for the individual electrons,
and the energy is the sum of their energies. And the wave functions and energies
for the separate electrons are given by the solution for the ion above, so
$$\psi_{\mbox{\scriptsize gs},0} = \frac{1}{\pi a^3}\,e^{-(r_1+r_2)/a} \qquad E_{\mbox{\scriptsize gs},0} = 8E_1$$
πa3
According to this result, the energy of the atom is 8E1 while the ion had
4E1 , so the ionization energy would be 4|E1 |, or 54.4 eV. Since the experimental
value is 24.6 eV, this is no better than the 13.6 eV section 4.9 came up with.
To get a better ionization energy, try perturbation theory. According to first
order perturbation theory, a better value for the ground state energy of the
helium atom should be
$$E_{\mbox{\scriptsize gs}} = E_{\mbox{\scriptsize gs},0} + \langle\psi_{\mbox{\scriptsize gs},0}|H_1\psi_{\mbox{\scriptsize gs},0}\rangle$$
or substituting in from above,
$$E_{\mbox{\scriptsize gs}} = 8E_1 + \frac{e^2}{4\pi\epsilon_0}\left\langle \frac{1}{\pi a^3}e^{-(r_1+r_2)/a}\,\middle|\,\frac{1}{|\vec r_2-\vec r_1|}\,\frac{1}{\pi a^3}e^{-(r_1+r_2)/a}\right\rangle$$
The inner product of the final term can be written out as

$$\frac{e^2}{4\pi\epsilon_0}\,\frac{1}{\pi^2 a^6}\int_{\mbox{\scriptsize all }\vec r_1}\int_{\mbox{\scriptsize all }\vec r_2}\frac{e^{-2(r_1+r_2)/a}}{|\vec r_2-\vec r_1|}\,{\rm d}^3\vec r_1\,{\rm d}^3\vec r_2$$
This integral can be done analytically. Try it, if you are so inclined; integrate
${\rm d}^3\vec r_1$ first, using spherical coordinates with $\vec r_2$ as their axis and doing the
azimuthal and polar angles first. Be careful, $\sqrt{(r_1-r_2)^2} = |r_1-r_2|$, not $r_1-r_2$, so
you will have to integrate $r_1 < r_2$ and $r_1 > r_2$ separately in the final integration
over ${\rm d}r_1$. Then integrate ${\rm d}^3\vec r_2$.
The result of the integration is
$$\frac{e^2}{4\pi\epsilon_0}\left\langle \frac{1}{\pi a^3}e^{-(r_1+r_2)/a}\,\middle|\,\frac{1}{|\vec r_2-\vec r_1|}\,\frac{1}{\pi a^3}e^{-(r_1+r_2)/a}\right\rangle = \frac{e^2}{4\pi\epsilon_0}\,\frac{5}{8a} = \frac52|E_1|$$
Therefore, the helium atom energy increases by 2.5|E1| due to the electron
repulsion, and with it, the ionization energy decreases to 1.5|E1|, or 20.4 eV. It is
not 24.6 eV, but it is clearly much more reasonable than 54 or 13.6 eV were.
The second order perturbation result should give a much more accurate
result still. However, if you did the integral above, you may feel little inclination
to try the ones involving all possible products of hydrogen energy eigenfunctions.
Instead, the result can be improved using a variational approach, like the
ones that were used earlier for the hydrogen molecule and molecular ion, and
this requires almost no additional work. The idea is to accept the hint from
perturbation theory that the wave function of helium can be approximated
as ψa (~r1 )ψa (~r2 ) where ψa is the hydrogen ground state wave function using a
modified Bohr radius a instead of a0 :
$$\psi_{\mbox{\scriptsize gs}} = \psi_a(\vec r_1)\psi_a(\vec r_2) \qquad \psi_a(\vec r) \equiv \frac{1}{\sqrt{\pi a^3}}\,e^{-r/a}$$
However, instead of accepting the perturbation theory result that a should be
half the normal Bohr radius a0 , let a be optimized to make the expectation
energy for the ground state
$$E_{\mbox{\scriptsize gs}} = \langle\psi_{\mbox{\scriptsize gs}}|H_{\mbox{\scriptsize He}}\psi_{\mbox{\scriptsize gs}}\rangle$$
as small as possible. This will produce the most accurate ground state energy
possible for a ground state wave function of this form, guaranteed no worse than
assuming that $a = \frac12 a_0$, and probably better.
No new integrals need to be done to evaluate the inner product above. Instead, noting that for the hydrogen atom, according to the virial theorem of
chapter 5.1.4, the expectation kinetic energy equals $-E_1 = \hbar^2/2m_ea_0^2$ and the
potential energy equals $2E_1$, two of the needed integrals can be inferred from
the hydrogen solution of chapter 3.2:
$$\left\langle \psi_a\,\middle|\,-\frac{\hbar^2}{2m_e}\nabla^2\,\psi_a\right\rangle = \frac{\hbar^2}{2m_ea^2}$$
$$-\frac{e^2}{4\pi\epsilon_0}\left\langle \psi_a\,\middle|\,\frac1r\,\psi_a\right\rangle = -\frac{\hbar^2}{m_ea_0}\left\langle \psi_a\,\middle|\,\frac1r\,\psi_a\right\rangle = -\frac{\hbar^2}{m_ea_0}\,\frac1a$$
and this subsection added
$$\left\langle \psi_a\psi_a\,\middle|\,\frac{1}{|\vec r_2-\vec r_1|}\,\psi_a\psi_a\right\rangle = \frac{5}{8a}$$
Using these results with the helium Hamiltonian, the expectation energy of the
helium atom can be written out to be
$$\langle\psi_a\psi_a|H_{\mbox{\scriptsize He}}\psi_a\psi_a\rangle = \frac{\hbar^2}{m_ea^2} - \frac{27}{8}\,\frac{\hbar^2}{m_ea_0a}$$
Setting the derivative with respect to a to zero locates the minimum at $a = \frac{16}{27}a_0$,
rather than $\frac12 a_0$. Then the corresponding expectation energy is $-3^6\hbar^2/2^8m_ea_0^2$,
or $3^6E_1/2^7$. Putting in the numbers, the ionization energy is now found as 23.1
eV, in quite good agreement with the experimental 24.6 eV.
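The minimization is simple enough to redo numerically. In terms of $x = a_0/a$,
and using $\hbar^2/m_ea_0^2 = 2|E_1|$, the expectation energy above becomes
$2|E_1|x^2 - \frac{27}{4}|E_1|x$. A brute-force Python sketch:

import numpy as np

E1 = -13.6057   # hydrogen ground state energy, eV

def E(x):       # <H_He> in eV, with x = a0/a
    return 2*abs(E1)*x**2 - (27/4)*abs(E1)*x

x = np.linspace(0.5, 3.0, 100_001)
i = np.argmin(E(x))
print(x[i], 27/16)      # minimum at a0/a = 27/16, i.e. a = (16/27) a0
print(E(x)[i])          # ~ -77.5 eV, which is 3^6 E1 / 2^7
print(4*E1 - E(x)[i])   # ionization energy ~ 23.1 eV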

10.1.3 Degenerate perturbation theory


Energy eigenvalues are degenerate if there is more than one independent eigen-
function with that energy. Now, if you try to use perturbation theory to correct
a degenerate eigenvalue of a Hamiltonian H0 for a perturbation H1 , there may
be a problem. Assume that there are d > 1 independent eigenfunctions with
energy $E_{\vec n,0}$ and that they are numbered as
$$\psi_{\vec n_1,0},\ \psi_{\vec n_2,0},\ \ldots,\ \psi_{\vec n_d,0}$$
Then as far as $H_0$ is concerned, any combination
$$\psi_{\vec n,0} = c_1\psi_{\vec n_1,0} + c_2\psi_{\vec n_2,0} + \ldots + c_d\psi_{\vec n_d,0}$$
with arbitrary coefficients c1 , c2 , . . . , cd , (not all zero, of course), is just as good
an eigenfunction with energy $E_{\vec n,0}$ as any other.
Unfortunately, the full Hamiltonian H0 + H1 is not likely to agree with H0
about that. As far as the full Hamiltonian is concerned, normally only very
specific combinations are acceptable, the “good” eigenfunctions. It is said that
the perturbation H1 “breaks the degeneracy” of the energy eigenvalue. The
single energy eigenvalue splits into several eigenvalues of different energy. Only
good combinations will show up these changed energies; the bad ones will pick
up uncertainty in energy that hides the effect of the perturbation.
The various ways of ensuring good eigenfunctions are illustrated in the fol-
lowing subsections for example perturbations of the energy levels of the hydro-
gen atom. Recall that the unperturbed energy eigenfunctions of the hydrogen
atom electron, as derived in chapter 3.2, and also including spin, are given as
ψnlm ↑ and ψnlm ↓. They are highly degenerate: all the eigenfunctions with the
same value of n have the same energy En , regardless of what is the value of the
azimuthal quantum number 0 ≤ l ≤ n − 1 corresponding to the square orbital
angular momentum $L^2 = l(l+1)\hbar^2$; regardless of what is the magnetic quantum
number $|m| \le l$ corresponding to the orbital angular momentum $L_z = m\hbar$ in
the z-direction; and regardless of what is the spin quantum number $m_s = \pm\frac12$
corresponding to the spin angular momentum $m_s\hbar$ in the z-direction. In particular, the ground state energy level $E_1$ is two-fold degenerate; it is the same
for both $\psi_{100}\uparrow$, i.e. $m_s = \frac12$, and $\psi_{100}\downarrow$, $m_s = -\frac12$. The next energy level $E_2$ is
eight-fold degenerate; it is the same for $\psi_{200}\updownarrow$, $\psi_{211}\updownarrow$, $\psi_{210}\updownarrow$, and $\psi_{21-1}\updownarrow$, and so
on for higher values of n.
There are two important rules to identify the good eigenfunctions, {A.95}:
1. Look for good quantum numbers. The quantum numbers that make the
energy eigenfunctions of the unperturbed Hamiltonian H0 unique corre-
spond to the eigenvalues of additional operators besides the Hamiltonian.
If the perturbation Hamiltonian H1 commutes with one of these additional
operators, the corresponding quantum number is good. You do not need
to combine eigenfunctions with different values of that quantum number.
In particular, if the perturbation Hamiltonian commutes with all addi-
tional operators that make the eigenfunctions of H0 unique, stop worrying:
every eigenfunction is good already.
For example, for the usual hydrogen energy eigenfunctions $\psi_{nlm}\updownarrow$, the
quantum numbers l, m, and $m_s$ make the eigenfunctions at a given unperturbed energy level n unique. They correspond to the operators $\hat L^2$, $\hat L_z$,
and $\hat S_z$. If the perturbation Hamiltonian $H_1$ commutes with any one of
these operators, the corresponding quantum number is good. If the perturbation commutes with all three, all eigenfunctions are good already.
turbation commutes with all three, all eigenfunctions are good already.
2. Even if some quantum numbers are bad because the perturbation does
not commute with that operator, eigenfunctions are still good if there are
no other eigenfunctions with the same unperturbed energy and the same
good quantum numbers.
Otherwise linear algebra is required. For each set of energy eigenfunctions
$$\psi_{\vec n_1,0},\ \psi_{\vec n_2,0},\ \ldots$$
with the same unperturbed energy and the same good quantum numbers, but different bad ones, form the matrix of Hamiltonian perturbation
coefficients
$$\begin{pmatrix} \langle\psi_{\vec n_1,0}|H_1\psi_{\vec n_1,0}\rangle & \langle\psi_{\vec n_1,0}|H_1\psi_{\vec n_2,0}\rangle & \cdots\\ \langle\psi_{\vec n_2,0}|H_1\psi_{\vec n_1,0}\rangle & \langle\psi_{\vec n_2,0}|H_1\psi_{\vec n_2,0}\rangle & \cdots\\ \vdots & \vdots & \ddots \end{pmatrix}$$
The eigenvalues of this matrix are the first order energy corrections. Also,
the coefficients c1 , c2 , . . . of each good eigenfunction
$$c_1\psi_{\vec n_1,0} + c_2\psi_{\vec n_2,0} + \ldots$$
must be an eigenvector of the matrix.
Unfortunately, if the eigenvalues of this matrix are not all different, the
eigenvectors are not unique, so you remain unsure about what are the
good eigenfunctions. In that case, if the second order energy corrections
are needed, the detailed analysis of note {A.95} will need to be followed.
If you are not familiar with linear algebra at all, in all cases mentioned
here the matrices are just two by two, and you can find that solution
spelled out in the notations under “eigenvector.”
The following, related, practical observation can also be made:

Hamiltonian perturbation coefficients can only be nonzero if all the good quantum numbers are the same.

10.1.4 The Zeeman effect


If you put a hydrogen atom in an external magnetic field $\vec B_{\rm ext}$, the energy levels
of the electron change. That is called the "Zeeman effect."
If for simplicity a coordinate system is used with its z-axis aligned with the
magnetic field, then according to chapter 9.6, the Hamiltonian of the hydrogen
atom acquires an additional term
$$H_1 = \frac{e}{2m_e}B_{\rm ext}\left(\hat L_z + 2\hat S_z\right) \qquad (10.7)$$
beyond the basic hydrogen atom Hamiltonian H0 of chapter 3.2.1. Qualitatively,
it expresses that a spinning charged particle is equivalent to a tiny electromag-
net, and a magnet wants to align itself with a magnetic field, just like a compass
needle aligns itself with the magnetic field of earth.
For this perturbation, the $\psi_{nlm}\updownarrow$ energy eigenfunctions are already good ones,
because $H_1$ commutes with all of $\hat L^2$, $\hat L_z$, and $\hat S_z$. So, according to perturbation
theory, the energy eigenvalues of a hydrogen atom in a magnetic field are
approximately
$$E_n + \langle\psi_{nlm}\updownarrow|H_1|\psi_{nlm}\updownarrow\rangle = E_n + \frac{e}{2m_e}B_{\rm ext}(m+2m_s)\hbar$$
Actually, this is not approximate at all; it is the exact eigenvalue of $H_0+H_1$
corresponding to the exact eigenfunction $\psi_{nlm}\updownarrow$.
The Zeeman effect can be seen in an experimental spectrum. Consider first
the ground state. If there is no electromagnetic field, the two ground states $\psi_{100}\uparrow$
and $\psi_{100}\downarrow$ would have exactly the same energy. Therefore, in an experimental
spectrum, they would show up as a single line. But with the magnetic field, the
two energy levels are different,
$$E_{100\downarrow} = E_1 - \frac{e\hbar}{2m_e}B_{\rm ext} \qquad E_{100\uparrow} = E_1 + \frac{e\hbar}{2m_e}B_{\rm ext} \qquad E_1 = -13.6\mbox{ eV}$$
so the single line splits into two! Do note that the energy change due to even an
extremely strong magnetic field of 100 Tesla is only 0.006 eV or so, chapter 9.6,
so it is not like the spectrum would become unrecognizable. The single spectral
line of the eight $\psi_{2lm}\updownarrow$ "L" shell states will similarly split into five closely spaced
but separate lines, corresponding to the five possible values −2, −1, 0, 1 and 2
for the factor $m+2m_s$ above.
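The size of the splitting is just Bohr magnetons times $B_{\rm ext}$ per unit of
$m+2m_s$, so the 0.006 eV figure is a one-line check in Python:

mu_B = 5.788e-5        # Bohr magneton in eV/T, chapter 9.6
print(mu_B * 100.0)    # ~0.0058 eV at 100 Tesla, the "0.006 eV or so" above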
Some disclaimers should be given here. First of all, the 2 in m + 2ms is only
equal to 2 up to about 0.1% accuracy. More importantly, even in the absence
of a magnetic field, the energy levels at a given value of n do not really form a
single line in the spectrum if you look closely enough. There are small errors in
the solution of chapter 3.2 due to relativistic effects, and so the theoretical lines
are already split. That is discussed in subsection 10.1.6. The description given
above is a good one for the “strong” Zeeman effect, in which the magnetic field
is strong enough to swamp the relativistic errors.

10.1.5 The Stark effect


If a hydrogen atom is placed in an external electric field $\vec E_{\rm ext}$ instead of the
magnetic one of the previous subsection, its energy levels will change too. That
is called the “Stark effect.” Of course a Zeeman, Dutch for sea-man, would be
most interested in magnetic fields. A Stark, maybe in a spark? (Apologies.)
If the z-axis is taken in the direction of the electric field, the contribution of
the electric field to the Hamiltonian is given by:

$$H_1 = eE_{\rm ext}z \qquad (10.8)$$

It is much like the potential energy mgh of gravity, with the electron charge e
taking the place of the mass m, Eext that of the gravity strength g, and z that
of the height h.
Since the typical magnitude of z is of the order of a Bohr radius a0 , you
would expect that the energy levels will change due to the electric field by an
amount of rough size eEext a0 . A strong laboratory electric field might have
eEext a0 of the order of 0.0005 eV, [10, p. 339]. That is really small compared to
the typical electron energy levels.
And additionally, it turns out that for many eigenfunctions, including the
ground state, the first order correction to the energy is zero. To get the energy
change in that case, you need to compute the second order term, which is a
pain. And that term will be much smaller still than even eEext a0 for reasonable
field strengths.
Now first suppose that you ignore the warnings on good eigenfunctions, and
just compute the energy changes using the inner product $\langle\psi_{nlm}\updownarrow|H_1\psi_{nlm}\updownarrow\rangle$. You
will then find that this inner product is zero for whatever energy eigenfunction
you take:
$$\langle\psi_{nlm}\updownarrow|eE_{\rm ext}z\,\psi_{nlm}\updownarrow\rangle = 0 \qquad\mbox{for all }n,\ l,\ m,\mbox{ and }m_s$$
The reason is that negative z-values integrate away against positive ones. (The
inner products are integrals of z times $|\psi_{nlm}|^2$, and $|\psi_{nlm}|^2$ is the same at opposite
sides of the nucleus while z changes sign, so the contributions of opposite sides
to the inner product pairwise cancel.)
So, since all first order energy changes that you compute are zero, you would
naturally conclude that to first order approximation none of the energy levels of
a hydrogen atom changes due to the electric field. But that conclusion is wrong
for anything but the ground state energy. And the reason it is wrong is because
the good eigenfunctions have not been used.
Consider the operators $\hat L^2$, $\hat L_z$, and $\hat S_z$ that make the energy eigenfunctions
$\psi_{nlm}\updownarrow$ unique. If $H_1 = eE_{\rm ext}z$ commuted with them all, the $\psi_{nlm}\updownarrow$ would be
good eigenfunctions. Unfortunately, while z commutes with $\hat L_z$ and $\hat S_z$, it does
not commute with $\hat L^2$, see chapter 3.4.4. The quantum number l is bad.

Still, the two states $\psi_{100}\updownarrow$ with the ground state energy are good states,
because there are no states with the same energy and a different value of the
bad quantum number l. Really, spin has nothing to do with the Stark problem.
If you want, you can find the purely spatial energy eigenfunctions first, then
for every spatial eigenfunction, there will be one like that with spin up and one
with spin down. In any case, since the two eigenfunctions $\psi_{100}\updownarrow$ are both good,
the ground state energy does indeed not change to first order.
But now consider the eight-fold degenerate n = 2 energy level. Each of
the four eigenfunctions $\psi_{211}\updownarrow$ and $\psi_{21-1}\updownarrow$ is a good one because for each of
them, there is no other n = 2 eigenfunction with a different value of the bad
quantum number l. The energies corresponding to these good eigenfunctions
too do indeed not change to first order.
However, the remaining two n = 2 spin-up states $\psi_{200}\uparrow$ and $\psi_{210}\uparrow$ have
different values for the bad quantum number l, and they have the same values
m = 0 and $m_s = \frac12$ for the good quantum numbers of orbital and spin z-momen-
tum. These eigenfunctions are bad and will have to be combined to produce
good ones. And similarly the remaining two spin-down states ψ200 ↓ and ψ210 ↓
are bad and will have to be combined.
It suffices to just analyze the spin up states, because the spin down ones
go exactly the same way. The coefficients c1 and c2 of the good combinations
$c_1\psi_{200}\uparrow + c_2\psi_{210}\uparrow$ must be eigenvectors of the matrix
$$\begin{pmatrix} \langle\psi_{200}\uparrow|H_1\psi_{200}\uparrow\rangle & \langle\psi_{200}\uparrow|H_1\psi_{210}\uparrow\rangle\\ \langle\psi_{210}\uparrow|H_1\psi_{200}\uparrow\rangle & \langle\psi_{210}\uparrow|H_1\psi_{210}\uparrow\rangle \end{pmatrix} \qquad H_1 = eE_{\rm ext}z$$

The “diagonal” elements of this matrix (top left corner and bottom right corner)
are zero because of cancellation of negative and positive z-values as discussed
above. And the top right and bottom left elements are complex conjugates,
(1.16), so only one of them needs to be actually computed. And the spin part
of the inner product produces one and can therefore be ignored. What is left is
a matter of finding the two spatial eigenfunctions involved according to (3.18),
looking up the spherical harmonics in table 3.1 and the radial functions in table
3.2, and integrating it all against $eE_{\rm ext}z$. The resulting matrix is
$$\begin{pmatrix} 0 & -3eE_{\rm ext}a_0\\ -3eE_{\rm ext}a_0 & 0 \end{pmatrix}$$

The eigenvectors of this matrix are simple enough to guess; they have either
equal or opposite coefficients c1 and c2 :
$$\begin{pmatrix} 0 & -3eE_{\rm ext}a_0\\ -3eE_{\rm ext}a_0 & 0 \end{pmatrix} \begin{pmatrix} \sqrt{\tfrac12}\\ \sqrt{\tfrac12} \end{pmatrix} = -3eE_{\rm ext}a_0 \begin{pmatrix} \sqrt{\tfrac12}\\ \sqrt{\tfrac12} \end{pmatrix}$$
$$\begin{pmatrix} 0 & -3eE_{\rm ext}a_0\\ -3eE_{\rm ext}a_0 & 0 \end{pmatrix} \begin{pmatrix} \sqrt{\tfrac12}\\ -\sqrt{\tfrac12} \end{pmatrix} = 3eE_{\rm ext}a_0 \begin{pmatrix} \sqrt{\tfrac12}\\ -\sqrt{\tfrac12} \end{pmatrix}$$

If you want to check these expressions, note that the product of a matrix times
a vector is found by taking dot products between the rows of the matrix and
the vector. It follows that the good combination $\sqrt{\tfrac12}\psi_{200}\uparrow + \sqrt{\tfrac12}\psi_{210}\uparrow$ has a first
order energy change $-3eE_{\rm ext}a_0$, and the good combination $\sqrt{\tfrac12}\psi_{200}\uparrow - \sqrt{\tfrac12}\psi_{210}\uparrow$
has $+3eE_{\rm ext}a_0$. The same applies for the spin down states. It follows that to
first order the n = 2 level splits into three, with energies $E_2-3eE_{\rm ext}a_0$, $E_2$, and
$E_2+3eE_{\rm ext}a_0$, where the value $E_2$ applies to the eigenfunctions $\psi_{211}\updownarrow$ and $\psi_{21-1}\updownarrow$
that were already good. The conclusion, based on the wrong eigenfunctions, that
the energy levels do not change was all wrong.
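The two by two eigenvalue problem above is also easily checked numerically.
A Python sketch, working in units of $eE_{\rm ext}a_0$:

import numpy as np

M = np.array([[0.0, -3.0],     # the perturbation matrix above,
              [-3.0, 0.0]])    # in units of e*E_ext*a0

evals, evecs = np.linalg.eigh(M)
print(evals)    # [-3, 3]: the first order energy changes
print(evecs)    # columns are (1, 1)/sqrt(2) and (1, -1)/sqrt(2), up to sign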
Remarkably, the good combinations of ψ200 and ψ210 are the “sp” hybrids
of carbon fame, as described in section 4.11.4. Note from figure 4.11 in that
section that these hybrids do not have the same magnitude at opposite sides of
the nucleus. They have an intrinsic “electric dipole moment,” with the charge
shifted towards one side of the atom, and the electron then wants to align this
dipole moment with the ambient electric field. That is much like in Zeeman
splitting, where the electron wants to align its orbital and spin magnetic dipole
moments with the ambient magnetic field.
The crucial thing to take away from all this is: always, always, check whether
the eigenfunction is good before applying perturbation theory.
It is obviously somewhat disappointing that perturbation theory did not give
any information about the energy change of the ground state beyond the fact
that it is second order, i.e. very small compared to eEext a0 . You would like to
know approximately what it is, not just that it is very small. Of course, now that
it is established that $\psi_{100}\uparrow$ is a good state with m = 0 and $m_s = \frac12$, you could
think about evaluating the second order energy change (10.3), by integrating
$\langle\psi_{100}\uparrow|eE_{\rm ext}z\,\psi_{nl0}\uparrow\rangle$ for all values of n and l. But after refreshing your memory
about the analytical expression (A.30) for the $\psi_{nlm}$, you might think again.
It is however possible to find the perturbation in the wave function from the
alternate approach (10.5), {A.96}. In that way the second order ground state
energy is found to be

$$E_{100} = E_1 - 3eE_{\rm ext}a_0\,\frac{3eE_{\rm ext}a_0}{8|E_1|} \qquad E_1 = -13.6\mbox{ eV}$$

Note that the atom likes an electric field: it lowers its ground state energy. Also
note that the energy change is indeed second order; it is proportional to the
square of the electric field strength. You can think of the attraction of the atom
to the electric field as a two-stage process: first the electric field polarizes the
atom by distorting its initially symmetric charge distribution. Then it interacts
with this polarized atom in much the same way that it interacts with the sp
hybrids. But since the polarization is now only proportional to the field strength,
the net energy drop is proportional to the square of the field strength.
Finally, note that the typical value of 0.0005 eV or so for eEext a0 quoted
earlier is very small compared to the about 100 eV for 8|E1 |, making the fraction
in the expression above very small. So, indeed the second order change in the
ground state energy E1 is much smaller than the first order energy changes
±3eEext a0 in the E2 energy level.
A weird prediction of quantum mechanics is that the electron will eventu-
ally escape from the atom, leaving it ionized. The reason is that the potential
is linear in z, so if the electron goes out far enough in the z-direction, it will
eventually encounter potential energies that are lower than the one it has in the
atom. Of course, to get at such large values of z, the electron must pass posi-
tions where the required energy far exceeds the −13.6 eV it has available, and
that is impossible for a classical particle. However, in quantum mechanics the
position of the electron is uncertain, and the electron does have some minuscule
chance of “tunneling out” of the atom through the energy barrier, chapter 5.7.2.
Realistically, though, for even strong experimental fields like the one mentioned
above, the "lifetime" of the electron in the atom before it has a decent chance
of being found outside it far exceeds the age of the universe.

10.1.6 The hydrogen atom fine structure


According to the description of the hydrogen atom given in chapter 3.2, all
energy eigenfunctions $\psi_{nlm}\updownarrow$ with the same value of n have the same energy
En , and should show up as a single line in an experimental line spectrum. But
actually, when these spectra are examined very precisely, the En energy levels
for a given value of n are found to consist of several closely spaced lines, rather
than a single one. That is called the “hydrogen atom fine structure.” It means
that eigenfunctions that all should have exactly the same energy, don’t.
To explain why, the solution of chapter 3.2 must be corrected for a variety of
relativistic effects. Before doing so, it is helpful to express the non-relativistic
energy levels of that chapter in terms of the “rest mass energy” me c2 of the
electron, as follows:
$$E_n = -\frac{\alpha^2}{2n^2}\,m_ec^2 \qquad\mbox{where}\quad \alpha = \frac{e^2}{4\pi\epsilon_0\hbar c} \approx \frac{1}{137} \qquad (10.9)$$
The constant α is called the “fine structure constant.” It combines the constants
$e^2/4\pi\epsilon_0$ from electromagnetism, $\hbar$ from quantum mechanics, and the speed of
light c from relativity into one nondimensional number. It is without doubt the
single most important number in all of physics, [6].
Nobody knows why it has the value that it has. Still, obviously it is a
measurable value, so, following the stated ideas of quantum mechanics, maybe
the universe “measured” this value during its early formation by a process that
we may never understand, (since we do not have other measured values for α to
deduce any properties of that process from.) If you have a demonstrably better
explanation, Sweden awaits you. In any case, for engineering purposes it is a
small number, less than 1%. That makes the hydrogen energy levels really small
compared to the rest mass energy of the electron, because they are proportional
to the square of α, which is as small as 0.005%. In simple terms, the electron
in hydrogen stays well clear of the speed of light.
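To see that (10.9) indeed reproduces the familiar nonrelativistic levels, and
how small they are on the rest mass energy scale, a quick Python evaluation
with standard constant values:

alpha = 1 / 137.036     # fine structure constant
me_c2 = 0.510_999e6     # electron rest mass energy, eV

for n in (1, 2, 3):
    En = -alpha**2 / (2*n**2) * me_c2
    print(n, En, En / me_c2)   # -13.6, -3.40, -1.51 eV; ratios ~ 1e-5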
And that in turn means that the relativistic errors in the hydrogen energy
levels are small. Still, even small errors can sometimes be very important. The
required corrections are listed below in order of decreasing magnitude.
• Fine structure.
The electron should really be described relativistically using the Dirac
equation instead of classically. In classical terms, that will introduce three
corrections to the energy levels:
– Einstein’s relativistic correction of the classical kinetic energy p2 /2me
of the electron.
– “Spin-orbit interaction”, due to the fact that the spin of the moving
electron changes the energy levels. The spin of the electron makes it
act like a little electromagnet. It can be seen from classical electro-
dynamics that a moving magnet will interact with the electric field
of the nucleus, and that changes the energy levels.
– There is a third correction for states of zero angular momentum, the
Darwin term. It is a crude fix for the fundamental problem that
the relativistic wave function is not just a modified classical one, but
also involves interaction with the anti-particle of the electron, the
positron.
Fortunately, all three of these effects are very small; they are smaller than
the uncorrected energy levels by a factor of order α2 , and the error they
introduce is on the order of 0.001%. So the “exact” solution of chapter
3.2 is, by engineering standards, pretty exact after all.
• Lamb shift. Relativistically, the electron is affected by virtual photons
and virtual electron-positron pairs. It adds a correction of relative magni-
tude α3 to the energy levels, one or two orders of magnitude smaller still
than the fine structure corrections. To understand the correction properly
requires quantum electrodynamics.
• Hyperfine splitting. Like the electron, the proton acts as a little elec-
tromagnet too. Therefore the energy depends on how it aligns with the
magnetic field generated by the electron. This effect is a factor me /mp
smaller still than the fine structure corrections, making the associated
energy changes about two orders of magnitude smaller.
Hyperfine splitting couples the spins of proton and electron, and in the
ground state, they combine in the singlet state. A slightly higher energy
level occurs when they are in a spin-one triplet state; transitions between
these states radiate very low energy photons with a wave length of 21 cm.
This is the source of the “21 centimeter line” or “hydrogen line” radiation
that is of great importance in cosmology. For example, it has been used
to analyze the spiral arms of the galaxy, and the hope at the time of this
writing is that it can shed light on the so called “dark ages” that the
universe went through. The transition is highly forbidden in the sense of
chapter 5.2, and takes on the order of 10 million years, but that is a small
time on the scale of the universe.
The message to take away from that is that even errors in the ground state
energy of hydrogen that are two million times smaller than the energy itself
can be of critical importance under the right conditions.
The following subsubsections discuss each correction in more detail.
Fine structure
From the Dirac equation, it can be seen that three terms need to be added to
the nonrelativistic Hamiltonian of chapter 3.2 to correct the energy levels for
relativistic effects. The three terms are worked out in note {A.97}. But that
mathematics really provides very little insight. It is much more instructive to
try to understand the corrections from a more physical point of view.
The first term is relatively easy to understand. Consider Einstein's famous relation $E = mc^2$, where $E$ is energy, $m$ mass, and $c$ the speed of light. According to this relation, the kinetic energy of the electron is not $\frac12 m_e v^2$, with $v$ the velocity, as Newtonian physics says. Instead it is the difference between the energy $m_{e,v}c^2$ based on the mass $m_{e,v}$ of the electron in motion and the energy $m_e c^2$ based on the mass $m_e$ of the electron at rest. In terms of the momentum $p = m_{e,v}v$, {A.4},
$$ T = m_e c^2\sqrt{1 + \frac{p^2}{m_e^2 c^2}} - m_e c^2 \tag{10.10} $$
Since the speed of light is large compared to the typical speed of the electron,
the square root can be expanded in a Taylor series, [15, 22.12], to give:
$$ T \approx \frac{p^2}{2m_e} - \frac{p^4}{8m_e^3 c^2} + \ldots $$
The first term corresponds to the kinetic energy operator used in the non-
relativistic quantum solution of chapter 3.2. (It may be noted that the relativistic momentum $\vec p$ is based on the moving mass of the electron, not its rest mass. It is this relativistic momentum that corresponds to the operator $\hat{\vec p} = \hbar\nabla/i$. So the Hamiltonian used in chapter 3.2 was a bit relativistic already, because in replacing $\vec p$ by $\hbar\nabla/i$, it used the relativistic expression.) The second
term in the Taylor series expansion above is the first of the corrections needed
to fix up the hydrogen energy levels for relativity. Rewritten in terms of the
square of the classical kinetic energy operator, the Bohr ground state energy E1
and the fine structure constant α, it is
$$ H_{1,\rm Einstein} = -\frac{\alpha^2}{4|E_1|}\left(\frac{\hat p^2}{2m_e}\right)^2 \tag{10.11} $$
The second correction that must be added to the non-relativistic Hamiltonian is the so-called "spin-orbit interaction." In classical terms, it is due to the
spin of the electron, which makes it into a “magnetic dipole.” Think of it as a
magnet of infinitesimally small size, but with infinitely strong north and south
poles to make up for it. The product of the infinitesimal vector from south to
north pole times the infinite strength of the poles is finite, and defines the magnetic dipole moment $\vec\mu$. By itself, it is quite inconsequential since the magnetic
dipole does not interact directly with the electric field of the nucleus. How-
ever, moving magnetic poles create an electric field just like the moving electric
charges in an electromagnet create a magnetic field. The electric fields gen-
erated by the moving magnetic poles of the electron are opposite in strength,
but not quite centered at the same position. Therefore they correspond to a
motion-induced electric dipole. And an electric dipole does interact with the
electric field of the nucleus; it wants to align itself with it. That is just like the
magnetic dipole wanted to align itself with the external magnetic field in the
Zeeman effect.
So how big is this effect? Well, the energy of an electric dipole $\vec\wp$ in an electric field $\vec E$ is
$$ E_{1,\rm spin\mbox{-}orbit} = -\vec\wp\cdot\vec E $$
As you might guess, the electric dipole generated by the magnetic poles of the moving electron is proportional to the speed of the electron $\vec v$ and its magnetic dipole moment $\vec\mu$. More precisely, the electric dipole moment $\vec\wp$ will be proportional to $\vec v\times\vec\mu$ because if the vector connecting the south and north poles is parallel to the motion, you do not have two neighboring currents of magnetic poles, but a single current of both negative and positive poles that completely cancel each other out. Also, the electric field $\vec E$ of the nucleus is minus the gradient of its potential $e/4\pi\epsilon_0 r$, so
$$ E_{1,\rm spin\mbox{-}orbit} \propto \frac{e}{4\pi\epsilon_0 r^3}\,(\vec v\times\vec\mu)\cdot\vec r $$
Now the order of the vectors in this triple product can be changed, and the dipole strength $\vec\mu$ of the electron equals its spin $\vec S$ times the charge per unit mass $-e/m_e$, so
$$ E_{1,\rm spin\mbox{-}orbit} \propto \frac{e^2}{4\pi\epsilon_0 m_e r^3}\,(\vec r\times\vec v)\cdot\vec S $$
The expression between the parentheses is the angular momentum $\vec L$ save for the electron mass. The constant of proportionality is worked out in note {A.98}, giving the spin-orbit Hamiltonian as
$$ H_{1,\rm spin\mbox{-}orbit} = \alpha^2 |E_1| \left(\frac{a_0}{r}\right)^3 \frac{1}{\hbar^2}\,\hat{\vec L}\cdot\hat{\vec S} \tag{10.12} $$
The final correction that must be added to the non-relativistic Hamiltonian is the so-called "Darwin term:"
$$ H_{1,\rm Darwin} = \alpha^2 |E_1|\,\pi a_0^3\,\delta^3(\vec r) \tag{10.13} $$
According to its derivation in note {A.97}, it is a crude fix-up for an interaction with a virtual positron that simply cannot be included correctly in a nonrelativistic analysis.
If that is not very satisfactory, the following much more detailed derivation
can be found on the web. It does succeed in explaining the Darwin term fully
within the non-relativistic picture alone. First assume that the electric potential
of the nucleus does not really become infinite as 1/r at r = 0, but is smoothed
out over some finite nuclear size. Also assume that the electron does not “see”
this potential sharply, but perceives its features a bit vaguely, as diffused
out symmetrically over a typical distance equal to the so-called Compton wave
length h̄/me c. There are several plausible reasons why it might: (1) the electron
has illegally picked up a chunk of a negative rest mass state, and it is trembling
with fear that the uncertainty in energy will be noted, moving rapidly back
and forwards over a Compton wave length in a so-called “Zitterbewegung”; (2)
the electron has decided to move at the speed of light, which is quite possible
non-relativistically, so its uncertainty in position is of the order of the Compton
wave length, and it just cannot figure out where the right potential is with all
that uncertainty in position and light that fails to reach it; (3) the electron
needs glasses. Further assume that the Compton wave length is much smaller
than the size over which the nuclear potential is smoothed out. In that case,
the potential within a Compton wave length can be approximated by a second
order Taylor series, and the diffusion of it over the Compton wave length will
produce an error proportional to the Laplacian of the potential (the only fully symmetric combination of derivatives in the second order Taylor series). Now
if the potential is smoothed over the nuclear region, its Laplacian, giving the
charge density, is known to produce a nonzero spike only within that smoothed
nuclear region, figure 9.13 or (9.46). Since the nuclear size is small compared
to the electron wave functions, that spike can then be approximated as a delta
function. Tell all your friends you heard it here first.
The key question is now what are the changes in the hydrogen energy lev-
els due to the three perturbations discussed above. That can be answered by
perturbation theory as soon as the good eigenfunctions have been identified.
Recall that the usual hydrogen energy eigenfunctions $\psi_{nlm}\updownarrow$ are made unique by the square angular momentum operator $\hat L^2$, giving $l$, the z-angular momentum operator $\hat L_z$, giving $m$, and the spin angular momentum operator $\hat S_z$, giving the spin quantum number $m_s = \pm\frac12$ for spin up, respectively down. The decisive term for whether these are good eigenfunctions or not is the spin-orbit interaction. If the inner product in it is written out, it is
$$ H_{1,\rm spin\mbox{-}orbit} = \alpha^2 |E_1| \left(\frac{a_0}{r}\right)^3 \frac{1}{\hbar^2} \left(\hat L_x\hat S_x + \hat L_y\hat S_y + \hat L_z\hat S_z\right) $$
The radial factor is no problem; it commutes with every orbital angular momentum component, since these are purely angular derivatives, chapter 3.1.2. It also commutes with every component of spin because all spatial functions and operators do, chapter 4.5.3. As far as the dot product is concerned, it commutes with $\hat L^2$ since all the components of $\hat{\vec L}$ do, chapter 3.4.4, and since all the components of $\hat{\vec S}$ commute with any spatial operator. But unfortunately, $\hat L_x$ and $\hat L_y$ do not commute with $\hat L_z$, and $\hat S_x$ and $\hat S_y$ do not commute with $\hat S_z$ (chapters 3.4.4 and 4.5.3):
$$ [\hat L_x,\hat L_z] = -i\hbar\hat L_y \qquad [\hat L_y,\hat L_z] = i\hbar\hat L_x \qquad [\hat S_x,\hat S_z] = -i\hbar\hat S_y \qquad [\hat S_y,\hat S_z] = i\hbar\hat S_x $$
The quantum numbers $m$ and $m_s$ are bad.
Fortunately, $\hat{\vec L}\cdot\hat{\vec S}$ does commute with the net z-angular momentum $\hat J_z$, defined as $\hat L_z + \hat S_z$. Indeed, using the commutators above and the rules of chapter 3.4.4 to take apart commutators:
$$ [\hat L_x\hat S_x, \hat L_z + \hat S_z] = [\hat L_x,\hat L_z]\hat S_x + \hat L_x[\hat S_x,\hat S_z] = -i\hbar\hat L_y\hat S_x - i\hbar\hat L_x\hat S_y $$
$$ [\hat L_y\hat S_y, \hat L_z + \hat S_z] = [\hat L_y,\hat L_z]\hat S_y + \hat L_y[\hat S_y,\hat S_z] = i\hbar\hat L_x\hat S_y + i\hbar\hat L_y\hat S_x $$
$$ [\hat L_z\hat S_z, \hat L_z + \hat S_z] = [\hat L_z,\hat L_z]\hat S_z + \hat L_z[\hat S_z,\hat S_z] = 0 $$
and adding it all up, you get $[\hat{\vec L}\cdot\hat{\vec S}, \hat J_z] = 0$. The same way of course $\hat{\vec L}\cdot\hat{\vec S}$ commutes with the other components of net angular momentum $\hat{\vec J}$, since the z-axis is arbitrary. And if $\hat{\vec L}\cdot\hat{\vec S}$ commutes with every component of $\hat{\vec J}$, then it commutes with their sum of squares $\hat J^2$. So, eigenfunctions of $\hat L^2$, $\hat J^2$, and $\hat J_z$ are good eigenfunctions.
Such good eigenfunctions can be constructed from the $\psi_{nlm}\updownarrow$ by forming linear combinations of them that combine different $m$ and $m_s$ values. The coefficients of these good combinations are called Clebsch-Gordan coefficients and are shown for $l = 1$ and $l = 2$ in figure 9.5. Note from this figure that the quantum number $j$ of net square angular momentum can only equal $l + \frac12$ or $l - \frac12$. The half unit of electron spin is not big enough to change the quantum number of square orbital momentum by more than half a unit. For the rest, however, the detailed form of the good eigenfunctions is of no interest here. They will just be indicated in ket notation as $|nljm_j\rangle$, indicating that they have unperturbed energy $E_n$, square orbital angular momentum $l(l+1)\hbar^2$, square net (orbital plus spin) angular momentum $j(j+1)\hbar^2$, and net z-angular momentum $m_j\hbar$.
As far as the other two contributions to the fine structure are concerned, according to chapter 3.2.1 $\hat{\vec p}^{\,2}$ in the Einstein term consists of radial functions and radial derivatives plus $\hat L^2$. These commute with the angular derivatives that make up the components of $\hat{\vec L}$, and as spatial functions and operators, they commute with the components of spin. So the Einstein Hamiltonian commutes with all components of $\hat{\vec L}$ and $\hat{\vec J} = \hat{\vec L} + \hat{\vec S}$, hence with $\hat L^2$, $\hat J^2$, and $\hat J_z$. And the delta function in the Darwin term can be assumed to be the limit of a purely radial function and commutes in the same way. The eigenfunctions $|nljm_j\rangle$ with given values of $l$, $j$, and $m_j$ are good ones for the entire fine structure Hamiltonian.
To get the energy changes, the Hamiltonian perturbation coefficients
$$ \langle m_j jln|H_{1,\rm Einstein} + H_{1,\rm spin\mbox{-}orbit} + H_{1,\rm Darwin}|nljm_j\rangle $$
must be found. Starting with the Einstein term, it is
$$ \langle m_j jln|H_{1,\rm Einstein}|nljm_j\rangle = -\frac{\alpha^2}{4|E_1|}\,\langle m_j jln|\frac{\hat p^4}{4m_e^2}|nljm_j\rangle $$
Unlike what you may have read elsewhere, $\hat p^4$ is indeed a Hermitian operator, but $\hat p^4|nljm_j\rangle$ may have a delta function at the origin, (9.46), so watch it with blindly applying mathematical manipulations to it. The trick is to take half of it to the other side of the inner product, and then use the fact that the eigenfunctions satisfy the non-relativistic energy eigenvalue problem:
$$ \Big\langle m_j jln\Big|\frac{\hat p^2}{2m_e}\,\Big|\,\frac{\hat p^2}{2m_e}\Big|nljm_j\Big\rangle = \Big\langle m_j jln\Big|E_n - V\,\Big|\,E_n - V\Big|nljm_j\Big\rangle = \langle m_j jln|E_n^2 - 2VE_n + V^2|nljm_j\rangle $$
Noting from chapter 3.2 that $E_n = E_1/n^2$, $V = 2E_1 a_0/r$, and that the expectation values of $a_0/r$ and $(a_0/r)^2$ are given in note {A.99}, you find that
$$ \langle m_j jln|H_{1,\rm Einstein}|nljm_j\rangle = -\frac{\alpha^2}{4n^2}\left(\frac{4n}{l+\frac12} - 3\right)|E_n| $$
The spin-orbit energy correction is
$$ \langle m_j jln|H_{1,\rm spin\mbox{-}orbit}|nljm_j\rangle = \alpha^2 |E_1|\,\langle m_j jln|\left(\frac{a_0}{r}\right)^3 \frac{1}{\hbar^2}\,\hat{\vec L}\cdot\hat{\vec S}|nljm_j\rangle $$
For states with no orbital angular momentum, all components of $\hat{\vec L}$ produce zero, so there is no contribution. Otherwise, the dot product $\hat{\vec L}\cdot\hat{\vec S}$ can be rewritten by expanding
$$ \hat J^2 = (\hat{\vec L} + \hat{\vec S})^2 = \hat L^2 + \hat S^2 + 2\hat{\vec L}\cdot\hat{\vec S} $$
to give
$$ \hat{\vec L}\cdot\hat{\vec S}\,|nljm_j\rangle = \tfrac12\left(\hat J^2 - \hat L^2 - \hat S^2\right)|nljm_j\rangle = \tfrac12\hbar^2\left(j(j+1) - l(l+1) - \tfrac12(1+\tfrac12)\right)|nljm_j\rangle $$
That leaves only the expectation value of $(a_0/r)^3$ to be determined, and that can be found in note {A.99}. The net result is
$$ \langle m_j jln|H_{1,\rm spin\mbox{-}orbit}|nljm_j\rangle = \frac{\alpha^2}{4n^2}\,\frac{j(j+1) - l(l+1) - \frac12(1+\frac12)}{l(l+\frac12)(l+1)}\,2n\,|E_n| \qquad\mbox{if } l \ne 0 $$
or zero if $l = 0$.
Finally the Darwin term,
$$ \langle m_j jln|H_{1,\rm Darwin}|nljm_j\rangle = \alpha^2 |E_1|\,\pi a_0^3\,\langle m_j jln|\delta^3(\vec r)|nljm_j\rangle $$
Now a delta function at the origin has the property of picking out the value at the origin of whatever function it is in an integral with, compare chapter 5.3.1. Note {A.17}, (A.31), implies that the value of the wave functions at the origin is zero
unless $l = 0$, and then the value is given in (A.32). So the Darwin contribution becomes
$$ \langle m_j jln|H_{1,\rm Darwin}|nljm_j\rangle = \frac{\alpha^2}{4n^2}\,4n\,|E_n| \qquad\mbox{if } l = 0 $$
To get the total energy change due to fine structure, the three contributions must be added together. For $l = 0$, add the Einstein and Darwin terms. For $l \ne 0$, add the Einstein and spin-orbit terms; you will need to do the two possibilities that $j = l + \frac12$ and $j = l - \frac12$ separately. All three produce the same final result, anyway:
$$ E_{nljm_j,1} = -\left(\frac{1}{n(j+\frac12)} - \frac{3}{4}\frac{1}{n^2}\right)\alpha^2 |E_n| \tag{10.14} $$
Since $j + \frac12$ is at most $n$, the energy change due to fine structure is always negative. And it is the biggest fraction of $E_n$ for $j = \frac12$ and $n = 2$, where it is $-\frac{5}{16}\alpha^2 |E_n|$, still no more than a sixth of a percent of a percent change in energy.
In the ground state j can only be one half, (the electron spin), so the ground
state energy does not split into two due to fine structure. You would of course
not expect so, because in empty space, both spin directions are equivalent. The
ground state does show the largest absolute change in energy.
Woof.
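To put numbers to (10.14), here is a small added sketch (not part of the original text) that evaluates the fine structure shifts of the $n = 2$ levels; the $j = \frac32$ and $j = \frac12$ levels come out about 4.5e-5 eV apart, roughly 10.9 GHz:

```python
# Evaluate the fine structure formula (10.14) for n = 2; an added numeric
# illustration, not part of the original text. Energies in eV.
alpha = 1 / 137.036
E1 = -13.606                       # Bohr ground state energy
n = 2
En = E1 / n**2                     # unperturbed n = 2 energy

def dE_fine(n, j, En):
    """Energy change (10.14); always negative since j + 1/2 <= n."""
    return -(1/(n*(j + 0.5)) - 3/(4*n**2)) * alpha**2 * abs(En)

for j in (0.5, 1.5):
    print(f"j = {j}: dE = {dE_fine(n, j, En):.3e} eV")

h = 4.1357e-15                     # Planck constant, eV s
split = dE_fine(n, 1.5, En) - dE_fine(n, 0.5, En)
print(f"j = 3/2 versus j = 1/2 splitting: {split/h:.3g} Hz")  # ~ 10.9 GHz
```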
Weak and intermediate Zeeman effect
The weak Zeeman effect is the effect of a magnetic field that is sufficiently
weak that it leaves the fine structure energy eigenfunctions almost unchanged.
The Zeeman effect is then a small perturbation on a problem in which the
“unperturbed” (by the Zeeman effect) eigenfunctions |nljmj i derived in the
previous subsubsection are degenerate with respect to l and mj .
The Zeeman Hamiltonian
$$ H_1 = \frac{e}{2m_e}\,B_{\rm ext}\left(\hat L_z + 2\hat S_z\right) $$
commutes with both $\hat L^2$ and $\hat J_z = \hat S_z + \hat L_z$, so the eigenfunctions $|nljm_j\rangle$ are good. Therefore, the energy perturbations can be found as
$$ \frac{e}{2m_e}\,B_{\rm ext}\,\langle m_j jln|\hat L_z + 2\hat S_z|nljm_j\rangle $$

To evaluate this rigorously would require that the |nljmj i state be converted
into the one or two ψnlm l states with −l ≤ m = mj ± 12 ≤ l and ms = ∓ 12 using
the appropriate Clebsch-Gordan coefficients from figure 9.5.
However, the following simplistic derivation is usually given instead, including in this book. First get rid of $\hat L_z$ by replacing it by $\hat J_z - \hat S_z$. The inner product with $\hat J_z$ can then be evaluated as being $m_j\hbar$, giving the energy change as
$$ \frac{e}{2m_e}\,B_{\rm ext}\left[m_j\hbar + \langle m_j jln|\hat S_z|nljm_j\rangle\right] $$
For the final inner product, make a semi-classical argument that only the component of $\hat{\vec S}$ in the direction of $\vec J$ gives a contribution. Don't worry that $\vec J$ does not exist. Just note that the component in the direction of $\vec J$ is constrained by the requirement that $\hat{\vec L}$ and $\hat{\vec S}$ must add up to $\hat{\vec J}$, but the component normal to $\vec J$ can be in any direction and presumably averages out to zero. Dismissing this component, the component in the direction of $\vec J$ is
$$ \hat{\vec S}_J = \frac{1}{J^2}\,(\hat{\vec S}\cdot\hat{\vec J})\,\hat{\vec J} $$
and the dot product in it can be found from expanding
$$ \hat L^2 = \hat{\vec L}\cdot\hat{\vec L} = (\hat{\vec J} - \hat{\vec S})\cdot(\hat{\vec J} - \hat{\vec S}) = J^2 - 2\hat{\vec J}\cdot\hat{\vec S} + S^2 $$
to give
$$ \hat{\vec S}_J = \frac{J^2 - \hat L^2 + S^2}{2J^2}\,\hat{\vec J} $$
For a given eigenfunction $|nljm_j\rangle$, $J^2 = \hbar^2 j(j+1)$, $\hat L^2 = \hbar^2 l(l+1)$, and $S^2 = \hbar^2 s(s+1)$ with $s = \frac12$.
If the z-component of $\hat{\vec S}_J$ is substituted for $\hat S_z$ in the expression for the Hamiltonian perturbation coefficients, the energy changes are
$$ \left[1 + \frac{j(j+1) - l(l+1) + s(s+1)}{2j(j+1)}\right]\frac{e\hbar}{2m_e}\,B_{\rm ext}\,m_j \tag{10.15} $$
(Rigorous analysis using figure 9.5 produces the same results.) The factor within
the brackets is called the “Landé g-factor.” It is the factor by which the magnetic
moment of the electron in the atom is larger than for a classical particle with
the same charge and total angular momentum. It generalizes the g-factor of the
electron in isolation to include the effect of orbital angular momentum. Note
that it is two, the Dirac g-factor, if there is no orbital momentum, and one, the
classical value, if the orbital momentum is so large that the half unit of spin
can be ignored.
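The bracketed factor in (10.15) is easily tabulated; the following added sketch (not part of the original text) does so for a few hydrogen states:

```python
# Landé g-factors from the bracketed factor in (10.15); an added
# illustration, not part of the original text.
def lande_g(j, l, s=0.5):
    return 1 + (j*(j+1) - l*(l+1) + s*(s+1)) / (2*j*(j+1))

for name, j, l in [("S_1/2", 0.5, 0), ("P_1/2", 0.5, 1), ("P_3/2", 1.5, 1)]:
    print(f"{name}: g = {lande_g(j, l):.4f}")
# S_1/2 gives 2 (the Dirac value, pure spin); P_1/2 gives 2/3; P_3/2 gives 4/3.
```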
In the intermediate Zeeman effect, the fine structure and Zeeman effects
are comparable in size. The dominant perturbation Hamiltonian is now the
combination of the fine structure and Zeeman ones. Since the Zeeman part
does not commute with $\hat J^2$, the eigenfunctions $|nljm_j\rangle$ are no longer good. Eigenfunctions with the same values of $l$ and $m_j$, but different values of $j$ must be combined into good combinations. For example, if you look at $n = 2$, the eigenfunctions $|2\,1\,\frac32\,\frac12\rangle$ and $|2\,1\,\frac12\,\frac12\rangle$ have the same unperturbed energy and good quantum numbers $l$ and $m_j$. You will have to write a two by two matrix of Hamiltonian perturbation coefficients for them, as in subsection 10.1.3, to find the good combinations and their energy changes. And the same for the $|2\,1\,\frac32\,{-\frac12}\rangle$ and $|2\,1\,\frac12\,{-\frac12}\rangle$ eigenfunctions. To obtain the matrix coefficients, use the Clebsch-Gordan coefficients from figure 9.5 to evaluate the effect of the Zeeman part.
The fine structure contributions to the matrices are given by (10.14) when the
j values are equal, and zero otherwise. This can be seen from the fact that
the energy changes must be the fine structure ones when there is no magnetic
field; note that j is a good quantum number for the fine structure part, so its
perturbation coefficients involving different j values are zero.
Lamb shift
A famous experiment by Lamb & Retherford in 1947 showed that the hydrogen atom state $n = 2$, $l = 0$, $j = \frac12$, also called the 2S$_{1/2}$ state, has a somewhat different energy than the state $n = 2$, $l = 1$, $j = \frac12$, also called the 2P$_{1/2}$ state.
That was unexpected, because even allowing for the relativistic fine structure
correction, states with the same principal quantum number n and same total
angular momentum quantum number j should have the same energy. The dif-
ference in orbital angular momentum quantum number l should not affect the
energy.
The cause of the unexpected energy difference is called Lamb shift. To ex-
plain why it occurs would require quantum electrodynamics, and that is well
beyond the scope of this book. Roughly speaking, the effect is due to a vari-
ety of interactions with virtual photons and electron/positron pairs. A good qualitative discussion on a nontechnical level is given by Feynman [6].
Here it must suffice to list the approximate energy corrections involved. For states with zero orbital angular momentum, the energy change due to Lamb shift is
$$ E_{n,1,\rm Lamb} = -\frac{\alpha^3}{2n}\,k(n,0)\,E_n \qquad\mbox{if } l = 0 \tag{10.16} $$
where $k(n,0)$ is a numerical factor that varies a bit with $n$ from about 12.7 to 13.2. For states with nonzero orbital angular momentum,
$$ E_{n,1,\rm Lamb} = -\frac{\alpha^3}{2n}\left[k(n,l) \pm \frac{1}{\pi(j+\frac12)(l+\frac12)}\right]E_n \qquad\mbox{if } l \ne 0 \mbox{ and } j = l \pm \tfrac12 \tag{10.17} $$
where $k(n,l)$ is less than 0.05 and varies somewhat with $n$ and $l$.
It follows that the energy change is really small for states with nonzero orbital
angular momentum, which includes the 2P1/2 state. The change is biggest for
the 2S$_{1/2}$ state, the other state in the Lamb & Retherford experiment. (True, the correction would be bigger still for the ground state $n = 1$, but since there are no states with nonzero angular momentum in the ground state, there is no splitting of spectral lines involved there.)
Qualitatively, the reason that the Lamb shift is small for states with nonzero
angular momentum has to do with distance from the nucleus. The nontrivial
effects of the cloud of virtual particles around the electron are most pronounced
in the strong electric field very close to the nucleus. In states of nonzero angular
momentum, the wave function is zero at the nucleus, (A.31). So in those states
the electron is unlikely to be found very close to the nucleus. In states of zero
angular momentum, the square magnitude of the wave function is $1/n^3\pi a_0^3$ at the
nucleus, reflected in both the much larger Lamb shift as well as its approximate
1/n3 dependence on the principal quantum number n.
Hyperfine splitting
Hyperfine splitting of the hydrogen atom energy levels is due to the fact that the nucleus acts as a little magnet just like the electron. The single-proton nucleus and electron have magnetic dipole moments due to their spin equal to
$$ \vec\mu_p = \frac{g_p e}{2m_p}\,\hat{\vec S}_p \qquad\qquad \vec\mu_e = -\frac{g_e e}{2m_e}\,\hat{\vec S}_e $$
in which the g-factor of the proton is about 5.59 and that of the electron 2.
The magnetic moment of the nucleus is much less than the one of the electron,
since the much greater proton mass appears in the denominator. That makes
the energy changes associated with hyperfine splitting really small compared to
other effects such as fine structure.
This discussion will restrict itself to the ground state, which is by far the most important case. For the ground state, there is no orbital contribution to the magnetic field of the electron. There is only a "spin-spin coupling" between the magnetic moments of the electron and proton. The energy involved can be thought of most simply as the energy $-\vec\mu_e\cdot\vec B_p$ of the electron in the magnetic field $\vec B_p$ of the nucleus. If the nucleus is modelled as an infinitesimally small electromagnet, its magnetic field is that of an ideal current dipole as given in table 9.2. The perturbation Hamiltonian then becomes
$$ H_{1,\rm spin\mbox{-}spin} = \frac{g_p g_e e^2}{4 m_e m_p \epsilon_0 c^2}\left[\frac{3(\hat{\vec S}_p\cdot\vec r)(\hat{\vec S}_e\cdot\vec r) - (\hat{\vec S}_p\cdot\hat{\vec S}_e)r^2}{4\pi r^5} + \frac{2(\hat{\vec S}_p\cdot\hat{\vec S}_e)}{3}\,\delta^3(\vec r)\right] $$
The good states are not immediately self-evident, so the four unperturbed ground states will just be taken to be the ones in which the electron and proton spins combine into the triplet or singlet states of chapter 4.5.6:
$$ \mbox{triplet: } \psi_{100}|1\,1\rangle \quad \psi_{100}|1\,0\rangle \quad \psi_{100}|1\,{-1}\rangle \qquad\quad \mbox{singlet: } \psi_{100}|0\,0\rangle $$
or $\psi_{100}|s_{\rm net}\,m_{\rm net}\rangle$ for short, where $s_{\rm net}$ and $m_{\rm net}$ are the quantum numbers of net spin and its z-component. The next step is to evaluate the four by four matrix of Hamiltonian perturbation coefficients
$$ \langle \overline m_{\rm net}\,\overline s_{\rm net}|\psi_{100}|H_{1,\rm spin\mbox{-}spin}\,\psi_{100}|s_{\rm net}\,m_{\rm net}\rangle $$
using these states.
Now the first term in the spin-spin Hamiltonian does not produce a contribution to the perturbation coefficients. The reason is that the inner product of the perturbation coefficients written in spherical coordinates involves an integration over the surfaces of constant $r$. The ground state eigenfunction $\psi_{100}$ is constant on these surfaces. So there will be terms like $3\hat S_{p,x}\hat S_{e,y}\,xy$ in the integration, and those are zero because $x$ is just as much negative as positive on these spherical surfaces, (as is $y$). There will also be terms like $3\hat S_{p,x}\hat S_{e,x}x^2 - \hat S_{p,x}\hat S_{e,x}r^2$ in the integration. These will be zero too because by symmetry the averages of $x^2$, $y^2$, and $z^2$ are equal on the spherical surfaces, each equal to one third the average of $r^2$.
So only the second term in the Hamiltonian survives, and the Hamiltonian perturbation coefficients become
$$ \frac{g_p g_e e^2}{6 m_e m_p \epsilon_0 c^2}\,\langle \overline m_{\rm net}\,\overline s_{\rm net}|\psi_{100}|(\hat{\vec S}_p\cdot\hat{\vec S}_e)\,\delta^3(\vec r)\,\psi_{100}|s_{\rm net}\,m_{\rm net}\rangle $$
The spatial integration in this inner product merely picks out the value $\psi_{100}^2(0) = 1/\pi a_0^3$ at the origin, as delta functions do. That leaves the sum over the spin states. The dot product of the spins can be found by expanding
$$ \hat S_{\rm net}^2 = (\hat{\vec S}_p + \hat{\vec S}_e)\cdot(\hat{\vec S}_p + \hat{\vec S}_e) = \hat S_p^2 + 2\hat{\vec S}_p\cdot\hat{\vec S}_e + \hat S_e^2 $$
to give
$$ \hat{\vec S}_p\cdot\hat{\vec S}_e = \tfrac12\left(\hat S_{\rm net}^2 - \hat S_p^2 - \hat S_e^2\right) $$
The spin states $|s_{\rm net}\,m_{\rm net}\rangle$ are eigenvectors of this operator,
$$ \hat{\vec S}_p\cdot\hat{\vec S}_e\,|s_{\rm net}\,m_{\rm net}\rangle = \tfrac12\hbar^2\left(s_{\rm net}(s_{\rm net}+1) - s_p(s_p+1) - s_e(s_e+1)\right)|s_{\rm net}\,m_{\rm net}\rangle $$
where both proton and electron have spin $s_p = s_e = \frac12$. Since the triplet and singlet spin states are orthonormal, only the Hamiltonian perturbation coefficients for which $\overline s_{\rm net} = s_{\rm net}$ and $\overline m_{\rm net} = m_{\rm net}$ survive, and these then give the leading order changes in the energy.
Plugging it all in and rewriting in terms of the Bohr energy and fine structure constant, the energy changes are:
$$ \mbox{triplet: } E_{1,\rm spin\mbox{-}spin} = \tfrac13 g_p g_e \frac{m_e}{m_p}\,\alpha^2 |E_1| \qquad \mbox{singlet: } E_{1,\rm spin\mbox{-}spin} = -g_p g_e \frac{m_e}{m_p}\,\alpha^2 |E_1| \tag{10.18} $$
The energy of the triplet states is raised and that of the singlet state is lowered. Therefore, in the true ground state, the electron and proton spins combine into the singlet state. If they somehow get kicked into a triplet state, they will eventually transition back to the ground state, say after 10 million years or so, and release a photon. Since the difference between the two energies is so tiny on account of the very small values of both $\alpha^2$ and $m_e/m_p$, this will be a very low energy photon. Its wave length is as big as 0.21 m, producing the 21 cm hydrogen line.
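As an added numerical check (not part of the original text), the triplet-singlet energy difference that follows from (10.18) does indeed correspond to a 0.21 m photon:

```python
# 21 cm line from the hyperfine splitting (10.18); an added check, not
# part of the original text. Rounded constants, energies in eV.
alpha = 1 / 137.036
gp, ge = 5.59, 2.0                 # proton and electron g-factors
me_over_mp = 1 / 1836.2
E1 = 13.606                        # |E_1|

dE = (1/3 + 1) * gp * ge * me_over_mp * alpha**2 * E1   # triplet - singlet
lam = 4.1357e-15 * 2.9979e8 / dE                        # h c / dE
print(f"dE = {dE:.2e} eV, wave length = {lam:.2f} m")   # ~ 5.9e-6 eV, 0.21 m
```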
10.2 Quantum Field Theory in a Nanoshell
The “classical” quantum theory discussed in this book has major difficulties
describing really relativistic effects such as particle creation and destruction.
Einstein’s E = mc2 allows particles to be destroyed as long as their mass times
the square speed of light shows up as energy elsewhere. They can also be
created when enough energy is available. Indeed, as the Dirac equation, section
9.2, first showed, electrons and positrons can annihilate one another, or they
can be created by a very energetic photon near a heavy nucleus.
And quantum field theory is not just for esoteric conditions. The photons
of light are routinely created under normal conditions. Still more basic to an
engineer, so are their equivalents in solids, the phonons. Then there is the band
theory of solids: electrons are “created” within the conduction band, if they
pick up enough energy, or “annihilated” when they lose it. And similarly for
the real-life equivalent of positrons, holes in the valence band.
Such phenomena are routinely described within the framework of quantum
field theory, and almost unavoidably you will run into it in literature, [7, 11].
Electron-phonon interactions are particularly important for engineering applica-
tions, leading to electrical resistance (along with crystal defects and impurities),
and the combination of electrons into Cooper pairs that act as bosons and so
give rise to superconductivity. The intention of this section is to explain enough
of the ideas so that you can recognize it when you see it. What to do about it
after you recognize it is another matter.
Especially the relativistic applications are very involved. To explain quan-
tum field theory in a nutshell takes 500 pages, [21]. You will also need to pick
up linear algebra, tensor algebra, and group theory. However, if you are just
interested in relativistic quantum mechanics from an intellectual point of view,
rather than for practical applications, the answer is all good. Feynman gave
a set of lectures on “quantum electrodynamics” for a general audience around
1983, and the text is readily available at low cost. Here, freed from the con-
straint of his lecture notes to cover a standard fare of material, Feynman truly
gets it right. Without doubt, this is the best exposition of the fundamentals
of quantum mechanics that has ever been written, or ever will. The subject is
reduced to its bare abstract axioms, and no more can be said. If the human
race is still around a millennium or so from now, technology may take care of
the needed details of quantum mechanics. But those who need or want to un-
derstand what it means will still reach for Feynman. The 2006 edition, [6], has
a foreword by Zee that gives a few hints how to relate the basic concepts in the
discussion to more conventional mathematics like the complex numbers found
in this book.
It will not be much help applying quantum field theory to engineering prob-
lems, however. In the absence of 1,000 pages and a willing author, the following
discussion will truly be quantum field theory in a nanoshell. I thank Wikipedia
for the basic approach. As far as the rest is concerned, it has been pieced to-
gether from, in order of importance, [[9]], [20], [[3]], [7, 11]. Any mistakes in
doing so are mine.
10.2.1 Occupation numbers
Consider once more systems of weakly interacting particles like the ones that were studied in section 8.2. The energy eigenfunctions of such a system can be written in terms of whatever are the single-particle energy eigenfunctions $\psi^p_1(\vec r, S_z), \psi^p_2(\vec r, S_z), \ldots$. A completely arbitrary example of such a system eigenfunction for a system of $I = 95$ distinguishable particles is:
$$ \psi^I_q = \psi^p_7(\vec r_1,S_{z1})\,\psi^p_{66}(\vec r_2,S_{z2})\,\psi^p_{86}(\vec r_3,S_{z3})\,\psi^p_{40}(\vec r_4,S_{z4})\,\psi^p_7(\vec r_5,S_{z5})\,\ldots\,\psi^p_{59}(\vec r_{95},S_{z95}) \tag{10.19} $$
This system eigenfunction has an energy that is the sum of the 95 single-particle eigenstate energies involved:
$$ E^I_q = E^p_7 + E^p_{66} + E^p_{86} + E^p_{40} + E^p_7 + \ldots + E^p_{59} $$

Instead of writing out the example eigenfunction mathematically as done in (10.19), it can be graphically depicted as in figure 10.1. In the figure the different types of single-particle states are shown as boxes, and the particles that are in those particular single-particle states are shown inside the boxes. In the example, particle 1 is inside the $\psi^p_7$ box, particle 2 is inside the $\psi^p_{66}$ one, etcetera. It is just the reverse from the mathematical expression (10.19): the mathematical expression shows for each particle in turn what the single-particle eigenstate of that particle is. The figure shows for each single-particle eigenstate in turn what particles are in that eigenstate.

[Figure 10.1: Graphical depiction of an arbitrary system energy eigenfunction for 95 distinguishable particles.]
However, if the 95 particles are identical bosons, (like photons or phonons),
the example mathematical eigenfunction (10.19) and corresponding depiction
figure 10.1 is unacceptable. Eigenfunctions for bosons must be unchanged if two
particles are swapped. As chapter 4.7 explained, in terms of the mathematical
expression (10.19) it means that all wave functions that can be obtained from
(10.19) by swapping particle numbers must be combined together equally into a
single wave function. There may not actually be 95! of them, but there will be a
lot, and there is no way to actually list such a massive mathematical expression
here. It is much easier in terms of the graphical depiction figure 10.1: graphically
all these countless system eigenfunctions differ only with respect to the numbers
in the particles. And since in the final eigenfunction, all particles are present
in exactly the same way, then so are their numbers within the particles; the
numbers no longer add distinguishing information and can be left out. That
makes the graphical depiction of the correct example eigenfunction for a system
of identical bosons as in figure 10.2.
[Figure 10.2: Graphical depiction of an arbitrary system energy eigenfunction for 95 identical bosons.]

For a system of identical fermions, (like electrons, protons, or neutrons,) the eigenfunctions must change sign if two particles are swapped. As chapter 4.7 showed, that means that you cannot create an eigenfunction for a system of 95 fermions from the example eigenfunction (10.19) and the swapped versions of it. Various single-particle eigenfunctions appear multiple times in (10.19), like $\psi^p_7$, which is occupied by particles 1, 5, and 48. A system eigenfunction for 95 identical fermions requires 95 different single-particle eigenfunctions. Graphically, the example figure 10.2, which is fine for a system of identical bosons, is completely unacceptable for a system of identical fermions, because there cannot be more than one particle in a given type of single-particle eigenstate. A depiction of an arbitrary energy eigenfunction that is acceptable for a system of 31 identical fermions is in figure 10.3.
As explained in chapter 4.7, a neat way of writing down the system energy eigenfunction of the pictured example is to form a Slater determinant from the "occupied states"
$$ \psi^p_1, \psi^p_2, \psi^p_3, \ldots, \psi^p_{48}, \psi^p_{59}, \psi^p_{63}. $$
It is good to meet old friends again, isn't it?
Now consider what happens in relativistic quantum mechanics. For example,
suppose an electron and positron annihilate each other. What are you going to
do, leave holes in the argument list of your wave function, where the electron and
positron used to be? Or worse, what if a photon with very high energy hits a heavy nucleus and creates an electron-positron pair in the collision from scratch?
Are you going to scribble in a set of additional arguments for the new particles
into your mathematical wave function? Scribble in another row and column in
the Slater determinant for your electrons? That is voodoo mathematics.
And if positrons are too weird for you, consider photons, the particles of electromagnetic radiation, like ordinary light.

[Figure 10.3: Graphical depiction of an arbitrary system energy eigenfunction for 31 identical fermions.]

As section 8.14.5 showed, the

electrons in hot surfaces create and destroy photons readily when the thermal
equilibrium shifts. Moving at the speed of light, with zero rest mass, photons are
as relativistic as they come. Good luck scribbling in trillions of new arguments
for the photons into your wave function when your black box heats up. Then
there are solids; as section 8.14.6 showed, the phonons of crystal vibrational
waves are the equivalent of the photons of electromagnetic waves.
One of the key insights of quantum field theory is to do away with classical
mathematical forms of the wave function such as (10.19) or the Slater determi-
nants. Instead, the graphical depictions, such as the examples in figures 10.2
and 10.3, are captured in terms of mathematics. How do you do that? By listing
how many particles are in each type of single-particle state, in other words, by
listing the single-state “occupation numbers.”
Consider the example bosonic eigenfunction of figure 10.2. The occupation numbers for that state would be
$$ \vec\imath = |2, 3, 1, 3, 3, 1, 3, 2, 1, 1, 3, 1, 2, 2, 0, 3, 2, 2, 2, 1, 0, 2, \ldots\rangle $$
indicating that there are two bosons in single-particle state $\psi^p_1$, three in $\psi^p_2$, one in $\psi^p_3$, etcetera. Knowing those numbers is completely equivalent to knowing the classical system energy eigenfunction; it could be reconstructed from them. Similarly, the occupation numbers for the example fermionic eigenfunction of figure 10.3 would be
$$ \vec\imath = |1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, \ldots\rangle $$
Such sets of occupation numbers are called “Fock states.” Each describes one
system energy eigenfunction.
The most general wave function for a set of I particles is a linear combination
of all the Fock states whose occupation numbers add up to I. In relativistic
applications like photons in a box, there is no constraint on the number of
particles and all states are possible. The set of all possible wave functions that
can be formed from linear combinations of all the Fock states regardless of
number of particles is called the “Fock space.”
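The bookkeeping is simple enough to capture in a few lines of code. The following added sketch (the function name is made up for illustration, not from the book) turns a classical assignment of numbered particles to single-particle states into the occupation numbers of the corresponding Fock state:

```python
# From numbered particles to occupation numbers; an illustrative sketch,
# not part of the original text.
from collections import Counter

def occupation_numbers(particle_states, n_types):
    """Given 'particle k is in single-particle state particle_states[k]',
    return the occupation numbers i_1, i_2, ..., i_{n_types}."""
    counts = Counter(particle_states)
    return [counts.get(n, 0) for n in range(1, n_types + 1)]

# First five particles of the example (10.19): states 7, 66, 86, 40, 7, ...
print(occupation_numbers([7, 66, 86, 40, 7], 10))
# -> [0, 0, 0, 0, 0, 0, 2, 0, 0, 0]; the particle numbering has disappeared,
#    only how many particles sit in each single-particle state is kept.
```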
How about the case of distinguishable particles as in figure 10.1? In that
case, the numbers inside the particles also make a difference, so where do they
go?? The answer of quantum field theory is to deny the existence of generic
particles that take numbers. There are no generic particles in quantum field
theory. There is a field of electrons, there is a field of protons, (or quarks,
actually), there is a field of photons, etcetera, and each of these fields is granted
its own set of occupation numbers. There is no way to describe a generic particle
using a number. For example, if there is an electron in a single particle state,
in quantum field theory it means that the “electron field” has a one-particle
excitation at that energy state; no particle numbers are involved.
Some physicists feel that this is a strong point in favor of believing that quan-
tum field theory is the way nature really works. In the classical formulation of
quantum mechanics, the (anti)symmetrization requirements are an additional
ingredient, added to explain the data. In quantum field theory, it comes nat-
urally: particles that are not indistinguishable simply cannot be described by
the formalism. Still, our convenience in describing it is an uncertain motivator
for nature.
The successful analysis of the blackbody spectrum in section 8.14.5 already
testified to the usefulness of the Fock space. If you check the derivations leading
to it, they were all conducted based on occupation numbers. A classical wave
function for a system of photons was never written down; in fact, that cannot
be done.
[Figure 10.4: Example wave functions for a system with just one type of single particle state. Left: identical bosons; right: identical fermions.]

There is a lot more involved in quantum field theory than just the blackbody
spectrum, of course. To explain some of the basic ideas, simple examples can
be helpful. The simplest example that can be studied involves just one type
of single-particle state, say just a single-particle ground state. The graphical
depiction of an arbitrary example wave function is then as in figure 10.4. In
nonrelativistic quantum mechanics, it would be a completely trivial quantum
system. In the case of identical bosons, all I of them would have to go into the
only type of state there is. In the case of identical fermions, there can only be
one fermion, and it has to go into the only state there is.
But when particles can be created or destroyed, things get more interesting. When there is no given number of particles, there can be any number of identical bosons within that single particle state. That allows $|0\rangle$ (no particles), $|1\rangle$ (1 particle), $|2\rangle$ (2 particles), etcetera. And the general wave function can be a linear combination of those possibilities. It is the same for identical fermions, except that there are now only the states $|0\rangle$ (no particles) and $|1\rangle$ (1 particle).
A relativistic system with just one type of single-particle state does seem very artificial, raising the question how esoteric the example is. But there are in fact two very well established classical systems that behave just like this:
1. The one-dimensional harmonic oscillator has energy levels that happen to be exactly equally spaced. It can pick up an energy above the ground state that is any whole multiple of $\hbar\omega$, where $\omega$ is its angular frequency. If you are willing to accept the "particles" to be quanta of energy of size $\hbar\omega$, then it provides a model of a bosonic system with just one single-particle state. Any whole number of particles can go into that state, each contributing energy $\hbar\omega$ to the system. (The half particle representing the ground state energy is in this interpretation considered to be a built-in part of the single-particle-state box in figure 10.4.) Reformulating the results of chapter 2.6.2 in quantum field theory terms: the harmonic oscillator ground state $h_0$ is the state $|0\rangle$ with zero particles, the excited state $h_1$ is the state $|1\rangle$ with one particle $\hbar\omega$, the excited state $h_2$ is the state $|2\rangle$ with two particles $\hbar\omega$, etcetera. The general wave function, either way, is a linear combination of these states, expressing an uncertainty in energy. Oops, excuse very much, an uncertainty in the number of these energy particles!
2. A single electron has exactly two spin states. It can pick up exactly one unit $\hbar$ of z-angular momentum above the spin-down state. If you accept the "particles" to be single quanta of z-angular momentum of size $\hbar$, then it provides an example of a fermionic system with just one single-particle state. There can be either zero or one quantum $\hbar$ of angular momentum in the single-particle state. The general wave function is a linear combination of the state with one angular momentum "particle" and the state with no angular momentum "particle". This example admittedly is quite poor, since normally when you talk about a particle, you talk about an amount of energy, like in Einstein's mass-energy relation. If it bothers you, think of the electron as being confined inside a magnetic field; then the spin-up state is associated with a corresponding increase in energy.

While the above two examples of “relativistic” systems with only one single-
particle state are obviously made up, they do provide a very valuable sanity
check on any relativistic analysis.
Not only that, the two examples are also very useful to understand the difference between a zero wave function and the so-called "vacuum state"
$$ |\vec 0\rangle \equiv |0, 0, 0, \ldots\rangle \tag{10.20} $$
in which all occupation numbers are zero. The vacuum state is a normalized, nonzero, wave function just like the other possible sets of occupation numbers; it describes that there are no particles with certainty. You can see it from the two examples above: for the harmonic oscillator, the state $|0\rangle$ is the ground state $h_0$; for the electron-spin example, it is the spin-down state. These are completely normal eigenstates that the system can be in. They are not zero wave functions, which would be unable to give any probabilities.
10.2.2 Annihilation and creation operators
The key to relativistic quantum mechanics is that particles can be annihilated or created. So it may not be surprising that it is very helpful to define operators that "annihilate" and "create" particles.
To keep the notations relatively simple, it will initially be assumed that there
is just one type of single particle state. Graphically that means that there is
just one single particle state box, like in figure 10.4. However, there can be an
arbitrary number of particles in that box.
Definition

The desired actions of the creation and annihilation operators are sketched in figure 10.5. An annihilation operator $\hat a$ turns a state $|i\rangle$ with $i$ particles into a state $|i-1\rangle$ with $i-1$ particles, and a creation operator $\hat a^\dagger$ turns a state $|i\rangle$ with $i$ particles into a state $|i+1\rangle$ with $i+1$ particles.

Mathematically, the operators are defined by the relations
$$ \hat a|i\rangle = \alpha_i|i-1\rangle \mbox{ but } \hat a|0\rangle = 0 \qquad \hat a^\dagger|i\rangle = \alpha^\dagger_i|i+1\rangle \mbox{ but } \hat a^\dagger|1\rangle = 0 \mbox{ for fermions} \tag{10.21} $$
where the $\alpha_i$ and $\alpha^\dagger_i$ are numerical constants still to be chosen. To avoid having to write the special cases separately each time, $\alpha_0$ will be defined to be zero and, if it is a fermionic system, so will $\alpha^\dagger_1$; then you do not really need to worry about the fact that $|-1\rangle$ and $|2\rangle$ do not exist.
Note that it is mathematically perfectly OK to define linear operators by specifying what they do to the basis states of a system. But you must hope that they will turn out to be operators that are mathematically helpful. To help achieve that, you want to choose the numerical constants appropriately.

[Figure 10.5: Annihilation and creation operators for a system with just one type of single particle state. Left: identical bosons; right: identical fermions.]

Consider what happens if the operators are applied in sequence:

$$ \hat a^\dagger\hat a|i\rangle = \alpha^\dagger_{i-1}\alpha_i|i\rangle $$

Reading from right to left, which is the order in which the operators act on the state: first $\hat a$ destroys a particle, then $\hat a^\dagger$ restores it again. It gives the same state back, except for the numerical factor $\alpha^\dagger_{i-1}\alpha_i$. That makes every state $|i\rangle$ an eigenvector of the operator $\hat a^\dagger\hat a$ with eigenvalue $\alpha^\dagger_{i-1}\alpha_i$.

If the constants $\alpha^\dagger_{i-1}$ and $\alpha_i$ are chosen to make the eigenvalue a real number, then the operator $\hat a^\dagger\hat a$ will be Hermitian. More specifically, if they are chosen to make the eigenvalue equal to $i$, then $\hat a^\dagger\hat a$ will be the "particle number operator" whose eigenvalues are the number of particles in the single-particle state. The most logical choice for the constants to achieve that is clearly
$$ \alpha_i = \sqrt{i} \quad \alpha^\dagger_{i-1} = \sqrt{i} \qquad\Longrightarrow\qquad \alpha^\dagger_i = \sqrt{i+1} \mbox{ except } \alpha^\dagger_1 = 0 \mbox{ for fermions} \tag{10.22} $$
This choice of constants is particularly convenient since it makes the operators $\hat a$ and $\hat a^\dagger$ Hermitian conjugates. That means that if you take them to the other side in an inner product, they turn into each other:
$$ \Big\langle |\overline\imath\rangle\Big|\hat a|i\rangle\Big\rangle = \Big\langle \hat a^\dagger|\overline\imath\rangle\Big||i\rangle\Big\rangle \qquad\qquad \Big\langle |\overline\imath\rangle\Big|\hat a^\dagger|i\rangle\Big\rangle = \Big\langle \hat a|\overline\imath\rangle\Big||i\rangle\Big\rangle $$
To see why, first note that states with different occupation numbers are taken to be orthonormal in Fock space. If the total number of particles is given, that follows from the classical form of the wave function. And the simple harmonic oscillator and spin examples of the previous subsection illustrate that it still applies when particles can be created or destroyed: these examples were just rewrites of orthonormal wave functions.

It follows that in the first equality above, the inner products are only nonzero if $i = \overline\imath + 1$: after lowering the particle number with $\hat a$, or raising it with $\hat a^\dagger$, the particle numbers must be the same at both sides of the inner product. When $i = \overline\imath + 1$, $\alpha_i = \alpha^\dagger_{\overline\imath} = \sqrt{i}$ so the equality still applies. The second equality is just the complex conjugate of the first, with a change in notations. It remains true for fermions, despite the fact that $\alpha^\dagger_{f1} = 0$ instead of $\sqrt2$, because there is no $|2\rangle$ state for which it would make a difference. Also, if it is true for the basis states, it is true for any combination of them.
You may well wonder why $\hat a^\dagger\hat a$ is the particle count operator; why not $\hat a\hat a^\dagger$? The reason is that $\hat a\hat a^\dagger$ would not work for the state $|0\rangle$ unless you took $\alpha^\dagger_0$ or $\alpha_1$ zero, and then they could no longer create or annihilate the corresponding state. Still, it is interesting to see what the effect of $\hat a\hat a^\dagger$ is; according to the chosen definitions, for bosons
$$ \hat a_b\hat a^\dagger_b|i\rangle = \sqrt{i+1}\,\sqrt{i+1}\,|i\rangle $$
So the operator $\hat a_b\hat a^\dagger_b$ has eigenvalues one greater than the number of particles. That means that if you subtract $\hat a_b\hat a^\dagger_b$ and $\hat a^\dagger_b\hat a_b$, you get the unit operator that leaves all states unchanged. And the difference between $\hat a_b\hat a^\dagger_b$ and $\hat a^\dagger_b\hat a_b$ is by definition the commutator of $\hat a_b$ and $\hat a^\dagger_b$, indicated by square brackets:
$$ [\hat a_b, \hat a^\dagger_b] \equiv \hat a_b\hat a^\dagger_b - \hat a^\dagger_b\hat a_b = 1 \tag{10.23} $$

Isn't that cute! Of course, $[\hat a_b, \hat a_b]$ and $[\hat a^\dagger_b, \hat a^\dagger_b]$ are zero since everything commutes with itself.

It does not work for fermions, because $\alpha^\dagger_{f1} = 0$ instead of $\sqrt2$. But for fermions, the only state for which $\hat a_f\hat a^\dagger_f$ produces something nonzero is $|0\rangle$ and then it leaves the state unchanged. Similarly, the only state for which $\hat a^\dagger_f\hat a_f$ produces something nonzero is $|1\rangle$ and then it leaves that state unchanged. That means that if you add $\hat a_f\hat a^\dagger_f$ and $\hat a^\dagger_f\hat a_f$ together, it reproduces the same state whether it is $|0\rangle$ or $|1\rangle$ (or any combination of them). The sum of $\hat a_f\hat a^\dagger_f$ and $\hat a^\dagger_f\hat a_f$ is by definition called the "anticommutator" of $\hat a_f$ and $\hat a^\dagger_f$ and is indicated by curly brackets:
$$ \{\hat a_f, \hat a^\dagger_f\} \equiv \hat a_f\hat a^\dagger_f + \hat a^\dagger_f\hat a_f = 1 \tag{10.24} $$
Isn't that neat? Note also that $\{\hat a_f, \hat a_f\}$ and $\{\hat a^\dagger_f, \hat a^\dagger_f\}$ are zero since applying either operator twice ends up in a non-existing state.
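These defining relations are easy to verify with explicit matrices. The following added sketch (not from the book) represents the bosonic operators, truncated to a finite number of particles, and the exact two by two fermionic ones:

```python
# Matrix check of (10.21)-(10.24); an added illustration, not from the book.
import numpy as np

# Bosons, truncated to the states |0> ... |N-1>:
N = 6
a_b = np.diag(np.sqrt(np.arange(1, N)), 1)   # a|i> = sqrt(i) |i-1>
ad_b = a_b.T                                 # a†|i> = sqrt(i+1) |i+1>

print(np.diag(ad_b @ a_b))                   # number operator: 0, 1, ..., N-1
print(np.diag(a_b @ ad_b - ad_b @ a_b))      # commutator = 1 everywhere,
                                             # except the last entry, which
                                             # is a truncation artifact

# Fermions, basis |0>, |1>:
a_f = np.array([[0., 1.], [0., 0.]])         # a|1> = |0>, a|0> = 0
ad_f = a_f.T                                 # a†|0> = |1>, a†|1> = 0
print(a_f @ ad_f + ad_f @ a_f)               # anticommutator = unit matrix
```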
How about the Hamiltonian? Well, for noninteracting particles the energy of $i$ particles is $i$ times the single particle energy $E^p$. And since the operator that gives the number of particles is $\hat a^\dagger\hat a$, that is $E^p\hat a^\dagger\hat a$. So, the total Hamiltonian for noninteracting particles becomes:
$$ H = E^p\,\hat a^\dagger\hat a + E_{\rm ve} \tag{10.25} $$
where $E_{\rm ve}$ stands for the vacuum, or ground state, energy of the system where there are no particles. This then allows the Schrödinger equation to be written in terms of occupation numbers and creation and annihilation operators.

The caHermitians

It is important to note that the creation and annihilation operators are not Hermitian, and therefore cannot correspond to physically measurable quantities. But since they are Hermitian conjugates, it is easy to form Hermitian operators from them:
$$ \hat P \equiv \tfrac12(\hat a^\dagger + \hat a) \qquad \hat Q \equiv \tfrac12 i(\hat a^\dagger - \hat a) \tag{10.26} $$
Conversely, the creation and annihilation operators can be written as
$$ \hat a = \hat P + i\hat Q \qquad \hat a^\dagger = \hat P - i\hat Q \tag{10.27} $$
For lack of a better name that the author knows of, this book will call $\hat P$ and $\hat Q$ the caHermitians.
For bosons, the following commutators follow from the ones for the creation and annihilation operators:
$$ [\hat P_b, \hat Q_b] = \tfrac12 i \qquad [\hat a^\dagger_b\hat a_b, \hat P_b] = -i\hat Q_b \qquad [\hat a^\dagger_b\hat a_b, \hat Q_b] = i\hat P_b \tag{10.28} $$
Therefore $\hat P_b$ and $\hat Q_b$ neither commute with each other, nor with the Hamiltonian. It follows that whatever physical variables they may stand for will not be certain at the same time, and will develop uncertainty in time if initially certain.
The Hamiltonian (10.25) for noninteracting particles may be written in terms of the caHermitians using (10.27) and (10.28), to give
$$ H = E^p\left(\hat P_b^2 + \hat Q_b^2 - \tfrac12\right) + E_{\rm ve} \tag{10.29} $$
Often the energy turns out to be simply proportional to $\hat P_b^2 + \hat Q_b^2$; then the vacuum energy must be half a particle.
For fermions, the following useful relations follow from the anticommutators for the creation and annihilation operators
$$ \hat P_f^2 = \tfrac14 \qquad \hat Q_f^2 = \tfrac14 \tag{10.30} $$
The Hamiltonian then becomes
$$ H = E^p\left(i[\hat P_f, \hat Q_f] + \tfrac12\right) + E_{\rm ve} \tag{10.31} $$
Examples
It is interesting to see how these ideas work out for the two example systems
with just one single-particle state as described at the end of subsection 10.2.1.
Consider first the example of bosons that are energy quanta of a one-dimen-
sional harmonic oscillator. The following discussion will derive the harmonic
oscillator solution from scratch using the creation and annihilation operators.
It provides an alternative to the much more algebraic derivation of chapter 2.6
and its note {A.12}.
The classical Hamiltonian can be written, in the notations of chapter 2.6,
$$ H = \frac{1}{2m}\hat p^2 + \tfrac12 m\omega^2\hat x^2 $$
or in terms of $\hbar\omega$:
$$ H = \hbar\omega\left(\frac{\hat p^2}{2\hbar m\omega} + \frac{m\omega\hat x^2}{2\hbar}\right) $$
From comparison with (10.29), it looks like maybe the caHermitians are
$$ \hat P = \sqrt{\frac{m\omega}{2\hbar}}\,\hat x \qquad \hat Q = \sqrt{\frac{1}{2\hbar m\omega}}\,\hat p $$
For now just define them that way; also define $E_{\rm ve} = \tfrac12\hbar\omega$ and
$$ \hat a = \hat P + i\hat Q \qquad \hat a^\dagger = \hat P - i\hat Q $$

It has not yet been shown that $\hat a$ and $\hat a^\dagger$ are annihilation and creation operators. Nor that the Hamiltonian can be written in terms of them, instead of using $\hat P$ and $\hat Q$. However, the commutator $[\hat P, \hat Q]$ is according to the canonical commutator $\tfrac12 i$, which is just as it should be. That in turn implies that $[\hat a, \hat a^\dagger] = 1$. Then the Hamiltonian can indeed be written as
$$ H = \hbar\omega(\hat a^\dagger\hat a + \tfrac12) $$

To see whether $\hat a^\dagger$ is a creation operator, apply the Hamiltonian on a state $\hat a^\dagger|i\rangle$ where $|i\rangle$ is some arbitrary energy state whose further properties are still unknown:
$$ H\,\hat a^\dagger|i\rangle = \hbar\omega(\hat a^\dagger\hat a\hat a^\dagger + \tfrac12\hat a^\dagger)|i\rangle $$
But $\hat a\hat a^\dagger$ can be written as $\hat a^\dagger\hat a + 1$ because of the unit commutator, so
$$ H\,\hat a^\dagger|i\rangle = \hbar\omega\,\hat a^\dagger(\hat a^\dagger\hat a + \tfrac12)|i\rangle + \hbar\omega\,\hat a^\dagger|i\rangle $$
The first term is just $\hat a^\dagger$ times the Hamiltonian applied on $|i\rangle$, so this term multiplies $\hat a^\dagger|i\rangle$ with the energy $E_i$ of the original $|i\rangle$ state. The second term

multiplies ab† |ii also by a constant, but now h̄ω. It follows that ab† |ii is an energy
eigenstate like |ii, but with an energy that is one quantum h̄ω higher. That does
assume that ab† |ii is nonzero, because eigenfunctions may not be zero. Similarly,
it is seen that ab|ii is an energy eigenstate with one quantum h̄ω less in energy,
if nonzero. So ab† and ab are indeed creation and annihilation operators.
Keep applying ab on the state |ii to lower its energy even more. This must
eventually terminate in zero because the energy cannot become negative. (That
assumes that the eigenfunctions are mathematically reasonably well behaved, as
the solution of chapter 2.6 verifies they are. That can also be seen without using
these solutions, so it is not cheating.) Call the final nonzero state |0i. Solve the
fairly trivial equation ab|0i = 0 to find the lowest energy state |0i = h0 (x) and
note that it is unique. (And that is the same one as derived in chapter 2.6.)
Use ab† to go back up the energy scale and find the other energy states |ii = hi
for i = 1 ,2, 3, . . . Verify using the Hamiltonian that going down in energy with
ab and then up again with ab† brings you back to a multiple of the original state,
not to some other state with the same energy or to zero. Conclude therefor that
all energy states have now been found. And that their energies are spaced apart
by whole multiples of the quantum h̄ω.
While doing this, it is convenient to know not just that $\hat a|i\rangle$ produces a multiple of the state $|i-1\rangle$, but also what multiple $\alpha_i$ that is; similarly, $\hat a^\dagger|i\rangle$ produces a multiple $\alpha_i^\dagger$ of the state $|i+1\rangle$. Now the $\alpha_i$ can be made real and positive by a suitable choice of the normalization factors of the various energy eigenstates $|i\rangle$. Then the $\alpha_i^\dagger$ are positive too, because $\alpha_{i-1}^\dagger\alpha_i$ produces the number of energy quanta in the Hamiltonian. The magnitude can be deduced from the square norm of the state produced. In particular, for $\hat a|i\rangle$:
$$\alpha_i^*\alpha_i = \Big\langle \hat a|i\rangle\,\Big|\,\hat a|i\rangle\Big\rangle = \Big\langle i\,\Big|\,\hat a^\dagger\hat a\,\Big|\,i\Big\rangle = i$$
the first because the Hermitian conjugate of $\hat a$ is $\hat a^\dagger$, and the latter because $\hat a^\dagger\hat a$ must give the number of quanta. So $\alpha_i = \sqrt{i}$, and then $\alpha_i^\dagger = \sqrt{i+1}$ from the above. The harmonic oscillator has been solved. This derivation using the ideas of quantum field theory is much neater than the classical one; just compare the very logical story above to the algebraic mess in note {A.12}.
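For readers who like to see the algebra confirmed numerically, here is a minimal sketch, not part of the original derivation, that represents $\hat a$ as a truncated matrix and checks the relations just found; the truncation size $N$ and the units are arbitrary choices of mine.

```python
import numpy as np

# Hedged numerical sketch: represent the annihilation operator as a truncated
# N x N matrix on the states |0>, |1>, ..., |N-1>, using a|i> = sqrt(i)|i-1>,
# and check the key relations derived above. N and the units are arbitrary.
N = 8
a = np.diag(np.sqrt(np.arange(1, N)), k=1)    # <i-1| a |i> = sqrt(i)
adag = a.conj().T                             # creation operator
hbar_omega = 1.0                              # units with h-bar omega = 1

H = hbar_omega * (adag @ a + 0.5 * np.eye(N))

# [a, a†] = 1, exactly except at the top level lost to the truncation
comm = a @ adag - adag @ a
print(np.allclose(comm[:-1, :-1], np.eye(N - 1)))       # True

# energies (i + 1/2) h-bar omega, spaced one quantum apart
print(np.diag(H))                                       # [0.5 1.5 2.5 ...]

# a†|2> is again an eigenstate, one quantum higher: H (a†|2>) = 3.5 (a†|2>)
ket2 = np.zeros(N); ket2[2] = 1.0
print(np.allclose(H @ (adag @ ket2), 3.5 * (adag @ ket2)))   # True
```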
It should be noted, however, that a completely equivalent derivation can be
given using the classical description of the harmonic oscillator. Many books do
in fact do it that way, e.g. [10]. In the classical treatment, the creation and
annihilation operators are called the “ladder” operators. But without the ideas
of quantum field theory, it is difficult to justify the ladder operators by anything
better than as a weird mathematical trick that just turns out to work.
If you have read the advanced section on angular momentum, the example system for fermions is also interesting. In that model system, the Hamiltonian is a multiple of the angular momentum in the $z$-direction of an electron. The state $|0\rangle$ is the spin-down state $\downarrow$ and the state $|1\rangle$ is the spin-up state $\uparrow$. Now the annihilation operator must turn $\uparrow$ into $\downarrow$ and $\downarrow$ into zero. In terms of the so-called Pauli spin matrices of section 9.1.7, the operator that does that is $\frac{1}{2}(\sigma_x - i\sigma_y)$. Similarly, the creation operator is $\frac{1}{2}(\sigma_x + i\sigma_y)$. That makes the caHermitians $\frac{1}{2}\sigma_x$ and $-\frac{1}{2}\sigma_y$. The commutator $i[\hat P,\hat Q]$ that appears in the Hamiltonian is then $\frac{1}{2}\sigma_z$, which is a multiple of the angular momentum in the $z$-direction as it should be.
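These claims are equally easy to verify numerically. The sketch below is my own illustration, with $\hbar$ set to one and the usual convention that spin-up is the first basis vector; it checks that $\frac12(\sigma_x - i\sigma_y)$ indeed lowers $\uparrow$ and annihilates $\downarrow$, and that $i[\hat P,\hat Q]$ comes out as $\frac12\sigma_z$.

```python
import numpy as np

# Hedged sketch: check the fermionic example, with spin-up = |1> = (1,0)
# and spin-down = |0> = (0,1), the usual Pauli matrix convention.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

a = 0.5 * (sx - 1j * sy)        # proposed annihilation operator
P, Q = 0.5 * sx, -0.5 * sy      # the caHermitians, a = P + iQ

up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)

print(np.allclose(a @ up, down), np.allclose(a @ down, 0))   # True True
print(np.allclose(1j * (P @ Q - Q @ P), 0.5 * sz))           # True
```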

More single-particle states

Now consider the case that there is more than one type of single-particle state. Graphically there is now more than one particle box, as in figures 10.2 and 10.3. Then an annihilation operator $\hat a_n$ and a creation operator $\hat a^\dagger_n$ must be defined for each type of single-particle state $\psi^{\rm p}_n$. In other words, there is one for each occupation number $i_n$. The mathematical definition of these operators for bosons is
$$\hat a_{b,n}|i_1,i_2,\ldots,i_{n-1},i_n,i_{n+1},\ldots\rangle = \sqrt{i_n}\,|i_1,i_2,\ldots,i_{n-1},i_n-1,i_{n+1},\ldots\rangle$$
$$\hat a^\dagger_{b,n}|i_1,i_2,\ldots,i_{n-1},i_n,i_{n+1},\ldots\rangle = \sqrt{i_n+1}\,|i_1,i_2,\ldots,i_{n-1},i_n+1,i_{n+1},\ldots\rangle \tag{10.32}$$
The commutation relations are
$$\big[\hat a_{b,n},\hat a_{b,\bar n}\big] = 0 \qquad \big[\hat a^\dagger_{b,n},\hat a^\dagger_{b,\bar n}\big] = 0 \qquad \big[\hat a_{b,n},\hat a^\dagger_{b,\bar n}\big] = \delta_{n\bar n} \tag{10.33}$$
Here $\delta_{n\bar n}$ is the Kronecker delta, equal to one if $n = \bar n$, and zero in all other cases. These commutator relations apply for $n \ne \bar n$ because then the operators do unrelated things to different single-particle states, so it does not make a difference in which order you apply them. For example, $\hat a_{b,n}\hat a_{b,\bar n} = \hat a_{b,\bar n}\hat a_{b,n}$, so the commutator $\hat a_{b,n}\hat a_{b,\bar n} - \hat a_{b,\bar n}\hat a_{b,n}$ is zero. For $n = \bar n$, they are unchanged from the case of just one single-particle state.
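As an aside, the relations (10.33) are easy to see in a concrete representation. In the hedged sketch below, my own construction rather than the book's, operators for two single-particle states act on a Kronecker-product Fock space, so operators for different states commute automatically.

```python
import numpy as np

# Hedged sketch: two boson modes, each truncated to N levels. An operator
# for mode 1 is (single-mode matrix) x (identity), and vice versa for mode 2.
N = 5
a1m = np.diag(np.sqrt(np.arange(1, N)), k=1)   # single-mode annihilation
I = np.eye(N)
a_1 = np.kron(a1m, I)                          # acts on mode 1 only
a_2 = np.kron(I, a1m)                          # acts on mode 2 only

def comm(A, B):
    return A @ B - B @ A

print(np.allclose(comm(a_1, a_2), 0))              # different states: zero
print(np.allclose(comm(a_1, a_2.conj().T), 0))     # different states: zero
# same state: the unit commutator, except at the truncated top level
c11 = comm(a_1, a_1.conj().T)
print(np.allclose(c11, np.kron(np.diag([1.0] * (N - 1) + [1.0 - N]), I)))
```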
For fermions it is a bit more complex. The graphical representation of the example fermionic energy eigenfunction figure 10.3 cheats a bit, because it suggests that there is only one classical wave function for a given set of occupation numbers. Actually, there are two logical ones, based on how the particles are ordered; the two are the same except that they have the opposite sign. Suppose that you create a particle in a state $n$; classically you would want to call that particle 1, and then create a particle in a state $\bar n$; classically you would want to call it particle 2. Do the particle creation in the opposite order, and it is particle 1 that ends up in state $\bar n$ and particle 2 that ends up in state $n$. That means that the classical wave function will have changed sign, but the occupation-number wave function will not unless you do something. What you can do is define the annihilation and creation operators for fermions as follows:
$$\hat a_{f,n}|i_1,i_2,\ldots,i_{n-1},0,i_{n+1},\ldots\rangle = 0$$
$$\hat a_{f,n}|i_1,i_2,\ldots,i_{n-1},1,i_{n+1},\ldots\rangle = (-1)^{i_1+i_2+\cdots+i_{n-1}}\,|i_1,i_2,\ldots,i_{n-1},0,i_{n+1},\ldots\rangle$$
$$\hat a^\dagger_{f,n}|i_1,i_2,\ldots,i_{n-1},0,i_{n+1},\ldots\rangle = (-1)^{i_1+i_2+\cdots+i_{n-1}}\,|i_1,i_2,\ldots,i_{n-1},1,i_{n+1},\ldots\rangle$$
$$\hat a^\dagger_{f,n}|i_1,i_2,\ldots,i_{n-1},1,i_{n+1},\ldots\rangle = 0 \tag{10.34}$$
The only difference from the annihilation and creation operators for just one type of single-particle state is the potential sign changes due to the $(-1)^{\ldots}$ factor. It adds a minus sign whenever you swap the order of annihilating/creating two particles in different states. For the annihilation and creation operators of the same state, it may change both their signs, but that does nothing much: it leaves the important products such as $\hat a^\dagger_n\hat a_n$ and the anticommutators unchanged.

Of course, you can define the annihilation and creation operators with whatever sign you want, but putting in the sign pattern above may produce easier mathematics. In fact, there is an immediate benefit already for the anticommutator relations; they take the same form as for bosons, except with anticommutators instead of commutators:
$$\big\{\hat a_{f,n},\hat a_{f,\bar n}\big\} = 0 \qquad \big\{\hat a^\dagger_{f,n},\hat a^\dagger_{f,\bar n}\big\} = 0 \qquad \big\{\hat a_{f,n},\hat a^\dagger_{f,\bar n}\big\} = \delta_{n\bar n} \tag{10.35}$$
These relationships apply for $n \ne \bar n$ exactly because of the sign change caused by swapping the order of the operators. For $n = \bar n$, they are unchanged from the case of just one single-particle state.
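The sign factor in (10.34) can also be tested by brute force. The following hedged sketch, my own and only for a handful of states, builds the matrices of the fermionic operators directly from (10.34) and confirms the anticommutators (10.35).

```python
import numpy as np
from itertools import product

# Hedged sketch: fermionic operators per (10.34) on occupation-number kets
# for M single-particle states; then check the anticommutators (10.35).
M = 3
basis = list(product((0, 1), repeat=M))        # all occupation-number kets
index = {b: i for i, b in enumerate(basis)}

def a_f(n):
    """Matrix of the annihilation operator for state n, following (10.34)."""
    A = np.zeros((len(basis), len(basis)))
    for b in basis:
        if b[n] == 1:
            sign = (-1) ** sum(b[:n])          # the (-1)^(i1+...+i_{n-1})
            new = b[:n] + (0,) + b[n + 1:]
            A[index[new], index[b]] = sign
    return A

def anti(A, B):
    return A @ B + B @ A

ops = [a_f(n) for n in range(M)]
for n in range(M):
    for m in range(M):
        # ops[m].T is the creation operator; the matrices are real
        assert np.allclose(anti(ops[n], ops[m]), 0)
        target = np.eye(len(basis)) if n == m else 0
        assert np.allclose(anti(ops[n], ops[m].T), target)
print("anticommutators (10.35) check out")
```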
The Hamiltonian for a system of noninteracting particles is now found by summing over all types of single-particle states:
$$H = \sum_n \left(E^{\rm p}_n\,\hat a^\dagger_n\hat a_n + E_{{\rm ve},n}\right) \tag{10.36}$$
This Hamiltonian preserves the number of particles in each state; particles destroyed by the annihilation operator are immediately restored by the creation operator. Phrased differently, this Hamiltonian commutes with the operator $\hat a^\dagger_n\hat a_n$ giving the number of particles in a state $\psi^{\rm p}_n$, so energy eigenstates can be taken to be states with given numbers of particles in each single-particle state. Such systems can be described by classical quantum mechanics. But the next subsection will give an example of particles that do interact, making quantum field theory essential.

10.2.3 Quantization of radiation


In the discussion of the emission and absorption of radiation in chapter 5.2.3, the
electromagnetic field was the classical Maxwell one. However, that cannot be
right. According to the Planck-Einstein relation, the electromagnetic field comes
in discrete quanta of energy h̄ω called photons. A classical electromagnetic field
cannot explain that.
The electromagnetic field must be described by operators that act nontriv-
ially on a wave function that includes photons. To identify these operators will
involve two steps. First a more careful look needs to be taken at the classical
electromagnetic field, and in particular at its energy. By comparing that with
the Hamiltonian in terms of creation and annihilation operators, as given in the
previous section, the operators corresponding to the electromagnetic field can
then be inferred.
To make the process more intuitive, it helps to initially assume that the
electromagnetic field is not in the form of a traveling wave, but of radiation
confined to a suitable box of volume V .

Classical energy
The discussion on emission and absorption of radiation in chapter 5.2.3 assumed the electromagnetic field to be the single traveling wave
$$\vec E = \hat k E_0\cos(\omega t - \phi - ky) \qquad c\vec B = \hat\imath E_0\cos(\omega t - \phi - ky)$$
where $\vec E$ and $\vec B$ are the electric and magnetic fields, respectively, $\omega$ is the frequency, $k$ is the wave number $\omega/c$, and $\phi$ is an arbitrary constant phase angle. This wave is moving in the positive $y$-direction with the speed of light $c$. To get a standing wave, add a second wave going the other way:
$$\vec E = -\hat k E_0\cos(\omega t - \phi + ky) \qquad c\vec B = \hat\imath E_0\cos(\omega t - \phi + ky)$$
If these are added together, using the trig formula for separating the cosines into time-only and space-only sines and cosines, the result is
$$\vec E = \hat k\, e_k(t)\sin(ky) \qquad c\vec B = \hat\imath\, b_k(t)\cos(ky) \tag{10.37}$$
$$e_k(t) = 2E_0\sin(\omega t - \phi) \qquad b_k(t) = 2E_0\cos(\omega t - \phi) \tag{10.38}$$


Alternatively, just plug the assumption (10.37) directly into Maxwell's equations. That produces
$$\frac{de_k}{dt} = \omega b_k \qquad \frac{db_k}{dt} = -\omega e_k \tag{10.39}$$
The solutions to those equations are the coefficients (10.38) above.

Such a standing wave solution is appropriate for a box with perfectly conducting walls at $y = 0$ and $y = \ell_y$, where $\sin(k\ell_y) = 0$. On perfectly conducting walls the electric field $\vec E$ must be zero because of Ohm's law. For the other surfaces of the box, just assume periodic boundary conditions over some chosen periods $\ell_x$ and $\ell_z$; their length does not make a difference here.
Next, it is shown in basic textbooks on electromagnetics that the energy in an electromagnetic field is
$$E_V = \tfrac{1}{2}\epsilon_0\int_V \left(\vec E^{\,2} + c^2\vec B^{\,2}\right) d^3\vec r \tag{10.40}$$
where $\epsilon_0 = 8.85\times10^{-12}\ {\rm C^2/J\,m}$ is the permittivity of free space. This energy is typically derived by comparing the energy obtained from discharging a condenser with the electric field it holds when charged, and from a coil compared to its magnetic field. Note that this expression implies that the energy of the electric field of a point charge is infinite.
As an aside, if the energy $E_V$ is differentiated with respect to time, substituting in Maxwell's equations to get rid of the time derivatives, and cleaning up, the result is
$$\frac{dE_V}{dt} = -\epsilon_0 c\int_V \nabla\cdot\left(\vec E\times c\vec B\right) d^3\vec r$$
From the divergence theorem it can now be seen that the flow rate of electromagnetic energy is given by the "Poynting vector"
$$\epsilon_0 c^2\,\vec E\times\vec B \tag{10.41}$$
This was included because other books have Poynting vectors and you would be very disappointed if yours did not.

The important point here is that in terms of the coefficients $e_k$ and $b_k$, the energy is found to be
$$E_V = \frac{\epsilon_0 V}{4}\left(e_k^2 + b_k^2\right) \tag{10.42}$$

Quantization
Following the Planck-Einstein relation, electromagnetic radiation should come
in photons, each with one unit h̄ω of energy. This indicates that the energy in
the electromagnetic field is not a classical value, but corresponds to the discrete
eigenvalues of some as yet unknown Hamiltonian operator. The question then
is, what is that Hamiltonian?
Unfortunately, there is no straightforward way to deduce quantum mechanics
operators from mere knowledge of the classical approximation. Vice-versa is
not a problem: given the operators, it is fairly straightforward to deduce the
corresponding classical equations for a macroscopic system. It is much like at
the start of this book, where it was postulated that the momentum of a particle
corresponds to the operator h̄∂/i∂x. That was a leap of faith. However, it was
eventually seen, in chapter 5, that it did produce the correct classical momentum
for macroscopic systems, as well as correct quantum results like the energy levels
of the hydrogen atom, in chapter 3.2. A similar leap of faith will be needed to
quantize the electromagnetic field.
Whatever the details of the Hamiltonian, it is clear that the appropriate mathematical tool here is quantum field theory. After all, photons are not preserved particles. Atoms readily absorb them or radiate new ones when they heat up. Now the system considered here involves only one mode of radiation; therefore the wave function can be indicated by the simple Fock space ket $|i\rangle$, where $i$ is the number of photons present in the mode. Also, since the single-photon energy is $\hbar\omega$, the quantum field Hamiltonian operator (10.29) becomes
$$H = \hbar\omega\left(\hat P^2 + \hat Q^2 - \tfrac{1}{2}\right) + E_{\rm ve}$$

Somehow then, the caHermitian operators $\hat P$ and $\hat Q$ must be identified. They must produce the classical equations of motion in the macroscopic limit, including the classical energy (10.42),
$$E_V = \frac{\epsilon_0 V}{4}\left(e_k^2 + b_k^2\right)$$
Comparing the two under macroscopic conditions in which $\frac{1}{2}\hbar\omega$ and the ground state energy $E_{\rm ve}$ can be ignored, the very simplest assumption is that the caHermitians are scaled versions of the coefficients $e_k$ and $b_k$:
$$e_k \to 2\varepsilon_p\hat P = \varepsilon_p\left(\hat a^\dagger + \hat a\right) \qquad b_k \to 2\varepsilon_p\hat Q = \varepsilon_p\, i\left(\hat a^\dagger - \hat a\right) \tag{10.43}$$
where the scaling factor must be
$$\varepsilon_p = \sqrt{\frac{\hbar\omega}{\epsilon_0 V}} \tag{10.44}$$
This scaling factor can be thought of as the square root of the nominal mean square electric field per photon.
If this association of mode coefficients with caHermitian operators is indeed correct even under non-macroscopic conditions, one immediate consequence is that the ground state energy must be equal to that of half a photon. And that is just for the single mode considered here. Since there are infinitely many modes of radiation, the total vacuum energy is infinite.

While that is certainly counterintuitive, it may be noted that even classically, the energy in the electromagnetic field is infinite, assuming that electrons are indeed point charges. On the other hand, the caHermitians are the Hermitian components of the very logically and simply defined creation and annihilation operators, and you would really expect them to be physically meaningful. They certainly were for the harmonic oscillator and spin system examples of subsection 10.2.2.

Therefore, the assumption will be made that the scaled caHermitians appear in the quantized electromagnetic field where the measurable quantities $e_k$ and $b_k$ appear in the classical electromagnetic field:

$$\hat{\vec E}(\vec r) = \hat k\,\varepsilon_p\sin(ky)\left(\hat a^\dagger + \hat a\right) \qquad c\hat{\vec B}(\vec r) = \hat\imath\,\varepsilon_p\cos(ky)\, i\left(\hat a^\dagger - \hat a\right) \tag{10.45}$$

This process of replacing the coefficients of the modes by operators is called


“second quantization.” No, there was no earlier quantization of the electromag-
netic field involved. The word “second” is there for historical reasons: physicists
have historically found it hysterical to confuse students.
Note that just like the time-dependent momentum of a classical particle $p(t)$ becomes the time-independent operator $\hbar\partial/i\partial x$, the creation and annihilation operators are taken to be time-independent. In quantum mechanics, the time dependence is in the wave function, not the operators:
$$|\Psi\rangle = \sum_i c_i e^{-iE_i t/\hbar}\,|i\rangle$$

where the energy $E_i$ of the state with $i$ photons is $(i + \frac{1}{2})\hbar\omega$. (The Heisenberg
picture absorbs the time dependence in the operator; that is particularly conve-
nient for relativistic applications. However, true relativity is beyond the scope
of this book and this discussion will stay as close to the normal Schrödinger
picture as possible.)
To see whether this quantization of the electromagnetic field does indeed
make sense, its immediate consequences will now be explored. First consider
the Hamiltonian according to the Newtonian (or is that Maxwellian?) analogy:
$$H = \tfrac{1}{2}\epsilon_0\int_V\left(\hat{\vec E}^{\,2} + c^2\hat{\vec B}^{\,2}\right) d^3\vec r$$

Substituting in the quantized electromagnetic field, (10.44) and (10.45), and integrating,
$$H = \tfrac{1}{2}\epsilon_0 V\,\frac{\hbar\omega}{\epsilon_0 V}\left(\tfrac{1}{2}\left(\hat a^\dagger + \hat a\right)^2 - \tfrac{1}{2}\left(\hat a^\dagger - \hat a\right)^2\right)$$
Multiplying out gives
$$H = \tfrac{1}{4}\hbar\omega\left(2\hat a^\dagger\hat a + 2\hat a\hat a^\dagger\right)$$
and using the commutator $\hat a\hat a^\dagger - \hat a^\dagger\hat a = 1$ to get rid of $\hat a\hat a^\dagger$ gives
$$H = \hbar\omega\left(\hat a^\dagger\hat a + \tfrac{1}{2}\right)$$

which is just as it should be.

Also consider the equation for the expectation value $\langle P\rangle$:
$$\frac{d\langle P\rangle}{dt} = \frac{i}{\hbar}\langle[H,P]\rangle = \omega\langle Q\rangle$$
using (10.28) and (10.29). Similarly for the expectation value $\langle Q\rangle$:
$$\frac{d\langle Q\rangle}{dt} = \frac{i}{\hbar}\langle[H,Q]\rangle = -\omega\langle P\rangle$$
Those are the same equations as satisfied by ek and bk . They are also the
equations for the position and linear momentum of a harmonic oscillator, when
suitably scaled. It is often said that the mode amplitudes of the electromagnetic
field are mathematically modelled as quantum harmonic oscillators.
Now suppose there are exactly $i$ photons; what is the expectation value of the electric field at a given position and time? Well, the wave function will be
$$|\Psi\rangle = c_i e^{-i(i+\frac{1}{2})\omega t}\,|i\rangle$$
and so
$$\langle\vec E\rangle(\vec r,t) = \hat k\,\varepsilon_p\sin(ky)\; c_i^* e^{i(i+\frac{1}{2})\omega t} c_i e^{-i(i+\frac{1}{2})\omega t}\,\big\langle i\big|\hat a^\dagger + \hat a\big|i\big\rangle$$
That is zero because $\hat a^\dagger$ and $\hat a$ applied on $|i\rangle$ create states $|i+1\rangle$ and $|i-1\rangle$ that are orthogonal to the $|i\rangle$ in the left side of the inner product. The same way, the expectation magnetic field is zero!
Oops, not quite as expected. In fact, the previous subsection pointed out
that the caHermitians do not commute with the Hamiltonian. If the number
of photons, hence the energy, is certain, then the electromagnetic field is not.
And the caHermitians also do not commute with each other; if the electric field
is certain, then the magnetic field is not, and vice-versa.
To get something resembling a classical electric field, there must be uncertainty in energy. In particular, if the coefficients of multiple energy states are nonzero, then the expectation values of the electric and magnetic fields become
$$\langle\vec E\rangle(\vec r,t) = \hat k\,\varepsilon_p\sin(ky)\sum_i\sqrt{i}\left(c_i^* c_{i-1} e^{i\omega t} + c_i c_{i-1}^* e^{-i\omega t}\right)$$
$$\langle c\vec B\rangle(\vec r,t) = \hat\imath\,\varepsilon_p\cos(ky)\, i\sum_i\sqrt{i}\left(c_i^* c_{i-1} e^{i\omega t} - c_i c_{i-1}^* e^{-i\omega t}\right)$$

To check this, observe that for any $i$, $\big\langle i\big|\hat a^\dagger\big|\bar\imath\,\big\rangle$ is only nonzero if $\bar\imath = i - 1$, and the same for its complex conjugate $\big\langle\bar\imath\,\big|\hat a\big|i\big\rangle$. Renotating
$$\sum_i\sqrt{i}\; c_i c_{i-1}^* \equiv i\, C_1 e^{i\phi_1}$$
where $C_1$ and $\phi_1$ are real constants produces
$$\langle\vec E\rangle(\vec r,t) = \hat k\, 2\varepsilon_p C_1\sin(ky)\sin(\omega t - \phi_1)$$
$$\langle c\vec B\rangle(\vec r,t) = \hat\imath\, 2\varepsilon_p C_1\cos(ky)\cos(\omega t - \phi_1)$$
That is in the form of the classical electromagnetic field (10.37).


Similarly it may be found that
$$\langle E^2\rangle(\vec r,t) = 2\varepsilon_p^2\sin^2(ky)\left[\langle i\rangle + \tfrac{1}{2} + C_2\sin(2\omega t - 2\phi_2)\right]$$
$$\langle c^2B^2\rangle(\vec r,t) = 2\varepsilon_p^2\cos^2(ky)\left[\langle i\rangle + \tfrac{1}{2} - C_2\sin(2\omega t - 2\phi_2)\right]$$
where the expectation number of photons and the constants are defined by
$$\sum_i |c_i|^2\, i \equiv \langle i\rangle \qquad \sum_i c_i c_{i-2}^*\sqrt{i(i-1)} \equiv i\, C_2 e^{i2\phi_2}$$
Note that the mean square expectation electric field is $\varepsilon_p^2$ per photon, with half a photon left in the ground state.
Consider now a "photon packet" in which the numbers of photons with nonzero probabilities are restricted to a relatively narrow range. However, assume that the range is still large enough so that the coefficients can vary slowly from one number of photons to the next, except for possibly a constant phase difference:
$$c_{i-1} \approx c_i\, e^{-i(\phi - \frac{\pi}{2})}$$
Then
$$C_1 \approx \sqrt{\langle i\rangle} \qquad C_2 \approx \langle i\rangle \qquad \phi_1 \approx \phi \qquad 2\phi_2 \approx 2\phi + \frac{\pi}{2}$$
In that case, the expectation electromagnetic field above becomes the classical one, with an energy of about $\langle i\rangle$ photons.
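To make the photon packet concrete, here is a hedged numerical sketch; the Poisson-shaped packet is my own choice, made because it satisfies the slowly-varying assumption. It shows that $\langle E\rangle$ indeed oscillates with the classical amplitude $2\varepsilon_p\sqrt{\langle i\rangle}$.

```python
import numpy as np
from math import factorial

# Hedged sketch: a Poisson-shaped photon packet with the constant phase step
# c_{i-1} = c_i e^{-i(phi - pi/2)}, as assumed above; units are arbitrary.
N, mean_i, phi = 60, 10.0, 0.3
i = np.arange(N)
p = np.array([np.exp(-mean_i) * mean_i**k / factorial(k) for k in range(N)])
c = np.sqrt(p) * np.exp(1j * i * (phi - np.pi / 2))

eps_p, omega, ky = 1.0, 1.0, np.pi / 2         # sin(ky) = 1 for convenience
t = np.linspace(0.0, 10.0, 500)

# <E>(t) = eps_p sin(ky) sum_i sqrt(i) (c_i* c_{i-1} e^{i w t} + c.c.)
S = np.sum(np.sqrt(i[1:]) * np.conj(c[1:]) * c[:-1])
E_expect = eps_p * np.sin(ky) * 2 * np.real(S * np.exp(1j * omega * t))

# amplitude close to the classical 2 eps_p sqrt(<i>), so the ratio is near 1
print(np.max(np.abs(E_expect)) / (2 * eps_p * np.sqrt(mean_i)))
```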

Photon spin
If photons act as particles, they should have a value of spin. Physicists have concluded that the appropriate spin operators are
$$\hat S_x = -i\hbar\left(\hat\jmath\,\hat k\cdot{} - \hat k\,\hat\jmath\cdot{}\right) \qquad \hat S_y = -i\hbar\left(\hat k\,\hat\imath\cdot{} - \hat\imath\,\hat k\cdot{}\right) \qquad \hat S_z = -i\hbar\left(\hat\imath\,\hat\jmath\cdot{} - \hat\jmath\,\hat\imath\cdot{}\right) \tag{10.46}$$
Note the dots. These operators need to act on vectors and then produce vectors times dot products. (In terms of linear algebra, the unit vectors above are multiplied as tensor products.)
If the above operators are correct, they should satisfy the fundamental commutation relations (4.19). They do. For example:
$$[\hat S_x,\hat S_y] = -\hbar^2\left(\hat\jmath\,\hat k\cdot{} - \hat k\,\hat\jmath\cdot{}\right)\left(\hat k\,\hat\imath\cdot{} - \hat\imath\,\hat k\cdot{}\right) + \hbar^2\left(\hat k\,\hat\imath\cdot{} - \hat\imath\,\hat k\cdot{}\right)\left(\hat\jmath\,\hat k\cdot{} - \hat k\,\hat\jmath\cdot{}\right)$$
Since the unit vectors are mutually orthogonal, when multiplying out the first term only the dot product $\hat k\cdot\hat k = 1$ is nonzero, and the same for the second term. So
$$[\hat S_x,\hat S_y] = -\hbar^2\,\hat\jmath\,\hat\imath\cdot{} + \hbar^2\,\hat\imath\,\hat\jmath\cdot{} = i\hbar\hat S_z$$
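In matrix form the dyads become outer products, so the check takes only a few lines. The hedged sketch below is my own, with $\hbar$ set to one; it verifies the commutator above and also confirms the circular combinations discussed next.

```python
import numpy as np

# Hedged sketch: the photon spin operators (10.46) as 3 x 3 matrices acting
# on vectors; a dyad like "j k." is the outer product j k^T. Here h-bar = 1.
ex, ey, ez = np.eye(3)          # the unit vectors i-hat, j-hat, k-hat

Sx = -1j * (np.outer(ey, ez) - np.outer(ez, ey))
Sy = -1j * (np.outer(ez, ex) - np.outer(ex, ez))
Sz = -1j * (np.outer(ex, ey) - np.outer(ey, ex))

print(np.allclose(Sx @ Sy - Sy @ Sx, 1j * Sz))      # [Sx,Sy] = i Sz: True

# the combinations k-hat +/- i i-hat are Sy eigenvectors with spin +/- 1;
# there is no third (zero-spin) mode along the propagation direction y
print(np.allclose(Sy @ (ez + 1j * ex), +(ez + 1j * ex)))    # True
print(np.allclose(Sy @ (ez - 1j * ex), -(ez - 1j * ex)))    # True
```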
The next question is how to apply them, given that the wave function of
photons is a Fock space ket without a precise physical interpretation. However,
following the ideas of quantum mechanics, presumably photon wave functions
can be written as linear combinations of wave functions with definite electric
fields. All the ones for the single mode considered here have an electric field
that is proportional to k̂, and the operators above can be applied on that. But
unfortunately, k̂ is not an eigenvector of any of the spin components above.
However, even given the direction of wave propagation and wave number,
there are still two different modes: the electric field could be fluctuating in the x-
direction instead of the z-direction. (Fluctuating in an oblique direction is just a
linear combination of these two independent modes, and not another possibility.)
In short, any of the considered electromagnetic modes can be rotated 90 degrees
around the y-axis to give a second mode with an electric field proportional to ı̂.
If these modes are combined pairwise in the combination $\hat k + i\hat\imath$, it produces an eigenstate of $\hat S_y$ with spin $\hbar$, while $\hat k - i\hat\imath$ produces an eigenstate with spin $-\hbar$. That can easily be checked by direct substitution in the eigenvalue problem.
Note that there are only two independent states, so that is it. There is no
third state with spin zero in the y-direction, the direction of wave propagation.
The missing state reflects the classical limitation that the electric field cannot
have a component in the direction of wave propagation. However, it can be seen
using the analysis of chapter 9.1 that suitable combinations of equal amounts
of spin forward and backward in y can produce photons with zero spin in a
direction normal to the direction of propagation.
One thing should still be checked: that the magnetic field does not conflict. Now, if the electric field is rotated from $\hat k$ to $\hat\imath$, the magnetic field rotates from $\hat\imath$ to $-\hat k$. So the eigenstates have magnetic fields proportional to $\hat\imath - i\hat k$ and $\hat\imath + i\hat k$. Those are just $-i$, respectively $i$, times the vectors of the electric field, so even including the magnetic field the states remain eigenstates.

Traveling waves
To get the quantized electromagnetic field of traveling waves, the quickest way is to take the standing wave apart using
$$\sin ky = \frac{e^{iky} - e^{-iky}}{2i} \qquad \cos ky = \frac{e^{iky} + e^{-iky}}{2}$$
Then the parts that propagate forwards in $y$ must be of the form $\hat a e^{iky}$ or $\hat a^\dagger e^{-iky}$. To see why, just look at the expectation values of these terms for the generic wave function $\sum_i c_i e^{-i(i+\frac{1}{2})\omega t}|i\rangle$.
A single wave of wave number $k$ moving in the positive $y$-direction and polarized in the $z$-direction therefore takes the form
$$\hat{\vec E}(\vec r) = \hat k\,\varepsilon_p'\, i\left(\hat a^\dagger e^{-iky} - \hat a e^{iky}\right) \qquad c\hat{\vec B}(\vec r) = \hat\imath\,\varepsilon_p'\, i\left(\hat a^\dagger e^{-iky} - \hat a e^{iky}\right) \tag{10.47}$$

where the scaling constant is
$$\varepsilon_p' = \sqrt{\frac{\hbar\omega}{2\epsilon_0 V'}} \tag{10.48}$$

For a traveling wave, it is more physical to assume that it is in a box that is


periodic in the y-direction; that is true for a box with twice the length, hence a
volume V ′ = 2V . The constant ε′p can be thought of as the square root of half
the mean square electric field per photon.
However, in general there will be a similar wave polarized in the x-direction.
And then there will be pairs of such waves for different directions of propagation
and different wave numbers k. To describe all these waves, it is convenient to
combine wave number and direction of propagation into a wave number vector
~k that has the magnitude of k and the direction of wave propagation. Then the
complete electromagnetic field operators become
$$\hat{\vec E}(\vec r) = \varepsilon_p'\sum_{\mu=1}^{2}\sum_{\vec k}(-1)^\mu\,\hat\imath_\mu\, i\left(\hat a^\dagger_{\vec k,\mu} e^{-i\vec k\cdot\vec r} - \hat a_{\vec k,\mu} e^{i\vec k\cdot\vec r}\right) \tag{10.49}$$
$$c\hat{\vec B}(\vec r) = \varepsilon_p'\sum_{\mu=1}^{2}\sum_{\vec k}\hat\imath_{3-\mu}\, i\left(\hat a^\dagger_{\vec k,\mu} e^{-i\vec k\cdot\vec r} - \hat a_{\vec k,\mu} e^{i\vec k\cdot\vec r}\right) \tag{10.50}$$
where the unit vector $\hat\imath_1$ must be chosen normal to the direction of wave propagation and $\hat\imath_2 = \hat\imath_1\times\vec k/k$.

10.2.4 Spontaneous emission


In this subsection, the spontaneous emission rate of excited atoms will be de-
rived. It may be recalled that this was done in chapter 5.2.10 following an
argument given by Einstein. However, that was cheating: it peeked at the an-
swer for blackbody radiation. This section will verify that a quantum treatment
gives the same answer.
Like in chapter 5.2, consider the interaction of an atom with electromag-
netic radiation, but this time, do it right, using the quantized electromagnetic
field instead of the classical one. The approach will again be to consider the
interaction for a two state system involving a single electromagnetic wave and
two energy levels of the atom. The total effects should then again follow from
summation over all waves and energy levels.
The most appropriate energy states are now
$$\psi_1 = \psi_L|i+1\rangle \qquad \psi_2 = \psi_H|i\rangle \tag{10.51}$$
In state $\psi_1$ the atom is in the lower energy state and there are $i+1$ photons in the wave; in state $\psi_2$, the atom is excited, but one photon has disappeared. Note that these wave functions depend on both the atom energy level and on the number of photons.

The Hamiltonian is now
$$H = H_0 + \hbar\omega\left(\hat a^\dagger\hat a + \tfrac{1}{2}\right) + ez\,\varepsilon_p'\, i\left(\hat a^\dagger - \hat a\right) \tag{10.52}$$

where $H_0$ is the Hamiltonian of the unperturbed atom, the second term is the Hamiltonian of the electromagnetic field, and the final term is the $e\hat E_z z$ interaction energy of the electron with the electric field, but now quantized as in (10.47). It was again assumed that interaction with the magnetic field can be ignored and that the atom is small enough compared to the electromagnetic wave length that it can be taken to be at the origin.
Next the Hamiltonian matrix coefficients are needed. The first one is
$$H_{11} = \Big\langle i+1\Big|\psi_L\;\Big|\;\Big(H_0 + \hbar\omega\big(\hat a^\dagger\hat a + \tfrac{1}{2}\big) + ez\varepsilon_p'\, i\big(\hat a^\dagger - \hat a\big)\Big)\,\psi_L\Big|i+1\Big\rangle$$
The first term of the Hamiltonian acts on the spatial state and produces the lower atom energy level. The second term acts on the Fock space ket to produce the electromagnetic energy of $i+1$ photons. The final term produces zero, because of the symmetry of the atom eigenstates. Alternatively, it produces zero because $\hat a^\dagger$ and $\hat a$ produce states $|i+2\rangle$ and $|i\rangle$ that are orthogonal to the $|i+1\rangle$ in the left side of the inner product. So
$$H_{11} = E_L + \left(i + 1 + \tfrac{1}{2}\right)\hbar\omega$$
and similarly
$$H_{22} = E_H + \left(i + \tfrac{1}{2}\right)\hbar\omega$$
The remaining Hamiltonian matrix coefficient is
$$H_{12} = \Big\langle i+1\Big|\psi_L\;\Big|\;\Big(H_0 + \hbar\omega\big(\hat a^\dagger\hat a + \tfrac{1}{2}\big) + ez\varepsilon_p'\, i\big(\hat a^\dagger - \hat a\big)\Big)\,\psi_H\Big|i\Big\rangle$$
Here the first two terms produce zero, because $\psi_1$ and $\psi_2$ are orthonormal eigenstates of these operators. The third term produces
$$H_{12} = i\varepsilon_p'\,\langle\psi_L|ez|\psi_H\rangle\,\sqrt{i+1}$$
because the creation operator $\hat a^\dagger$ turns $|i\rangle$ into $\sqrt{i+1}\,|i+1\rangle$.
As in chapter 5.2, a Hamiltonian of a simplified two-state system may be defined as
$$\overline H_{12} \equiv H_{12}\, e^{-i\int(H_{22}-H_{11})\,dt/\hbar}$$
where from the above
$$H_{22} - H_{11} = E_H - E_L - \hbar\omega = \hbar(\omega_0 - \omega)$$
where $\omega_0$ is the frequency of the photon released in a transition from the higher to the lower atom energy state. The simplified two-state system becomes
$$\dot{\bar a} = \frac{\varepsilon_p'}{\hbar}\langle\psi_L|ez|\psi_H\rangle\sqrt{i+1}\; e^{i(\omega-\omega_0)t}\,\bar b \qquad \dot{\bar b} = -\frac{\varepsilon_p'}{\hbar}\langle\psi_L|ez|\psi_H\rangle^*\sqrt{i+1}\; e^{-i(\omega-\omega_0)t}\,\bar a$$
where $\bar a$ and $\bar b$ give the probability amplitudes of states $\psi_1$ and $\psi_2$ respectively.
Consider a system that starts out with the atom in the excited state, $\bar a_0 = 0$ and $\bar b_0 = 1$. Then, if the perturbation is weak over the time that it acts, $\bar b$ can be approximated as one in the first equation, and the transition probability to the lower atom energy state $\psi_1$ is found to be
$$\left|\bar a\right|^2 = \frac{4(i+1)\,\varepsilon_p'^{\,2}}{\hbar^2}\,\big|\langle\psi_L|ez|\psi_H\rangle\big|^2\;\tfrac{1}{4}t^2\left(\frac{\sin\frac{1}{2}(\omega-\omega_0)t}{\frac{1}{2}(\omega-\omega_0)t}\right)^2$$
For a large number of photons $i$, this is the same as the classical result (5.33), because $4\varepsilon_p'^{\,2}$ is the peak square electric field per photon.
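As a quick numerical illustration, my own and with made-up parameter values, the formula above can be evaluated to exhibit the familiar $\sin^2 x/x^2$ line shape and the $i+1$ scaling that separates stimulated from spontaneous emission.

```python
import numpy as np

# Hedged sketch: evaluate the transition probability formula above for
# illustrative (made-up) parameter values; h-bar = 1, matrix element = 1.
hbar, eps_p, melem, t = 1.0, 0.1, 1.0, 50.0

def prob(i_photons, detuning):
    """|a-bar|^2 from the formula above; detuning is omega - omega_0."""
    # np.sinc(z) = sin(pi z)/(pi z), so this is sin(x)/x with x = detuning*t/2
    sinc2 = np.sinc(detuning * t / (2 * np.pi)) ** 2
    return (4 * (i_photons + 1) * eps_p**2 / hbar**2
            * abs(melem)**2 * 0.25 * t**2 * sinc2)

print(prob(0, np.linspace(-1, 1, 5)))   # nonzero at i = 0: spontaneous emission
print(prob(9, 0.0) / prob(0, 0.0))      # 10.0: the (i + 1)-fold enhancement
```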
But now consider the electromagnetic ground state where the number of
photons i is zero. The transition probability above is as if there is still one
photon of electromagnetic energy left. And as noted in chapter 5.2.10, that
is exactly what is needed to explain spontaneous emission using the quantum
equations.
Some additional observations may be interesting. While you may think of it as excitation by the ground state electromagnetic field, the actual energy of the ground state was earlier seen to be half a photon, not one photon. And the zero level of energy should not affect the dynamics anyway. According to the analysis here, spontaneous emission is a twilight effect: the Hamiltonian coefficient $H_{12}$ is an interaction energy between the state in which the atom is not excited and the state in which it is excited. Think of it in classical common sense terms. There is an excited atom and no photons around it. (Or if you prefer, the number of photons is as low as it can ever get. Classical common sense would make that zero.) Why would things ever change? But in quantum mechanics, the twilight term allows the excited atom to interact with the electromagnetic radiation of the photon that would be there if it was not excited. Sic.

10.2.5 Field operators


As noted at the start of this section, quantum field theory is particularly suited
for relativistic applications because the number of particles can vary. However,
in relativistic applications, it is often best to work in terms of position coordi-
nates instead of single-particle energy eigenfunctions. Relativistic applications
must make sure that matter does not exceed the speed of light, and that coor-
dinate systems moving at different speeds are physically equivalent and related
through the Lorentz transformation. These conditions are posed in terms of
position and time.
To handle such problems, the annihilation and creation operators can be converted into so-called "field operators" that annihilate or create particles at a given position in space. Now classically, a particle at a given position $\vec r_0$ corresponds to a wave function that is a delta function, $\Psi = \delta(\vec r - \vec r_0)$, chapter 5.3. A delta function can be written in terms of the single-particle eigenfunctions $\psi_n$ as $\sum_n c_n\psi_n$. Here the constants can be found from taking inner products; $c_n = \langle\psi_n|\Psi\rangle$, and that gives $c_n = \psi_n^*(\vec r_0)$ because of the property of the delta function to pick out that value of any function that it is in an inner product with. Since $c_n$ is the amount of eigenfunction $\psi_n$ that must be annihilated/created to annihilate/create the delta function at $\vec r_0$, the field operators become
$$\hat a(\vec r) = \sum_n \psi_n^*(\vec r)\,\hat a_n \qquad \hat a^\dagger(\vec r) = \sum_n \psi_n^*(\vec r)\,\hat a^\dagger_n \tag{10.53}$$

where the subscript zero was dropped from ~r since it is no longer needed to
distinguish from the independent variable of the delta function. It means that
~r is now the position at which the particle is annihilated/created.
In the case of particles in free space, the energy eigenfunctions are the momentum eigenfunctions $e^{i\vec p\cdot\vec r/\hbar}$, and the sums become integrals called the Fourier transforms; see chapters 5.3 and 5.4.1 for more details. In fact, unless you are particularly interested in converting the expression (10.36) for the Hamiltonian, basing the field operators on the momentum eigenfunctions works fine even if the particles are not in free space.
A big advantage of the way the annihilation and creation operators were defined now shows up: their (anti)commutation relations are effectively unchanged in taking linear combinations. In particular,
$$\big[\hat a_b(\vec r),\hat a_b(\vec r')\big] = 0 \qquad \big[\hat a_b^\dagger(\vec r),\hat a_b^\dagger(\vec r')\big] = 0 \qquad \big[\hat a_b(\vec r),\hat a_b^\dagger(\vec r')\big] = \delta^3(\vec r - \vec r') \tag{10.54}$$
$$\big\{\hat a_f(\vec r),\hat a_f(\vec r')\big\} = 0 \qquad \big\{\hat a_f^\dagger(\vec r),\hat a_f^\dagger(\vec r')\big\} = 0 \qquad \big\{\hat a_f(\vec r),\hat a_f^\dagger(\vec r')\big\} = \delta^3(\vec r - \vec r') \tag{10.55}$$

In other references you might see an additional constant multiplying the three-
dimensional delta function, depending on how the position and momentum
eigenfunctions were normalized.
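The mechanism behind the delta function in (10.54) is the completeness of the single-particle eigenfunctions: taking linear combinations turns the Kronecker delta of (10.33) into the mode sum $\sum_n \psi_n(\vec r)\psi_n^*(\vec r')$, which adds up to the delta function. A hedged discrete sketch of my own, with a periodic grid standing in for continuous space:

```python
import numpy as np

# Hedged sketch: on a periodic grid of Np points, the discrete momentum
# eigenfunctions are orthonormal, and the completeness sum over modes,
# sum_n psi_n(x) psi_n*(x'), is the discrete delta function delta_{x x'}.
# That is the c-number the field-operator commutator (10.54) reduces to.
Np = 16
x = np.arange(Np)
n = np.arange(Np)
psi = np.exp(2j * np.pi * np.outer(n, x) / Np) / np.sqrt(Np)   # psi[n, x]

comm = psi.T @ psi.conj()        # entry [x, x'] = sum_n psi_n(x) psi_n*(x')
print(np.allclose(comm, np.eye(Np)))   # the discrete delta function: True
```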
The field operators help solve a vexing problem in relativistic quantum mechanics: how to put space and time on equal footing as relativity needs. The classical Schrödinger equation $i\hbar\psi_t = H\psi$ treats space and time quite differently;
the spatial derivatives, in H, are second order, but the time derivative is first
order. The first-order time derivative allows you to think of the time coordinate
as simply a label on different spatial wave functions, one for each time, and ap-
plication of the spatial Hamiltonian produces the change from one spatial wave
function to the next one, a time dt later. Of course, you cannot think of the
spatial coordinates in the same way; even if there was only one spatial coordi-
nate instead of three: the second order spatial derivatives do not represent a
change of wave function from one position to the next.
As section 9.2 discussed, for spinless particles, the Schrödinger equation can
be converted into the Klein-Gordon equation, which turns the time derivative
to second order by adding the rest mass energy to the Hamiltonian, and for
electrons, the Schrödinger equation can be converted into the Dirac equation by
switching to a vector wave function, which turns the spatial derivatives to first
order. But there are problems; for example, the Klein-Gordon equation does not
naturally preserve probabilities unless the solution is a simple wave; the Dirac
equation has energy levels extending to minus infinity that must be thought of
as being already filled with electrons to prevent an explosion of energy when the
electrons fall down those states. Worse, filling the negative energy states would
not help for bosons, since bosons do not obey the exclusion principle.
The field operators turn out to provide a better option, because they allow
both the spatial coordinates and time to be converted into labels on annihilation
and creation operators. It allows relativistic theories to be constructed that treat
space and time in a uniform way.

10.2.6 An example using field operators


This example exercise from Srednicki [17, p. 11] compares quantum field theory
to the classical formulation of quantum mechanics. The objective is to convert
the classical spatial Schrödinger equation for I particles,
$$i\hbar\frac{\partial\Psi}{\partial t} = \left[\sum_{i=1}^{I}\left(-\frac{\hbar^2}{2m}\nabla_i^2 + V_{\rm ext}(\vec r_i)\right) + \tfrac{1}{2}\sum_{i=1}^{I}\sum_{\bar\imath\ne i}V(\vec r_i - \vec r_{\bar\imath})\right]\Psi \tag{10.56}$$
into quantum field form. The classical wave function has the positions of the numbered particles and time as arguments:
$$\text{classical quantum mechanics:}\quad \Psi = \Psi(\vec r_1,\vec r_2,\vec r_3,\ldots,\vec r_I;t) \tag{10.57}$$

where $\vec r_1$ is the position of particle 1, $\vec r_2$ is the position of particle 2, etcetera. In quantum field theory, the wave function for exactly $I$ particles takes the form
$$|\Psi\rangle = \int_{{\rm all}\ \vec r_1}\!\!\cdots\int_{{\rm all}\ \vec r_I}\Psi(\vec r_1,\vec r_2,\ldots,\vec r_I;t)\;\hat a^\dagger(\vec r_1)\hat a^\dagger(\vec r_2)\cdots\hat a^\dagger(\vec r_I)\,|\vec 0\rangle\; d^3\vec r_1\cdots d^3\vec r_I \tag{10.58}$$
where ket $|\Psi\rangle$ is the wave function expressed in Fock space kets, and plain $\Psi(\ldots)$ is to be shown to be equivalent to the classical wave function above. The Fock space Hamiltonian is
$$H = \int_{{\rm all}\ \vec r}\hat a^\dagger(\vec r)\left[-\frac{\hbar^2}{2m}\nabla^2_{\vec r} + V_{\rm ext}(\vec r)\right]\hat a(\vec r)\; d^3\vec r + \tfrac{1}{2}\int_{{\rm all}\ \vec r}\int_{{\rm all}\ \vec{\bar r}}\hat a^\dagger(\vec r)\hat a^\dagger(\vec{\bar r})\,V(\vec r - \vec{\bar r})\,\hat a(\vec{\bar r})\hat a(\vec r)\; d^3\vec r\, d^3\vec{\bar r} \tag{10.59}$$

It is to be shown that the Fock space Schrödinger equation for $|\Psi\rangle$ produces the classical Schrödinger equation (10.56) for function $\Psi(\ldots)$, whether it is a system of bosons or fermions.
Before trying to tackle this problem, it is probably a good idea to review representations of functions using delta functions. As the simplest example, a wave function $\Psi(x)$ of just one spatial coordinate can be written as
$$\Psi(x) = \int_{{\rm all}\ \bar x}\underbrace{\Psi(\bar x)}_{\rm coefficients}\;\underbrace{\delta(x - \bar x)\,d\bar x}_{\rm basis\ states}$$
The way to think about the above integral expression for $\Psi(x)$ is just like you would think about a vector in three dimensions being written as $\vec v = v_1\hat\imath + v_2\hat\jmath + v_3\hat k$, or a vector in 30 dimensions as $\vec v = \sum_{i=1}^{30}v_i\hat\imath_i$. The $\Psi(\bar x)$ are the coefficients, corresponding to the $v_i$ components of the vectors. The $\delta(x-\bar x)\,d\bar x$ are the basis states, just like the unit vectors $\hat\imath_i$. If you want a graphical illustration, each $\delta(x-\bar x)\,d\bar x$ would correspond to one spike of unit height at a position $\bar x$ in figure 1.3, and you need to sum (integrate) over them all, with their coefficients, to get the total vector.

Now $H\Psi(x)$ is just another function of $x$, so it can be written similarly:
$$H\Psi(x) = \int_{{\rm all}\ \bar x}\big(H\Psi\big)(\bar x)\,\delta(x-\bar x)\; d\bar x = \int_{{\rm all}\ \bar x}\left[-\frac{\hbar^2}{2m}\frac{\partial^2\Psi(\bar x)}{\partial\bar x^2} + V_{\rm ext}(\bar x)\Psi(\bar x)\right]\delta(x-\bar x)\; d\bar x$$
Note that the Hamiltonian acts on the coefficients, not on the basis states. You may be surprised by this, since if you straightforwardly apply the Hamiltonian,
in terms of $x$, on the integral expression for $\Psi(x)$, you get
$$H\Psi(x) = \int_{{\rm all}\ \bar x}\Psi(\bar x)\left[-\frac{\hbar^2}{2m}\frac{\partial^2\delta(x-\bar x)}{\partial x^2} + V_{\rm ext}(x)\,\delta(x-\bar x)\right] d\bar x$$
in which the Hamiltonian acts on the basis states, not the coefficients. However, the two expressions are indeed the same. (You can see that using a couple of integrations by parts in the latter, after recognizing that differentiation of the delta function with respect to $x$ or $\bar x$ is the same save for a sign change. Much better, make the change of integration variable $u = \bar x - x$ before applying the Hamiltonian to the integral.)
The bottom line is that you do not want to use the expression in which the
Hamiltonian is applied to the basis states, because derivatives of delta functions
are highly singular objects that you should not touch with a ten foot pole. Still,
there is an important observation here: you might either know what an operator
does to the coefficients, leaving the basis states untouched, or what it does to
the basis states, leaving the coefficients untouched. Either one will tell you the
final effect of the operator, but the mathematics is different.
Now that the general terms of engagement have been discussed, it is time to start solving Srednicki's problem. First consider the expression for the wave function
$$|\Psi\rangle = \int_{{\rm all}\ \vec r_1}\!\!\cdots\int_{{\rm all}\ \vec r_I}\underbrace{\Psi(\vec r_1,\vec r_2,\ldots,\vec r_I;t)}_{\rm coefficients}\;\underbrace{\hat a^\dagger(\vec r_1)\hat a^\dagger(\vec r_2)\cdots\hat a^\dagger(\vec r_I)|\vec 0\rangle\, d^3\vec r_1\cdots d^3\vec r_I}_{\rm Fock\ space\ basis\ state\ kets}$$
The ket $|\vec 0\rangle$ is the vacuum state, but the preceding creation operators $\hat a^\dagger$ create particles at positions $\vec r_1,\vec r_2,\ldots$. So the net state becomes a Fock state where a particle called 1 is in a delta function at a position $\vec r_1$, a particle called 2 in a delta function at position $\vec r_2$, etcetera. The classical wave function $\Psi(\ldots)$ determines the probability for the particles to actually be at those states, so it is the coefficient of that Fock state ket. The integration gives the combined wave function $|\Psi\rangle$ as a ket state in Fock space.
Note that Fock states do not know about particle numbers. A Fock basis state is the same regardless what the classical wave function calls the particles. It means that the same Fock basis state ket reappears in the integration above at all swapped positions of the particles. (For fermions read: the same except possibly a sign change, since swapping the order of application of any two $\hat a^\dagger$ creation operators flips the sign, compare subsection 10.2.2.) This will become important at the end of the derivation.
As far as understanding the Fock space Hamiltonian, for now you may just
note a superficial similarity in form with the expectation value of energy. Its
appropriateness will follow from the fact that the correct classical Schrödinger
equation is obtained from it.

The left hand side of the Fock space Schrödinger equation is evaluated by pushing the time derivative inside the integral as a partial:
$$i\hbar\frac{d|\Psi\rangle}{dt} = \int_{{\rm all}\ \vec r_1}\!\!\cdots\int_{{\rm all}\ \vec r_I} i\hbar\frac{\partial\Psi(\vec r_1,\vec r_2,\ldots,\vec r_I;t)}{\partial t}\;\hat a^\dagger(\vec r_1)\hat a^\dagger(\vec r_2)\cdots\hat a^\dagger(\vec r_I)|\vec 0\rangle\; d^3\vec r_1\cdots d^3\vec r_I$$
so the time derivative drops down on the classical wave function in the normal way.
Applying the Fock-space Hamiltonian (10.59) on the wave function is quite a different story, however. It is best to start with just a single particle:
$$H|\Psi\rangle = \int_{{\rm all}\ \vec r}\int_{{\rm all}\ \vec r_1}\hat a^\dagger(\vec r)\left[-\frac{\hbar^2}{2m}\nabla^2_{\vec r} + V_{\rm ext}(\vec r)\right]\hat a(\vec r)\,\Psi(\vec r_1;t)\,\hat a^\dagger(\vec r_1)|\vec 0\rangle\; d^3\vec r_1\, d^3\vec r$$
The field operator $\hat a(\vec r)$ may be pushed past the classical wave function $\Psi(\ldots)$; $\hat a(\vec r)$ is defined by what it does to the Fock basis states while leaving their coefficients, here $\Psi(\ldots)$, unchanged:
$$H|\Psi\rangle = \int_{{\rm all}\ \vec r}\int_{{\rm all}\ \vec r_1}\hat a^\dagger(\vec r)\left[-\frac{\hbar^2}{2m}\nabla^2_{\vec r} + V_{\rm ext}(\vec r)\right]\Psi(\vec r_1;t)\,\hat a(\vec r)\hat a^\dagger(\vec r_1)|\vec 0\rangle\; d^3\vec r_1\, d^3\vec r$$

It is now that the (anti)commutator relations become useful. The fact that for bosons $[\hat a(\vec r),\hat a^\dagger(\vec r_1)]$, or for fermions $\{\hat a(\vec r),\hat a^\dagger(\vec r_1)\}$, equals $\delta^3(\vec r - \vec r_1)$ means that you can swap the order of the operators as long as you add a delta function term:
$$\hat a_b(\vec r)\hat a_b^\dagger(\vec r_1) = \hat a_b^\dagger(\vec r_1)\hat a_b(\vec r) + \delta^3(\vec r - \vec r_1) \qquad \hat a_f(\vec r)\hat a_f^\dagger(\vec r_1) = -\hat a_f^\dagger(\vec r_1)\hat a_f(\vec r) + \delta^3(\vec r - \vec r_1)$$
But when you swap the order of the operators in the expression for $H|\Psi\rangle$, you get a factor $\hat a(\vec r)|\vec 0\rangle$, and that is zero, because applying an annihilation operator on the vacuum state produces zero, figure 10.5. So the delta function is all that remains:
$$H|\Psi\rangle = \int_{{\rm all}\ \vec r}\int_{{\rm all}\ \vec r_1}\hat a^\dagger(\vec r)\left[-\frac{\hbar^2}{2m}\nabla^2_{\vec r} + V_{\rm ext}(\vec r)\right]\Psi(\vec r_1;t)\,\delta^3(\vec r - \vec r_1)|\vec 0\rangle\; d^3\vec r_1\, d^3\vec r$$

Integration over $\vec r_1$ now picks out the value $\Psi(\vec r;t)$ from function $\Psi(\vec r_1;t)$, as delta functions do, so
$$H|\Psi\rangle = \int_{{\rm all}\ \vec r}\hat a^\dagger(\vec r)\left[-\frac{\hbar^2}{2m}\nabla^2_{\vec r} + V_{\rm ext}(\vec r)\right]\Psi(\vec r;t)\,|\vec 0\rangle\; d^3\vec r$$
The creation operator $\hat a^\dagger(\vec r)$ can be pushed over the coefficient $H\Psi(\vec r;t)$ of the vacuum state ket for the same reason that $\hat a(\vec r)$ could be pushed over $\Psi(\vec r_1;t)$;

these operators do not affect the coefficients of the Fock states, just the states
themselves.
Then, renotating $\vec r$ to $\vec r_1$, the grand total Fock state Schrödinger equation for a system of one particle becomes
$$\int_{{\rm all}\ \vec r_1} i\hbar\frac{\partial\Psi(\vec r_1;t)}{\partial t}\,\hat a^\dagger(\vec r_1)|\vec 0\rangle\; d^3\vec r_1 = \int_{{\rm all}\ \vec r_1}\left[-\frac{\hbar^2}{2m}\nabla^2_{\vec r_1} + V_{\rm ext}(\vec r_1)\right]\Psi(\vec r_1;t)\,\hat a^\dagger(\vec r_1)|\vec 0\rangle\; d^3\vec r_1$$

It is now seen that if the classical wave function Ψ(~r1 ; t) satisfies the classical
Schrödinger equation, the Fock-space Schrödinger equation above is also sat-
isfied. And so is the converse: if the Fock-state equation above is satisfied,
the classical wave function must satisfy the classical Schrödinger equation. The
reason is that Fock states can only be equal if the coefficients of all the basis
states are equal, just like vectors can only be equal if all their components are
equal.
If there is more than one particle, however, the latter conclusion is not
justified. Remember that the same Fock space kets reappear in the integration
at swapped positions of the particles. It now makes a difference. The following
example from basic vectors illustrates the problem: yes, aı̂ = a′ ı̂ implies that
a = a′ , but no, (a + b)ı̂ = (a′ + b′ )ı̂ does not imply that a = a′ and b = b′ ; it
merely implies that a + b = a′ + b′ . However, if additionally it is postulated that
the classical wave function has the symmetry properties appropriate for bosons
or fermions, then the Fock-space Schrödinger equation does imply the classical
one. In terms of the example from vectors, (a + a)ı̂ = (a′ + a′ )ı̂ does imply that
a = a′ .
Operator swapping like in the derivation above also helps to understand why the Fock-space Hamiltonian has an appearance similar to an energy expectation value. For example, consider the effect of placing the one-particle Hamiltonian between $\langle\vec 0|\hat a(\vec{\bar r}_1)\Psi^*(\vec{\bar r}_1;t)$ and $\Psi(\vec r_1;t)\hat a^\dagger(\vec r_1)|\vec 0\rangle$ and integrating over all $\vec r_1$ and $\vec{\bar r}_1$.
So the problem has been solved for a system with one particle. Doing it for
I particles will be left as an exercise for your mathematical skills.
Chapter 11

The Interpretation of Quantum Mechanics

Engineers tend to be fairly matter-of-fact about the physics they use. Many use
entropy on a daily basis as a computational tool without worrying much about
its vague, abstract mathematical definition. Such a practical approach is even
more important for quantum mechanics.
Famous quantum mechanics pioneer Niels Bohr had this to say about it:

For those who are not shocked when they first come across quantum
theory cannot possibly have understood it. [Niels Bohr, quoted in W.
Heisenberg (1971) Physics and Beyond. Harper and Row.]

Feynman, a Caltech quantum physicist who received a Nobel Prize for the
creation of quantum electrodynamics with Schwinger and Tomonaga, and who
pioneered nanotechnology with his famous talk “There’s Plenty of Room at the
Bottom,” wrote:

There was a time when the newspapers said that only twelve men
understood the theory of relativity. I do not believe that there ever
was such a time. There might have been a time when only one man
did, because he was the only guy who caught on, before he wrote
his paper. But after people read the paper, a lot of people under-
stood the theory of relativity in some way or other, certainly more
than twelve. On the other hand, I think I can safely say that no-
body understands quantum mechanics. [Richard P. Feynman (1965)
Character of Physical Law. BBC]

Still, saying that quantum mechanics is ununderstandable raises the obvious


question: “If we cannot understand it, does it at least seem plausible?” That is
the question to be addressed in this chapter. When you read this chapter, you


will see that the answer is simple and clear. Quantum mechanics is the most implausible theory ever formulated. Nobody would ever formulate a theory like quantum mechanics in jest, because nobody would believe it. Physics ended up with quantum mechanics not because it seemed the most logical explanation, but because countless observations made it unavoidable.

11.1 Schrödinger’s Cat


Schrödinger, apparently not an animal lover, came up with an example illus-
trating what the conceptual difficulties of quantum mechanics really mean in
everyday terms. This section describes the example.
A cat is placed in a closed box. Also in the box is a Geiger counter and a
tiny amount of radioactive material that will cause the Geiger counter to go off
in a typical time of an hour. The Geiger counter has been rigged so that if it
goes off, it releases a poison that kills the cat.
Now the decay of the radioactive material is a quantum-mechanical process;
the different times for it to trigger the Geiger counter each have their own
probability. According to the orthodox interpretation, “measurement” is needed
to fix a single trigger time. If the box is left closed to prevent measurement, then
at any given time, there is only a probability of the Geiger counter having been
triggered. The cat is then alive, and also dead, each with a nonzero probability.
Of course no reasonable person is going to believe that she is looking at a box
with a cat in it that is both dead and alive. The problem is obviously with what
is to be called a “measurement” or “observation.” The countless trillions of air
molecules are hardly going to miss “observing” that they no longer enter the
cat’s nose. The biological machinery in the cat is not going to miss “observing”
that the blood is no longer circulating. More directly, the Geiger counter is not
going to miss “observing” that a decay has occurred; it is releasing the poison,
isn’t it?
If you postulate that the Geiger counter is in this case doing the "measurement" that the orthodox interpretation so deviously leaves undefined, it agrees with our common sense. But of course, this Deus ex Machina only rephrases our
common sense; it provides no explanation why the Geiger counter would cause
quantum mechanics to apparently terminate its normal evolution, no proof or
plausible reason that the Geiger counter is able to fundamentally change the
normal evolution of the wave function, and not even a shred of hard evidence
that it terminates the evolution, if the box is truly closed.
There is a strange conclusion to this story. The entire point Schrödinger
was trying to make was that no sane person is going to believe that a cat
can be both dead and kicking around alive at the same time. But when the
equations of quantum mechanics are examined more closely, it is found that

they require exactly that. The wave function evolves into describing a series
of different realities. In our own reality, the cat dies at a specific, apparently
random time, just as common sense tells us. Regardless whether the box is open
or not. But, as discussed further in section 11.6, the mathematics of quantum
mechanics extends beyond our reality. Other realities develop, which we humans
are utterly unable to observe, and in each of those other realities, the cat dies
at a different time.

11.2 Instantaneous Interactions


Special relativity has shown that we humans cannot transmit information at
more than the speed of light. However, according to the orthodox interpretation,
nature does not limit itself to the same silly restrictions that it puts on us. This
section discusses why not.
Consider again the H$_2^+$ ion, with the single electron equally shared by the two protons. If you pull the protons apart, maintaining the symmetry, you get a wave function that looks like figure 11.1. You might send one proton off to your

Figure 11.1: Separating the hydrogen ion.

observer on Mars, the other to your observer on Venus. Where is the electron,
on Mars or on Venus?
According to the orthodox interpretation, the answer is: neither. A position
for the electron does not exist. The electron is not on Mars. It is not on Venus.
Only when either observer makes a measurement to see whether the electron is
there, nature throws its dice, and based on the result, might put the electron
on Venus and zero the wave function on Mars. But regardless of the distance,
it could just as well have put the electron on Mars, if the dice would have come
up differently.
You might think that nature cheats, that when you take the protons apart,
nature already decides where the electron is going to be. That the Venus proton
secretly hides the electron “in its sleeve”, ready to make it appear if an observa-
tion is made. John Bell devised a clever test to force nature to reveal whether
it has something hidden in its sleeve during a similar sort of trick.

The test case Bell used was a generalization of an experiment proposed by Bohm. It involves spin measurements on an electron/positron pair, created by the decay of a π-meson. Their combined spins are in the singlet state because the meson has no net spin. In particular, if you measure the spins of the electron and positron in any given direction, there is a 50/50 chance for each that it turns out to be positive or negative. However, if one is positive, the other must be negative. So there are only two different possibilities:
1. electron positive and positron negative,

2. electron negative and positron positive.


Now suppose Earth happens to be almost the same distance from Mars and
Venus, and you shoot the positron out to Venus, and the electron to Mars, as
shown at the left in figure 11.2:

Figure 11.2: The Bohm experiment before the Venus measurement (left), and
immediately after it (right).

You have observers on both planets waiting for the particles. According to
quantum mechanics, the traveling electron and positron are both in an indeter-
minate state.
The positron reaches Venus a fraction of a second earlier, and the observer
there measures its spin in the direction up from the ecliptic plane. According to
the orthodox interpretation, nature now makes a random selection between the
two possibilities, and assume it selects the positive spin value for the positron,
corresponding to a spin that is up from the ecliptic plane, as shown in figure
11.2. Immediately, then, the spin state of the electron on Mars must also have
collapsed; the observer on Mars is guaranteed to now measure negative spin, or
spin down, for the electron.
The funny thing is, if you believe the orthodox interpretation, the infor-
mation about the measurement of the positron has to reach the electron in-
stantaneously, much faster than light can travel. This apparent problem in the
orthodox interpretation was discovered by Einstein, Podolski, and Rosen. They
doubted it could be true, and argued that it indicated that something must be
missing in quantum mechanics.
In fact, instead of superluminal effects, it seems much more reasonable to assume that earlier on earth, when the particles were sent on their way, nature attached a secret little "note" of some kind to the positron, saying the equivalent

of “If your spin up is measured, give the positive value”, and that it attached a
little note to the electron “If your spin up is measured, give the negative value.”
The results of the measurements are still the same, and the little notes travel
along with the particles, well below the speed of light, so all seems now fine. Of
course, these would not be true notes, but some kind of additional information
beyond the normal quantum mechanics. Such postulated additional information
sources are called “hidden variables.”
Bell saw that there was a fundamental flaw in this idea if you do a large
number of such measurements and you allow the observers to select from more
than one measurement direction at random. He derived a neat little general
formula, but the discussion here will just show the contradiction in a single
case. In particular, the observers on Venus and Mars will be allowed to select
randomly one of three measurement directions ~a, ~b, and ~c separated by 120
degrees:

Figure 11.3: Spin measurement directions.

Let's see what the little notes attached to the electrons might say. They might say, for example, "Give the + value if $\vec a$ is measured, give the − value if $\vec b$ is measured, give the + value if $\vec c$ is measured." The relative fractions of the various possible notes generated for the electrons will be called $f_1, f_2, \ldots$. There
are 8 different possible notes:

          f1   f2   f3   f4   f5   f6   f7   f8
    a:    +    +    +    +    −    −    −    −
    b:    +    +    −    −    +    +    −    −
    c:    +    −    +    −    +    −    +    −

The sum of the fractions $f_1$ through $f_8$ must be one. In fact, because of symmetry, each note will probably on average be generated for 1/8 of the electrons sent, but this will not be needed.
Of course, each note attached to the positron must always be just the oppo-
site of the one attached to the electron, since the positron must measure + in a
direction when the electron measures − in that direction and vice-versa.
Now consider those measurements in which the Venus observer measures direction $\vec a$ and the Mars observer measures direction $\vec b$. In particular, the question is in what fraction of such measurements the Venus observer measures the opposite sign from the Mars observer; call it $f_{ab,\rm opposite}$. This is not that hard to figure out. First consider the case that Venus measures − and Mars +. If the Venus observer measures the − value for the positron, then the note attached to the electron must say "measure + for $\vec a$"; further, if the Mars observer measures the + value for $\vec b$, that one should say "measure +" too. So, looking at the table, the relative fraction where Venus measures − and Mars measures + is where the electron's note has a + for both $\vec a$ and $\vec b$: $f_1 + f_2$.
Similarly, the fraction of cases where Venus finds + and Mars − is $f_7 + f_8$, and you get in total:
$$f_{ab,\rm opposite} = f_1 + f_2 + f_7 + f_8 = 0.25$$
The value 0.25 is what quantum mechanics predicts; the derivation will be skipped here, but it has been verified in the experiments done after Bell's work. Those experiments also made sure that nature did not get the chance to do subluminal communication. The same way you get
$$f_{ac,\rm opposite} = f_1 + f_3 + f_6 + f_8 = 0.25$$
and
$$f_{bc,\rm opposite} = f_1 + f_4 + f_5 + f_8 = 0.25$$
Now there is a problem, because the numbers add up to 0.75, but the fractions
add up to at least 1: the sum of f1 through f8 is one.
A seemingly perfectly logical and plausible explanation by great minds is
tripped up by some numbers that just do not want to match up. They only
leave the alternative nobody really wanted to believe.
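If you do not trust the bookkeeping, a brute-force check is easy. The hedged sketch below is my own; it tries a large number of random fraction vectors $f_1\ldots f_8$ and confirms both the identity $f_{ab} + f_{ac} + f_{bc} = 1 + 2(f_1 + f_8)$ and that the three fractions can never all equal 0.25.

```python
import numpy as np
from itertools import product

# Hedged sketch: each hidden-variable "note" is one of the 8 sign patterns
# for directions a, b, c; "opposite results" for a pair of directions means
# the electron's note carries equal signs for those two directions.
notes = list(product((+1, -1), repeat=3))   # notes[0] is the f1 case, ... f8

rng = np.random.default_rng(0)
worst = 1.0
for _ in range(100_000):
    f = rng.dirichlet(np.ones(8))           # nonnegative fractions, sum 1
    fab = sum(fi for fi, s in zip(f, notes) if s[0] == s[1])
    fac = sum(fi for fi, s in zip(f, notes) if s[0] == s[2])
    fbc = sum(fi for fi, s in zip(f, notes) if s[1] == s[2])
    assert np.isclose(fab + fac + fbc, 1 + 2 * (f[0] + f[7]))
    worst = min(worst, max(abs(fab - .25), abs(fac - .25), abs(fbc - .25)))
print(worst)   # stays well above zero: all three cannot be 0.25
```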
Attaching notes does not work. Information on what the observer on Venus
decided to measure, the one thing that could not be put in the notes, must have
been communicated instantly to the electron on Mars regardless of the distance.
It can also safely be concluded that we humans will never be able to see inside
the actual machinery of quantum mechanics. For, suppose the observer on Mars
could see the wave function of the electron collapse. Then the observer on Venus
could send her Morse signals faster than the speed of light by either measuring
or not measuring the spin of the positron. Special relativity would then allow
signals to be send into the past, and that leads to logical contradictions such as
the Venus observer preventing her mother from having her.
While the results of the spin measurements can be observed, they do not
allow superluminal communication. While the observer on Venus affects the
results of the measurements of the observer on Mars, they will look completely
random to that observer. Only when the observer on Venus sends over the results of her measurements, at a speed less than the speed of light, and the two sets of results are compared, do meaningful patterns show up.

The Bell experiments are often used to argue that Nature must really make
the collapse decision using a true random number generator, but that is of
course crap. The experiments indicate that Nature instantaneously transmits
the collapse decision on Venus to Mars, but say nothing about how that decision
was reached.
Superluminal effects still cause paradoxes, of course. The left of figure 11.4
shows how a Bohm experiment appears to an observer on earth. The spins

Figure 11.4: Earth’s view of events (left), and that of a moving observer (right).

remain undecided until the measurement by the Venus observer causes both the
positron and the electron spins to collapse.
However, for a moving observer, things would look very different. Assuming
that the observer and the particles are all moving at speeds comparable to the
speed of light, the same situation may look like the right of figure 11.4, {A.4}.
In this case, the observer on Mars causes the wave function to collapse at a time
that the positron has only just started moving towards Venus!
So the orthodox interpretation is not quite accurate. It should really have
said that the measurement on Venus causes a convergence of the wave function,
not an absolute collapse. What the observer on Venus really achieves in the
orthodox interpretation is that after her measurement, all observers agree that
the positron wave function is collapsed. Before that time, some observers are
perfectly correct in saying that the wave function is already collapsed, and that
the Mars observer did it.
It should be noted that when the equations of quantum mechanics are cor-
rectly applied, the collapse and superluminal effects disappear. That is ex-
plained in section 11.6. But, due to the fact that there are limits to our obser-
vational capabilities, as far as our own human experiences are concerned, the
paradoxes remain real.
To be perfectly honest, it should be noted that the example above is not
quite the one of Bell. Bell really used the inequality:

|2(f3 + f4 + f5 + f6 ) − 2(f2 + f4 + f5 + f7 )| ≤ 2(f2 + f3 + f6 + f7 )

So the discussion cheated. And Bell allowed general directions of measurement,
not just 120 degree ones. See [10, pp. 423-426]. The above discussion seems a
lot less messy, even though not historically accurate.

11.3 Global Symmetrization


When computing, say, a hydrogen molecule, it is all nice and well to say that
the wave function must be antisymmetric with respect to exchange of the two
electrons 1 and 2, so the spin state of the molecule must be the singlet one.
But what about, say, electron 3 in figure 11.1, which can with 50% chance be
found on Mars and otherwise on Venus? Should not the wave function also be
antisymmetric, for example, with respect to exchange of this electron 3 in one
of two places in space with electron 1 on the hydrogen molecule on Earth? And
would this not locate electron 3 in space also in part on the hydrogen molecule,
and electron 1 also partly in space?
The answer is: absolutely. Nature treats all electrons as one big connected
bunch. The given solution for the hydrogen molecule is not correct; it should
have included every electron in the universe, not just two of them. Every electron
in the universe is just as much present on this single hydrogen molecule as the
assumed two.
From the difficulty in describing the 33 electrons of the arsenic atom, imagine
having to describe all electrons in the universe at the same time! If the universe
is truly flat, this number would not even be finite. Fortunately, it turns out that
the observed quantities can be correctly predicted pretending there are only two
electrons involved. Antisymmetrization with far-away electrons does not change
the properties of the local solution.
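
If you are curious, that last claim is easy to illustrate numerically in one
dimension. The sketch below (a toy model with made-up numbers, not a proof)
antisymmetrizes a normalized orbital at the origin, standing in for the
molecule, with an orbital fifty units away, standing in for a far-away
electron, and checks that the electron density near the origin is the same as
for the local orbital by itself:

    import numpy as np

    x = np.linspace(-10.0, 60.0, 1401)
    dx = x[1] - x[0]

    def orbital(center):
        # A normalized Gaussian orbital centered at the given position.
        g = np.exp(-0.5 * (x - center) ** 2)
        return g / np.sqrt(np.sum(g ** 2) * dx)

    local, far = orbital(0.0), orbital(50.0)   # essentially zero overlap

    # The antisymmetrized two-electron wave function on the (x1, x2) grid:
    psi = (np.outer(local, far) - np.outer(far, local)) / np.sqrt(2)

    # The electron density: two electrons, integrate |psi|^2 over x2:
    density = 2 * np.sum(np.abs(psi) ** 2, axis=1) * dx

    near = np.abs(x) < 5.0
    print(np.max(np.abs(density[near] - local[near] ** 2)))  # negligible
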
If you are thinking that more advanced quantum theories will eventually do
away with the preposterous notion that all electrons are present everywhere, do
not be too confident. As mentioned in chapter 10.2.1, the idea has become a
fundamental tenet in quantum field theory.

11.4 Conservation Laws and Symmetries


The purpose of this section is to explain where conservation laws such as conserva-
tion of linear and angular momentum come from, as well as give the reason why
the corresponding operators take the form of differentiations. It should provide
a better insight into how the mathematics of quantum mechanics relates to the
basic physical properties of nature.
Pretend for now that you have never heard of angular momentum, nor that
it would be preserved, nor what its operator would be. However, there is at
least one operation you do know without being told about: rotating a system
over an angle.
Consider the effect of this operation on a complete system in further empty
space. Since empty space by itself has no preferred directions, it does not make a
difference under what angle you initially position the system. Identical systems
placed in different initial angular orientations will evolve the same, just seen
from a different angle.
This “invariance” with respect to angular orientation has consequences when
phrased in terms of operators and the Schrödinger equation. In particular, let
a system of particles 1, 2, . . ., be described in spherical coordinates by a wave
function:
Ψ(r1 , θ1 , φ1 , Sz1 , r2 , θ2 , φ2 , Sz2 , . . .)
and let Rϕ be the operator that rotates this entire system over a given angle ϕ
around the z-axis:

Rϕ Ψ(r1 , θ1 , φ1 , Sz1 , r2 , θ2 , φ2 , Sz2 , . . .)
= Ψ(r1 , θ1 , φ1 + ϕ, Sz1 , r2 , θ2 , φ2 + ϕ, Sz2 , . . .)

(For the formula as shown, the rotation of the system over ϕ is in the direction of
decreasing φ. Or if you want, it corresponds to an observer or axis system
rotated in the direction of increasing φ; in empty space, who is going to see the
difference?)
Now the key point is that if space has no preferred direction, the operator
Rϕ must commute with the Hamiltonian:

HRϕ = Rϕ H

After all, it should not make any difference at what angle compared to empty
space the Hamiltonian is applied: if you first rotate the system and then apply
the Hamiltonian, or first apply the Hamiltonian and then rotate the system,
the result should be the same. For that reason, an operator such as Rϕ , which
commutes with the Hamiltonian of the considered system, is called a physical
symmetry of the system.
The fact that Rϕ and H commute has a mathematical consequence {A.19}:
it means that Rϕ must have a complete set of eigenfunctions that are also energy
eigenfunctions, and for which the Schrödinger equation gives the evolution. In
particular, if the system is initially an eigenfunction of the operator Rϕ with
a certain eigenvalue, it must stay an eigenfunction with this eigenvalue for all
time, {A.100}. The eigenvalue remains the same during the evolution.
But wait. If this eigenvalue does not change with time, does that not mean
that it is a conserved number? Is it not just like the money in your wallet if you
do not take it out to spend any? Whether or not this eigenvalue will turn out
to be important, it must be truly conserved. It is a physical quantity that does
not change, just like the mass of the system does not change. So it appears you
have here another conservation law, in addition to conservation of mass.
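
If you are not convinced, a small numeric sketch can make this concrete (all
matrices below are made up, with h̄ = 1; it is an illustration, not the
derivation). It constructs a Hermitian H and a unitary R that commute, starts
from an eigenvector of R that is not an energy eigenfunction, evolves it with
exp(−iHt), and checks that the eigenvalue of R is indeed conserved:

    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(1)
    n = 5
    # A random common eigenbasis makes H and R commute by construction:
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q, _ = np.linalg.qr(M)
    H = Q @ np.diag(rng.normal(size=n)) @ Q.conj().T      # Hermitian
    # Give R a repeated eigenvalue, so that an eigenvector of R does
    # not have to be an energy eigenfunction:
    r_eig = np.exp(1j * np.array([0.7, 0.7, 1.1, 2.0, 2.5]))
    R = Q @ np.diag(r_eig) @ Q.conj().T                   # unitary
    print(np.max(np.abs(H @ R - R @ H)))                  # ~ 0: commute

    psi = (Q[:, 0] + 2 * Q[:, 1]) / np.sqrt(5)  # eigenvector of R only
    for t in (0.5, 1.0, 5.0):
        psi_t = expm(-1j * t * H) @ psi         # Schrodinger evolution
        # Still an eigenvector of R with the unchanged eigenvalue:
        print(t, np.max(np.abs(R @ psi_t - r_eig[0] * psi_t)))
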
Let’s examine the conserved eigenvalue of Rϕ a bit closer to see what physical
quantity it might correspond to. First of all, the magnitude of any eigenvalue
of Rϕ must be one: if it were not, the square integral of Ψ could increase by
that factor during the rotation, but of course it must stay the same. Since the
magnitude is one, the eigenvalue can be written in the form e^{ia} where a is some
ordinary real number. That narrows the eigenvalue down a bit already.
But you can go further: the eigenvalue must more specifically be of the form
e^{imϕ}, where m is some real number independent of the amount of rotation. The
reasons are that there must be no change in Ψ when the angle of rotation is
zero, and a single rotation over ϕ must be the same as two rotations over an
angle ϕ/2. Those requirements imply that the eigenvalue is of the form e^{imϕ}.
So eimϕ is a preserved quantity if the system starts out as the corresponding
eigenfunction of Rϕ . You can simplify that statement to say that m by itself is
preserved; if m varied in time, eimϕ would too. Also, you might scale m by some
constant, call it h̄, so that you can conform to the dimensional units others, such
as classical physicists, might turn out to be using for this preserved quantity.
You can just give a fancy name to this preserved quantity mh̄. You can
call it “net angular momentum around the z-axis” because that sounds less
nerdy at parties than “scaled logarithm of the preserved eigenvalue of Rϕ .” You
might think of even better names, but whatever the name, it is preserved: if the
system starts out with a certain value of this angular momentum, it will retain
that value for all time. (If it starts out with a combination of values, leaving
uncertainty, it will keep that combination of values. The Schrödinger equation
is linear, so you can add solutions.)
Next, you would probably like to define a nicer operator for this “angular
momentum” than the rotation operators Rϕ . The problem is that there are
infinitely many of them, one for every angle ϕ, and they are all related, a
rotation over an angle 2ϕ being the same as two rotations over an angle ϕ. If
you define a rotation operator over a very small angle, call it angle ε, then you
can approximate all the other operators Rϕ by just applying Rε sufficiently many
times. To make these approximations exact, you need to make ε infinitesimally
small, but when ε becomes zero, Rε would become just one. You have lost the
operator you want by going to the extreme. The trick to avoid this is to subtract
the limiting operator 1, and in addition, to prevent the resulting operator
from then becoming zero, to also divide by ε:

lim_{ε→0} (Rε − 1)/ε

is the operator you want.


Now consider what this operator really means for a single particle with no
spin:
lim_{ε→0} [(Rε − 1)/ε] Ψ(r, θ, φ) = lim_{ε→0} [Ψ(r, θ, φ + ε) − Ψ(r, θ, φ)]/ε

By definition, the final term is the partial derivative of Ψ with respect to φ. So
the operator you just defined is just the operator ∂/∂φ!
You can go one better still, because the eigenvalues of the operator just
defined are
lim_{ε→0} (e^{imε} − 1)/ε = im
If you add a factor h̄/i to the operator, the eigenvalues of the operator are going
to be mh̄, the quantity you defined to be the angular momentum. So you are
led to define the angular momentum operator as:

L̂z ≡ (h̄/i) ∂/∂φ
This agrees perfectly with what you got much earlier in chapter 3.1.2 from
guessing that the relationship between angular and linear momentum is the
same in quantum mechanics as in classical mechanics. Now you derived it
from the fundamental rotational symmetry property of nature, instead of from
guessing.
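
As a quick sanity check, the limit can also be evaluated numerically (a sketch
with h̄ = 1 and an arbitrarily picked m; not part of the derivation). Applying
(h̄/i)(Rε − 1)/ε to the eigenfunction e^{imφ} for decreasing ε should return m
times that eigenfunction:

    import numpy as np

    m = 2                               # arbitrarily picked eigenvalue index
    phi = np.linspace(0.0, 2 * np.pi, 200, endpoint=False)
    psi = np.exp(1j * m * phi)          # the eigenfunction e^{i m phi}

    for eps in (1e-2, 1e-4, 1e-6):
        r_eps_psi = np.exp(1j * m * (phi + eps))       # R_eps shifts phi
        lz_psi = (1.0 / 1j) * (r_eps_psi - psi) / eps  # (hbar/i)(R_eps - 1)/eps
        print(eps, np.max(np.abs(lz_psi - m * psi)))   # error vanishes with eps
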
How about the angular momentum of a system of multiple, but still spinless,
particles? It is easy to see that the operator
(h̄/i) lim_{ε→0} (Rε − 1)/ε
now acts as a total derivative, equivalent to the sum of the partial derivatives
of the individual particles. So the orbital angular momenta of the individual
particles just add, as they do in classical physics.
How about spin? Well, take a hint from nature. If a particle in a given spin
state has an inherent angular momentum in the z-direction mh̄, then apparently
the wave function of that particle changes by eimϕ when you rotate the particle
over an angle ϕ. A surprising consequence is that if the system is rotated over an
angle 2π, half-integer spin states do not return to the same value; they change
sign. Since only the magnitude of the wave function is physically observable,
this change of sign does not affect the physical symmetry.
With angular momentum defined, the rotation operator Rϕ can be explicitly
identified if you are curious. It is
Rϕ = exp(iϕL̂z /h̄)

where the exponential of an operator is found by writing the exponential as a
Taylor series. The operator L̂z is called the “generator of rotations around the z-axis.” To
check that it does indeed take the form above, expand the exponential in a
Taylor series and multiply by a state with angular momentum Lz = mh̄. The
effect is seen to be to multiply the state by the Taylor series of e^{imϕ} as it should.
So Rϕ gets all eigenstates and eigenvalues correct, and must therefore be right
since the eigenstates are complete. As an additional check, Rϕ can also be
verified explicitly for purely orbital momentum states; for example, it turns the
wave function Ψ(r, θ, φ) for a single particle into
exp(iϕL̂z /h̄) Ψ(r, θ, φ) = exp(ϕ ∂/∂φ) Ψ(r, θ, φ)
and expanding the exponential in a Taylor series produces the Taylor series
for Ψ(r, θ, φ + ϕ), the correct expression for the wave function in the rotated
coordinate system.
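
The same verification can be done numerically. In the basis of the functions
e^{imφ}, L̂z is diagonal with eigenvalues mh̄, so exp(iϕL̂z /h̄) multiplies the
m-th Fourier coefficient by e^{imϕ}; doing so must shift the wave function over
the angle ϕ. A sketch (h̄ = 1, with a made-up periodic test function):

    import numpy as np

    n = 64
    phi_grid = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    psi = np.exp(np.cos(phi_grid)) + 0j     # arbitrary periodic function

    coeffs = np.fft.fft(psi)                # expansion in the e^{i m phi}
    m_values = np.fft.fftfreq(n, d=1.0 / n) # the integer m of each mode

    angle = 0.7
    # Apply exp(i*angle*L_z) mode by mode and transform back:
    rotated = np.fft.ifft(np.exp(1j * m_values * angle) * coeffs)

    # Compare against the directly evaluated Psi(phi + angle):
    shifted = np.exp(np.cos(phi_grid + angle))
    print(np.max(np.abs(rotated - shifted)))  # zero to machine precision
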
There are other symmetries of nature, and they give rise to other conserva-
tion laws and their operators. For example, nature is symmetric with respect
to translations: it does not make a difference where in empty space you place
your system. This symmetry gives rise to linear momentum conservation in the
same way that rotational symmetry led to angular momentum conservation.
Symmetry with respect to time delay gives rise to energy conservation.
Initially, it was also believed that nature was symmetric with respect to
mirroring it (looking at the physics in a mirror). That gave rise to a law of
conservation of “parity”. Parity is called “even” when the wave function remains
the same when you replace ~r by −~r (a way of doing the mirroring called
inversion), and “odd” if it changes sign. The parity of a complete system was
believed to be preserved in time. However, it turned out that the weak nuclear
force does not stay the same under mirroring, so that parity is not conserved
when weak interactions play a role. Nowadays, most physicists believe that in
order to get an equivalent system, in addition to the mirroring, you also need to
replace the particles by their antiparticles, having opposite charge, and reverse
the direction of time.
Invariances of systems such as the ones above are called “group properties.”
There is an entire branch of mathematics devoted to how they relate to the
solutions of the systems, called “group theory.” It is essential to advanced
quantum mechanics, but beyond the scope of this book.

11.5 Failure of the Schrödinger Equation?


Section 11.2 mentioned sending half of the wave function of an electron to
Venus, and half to Mars. A scattering setup as described in chapter 5.7 provides
a practical means for actually doing this (at least, for taking the wave function
apart into two separate parts). The obvious question is now: can the Schrödinger
equation also describe the physically observed “collapse of the wave function”,
where the electron changes from being on both Venus and Mars with a 50/50
probability to, say, being on Mars with absolute certainty?

The answer obtained in this and the next section will be most curious: no,
the Schrödinger equation flatly contradicts that the wave function collapses, but
yes, it requires that measurement leads to the experimentally observed collapse.
The analysis will take us to a mind-boggling but really unavoidable conclusion
about the very nature of our universe.
This section will examine the problem the Schrödinger equation has with
describing a collapse. First of all, the solutions of the linear Schrödinger equation
do not allow a mathematically exact collapse like some nonlinear equations do.
But that does not necessarily imply that solutions would not be able to collapse
physically. It would be conceivable that the solution could evolve to a state
where the electron is on Mars with such high probability that it can be taken
to be certainty. In fact, a common notion is that, somehow, interaction with a
macroscopic “measurement” apparatus could lead to such an end result.
Of course, the constituent particles that make up such a macroscopic mea-
surement apparatus still need to satisfy the laws of physics. So let’s make up
a reasonable model for such a complete macroscopic system, and see what can
then be said about the possibility for the wave function to evolve towards the
electron being on Mars.
The model will ignore the existence of anything beyond the Venus, Earth,
Mars system. It will be assumed that the three planets consist of a humon-
gous, but finite, number of conserved classical particles 1, 2, 3, 4, 5, . . ., with a
supercolossal wave function:

Ψ(~r1 , Sz1 ,~r2 , Sz2 ,~r3 , Sz3 ,~r4 , Sz4 ,~r5 , Sz5 , . . .)

Particle 1 will be taken to be the scattered electron. It will be assumed that
the wave function satisfies the Schrödinger equation:

ih̄ ∂Ψ/∂t = − Σ_i Σ_{j=1..3} (h̄²/2m_i) ∂²Ψ/∂r_{i,j}² + V(~r1 , Sz1 ,~r2 , Sz2 ,~r3 , Sz3 ,~r4 , Sz4 , . . .)Ψ        (11.1)

Trying to write the solution to this problem would of course be prohibitive,
but the evolution of the probability of the electron to be on Venus can still
be extracted from it with some fairly standard manipulations. First, taking the
combination of the Schrödinger equation times Ψ∗ minus the complex conjugate
of the Schrödinger equation times Ψ produces after some further manipulation
an equation for the time derivative of the probability:
ih̄ ∂(Ψ∗Ψ)/∂t = − Σ_i Σ_{j=1..3} (h̄²/2m_i) ∂/∂r_{i,j} (Ψ∗ ∂Ψ/∂r_{i,j} − Ψ ∂Ψ∗/∂r_{i,j})        (11.2)

The question is the probability for the electron to be on Venus, and you can
get that by integrating the probability equation above over all possible positions
and spins of the particles except for particle 1, for which you have to restrict the
spatial integration to Venus and its immediate surroundings. If you do that,
the left hand side becomes the rate of change of the probability for the electron
to be on Venus, regardless of the position and spin of all the other particles.
Interestingly, assuming times at which the Venus part of the scattered elec-
tron wave is definitely at Venus, the right hand side integrates to zero: the wave
function is supposed to disappear at large distances from this isolated system,
and whenever particle 1 would be at the border of the surroundings of Venus.
It follows that the probability for the electron to be at Venus cannot change
from 50%. A true collapse of the wave function of the electron as postulated in
the orthodox interpretation, where the probability to find the electron at Venus
changes to 100% or 0%, cannot occur.
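
The mechanism can be seen in a one-dimensional toy computation (all numbers
below are made up, in units h̄ = m = 1; a sketch of the idea, not the actual
three-planet system). Evolve a wave function consisting of two separated
packets, one per planet, with a Crank-Nicolson discretization of the
Schrödinger equation, which is unitary just like the true evolution, and watch
the probability on the left half of the domain; it stays at 50%:

    import numpy as np

    n, width = 400, 200.0
    x = np.linspace(-width / 2, width / 2, n)
    dx, dt = x[1] - x[0], 0.05

    # Two packets drifting apart, one per "planet":
    psi = (np.exp(-0.5 * (x + 50) ** 2 - 2j * x)
           + np.exp(-0.5 * (x - 50) ** 2 + 2j * x))
    psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * dx)

    # The free-particle Hamiltonian H = -(1/2) d^2/dx^2, finite differences:
    H = (np.diag(np.full(n, 1.0 / dx ** 2))
         - np.diag(np.full(n - 1, 0.5 / dx ** 2), 1)
         - np.diag(np.full(n - 1, 0.5 / dx ** 2), -1))
    # The Crank-Nicolson step matrix; unitary, like exp(-i dt H):
    A = np.linalg.solve(np.eye(n) + 0.5j * dt * H,
                        np.eye(n) - 0.5j * dt * H)

    for _ in range(200):
        psi = A @ psi
    print(np.sum(np.abs(psi[x < 0]) ** 2) * dx)   # still 0.50
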
Of course, the model was simple; you might therefore conjecture that a true
collapse could occur if additional physics is included, such as non-conserved
particles like photons, or other relativistic effects. But that would obviously
be a moving target. The analysis made a good-faith effort to examine whether
including macroscopic effects may cause the observed collapse of the wave func-
tion, and the answer was no. Having a scientifically open mind requires you to at
least follow the model to its logical end; nature might be telling you something
here.
Is it really true that the results disagree with the observed physics? You need
to be careful. There is no reasonable doubt that if a measurement is performed
about the presence of the electron on Venus, the wave function will be observed
to collapse. But all you established above is that the wave function does not
collapse; you did not establish whether or not it will be observed to collapse.
To answer the question whether a collapse will be observed, you will need to
include the observers in your reasoning.
The problem is with the innocuous-looking phrase “regardless of the position
and spin of all the other particles” in the arguments above. Even while the total
probability for the electron to be at Venus must stay at 50% in this example
system, it is still perfectly possible for the probability to become 100% for one
state of the particles that make up the observer and her tools, and to be 0% for
another state of the observer and her tools.
It is perfectly possible to have a state of the observer with brain particles,
ink-on-paper particles, tape recorder particles, that all say that the electron is
on Venus, combined with 100% probability that the electron is on Venus, and
a second state of the observer with brain particles, ink-on-paper particles, tape
recorder particles, that all say the electron must be on Mars, combined with 0%
probability for the electron to be on Venus. Such a scenario is called a “relative
state interpretation;” the states of the observer and the measured object become
entangled with each other.
The state of the electron does not change to a single state of presence or
absence; instead two states of the macroscopic universe develop, one with the
electron absent, the other with it present. As explained in the next subsection,
the Schrödinger equation does not just allow this to occur, it requires this to
occur. So, far from being in conflict with the observed collapse, the model
above requires it. The model produces the right physics: observed collapse is a
consequence of the Schrödinger equation, not of something else.
But all this ends up with the rather disturbing thought that there are now
two states of the universe, and the two are different in what they think about
the electron. This conclusion was unexpected; it comes as the unavoidable
consequence of the mathematical equations that quantum mechanics abstracted
for the way nature operates.

11.6 The Many-Worlds Interpretation


The Schrödinger equation has been enormously successful, but it describes the
wave function as always smoothly evolving in time, in apparent contradiction to
its postulated collapse in the orthodox interpretation. So, it would seem to be
extremely interesting to examine the solution of the Schrödinger equation for
measurement processes more closely, to see whether and how a collapse might
occur.
Of course, if a true solution for a single arsenic atom already presents an
insurmountable problem, it may seem insane to try to analyze an entire macro-
scopic system such as a measurement apparatus. But in a brilliant Ph.D. thesis
with Wheeler at Princeton, Hugh Everett, III did exactly that. He showed that
the wave function does not collapse. However, it seems to us humans that it does,
so we are correct in applying the rules of the orthodox interpretation anyway.
This section explains briefly how this works.
Let’s return to the experiment of section 11.2, where a positron is sent to
Venus and an entangled electron to Mars, as in figure 11.5. The spin states are

Figure 11.5: Bohm’s version of the Einstein, Podolski, Rosen Paradox

uncertain when the two are sent from Earth, but when Venus measures the spin
of the positron, it miraculously causes the spin state of the electron on Mars to
collapse too. For example, if the Venus positron collapses to the spin-up state
in the measurement, the Mars electron must collapse to the spin-down state.
The problem, however, is that there is nothing in the Schrödinger equation to
describe such a collapse, nor the superluminal communication between Venus
and Mars it implies.
The reason that the collapse and superluminal communication are needed
is that the two particles are entangled in the singlet spin state of chapter 4.5.6.
This is a 50% / 50% probability state of (electron up and positron down) /
(electron down and positron up).
It would be easy if the positron were just spin up and the electron spin
down, as in figure 11.6. You would still not want to write down the supercolossal

Figure 11.6: Non entangled positron and electron spins; up and down.

wave function of everything, the particles along with the observers and their
equipment for this case. But there is no doubt what it describes. It will simply
describe that the observer on Venus measures spin up, and the one on Mars,
spin down. There is no ambiguity.
The same way, there is no question about the opposite case, figure 11.7.
It will produce a wave function of everything describing that the observer on

Figure 11.7: Non entangled positron and electron spins; down and up.

Venus measures spin down, and the one on Mars, spin up.
Everett, III recognized that the solution for the entangled case is blindingly
simple. Since the Schrödinger equation is linear, the wave function for the
entangled case must simply be the sum of the two non entangled ones above, as
shown in figure 11.8. If the wave function in each non entangled case describes

Figure 11.8: The wave functions of two universes combined



a universe in which a particular state is solidly established for the spins, then
the conclusion is undeniable: the wave function in the entangled case describes
two universes, each of which solidly establishes states for the spins, but which
end up with opposite results.
This explains the result of the orthodox interpretation that only eigenval-
ues are measurable. The linearity of the Schrödinger equation leaves no other
option:
Assume that any measurement device at all is constructed that for
a spin-up positron results in a universe that has absolutely no doubt
that the spin is up, and for a spin-down positron results in a universe
that has absolutely no doubt that the spin is down. In that case a
combination of spin up and spin down states must unavoidably result
in a combination of two universes, one in which there is absolutely
no doubt that the spin is up, and one in which there is absolutely no
doubt that it is down.
Note that this observation does not depend on the details of the Schrödinger
equation, just on its linearity. For that reason it stays true even including
relativity, {A.101}.
The two universes are completely unaware of each other. It is the very nature
of linearity that if two solutions are combined, they do not affect each other at
all: neither universe would change in the least whether the other universe is
there or not. For each universe, the other universe “exists” only in the sense
that the Schrödinger equation must have created it given the initial entangled
state.
Nonlinearity would be needed to allow the solutions of the two universes to
couple together to produce a single universe with a combination of the two eigen-
values, and there is none. A universe measuring a combination of eigenvalues is
made impossible by linearity.
While the wave function has not collapsed, what has changed is the most
meaningful way to describe it. The wave function still by its very nature assigns
a value to every possible configuration of the universe, in other words, to every
possible universe. That has never been a matter of much controversy. And
after the measurement it is still perfectly correct to say that the Venus observer
has marked down in her notebook that the positron was up and down, and has
transmitted a message to earth that the positron was up and down, and earth
has marked down on its computer disks and in the brains of the assistants that the
positron was found to be up and down, etcetera.
But it is much more precise to say that after the measurement there are
two universes, one in which the Venus observer has observed the positron to be
up, has transmitted to earth that the positron was up, and in which earth has
marked down on its computer disks and in the brains of the assistants that the
positron was up, etcetera; and a second universe in which the same happened,
but with the positron everywhere down instead of up. This description is much
more precise since it notes that up always goes with up, and down with down.
As noted before, this more precise way of describing what happens is called the
“relative state formulation.”
Note that in each universe, it appears that the wave function has collapsed.
Both universes agree on the fact that the decay of the π-meson creates an elec-
tron/positron pair in a singlet state, but after the measurement, the notebook,
radio waves, computer disks, brains in one universe all say that the positron
is up, and in the other, all down. Only the unobservable full wave function
“knows” that the positron is still both up and down.
And there is no longer a spooky superluminal action: in the first universe,
the electron was already down when sent from earth. In the other universe, it
was sent out as up. Similarly, for the case of the last section, where half
the wave function of an electron was sent to Venus, the Schrödinger equation
does not fail. There is still half a chance of the electron to be on Venus; it just
gets decomposed into one universe with one electron, and a second one with
zero electrons. In the first universe, earth sent the electron to Venus, in the
second to Mars. The contradictions of quantum mechanics disappear when the
complete solution of the Schrödinger equation is examined.
Next, let’s examine why the results would seem to be covered by rules of
chance, even though the Schrödinger equation is fully deterministic. To do so,
assume earth keeps on sending entangled positron and electron pairs. When
the third pair is on its way, the situation looks as shown in the third column
of figure 11.9. The wave function now describes 8 universes. Note that in
most universes the observer starts seeing an apparently random sequence of up
and down spins. When repeated enough times, the sequences appear random
in practically every universe. Unable to see the other universes, the
observer in each universe has no choice but to call her results random. Only the
full wave function knows better.
Everett, III also derived that the statistics of the apparently random se-
quences are proportional to the absolute squares of the eigenfunction expansion
coefficients, as the orthodox interpretation says.
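
Everett’s actual derivation is beyond the scope of this book, but its flavor is
easy to capture numerically. Give the up outcome a squared amplitude p (the
value below is made up; p = 0.5 in the singlet case) and repeat the
measurement n times, creating 2^n universes; weighing each universe by the
absolute square of its amplitude, essentially all the weight ends up in
universes that record “up” a fraction p of the time:

    import math

    p = 0.8            # |alpha|^2 of the up state; made-up illustration value
    for n in (10, 100, 1000):
        # Total squared amplitude of the universes whose recorded
        # fraction of up results lies within 0.05 of p:
        weight = sum(math.comb(n, ups) * p ** ups * (1 - p) ** (n - ups)
                     for ups in range(n + 1)
                     if abs(ups / n - p) < 0.05)
        print(n, weight)   # approaches 1 as n grows
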
How about the uncertainty relationship? For spins, the relevant uncertainty
relationship states that it is impossible for the spin in the up/down directions
and in the front/back directions to be certain at the same time. Measuring the
spin in the front/back direction will make the up/down spin uncertain. But if
the spin was always up, how can it change?
This is a bit more tricky. Let’s have the Mars observer do a couple of
additional experiments on one of her electrons, first one front/back, and then
another again up/down, to see what happens. To be more precise, let’s also ask
her to write the result of each measurement on a blackboard, so that there is a
good record of what was found. Figure 11.10 shows what happens.

Figure 11.9: The Bohm experiment repeated.
When the electron is sent from Earth, two universes can be distinguished,
one in which the electron is up, and another in which it is down. In the first one,
the Mars observer measures the spin to be up and marks so on the blackboard.
In the second, she measures and marks the spin to be down.
Next the observer in each of the two universes measures the spin front/back.
Now it can be shown that the spin-up state in the first universe is a linear
combination of equal amounts of spin-front and spin-back. So the second mea-
surement splits the wave function describing the first universe into two, one with
spin-front and one with spin-back.
Similarly, the spin-down state in the second universe is equivalent to equal
amounts of spin-front and spin-back, but in this case with opposite sign. Either
way, the wave function of the second universe still splits into a universe with
spin front and one with spin back.
Now the observer in each universe does her third measurement. The front
electron consists of equal amounts of spin up and spin down electrons, and so
does the back electron, just with different sign. So, as the last column in figure
11.10 shows, in the third measurement as many as half of the eight universes
measure the vertical spin to be the opposite of the one they got in the first
measurement!

Figure 11.10: Repeated experiments on the same electron.
The full wave function knows that if the first four of the final eight universes
are summed together, the net spin is still down (the two down spins have equal
and opposite amplitude). But the observers have only their blackboard (and
what is recorded in their brains, etcetera) to guide them. And that information
seems to tell them unambiguously that the front-back measurement “destroyed”
the vertical spin of the electron. (The four observers that measured the spin
to be unchanged can repeat the experiment a few more times and are sure to
eventually find that the vertical spin does change.)
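
The two splitting facts used above are easy to check with the standard
spin-1/2 basis vectors, taking front/back along the x-axis for definiteness
(an assumption made just for this sketch):

    import numpy as np

    up, down = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    front = (up + down) / np.sqrt(2)   # the eigenvectors of spin in the
    back = (up - down) / np.sqrt(2)    # front/back (x) direction

    for state, name in ((up, "up"), (down, "down")):
        print(name, "=", front @ state, "front +", back @ state, "back")
    # up   = 0.707 front + 0.707 back
    # down = 0.707 front - 0.707 back: equal amounts, opposite sign
    # And measuring up/down on a front state again gives 50/50:
    print((up @ front) ** 2, (down @ front) ** 2)    # 0.5 and 0.5
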
The unavoidable conclusion is that the Schrödinger equation does not fail.
It describes the observations exactly, in full agreement with the orthodox inter-
pretation, without any collapse. The appearance of a collapse is actually just a
limitation of our human observational capabilities.
Of course, in other cases than the spin example above, there are more than
just two symmetric states, and it becomes much less self-evident what the proper
partial solutions are. However, it does not seem hard to make some conjectures.
For Schrödinger’s cat, you might model the radioactive decay that gives rise to
the Geiger counter going off as due to a nucleus with a neutron wave packet
rattling around in it, trying to escape. As chapter 5.7.1 showed, in quantum
mechanics each rattle will fall apart into a transmitted and a reflected wave.
The transmitted wave would describe the formation of a universe where the
neutron escapes at that time to set off the Geiger counter which kills the cat,
and the reflected wave a universe where the neutron is still contained.
For the standard quantum mechanics example of an excited atom emitting
a photon, a model would be that the initial excited atom is perturbed by the
ambient electromagnetic field. The perturbations will turn the atom into a
linear combination of the excited state with a bit of a lower energy state thrown
in, surrounded by a perturbed electromagnetic field. Presumably this situation
can be taken apart in a universe with the atom still in the excited state, and
the energy in the electromagnetic field still the same, and another universe with
the atom in the lower energy state with a photon escaping in addition to the
energy in the original electromagnetic field. Of course, the process would repeat
for the first universe, producing an eventual series of universes in almost all of
which the atom has emitted a photon and thus transitioned to a lower energy
state.
So this is where we end up. The equations of quantum mechanics describe
the physics that we observe perfectly well. Yet they have forced us to the un-
comfortable conclusion that, mathematically speaking, we are not at all unique.
Beyond our universe, the mathematics of quantum mechanics requires an infin-
ity of unobservable other universes that are nontrivially different from us.
Note that the existence of an infinity of universes is not the issue. They
are already required by the very formulation of quantum mechanics. The wave
function of say an arsenic atom already assigns a nonzero probability to every
possible configuration of the positions of the electrons. Similarly, a wave func-
tion of the universe will assign a nonzero probability to every possible configu-
ration of the universe, in other words, to every possible universe. The existence
of an infinity of universes is therefore not something that should be ascribed to
Everett, III {A.102}.
However, when quantum mechanics was first formulated, people quite obvi-
ously believed that, practically speaking, there would be just one universe, the
one we observe. No serious physicist would deny that the monitor on which you
may be reading this has uncertainty in its position, yet the uncertainty you are
dealing with here is so astronomically small that it can be ignored. Similarly
it might appear that all the other substantially different universes should have
such small probabilities that they can be ignored. The actual contribution of
Everett, III was to show that this idea is not tenable. Nontrivial universes must
develop that are substantially different.
Formulated in 1957 and then largely ignored, Everett’s work represents with-
out doubt one of the human race’s greatest accomplishments: a stunning discovery
of what we are and what our place in the universe is.
Appendix A

Notes

The notes in this section give background information, various derivations of
claims made, and other material that is not essential to understand quantum
mechanics. Use it when curious, or when an ambiguous issue arises.

A.1 Why another book on quantum mechanics?
With the current emphasis on nanotechnology, quantum mechanics is becoming
increasingly essential to engineering students. Yet, the typical quantum mechan-
ics texts for physics students are not written in a style that most engineering
students would likely feel comfortable with. Furthermore, an engineering edu-
cation provides very little real exposure to modern physics, and introductory
quantum mechanics books do little to fill in the gaps. The emphasis tends to
be on the computation of specific examples, rather than on discussion of the
broad picture. Undergraduate physics students may have the luxury of years
of further courses to pick up a wide physics background, engineering graduate
students not really. In addition, the coverage of typical introductory quantum
mechanics books does not emphasize understanding of the larger-scale quantum
system that a density functional computation, say, would be used for.
Hence this book, written by an engineer for engineers. As an engineering
professor with an engineering background, I wrote the book I wish I had had
when I started learning real quantum mechanics a few years ago. The
reason I like this book is not because I wrote it; the reason I wrote this book is
because I like it.
This book is not a popular exposition: quantum mechanics can only be de-
scribed properly in terms of mathematics; suggesting anything else is crazy.
But the assumed background in this book is just basic undergraduate calculus
and physics as taken by all engineering undergraduates. There is no intention to
teach students proficiency in the clever manipulation of the mathematical ma-
chinery of quantum mechanics. For those engineering graduate students who
may have forgotten some of their undergraduate calculus by now, there are some
quick and dirty reminders in the notations. For those students who may have
forgotten some of the details of their undergraduate physics, frankly, I am not
sure whether it makes much of a difference. The ideas of quantum mechanics
are that different from conventional physics. But the general ideas of classical
physics are assumed to be known. I see no reason why a bright undergraduate
student, having finished calculus and physics, should not be able to understand
this book. A certain maturity might help, though. There are a lot of ideas to
absorb.
My initial goal was to write something that would “read like a mystery
novel.” Something a reader would not be able to put down until she had finished
it. Obviously, this goal was unrealistic. I am far from a professional writer, and
this is quantum mechanics, after all, not a murder mystery. But I have been
told that this book is very well written, so maybe there is something to be said
for aiming high.
To prevent the reader from getting bogged down in mathematical details, I
mostly avoid nontrivial derivations in the text. Instead I have put the outlines
of these derivations in notes at the end of this document: personally, I enjoy
checking the correctness of the mathematical exposition, and I would not want
to rob my students of the opportunity to do so too. In fact, the chosen approach
allows a lot of detailed derivations to be given that are skipped in other texts to
reduce distractions. Some examples are the harmonic oscillator, orbital angular
momentum, and radial hydrogen wave functions, Hund’s first rule, and rotation
of angular momentum. And then there are extensive derivations of material not
even included in other introductory quantum texts.
While typical physics texts jump back and forth from issue to issue, I
thought that would just be distracting for my audience. Instead, I try to fol-
low a consistent approach, with as central theme the method of separation-of-
variables, a method that most mechanical graduate students have seen before
already. It is explained in detail anyway. To cut down on the issues to be men-
tally absorbed at any given time, I purposely avoid bringing up new issues until
I really need them. Such a just-in-time learning approach also immediately an-
swers the question why the new issue is relevant, and how it fits into the grand
scheme of things.
The desire to keep it straightforward is the main reason that topics such as
Clebsch-Gordan coefficients (except for the unavoidable introduction of singlet
and triplet states) and Pauli spin matrices have been shoved out of the way to
a final chapter. My feeling is, if I can give my students a solid understanding
of the basics of quantum mechanics, they should be in a good position to learn
more about individual issues by themselves when they need them. On the other
hand, if they feel completely lost in all the different details, they are not likely
to learn the basics either.
That does not mean the coverage is incomplete. All topics that are conven-
tionally covered in basic quantum mechanics courses are present in some form.
Some are covered in much greater depth. And there is a lot of material that is not
usually covered. I include significant qualitative discussion of atomic and chem-
ical properties, Pauli repulsion, the properties of solids, Bragg reflection, and
electromagnetism, since many engineers do not have much background on them
and not much time to pick it up. The discussion of thermal physics is much more
elaborate than you will find in other books on quantum mechanics. It includes
all the essentials of a basic course on classical thermodynamics, in addition to
the quantum statistics. I feel one cannot be separated from the other, espe-
cially with respect to the second law. While mechanical engineering students
will surely have had a course in basic thermodynamics before, a refresher cannot
hurt. Unlike other books, this book also contains a chapter on numerical pro-
cedures, currently including detailed discussions of the Born-Oppenheimer ap-
proximation, the variational method, and the Hartree-Fock method. Hopefully,
this chapter will eventually be completed with a section on density-functional
theory. (The Lennard-Jones model is covered earlier in the section on molecular
solids.) The motivation for including numerical methods in a basic exposition
is the feeling that after a century of work, much of what can be done analyti-
cally in quantum mechanics has been done. That the greatest scope for future
advances is in the development of improved numerical methods.
Knowledgeable readers may note that I try to stay clear of abstract mathe-
matics when it is not needed. For example, I try to go slow on the more abstract
vector notation permeating quantum mechanics, usually phrasing such issues in
terms of a specific basis. Abstract notation may seem to be completely general
and beautiful to a mathematician, but I do not think it is going to be intu-
itive to a typical engineer. The discussion of systems with multiple particles
is centered around the physical example of the hydrogen molecule, rather than
particles in boxes. The discussion of solids avoids the highly abstract Kronig-
Penney (Heaviside functions) or Dirac combs (delta functions) mathematical
models in favor of a physical discussion of more realistic one-dimensional crys-
tals. The Lennard-Jones potential is derived for two atoms instead of harmonic
oscillators.
The book tries to be as consistent as possible. Electrons are grey tones at
the initial introduction of particles, and so they stay through the rest of the
book. Nuclei are red dots. Occupied quantum states are red, empty ones grey.
That of course required all figures to be custom made. They are not intended
to be fancy but consistent and clear. I also try to stay consistent in notations
throughout the book, as much as is possible without deviating too much from
established usage.

When I derive the first quantum eigenfunctions, for a pipe and for the har-
monic oscillator, I make sure to emphasize that they are not supposed to look
like anything the students were told about before. It is only natural for students to want
to relate what we told them before about the motion to the completely different
story we are telling them now. So it should be clarified that (1) no, they are
not going crazy, and (2) yes, we will eventually explain how what they learned
before fits into the grand scheme of things.
Another difference of approach in this book is the way it treats classical
physics concepts that the students are likely unaware about, such as canonical
momentum, magnetic dipole moments, Larmor precession, and Maxwell’s equa-
tions. They are largely “derived” in quantum terms, with no appeal to classical
physics. I see no need to rub in the student’s lack of knowledge of specialized
areas of classical physics if a satisfactory quantum derivation is readily given.
This book is not intended to be an exercise in mathematical skills. Review
questions are targeted towards understanding the ideas, with the mathematics
as simple as possible. I also try to keep the mathematics in successive questions
uniform, to reduce the algebraic effort required. There is an absolute epidemic
out there of quantum texts that claim that “the only way to learn quantum
mechanics is to do the exercises,” and then those exercises turn out to be,
by and large, elaborate exercises in integration and linear algebra that take
excessive time and have nothing to do with quantum mechanics. Or worse,
they are often basic theory. (Lazy authors that claim that basic theory is an
“exercise” avoid having to cover that material themselves and also avoid having
to come up with a real exercise.) Yes, I too did waste a lot of time with these.
And then, when you are done, the answer teaches you nothing because you are
unsure whether there might not be an algebraic error in your endless mass of
algebra, and even if there is no mistake, there is no hint that it means what you
think it means. All that your work has earned you is a 75/25 chance or worse
that you now “know” something that is not true. Not in this book.
Finally, this document faces the very real conceptual problems of quantum
mechanics head-on, including the collapse of the wave function, the indeter-
minacy, the nonlocality, and the symmetrization requirements. The usual ap-
proach, and the way I was taught quantum mechanics, is to shove all these
problems under the table in favor of a good sounding, but upon examination
self-contradictory and superficial story. Such superficiality put me off solidly
when they taught me quantum mechanics, culminating in the unforgettable
moment when the professor told us, seriously, that the wave function had to
be symmetric with respect to exchange of bosons because they are all truly the
same, and then, when I was popping my eyes back in, continued to tell us that
the wave function is not symmetric when fermions are exchanged, which are all
truly the same. I would not do the same to my own students. And I really
do not see this professor as an exception. Other introductions to the ideas of
quantum mechanics that I have seen left me similarly unhappy on this point.
One thing that really bugs me: none had a solid discussion of the many worlds
interpretation. This is obviously not because the results would be incorrect,
(they have not been contradicted for half a century,) but simply because the
teachers just do not like these results. I do not like the results myself, but basing
teaching on what the teacher would like to be true rather than on what the evidence
indicates is true remains absolutely unacceptable in my book.

A.2 History and wishlist


• Oct 24, 2004. The first version of this manuscript was posted.

• Nov 27, 2004. A revised version was posted, fixing a major blunder related
to a nasty problem in using classical spring potentials for more than a
single particle. The fix required extensive changes. This version also
added descriptions of how the wave function of larger systems is formed.

• May 4, 2005. A revised version was posted. I finally read the paper by
Everett, III on the many worlds interpretation, and realized that I had
to take the crap out of pretty much all my discussions. I also rewrote
everything to try to make it easier to follow. I added the motion of wave
packets to the discussion and expanded the one on Newtonian motion.

• May 11, 2005. I got cold feet on immediately jumping into separation of
variables, so I added a section on a particle in a pipe.

• Mid Feb., 2006. A new version was posted. Main differences are correction
of a number of errors and improved descriptions of the free electron and
band spectra. There is also a rewrite of the many worlds interpretation
to be clearer and less preachy.

• Mid April, 2006. Various minor fixes. Also I changed the format from the
“article” to the “book” style.

• Mid Jan., 2007. Added sections on confinement and density of states, a
commutator reference, a section on unsteady perturbed two state systems,
and an advanced chapter on angular momentum, the Dirac equation, the
electromagnetic field, and NMR. Fixed a dubious phrasing about the Dirac
equation and other minor changes.

• Mid Feb., 2007. There are now lists of key points and review questions for
chapter 1. Answers are in the new solution manual.

• April 2, 2007. There are now lists of key points and review questions
for chapter 2. That makes it the 3 beta 2 version. So I guess the final
beta version will be 3 beta 6. Various other fixes. I also added, probably
unwisely, a note about zero point energy.

• May 5, 2007. There are now lists of key points and review questions
for chapter 3. That makes it the 3 beta 3 version. Various other fixes,
like spectral line broadening, Helium’s refusal to take on electrons, and
countless other less than ideal phrasings. And full solutions of the har-
monic oscillator, spherical harmonics, and hydrogen wave function ODEs,
Mandelshtam-Tamm energy-time uncertainty, (all in the notes.) A dice is
now a die, though it sounds horrible to me. Zero point energy went out
again as too speculative.

• May 21, 2007. An updated version 3 beta 3.1 to correct a poorly written
subsection on quantum confinement for the particle in a pipe. Thanks to
Swapnil Jain for pointing out the problem. I do not want people to get
lost so early in the game, so I made it a priority correction. In general,
I do think that the sections added later to the document are not of the
same quality as the original with regard to writing style. The reason is
simple. When I wrote the original, I was on a sabbatical and had plenty
of time to think and rethink how it would be clearest. The later sections
are written during the few spare hours I can dig up. I write them and put
them in. I would need a year off to do this as it really should be done.

• July 19, 2007. Version 3 beta 3.2 adds a section on Hartree-Fock. It
took forever. My main regret is that most of those who wasted my time
in this major way are probably no longer around to be properly blasted.
Writing a book on quantum mechanics by an engineer for engineers is a
minefield of having to see through countless poor definitions and dubious
explanations. It takes forever. In view of the fact that many of those
physicists were probably supported by taxpayers much of the time, it
should not be such an absolute mess!
There are some additions on Born-Oppenheimer and the variational for-
mulation that were in the Hartree-Fock section, but that I took out, since
they seemed to be too general to be shoved away inside an application.
Also rewrote section 4.7 and subsection 4.9.2 to be consistent, and in par-
ticular in order to have a single consistent notation. Zero point energy
(the vacuum kind) is back. What the heck.

• Sept. 9, 2007. Version 3 beta 3.3 mainly adds sections on solids, that have
been combined with rewritten free and nearly free electron gas sections into
a full chapter on solids. The rest of the old chapter on examples of multiple
particle systems has been pushed back into the basic multiple particle
systems chapter. A completely nonsensical discussion in a paragraph of
the free electron gas section was corrected; I cannot believe I have read
over that several times. I probably was reading what I wanted to say
instead of what I said. The alternative name “twilight terms” has been
substituted for “exchange terms.” Many minor changes.

• Dec. 20, 2007. Version 3 beta 3.4 cleans up the format of the “notes.” No
more need for loading an interminable web page of 64 notes all at the same
time over your phone line to read 20 words. It also corrects a few errors,
one important one pointed out by Johann Joss. It also extends some
further griping about correlation energy to all three web locations. You
may surmise from the lack of progress that I have been installing linux on
my home PC. You are right.

• April 7, 2008. Version 3 beta 4 adds key points and exercises to
chapter 4, with the usual rewrites to improve clarity. The Dirichlet com-
pleteness proof of the Fourier modes has been moved from the solution
manual to the notes. The actual expressions for the hydrogen molecular
ion integrals are now given in the note. The London force derivation has
been moved to the notes. The subsection on ferromagnetism has been
rewritten to more clearly reflect the uncertainty in the field, and a discus-
sion of Hund’s rules added.

• July 2, 2008. Version 3 beta 4.1 adds a new, “advanced,” chapter on basic
and quantum thermodynamics. An advanced section on the fundamental
ideas underlying quantum field theory has also been added. The discus-
sion of the lambda transition of helium versus Bose-Einstein condensation
has been rewritten to reflect the considerable uncertainty. Uniqueness
has been added to the note on the hydrogen molecular ion ground state
properties. Added a missing 2π in the Rayleigh-Jeans formula.

• July 14, 2008. Version 3 beta 4.2 expands the section on unsteady two-
state systems to include a full discussion of “time-dependent perturbation
theory,” read emission and absorption of radiation. Earlier versions just
had a highly condensed version since I greatly dislike the derivations in
typical textbooks that are full of nontrivial assumptions for which no
justification is, or can be, given at all.

• Jan. 1, 2009. Version 4.0 alpha reorders the book into two parts to achieve
a much better book structure. The changed thinking justifies a new ver-
sion. Parts of the lengthy preface have been moved to the notes. The
background sections have been combined in their own chapter to reduce

539
distraction in part I. There is now a derivation of effective mass in a
note. A few more commutators were added to the reference. There is a
note on Fourier transforms and the Parseval equality. The stupid discus-
sion of group velocity has been replaced by a better (even more stupid?)
one. Two of the gif animations were erroneous (the non-delta function
tunneling and the rapid drop potential) and have been corrected. High
resolution versions of the animations have been added. Time-dependent
perturbation theory is now concisely covered. WKB theory is now cov-
ered. Alpha decay is now included. The adiabatic theorem is now cov-
ered. Three-dimensional scattering is now covered, in a note. Fixed a
mistyped bucket number energy in the thermo chapter. The derivations
of the Dirac equation and the gyromagnetic ratio of electron spin have
been moved to the notes. Note A.87 now gives the full derivation of the
expectation Lorentz force. The direction of the magnetic field in the figure
for Maxwell’s fourth law was corrected. A section on electrostatic solu-
tions has been added. The description on electrons in magnetic fields now
includes the diamagnetic contribution. The section on Stern-Gerlach was
moved to the electromagnetic section where it belongs. Electron split ex-
periments have been removed completely. There is now a full coverage of
time-independent small perturbation theory, including the hydrogen fine
structure. Natural frequency is now angular frequency. Gee. The Planck
formula is now the Planck-Einstein relation. The Euler identity is now the
apparently more common Euler formula. Black body as noun, blackbody
as compound adjective.

• Jan. 19, 2009. Version 4.1 alpha. There is now a discussion of the Heisen-
berg picture. The horribly written, rambling, incoherent, section on nearly
free electrons that has been bothering me for years has been rewritten into
two much better sections. There is now a discussion on the quantization
of the electromagnetic field, including photon spin and spontaneous emis-
sion. The Rayleigh formula is now derived. The perturbation expansion
of eigenfunctions now refers to Rellich’s book to show that it really works
for degenerate eigenfunctions.

• March 22, 2009. Version 4.2 alpha. Spin matrices for systems greater than
spin one half are now discussed. Classical Lagrangian and Hamiltonian
dynamics is now covered in a note. Special relativity is now covered in a
note. There is now a derivation of the hydrogen dipole selection rules and
more extensive discussion of forbidden transitions. Angular momentum
and parity conservation in transitions are now discussed. The Gamow
theory data are now corrected for nuclear versus atomic mass. There
is no perceivable difference, however. The alignment bars next to the
electromagnetic tables in the web version should have been eliminated.

Part I is mostly in fairly good shape. But there are a few recent additions
that probably could do with another look.
In Part II various sections sure could do with a few more rewrites. I am
expanding the quantity much faster than the quality at the time of this writing.
Somewhat notably missing at this time:

1. Electron split experiments. Do engineers need it??

2. Quantum electrodynamics. Do engineers need it??

3. Density-functional theory.

4. Nuclear drop and shell models. Beyond the scope of the book?? Definitely
an engineering topic.

Basic nuclear physics will probably be added soon. Presumably, alpha decay
will be moved to that section.
Density-functional theory will eventually be added. How old are you, and
how is your health?
The index is in a sorry state. It is being worked on.
I would like to add key points and review questions to all basic sections. I
am inching up to it. Very slowly.
After that, the idea is to run all this text through a style checker to eliminate
the dead wood. Also, ispell seems to be missing misspelled words. Probably
thinks they are TeX.
It would be nice to put frames around all key formulae. Many are already
there.

A.3 Lagrangian mechanics


Lagrangian mechanics is a way to simplify complicated dynamical problems.
This note gives a brief overview. For details and practical examples you will
need to consult a good book on mechanics.

A.3.1 Introduction
As a trivial example of how Lagrangian mechanics works, consider a simple
molecular dynamics simulation. Assume that the forces on the particles are
given by a potential that only depends on the positions of the particles.

The difference between the net kinetic energy and the net potential energy
is called the “Lagrangian.” For a system of particles as considered here it takes
the form

    L = \sum_j \tfrac{1}{2} m_j |\vec v_j|^2 - V(\vec r_1, \vec r_2, \ldots)

where j indicates the particle number and V the potential of the attractions
between the particles and any external forces.
It is important to note that in Lagrangian dynamics, the Lagrangian must
mathematically be treated as a function of the velocities and positions of the
particles. While for a given motion, the positions and velocities are in turn a
function of time, time derivatives must be implemented through the chain rule,
i.e. by means of total derivatives of the Lagrangian.
The “canonical momentum” p^c_{j,i} of particle j in the i direction, (with i = 1,
2, or 3 for the x, y, or z component respectively), is defined as

    p^c_{j,i} \equiv \frac{\partial L}{\partial v_{j,i}}

For the Lagrangian above, this is simply the normal momentum m_j v_{j,i} of the
particle in the i-direction.
The Lagrangian equations of motion are

    \frac{d p^c_{j,i}}{dt} = \frac{\partial L}{\partial r_{j,i}}

This is simply Newton’s second law in disguise: the left hand side is the time
derivative of the linear momentum of particle j in the i-direction, giving mass
times acceleration in that direction; the right hand side is minus the spatial
derivative of the potential, which gives the force in the i direction on particle j.
Obviously then, use of Lagrangian dynamics does not help here.
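
To see this machinery run, here is a minimal sketch, not part of the original
text, that lets the sympy computer algebra package carry the steps out for one
particle in one dimension; the quadratic potential V = k x^2 / 2 is merely an
assumed example:

    import sympy as sp

    t = sp.symbols('t')
    m, k = sp.symbols('m k', positive=True)
    x = sp.Function('x')(t)        # position as a function of time
    v = sp.diff(x, t)              # velocity

    # Lagrangian L = T - V for the assumed potential V = k x^2 / 2
    L = sp.Rational(1, 2)*m*v**2 - sp.Rational(1, 2)*k*x**2

    p = sp.diff(L, v)                            # canonical momentum, here m v
    eom = sp.Eq(sp.diff(p, t), sp.diff(L, x))    # dp/dt = dL/dx
    print(eom)    # m x''(t) = -k x(t): Newton's second law, as promised

As the printed result shows, the Lagrangian route merely recovers mass times
acceleration equals force here.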

A.3.2 Generalized coordinates


One place where Lagrangian dynamics is very helpful is for macroscopic objects.
Consider for example the dynamics of a frisbee. Nobody is going to do a molec-
ular dynamics computation of a frisbee. What you do is approximate the thing
as a “solid body,” (or more accurately, a rigid body). The position of every part
of a solid body can be fully determined using only six parameters, instead of the
countless position coordinates of the individual atoms. For example, knowing
the three position coordinates of the center of gravity of the frisbee and three
angles is enough to fully fix it. Or you could just choose three reference points
on the frisbee: giving three position coordinates for the first point, two for the
second, and one for the third is another possible way to fix its position.

Such parameters that fix a system are called “generalized coordinates.” The
word generalized indicates that they do not need to be Cartesian coordinates;
often they are angles or distances, or relative coordinates or angles. The number
of generalized coordinates is called the number of degrees of freedom. It varies
with the system. A bunch of solid bodies moving around freely will have six per
solid body; but if there are linkages between them, like the bars in your car’s
suspension system, it reduces the number of degrees of freedom. A rigid wheel
spinning around a fixed axis has only one degree of freedom, and so does a solid
pendulum swinging around a fixed axis. Attach a second pendulum to its end,
maybe not in the same plane, and the resulting compound pendulum has two
degrees of freedom.
If you try to describe such systems using plain old Newtonian mechanics,
it can get ugly. For each solid body you can apply that the sum of the forces
must equal mass times acceleration of the center of gravity, and that the net
moment around the center of gravity must equal the rate of change of angular
momentum, which you then presumably deduce using the principal axis system.
Instead of messing with all that complex vector algebra, Lagrangian dynam-
ics allows you to deal with just a single scalar, the Lagrangian. If you can merely
figure out the net kinetic and potential energy of your system in terms of your
generalized coordinates and their time derivatives, you are in business.
If there are linkages between the members of the system, the benefits mag-
nify. A brute-force Newtonian solution of the three-dimensional compound pen-
dulum would involve six linear momentum equations and six angular ones. Yet
the thing has only two degrees of freedom: the angular orientations of the in-
dividual pendulums around their axes of rotation. The reason that there are
twelve equations in the Newtonian approach is that the support forces and
moments exerted by the two axes add another 10 unknowns. A Lagrangian
approach allows you to just write two equations for your two degrees of free-
dom; the support forces do not appear in the story. That provides a great
simplification.

A.3.3 Lagrangian equations of motion


This section describes the Lagrangian approach to dynamics in general. Assume
that you have chosen suitable generalized coordinates that fully determine the
state of your system. Call these generalized coordinates q1 , q2 , . . . and their
time derivatives q̇1 , q̇2 , . . . . The number of generalized coordinates K is the
number of degrees of freedom in the system. A generic generalized coordinate
will be indicated as q_k.
Now find the kinetic energy T and the potential energy V of your system in
terms of these generalized coordinates and their time derivatives. The difference
is the Lagrangian:

    L(q_1, q_2, \ldots, q_K, \dot q_1, \dot q_2, \ldots, \dot q_K, t)
      \equiv T(q_1, q_2, \ldots, q_K, \dot q_1, \dot q_2, \ldots, \dot q_K, t) - V(q_1, q_2, \ldots, q_K, t)

Note that the potential energy depends only on the position coordinates of the
system, but the kinetic energy also depends on how fast they change with time.
Dynamics books give lots of helpful formulae for the kinetic energy of the solid
members of your system, and the potential energy of gravity and within springs.
The canonical momenta are defined as

    p^c_k \equiv \frac{\partial L}{\partial \dot q_k}    (A.1)

for each individual generalized coordinate q_k. The equations of motion are

    \frac{d p^c_k}{dt} = \frac{\partial L}{\partial q_k} + Q_k    (A.2)
There is one such equation for each generalized coordinate qk , so there are
exactly as many equations as there are degrees of freedom. The equations are
second order in time, because the canonical momenta involve first order time
derivatives of the qk .
The Qk terms are called generalized forces, and are only needed if there are
forces that cannot be modeled by the potential V . That includes any frictional
forces that are not ignored. To find the generalized force Qk at a given time,
imagine that the system is displaced slightly at that time by changing the cor-
responding generalized coordinate qk by an infinitesimal amount δqk . Since this
displacement is imaginary, it is called a “virtual displacement.” During such a
displacement, each force that is not modeled by V produces a small amount
of “virtual work.” The net virtual work divided by δqk gives the generalized
force Qk . Note that frictionless supports normally do not perform work, because
there is no displacement in the direction of the support force. Also, frictionless
linkages between members do not perform net work, since the forces between
the members are equal and opposite. Similarly, the internal forces that keep a
solid body rigid do not perform work.
The bottom line is that normally the Qk are zero if you ignore friction. How-
ever, any collisions against rigid constraints have to be modeled separately, just
like in normal Newtonian mechanics. For an infinitely rigid constraint to absorb
the kinetic energy of an impact requires infinite force, and Qk would have to
be an infinite spike if described normally. Of course, you could instead consider
describing the constraint as somewhat flexible, with a very high potential energy
penalty for violating it. Then make sure to use an adaptive time step in any
numerical integration.
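
As a concrete illustration, consider a frictionless plane pendulum, for which
the Q_k are zero as just discussed. With the single generalized coordinate
\theta, T = \tfrac12 m l^2 \dot\theta^2 and V = -m g l \cos\theta, so (A.1) and
(A.2) give m l^2 \ddot\theta = -m g l \sin\theta. The following is a minimal
numerical sketch, not part of the original text, with hypothetical parameter
values:

    import numpy as np
    from scipy.integrate import solve_ivp

    m, l, g = 1.0, 1.0, 9.81     # hypothetical mass, length, and gravity

    def rhs(t, y):
        # y = [theta, thetadot]; the Lagrangian equation of motion gives
        # thetaddot = -(g/l) sin(theta); the mass drops out.
        theta, thetadot = y
        return [thetadot, -(g/l)*np.sin(theta)]

    sol = solve_ivp(rhs, [0.0, 10.0], [0.5, 0.0], rtol=1e-8)
    print(sol.y[0, -1])          # the angle at time t = 10

Note that the single second order equation was rewritten as two first order
ones for the canned integrator, the same trick discussed in subsection A.3.4.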

It may be noted that in relativistic mechanics, the Lagrangian is not the
difference between potential and kinetic energy. However, the Lagrangian equa-
tions of motion (A.1) and (A.2) still apply. The unifying concept is that of
“action,” defined as the time integral of the Lagrangian. The action integral is
unchanged by infinitesimal temporary displacements of the system, and that is
all that is needed for the Lagrangian equations of motion to apply.

Derivation
To derive the nonrelativistic Lagrangian, consider the system to be built up from
elementary particles numbered by an index j. You may think of these particles
as the atoms you would use if you were to do a molecular dynamics computation
of the system. Because the system is assumed to be fully determined by the
generalized coordinates, the position of each individual particle is fully fixed
by the generalized coordinates and maybe time. (For example, it is implicit
in a solid body approximation that the atoms are held rigidly in their relative
position. Of course, that is approximate; you pay some price for avoiding a full
molecular dynamics simulation.)
Newton’s second law says that the motion of each individual particle j is
governed by

    m_j \frac{d^2 \vec r_j}{dt^2} = -\frac{\partial V}{\partial \vec r_j} + \vec F_j'
where the derivative of the potential V can be taken to be its gradient, if you
(justly) object to differentiating with respect to vectors, and \vec F_j' indicates any
part of the force not described by the potential.
Now consider an infinitesimal virtual displacement of the system from its
normal evolution in time. It produces an infinitesimal change in position δ~rj (t)
for each particle. After such a displacement, ~rj + δ~rj of course no longer satisfies
the correct equations of motion, but the kinetic and potential energies still exist.
In the equation of motion for the correct position ~rj above, take the mass
times acceleration to the other side, multiply by the virtual displacement, sum
over all particles j, and integrate over an arbitrary time interval:

    0 = \int_{t_1}^{t_2} \sum_j \left[ -m_j \frac{d^2 \vec r_j}{dt^2} - \frac{\partial V}{\partial \vec r_j} + \vec F_j' \right] \cdot \delta\vec r_j \, dt

Multiply out and integrate the first term by parts:

    0 = \int_{t_1}^{t_2} \sum_j \left[ m_j \frac{d\vec r_j}{dt} \cdot \delta\frac{d\vec r_j}{dt} - \frac{\partial V}{\partial \vec r_j} \cdot \delta\vec r_j + \vec F_j' \cdot \delta\vec r_j \right] dt

The virtual displacements of interest here are only nonzero over a limited range
of times, so the integration by parts did not produce any end point values.

Recognize the first two terms within the brackets as the virtual change in
the Lagrangian due to the virtual displacement at that time. Note that this
requires that the potential energy depends only on the position coordinates and
time, and not also on the time derivatives of the position coordinates. You get

    0 = \delta \int_{t_1}^{t_2} L \, dt + \int_{t_1}^{t_2} \sum_j \left[ \vec F_j' \cdot \delta\vec r_j \right] dt    (A.3)

In the case that the additional forces \vec F_j' are zero, this produces the action princi-
ple: the time integral of the Lagrangian is unchanged under infinitesimal virtual
displacements of the system, assuming that they vanish at the end points of in-
tegration. More generally, for the virtual work by the additional forces to be
zero will require that the virtual displacements respect the rigid constraints, if
any. The infinite work done in violating a rigid constraint is not modeled by
the potential V in any normal implementation.
Unchanging action is an integral equation involving the Lagrangian. To get
ordinary differential equations, take the virtual change in position to be that
due to an infinitesimal change δqk (t) in a single generic generalized coordinate.
Represent the change in the Lagrangian in the expression above by its partial
derivatives, and the same for \delta\vec r_j:

    0 = \int_{t_1}^{t_2} \left[ \frac{\partial L}{\partial q_k} \delta q_k + \frac{\partial L}{\partial \dot q_k} \delta \dot q_k \right] dt
        + \int_{t_1}^{t_2} \sum_j \left[ \vec F_j' \cdot \frac{\partial \vec r_j}{\partial q_k} \right] \delta q_k \, dt

The integrand in the final term is by definition the generalized force Qk multi-
plied by δqk . In the first integral, the second term can be integrated by parts,
and then the integrals can be combined to give

    0 = \int_{t_1}^{t_2} \left[ \frac{\partial L}{\partial q_k} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot q_k}\right) + Q_k \right] \delta q_k \, dt

Now suppose that there is any time at which the expression within the square
brackets is nonzero. Then a virtual change δqk that is only nonzero in a very
small time interval around that time, and everywhere positive in that small
interval, would produce a nonzero right hand side in the above equation, but
it must be zero. Therefore, the expression within brackets must be zero at all
times. That gives the Lagrangian equations of motion, because the expression
between parentheses is defined as the canonical momentum.

A.3.4 Hamiltonian dynamics


For a system with K generalized coordinates the Lagrangian approach provides
one equation for each generalized coordinate q_k. These K equations involve
second order time derivatives of the K unknown generalized coordinates qk .
However, if you consider the time derivatives q̇k as K additional unknowns, you
get K first order equations for these 2K unknowns. An additional K equations
are:

    \frac{dq_k}{dt} = \dot q_k
These are no longer trivial because they now give the time derivatives of the
first K unknowns in terms of the second K of them. This trick is often needed
when using canned software to integrate the equations, because canned software
typically only does systems of first order equations.
However, there is a much neater way to get 2K first order equations in 2K
unknowns, and it is particularly close to concepts in quantum mechanics. Define
the “Hamiltonian” as

    H(q_1, q_2, \ldots, q_K, p^c_1, p^c_2, \ldots, p^c_K, t) \equiv
      \sum_{k=1}^{K} \dot q_k p^c_k - L(q_1, q_2, \ldots, q_K, \dot q_1, \dot q_2, \ldots, \dot q_K, t)    (A.4)

In the right hand side expression, you must rewrite all the time derivatives \dot q_k
in terms of the canonical momenta

    p^c_k \equiv \frac{\partial L}{\partial \dot q_k}
because the Hamiltonian must be a function of the generalized coordinates and
the canonical momenta only. (In case you are not able to readily solve for the q̇k
in terms of the pck , things could become messy. But in principle, the equations
to solve are linear for given values of the qk .)
In terms of the Hamiltonian, the equations of motion are

    \frac{dq_k}{dt} = \frac{\partial H}{\partial p^c_k}
    \qquad
    \frac{dp^c_k}{dt} = -\frac{\partial H}{\partial q_k} + Q_k    (A.5)
where the Qk , if any, are the generalized forces as before.
If the Hamiltonian does not explicitly depend on time and the generalized
forces are zero, these evolution equations imply that the Hamiltonian does not
change with time at all. For such systems, the Hamiltonian is the preserved total
energy of the system. In particular for a nonrelativistic system, the Hamiltonian
is the sum of the kinetic and potential energies, provided that the position of
the system only depends on the generalized coordinates and not also explicitly
on time.
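
As a minimal numerical sketch, not part of the original text, here are the
evolution equations (A.5) integrated for a simple harmonic oscillator, with
H = p^2/2m + \tfrac12 k q^2 and hypothetical values; the printed spread in H
should be negligible, illustrating the conservation just discussed:

    import numpy as np
    from scipy.integrate import solve_ivp

    m, k = 1.0, 4.0              # hypothetical mass and spring constant

    def hamilton(t, y):
        q, p = y
        return [p/m, -k*q]       # dq/dt = dH/dp, dp/dt = -dH/dq

    sol = solve_ivp(hamilton, [0.0, 20.0], [1.0, 0.0], rtol=1e-10, atol=1e-12)
    H = sol.y[1]**2/(2*m) + 0.5*k*sol.y[0]**2
    print(H.max() - H.min())     # tiny: the Hamiltonian is preserved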

Derivation
To derive the Hamiltonian equations, consider the general differential of the
Hamiltonian function (regardless of any motion that may go on). According to
the given definition of the Hamiltonian function, and using a total differential
for dL,

    dH = \left(\sum_k p^c_k \, d\dot q_k\right) + \sum_k \dot q_k \, dp^c_k
         - \sum_k \frac{\partial L}{\partial q_k} \, dq_k
         - \left(\sum_k \frac{\partial L}{\partial \dot q_k} \, d\dot q_k\right)
         - \frac{\partial L}{\partial t} \, dt

The sums within parentheses cancel each other because of the definition of the
canonical momentum. The remaining differentials are those of the arguments of the
Hamiltonian function, and so by the very definition of partial derivatives,

    \frac{\partial H}{\partial q_k} = -\frac{\partial L}{\partial q_k}
    \qquad
    \frac{\partial H}{\partial p^c_k} = \dot q_k
    \qquad
    \frac{\partial H}{\partial t} = -\frac{\partial L}{\partial t}
Now consider an actual motion. For an actual motion, q̇k is the time deriva-
tive of qk , so the second partial derivative gives the first Hamiltonian equation
of motion. The first partial derivative gives the second equation when combined
with the Lagrangian equation of motion (A.2).
It is still to be shown that the Hamiltonian of a classical system is the sum
of kinetic and potential energy if the position of the system does not depend
explicitly on time. The Lagrangian can be written out in terms of the system
particles as

    L = \sum_j \tfrac{1}{2} m_j \sum_{k=1}^{K} \sum_{\bar k=1}^{K}
        \frac{\partial \vec r_j}{\partial q_k} \cdot \frac{\partial \vec r_j}{\partial q_{\bar k}}
        \, \dot q_k \dot q_{\bar k} - V(q_1, q_2, \ldots, q_K, t)

where the first sum represents the kinetic energy. The Hamiltonian is defined as

    \sum_k \dot q_k \frac{\partial L}{\partial \dot q_k} - L

and straight substitution shows the first term to be twice the kinetic energy.

A.4 Special relativity


Special relativity tends to keep popping up in quantum mechanics. This note
gives a brief summary of the relevant points.

A.4.1 History
Special relativity is commonly attributed to Albert Einstein’s 1905 papers, even
though Einstein swiped the big ideas of relativity from Henri Poincaré, (who
developed and named the principle of relativity in 1895 and the mass-energy
relation in 1900), without giving him any credit or even mentioning his name.

He may also have swiped the underlying mathematics he used from Lorentz,
(who is mentioned, but not in connection with the Lorentz transformation.)
However, in the case of Lorentz, it is possible to believe that Einstein was unaware
of his earlier work, if you are so trusting. Before you do, it must be pointed
out that a review of Lorentz’ work appeared in the same journal as the one
in which Einstein published his papers on relativity. In the case of Poincaré, it is
known that Einstein and a friend pored over Poincaré’s 1902 book “Science and
Hypothesis;” in fact the friend noted that it kept them “breathless for weeks on
end.” So Einstein cannot possibly have been unaware of Poincaré’s work.
However, Einstein should not just be blamed for his boldness in swiping most
of his paper from then more famous authors, but also be commended for his
boldness in completely abandoning the basic premises of Newtonian mechanics,
where earlier authors wavered. It should also be noted that general relativity
can clearly be credited to Einstein fair and square. But he was a lot less hungry
then. (And had a lot more false starts.)

A.4.2 Overview of relativity


The most important result of relativity for this book is Einstein’s famous relation
E = mc^2, where E is energy, m mass, and c the speed of light. (To be precise,
this relation traces back to Poincaré, but Einstein generalized the idea.) The
kinetic energy of a particle is not \tfrac12 m v^2, with m the mass and v the velocity, as
Newtonian physics says. Instead it is the difference between the energy m_v c^2
based on the mass m_v of the particle in motion and the energy m_0 c^2 based on
the mass m_0 of the particle at rest. According to special relativity the mass in
motion is

    m_v = \frac{m_0}{\sqrt{1 - (v/c)^2}}    (A.6)

so the true kinetic energy is

    T = \frac{m_0}{\sqrt{1 - (v/c)^2}} c^2 - m_0 c^2
For velocities small compared to the tremendous speed of light, this is equiv-
alent to the classical \tfrac12 m_0 v^2; that can be seen from Taylor series expansion of
the square root. But when the particle speed approaches the speed of light,
the above expression implies that the kinetic energy approaches infinity. Since
there is no infinite supply of energy, the velocity of a material object must al-
ways remain less than the speed of light. The only reason that the photons
of electromagnetic radiation, (including radio waves, microwaves, light, x-rays,
gamma rays, etcetera), can travel at the speed of light is because they have zero
rest mass m0 ; there is no way that they can be brought to a halt, because there
would be nothing left.

Quantum mechanics does not use the speed v but the momentum p = m_v v;
in those terms the square root can be rewritten to give the kinetic energy as

    T = m_0 c^2 \sqrt{1 + \frac{p^2}{m_0^2 c^2}} - m_0 c^2    (A.7)

This expression is readily checked by substituting in for p and then m_v.
Note that it suggests that a particle at rest still has a “rest mass energy”
m_0 c^2 left. And so it turns out to be. For example, an electron and a positron
can completely annihilate each other, releasing their rest mass energies as two
photons that fly apart in opposite directions. Similarly, a photon of electro-
magnetic radiation with enough energy can create an electron-positron pair out
of nothing. (This does require that a heavy nucleus is around to absorb the
photon’s linear momentum without absorbing too much of its energy; otherwise
it would violate momentum conservation.) Perhaps more importantly for engi-
neering applications, the difference between the rest masses of two atomic nuclei
gives the energy released in nuclear reactions that convert one nucleus into the
other.
Another weird relativistic effect is that the speed of light is the same regard-
less of how fast you are travelling. Michelson & Morley tried to determine the
absolute speed of the earth through space by “horse-racing” it against light. If
a passenger jet airplane flies at three quarters of the speed of sound, then sound
waves going in the same direction as the plane only have a speed advantage of
one quarter of the speed of sound over the plane. Seen from inside the plane,
that sound seems to move away from it at only a quarter of the normal speed of
sound. Michelson & Morley figured that the speed of the earth could similarly
be measured by measuring how much it reduces the apparent speed of light
moving in the same direction through a vacuum. But it proved that the motion
of the earth produced no reduction in the apparent speed of light whatsoever.
It is as if you are racing a fast horse, but regardless of how fast you are going,
you do not reduce the velocity difference any more than if you would just stop
your horse and have a drink.
You can think up a hundred lame excuses. (In particular, the sound inside
a plane does not move slower in the direction of motion. But of course, sound
is transmitted by real air molecules that can be trapped inside a plane by well
established mechanisms. It is not transmitted in empty space like light.) Or you
can be honest. In 1895 Poincaré reasoned that such experiments suggested that
it seems to be impossible to detect the absolute motion of matter. In 1900 he
proposed the “Principle of Relative Motion,” that the laws of movement should
be the same in all coordinate systems regardless of their velocity, as long as they
are not accelerating. In 1904 he called it
“The principle of relativity, according to which the laws of physical
phenomena must be the same for a stationary observer as for one
carried along in a uniform motion of translation, so that we have
no means, and can have none, of determining whether or not we are
being carried along in such a motion.”

In short, if two observers are moving at a constant speed relative to each other,
it is impossible to say which one, if any, is at rest. (Do note however that if an
observer is accelerating or spinning around, that can be determined through the
generated inertia forces. Not all motion is relative. Just an important subset of
it.)
All this pops up in the thought example of figure 11.4 where an electron
is sent at very nearly the speed of light to Mars and a positron in the other
direction to Venus. As far as an observer on Earth is concerned, the electron
and positron reach their destinations at almost the same time. For an observer
traveling in the direction from Venus to Mars at almost the speed of light, the
electron and positron still seem to be going at close to the speed of light, if their
kinetic energies are high enough. However, for that observer, it appears that
Venus is moving at close to the speed of light away from its positron, while Mars
is moving at the same speed towards its electron. Therefore it appears to that
observer that the electron reaches Mars much earlier than the positron reaches
Venus.
Obviously then, observers in relative motion disagree about the time differ-
ence between events occurring at different locations. Worse, even if two events
happen right in the hands of one of the observers, the observers will disagree
about how long the entire thing takes. In that case, the observer compared
to which the location of the events is in motion will think that it takes longer.
This is called “time-dilation.” The time interval between two events slows down
according to

    \Delta t_v = \frac{\Delta t_0}{\sqrt{1 - (v/c)^2}}    (A.8)

where \Delta t_v is shorthand for the time interval between the events as perceived
by an observer compared to whom the location of the events is moving at speed
v, while \Delta t_0 is the same time interval as perceived by an observer compared to
whom the location is at rest.
For example, cosmic rays can create radioactive particles in the upper at-
mosphere that reach the surface of the earth, even though in a laboratory they
do not last long enough to do so by far. Because of the high speed of these
particles, for an observer standing on earth the decay process seems to take
much longer than normal. Time dilation in action.
Which of course raises the question: should then not an observer moving
along with one such particle observe that the particle does not reach the earth?
The answer is no; relativity maintains a single reality; a particle either reaches
the earth or not, regardless of who is doing the observing. It is quantum me-
chanics, not relativity, that does away with a single reality. The observer moving
with the particle observes that the particle reaches the earth, not because the
particle seems to last longer than usual, but because the distance to travel to the
surface of the earth has become much shorter! This is called “Lorentz-Fitzgerald
contraction.” For the observer moving with the particle, it seems that the entire
earth system, including the atmosphere, is in motion with almost the speed of
light. The size of objects in motion seems to contract in the direction of the
motion according to

    \Delta x_v = \Delta x_0 \sqrt{1 - (v/c)^2}    (A.9)

where the x-axis is taken to be in the direction of motion and \Delta x_0 is the distance
in x-direction as perceived by an observer compared to which the object is at
rest.
In short, for the observer standing on earth, the particle reaches earth be-
cause its motion slows down the decay process by a factor 1/\sqrt{1 - (v/c)^2}. For
the observer moving along with the particle, the particle reaches earth because
the distance to travel to the surface of the earth has become shorter by exactly
that same factor. The inverse square root is called the “Lorentz factor.”
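
To put some purely illustrative numbers on this, here is a minimal sketch,
not part of the original text; the particle speed, rest lifetime, and atmosphere
depth are all merely assumed values:

    from math import sqrt

    c = 299792458.0                       # speed of light, m/s
    v = 0.995 * c                         # assumed particle speed
    gamma = 1.0 / sqrt(1.0 - (v/c)**2)    # the Lorentz factor

    dt0 = 2.2e-6                          # assumed lifetime at rest, s
    print(gamma)                          # about 10
    print(gamma * dt0)                    # dilated lifetime, earth observer
    print(15.0e3 / gamma)                 # an assumed 15 km of atmosphere,
                                          # contracted for the particle, m

At this speed the particle lasts about ten times longer as seen from earth,
while as seen by the particle the atmosphere is about ten times thinner.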

A.4.3 Lorentz transformation

[Figure A.1: Coordinate systems for the Lorentz transformation.]

The “Lorentz transformation” describes exactly how measurements of the
position and time of events change from one observer to the next. Consider two
observers A and B that are in motion compared to each other with a relative
speed V . To relate their time and position measurements, it is convenient to
give each observer its own coordinate system x, y, z, t as shown in figure A.1.
Each coordinate system will be taken to have its origin at the location of the
observer, while both x-axes are along the line from A to B, both y-axes are
parallel, and both z-axes are parallel. As the left side of figure A.1 shows,
observer A can believe herself to be at rest and see observer B moving away
from her at speed V ; similarly, observer B can believe himself to be at rest and
see observer A moving away from him at speed V , as in the right side of the
figure. The principle of relativity says that both views are equally valid; there
is no physical measurement that can find a fundamental difference between the
two.
The Lorentz transformation says that the relation between positions and
times of events as perceived by the two observers is

    c t_B = \frac{c t_A - (V/c) x_A}{\sqrt{1 - (V/c)^2}}
    \qquad
    x_B = \frac{x_A - (V/c) c t_A}{\sqrt{1 - (V/c)^2}}
    \qquad
    y_B = y_A
    \qquad
    z_B = z_A    (A.10)
To get the transformation of the coordinates of B into those of A, just swap A
and B and replace V by −V . Indeed, if observer B is moving in the positive
x-direction with speed V compared to observer A, then observer A is moving
in the negative x direction with speed V compared to observer B, as in figure
A.1. In the limit that c becomes infinite, the Lorentz transformation becomes
the nonrelativistic “Galilean transformation” in which t_B is simply t_A and x_B =
x_A - V t_A, i.e. x_B equals x_A except for a shift of magnitude V t_A.
As a result of the Lorentz transformation, measured velocities are related as

    v_{x,B} = \frac{v_{x,A} - V}{1 - (V/c^2)\,v_{x,A}}
    \qquad
    v_{y,B} = \frac{v_{y,A}\sqrt{1 - (V/c)^2}}{1 - (V/c^2)\,v_{x,A}}
    \qquad
    v_{z,B} = \frac{v_{z,A}\sqrt{1 - (V/c)^2}}{1 - (V/c^2)\,v_{x,A}}    (A.11)
Note that vx , vy , vz refer here to the perceived velocity components of some
moving object; they are not components of the velocity difference V between
the coordinate systems.
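
A minimal computational sketch, not part of the original text, of the
transformation (A.10); the event coordinates and relative speed are arbitrary
assumptions:

    from math import sqrt

    c = 299792458.0

    def lorentz(ct, x, y, z, V):
        # (A.10): transform event coordinates of observer A into those of B
        g = 1.0 / sqrt(1.0 - (V/c)**2)
        return (g*(ct - (V/c)*x), g*(x - (V/c)*ct), y, z)

    ct, x, y, z = 5.0, 3.0, 1.0, 2.0            # an arbitrary event, meters
    ctB, xB, yB, zB = lorentz(ct, x, y, z, 0.6*c)

    # The combination (ct)^2 - x^2 - y^2 - z^2 comes out the same:
    print(ct**2 - x**2 - y**2 - z**2)
    print(ctB**2 - xB**2 - yB**2 - zB**2)

The printed invariance anticipates the proper time and proper distance of
subsection A.4.4.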

Derivation
The Lorentz transformation can be derived by assuming that the transformation
from the coordinates t_A, x_A, y_A, z_A to t_B, x_B, y_B, z_B is linear:

    t_B = a_{tx} x_A + a_{ty} y_A + a_{tz} z_A + a_{tt} t_A
    x_B = a_{xx} x_A + a_{xy} y_A + a_{xz} z_A + a_{xt} t_A
    y_B = a_{yx} x_A + a_{yy} y_A + a_{yz} z_A + a_{yt} t_A
    z_B = a_{zx} x_A + a_{zy} y_A + a_{zz} z_A + a_{zt} t_A

where the a.. are constants still to be found. The big reason to assume that the
transformation should be linear is that if space is populated with observers A
and B, rather than just have a single one sitting at the origin of that coordinate
system, then a linear transformation assures that all pairs of observers A and
B see the exact same transformation. In addition, the transformation from
x_B, y_B, z_B, t_B back to x_A, y_A, z_A, t_A should be similar to the one the other way,
since the principle of relativity asserts that the two coordinate systems are
equivalent. A linear transformation has a back transformation that is also linear.
Note that since the choice what to define as time zero and as the origin is quite
arbitrary, it can be arranged that xB , yB , zB , tB are zero when xA , yA , zA , tA are.
A lot of additional constraints can be put in because of physical symmetries
that surely still apply even allowing for relativity. For example, the transforma-
tion to xB , tB should not depend on the arbitrarily chosen positive directions of
the y and z axes, so throw out the y and z terms in those equations. Seen in
a mirror along the xy-plane, the y transformation should look the same, even
if z changes sign, so throw out zA from the equation for yB . Similarly, there
goes yA in the equation for zB . Since the choice of y and z-axes is arbitrary,
the remaining az. coefficients must equal the corresponding ay. ones. Since the
basic premise of relativity is that the coordinate systems A and B are equiva-
lent, the y-difference between tracks parallel to the direction of motion cannot
get longer for B and shorter for A, nor vice-versa, so ayy = 1. Finally, by the
very definition of the relative velocity v of coordinate system B with respect
to system A, xB = yB = zB = 0 should correspond to xA = vtA . And by the
principle of relativity, xA = yA = zA = 0 should correspond to xB = −vtB . You
might be able to think up some more constraints, but this will do. Put it all
together to get

tB = atx xA + axx tA yB = ayx xA + yA + ayt tA


xB = axx (xA − vtA ) zB = ayx xA + zA + ayt tA

Next the trick is to consider the wave front emitted by some light source
that flashes at time zero at the then coinciding origins. Since according to
the principle of relativity the two coordinate systems are fully equivalent, in
both coordinate systems the wave front forms an expanding spherical shell with
radius ct:
    x_A^2 + y_A^2 + z_A^2 = c^2 t_A^2 \qquad x_B^2 + y_B^2 + z_B^2 = c^2 t_B^2
Plug the linearized expressions for xB , yB , zB , tB in terms of xA , yA , zA , tA into
the second equation and demand that it is consistent with the first equation,
and you obtain the Lorentz transformation. To get the back transformation
giving xA , yA , zA , tA in terms of xB , yB , zB , tB , solve the Lorentz equations for
xA , yA , zA , and tA .
Now assume that two events 1 and 2 happen at the same location xA , yA , zA
in system A, then the Lorentz transformation formula (A.10) giving tB implies
the time dilation for the events seen in the coordinate system B. Next assume
that two stationary locations in system B are apart by a distance xB,2 − xB,1
in the direction of relative motion; then the Lorentz transformation formula
giving xB implies that seen in the A system, these now moving locations are
at any single time t_A apart by a distance reduced by the Lorentz-Fitzgerald
contraction. Take differentials of the Lorentz transformation formulae to derive
the given transformations between the velocities seen in the two systems.

A.4.4 Proper time and distance


In classical Newtonian mechanics, time is absolute. All observers agree about
the difference in time ∆t between any two events:

nonrelativistic: ∆t is independent of the observer

The time interval is an “invariant;” it is the same for all observers. All observers,
regardless of how their spatial coordinate systems are oriented, also agree over
the distance ∆s between two events that occur at the same time:

nonrelativistic: ∆s is independent of the observer if ∆t = 0

Here the distance between any two points 1 and 2 is found as

    \Delta s \equiv |\Delta\vec r| \equiv \sqrt{(\Delta\vec r)\cdot(\Delta\vec r)}
      = \sqrt{(\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2}
    \qquad
    \Delta\vec r \equiv \vec r_2 - \vec r_1

The fact that the distance may be expressed as a square root of the sum of the
square components is known as the “Pythagorean theorem.”
Relativity messes all these things up big time. As time dilation shows,
the time between events now depends on who is doing the observing. And as
Lorentz-Fitzgerald contraction shows, distances now depend on who is doing the
observing. For example, consider a moving ticking clock. Not only do different
observers disagree over the distance ∆s traveled between ticks, (as they would do
nonrelativistically), but they also disagree about the time interval ∆t between
ticks, (which they would not do nonrelativistically).
However, there is one thing that all observers can agree on. They do agree on
how much time between ticks an observer moving along with the clock would
measure. That time interval is called the “proper time” interval ∆t0 . It is
shorter than the time interval that an observer actually perceives due to the
time dilation:

    \Delta t = \frac{\Delta t_0}{\sqrt{1 - (v/c)^2}}
where v is the velocity of the clock as perceived by the observer. To clean this
up, take the square root to the other side, move the time interval inside it,
multiply by c to get rid of the ratio, and then recognize v∆t as the distance
traveled. That gives the proper time interval \Delta t_0 as

    c\,\Delta t_0 = \sqrt{(c\Delta t)^2 - (\Delta x)^2 - (\Delta y)^2 - (\Delta z)^2}    (A.12)

The scaled proper time interval c∆t0 is called the “space-time interval,”
because it involves both space and time. In particular, it involves the spatial
distance between the events too. Note however that the interval is imaginary
if the quantity under the square root is negative. For example, if an observer
perceives two events as happening simultaneously at two different locations,
then the space-time interval between those two events is imaginary. To avoid
dealing with complex numbers, it is then more convenient to define the “proper
distance” between two events as

    \Delta s = \sqrt{(\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2 - (c\Delta t)^2}    (A.13)

If the space-time interval, or proper time interval, is real, it is called “time-
like.” If it is imaginary, so that the proper distance is real instead, the space-
time interval is called “space-like.” For vanishingly small differences in time
and location, all differences ∆ become differentials d.
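
As a minimal sketch, not part of the original text, here is the above
classification applied to two arbitrarily assumed event separations:

    from math import sqrt

    c = 299792458.0

    def classify(dt, dx, dy, dz):
        # the sign of (c dt)^2 - dx^2 - dy^2 - dz^2 decides the character
        s2 = (c*dt)**2 - dx**2 - dy**2 - dz**2
        if s2 > 0.0:
            return "time-like, c dt0 = %g m" % sqrt(s2)
        if s2 < 0.0:
            return "space-like, ds = %g m" % sqrt(-s2)
        return "light-like"

    print(classify(1.0, 1.0e8, 0.0, 0.0))    # time-like separation
    print(classify(1.0e-9, 1.0, 0.0, 0.0))   # space-like separation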

A.4.5 Subluminal and superluminal effects


Suppose you stepped off the curb at the wrong moment and are now in the
hospital. The pain is agonizing, so you contact one of the tele-communications
microchips buzzing in the sky overhead. These chips are capable of sending
out a “superluminal” beam; a beam that propagates with a speed greater than
the speed of light. The factor with which the speed of the beam exceeds the
speed of light is called the “warp factor” w. A beam with a high warp factor
is great for rapid communication with space ships at distant locations in the
solar system and beyond. A beam with a warp factor of 10 allows ten times
quicker communication than those old-fashioned radio waves that propagate at
the speed of light. And these chips have other very helpful uses, like for your
predicament.
You select a microchip that is moving at high speed away from the location
where the accident occurred. The microchip sends out its superluminal beam.
In its coordinate system, the beam reaches the location of the accident at a
time tm , at which time the beam has traveled a distance xm equal to wctm .
According to the Lorentz transformation (A.10), in the coordinate system fixed
to the earth, the beam reaches the location of the accident at a position and
time equal to

    t = \frac{1 - (wV/c)}{\sqrt{1 - (V/c)^2}} \, t_m
    \qquad
    x = \frac{wc - V}{\sqrt{1 - (V/c)^2}} \, t_m

Because of the high speed V of the microchip and the additional warp factor,
the time that the beam reaches the location of the accident is negative; the
beam has entered into the past. Not far enough in the past however, so another
microchip picks up the message and beams it back, achieving another reduction
in time. After a few more bounces, the message is beamed to your cell phone. It
reaches you just when you are about to step off the curb. The message will warn
you of the approaching car, but it is not really needed. The mere distraction
of your buzzing cell phone causes you to pause for just a second, and the car
rushes past safely. So the accident never happens; you are no longer in agony in
the hospital, but on your Bermuda vacation as planned. And these microchips
are great for investing in the stock market too.

Sounds good, does it not? Unfortunately, there is a hitch. Physicists refuse
to work on the underlying physics to enable this technology. They claim it will
not be workable, since it will force them to think up answers to tough questions
like: “if you did not end up in the hospital after all, then why did you still send
the message?” Until they change their mind, our reality will be that observable
matter or radiation cannot propagate faster than the speed of light.

Therefore, manipulating the past is not possible. An event can only affect
later events. Even more specifically, an event can only affect a later event if
the location of that later event is sufficiently close that it can be reached with
a speed of no more than the speed of light. A look at the definition of the
proper time interval then shows that this means that the proper time interval
between the events must be real, or “time-like.” And while different observers
may disagree about the location and time of the events, they all agree about
the proper time interval. So all observers, regardless of their velocity, agree on
whether an event can affect another event. And they also all agree on which
event is the earlier one, because before the time interval ∆t could change sign
for some observer speeds, it would have to pass through zero, and at that stage
the proper time interval would have to be imaginary instead of real. It cannot, because
it must be the same for all observers. Relativity maintains a single reality, even
though observers may disagree about precise times and locations.

A more visual interpretation of those concepts can also be given. Imagine a
hypothetical spherical wave front spreading out from the earlier event with the
speed of light. Then a later event can be affected by the earlier event only if that
later event is within or on that spherical wave front. If you restrict attention to
events in the x, y plane, you can use the z coordinate to plot the values of time.
In such a plot, the expanding circular wave front becomes a cone, called the
“light-cone.” Only events within this light cone can be affected. Similarly in
three dimensions and time, an event can only be affected if it is within the light
cone in four-dimensional space-time. But of course, a cone in four dimensions
is hard to visualize.

A.4.6 Four-vectors
The Lorentz transformation mixes up the space and time coordinates badly. In
relativity, it is therefore best to think of the spatial coordinates and time as
coordinates in a four-dimensional “space-time.”
Since you would surely like all components in a vector to have the same
units, you probably want to multiply time by the speed of light, because ct
has units of length. So the four-dimensional “position vector” can logically be
defined to be (ct, x, y, z); ct is the “zeroth” component of the vector where x,
y, and z are components number 1, 2, and 3 as usual. This four-dimensional
position vector will be indicated by

    \overset{\hookrightarrow}{r} \equiv
    \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} \equiv
    \begin{pmatrix} r_0 \\ r_1 \\ r_2 \\ r_3 \end{pmatrix}    (A.14)
The hook on the arrow indicates that time has been hooked into it.
How about the important dot product between vectors? In three dimensional
space this produces such important quantities as the length of vectors and the
angle between vectors. Moreover, the dot product between two vectors is the
same regardless of the orientation of the coordinate system in which it is viewed.
It turns out that the proper way to define the dot product for four-vectors
reverses the sign of the contribution of the time components:

    \overset{\hookrightarrow}{r}_1 \cdot \overset{\hookrightarrow}{r}_2 \equiv -c^2 t_1 t_2 + x_1 x_2 + y_1 y_2 + z_1 z_2    (A.15)

It can be checked by simple substitution that the Lorentz transformation (A.10)
preserves this dot product. In more expensive words, this “inner product” is
“invariant under the Lorentz transformation.” Different observers may disagree
about the individual components of four-vectors, but not about their dot prod-
ucts.
The difference between the four-vector positions of two events has a “proper
length” equal to the proper distance between the events

    \Delta s = \sqrt{(\Delta \overset{\hookrightarrow}{r}) \cdot (\Delta \overset{\hookrightarrow}{r})}    (A.16)

So, the fact that all observers agree about proper distance can be seen as a
consequence of the fact that they all agree about dot products.
It should be pointed out that many physicists reverse the sign of the spa-
tial components instead of the time in their inner product. Obviously, this is
completely inconsistent with the nonrelativistic convention, which is still the
limit case for velocities small compared to the speed of light. And this incon-
sistent sign convention seems to be becoming the dominant one too. Count on
physicists to argue for more than a century about a sign convention and end up
getting it all wrong in the end.
Some physicists also like to point out that if time t is replaced by it, with i the
imaginary unit, then the above dot product becomes the normal one. The Lorentz transformation can
then be considered as a mere rotation of the coordinate system in this four-
dimensional space-time. Gee, thanks physicists! This will be very helpful when
examining what happens in universes in which time is imaginary, unlike our
own universe, in which it is real.
Returning to our own universe, the proper length of a four vector can be
imaginary, and a zero proper length does not imply that the four vector is zero as
it does in normal three-dimensional space. In fact, a zero proper length merely
indicates that it requires motion at the speed of light to go from the start point
of the four vector to its end point.
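
Here is a minimal sketch, not part of the original text, of the dot product
(A.15) with this book’s sign convention, for two arbitrarily chosen four-vectors:

    import numpy as np

    def dot4(r1, r2):
        # r = (ct, x, y, z); the time component enters with a minus sign
        return -r1[0]*r2[0] + r1[1:] @ r2[1:]

    r1 = np.array([2.0, 1.0, 0.5, -0.3])
    r2 = np.array([1.0, 0.2, 0.0, 0.7])
    print(dot4(r1, r2))
    print(dot4(r1 - r2, r1 - r2))   # proper length squared, as in (A.16)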

A.4.7 Index notation


The notations used in the previous subsection are non standard. In literature,
you will almost invariably find the four vectors and the Lorentz transform writ-
ten out in index notation. Fortunately, it does not require courses in linear
algebra and tensor algebra to make some basic sense out of it.
First of all, physicists like to indicate the components of four vectors by
x_0, x_1, x_2, x_3, because it is inconsistent with the nonrelativistic convention. Also,
since the letter x is already greatly over-used as it is, it promotes confusion,
something that is always hilarious. A generic component may be denoted as
x_\mu, and an entire four vector can then be indicated by {x_\mu} where the brackets
indicate the set of all four components. Needless to say, some physicists forget
about the brackets, because using a component where a vector is required can
have hilarious consequences.
Physicists also like to put the coefficients of the Lorentz transformation
(A.10) into a table called a “matrix” or “second-order tensor,” as follows

    \Lambda \equiv
    \begin{pmatrix}
      \Lambda_{11} & \Lambda_{12} & \Lambda_{13} & \Lambda_{14} \\
      \Lambda_{21} & \Lambda_{22} & \Lambda_{23} & \Lambda_{24} \\
      \Lambda_{31} & \Lambda_{32} & \Lambda_{33} & \Lambda_{34} \\
      \Lambda_{41} & \Lambda_{42} & \Lambda_{43} & \Lambda_{44}
    \end{pmatrix}
    \equiv
    \begin{pmatrix}
      \gamma & -\beta\gamma & 0 & 0 \\
      -\beta\gamma & \gamma & 0 & 0 \\
      0 & 0 & 1 & 0 \\
      0 & 0 & 0 & 1
    \end{pmatrix}
    \qquad
    \gamma \equiv \frac{1}{\sqrt{1 - (V/c)^2}}
    \quad
    \beta \equiv \frac{V}{c}
    \quad
    \gamma^2 - \beta^2\gamma^2 = 1
    \qquad (A.17)
The Lorentz matrix as shown assumes that the axis systems are aligned with
the direction of relative motion of the observers. Otherwise, it becomes a lot
more messy.
In terms of those notations, the Lorentz transformation (A.10) can be written
as

    x_{\mu,B} = \sum_{\nu=0}^{3} \Lambda_{\mu\nu} x_{\nu,A}
    \qquad \text{for all values } \mu = 0, 1, 2, 3

The “Einstein summation convention” is now to leave out the sum. Summation
is understood to be required whenever the same index appears twice in an
expression. So, you will likely find the Lorentz transformation written more
concisely as
    x_{\mu,B} = \Lambda_{\mu\nu} x_{\nu,A}
where some of these subscripts may actually appear as superscripts. The basic
reason for raising indices is that a quantity like a position differential transforms
differently than a quantity like a gradient,

    dx_{\mu,B} = \frac{\partial x_{\mu,B}}{\partial x_{\nu,A}} \, dx_{\nu,A}
    \qquad
    \frac{\partial f}{\partial x_{\mu,B}} = \frac{\partial f}{\partial x_{\nu,A}} \frac{\partial x_{\nu,A}}{\partial x_{\mu,B}}

and raising or lowering indices is a means of keeping track of that.


It should be noted that mathematicians call the matrix Λ the transformation
matrix from B to A, even though it produces the coordinates of B from those of
A. However, after you have read some more in this book, insane notation will
no longer surprise you. Just that in this case it comes from mathematicians.
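
As a minimal computational sketch, not part of the original text, the
index-notation sum can be written out with numpy’s einsum, which mimics the
Einstein convention; the boost parameter is an arbitrary assumption:

    import numpy as np

    beta = 0.6                                  # assumed V/c
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    Lam = np.array([[gamma, -beta*gamma, 0.0, 0.0],
                    [-beta*gamma, gamma, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0],
                    [0.0, 0.0, 0.0, 1.0]])      # (A.17), aligned axes

    xA = np.array([2.0, 1.0, 0.5, -0.3])        # x_0 = ct, then x_1, x_2, x_3
    xB = np.einsum('mn,n->m', Lam, xA)          # sum over the repeated index
    print(xB)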

A.4.8 Group property


The derivation of the Lorentz transform as given earlier examined two observers
A and B. But now assume that a third observer C is in motion compared to
observer B. The coordinates of an event as perceived by observer C may then
be computed from those of B using the corresponding Lorentz transformation,
and the coordinates of B may in turn be computed from those of A using that
Lorentz transformation. Schematically,

    \overset{\hookrightarrow}{r}_C = \Lambda_{C\leftarrow B} \, \overset{\hookrightarrow}{r}_B = \Lambda_{C\leftarrow B} \Lambda_{B\leftarrow A} \, \overset{\hookrightarrow}{r}_A

But if everything is OK, that means that the Lorentz transformations from A to
B followed by the Lorentz transformation from B to C must be the same as the
Lorentz transformation from A directly to C. In other words, the combination
of two Lorentz transformations must be another Lorentz transformation.
Mathematicians say that Lorentz transformations must form a “group.” It
is much like rotations of a coordinate system in three spatial dimensions: a
rotation followed by another one is equivalent to a single rotation over some
combined angle. In fact, such spatial rotations are Lorentz transformations;
just between coordinate systems that do not move compared to each other.
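
The same-direction case is easy to check numerically. Here is a minimal
sketch, not part of the original text, with two arbitrarily assumed boost
speeds; it uses the relativistic velocity addition implied by (A.11):

    import numpy as np

    def boost(beta):
        # the time-x part of the Lorentz matrix (A.17)
        g = 1.0 / np.sqrt(1.0 - beta**2)
        return np.array([[g, -beta*g], [-beta*g, g]])

    b1, b2 = 0.5, 0.7                      # assumed speeds as fractions of c
    combined = boost(b2) @ boost(b1)       # A to B, then B to C
    b12 = (b1 + b2) / (1.0 + b1*b2)        # relativistic velocity addition
    print(np.allclose(combined, boost(b12)))   # True: again a Lorentz transform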

Derivation
This subsubsection verifies the group property of the Lorentz transformation.
It is not recommended unless you have had a solid course in linear algebra.
The group property is easy to verify if the observers B and C are going
in the same direction compared to A. Just multiply two matrices of the form
(A.17) together and apply the condition that \gamma^2 - \beta^2\gamma^2 = 1 for each.
It gets much messier if the observers move in different directions. In that
case the only immediate simplification that can be made is to align the coor-
dinate systems so that both relative velocities are in the x, y planes. Then the
transformations only involve z in a trivial way and the combined transformation
takes the generic form

    \Lambda_{C\leftarrow A} =
    \begin{pmatrix}
      \Lambda_{11} & \Lambda_{12} & \Lambda_{13} & 0 \\
      \Lambda_{21} & \Lambda_{22} & \Lambda_{23} & 0 \\
      \Lambda_{31} & \Lambda_{32} & \Lambda_{33} & 0 \\
      0 & 0 & 0 & 1
    \end{pmatrix}
It needs to be shown that this is a Lorentz transformation from A directly to
C.
Now the spatial, x, y, coordinate system of observer C can be rotated to
eliminate \Lambda_{31} and the spatial coordinate system of observer A can be rotated to
eliminate \Lambda_{13}. Next, both Lorentz transformations preserve the inner products.
Therefore the dot product between the four-vectors (1, 0, 0, 0) and (0, 0, 1, 0) in
the A system must be the same as the dot product between columns 1 and 3 in
the matrix above. And that means that \Lambda_{23} must be zero, because \Lambda_{21} will not
be zero except in the trivial case that systems A and C are at rest compared
to each other. Next, since the proper length of the vector (0, 0, 1, 0) equals one
in the A system, it does so in the C system, so \Lambda_{33} must be one. (Or minus
one, but a 180° rotation of the spatial coordinate system around the z-axis can
take care of that.) Next, since the dot product of the vectors (0, 1, 0, 0) and
(0, 0, 1, 0) is zero, so is \Lambda_{32}.
That leaves the four values relating the time and x components. From the
fact that the dot product of the vectors (1, 0, 0, 0) and (0, 1, 0, 0) is zero,

    -\Lambda_{11}\Lambda_{12} + \Lambda_{21}\Lambda_{22} = 0
    \quad\Longrightarrow\quad
    \frac{\Lambda_{12}}{\Lambda_{22}} = \frac{\Lambda_{21}}{\Lambda_{11}} \equiv \beta
where \beta is some constant. Also, since the proper lengths of these vectors are
minus one, respectively one,

    -\Lambda_{11}^2 + \Lambda_{21}^2 = -1
    \qquad
    -\Lambda_{12}^2 + \Lambda_{22}^2 = 1

or substituting in for \Lambda_{12} and \Lambda_{21} from the above

    -\Lambda_{11}^2 + \beta^2 \Lambda_{11}^2 = -1
    \qquad
    -\beta^2 \Lambda_{22}^2 + \Lambda_{22}^2 = 1

It follows that \Lambda_{11} and \Lambda_{22} must be equal, (or opposite, but since both Lorentz
transformations have unit determinant, so must their combination), so call them
\gamma. The transformation is then a Lorentz transformation of the usual form (A.17).
(Since the spatial coordinate system cannot just flip over from left handed to
right handed at some point, \gamma will have to be positive.) Examining the transfor-
mation of the origin x_A = y_A = z_A = 0 identifies \beta as V/c, with V the relative
velocity of system A compared to B, and then the above two equations identify
\gamma as the Lorentz factor.
Obviously, if any two Lorentz transformations are equivalent to a single one,
then by repeated application any arbitrary number of them are equivalent to a
single one.

A.4.9 Intro to relativistic mechanics


Nonrelativistic mechanics is often based on the use of a potential energy to
describe the forces. For example, in a typical molecular dynamics computation,
the forces between the molecules are derived from a potential that depends
on the differences in position between the atoms. Unfortunately, this sort of
description fails badly in the truly relativistic case.
The basic problem is not difficult to understand. If a potential depends
only on the spatial configuration of the atoms involved, then the motion of
an atom instantaneously affects all the other ones. Relativity simply cannot
handle instantaneous effects; they must be limited by the speed of light or
major problems appear.
In the quantum solution of the hydrogen atom, an electrostatic potential is
fine when the heavy proton can be assumed to be at rest. However, an observer
compared to whom the entire atom is in high speed motion does not see the
electron as moving in an electrostatic field. As far as that observer is concerned,
the moving proton also creates a magnetic field. Indeed, for moving charges,
electrostatics turns into electromagnetics, and the combined electromagnetic
field does obey the limitation of the speed of light, as Maxwell’s equations show.
But in Maxwell’s equations, the speed of light limitation still seems almost
accidental; it comes out of the theory, it is not built into it. It is therefore maybe
not a coincidence that in the more complicated theory of quantum electrody-
namics, electromagnetic interactions end up being described as mediated by a
particle, the photon. Collisions between particles inherently avoid erroneous
action at a distance, especially if those particles have some uncertainty in posi-
tion and time. Taking a hint from that, this introduction will restrict itself to
collisions between particles, [8]. It allows simple dynamics to be done without
the use of a potential between particles that is relativistically suspect.
As a first example, consider two particles of equal mass and opposite speeds
that collide as shown in the center of figure A.2. You might think of the particles

[Figure A.2: Example elastic collision seen by different observers.]

as two helium atoms. It will be assumed that while the speed of the atoms may
be quite high, the collision is at a shallow enough angle that it does not excite
the atoms. In other words, it is assumed that the collision is elastic.
As seen by observer C, the collision is perfectly symmetric. Regardless of
the mechanics of the actual collision, observer C sees nothing wrong with it.
The energy of the helium atoms is the same after the collision as before. Also,
the net linear momentum was zero before the collision and still zero afterwards.
And whatever little angular momentum there is, it too is still the same after
the collision.
But now consider an observer A that moves horizontally along with the top
helium atom. For this observer, the top helium atom comes down vertically and
bounces back vertically. Observer B moves along with the bottom helium atom
in the horizontal direction and sees that atom moving vertically. Now consider
the Lorentz transformation (A.11) of the vertical velocity vy,2 of the top atom as
seen by observer A into the vertical velocity vy of that atom as seen by observer
B:

    v_y = \sqrt{1 - (v_x/c)^2} \, v_{y,2}
They are different! In particular, v_y is smaller than v_{y,2}. Therefore, if the masses
of the helium atoms that the observers perceive would be their rest mass, linear
momentum would not be preserved. For example, observer A would perceive a
net downwards linear momentum before the collision and a net upwards linear
momentum after it.
Clearly, linear momentum conservation is too fundamental a concept to be
summarily thrown out. Instead, observer A perceives the mass of the rapidly
moving lower atom to be the moving mass mv , which is larger than the rest
mass m by the Lorentz factor:

    m_v = \frac{m}{\sqrt{1 - (v/c)^2}}

and that exactly compensates for the lower vertical velocity in the expression
for the momentum. (Remember that it was assumed that the collision is under
a shallow angle, so the vertical velocity components are too small to have an
effect on the masses.)

It is not difficult to understand why things are like this. The nonrelativistic
definition of momentum allows two plausible generalizations to the relativistic
case:

    \vec p = m \frac{d\vec r}{dt}
    \quad\Longrightarrow\quad
    \overset{\hookrightarrow}{p} = m \frac{d\overset{\hookrightarrow}{r}}{dt} \ ?
    \qquad\text{or}\qquad
    \overset{\hookrightarrow}{p} = m \frac{d\overset{\hookrightarrow}{r}}{dt_0} \ ?
Indeed, nonrelativistically, all observers agree about time intervals. However,
relativistically the question arises whether the right time differential in momen-
tum is dt as perceived by the observer, or the proper time difference dt0 as
perceived by a hypothetical second observer moving along with the particle.
A little thought shows that the right time differential has to be dt0 . For,
after collisions the sum of the momenta should be the same as before them.
However, the Lorentz velocity transformation (A.11) shows that perceived ve-
locities transform nonlinearly from one observer to the next. For a nonlinear
transformation, there is no reason to assume that if the momenta after a collision
are the same as before for one observer, they are also so for another observer.
On the other hand, since all observers agree about the proper time intervals,
momentum based on the proper time interval dt_0 transforms like d\overset{\hookrightarrow}{r}, like posi-
tion, and that is linear. A linear transformation does assure that if an observer
A perceives that the sum of the momenta of a collection of particles j = 1, 2, . . .
is the same before and after,
$$\sum_j \overset{\hookrightarrow}{p}_{jA,\rm after} = \sum_j \overset{\hookrightarrow}{p}_{jA,\rm before}$$

then so does any other observer B:


$$\Lambda_{B\leftarrow A}\sum_j \overset{\hookrightarrow}{p}_{jA,\rm after} = \Lambda_{B\leftarrow A}\sum_j \overset{\hookrightarrow}{p}_{jA,\rm before} \quad\Longrightarrow\quad \sum_j \overset{\hookrightarrow}{p}_{jB,\rm after} = \sum_j \overset{\hookrightarrow}{p}_{jB,\rm before}$$

Using the chain rule of differentiation, the components of the momentum


four-vector $\overset{\hookrightarrow}{p}$ can be written out as
$$p^0 = mc\frac{dt}{dt_0} \qquad p^1 = m\frac{dt}{dt_0}\frac{dx}{dt} \qquad p^2 = m\frac{dt}{dt_0}\frac{dy}{dt} \qquad p^3 = m\frac{dt}{dt_0}\frac{dz}{dt} \tag{A.18}$$
The components $p^1$, $p^2$, $p^3$ can be written in the same form as in the nonrela-
tivistic case by defining a moving mass
$$m_v = m\frac{dt}{dt_0} = \frac{m}{\sqrt{1-(v/c)^2}} \tag{A.19}$$

[Figure A.3: A completely inelastic collision. Two panels, seen by observer A
and by observer B, show the moving masses $m_v$ approaching head-on before
the collision, and the merged mass $M_0$ at rest after it.]

How about the zeroth component? Since it too is part of the conservation
law, reasonably speaking it can only be the relativistic equivalent of the nonrel-
ativistic kinetic energy. Indeed, it equals $m_vc^2$ except for a trivial scaling factor
$1/c$ to give it units of momentum.
Note that so far, this only indicates that differences between $m_vc^2$ and $mc^2$
give the kinetic energy. It does not imply that $mc^2$ by itself also corresponds to
a meaningful energy. However, there is a beautifully simple argument to show
that indeed kinetic energy can be converted into rest mass, [8]. Consider two
identical rest masses $m_0$ that are accelerated to high speed and then made to
crash into each other head-on, as in the left part of figure A.3. In this case, think
of the masses as macroscopic objects, so that thermal energy is a meaningful
concept for them. Assume that the collision has so much energy that the masses
melt and merge without any rebound. By symmetry, the combined mass $M_0$
has zero velocity. Momentum is conserved: the net momentum was zero before
the collision because the masses had opposite velocity, and it is still zero after
the collision. All very straightforward.
But now consider the same collision from the point of view of a second
observer who is moving upwards slowly compared to the first observer with
a small speed $v_B$. No relativity involved here at all; going up so slowly, the
second observer sees almost the same thing as the first one, with one difference.
According to the second observer, the entire collision process seems to have a
small downward velocity $v_B$. The two masses have a slight downward velocity
$v_B$ before the collision and so has the mass $M_0$ after the collision. But then
vertical momentum conservation inevitably implies
$$2m_vv_B = M_0v_B$$
So $M_0$ must be twice the moving mass $m_v$. The combined rest mass $M_0$ is not
the sum of the rest masses $m_0$, but of the moving masses $m_v$. All the kinetic
energy given to the two masses has ended up as additional rest mass in $M_0$.
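To put a number on it, the following quick computation, a sketch not found in
the original text, evaluates the merged rest mass for two masses colliding at an
illustrative speed of $0.6c$:

    from math import sqrt

    c = 299_792_458.0    # speed of light, m/s
    m0 = 1.0             # kg, each incoming rest mass
    v = 0.6 * c          # speed of each mass before the collision

    mv = m0 / sqrt(1 - (v/c)**2)   # moving mass, here 1.25 kg
    M0 = 2 * mv                    # merged rest mass, 2.5 kg
    print(M0 - 2*m0)               # 0.5 kg of rest mass gained
    print((M0 - 2*m0) * c**2)      # equals the kinetic energy that went in, J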

A.4.10 Lagrangian mechanics


Lagrangian mechanics can simplify many complicated dynamics problems. As
an example, in this section it is used to derive the relativistic motion of a particle

in an electromagnetic field.
Consider first the nonrelativistic motion of a particle in an electrostatic field.
That is an important case for this book, because it is a good approximation for
the electron in the hydrogen atom. To describe such purely nonrelativistic
motion, physicists like to define a Lagrangian as

$$L = \tfrac12 m|\vec v|^2 - q\varphi \tag{A.20}$$

where $m$ is the mass of the particle, $\vec v$ its velocity, and $q$ its charge, while
$q\varphi$ is the potential energy due to the electrostatic field, which depends on the
position of the particle. (It is important to remember that the Lagrangian
should mathematically be treated as a function of velocity and position of the
particle. While for a given motion, the position and velocity are in turn a
function of time, time derivatives must be implemented through the chain rule,
i.e. by means of total derivatives of the Lagrangian.)
Physicists next define canonical, or generalized, momentum as the partial
derivative of the Lagrangian with respect to velocity. An arbitrary component
$p^c_i$ of the canonical momentum is found as
$$p^c_i = \frac{\partial L}{\partial v_i} \tag{A.21}$$
This works out to be simply the component $p_i = mv_i$ of the normal momentum.
The equations of motion are taken to be
$$\frac{dp^c_i}{dt} = \frac{\partial L}{\partial r_i} \tag{A.22}$$
which is found to be
$$\frac{dp_i}{dt} = -q\frac{\partial\varphi}{\partial r_i}$$
That is simply Newton’s second law; the left hand side is just mass times accel-
eration while in the right hand side minus the spatial derivative of the potential
energy gives the force. It can also be seen that the sum of kinetic and potential
energy of the particle remains constant, by multiplying Newton’s equation by
$v_i$ and summing over $i$.
Since the Lagrangian is just a scalar, it is relatively simple to guess its
form in the relativistic case. To get the momentum right, simply replace the
kinetic energy by an inverse Lorentz factor,
$$-mc^2\sqrt{1-(|\vec v|/c)^2}$$
For velocities small compared to the speed of light, a two term Taylor series
shows this is equivalent to $-mc^2$ plus the kinetic energy. The constant $-mc^2$ is

of no importance since only derivatives of the Lagrangian are used. For any
velocity, big or small, the canonical momentum as defined above produces the
relativistic momentum based on the moving mass as it should.
The potential energy part of the Lagrangian is a bit trickier. The previous
section showed that momentum is a four-vector including energy. Therefore, go-
ing from one observer to another mixes up energy and momentum nontrivially,
just like it mixes up space and time. That has consequences for energy conser-
vation. In the classical solution, kinetic energy of the particle can temporarily
be stored away as electrostatic potential energy and recovered later intact. But
relativistically, the kinetic energy seen by one observer becomes momentum
seen by another one. If that momentum is to be recovered intact later, there
should be something like potential momentum. Since momentum is a vector,
obviously so should potential momentum be: there must be something like a
vector potential.
Based on those arguments, you might guess that the Lagrangian should be
something like

$$L = -mc^2\sqrt{1-(|\vec v|/c)^2} + q\,\overset{\hookrightarrow}{\Phi}\cdot\frac{d\overset{\hookrightarrow}{r}}{dt} \qquad \overset{\hookrightarrow}{\Phi} = \left(\frac{1}{c}\varphi,\ A_x,\ A_y,\ A_z\right) \tag{A.23}$$
And that is in fact right. Component zero of the potential four-vector is the
classical electrostatic potential. The spatial vector $(A_x, A_y, A_z)$ is called the
“magnetic vector potential.”
The canonical momentum is now
$$p^c_i = \frac{\partial L}{\partial v_i} = m_vv_i + qA_i \tag{A.24}$$

and that is no longer just the normal momentum, $p_i = m_vv_i$, but includes the
magnetic vector potential. The Lagrangian equations of motion become, after
clean up and in vector notation,

$$\frac{d\vec p}{dt} = q\vec E + q\vec v\times\vec B \tag{A.25}$$
where
$$\vec E = -\nabla\varphi - \frac{\partial\vec A}{\partial t} \qquad \vec B = \nabla\times\vec A$$
are called the electric and magnetic fields, respectively. The right hand side in
the equation of motion is called the Lorentz force.
The so-called “action” $\int L\,dt$ is the same for different observers, provided that
$\overset{\hookrightarrow}{\Phi}$ transforms according to the Lorentz transformation. That then implies that
different observers agree about the evolution, {A.3}. From the transformation
of $\overset{\hookrightarrow}{\Phi}$, that of the electric and magnetic fields may be found; that will not be a
Lorentz transformation.
It may be noted that the field strengths are unchanged in a “gauge trans-
formation” that modifies $\varphi$ and $\vec A$ as
$$\varphi' = \varphi - \frac{\partial\Omega}{\partial t} \qquad \vec A' = \vec A + \nabla\Omega \tag{A.26}$$
where Ω is any arbitrary real function of position and time. In quantum me-
chanics, this gauge transform adds a phase factor to the wave function, but still
leaves its magnitude unchanged, hence is usually inconsequential. Gauge trans-
forms do become very important in advanced quantum mechanics, but that is
beyond the scope of this book.
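If you want to see the gauge invariance explicitly, the following symbolic
computation, a sketch using the sympy package rather than anything in this
book, applies (A.26) for an arbitrary function $\Omega$ and verifies that $\vec E$ and
$\vec B$ do not change:

    import sympy as sp

    x, y, z, t = sp.symbols('x y z t')
    phi = sp.Function('phi')(x, y, z, t)
    A = [sp.Function(n)(x, y, z, t) for n in ('A_x', 'A_y', 'A_z')]
    Omega = sp.Function('Omega')(x, y, z, t)

    def fields(phi, A):
        # E = -grad(phi) - dA/dt and B = curl(A), written out in components
        E = [-sp.diff(phi, q) - sp.diff(A[i], t)
             for i, q in enumerate((x, y, z))]
        B = [sp.diff(A[2], y) - sp.diff(A[1], z),
             sp.diff(A[0], z) - sp.diff(A[2], x),
             sp.diff(A[1], x) - sp.diff(A[0], y)]
        return E, B

    phi2 = phi - sp.diff(Omega, t)                       # gauge transform
    A2 = [A[i] + sp.diff(Omega, q) for i, q in enumerate((x, y, z))]

    E1, B1 = fields(phi, A)
    E2, B2 = fields(phi2, A2)
    print([sp.simplify(e1 - e2) for e1, e2 in zip(E1, E2)])   # [0, 0, 0]
    print([sp.simplify(b1 - b2) for b1, b2 in zip(B1, B2)])   # [0, 0, 0]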
The energy can be found following note {A.3} as
$$E = \vec v\cdot\vec p^{\,c} - L = m_vc^2 + q\varphi$$
The Hamiltonian is the energy expressed in terms of the canonical momentum
$\vec p^{\,c}$ instead of $\vec v$; that works out to
$$H = mc^2\sqrt{1 + \frac{\left(\vec p^{\,c} - q\vec A\right)^2}{m^2c^2}} + q\varphi$$
using the formula given in the overview subsection.

Derivation
To derive the given Lorentz force from the given Lagrangian, plug the canonical
momentum and the Lagrangian into the Lagrangian equation of motion. That
gives
$$\frac{dp_i}{dt} + q\frac{\partial A_i}{\partial t} + q\frac{\partial A_i}{\partial x_j}v_j = -q\frac{\partial\varphi}{\partial x_i} + q\frac{\partial A_j}{\partial x_i}v_j$$
using the Einstein convention of suppressing the summation symbols over j.
Reorder to get
$$\frac{dp_i}{dt} = q\left(-\frac{\partial\varphi}{\partial x_i} - \frac{\partial A_i}{\partial t}\right) + q\left(\frac{\partial A_j}{\partial x_i}v_j - \frac{\partial A_i}{\partial x_j}v_j\right)$$
The first parenthetical expression is the electric field as claimed. The quantity
in the second parenthetical expression may be rewritten by expanding out the
sums over j to give
$$\frac{\partial A_i}{\partial x_i}v_i - \frac{\partial A_i}{\partial x_i}v_i + \frac{\partial A_{\bar\imath}}{\partial x_i}v_{\bar\imath} - \frac{\partial A_i}{\partial x_{\bar\imath}}v_{\bar\imath} + \frac{\partial A_{\bar{\bar\imath}}}{\partial x_i}v_{\bar{\bar\imath}} - \frac{\partial A_i}{\partial x_{\bar{\bar\imath}}}v_{\bar{\bar\imath}}$$
where $\bar\imath$ follows $i$ in the cyclic sequence $\ldots,1,2,3,1,2,3,\ldots$ and $\bar{\bar\imath}$ precedes it.
This can be recognized as component number $i$ of $\vec v\times(\nabla\times\vec A)$. Defining $\vec B$ as
$\nabla\times\vec A$, the Lorentz force law results.

A.5 Completeness of Fourier modes
The purpose of this note is to show completeness of the “Fourier modes”

$$\ldots,\ \frac{e^{-3ix}}{\sqrt{2\pi}},\ \frac{e^{-2ix}}{\sqrt{2\pi}},\ \frac{e^{-ix}}{\sqrt{2\pi}},\ \frac{1}{\sqrt{2\pi}},\ \frac{e^{ix}}{\sqrt{2\pi}},\ \frac{e^{2ix}}{\sqrt{2\pi}},\ \frac{e^{3ix}}{\sqrt{2\pi}},\ \ldots$$
for describing functions that are periodic of period $2\pi$. It is to be shown that
“all” these functions can be written as combinations of the Fourier modes above.
Assume that $f(x)$ is any reasonable smooth function that repeats itself after a
distance $2\pi$, so that $f(x+2\pi) = f(x)$. Then you can always write it in the form

$$f(x) = \ldots + c_{-2}\frac{e^{-2ix}}{\sqrt{2\pi}} + c_{-1}\frac{e^{-ix}}{\sqrt{2\pi}} + c_0\frac{1}{\sqrt{2\pi}} + c_1\frac{e^{ix}}{\sqrt{2\pi}} + c_2\frac{e^{2ix}}{\sqrt{2\pi}} + c_3\frac{e^{3ix}}{\sqrt{2\pi}} + \ldots$$
or
$$f(x) = \sum_{k=-\infty}^{\infty} c_k\frac{e^{kix}}{\sqrt{2\pi}}$$
for short. Such a representation of a periodic function is called a “Fourier series.”
The coefficients $c_k$ are called “Fourier coefficients.” The factors $1/\sqrt{2\pi}$ can be
absorbed in the definition of the Fourier coefficients, if you want.
Because of the Euler formula, the set of exponential Fourier modes above is
completely equivalent to the set of real Fourier modes

$$\frac{1}{\sqrt{2\pi}},\ \frac{\cos(x)}{\sqrt\pi},\ \frac{\sin(x)}{\sqrt\pi},\ \frac{\cos(2x)}{\sqrt\pi},\ \frac{\sin(2x)}{\sqrt\pi},\ \frac{\cos(3x)}{\sqrt\pi},\ \frac{\sin(3x)}{\sqrt\pi},\ \ldots$$

so that 2π-periodic functions may just as well be written as


$$f(x) = a_0\frac{1}{\sqrt{2\pi}} + \sum_{k=1}^{\infty} a_k\frac{\cos(kx)}{\sqrt\pi} + \sum_{k=1}^{\infty} b_k\frac{\sin(kx)}{\sqrt\pi}.$$

The extension to functions that are periodic of some other period than 2π
is a trivial matter of rescaling x. For a period 2ℓ, with ℓ any half period, the
exponential Fourier modes take the more general form

$$\ldots,\ \frac{e^{-k_2ix}}{\sqrt{2\ell}},\ \frac{e^{-k_1ix}}{\sqrt{2\ell}},\ \frac{1}{\sqrt{2\ell}},\ \frac{e^{k_1ix}}{\sqrt{2\ell}},\ \frac{e^{k_2ix}}{\sqrt{2\ell}},\ \ldots \qquad k_1 = \frac{\pi}{\ell},\ k_2 = \frac{2\pi}{\ell},\ k_3 = \frac{3\pi}{\ell},\ \ldots$$
and similarly the real version of them becomes

$$\frac{1}{\sqrt{2\ell}},\ \frac{\cos(k_1x)}{\sqrt\ell},\ \frac{\sin(k_1x)}{\sqrt\ell},\ \frac{\cos(k_2x)}{\sqrt\ell},\ \frac{\sin(k_2x)}{\sqrt\ell},\ \frac{\cos(k_3x)}{\sqrt\ell},\ \frac{\sin(k_3x)}{\sqrt\ell},\ \ldots$$
See [15, p. 141] for detailed formulae.

Often, the functions of interest are not periodic, but are required to be zero
at the ends of the interval on which they are defined. Those functions can be
handled too, by extending them to a periodic function. For example, if the
functions f (x) relevant to a problem are defined only for 0 ≤ x ≤ ℓ and must
satisfy f (0) = f (ℓ) = 0, then extend them to the range −ℓ ≤ x ≤ 0 by setting
f (x) = −f (−x) and take the range −ℓ ≤ x ≤ ℓ to be the period of a 2ℓ-periodic
function. It may be noted that for such a function, the cosines disappear in
the real Fourier series representation, leaving only the sines. Similar extensions
can be used for functions that satisfy symmetry or zero-derivative boundary
conditions at the ends of the interval on which they are defined. See again [15,
p. 141] for more detailed formulae.
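As a concrete illustration of such an antisymmetric extension, the little
computation below, a sketch not part of this note, expands the arbitrarily
chosen example function $f(x) = x(\ell-x)$ on $0 \le x \le \ell$, which is zero at both
ends, in sines alone:

    import numpy as np

    l = 1.0
    x = np.linspace(0, l, 1001)
    dx = x[1] - x[0]
    f = x * (l - x)                      # vanishes at x = 0 and x = l

    fK = np.zeros_like(x)
    for k in range(1, 40):
        # sine coefficients of the odd 2l-periodic extension
        bk = (2/l) * np.sum(f * np.sin(k*np.pi*x/l)) * dx
        fK += bk * np.sin(k*np.pi*x/l)

    print(np.max(np.abs(fK - f)))        # small: the sines alone suffice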
If the half period ℓ becomes infinite, the spacing between the discrete k values
becomes zero and the sum over discrete k-values turns into an integral over
continuous k values. This is exactly what happens in quantum mechanics for the
eigenfunctions of linear momentum. The representation is now no longer called
a Fourier series, but a “Fourier integral.” And the Fourier coefficients ck are now
called the “Fourier transform” F (k). The completeness of the eigenfunctions is
now called Fourier’s integral theorem or inversion theorem. See [15, pp. 190-191]
for more.
The basic completeness proof is a rather messy mathematical derivation, so
read the rest of this note at your own risk. The fact that the Fourier modes
are orthogonal and normalized was the subject of various exercises in section
1.6 and will be taken for granted here. See the solution manual for the details.
What this note wants to show is that any arbitrary periodic function f of period
2π that has continuous first and second order derivatives can be written as
$$f(x) = \sum_{k=-\infty}^{k=\infty} c_k\frac{e^{kix}}{\sqrt{2\pi}},$$

in other words, as a combination of the set of Fourier modes.


First an expression for the values of the Fourier coefficients $c_k$ is needed.
It can be obtained from taking the inner product $\langle e^{lix}/\sqrt{2\pi}|f(x)\rangle$ between a
generic eigenfunction $e^{lix}/\sqrt{2\pi}$ and the representation for function $f(x)$ above.
Noting that all the inner products with the exponentials representing $f(x)$ will
be zero except the one for which $k = l$, if the Fourier representation is indeed
correct, the coefficients need to have the values
$$c_l = \int_{x=0}^{2\pi} \frac{e^{-lix}}{\sqrt{2\pi}}\,f(x)\,dx,$$

a requirement that was already noted by Fourier. Note that l and x are just
names for the eigenfunction number and the integration variable that you can

change at will. Therefore, to avoid name conflicts later, the expression will be
renotated as
$$c_k = \int_{\bar x=0}^{2\pi} \frac{e^{-ki\bar x}}{\sqrt{2\pi}}\,f(\bar x)\,d\bar x,$$
Now the question is: suppose you compute the Fourier coefficients $c_k$ from
this expression, and use them to sum many terms of the infinite sum for $f(x)$,
say from some very large negative value $-K$ for $k$ to the corresponding large
positive value $K$; in that case, is the result you get, call it $f_K(x)$,
$$f_K(x) \equiv \sum_{k=-K}^{k=K} c_k\frac{e^{kix}}{\sqrt{2\pi}},$$

a valid approximation to the true function $f(x)$? More specifically, if you sum
more and more terms (make $K$ bigger and bigger), does $f_K(x)$ reproduce the
true value of $f(x)$ to any arbitrary accuracy that you may want? If it does, then
the eigenfunctions are capable of reproducing $f(x)$. If the eigenfunctions are not
complete, a definite difference between $f_K(x)$ and $f(x)$ will persist however large
you make $K$. In mathematical terms, the question is whether $\lim_{K\to\infty}f_K(x) = f(x)$.
To find out, the trick is to substitute the integral for the coefficients $c_k$ into
the sum and then reverse the order of integration and summation to get:
 
$$f_K(x) = \frac{1}{2\pi}\int_{\bar x=0}^{2\pi} f(\bar x)\left[\sum_{k=-K}^{k=K} e^{ki(x-\bar x)}\right]d\bar x.$$

The sum in the square brackets can be evaluated, because it is a geometric
series with starting value $e^{-Ki(x-\bar x)}$ and ratio of terms $e^{i(x-\bar x)}$. Using the formula
from [15, item 21.4], multiplying top and bottom with $e^{-i(x-\bar x)/2}$, and cleaning
up with, what else, the Euler formula, the sum is found to equal
$$\frac{\sin\left((K+\tfrac12)(x-\bar x)\right)}{\sin\left(\tfrac12(x-\bar x)\right)}.$$

This expression is called the “Dirichlet kernel”. You now have


$$f_K(x) = \int_{\bar x=0}^{2\pi} f(\bar x)\,\frac{\sin\left((K+\tfrac12)(x-\bar x)\right)}{2\pi\sin\left(\tfrac12(x-\bar x)\right)}\,d\bar x.$$

The second trick is to split the function $f(\bar x)$ being integrated into the two
parts $f(x)$ and $f(\bar x)-f(x)$. The sum of the parts is obviously still $f(\bar x)$, but
the first part has the advantage that it is constant during the integration over

$\bar x$ and can be taken out, and the second part has the advantage that it becomes
zero at $\bar x = x$. You get
$$f_K(x) = f(x)\int_{\bar x=0}^{2\pi} \frac{\sin\left((K+\tfrac12)(x-\bar x)\right)}{2\pi\sin\left(\tfrac12(x-\bar x)\right)}\,d\bar x + \int_{\bar x=0}^{2\pi} \left(f(\bar x)-f(x)\right)\frac{\sin\left((K+\tfrac12)(x-\bar x)\right)}{2\pi\sin\left(\tfrac12(x-\bar x)\right)}\,d\bar x.$$

Now if you backtrack what happens in the trivial case that $f(x)$ is just a
constant, you find that $f_K(x)$ is exactly equal to $f(x)$ in that case, while the
second integral above is zero. That makes the first integral above equal to one.
Returning to the case of general $f(x)$, since the first integral above is still one,
it makes the first term in the right hand side equal to the desired $f(x)$, and the
second integral is then the error in $f_K(x)$.
To manipulate this error and show that it is indeed small for large K, it is
convenient to rename the K-independent part of the integrand to

$$g(\bar x) = \frac{f(\bar x)-f(x)}{2\pi\sin\left(\tfrac12(x-\bar x)\right)}$$

Using l’Hôpital’s rule twice, it is seen that since by assumption f has a contin-
uous second derivative, g has a continuous first derivative. So you can use one
integration by parts to get
$$f_K(x) = f(x) + \frac{1}{K+\tfrac12}\int_{\bar x=0}^{2\pi} g'(\bar x)\cos\left((K+\tfrac12)(x-\bar x)\right)d\bar x.$$

And since the integrand of the final integral is continuous, it is bounded. That
makes the error inversely proportional to $K+\tfrac12$, implying that it does indeed
become arbitrarily small for large $K$. Completeness has been proved.
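The convergence can also be watched numerically. The sketch below, not part
of the proof, sums the Fourier series of the arbitrarily chosen smooth periodic
function $e^{\sin x}$ for increasing $K$:

    import numpy as np

    x = np.linspace(0, 2*np.pi, 2001, endpoint=False)
    dx = x[1] - x[0]
    f = np.exp(np.sin(x))                # smooth and 2*pi-periodic

    for K in (4, 8, 16, 32):
        fK = np.zeros_like(x, dtype=complex)
        for k in range(-K, K + 1):
            ck = np.sum(f * np.exp(-1j*k*x)) * dx / np.sqrt(2*np.pi)
            fK += ck * np.exp(1j*k*x) / np.sqrt(2*np.pi)
        print(K, np.max(np.abs(fK - f)))
    # Since this example function has infinitely many continuous derivatives,
    # the error drops much faster than the guaranteed 1/(K + 1/2).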
It may be noted that under the stated conditions, the convergence is uniform;
there is a guaranteed minimum rate of convergence regardless of the value of
x. This can be verified from Taylor series with remainder. Also, the more
continuous derivatives the 2π-periodic function f (x) has, the faster the rate
of convergence, and the smaller the number 2K + 1 of terms that you need
to sum to get good accuracy is likely to be. For example, if f (x) has three
continuous derivatives, you can do another integration by parts to show that
the convergence is proportional to $1/(K+\tfrac12)^2$ rather than just $1/(K+\tfrac12)$. But
watch the end points: if a derivative has different values at the start and end
of the period, then that derivative is not continuous, it has a jump at the ends.
(Such jumps can be incorporated in the analysis, however, and have less effect

than it may seem. You get a better practical estimate of the convergence rate
by directly looking at the integral for the Fourier coefficients.)
The condition for f (x) to have a continuous second derivative can be relaxed
with more work. If you are familiar with the Lebesgue form of integration,
it is fairly easy to extend the result above to show that it suffices that the
absolute integral of $f^2$ exists, something that will be true in quantum mechanics
applications.

A.6 Derivation of the Euler formula


To verify the Euler formula, write all three functions involved in terms of their
Taylor series, [15, p. 136].

A.7 Nature and real eigenvalues


The major difference between real and complex numbers is that real numbers
can be ordered from smaller to larger. So you might speculate that the fact
that the numbers of our world are real may favor a human tendency towards
simplistic rankings where one item is “worse” or “better” than the other. What
if your grade for a quantum mechanics test was 55 + 90i and someone else had a
70 + 65i? It would be logical in a world in which the important operators would
not be Hermitian.

A.8 Are Hermitian operators really like that?


A mathematician might choose to phrase the problem of Hermitian operators
having or not having eigenvalues and eigenfunctions in a suitable space of per-
missible functions and then find, with some justification, that some operators
in quantum mechanics, like the position or momentum operators do not have
any permissible eigenfunctions. Let alone a complete set. The approach of this
text is to simply follow the formalism anyway, and then fix the problems that
arise as they arise.
More generally, what this book tells you about operators is absolutely true
for systems with a finite number of variables, but gets mathematically suspect
for infinite systems. The functional analysis required to do better is well beyond
the scope of this book and the abstract mathematics a typical engineer would
ever want to have a look at.
In any case, when problems are discretized to finite ones for numerical
solution, the problem no longer exists. Or rather, it has been reduced to figuring

out how the numerical solution approaches the exact solution in the limit that
the problem size becomes infinite.

A.9 Are linear momentum operators Hermitian?
To check that the linear momentum operators are Hermitian, assume that Ψ1
and Ψ2 are any two proper, reasonably behaved, wave functions. By definition:
$$\langle\Psi_1|\hat p_x\Psi_2\rangle = \int_{x=-\infty}^{\infty}\int_{y=-\infty}^{\infty}\int_{z=-\infty}^{\infty} \Psi_1^*\,\frac{\hbar}{i}\frac{\partial\Psi_2}{\partial x}\,dxdydz$$
$$\langle\hat p_x\Psi_1|\Psi_2\rangle = \int_{x=-\infty}^{\infty}\int_{y=-\infty}^{\infty}\int_{z=-\infty}^{\infty} \left(\frac{\hbar}{i}\frac{\partial\Psi_1}{\partial x}\right)^{\!*}\Psi_2\,dxdydz$$
The two must be equal for $\hat p_x$ to be a Hermitian operator. That they are indeed
equal may be seen from integration by parts in the $x$-direction, noting that by
definition $i^* = -i$ and that $\Psi_1$ and $\Psi_2$ must be zero at infinite $x$: if they were
not, their integral would be infinite, so that they could not be normalized.
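On a computer you can check this discretely; the sketch below, not part of the
argument, uses two arbitrarily chosen wave functions that decay to zero and
compares the two inner products, with the derivative replaced by a finite-
difference approximation:

    import numpy as np

    hbar = 1.0
    x = np.linspace(-10, 10, 4001)
    dx = x[1] - x[0]
    psi1 = np.exp(-x**2) * (1 + 1j*x)    # decays to zero at the grid ends
    psi2 = x * np.exp(-x**2/2)

    p = lambda psi: (hbar/1j) * np.gradient(psi, dx)   # p = (hbar/i) d/dx

    lhs = np.sum(np.conj(psi1) * p(psi2)) * dx   # <psi1 | p psi2>
    rhs = np.sum(np.conj(p(psi1)) * psi2) * dx   # <p psi1 | psi2>
    print(lhs, rhs)   # equal up to discretization error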

A.10 Why boundary conditions are tricky


You might well ask why you cannot have a wave function that has a change in
wave function value at the ends of the pipe. In particular, you might ask what is
wrong with a wave function that is a nonzero constant inside the pipe and zero
outside it. Since the second derivative of a constant is zero, this (incorrectly)
appears to satisfy the Hamiltonian eigenvalue problem with an energy eigenvalue
equal to zero.
The problem is that this wave function has “jump discontinuities” at the
ends of the pipe where the wave function jumps from the constant value to
zero. (Graphically, the function is “broken” into separate pieces at the ends.)
Suppose you approximate such a wave function with a smooth one whose value
merely drops down steeply rather than jumps down to zero. The steep fall-off
produces a first order derivative that is very large in the fall-off regions, and
a second derivative that is much larger still. Therefor, including the fall-off
regions, the average kinetic energy is not close to zero, as the constant part
alone would suggest, but actually almost infinitely large. And in the limit of a
real jump, such eigenfunctions produce infinite energy, so they are not physically
acceptable.
The bottom line is that jump discontinuities in the wave function are not
acceptable. However, the correct solutions will have jump discontinuities in the

derivative of the wave function, where it jumps from a nonzero value to zero at
the pipe walls. Such discontinuities in the derivative correspond to “kinks” in
the wave function. These kinks are acceptable; they naturally form when the
walls are made more and more impenetrable. Jumps are wrong, but kinks are
fine. (Don’t break the wave function, but crease it all you like.)
For more complicated cases, it may be less trivial to figure out what singu-
larities are acceptable or not. In general, you want to check the “expectation
value,” as defined later, of the energy of the almost singular case, using integra-
tion by parts to remove difficult-to-estimate higher derivatives, and then check
that this energy remains bounded in the limit to the fully singular case. That
is mathematics far beyond what this book wants to cover, but in general you
want to make singularities as minor as possible.
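The blow up near a jump is easy to see numerically. The sketch below, an
illustration rather than anything rigorous, smooths the jump over an ever
smaller width $w$ and evaluates the kinetic energy expectation value
$\int(\hbar^2/2m)|\psi'|^2\,dx$:

    import numpy as np

    hbar = m = 1.0
    x = np.linspace(-2, 2, 200001)
    dx = x[1] - x[0]

    for w in (0.1, 0.01, 0.001):         # width of the smoothed fall-off
        psi = 0.5*(np.tanh((x+1)/w) - np.tanh((x-1)/w))   # ~1 inside pipe
        psi /= np.sqrt(np.sum(psi**2) * dx)               # normalize
        dpsi = np.gradient(psi, dx)
        T = (hbar**2/(2*m)) * np.sum(dpsi**2) * dx
        print(w, T)   # grows roughly like 1/w: infinite in the jump limit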

A.11 Extension to three-dimensional solutions


Maybe you have some doubt whether you really can just multiply one-dimen-
sional eigenfunctions together, and add one-dimensional energy values to get
the three-dimensional ones. Would a book that you find for free on the Internet
lie? OK, let’s look at the details then. First, the three-dimensional Hamiltonian,
(really just the kinetic energy operator), is the sum of the one-dimensional ones:

H = Hx + Hy + Hz

where the one-dimensional Hamiltonians are:


$$H_x = -\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} \qquad H_y = -\frac{\hbar^2}{2m}\frac{\partial^2}{\partial y^2} \qquad H_z = -\frac{\hbar^2}{2m}\frac{\partial^2}{\partial z^2}$$
To check that any product $\psi_{n_x}(x)\psi_{n_y}(y)\psi_{n_z}(z)$ of one-dimensional eigen-
functions is an eigenfunction of the combined Hamiltonian $H$, note that the
partial Hamiltonians only act on their own eigenfunction, multiplying it by the
corresponding eigenvalue:
$$(H_x+H_y+H_z)\psi_{n_x}(x)\psi_{n_y}(y)\psi_{n_z}(z)$$
$$= E_x\psi_{n_x}(x)\psi_{n_y}(y)\psi_{n_z}(z) + E_y\psi_{n_x}(x)\psi_{n_y}(y)\psi_{n_z}(z) + E_z\psi_{n_x}(x)\psi_{n_y}(y)\psi_{n_z}(z)$$
or
$$H\psi_{n_x}(x)\psi_{n_y}(y)\psi_{n_z}(z) = (E_x+E_y+E_z)\psi_{n_x}(x)\psi_{n_y}(y)\psi_{n_z}(z).$$
Therefore, by definition $\psi_{n_x}(x)\psi_{n_y}(y)\psi_{n_z}(z)$ is an eigenfunction of the three-
dimensional Hamiltonian, with an eigenvalue that is the sum of the three one-
dimensional ones. But there is still the question of completeness. Maybe the
above eigenfunctions are not complete, which would mean a need for additional
eigenfunctions that are not products of one-dimensional ones.

Well, the one-dimensional eigenfunctions $\psi_{n_x}(x)$ are complete, see [15, p. 141]
and earlier exercises in this book. So, you can write any wave function $\Psi(x,y,z)$
at given values of $y$ and $z$ as a combination of $x$-eigenfunctions:
$$\Psi(x,y,z) = \sum_{n_x} c_{n_x}\psi_{n_x}(x),$$

but the coefficients $c_{n_x}$ will be different for different values of $y$ and $z$; in other
words they will be functions of $y$ and $z$: $c_{n_x} = c_{n_x}(y,z)$. So, more precisely, you
have
$$\Psi(x,y,z) = \sum_{n_x} c_{n_x}(y,z)\psi_{n_x}(x),$$

But since the $y$-eigenfunctions are also complete, at any given value of $z$,
you can write each $c_{n_x}(y,z)$ as a sum of $y$-eigenfunctions:
$$\Psi(x,y,z) = \sum_{n_x}\left[\sum_{n_y} c_{n_xn_y}\psi_{n_y}(y)\right]\psi_{n_x}(x),$$
where the coefficients $c_{n_xn_y}$ will be different for different values of $z$,
$c_{n_xn_y} = c_{n_xn_y}(z)$. So, more precisely,
$$\Psi(x,y,z) = \sum_{n_x}\left[\sum_{n_y} c_{n_xn_y}(z)\psi_{n_y}(y)\right]\psi_{n_x}(x),$$

But since the $z$-eigenfunctions are also complete, you can write $c_{n_xn_y}(z)$ as
a sum of $z$-eigenfunctions:
$$\Psi(x,y,z) = \sum_{n_x}\left[\sum_{n_y}\left(\sum_{n_z} c_{n_xn_yn_z}\psi_{n_z}(z)\right)\psi_{n_y}(y)\right]\psi_{n_x}(x).$$
Since the order of doing the summation does not make a difference,
$$\Psi(x,y,z) = \sum_{n_x}\sum_{n_y}\sum_{n_z} c_{n_xn_yn_z}\psi_{n_x}(x)\psi_{n_y}(y)\psi_{n_z}(z).$$

So, any wave function $\Psi(x,y,z)$ can be written as a sum of products of
one-dimensional eigenfunctions; these products are complete.
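A two-dimensional version of this completeness is cheap to try out numerically.
The sketch below, an illustration with an arbitrarily chosen function, rebuilds
a function that vanishes on the walls of a square pipe from products of one-
dimensional sine eigenfunctions:

    import numpy as np

    l = np.pi
    x = np.linspace(0, l, 201)
    y = np.linspace(0, l, 201)
    X, Y = np.meshgrid(x, y, indexing='ij')
    dx = x[1] - x[0]; dy = y[1] - y[0]

    Psi = X*(l - X) * Y*(l - Y)**2       # zero on all four walls

    approx = np.zeros_like(Psi)
    for nx in range(1, 15):
        for ny in range(1, 15):
            mode = (2/l) * np.sin(nx*X) * np.sin(ny*Y)   # normalized product
            c = np.sum(mode * Psi) * dx * dy             # coefficient
            approx += c * mode

    print(np.max(np.abs(approx - Psi)))  # small, and shrinks with more modes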

A.12 Derivation of the harmonic oscillator solution
If you really want to know how the harmonic oscillator wave function can be
found, here it is. Read at your own risk.

The ODE (ordinary differential equation) to solve is

$$-\frac{\hbar^2}{2m}\frac{\partial^2\psi_x}{\partial x^2} + \tfrac12 m\omega^2x^2\psi_x = E_x\psi_x$$
where the spring constant $c$ was rewritten as the equivalent expression $m\omega^2$.
Now the first thing you always want to do with this sort of problems is to
simplify it as much as possible. In particular, get rid of as much dimensional
constants as you can by rescaling the variables: define a new scaled x-coordinate
$\xi$ and a scaled energy $\epsilon$ by
$$x \equiv \ell\xi \qquad E_x \equiv E_0\epsilon.$$

If you make these replacements into the ODE above, you can make the coef-
ficients of the two terms in the left hand side equal by choosing $\ell = \sqrt{\hbar/m\omega}$.
In that case both terms will have the same net coefficient $\tfrac12\hbar\omega$. Then if you
cleverly choose $E_0 = \tfrac12\hbar\omega$, the right hand side will have that coefficient too, and
you can divide it away and end up with no coefficients at all:
$$-\frac{\partial^2\psi_x}{\partial\xi^2} + \xi^2\psi_x = \epsilon\psi_x$$
Looks a lot cleaner, doesn't it?
Now examine this equation for large values of ξ (i.e. large x). You get
approximately
$$\frac{\partial^2\psi_x}{\partial\xi^2} \approx \xi^2\psi_x + \ldots$$
If you write the solution as an exponential, you can ballpark that it must take
the form
$$\psi_x = e^{\pm\frac12\xi^2+\ldots}$$
where the dots indicate terms that are small compared to $\tfrac12\xi^2$ for large $\xi$. The
form of the solution is important, since $e^{+\frac12\xi^2}$ becomes infinitely large at large
$\xi$. That is unacceptable: the probability of finding the particle cannot become
infinitely large at large $x$: the total probability of finding the particle must be
one, not infinite. The only solutions that are acceptable are those that behave
as $e^{-\frac12\xi^2+\ldots}$ for large $\xi$.
Now split off the leading exponential part by defining a new unknown $h(\xi)$
by
$$\psi_x \equiv e^{-\frac12\xi^2}h(\xi)$$
Substituting this in the ODE and dividing out the exponential, you get:
$$-\frac{\partial^2h}{\partial\xi^2} + 2\xi\frac{\partial h}{\partial\xi} + h = \epsilon h$$

Now try to solve this by writing $h$ as a power series, (say, a Taylor series):
$$h = \sum_p c_p\xi^p$$
where the values of $p$ run over whatever the appropriate powers are and the $c_p$
are constants. If you plug this into the ODE, you get
$$\sum_p p(p-1)c_p\xi^{p-2} = \sum_p (2p+1-\epsilon)c_p\xi^p$$

For the two sides to be equal, they must have the same coefficient for every
power of ξ.
There must be a lowest value of $p$ for which there is a nonzero coefficient $c_p$,
for if $p$ took on arbitrarily large negative values, $h$ would blow up strongly at
the origin, and the probability to find the particle near the origin would then be
infinite. Denote the lowest value of $p$ by $q$. This lowest power produces a power
of $\xi^{q-2}$ in the left hand side of the equation above, but there is no corresponding
power in the right hand side. So, the coefficient $q(q-1)c_q$ of $\xi^{q-2}$ will need to
be zero, and that means either $q = 0$ or $q = 1$. So the power series for $h$ will
need to start as either $c_0+\ldots$ or $c_1\xi+\ldots$. The constant $c_0$ or $c_1$ is allowed to
have any nonzero value.
But note that the $c_q\xi^q$ term normally produces a term $(2q+1-\epsilon)c_q\xi^q$ in the
right hand side of the equation above. For the left hand side to have a matching
$\xi^q$ term, there will need to be a further $c_{q+2}\xi^{q+2}$ term in the power series for $h$,
$$h = c_q\xi^q + c_{q+2}\xi^{q+2} + \ldots$$
where $(q+2)(q+1)c_{q+2}$ will need to equal $(2q+1-\epsilon)c_q$, so $c_{q+2} = (2q+1-\epsilon)c_q/(q+2)(q+1)$.
This term in turn will normally produce a term $\left(2(q+2)+1-\epsilon\right)c_{q+2}\xi^{q+2}$
in the right hand side which will have to be cancelled in the left hand side by a
$c_{q+4}\xi^{q+4}$ term in the power series for $h$. And so on.
So, if the power series starts with $q = 0$, the solution will take the general
form
$$h = c_0 + c_2\xi^2 + c_4\xi^4 + c_6\xi^6 + \ldots$$
while if it starts with $q = 1$ you will get
$$h = c_1\xi + c_3\xi^3 + c_5\xi^5 + c_7\xi^7 + \ldots$$
In the first case, you have a symmetric solution, one which remains the same
when you flip over the sign of $\xi$, and in the second case you have an antisym-
metric solution, one which changes sign when you flip over the sign of $\xi$.
You can find a general formula for the coefficients of the series by making
the change in notations $p = 2+\bar p$ in the left-hand-side sum:
$$\sum_{\bar p=q}(\bar p+2)(\bar p+1)c_{\bar p+2}\xi^{\bar p} = \sum_{p=q}(2p+1-\epsilon)c_p\xi^p$$

Note that you can start summing at $\bar p = q$ rather than $q-2$, since the first
term in the sum is zero anyway. Next note that you can again forget about the
difference between $\bar p$ and $p$, because it is just a symbolic summation variable.
The symbolic sum writes out to the exact same actual sum whether you call the
symbolic summation variable $p$ or $\bar p$.
So for the powers in the two sides to be equal, you must have
$$c_{p+2} = \frac{2p+1-\epsilon}{(p+2)(p+1)}c_p$$
In particular, for large $p$, by approximation
$$c_{p+2} \approx \frac{2}{p}c_p$$
Now if you check out the Taylor series of $e^{\xi^2}$, (i.e. the Taylor series of $e^x$ with
$x$ replaced by $\xi^2$,) you find it satisfies the exact same equation. So, normally
the solution $h$ blows up something like $e^{\xi^2}$ at large $\xi$. And since $\psi_x$ was $e^{-\frac12\xi^2}h$,
normally $\psi_x$ takes on the unacceptable form $e^{+\frac12\xi^2+\ldots}$. (If you must have rigor
here, estimate $h$ in terms of $Ce^{\alpha\xi^2}$ where $\alpha$ is a number slightly less than one,
plus a polynomial. That is enough to show unacceptability of such solutions.)
What are the options for acceptable solutions? The only possibility is that
the power series terminates. There must be a highest power $p$, call it $p = n$,
whose term in the right hand side is zero
$$0 = (2n+1-\epsilon)c_n\xi^n$$
In that case, there is no need for a further $c_{n+2}\xi^{n+2}$ term, the power series will
remain a polynomial of degree $n$. But note that all this requires the scaled
energy $\epsilon$ to equal $2n+1$, and the actual energy $E_x$ is therefore $(2n+1)\hbar\omega/2$.
Different choices for the power at which the series terminates produce different
energies and corresponding eigenfunctions. But they are discrete, since n, as
any power p, must be a nonnegative integer.
With $\epsilon$ identified as $2n+1$, you can find the ODE for $h$ listed in table
books, like [15, 29.1], under the name “Hermite's differential equation.” They
then identify the polynomial solutions as the so-called “Hermite polynomials,”
except for a normalization factor. To find the normalization factor, i.e. $c_0$ or
$c_1$, demand that the total probability of finding the particle anywhere is one,
$\int_{-\infty}^{\infty}|\psi_x|^2\,dx = 1$. You should be able to find the value for the appropriate
integral in your table book, like [15, 29.15].
Putting it all together, the generic expression for the eigenfunctions can be
found to be:
$$h_n = \frac{1}{(\pi\ell^2)^{1/4}}\frac{H_n(\xi)}{\sqrt{2^nn!}}\,e^{-\xi^2/2} \qquad n = 0,1,2,3,4,5,\ldots \tag{A.27}$$

where the details of the “Hermite polynomials” $H_n$ can be found in table books
like [15, pp. 167-168]. They are readily evaluated on a computer using the
“recurrence relation” you can find there, for as far as computer round-off error
allows (up to n about 70.)
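In case you want to try that, the sketch below, not from the book, evaluates
(A.27) with the standard Hermite recurrence $H_{n+1} = 2\xi H_n - 2nH_{n-1}$ and
verifies the normalization; it takes $\ell = 1$:

    import numpy as np
    from math import factorial

    xi = np.linspace(-10, 10, 20001)
    dxi = xi[1] - xi[0]

    def h(n, xi):
        Hprev, H = np.zeros_like(xi), np.ones_like(xi)   # H_{-1}=0, H_0=1
        for k in range(n):
            Hprev, H = H, 2*xi*H - 2*k*Hprev             # recurrence
        norm = np.pi**(-0.25) / np.sqrt(2.0**n * factorial(n))
        return norm * H * np.exp(-xi**2/2)

    for n in (0, 1, 5, 20):
        psi = h(n, xi)
        print(n, np.sum(psi**2) * dxi)   # 1 for each n, to rounding error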
Quantum field theory allows a much neater way to find the eigenfunctions.
It is explained in chapter 10.2.2 or equivalently in {A.78}.

A.13 More on the harmonic oscillator and uncertainty

The given qualitative explanation of the ground state of the harmonic oscillator
in terms of the uncertainty principle is questionable.
In particular, position, linear momentum, potential energy, and kinetic en-
ergy are not defined for the ground state. However, as explained more fully
in chapter 3.3, you can define the “expectation value” of kinetic energy to be
the average predicted result for kinetic energy measurements. Similarly, you
can define the expectation value of potential energy to be the average predicted
result for potential energy measurements. Quantum mechanics does require the
total energy of the ground state to be the sum of the kinetic and potential en-
ergy expectation values. Now if there would be an almost infinite uncertainty
in linear momentum, then typical measurements would find a large momentum,
hence a large kinetic energy. So the kinetic energy expectation value would then
be large; that would be nowhere close to any ground state. Similarly, if there
would be a large uncertainty in position, then typical measurements will find
the particle at large distance from the nominal position, hence at large potential
energy. Not good either.
It so happens that the ground state of the harmonic oscillator manages to
obtain the absolute minimum in combined position and momentum uncertainty
that the uncertainty relationship, given in chapter 3.4.3, allows. (This can be
verified using the fact that the two uncertainties, σx and σpx , as defined in
chapter 3.3, are directly related to the expectation values for potential energy,
respectively kinetic energy in the x-direction. It follows from the virial theorem
of chapter 5.1.4 that the expectation values of kinetic and potential energy for
the harmonic oscillator eigenstates are equal. So each must be 14 h̄ω since the
total energy is 32 h̄ω and each coordinate direction contributes an equal share to
the potential and kinetic energies.)
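For what it is worth, the minimum uncertainty product is easily confirmed
numerically; the sketch below, not part of the argument, uses the ground state
$\psi_0 = (\pi\ell^2)^{-1/4}e^{-x^2/2\ell^2}$ in units where $\hbar = m = \omega = 1$:

    import numpy as np

    hbar = m = omega = 1.0
    ell = np.sqrt(hbar/(m*omega))
    x = np.linspace(-10, 10, 40001)
    dx = x[1] - x[0]

    psi = (np.pi*ell**2)**(-0.25) * np.exp(-x**2/(2*ell**2))

    sigma_x = np.sqrt(np.sum(x**2 * psi**2) * dx)      # <x> is zero here
    dpsi = np.gradient(psi, dx)
    sigma_p = np.sqrt(hbar**2 * np.sum(dpsi**2) * dx)  # <p> is zero here
    print(sigma_x * sigma_p, hbar/2)   # equal, to discretization error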

A.14 Derivation of a vector identity
The elementary equality required is not in [15] in any form. In the absence
of tensor algebra, it is best to just grind it out. Define $\vec f \equiv (\vec r\times\nabla)\Psi$. Then
$(\vec r\times\nabla)\cdot\vec f$ equals
$$y\frac{\partial f_x}{\partial z} - z\frac{\partial f_x}{\partial y} + z\frac{\partial f_y}{\partial x} - x\frac{\partial f_y}{\partial z} + x\frac{\partial f_z}{\partial y} - y\frac{\partial f_z}{\partial x}$$
On the other hand, $\vec r\cdot(\nabla\times\vec f)$ is
$$x\frac{\partial f_z}{\partial y} - x\frac{\partial f_y}{\partial z} + y\frac{\partial f_x}{\partial z} - y\frac{\partial f_z}{\partial x} + z\frac{\partial f_y}{\partial x} - z\frac{\partial f_x}{\partial y}$$
which is the same.
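If grinding it out by hand is not your thing, the sketch below, using the sympy
package and not part of the original derivation, performs the same verification
symbolically:

    import sympy as sp

    x, y, z = sp.symbols('x y z')
    Psi = sp.Function('Psi')(x, y, z)

    def rxnabla(s):
        # the three components of (r x nabla) applied to a scalar s
        return [y*sp.diff(s, z) - z*sp.diff(s, y),
                z*sp.diff(s, x) - x*sp.diff(s, z),
                x*sp.diff(s, y) - y*sp.diff(s, x)]

    def curl(F):
        return [sp.diff(F[2], y) - sp.diff(F[1], z),
                sp.diff(F[0], z) - sp.diff(F[2], x),
                sp.diff(F[1], x) - sp.diff(F[0], y)]

    f = rxnabla(Psi)
    lhs = (y*sp.diff(f[0], z) - z*sp.diff(f[0], y)     # (r x nabla) . f
         + z*sp.diff(f[1], x) - x*sp.diff(f[1], z)
         + x*sp.diff(f[2], y) - y*sp.diff(f[2], x))
    rhs = x*curl(f)[0] + y*curl(f)[1] + z*curl(f)[2]   # r . (curl f)
    print(sp.simplify(lhs - rhs))                      # 0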

A.15 Derivation of the spherical harmonics


This analysis will use similar techniques as for the harmonic oscillator solution,
{A.12}. The requirement that the spherical harmonics $Y_l^m$ are eigenfunctions
of $L_z$ means that they are of the form $\Theta_l^m(\theta)e^{im\phi}$ where function $\Theta_l^m(\theta)$ is still
to be determined. (There is also an arbitrary dependence on the radius $r$, but
it does not have anything to do with angular momentum, hence is ignored when
people define the spherical harmonics.) Substitution into $\hat L^2\psi = L^2\psi$ with $\hat L^2$
as in (3.5) yields an ODE (ordinary differential equation) for $\Theta_l^m(\theta)$:
$$-\frac{\hbar^2}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\Theta_l^m}{\partial\theta}\right) + \frac{\hbar^2m^2}{\sin^2\theta}\Theta_l^m = L^2\Theta_l^m$$

It is convenient to define a scaled square angular momentum by $L^2 = \hbar^2\lambda^2$ so
that you can divide away the $\hbar^2$ from the ODE.
More importantly, recognize that the solutions will likely be in terms of
cosines and sines of $\theta$, because they should be periodic if $\theta$ changes by $2\pi$. If
you want to use power-series solution procedures again, these transcendental
functions are bad news, so switch to a new variable $x = \cos\theta$. At the very least,
that will reduce things to algebraic functions, since $\sin\theta$ is in terms of $x = \cos\theta$
equal to $\sqrt{1-x^2}$. Converting the ODE to the new variable $x$, you get
$$-(1-x^2)\frac{d^2\Theta_l^m}{dx^2} + 2x\frac{d\Theta_l^m}{dx} + \frac{m^2}{1-x^2}\Theta_l^m = \lambda^2\Theta_l^m$$

As you may guess from looking at this ODE, the solutions $\Theta_l^m$ are likely to
be problematic near $x = \pm1$, (physically, near the $z$-axis where $\sin\theta$ is zero.) If

you examine the solution near those points by defining a local coordinate $\xi$ as
in $x = \pm(1-\xi)$, and then deduce the leading term in the power series solutions
with respect to $\xi$, you find that it is either $\xi^{m/2}$ or $\xi^{-m/2}$, (in the special case
that $m = 0$, that second solution turns out to be $\ln\xi$.) Either way, the second
possibility is not acceptable, since it physically would have infinite derivatives
at the $z$-axis and a resulting expectation value of square momentum, as defined
in section 3.3.3, that is infinite. You need to have that $\Theta_l^m$ behaves as $\xi^{m/2}$ at
each end, so in terms of $x$ it must have a factor $(1-x)^{m/2}$ near $x = 1$ and
$(1+x)^{m/2}$ near $x = -1$. The two factors multiply to $(1-x^2)^{m/2}$ and so $\Theta_l^m$
can be written as $(1-x^2)^{m/2}f_l^m$ where $f_l^m$ must have finite values at $x = 1$ and
$x = -1$.
If you substitute $\Theta_l^m = (1-x^2)^{m/2}f_l^m$ into the ODE for $\Theta_l^m$, you get an ODE
for $f_l^m$:
$$-(1-x^2)\frac{d^2f_l^m}{dx^2} + 2(1+m)x\frac{df_l^m}{dx} + (m^2+m)f_l^m = \lambda^2f_l^m$$
Plug in a power series, $f_l^m = \sum c_px^p$, to get, after clean up,
$$\sum p(p-1)c_px^{p-2} = \sum\left[(p+m)(p+m+1)-\lambda^2\right]c_px^p$$

Using similar arguments as for the harmonic oscillator, you see that the starting
power will be zero or one, leading to basic solutions that are again odd or even.
And just like for the harmonic oscillator, you must again have that the power
series terminates; even in the least case that $m = 0$, the series for $f_l^m$ at $|x| = 1$
is like that of $\ln(1-x^2)$ and will not converge to the finite value stipulated.
(For rigor, use Gauss’s test.)
To get the series to terminate at some final power $p = n$, you must have
according to the above equation that $\lambda^2 = (n+m)(n+m+1)$, and if you decide
to call $n+m$ the azimuthal quantum number $l$, you have $\lambda^2 = l(l+1)$ where
$l \ge m$ since $l = n+m$ and $n$, like any power $p$, is greater or equal to zero.
The rest is just a matter of table books, because with $\lambda^2 = l(l+1)$, the
ODE for $f_l^m$ is just the $m$-th derivative of the differential equation for the $L_l$
Legendre polynomial, [15, 28.1], so the $f_l^m$ must be just the $m$-th derivative
of those polynomials. In fact, you can now recognize that the ODE for the
$\Theta_l^m$ is just Legendre's associated differential equation [15, 28.49], and that the
solutions that you need are the associated Legendre functions of the first kind
[15, 28.50].
To normalize the eigenfunctions on the surface area of the unit sphere, find
the corresponding integral in a table book, like [15, 28.63]. As mentioned at
the start of this long and still very condensed story, to include negative values
of m, just replace m by |m|. There is one additional issue, though, the sign
pattern. In order to simplify some more advanced analysis, physicists like the

sign pattern to vary with m according to the so-called “ladder operators.” That
requires, {A.78}, that starting from m = 0, the spherical harmonics for m > 0
have the alternating sign pattern of the “ladder-up operator,” and those for
m < 0 the unvarying sign of the “ladder-down operator.” Physicists will still
allow you to select your own sign for the m = 0 state, bless them.
The final solution is
$$Y_l^m(\theta,\phi) = (-1)^{\max(m,0)}\sqrt{\frac{2l+1}{4\pi}\frac{(l-|m|)!}{(l+|m|)!}}\;P_l^{|m|}(\cos\theta)\,e^{im\phi} \tag{A.28}$$
where the properties of the associated Legendre functions of the first kind $P_l^{|m|}$
can be found in table books like [15, pp. 162-166].
One special property of the spherical harmonics is often of interest: their
“parity.” The parity of a wave function is 1, or even, if the wave function stays
the same if you replace $\vec r$ by $-\vec r$. The parity is $-1$, or odd, if the wave function
stays the same save for a sign change when you replace $\vec r$ by $-\vec r$. It turns out
that the parity of the spherical harmonics is $(-1)^l$; so it is $-1$, odd, if the
azimuthal quantum number $l$ is odd, and 1, even, if $l$ is even.
To see why, note that replacing $\vec r$ by $-\vec r$ means in spherical coordinates that
$\theta$ changes into $\pi-\theta$ and $\phi$ into $\phi+\pi$. According to trig, the first changes
$\cos\theta$ into $-\cos\theta$. That leaves $P_l(\cos\theta)$ unchanged for even $l$, since $P_l$ is then a
symmetric function, but it changes the sign of $P_l$ for odd $l$. So the sign change is
$(-1)^l$. The value of $m$ has no effect, since while the factor $e^{im\phi}$ in the spherical
harmonics produces a factor $(-1)^{|m|}$ under the change in $\phi$, $m$ also puts $|m|$
derivatives on $P_l$, and each derivative produces a compensating change of sign
in $P_l^{|m|}(\cos\theta)$.
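The parity claim is easily spot-checked numerically; the sketch below, not part
of the derivation, uses scipy's sph_harm, whose argument order puts the
azimuthal angle before the polar one:

    import numpy as np
    from scipy.special import sph_harm

    az, pol = 0.7, 1.1     # an arbitrary test direction
    for l in range(4):
        for m in range(-l, l + 1):
            flipped = sph_harm(m, l, az + np.pi, np.pi - pol)  # at -r
            direct = sph_harm(m, l, az, pol)                   # at  r
            assert np.allclose(flipped, (-1)**l * direct)
    print("parity is (-1)**l, as claimed")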
There is a more intuitive way to derive the spherical harmonics: they define
the power series solutions to the Laplace equation. In particular, each $r^lY_l^m$ is a
different power series solution $P$ of the Laplace equation $\nabla^2P = 0$ in Cartesian
coordinates. Each takes the form
$$\sum_{\alpha+\beta+\gamma=l} c_{\alpha\beta\gamma}\,x^\alpha y^\beta z^\gamma$$

where the coefficients $c_{\alpha\beta\gamma}$ are such as to make the Laplacian zero.
Even more specifically, the spherical harmonics are of the form
$$\sum_{2a+b=l-m} c_{ab}\,u^{a+m}v^az^b \qquad a,b,m\ge0$$
$$\sum_{2a+b=l-|m|} c_{ab}\,u^av^{a+|m|}z^b \qquad a,b,-m\ge0$$
where the coordinates $u = x+iy$ and $v = x-iy$ serve to simplify the Laplacian.
That these are the basic power series solutions of the Laplace equation is readily
checked.

To get from those power series solutions back to the equation for the spherical
harmonics, one has to do an inverse separation of variables argument for the
solution of the Laplace equation in a sphere in spherical coordinates (compare
also the derivation of the hydrogen atom.) Also, one would have to accept on
faith that the solution of the Laplace equation is just a power series, as it is in
2D, with no additional non-power terms, to settle completeness. In other words,
you must assume that the solution is analytic.
The simplest way of getting the spherical harmonics is probably the one
given in note {A.78}.

A.16 The effective mass


Two-body systems, like the earth-moon system of celestial mechanics or the
proton-electron hydrogen atom of quantum mechanics, can be analyzed more
simply using effective mass. In this note both a classical and a quantum deriva-
tion will be given. The quantum derivation will need to anticipate some results
on multi-particle systems from chapter 4.1.
In two-body systems the two bodies move around their combined center of
gravity. However, in examples such as the ones mentioned, one body is much
more massive than the other. In that case the center of gravity almost coincides
with the heavy body, (earth or proton). Therefore, in a naive first approximation
it may be assumed that the heavy body is at rest and that the lighter one moves
around it. It turns out that this naive approximation can be made exact by
replacing the mass of the lighter body by an effective mass. That simplifies the
mathematics greatly by reducing the two-body problem to that of a single one,
and it now produces the exact answer regardless of the ratio of masses involved.
The classical derivation is first. Let $m_1$ and $\vec r_1$ be the mass and position
of the massive body (earth or proton), and $m_2$ and $\vec r_2$ those of the lighter one
(moon or electron). Classically the force $\vec F$ between the masses will be a function
of the difference $\vec r_{21} = \vec r_2-\vec r_1$ in their positions. In the naive approach the heavy
mass is assumed to be at rest at the origin. Then $\vec r_{21} = \vec r_2$, and so the naive
equation of motion for the lighter mass is, according to Newton's second law,
$$m_2\ddot{\vec r}_{21} = \vec F(\vec r_{21})$$

Now consider the true motion. The center of gravity is defined as a mass-
weighted average of the positions of the two masses:
$$\vec r_{cg} = w_1\vec r_1 + w_2\vec r_2 \qquad w_1 = \frac{m_1}{m_1+m_2} \quad w_2 = \frac{m_2}{m_1+m_2}$$
It is shown in basic physics that the net external force on the system equals the
total mass times the acceleration of the center of gravity. Since in this case it

will be assumed that there are no external forces, the center of gravity moves at
a constant velocity. Therefore, the center of gravity can be taken as the origin of
an inertial coordinate system. In that coordinate system, the positions of the
two masses are given by
$$\vec r_1 = -w_2\vec r_{21} \qquad \vec r_2 = w_1\vec r_{21}$$
because the position $w_1\vec r_1+w_2\vec r_2$ of the center of gravity must be zero in this
system, and the difference $\vec r_2-\vec r_1$ must be $\vec r_{21}$. (Note that the sum of the two
weight factors is one.) Solve these two equations for $\vec r_1$ and $\vec r_2$ and you get the
result above.
The true equation of motion for the lighter body is $m_2\ddot{\vec r}_2 = \vec F(\vec r_{21})$, or plug-
ging in the above expression for $\vec r_2$ in the center of gravity system,
$$m_2w_1\ddot{\vec r}_{21} = \vec F(\vec r_{21})$$

That is exactly the naive equation of motion if you replace $m_2$ in it by the
effective mass $m_2w_1$, i.e. by
$$m_{\rm eff} = \frac{m_1m_2}{m_1+m_2} \tag{A.29}$$
The effective mass is almost the same as the lighter mass if the difference between
the masses is large, like it is in the cited examples, because then $m_2$ can be
ignored compared to $m_1$ in the denominator.
The bottom line is that the motion of the two-body system consists of the
motion of its center of gravity plus motion around its center of gravity. The
motion around the center of gravity can be described in terms of a single effective
mass moving around a fixed center.
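Some illustrative numbers, a sketch not in the original text, show how close the
effective mass stays to the lighter mass in the cited examples:

    me = 9.109e-31     # kg, electron
    mp = 1.673e-27     # kg, proton
    mmoon = 7.35e22    # kg, moon
    mearth = 5.97e24   # kg, earth

    for m1, m2 in ((mp, me), (mearth, mmoon)):
        meff = m1*m2 / (m1 + m2)
        print(meff / m2)   # about 0.99946 for hydrogen, 0.988 for the moon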
The next question is if this effective mass idea is still valid in quantum
mechanics. Quantum mechanics is in terms of a wave function ψ that for a two-
particle system is a function of both ~r1 and ~r2 . Also, quantum mechanics uses
the potential V (~r21 ) instead of the force. The Hamiltonian eigenvalue problem
for the two particles is:

h̄2 2 h̄2 2
Hψ = Eψ H=− ∇1 − ∇ + V (~r21 )
2m1 2m2 2

where the two kinetic energy Laplacians in the Hamiltonian $H$ are with respect
to the position coordinates of the two particles:
$$\nabla_1^2\psi \equiv \sum_{j=1}^3\frac{\partial^2\psi}{\partial r_{1,j}^2} \qquad \nabla_2^2\psi \equiv \sum_{j=1}^3\frac{\partial^2\psi}{\partial r_{2,j}^2}$$

Now make a change of variables from $\vec r_1$ and $\vec r_2$ to $\vec r_{cg}$ and $\vec r_{21}$ where
$$\vec r_{cg} = w_1\vec r_1+w_2\vec r_2 \qquad \vec r_{21} = \vec r_2-\vec r_1$$
The derivatives of $\psi$ can be converted using the chain rule of differentiation:
$$\frac{\partial\psi}{\partial r_{1,j}} = \frac{\partial\psi}{\partial r_{cg,j}}\frac{\partial r_{cg,j}}{\partial r_{1,j}} + \frac{\partial\psi}{\partial r_{21,j}}\frac{\partial r_{21,j}}{\partial r_{1,j}} = w_1\frac{\partial\psi}{\partial r_{cg,j}} - \frac{\partial\psi}{\partial r_{21,j}}$$
or differentiating once more and summing
$$\nabla_1^2\psi = \sum_{j=1}^3\frac{\partial^2\psi}{\partial r_{1,j}^2} = w_1^2\sum_{j=1}^3\frac{\partial^2\psi}{\partial r_{cg,j}^2} - 2w_1\sum_{j=1}^3\frac{\partial^2\psi}{\partial r_{cg,j}\partial r_{21,j}} + \sum_{j=1}^3\frac{\partial^2\psi}{\partial r_{21,j}^2}$$

and a similar expression for $\nabla_2^2\psi$, but with $w_2$ instead of $w_1$ and a plus sign
instead of the minus sign. Combining them together in the Hamiltonian, and
substituting for $w_1$ and $w_2$, the mixed derivatives drop out against each other
and what is left is
$$H = -\frac{\hbar^2}{2(m_1+m_2)}\nabla_{cg}^2 - \frac{\hbar^2}{2m_{\rm eff}}\nabla_{21}^2 + V(\vec r_{21})$$
The first term is the kinetic energy that the total mass would have if it was
at the center of gravity; the next two terms are kinetic and potential energy
around the center of gravity, in terms of the distance between the masses and
the effective mass.
The Hamiltonian eigenvalue problem $H\psi = E\psi$ has separation of variables
solutions of the form
$$\psi = \psi_{cg}(\vec r_{cg})\,\psi_{21}(\vec r_{21})$$
Substituting this and the Hamiltonian above into $H\psi = E\psi$ and dividing by
$\psi_{cg}\psi_{21}$ produces
$$-\frac{\hbar^2}{2(m_1+m_2)}\frac{1}{\psi_{cg}}\nabla_{cg}^2\psi_{cg} + \frac{1}{\psi_{21}}\left[-\frac{\hbar^2}{2m_{\rm eff}}\nabla_{21}^2 + V\right]\psi_{21} = E$$

Call the first term in the left hand side $E_{cg}$ and the second $E_{21}$. By that
definition, $E_{cg}$ would normally be a function of $\vec r_{cg}$, because $\psi_{cg}$ is, but since it
is equal to $E-E_{21}$ and those do not depend on $\vec r_{cg}$, $E_{cg}$ cannot either, and must
be a constant. By similar reasoning, $E_{21}$ cannot depend on $\vec r_{21}$ and must be a
constant too. Therefore, rewriting the definitions of $E_{cg}$ and $E_{21}$, two separate
eigenvalue problems are obtained:
$$-\frac{\hbar^2}{2(m_1+m_2)}\nabla_{cg}^2\psi_{cg} = E_{cg}\psi_{cg} \qquad \left[-\frac{\hbar^2}{2m_{\rm eff}}\nabla_{21}^2 + V\right]\psi_{21} = E_{21}\psi_{21}$$

The first describes the quantum mechanics of an imaginary total mass $m_1+m_2$
located at the center of gravity. The second describes an imaginary effective
mass $m_{\rm eff}$ at a location $\vec r_{21}$ away from a fixed center that experiences a potential
$V(\vec r_{21})$.
For the hydrogen atom, it means that if the problem with a stationary proton
is solved using an effective electron mass $m_pm_e/(m_p+m_e)$, it solves the true
problem in which the proton moves a bit too. Like in the classical analysis, the
quantum analysis shows that in addition the atom can move as a unit, with a
motion described in terms of its center of gravity.
It can also be concluded, from a slight generalization of the quantum analy-
sis, that a constant external gravity field, like that of the sun on the earth-moon
system, or of the earth on a hydrogen atom, causes the center of gravity to ac-
celerate correspondingly, but does not affect the motion around the center of
gravity at all. That reflects a basic tenet of general relativity.

A.17 The hydrogen radial wave functions


This will be child's play for harmonic oscillator, {A.12}, and spherical harmon-
ics, {A.15}, veterans. If you replace the angular terms in (3.15) by $l(l+1)\hbar^2$,
and then divide the entire equation by $\hbar^2$, you get
$$-\frac{1}{R}\frac{d}{dr}\left(r^2\frac{dR}{dr}\right) + l(l+1) - \frac{2m_ee^2}{4\pi\epsilon_0\hbar^2}r = \frac{2m_e}{\hbar^2}r^2E$$

Since $l(l+1)$ is nondimensional, all terms in this equation must be. In par-
ticular, the ratio in the third term must be the inverse of a constant with the
dimensions of length; so, define the constant to be the Bohr radius $a_0$. It is con-
venient to also define a correspondingly nondimensionalized radial coordinate
as $\rho = r/a_0$. The final term in the equation must be nondimensional too, and
that means that the energy $E$ must take the form $(\hbar^2/2m_ea_0^2)\epsilon$, where $\epsilon$ is a
nondimensional energy. In terms of these scaled coordinates you get
$$-\frac{1}{R}\frac{d}{d\rho}\left(\rho^2\frac{dR}{d\rho}\right) + l(l+1) - 2\rho = \rho^2\epsilon$$
or written out
$$-\rho^2R'' - 2\rho R' + [l(l+1) - 2\rho - \epsilon\rho^2]R = 0$$

where the primes denote derivatives with respect to ρ.


Similar to the case of the harmonic oscillator, you must have solutions that
become zero at large distances $\rho$ from the nucleus: $\int|\psi|^2\,d^3\vec r$ gives the proba-
bility of finding the particle integrated over all possible positions, and if $\psi$ does

not become zero sufficiently rapidly at large $\rho$, this integral would become in-
finite, rather than one (certainty) as it should. Now the ODE above becomes
for large $\rho$ approximately $R'' + \epsilon R = 0$, which has solutions of the rough form
$\cos(\sqrt\epsilon\,\rho + \phi)$ for positive $\epsilon$ that do not have the required decay to zero. Zero
scaled energy $\epsilon$ is still too much, as can be checked by solving in terms of Bessel
functions, so you must have that $\epsilon$ is negative. In classical terms, the earth can
only hold onto the moon since the moon's total energy is less than the potential
energy far from the earth; if it was not, the moon would escape.
Anyway, for bound states, you must have the scaled energy $\epsilon$ negative. In
that case, the solution at large $\rho$ takes the approximate form $R \approx e^{\pm\sqrt{-\epsilon}\,\rho}$. Only
the negative sign is acceptable. You can make things a lot easier for yourself if
you peek at the final solution and rewrite $\epsilon$ as being $-1/n^2$ (that is not really
cheating, since you are not at this time claiming that $n$ is an integer, just a
positive number.) In that case, the acceptable exponential behavior at large
distance takes the form $e^{-\frac12\xi}$ where $\xi = 2\rho/n$. Split off this exponential part by
writing $R = e^{-\frac12\xi}\bar R$ where $\bar R(\xi)$ must remain bounded at large $\xi$. Substituting
these new variables, the ODE becomes
$$-\xi^2\bar R'' + \xi(\xi-2)\bar R' + [l(l+1) - (n-1)\xi]\bar R = 0$$

where the primes indicate derivatives with respect to $\xi$.
If you do a power series solution of this ODE, you see that it must start
with either power $\xi^l$ or with power $\xi^{-l-1}$. The latter is not acceptable, since
it would correspond to an infinite expectation value of energy. You could now
expand the solution further in powers of $\xi$, but the problem is that tabulated
polynomials usually do not start with a power $l$ but with power zero or one. So
you would not easily recognize the polynomial you get. Therefore it is best to
split off the leading power by defining $\bar R = \xi^l\bar{\bar R}$, which turns the ODE into
$$\xi\bar{\bar R}'' + [2(l+1)-\xi]\bar{\bar R}' + [n-l-1]\bar{\bar R} = 0$$
Substituting in a power series $\bar{\bar R} = \sum c_p\xi^p$, you get
$$\sum p[p+2l+1]c_p\xi^{p-1} = \sum[p+l+1-n]c_p\xi^p$$

The acceptable lowest power $p$ of $\xi$ is now zero. Again the series must terminate,
otherwise the solution would behave as $e^\xi$ at large distance, which is unaccept-
able. Termination at a highest power $p = q$ requires that $n$ equals $q+l+1$.
Since $q$ and $l$ are integers, so must be $n$, and since the final power $q$ is at least
zero, $n$ is at least $l+1$. The correct scaled energy $\epsilon = -1/n^2$ with $n > l$ has
been obtained.
With $n$ identified, you can identify the ODE as Laguerre's associated differ-
ential equation, e.g. [15, 30.26], the $(2l+1)$-th derivative of Laguerre's differential
equation, e.g. [15, 30.1], and the polynomial solutions as the associated Laguerre
polynomials $L^{2l+1}_{n+l}$, e.g. [15, 30.27], the $(2l+1)$-th derivatives of the Laguerre's
polynomials $L_{n+l}$, e.g. [15, 30.2]. To normalize the wave function use an integral
from a table book, e.g. [15, 30.46].
Putting it all together, the generic expression for the hydrogen eigenfunctions
is, drums please:
$$\psi_{nlm} = -\frac{2}{n^2}\sqrt{\frac{(n-l-1)!}{[(n+l)!\,a_0]^3}}\left(\frac{2\rho}{n}\right)^{\!l} L^{2l+1}_{n+l}\!\left(\frac{2\rho}{n}\right)e^{-\rho/n}\,Y_l^m(\theta,\phi) \tag{A.30}$$
The properties of the associated Laguerre polynomials $L^{2l+1}_{n+l}(2\rho/n)$ are in table
books like [15, pp. 169-172], and the spherical harmonics were given earlier in
section 3.1.3 and in note {A.15}, (A.28).
Do keep in mind that different references have contradictory definitions of
the associated Laguerre polynomials. This book follows the notations of [15,
pp. 169-172], who define
$$L_n(x) = e^x\frac{d^n}{dx^n}\left(x^ne^{-x}\right), \qquad L^m_n = \frac{d^m}{dx^m}L_n(x).$$
In other words, $L^m_n$ is simply the $m$-th derivative of $L_n$, which certainly tends to
simplify things. According to [10, p. 152], the “most nearly standard” notation
defines
$$L^m_n = (-1)^m\frac{d^m}{dx^m}L_{n+m}(x).$$
Combine the messy definition of the spherical harmonics (A.28) with the
uncertain definition of the Laguerre polynomials in the formulae (A.30) for the
hydrogen energy eigenfunctions $\psi_{nlm}$ above, and there is of course always a
possibility of getting an eigenfunction wrong if you are not careful.
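One way to guard against that is a numerical normalization check. The sketch
below, not from the book, writes the radial factor using scipy's genlaguerre,
which follows the “most nearly standard” convention mentioned above; the
prefactor is adjusted accordingly, and $a_0 = 1$ is taken:

    import numpy as np
    from math import factorial
    from scipy.special import genlaguerre

    def R(n, l, r):
        # radial factor of psi_nlm in the standard Laguerre convention, a0 = 1
        pref = (2/n**2) * np.sqrt(factorial(n-l-1) / factorial(n+l))
        return (pref * (2*r/n)**l * genlaguerre(n-l-1, 2*l+1)(2*r/n)
                * np.exp(-r/n))

    r = np.linspace(1e-6, 200, 200001)
    dr = r[1] - r[0]
    for n, l in ((1, 0), (2, 0), (2, 1), (3, 2)):
        print(n, l, np.sum(R(n, l, r)**2 * r**2) * dr)   # each close to 1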
Sometimes the value of the wave functions at the origin is needed. Now from
the above solution (A.30), it is seen that
$$\psi_{nlm} \propto r^l \qquad \hbox{for } r\to0 \tag{A.31}$$
so only the eigenfunctions $\psi_{n00}$ are nonzero at the origin. To find the value
requires $L^1_n(0)$ where $L^1_n$ is the derivative of the Laguerre polynomial $L_n$. Skim-
ming through table books, you can find that $L_n(0) = n!$, [15, 30.19], while the
differential equation for these functions implies that $L'_n(0) = -nL_n(0)$. Therefore:
$$\psi_{n00}(0) = \frac{1}{\sqrt{n^3\pi a_0^3}} \tag{A.32}$$

A.18 Inner product for the expectation value
To see that $\langle\Psi|A|\Psi\rangle$ works for getting the expectation value, just write $\Psi$ out
in terms of the eigenfunctions $\alpha_n$ of $A$:
$$\langle c_1\alpha_1+c_2\alpha_2+c_3\alpha_3+\ldots|A|c_1\alpha_1+c_2\alpha_2+c_3\alpha_3+\ldots\rangle$$
Now by the definition of eigenfunctions $A\alpha_n = a_n\alpha_n$ for every $n$, so you get
$$\langle c_1\alpha_1+c_2\alpha_2+c_3\alpha_3+\ldots|c_1a_1\alpha_1+c_2a_2\alpha_2+c_3a_3\alpha_3+\ldots\rangle$$
Since eigenfunctions are orthonormal:
$$\langle\alpha_1|\alpha_1\rangle = 1 \quad \langle\alpha_2|\alpha_2\rangle = 1 \quad \langle\alpha_3|\alpha_3\rangle = 1 \quad\ldots$$
$$\langle\alpha_1|\alpha_2\rangle = \langle\alpha_2|\alpha_1\rangle = \langle\alpha_1|\alpha_3\rangle = \langle\alpha_3|\alpha_1\rangle = \langle\alpha_2|\alpha_3\rangle = \langle\alpha_3|\alpha_2\rangle = \ldots = 0$$
So, multiplying out produces the desired result:
$$\langle\Psi|A\Psi\rangle = |c_1|^2a_1 + |c_2|^2a_2 + |c_3|^2a_3 + \ldots \equiv \langle A\rangle$$
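The same bookkeeping can be watched in a finite number of dimensions; the
sketch below, not part of the note, uses a random Hermitian matrix as the
operator $A$:

    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.standard_normal((5, 5)) + 1j*rng.standard_normal((5, 5))
    A = (M + M.conj().T) / 2             # a random Hermitian operator

    Psi = rng.standard_normal(5) + 1j*rng.standard_normal(5)
    Psi /= np.linalg.norm(Psi)           # normalized state

    a, V = np.linalg.eigh(A)             # eigenvalues and eigenvectors
    c = V.conj().T @ Psi                 # coefficients c_n = <alpha_n|Psi>

    print(np.vdot(Psi, A @ Psi).real)    # <Psi|A Psi>
    print(np.sum(np.abs(c)**2 * a))      # sum |c_n|^2 a_n: the same number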

A.19 Why commuting operators have a common set of eigenvectors
The fact that two operators that commute have a common set of eigenvectors
can be seen as follows: assume that $\vec\alpha$ is an eigenvector of $A$ with eigenvalue
$a$. Then since $A$ and $B$ commute, $AB\vec\alpha = BA\vec\alpha = aB\vec\alpha$, so, comparing start
and end, $B\vec\alpha$ must be an eigenvector of $A$ with eigenvalue $a$ too. If there is no
degeneracy of the eigenvalue, that must mean that $B\vec\alpha$ equals $\vec\alpha$ or is at least
proportional to it, which is the same as saying that $\vec\alpha$ is an eigenvector of $B$ too.
(In the special case that $B\vec\alpha$ is zero, $\vec\alpha$ is an eigenvector of $B$ with eigenvalue
zero.)
If there is degeneracy, the eigenvectors of $A$ are not unique and you can
mess with them until they all do become eigenvectors of $B$ too. The following
procedure will construct such a set of common eigenvectors in finite dimensional
space. Consider each eigenvalue of $A$ in turn. There will be more than one
eigenvector corresponding to a degenerate eigenvalue $a$. Now by completeness,
any eigenvector $\beta$ can be written as a combination of the eigenvectors of $A$, and
more particularly as $\beta = \beta_n + \beta_a$ where $\beta_a$ is a combination of the eigenvectors
of $A$ with eigenvalue $a$ and $\beta_n$ a combination of the eigenvectors of $A$ with other
eigenvalues.
The vectors $\beta_n$ and $\beta_a$ separately are still eigenvectors of $B$ if nonzero, since
as noted above, $B$ converts eigenvectors of $A$ into eigenvectors with the same
eigenvalue or zero. (For example, if $B\beta_a$ was not $b\beta_a$, $B\beta_n$ would have to make
up the difference, and $B\beta_n$ can only produce combinations of eigenvectors of
$A$ that do not have eigenvalue $a$.) Now replace the eigenvector $\beta$ by either $\beta_a$
or $\beta_n$, whichever is independent of the other eigenvectors of $B$. Doing this for
all eigenvectors of $B$ you achieve that the replacement eigenvectors of $B$ are
either combinations of the eigenvectors of $A$ with eigenvalue $a$ or of the other
eigenvectors of $A$. The set of new eigenvectors of $B$ that are combinations of
the eigenvectors of $A$ with eigenvalue $a$ can now be taken as the replacement
eigenvectors of $A$ with eigenvalue $a$. They are also eigenvectors of $B$. Repeat
for all eigenvalues of $A$.
Similar arguments can be used recursively to show that more generally, a
set of operators that all commute have a common set of eigenvectors.
The operators do not really have to be Hermitian, just “diagonalizable”:
they must have a complete set of eigenfunctions.
In the infinite dimensional case the mathematical justification gets much
trickier. However, as the hydrogen atom and harmonic oscillator eigenfunction
examples indicate, it continues to be relevant in nature.
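For a finite dimensional illustration of the procedure above, the following Python/numpy sketch (the matrices are made up, with A having a degenerate eigenvalue) constructs a common set of eigenvectors by diagonalizing B within each eigenspace of A:

    # Common eigenvectors of two commuting Hermitian matrices (illustration).
    import numpy as np

    A = np.diag([1.0, 1.0, 2.0])                   # degenerate eigenvalue 1
    B = np.array([[0.0, 1.0, 0.0],                 # made up; commutes with A
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 3.0]])
    assert np.allclose(A @ B, B @ A)

    vals, vecs = np.linalg.eigh(A)
    common = []
    for a in np.unique(np.round(vals, 12)):
        V = vecs[:, np.isclose(vals, a)]           # basis of this eigenspace of A
        _, W = np.linalg.eigh(V.conj().T @ B @ V)  # diagonalize B inside it
        common.append(V @ W)                       # eigenvectors of both A and B
    common = np.hstack(common)

    for v in common.T:                             # verify the claim
        assert np.allclose(A @ v, (v.conj() @ A @ v) * v)
        assert np.allclose(B @ v, (v.conj() @ B @ v) * v)
    print("common eigenvectors found")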
A.20 The generalized uncertainty relationship
For brevity, define $A' = A - \langle A\rangle$ and $B' = B - \langle B\rangle$; then the general expression
for the standard deviation says
$$\sigma_A^2\sigma_B^2 = \langle A'^2\rangle\langle B'^2\rangle = \langle\Psi|A'^2\Psi\rangle\langle\Psi|B'^2\Psi\rangle$$
Hermitian operators can be taken to the other side of inner products, so
$$\sigma_A^2\sigma_B^2 = \langle A'\Psi|A'\Psi\rangle\langle B'\Psi|B'\Psi\rangle$$
Now the Cauchy-Schwarz inequality says that for any f and g,
$$|\langle f|g\rangle| \le \sqrt{\langle f|f\rangle}\,\sqrt{\langle g|g\rangle}$$
(See the notations for more on this theorem.) Using the Cauchy-Schwarz inequality in reversed order, you get
$$\sigma_A^2\sigma_B^2 \ge |\langle A'\Psi|B'\Psi\rangle|^2 = |\langle A'B'\rangle|^2$$
Now by the definition of the inner product, the complex conjugate of $\langle A'\Psi|B'\Psi\rangle$
is $\langle B'\Psi|A'\Psi\rangle$, so the complex conjugate of $\langle A'B'\rangle$ is $\langle B'A'\rangle$, and averaging a
complex number with minus its complex conjugate reduces its size, since the
real part averages away, so
$$\sigma_A^2\sigma_B^2 \ge \left|\frac{\langle A'B'\rangle - \langle B'A'\rangle}{2}\right|^2$$
The quantity in the numerator is the expectation value of the commutator $[A', B']$.
Writing it out shows that $[A', B'] = [A, B]$, so the result is the generalized
uncertainty relationship
$$\sigma_A\sigma_B \ge \tfrac12\left|\langle[A,B]\rangle\right|$$
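A quick numerical spot check of this inequality is easy. The sketch below (Python/numpy, spin-half matrices with $\hbar = 1$ and a random made-up state) is purely illustrative:

    # Check sigma_A sigma_B >= |<[A,B]>|/2 for spin-1/2 operators, hbar = 1.
    import numpy as np

    Sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
    Sy = 0.5 * np.array([[0, -1j], [1j, 0]])

    rng = np.random.default_rng(7)
    psi = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    psi /= np.linalg.norm(psi)               # normalized random state

    def expect(Op):
        return np.vdot(psi, Op @ psi)

    def sigma(Op):
        return np.sqrt(expect(Op @ Op).real - expect(Op).real**2)

    lhs = sigma(Sx) * sigma(Sy)
    rhs = 0.5 * abs(expect(Sx @ Sy - Sy @ Sx))   # [Sx, Sy] = i Sz
    print(lhs, ">=", rhs)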
A.21 Derivation of the commutator rules
This note explains where the formulae of section 3.4.4 come from.
The general assertions are readily checked by simply writing out both sides
of the equation and comparing. And some are just rewrites of earlier ones.
Position and potential energy operators commute since they are just ordinary
numerical multiplications, and these commute.
The linear momentum operators commute because the order in which differentiation is done is irrelevant. Similarly, angular momentum in one direction
commutes with position in that same direction, since that angular momentum
operator involves neither the position coordinate in question nor differentiation
with respect to it.
The commutator between x-position and px -linear momentum was worked
out in the previous subsection to figure out Heisenberg’s uncertainty principle.
Of course, three-dimensional space has no preferred direction, so the result
applies the same in any direction, including the y- and z-directions.
The angular momentum commutators are simplest obtained by just grinding
out
$$[\hat L_x, \hat L_y] = [\hat y\hat p_z - \hat z\hat p_y,\; \hat z\hat p_x - \hat x\hat p_z]$$
using the linear combination and product manipulation rules and the commutators involving position and linear momentum. To generalize the result you get, you cannot
just arbitrarily swap x, y, and z, since, as every mechanic knows, a right-handed
screw is not the same as a left-handed one, and some axes swaps would turn one
into the other. But you can swap axes according to the “xyzxyzx . . .” “cyclic
permutation” scheme, as in:

$$x \to y, \quad y \to z, \quad z \to x$$
which produces the other two commutators if you do it twice:
$$[\hat L_x,\hat L_y] = i\hbar\hat L_z \;\longrightarrow\; [\hat L_y,\hat L_z] = i\hbar\hat L_x \;\longrightarrow\; [\hat L_z,\hat L_x] = i\hbar\hat L_y$$
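If the grinding gets tedious, symbolic algebra will do it for you. The following sketch (Python/sympy, applying the operators to an arbitrary test function) confirms the first of these commutators; it is an illustration, not part of the derivation:

    # Symbolic check of [Lx, Ly] = i hbar Lz on a test function f(x, y, z).
    import sympy as sp

    x, y, z, hbar = sp.symbols('x y z hbar')
    f = sp.Function('f')(x, y, z)

    def px(g): return -sp.I * hbar * sp.diff(g, x)
    def py(g): return -sp.I * hbar * sp.diff(g, y)
    def pz(g): return -sp.I * hbar * sp.diff(g, z)

    def Lx(g): return y * pz(g) - z * py(g)
    def Ly(g): return z * px(g) - x * pz(g)
    def Lz(g): return x * py(g) - y * px(g)

    assert sp.expand(Lx(Ly(f)) - Ly(Lx(f)) - sp.I * hbar * Lz(f)) == 0
    print("[Lx, Ly] = i hbar Lz confirmed")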
For the commutators with square angular momentum, work out
$$[\hat L_x,\; \hat L_x^2 + \hat L_y^2 + \hat L_z^2]$$
using the manipulation rules and the commutators between angular momentum
components.
A commutator like $[\hat x,\hat L_x] = [\hat x,\, \hat y\hat p_z - \hat z\hat p_y]$ is zero because everything commutes in it. However, in a commutator like $[\hat x,\hat L_y] = [\hat x,\, \hat z\hat p_x - \hat x\hat p_z]$, $\hat x$ does not
commute with $\hat p_x$, so multiplying out and taking the $\hat z$ out of $[\hat x, \hat z\hat p_x]$ at its own
side, you get $\hat z[\hat x,\hat p_x]$, and the commutator left is the canonical one, which has
value $i\hbar$. Plug these results and similar into $[\hat x^2 + \hat y^2 + \hat z^2, \hat L_x]$ and you get zero.
For a commutator like $[\hat x,\hat L^2] = [\hat x,\, \hat L_x^2 + \hat L_y^2 + \hat L_z^2]$, the $\hat L_x^2$ term produces zero
because $\hat L_x$ commutes with $\hat x$, and in the remaining terms, taking the various
factors out at their own sides of the commutator produces
$$[\hat x,\hat L^2] = \hat L_y[\hat x,\hat L_y] + [\hat x,\hat L_y]\hat L_y + \hat L_z[\hat x,\hat L_z] + [\hat x,\hat L_z]\hat L_z = i\hbar\hat L_y\hat z + i\hbar\hat z\hat L_y - i\hbar\hat L_z\hat y - i\hbar\hat y\hat L_z$$
the final equality because of the commutators already worked out. Now by the
nature of the commutator, you can swap the order of the terms in $\hat L_y\hat z$ as long
as you add the commutator $[\hat L_y,\hat z]$ to make up for it, and that commutator was
already found to be $i\hbar\hat x$. The same way the order of $\hat L_z\hat y$ can be swapped, to give
$$[\hat x,\hat L^2] = -2\hbar^2\hat x - 2i\hbar(\hat y\hat L_z - \hat z\hat L_y)$$
and the parenthetical expression can be recognized as the x-component of $\hat{\vec r}\times\hat{\vec L}$,
giving one of the expressions claimed. Instead you can work out the parenthetical expression further by substituting in the definitions for $\hat L_z$ and $\hat L_y$:
$$[\hat x,\hat L^2] = -2\hbar^2\hat x - 2i\hbar\Big(\hat y(\hat x\hat p_y - \hat y\hat p_x) - \hat z(\hat z\hat p_x - \hat x\hat p_z) - \hat x(\hat x\hat p_x - \hat x\hat p_x)\Big)$$
where the third term added within the big parentheses is self-evidently zero.
This can be reordered to the x-component of the second claimed expression.
And as always, the other components are of course no different.
The commutators between linear and angular momentum go almost identi-
cally, except for additional swaps in the order between position and momentum
operators using the canonical commutator.
To derive the first commutator in (3.55), consider the z-component as the
example:
$$[\hat x\hat L_y - \hat y\hat L_x,\, \hat L^2] = [\hat x,\hat L^2]\hat L_y - [\hat y,\hat L^2]\hat L_x$$
because $\hat L^2$ commutes with $\hat{\vec L}$. Using (3.50),
$$[\hat x\hat L_y - \hat y\hat L_x,\, \hat L^2] = -2\hbar^2\hat x\hat L_y - 2i\hbar(\hat y\hat L_z\hat L_y - \hat z\hat L_y^2) + 2\hbar^2\hat y\hat L_x + 2i\hbar(\hat z\hat L_x^2 - \hat x\hat L_z\hat L_x)$$
Now use the commutator $[\hat L_y,\hat L_z]$ to get rid of $\hat L_z\hat L_y$ and $[\hat L_z,\hat L_x]$ to get rid of
$\hat L_z\hat L_x$, and clean up to get
$$[\hat x\hat L_y - \hat y\hat L_x,\, \hat L^2] = 2i\hbar\left(-\hat y\hat L_y\hat L_z + \hat z\hat L_y^2 + \hat z\hat L_x^2 - \hat x\hat L_x\hat L_z\right)$$
Now $\hat{\vec r}\cdot\hat{\vec L} = \hat{\vec r}\cdot(\hat{\vec r}\times\hat{\vec p}) = 0$, so $\hat x\hat L_x + \hat y\hat L_y = -\hat z\hat L_z$, which gives the claimed
expression. To verify the second equation of (3.55), use (3.50), the first of
(3.55), and the definition of $[\hat{\vec r},\hat L^2]$.
A.22 Is the variational approximation best?
Clearly, “best” is a subjective term. If you are looking for the wave function
within a definite set that has the most accurate expectation value of energy,
then minimizing the expectation value of energy will do it. This function will
also approximate the true eigenfunction shape the best, in some technical sense
{A.24}. (There are many ways the best approximation of a function can be
defined; you can demand that the maximum error is as small as possible, or
that the average magnitude of the error is as small as possible, or that a root-
mean-square error is, etcetera. In each case, the “best” answer will be different,
though there may not be much of a practical difference.)
But given a set of approximate wave functions like those used in finite ele-
ment methods, it may well be possible to get much better results using additional
mathematical techniques like Richardson extrapolation. In effect you are then
deducing what happens for wave functions that are beyond the approximate
ones you are using.
A.23 Solution of the hydrogen molecular ion
The key to the variational approximation to the hydrogen molecular ion is to
be able to accurately evaluate the expectation energy
$$\langle E\rangle = \langle a\psi_L + b\psi_R|H|a\psi_L + b\psi_R\rangle$$
This can be multiplied out and simplified by noting that $\psi_L$ and $\psi_R$ are eigenfunctions of the partial Hamiltonians. For example,
$$H\psi_L = E_1\psi_L - \frac{e^2}{4\pi\epsilon_0}\frac{1}{r_R}\psi_L$$
where $E_1$ is the -13.6 eV hydrogen atom ground state energy. The expression
can be further simplified by noting that by symmetry
$$\langle\psi_R|r_L^{-1}\psi_R\rangle = \langle\psi_L|r_R^{-1}\psi_L\rangle \qquad \langle\psi_L|r_L^{-1}\psi_R\rangle = \langle\psi_R|r_R^{-1}\psi_L\rangle$$
and that $\psi_L$ and $\psi_R$ are real, so that the left and right sides of the various
inner products can be reversed. Also, a and b are related by the normalization
requirement
$$a^2 + b^2 + 2ab\langle\psi_L|\psi_R\rangle = 1$$
Cleaning up the expectation energy in this way, the result is
$$\langle E\rangle = E_1 - \frac{e^2}{4\pi\epsilon_0}\left[\langle\psi_L|r_R^{-1}\psi_L\rangle - \frac1d + 2ab\,\langle\psi_L|\psi_R\rangle\left(\frac{\langle\psi_L|r_L^{-1}\psi_R\rangle}{\langle\psi_L|\psi_R\rangle} - \langle\psi_L|r_R^{-1}\psi_L\rangle\right)\right]$$
which includes the proton to proton repulsion energy (the 1/d). The energy E1
is the −13.6 eV amount of energy when the protons are far apart.
Numerical integration is not needed; the inner product integrals in this ex-
pression can be done analytically. To do so, take the origin of a spherical
coordinate system (r, θ, φ) at the left proton, and the axis towards the right
one, so that
$$r_L = |\vec r - \vec r_{Lp}| = r \qquad r_R = |\vec r - \vec r_{Rp}| = \sqrt{d^2 + r^2 - 2dr\cos\theta}.$$
In those terms,
$$\psi_L = \frac{1}{\sqrt{\pi a_0^3}}\,e^{-r/a_0} \qquad \psi_R = \frac{1}{\sqrt{\pi a_0^3}}\,e^{-\sqrt{d^2+r^2-2dr\cos\theta}/a_0}.$$

Then integrate angles first using $d^3\vec r = r^2\sin\theta\,dr\,d\theta\,d\phi = -r^2\,dr\,d\cos\theta\,d\phi$.
Do not forget that $\sqrt{x^2} = |x|$, not x; e.g. $\sqrt{(-3)^2} = 3$, not $-3$. More details
are in [10, pp. 305-307].
The “overlap integral” turns out to be
$$\langle\psi_L|\psi_R\rangle = e^{-d/a_0}\left[1 + \frac{d}{a_0} + \frac13\left(\frac{d}{a_0}\right)^2\right]$$
and provides a measure of how much the regions of the two wave functions
overlap. The “direct integral” is
$$\langle\psi_L|r_R^{-1}\psi_L\rangle = \frac1d - \left(\frac1{a_0} + \frac1d\right)e^{-2d/a_0}$$
and gives the classical potential of an electron density of strength $|\psi_L|^2$ in the
field of the right proton, except for the factor $-e^2/4\pi\epsilon_0$. The “exchange integral”
is
$$\langle\psi_L|r_L^{-1}\psi_R\rangle = \left(\frac1{a_0} + \frac{d}{a_0^2}\right)e^{-d/a_0}.$$
and is somewhat of a twilight term, since $\psi_L$ suggests that the electron is around
the left proton, but $\psi_R$ suggests it is around the right one.
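With these closed forms, the variational results are easy to reproduce numerically. The following sketch (Python/numpy, in units where $a_0 = 1$ and $e^2/4\pi\epsilon_0 = 1$ so that $E_1 = -\tfrac12$; the rounded conversion 27.2 eV per atomic energy unit is used at the end) evaluates $\langle E\rangle$ for the symmetric state a = b, for which the expression above reduces algebraically to $E_1 + 1/d - (D+X)/(1+S)$, and locates its minimum:

    # Variational <E> for H2+ in units a0 = 1, e^2/(4 pi eps0) = 1 (sketch).
    import numpy as np

    def overlap(d):   # <psiL|psiR>
        return np.exp(-d) * (1 + d + d**2 / 3)

    def direct(d):    # <psiL| 1/rR |psiL>
        return 1/d - (1 + 1/d) * np.exp(-2*d)

    def exchange(d):  # <psiL| 1/rL |psiR>
        return (1 + d) * np.exp(-d)

    def E_sym(d):     # symmetric state a = b: E1 + 1/d - (D+X)/(1+S)
        return -0.5 + 1/d - (direct(d) + exchange(d)) / (1 + overlap(d))

    d = np.linspace(1.0, 6.0, 2001)
    E = E_sym(d)
    i = np.argmin(E)
    print(f"minimum near d = {d[i]:.2f} a0,"
          f" binding energy about {(-0.5 - E[i]) * 27.2:.2f} eV")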
A.24 Accuracy of the variational method
Any approximate ground state solution ψ may always be written as a sum of
the eigenfunctions ψ1 , ψ2 , . . .:

ψ = c1 ψ1 + ε2 ψ2 + ε3 ψ3 + . . .
where, if the approximation is any good at all, the coefficient c1 of the ground
state ψ1 is close to one, while ε2 , ε3 , . . . are small.
The condition that $\psi$ is normalized, $\langle\psi|\psi\rangle = 1$, works out to be
$$1 = \langle c_1\psi_1 + \varepsilon_2\psi_2 + \ldots|c_1\psi_1 + \varepsilon_2\psi_2 + \ldots\rangle = c_1^2 + \varepsilon_2^2 + \varepsilon_3^2 + \ldots$$
since the eigenfunctions $\psi_1,\psi_2,\ldots$ are orthonormal.
Similarly, the expectation energy $\langle E\rangle = \langle\psi|H\psi\rangle$ works out to be
$$\langle E\rangle = \langle c_1\psi_1 + \varepsilon_2\psi_2 + \ldots|E_1c_1\psi_1 + E_2\varepsilon_2\psi_2 + \ldots\rangle = c_1^2E_1 + \varepsilon_2^2E_2 + \varepsilon_3^2E_3 + \ldots$$
Eliminating $c_1^2$ using the normalization condition above gives
$$\langle E\rangle = E_1 + \varepsilon_2^2(E_2 - E_1) + \varepsilon_3^2(E_3 - E_1) + \ldots$$
One of the things this expression shows is that any approximate wave func-
tion (not just eigenfunctions) has more expectation energy than the ground
state E1 . All other terms in the sum above are positive since E1 is the lowest
energy value.
The expression above also shows that while the deviations of the wave func-
tion from the exact ground state ψ1 are proportional to the coefficients ε2 , ε3 , . . .,
the errors in energy are proportional to the squares of those coefficients. And
the square of any reasonably small quantity is much smaller than the quantity
itself. So the approximate ground state energy is much more accurate than
would be expected from the wave function errors.
Still, if an approximate system is close to the ground state energy, then the
wave function must be close to the ground state wave function. More precisely,
if the error in energy is a small number, call it $\varepsilon^2$, then the amount $\varepsilon_2$ of
eigenfunction $\psi_2$ “polluting” the approximate ground state must be no more than
$\varepsilon/\sqrt{E_2 - E_1}$. And that is in the worst case scenario that all the error in the
expectation value of energy is due to the second eigenfunction.
As a measure of the average combined error in wave function, you can use
the magnitude or norm of the combined pollution:
$$\|\varepsilon_2\psi_2 + \varepsilon_3\psi_3 + \ldots\| = \sqrt{\varepsilon_2^2 + \varepsilon_3^2 + \ldots}$$
That error is no more than $\varepsilon/\sqrt{E_2 - E_1}$. To verify it, note that
$$\varepsilon_2^2(E_2 - E_1) + \varepsilon_3^2(E_2 - E_1) + \ldots \le \varepsilon_2^2(E_2 - E_1) + \varepsilon_3^2(E_3 - E_1) + \ldots = \varepsilon^2.$$
(Of course, if the ground state were degenerate, $E_2$ would equal $E_1$. But in that case you do not care about the error in $\psi_2$, since then $\psi_1$
and $\psi_2$ are equally good ground states, and $E_2 - E_1$ becomes $E_3 - E_1$.)
The bottom line is that the lower you can get your expectation energy, the
closer you will get to the true ground state energy, and the small error in energy
will reflect in a small error in wave function.
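To see the quadratic payoff in numbers, here is a minimal Python sketch with made-up energy eigenvalues; it is only an illustration:

    # Energy error of a polluted ground state grows only quadratically.
    import numpy as np

    E = np.array([1.0, 3.0, 7.0])          # made-up eigenvalues E1 < E2 < E3
    for eps2 in [0.1, 0.01, 0.001]:
        c1 = np.sqrt(1 - eps2**2)          # keep the wave function normalized
        c = np.array([c1, eps2, 0.0])
        E_expect = np.sum(c**2 * E)        # <E> = sum c_n^2 E_n
        print(f"eps2 = {eps2:7.3f}   energy error = {E_expect - E[0]:.3e}")
    # Each factor 10 drop in eps2 drops the energy error by a factor 100.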
A.25 Positive molecular ion wave function
Since the assumed Hamiltonian is real, taking real and imaginary parts of the
eigenvalue problem Hψ = Eψ shows that the real and imaginary parts of ψ
each separately are eigenfunctions with the same eigenvalue, and both are real.
So you can take ψ to be real without losing anything.
The expectation value of the energy for $|\psi|$ is the same as that for $\psi$, assuming that an integration by parts has been done on the kinetic energy part
to convert it into an integral of the square gradients of $\psi$. Therefore $|\psi|$ must
be the same function as $\psi$ within a constant, assuming that the ground state
of lowest energy is nondegenerate. That means that $\psi$ cannot change sign and
can be taken to be positive.
(Regrettably this argument stops working for more than two electrons due
to the antisymmetrization requirement of section 4.6. It does keep working for
bosons, like helium atoms in a box, [7, p. 321])
With a bit more sophistication, the above argument can be inverted to
show that the ground state must indeed be unique. Assume that there would
be two different (more precisely: not equal within a constant) ground state
eigenfunctions, instead of just one. Then linear combinations of the two would
exist that crossed zero. The absolute value of such a wave function would,
again, have the same expectation energy as the wave function itself, the ground
state energy. But the absolute value of such a wave function has kinks of finite
slope at the zero crossings. (Just think of the graph of |x|.) If these kinks
are locally slightly smoothed out, i.e. rounded off, the kinetic energy would
decrease correspondingly, since kinetic energy is the integral of the square slope
and the slope has been reduced nontrivially in the immediate vicinity of the zero
crossings. However, there would not be a corresponding increase in potential
energy, since the potential energy depends on the square of the wave function
itself, not its slope, and the square of the wave function itself is vanishingly small
in the immediate vicinity of a zero crossing. If the kinetic energy goes down, and
the potential energy does not go up enough to compensate, the energy would
be lowered. But that contradicts the fact that the ground state has the lowest
possible energy. The contradiction implies that the original assumption of two
different ground state eigenfunctions cannot be right; the ground state must be
unique.
A.26 Molecular ion wave function symmetries
Let z be the horizontal coordinate measured from the symmetry plane towards
the right nucleus. Let M be the “mirror operator” that changes the sign of z,
in other words,
M Ψ(x, y, z) = Ψ(x, y, −z)
This operator commutes with the Hamiltonian H since the energy evaluates
the same way at positive and negative z. This means that operators H and M
must have a complete set of common eigenfunctions. That set must include the
ground state of lowest energy: so the ground state must be an eigenfunction of
M too. Now the eigenvalues of M, which can be seen to be a Hermitian operator, are either +1 or $-1$: if M is applied twice, it gives back the same wave
function, i.e. $1\Psi$, so the square of the eigenvalue is 1, and the eigenvalue
itself can only be +1 or $-1$. Eigenfunctions with eigenvalue +1 are called “symmetric”, eigenfunctions with eigenvalue $-1$ are called “antisymmetric”. Since
the previous note found that the ground state must be everywhere positive, it
can only be a symmetric eigenfunction of M .
Similarly, let R be the operator that rotates Ψ over a small angle φ around
the axis of symmetry. The magnitude of the eigenvalues of R must be 1, since Ψ
must stay normalized to 1 after the rotation. Complex numbers of magnitude 1
can be written as eia where a is a real number. Number a must be proportional
to φ, since rotating Ψ twice is equivalent to rotating it once over twice the
angle, so the eigenvalues are eimφ , where m is a constant independent of φ.
(In addition, m must be integer since rotating over 360 degrees must give back
the original wave function.) In any case, the only way that Ψ can be real and
positive at all angular positions is if m = 0, and then the eigenvalue of R is 1,
implying that the ground state Ψ does not change when rotated; it must be the
same at all angles. That means that the wave function is axially symmetric.
For future reference, one other symmetry must be mentioned, for the ground
state of the neutral hydrogen molecule that will be covered in the next chap-
ter. The neutral molecule has two electrons, instead of one, with positions ~r1
and ~r2 . The Hamiltonian will commute with the operation of “exchanging the
electrons,” i.e. swapping the values of ~r1 and ~r2 , because all electrons are identi-
cal. So, for the same reasons as for the mirror operator above, the spatial wave
function will be symmetric, unchanged, under particle exchange.
(Regrettably this argument stops working for more than two electrons due
to the antisymmetrization requirement of section 4.6. It does keep working for
bosons, like helium atoms, [7, p. 321])
A.27 Solution of the hydrogen molecule
To find the approximate solution for the hydrogen molecule, the key is to be
able to find the expectation energy of the approximate wave functions aψL ψR +
bψR ψL .
First, for given a/b, the individual values of a and b can be computed from
the normalization requirement
$$a^2 + b^2 + 2ab\langle\psi_L|\psi_R\rangle^2 = 1 \qquad \text{(A.33)}$$
where the value of the overlap integral $\langle\psi_L|\psi_R\rangle$ was given in note {A.23}.
The inner product
$$\langle a\psi_L\psi_R + b\psi_R\psi_L|H|a\psi_L\psi_R + b\psi_R\psi_L\rangle$$
is a six-dimensional integral, but when multiplied out, a lot of it can be factored
into products of three-dimensional integrals whose values were given in note
{A.23}. Cleaning up the inner product, and using the normalization condition,
you can get:
$$\langle E\rangle = 2E_1 - \frac{e^2}{4\pi\epsilon_0}\left[A_1 + 2ab\langle\psi_L|\psi_R\rangle^2A_2\right]$$
using the abbreviations
$$A_1 = 2\langle\psi_L|r_R^{-1}\psi_L\rangle - \frac1d - \langle\psi_L\psi_R|r_{12}^{-1}\psi_L\psi_R\rangle$$
$$A_2 = \frac{2\langle\psi_L|r_L^{-1}\psi_R\rangle}{\langle\psi_L|\psi_R\rangle} - 2\langle\psi_L|r_R^{-1}\psi_L\rangle - \frac{\langle\psi_L\psi_R|r_{12}^{-1}\psi_R\psi_L\rangle}{\langle\psi_L|\psi_R\rangle^2} + \langle\psi_L\psi_R|r_{12}^{-1}\psi_L\psi_R\rangle$$
Values for several of the inner products in these expressions are given in note
{A.23}. Unfortunately, the ones involving the distance $r_{12} = |\vec r_1 - \vec r_2|$ between
the electrons cannot be done analytically. And one of the two cannot even be
reduced to a three-dimensional integral, and needs to be done in six dimensions.
(It can be reduced to five dimensions, but that introduces a nasty singularity
and sticking to six dimensions seems a better idea.) So, it gets really elaborate,
because you have to ensure numerical accuracy for singular, high-dimensional
integrals. Still, it can be done with some perseverance.
In any case, the basic idea is still to print out expectation energies, easy to
obtain or not, and to examine the print-out to see at what values of a/b and d
the energy is minimal. That will be the ground state.
The results are listed in the main text, but here are some more data that
may be of interest. At the 1.62 a0 nuclear spacing of the ground state, the
antisymmetric state a/b = −1 has a positive energy of 7 eV above separate
atoms and is therefore unstable.
The nucleus to electron attraction energies are 82 eV for the symmetric
state, and 83.2 eV for the antisymmetric state, so the antisymmetric state has
the lower potential energy, like in the hydrogen molecular ion case, and unlike
what you read in some books. The symmetric state has the lower energy because
of lower kinetic energy, not potential energy.
Due to electron cloud merging, for the symmetric state the electron to elec-
tron repulsion energy is 3 eV lower than you would get if the electrons were
point charges located at the nuclei. For the antisymmetric state, it is 5.8 eV
lower.
As a consequence, the antisymmetric state also has less potential energy
with respect to these repulsions. Adding it all together, the symmetric state
has quite a lot less kinetic energy than the antisymmetric one.
A.28 Hydrogen molecule ground state and spin
The purpose of this note is to verify that the inclusion of spin does not change
the spatial form of the ground state of the hydrogen molecule. The lowest
expectation energy $\langle E\rangle = \langle\psi_{\rm gs}|H\psi_{\rm gs}\rangle$, characterizing the correct ground state,
only occurs if all spatial components $\psi_{\pm\pm}$ of the ground state with spin,
$$\psi_{\rm gs} = \psi_{++}\uparrow\uparrow + \psi_{+-}\uparrow\downarrow + \psi_{-+}\downarrow\uparrow + \psi_{--}\downarrow\downarrow,$$
are proportional to the no-spin spatial ground state $\psi_{\rm gs,0}$.
The reason is that the assumed Hamiltonian (4.3) does not involve spin at
all, only spatial coordinates, so, for example,
$$(H\psi_{++}\uparrow\uparrow) \equiv H\left(\psi_{++}(\vec r_1,\vec r_2)\uparrow(S_{z1})\uparrow(S_{z2})\right) = (H\psi_{++})\uparrow\uparrow$$
and the same for the other three terms in $H\psi_{\rm gs}$. So the expectation value of
energy becomes
$$\langle E\rangle = \langle\psi_{++}\uparrow\uparrow + \psi_{+-}\uparrow\downarrow + \psi_{-+}\downarrow\uparrow + \psi_{--}\downarrow\downarrow\,|\,(H\psi_{++})\uparrow\uparrow + (H\psi_{+-})\uparrow\downarrow + (H\psi_{-+})\downarrow\uparrow + (H\psi_{--})\downarrow\downarrow\rangle$$
Because of the orthonormality of the spin states, this multiplies out into inner
products of matching spin states as
$$\langle E\rangle = \langle\psi_{++}|H\psi_{++}\rangle + \langle\psi_{+-}|H\psi_{+-}\rangle + \langle\psi_{-+}|H\psi_{-+}\rangle + \langle\psi_{--}|H\psi_{--}\rangle.$$
In addition, the wave function must be normalized, $\langle\psi_{\rm gs}|\psi_{\rm gs}\rangle = 1$, or
$$\langle\psi_{++}|\psi_{++}\rangle + \langle\psi_{+-}|\psi_{+-}\rangle + \langle\psi_{-+}|\psi_{-+}\rangle + \langle\psi_{--}|\psi_{--}\rangle = 1.$$
Now when $\psi_{++}$, $\psi_{+-}$, $\psi_{-+}$, and $\psi_{--}$ are each proportional to the no-spin spatial
ground state $\psi_{\rm gs,0}$ with the lowest energy $E_{\rm gs}$, their individual contributions to
the energy will be given by $\langle\psi_{\pm\pm}|H\psi_{\pm\pm}\rangle = E_{\rm gs}\langle\psi_{\pm\pm}|\psi_{\pm\pm}\rangle$, the lowest possible.
Then the total energy $\langle E\rangle$ will be $E_{\rm gs}$. Anything else will have more energy and
can therefore not be the ground state.
It should be pointed out that to a more accurate approximation, spin causes
the electrons to be somewhat magnetic, and that produces a slight dependence
of the energy on spin; compare chapter 10.1.6. This note ignored that, as do
most other derivations in this book.
A.29 Shielding approximation limitations
In the helium atom, if you drop the shielding approximation for the remaining
electron in the ionized state, as common sense would suggest, the ionization
energy would become negative! This illustrates the dangers of mixing models
at random. This problem might also be why the discussion in [10] is based on
the zero shielding approximation, rather than the full shielding approximation
used here.
But zero shielding does make the base energy levels of the critical outer
electrons of heavy atoms very large, proportional to the square of the atomic
number. And that might then suggest the question: if the energy levels explode
like that, why doesn’t the ionization energy or the electronegativity? And it
makes it more difficult to explain why helium would not want another electron. Full shielding puts you in the obviously more desirable starting position
of the additional electron not being attracted, and the already present electrons
being shielded from the nucleus by the new electron. And how about the size
of the atoms imploding in zero shielding?
Overall, this book prefers the full shielding approach. Zero shielding would
predict the helium ionization energy to be 54.4 eV, which really seems worse
than 13.6 eV when compared to the exact value of 24.6 eV. On the other hand,
zero shielding does give a fair approximation of the actual total energy of the
atom: 109 eV instead of the exact value of 79 eV. Full shielding produces a poor
value of 27 eV for the total energy; the total energy is proportional to the square
of the effective nucleus strength, so a lack of full shielding will increase the total
energy very strongly. But also importantly, full shielding avoids the reader’s
distraction of having to rescale the wave functions to account for the non-unit
nuclear strength.
If eventually X-ray spectra need to be covered in this book, a description of
“hot” relativistic inner electrons would presumably fix any problem well.
A.30 Why the s states have the least energy
The probability of being found near the nucleus, i.e. the origin, is determined by
the magnitude of the relevant hydrogen wave function |ψnlm |2 near the origin.
Now the power series expansion of ψnlm in terms of the distance r from the origin
starts with power rl , (A.30). For small enough r, a p, (i.e. ψn1m ), state involving
a factor r will be much smaller than an s, (ψn0m ), state without such a factor.
Similarly a d, (ψn2m ), state involving a factor r2 will be much less still than a p
state with just single factor r, etcetera. So states of higher angular momentum
quantum number l stay increasingly strongly out of the immediate vicinity of
the nucleus. This is reflected in increased energy, since the nuclear attraction is
much greater close to the nucleus than elsewhere in the presence of shielding.
A.31 Why energy eigenstates are stationary
The probability of measuring an eigenvalue $a_i$ for any arbitrary physical quantity a is, according to the orthodox interpretation, the square magnitude of the
coefficient of the corresponding eigenfunction $\alpha_i$. This coefficient can be found
as the inner product $\langle\alpha_i|\Psi\rangle$, which for a stationary state is $\langle\alpha_i|c_{\vec n}(0)e^{-iE_{\vec n}t/\hbar}\psi_{\vec n}\rangle$,
and taking the square magnitude kills off the time-dependent exponential. So
the probability of measuring any value for any physical quantity remains exactly
the same however long you wait.
It is of course assumed that the operator A does not explicitly depend on
time. Otherwise its time variation would be automatic. (The eigenfunctions
would depend on time.)
A.32 Better description of two-state systems
The given description of two-state systems is a bit tricky, since the mentioned
states of lowest and highest energy are only approximate energy eigenfunctions.
But they can be made exact energy eigenfunctions by defining $(\psi_1 + \psi_2)/\sqrt2$
and $(\psi_1 - \psi_2)/\sqrt2$ to be the exact symmetric ground state and the exact antisymmetric state of second lowest energy. The precise “basic” wave functions $\psi_1$
and $\psi_2$ can then be reconstructed from that.
Note that $\psi_1$ and $\psi_2$ themselves are not energy eigenstates, though they
might be so by approximation. The errors in this approximation, even if small,
will produce the wrong result for the time evolution. (It is the small differences
in energy that drive the nontrivial part of the unsteady evolution.)
A.33 The evolution of expectation values
To verify the stated formulae for the evolution of expectation values, just write
out the definition of the expectation value, $\langle\Psi|A\Psi\rangle$, and differentiate to get
$$\langle\Psi_t|A\Psi\rangle + \langle\Psi|A\Psi_t\rangle + \langle\Psi|A_t\Psi\rangle$$
and replace $\Psi_t$ by $H\Psi/i\hbar$ on account of the Schrödinger equation. Note that
in the first inner product, the i appears in the left part, hence comes out as its
complex conjugate $-i$.
A.34 The virial theorem
The virial theorem says that the expectation value of the kinetic energy of
stationary states is given by
$$\langle T\rangle = \tfrac12\langle\vec r\cdot\nabla V\rangle \qquad \text{(A.34)}$$
Note that according to the calculus rule for directional derivatives, $\vec r\cdot\nabla V = r\,\partial V/\partial r$.
For the $V = \tfrac12c_xx^2 + \tfrac12c_yy^2 + \tfrac12c_zz^2$ potential of a harmonic oscillator,
$x\,\partial V/\partial x + y\,\partial V/\partial y + z\,\partial V/\partial z$ produces 2V. So for energy eigenstates of the
harmonic oscillator, the expectation value of kinetic energy equals that of
the potential energy. And since their sum is the total energy $E_{n_xn_yn_z}$, each must
be $\tfrac12E_{n_xn_yn_z}$.
For the $V = {\rm constant}/r$ potential of the hydrogen atom, $r\,\partial V/\partial r$ produces
$-V$, so the expectation value of kinetic energy equals minus one half that
of the potential energy. And since their sum is the total energy $E_n$, $\langle T\rangle = -E_n$
and $\langle V\rangle = 2E_n$. Note that $E_n$ is negative, so that the kinetic energy is positive
as it should be.
To prove the virial theorem, work out the commutator in
$$\frac{d\langle\vec r\cdot\vec p\rangle}{dt} = \frac{i}{\hbar}\langle[H,\vec r\cdot\vec p]\rangle$$
using the formulae in chapter 3.4.4,
$$\frac{d\langle\vec r\cdot\vec p\rangle}{dt} = 2\langle T\rangle - \langle\vec r\cdot\nabla V\rangle,$$
and then note that the left hand side above is zero for stationary states (in
other words, states with a definite total energy).
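The theorem is readily checked numerically for a bound state. The sketch below (Python/numpy, a finite difference discretization of the one-dimensional harmonic oscillator with $\hbar = m = c = 1$, purely illustrative) compares $\langle T\rangle$ against $\tfrac12\langle x\,\partial V/\partial x\rangle$:

    # Finite-difference check of the virial theorem for the 1-D harmonic
    # oscillator, in units hbar = m = c = 1 so that V = x^2/2.
    import numpy as np

    N, L = 1000, 20.0
    x = np.linspace(-L/2, L/2, N)
    h = x[1] - x[0]
    V = 0.5 * x**2

    # H = -(1/2) d^2/dx^2 + V with a second order finite difference stencil.
    H = (np.diag(1.0 / h**2 + V)
         + np.diag(-0.5 / h**2 * np.ones(N - 1), 1)
         + np.diag(-0.5 / h**2 * np.ones(N - 1), -1))

    E, psi = np.linalg.eigh(H)
    g = psi[:, 0]                          # ground state; E[0] is close to 1/2

    T = E[0] - np.sum(g**2 * V)            # <T> = E - <V>
    virial = 0.5 * np.sum(g**2 * x * x)    # (1/2)<x dV/dx>, which here is <V>
    print(T, virial)                       # both close to 1/4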
A.35 The energy-time uncertainty relationship
As mentioned in chapter 3.4.3, Heisenberg’s formula
$$\Delta p_x\,\Delta x \ge \tfrac12\hbar$$
relating the typical uncertainties in momentum and position is often very convenient for qualitative descriptions of quantum mechanics, especially if you misread $\ge$ as $\approx$.
So, taking a cue from relativity, people would like to write a similar expression for the uncertainty in the time coordinate,
$$\Delta E\,\Delta t \ge \tfrac12\hbar$$
However, if you want to formally justify such an expression in general, it is not
at all obvious what to make of that uncertainty in time ∆t.
To arrive at one definition, assume that the variable of real interest in a
given problem has a time-invariant operator A. The generalized uncertainty
relationship of chapter 3.4.2 between the uncertainties in energy and A is:
$$\sigma_E\sigma_A \ge \tfrac12|\langle[H,A]\rangle|.$$
But $|\langle[H,A]\rangle|$ is just $\hbar\,|d\langle A\rangle/dt|$.
So the Mandelshtam-Tamm version of the energy-time uncertainty principle
just defines the uncertainty in time to be
$$\sigma_t = \sigma_A\bigg/\left|\frac{d\langle A\rangle}{dt}\right|.$$
That corresponds to the typical time in which the expectation value of A changes
by one standard deviation. In other words, it is the time it takes for A to change
to a value sufficiently different that it will clearly show up in measurements.
A.36 The adiabatic theorem
This note derives the adiabatic theorem and then mentions some of its implica-
tions.
A.36.1 Derivation of the theorem
Consider the Schrödinger equation
$$i\hbar\frac{\partial\Psi}{\partial t} = H\Psi$$
If the Hamiltonian is independent of time, the solution can be written in terms
of its eigenvalues $E_{\vec n}$ and eigenfunctions $\psi_{\vec n}$ as
$$\Psi = \sum_{\vec n} c_{\vec n}\,e^{i\theta_{\vec n}}\psi_{\vec n} \qquad \theta_{\vec n} = -\frac1\hbar E_{\vec n}t$$
where $\vec n$ stands for the quantum numbers of the eigenfunctions. But the Hamiltonian varies with time in the systems of interest here. Still, at any given time
its eigenfunctions form a complete set. So it is still possible to write the wave
function as a sum of them, say like
$$\Psi = \sum_{\vec n} c_{\vec n}\,e^{i\theta_{\vec n}}\psi_{\vec n} \qquad \theta_{\vec n} = -\frac1\hbar\int E_{\vec n}\,dt \qquad \text{(A.35)}$$
However, the coefficients $c_{\vec n}$ can no longer be assumed to be constant. They
may be different at different times.
To get an equation for their variation, plug the expression for $\Psi$ into the
Schrödinger equation:
$$i\hbar\sum_{\vec n} c_{\vec n}'e^{i\theta_{\vec n}}\psi_{\vec n} - i\hbar\sum_{\vec n} c_{\vec n}\frac{i}{\hbar}E_{\vec n}e^{i\theta_{\vec n}}\psi_{\vec n} + i\hbar\sum_{\vec n} c_{\vec n}e^{i\theta_{\vec n}}\psi_{\vec n}' = H\sum_{\vec n} c_{\vec n}e^{i\theta_{\vec n}}\psi_{\vec n}$$
where the primes indicate time derivatives. The middle sum in the left hand side
and the right hand side cancel against each other since $\psi_{\vec n}$ is an eigenfunction
of the Hamiltonian. For the remaining two sums, take an inner product with
an arbitrary eigenfunction $\langle\psi_{\vec n}|$:
$$i\hbar c_{\vec n}'e^{i\theta_{\vec n}} + i\hbar\sum_{\underline{\vec n}} c_{\underline{\vec n}}e^{i\theta_{\underline{\vec n}}}\langle\psi_{\vec n}|\psi_{\underline{\vec n}}'\rangle = 0$$
where $\underline{\vec n}$ is the summation index, distinguished by an underline from the fixed
$\vec n$; in the first sum only the term $\underline{\vec n} = \vec n$ survived because of the orthonormality of the eigenfunctions. Divide by $i\hbar e^{i\theta_{\vec n}}$ and rearrange to get
$$c_{\vec n}' = -\sum_{\underline{\vec n}} c_{\underline{\vec n}}\,e^{i(\theta_{\underline{\vec n}}-\theta_{\vec n})}\langle\psi_{\vec n}|\psi_{\underline{\vec n}}'\rangle \qquad \text{(A.36)}$$
This is still exact. However, the objective is now to show that in the adiabatic
approximation of a slowly varying system, all terms in the sum above except
the one with $\underline{\vec n} = \vec n$ can be ignored.
To do so, for simplicity assume that all eigenfunctions are fully nondegen-
erate, as is typical in one-dimensional problems. For example, think of a one-
dimensional harmonic oscillator whose stiffness slowly changes. For such a sys-
tem the eigenfunctions will change in a slow, regular way with the state of
the system, making the time derivative of the eigenfunction in the sum above
small. Then the complete equation implies that every coefficient $c_{\underline{\vec n}}$
also varies slowly with time. And now note that the change in the coefficient $c_{\vec n}$ can
be found from the time integral of the sum. The time integral of the terms with
$\underline{\vec n} \ne \vec n$ is almost exactly zero, since the exponential is periodic on a time scale
that is not slow, and integrates to zero over each period. So, these terms nullify
over each period of the exponential, and never succeed in making a significant
contribution to the change in $c_{\vec n}$.
To see this a bit more mathematically precisely, perform an integration by
parts on such a term:
$$-\int_{\rm period}\frac{i}{\hbar}(E_{\underline{\vec n}}-E_{\vec n})\,e^{i(\theta_{\underline{\vec n}}-\theta_{\vec n})}\,\frac{\hbar c_{\underline{\vec n}}\langle\psi_{\vec n}|\psi_{\underline{\vec n}}'\rangle}{i(E_{\underline{\vec n}}-E_{\vec n})}\,dt =$$
$$\left.\frac{\hbar c_{\underline{\vec n}}\langle\psi_{\vec n}|\psi_{\underline{\vec n}}'\rangle}{i(E_{\underline{\vec n}}-E_{\vec n})}\right|_{\rm start}^{\rm end} - \int_{\rm period} e^{i(\theta_{\underline{\vec n}}-\theta_{\vec n})}\left(\frac{\hbar c_{\underline{\vec n}}\langle\psi_{\vec n}|\psi_{\underline{\vec n}}'\rangle}{i(E_{\underline{\vec n}}-E_{\vec n})}\right)'dt$$
If the evolution takes place over some large typical time T , both terms will be
small of order 1/T 2 over each period, due to the smallness and slow variation of
the fraction. So the change in the coefficient c~n that it causes during the order
T evolution time is small of order 1/T .
The remaining equation (A.36) for the coefficient $c_{\vec n}$ is now readily integrated
as
$$c_{\vec n} = c_{\vec n,0}\,e^{i\gamma_{\vec n}} \qquad \gamma_{\vec n} = i\int\langle\psi_{\vec n}|\psi_{\vec n}'\rangle\,dt$$
where $c_{\vec n,0}$ is a constant that depends on the initial condition for $\Psi$ (and on the
choice of integration constant for $\gamma_{\vec n}$, but usually you take $\gamma_{\vec n}$ zero at the initial
time). This expression for the coefficients can be plugged into (A.35) to find the
wave function $\Psi$.
Note that $\gamma_{\vec n}$ is real, because differentiating the normalization requirement
produces
$$\langle\psi_{\vec n}|\psi_{\vec n}\rangle = 1 \implies \langle\psi_{\vec n}'|\psi_{\vec n}\rangle + \langle\psi_{\vec n}|\psi_{\vec n}'\rangle = 0$$
so the sum of the inner product plus its complex conjugate is zero. That
makes the inner product purely imaginary, and $\gamma_{\vec n}$ real. It follows that the magnitude of the coefficients
$c_{\vec n}$ does not change in time. In particular, if the system starts out in a single
eigenfunction, then it stays in that eigenfunction for as long as its energy remains
non-degenerate.
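This conclusion is easy to test numerically. The sketch below (Python with numpy and scipy, $\hbar = 1$, a made-up two-state Hamiltonian whose direction is slowly rotated over a time T) starts the system in the ground state and reports how well it tracks the instantaneous ground state:

    # Two-state adiabatic evolution with hbar = 1 (illustrative sketch).
    import numpy as np
    from scipy.integrate import solve_ivp

    sz = np.array([[1.0, 0.0], [0.0, -1.0]], dtype=complex)
    sx = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)

    def overlap_after(T):
        # Rotate the Hamiltonian direction from z to x over a time T.
        def rhs(t, psi):
            th = 0.5 * np.pi * t / T
            H = -0.5 * (np.cos(th) * sz + np.sin(th) * sx)
            return -1j * (H @ psi)
        psi0 = np.array([1.0, 0.0], dtype=complex)   # initial ground state
        sol = solve_ivp(rhs, (0.0, T), psi0, rtol=1e-10, atol=1e-10)
        gs = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4)])  # final ground state
        return abs(gs @ sol.y[:, -1])

    for T in (1.0, 10.0, 100.0):
        print(f"T = {T:6.1f}   overlap with ground state: {overlap_after(T):.6f}")
    # The slower the change (the larger T), the closer the overlap stays to 1.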
So what is all this fuss about the adiabatic theorem being so hard to prove?
Well, it gets much messier if it cannot be assumed that all energy eigenfunctions
are non-degenerate. Think of a three-dimensional harmonic oscillator, whose
stiffnesses pass through a point where they are equal in all directions. There
would be massive degeneracy at that point. If eigenfunctions are almost degen-
erate, the smallest perturbation may throw you from one onto another. Sure,
when the adiabatic system is in an almost degenerate eigenstate, ambiguity in
that state is to be expected. But suppose the system is in the non-degenerate
ground state. Should not the adiabatic theorem still apply then, regardless
of the degeneracy of states that the system is not in? Unfortunately the as-
sumption above that the time derivatives of the wave functions are small and
smoothly varying crashes.
Some handle on that problem may be obtained by getting rid of the time
derivative of the eigenfunction, and that can be done by differentiating the
eigenvalue problem for $\psi_{\underline{\vec n}}$ with respect to time and then taking an inner product
with $\langle\psi_{\vec n}|$:
$$(H - E_{\underline{\vec n}})\psi_{\underline{\vec n}} = 0 \implies \langle\psi_{\vec n}|(H-E_{\underline{\vec n}})\psi_{\underline{\vec n}}'\rangle + \langle\psi_{\vec n}|(H'-E_{\underline{\vec n}}')\psi_{\underline{\vec n}}\rangle = 0$$
In the first term, $H - E_{\underline{\vec n}}$ can be taken to the other side of the inner product,
and the term then reduces to $E_{\vec n} - E_{\underline{\vec n}}$ times the desired inner product; also, in
the second term, $E_{\underline{\vec n}}'$ can be dropped because of orthonormality. That gives
$$\langle\psi_{\vec n}|\psi_{\underline{\vec n}}'\rangle = \frac{\langle\psi_{\vec n}|H'\psi_{\underline{\vec n}}\rangle}{E_{\underline{\vec n}} - E_{\vec n}}$$
The time derivative of the Hamiltonian does not have rapid variation in time
if the system is changed slowly, even if there is degeneracy. Plugging the inner
product into (A.36), that becomes
$$c_{\vec n}' = -c_{\vec n}\langle\psi_{\vec n}|\psi_{\vec n}'\rangle + \sum_{\underline{\vec n}\ne\vec n} c_{\underline{\vec n}}\,e^{i(\theta_{\underline{\vec n}}-\theta_{\vec n})}\,\frac{\langle\psi_{\vec n}|H'\psi_{\underline{\vec n}}\rangle}{E_{\vec n}-E_{\underline{\vec n}}}$$

Note the need for the energy level E~n to be nondegenerate; otherwise you would
be dividing by zero.
Some sources now claim the final sum can be ignored because H ′ is small in
an adiabatic process. Unfortunately, while H ′ is indeed small, that is compen-
sated for by the long evolution time. The correct reason is still the oscillating
exponential factor. Maybe you are willing to believe that the bounded coeffi-
cient c~n and order 1/T inner product hψ~n |H ′ ψ~n i do still essentially vary on the
slow time scale T , so that the cancellation over periods remains valid. If not,
there is a solid discussion in the original derivation by Born & Fock (Zeitschrift
für Physik, Vol. 51, p. 165, 1928). More recent derivations allow the spectrum
to be continuous, in which case the “energy gap” $E_{\underline{\vec n}} - E_{\vec n}$ can no longer be
assumed to be larger than some nonzero amount. This note will assume it has
already given much more detail than any engineer would care for.
A.36.2 Some implications
The derivation of the previous subsection gives the wave function of an adiabatic
system as
$$\Psi = \sum_{\vec n} c_{\vec n,0}\,e^{i\gamma_{\vec n}}e^{i\theta_{\vec n}}\psi_{\vec n} \qquad \gamma_{\vec n} = i\int\langle\psi_{\vec n}|\psi_{\vec n}'\rangle\,dt \qquad \theta_{\vec n} = -\frac1\hbar\int E_{\vec n}\,dt \qquad \text{(A.37)}$$
where the c~n,0 are constants. The angle θ~n is called the “dynamic phase” while
the angle γ~n is called the “geometric phase.”
The geometric phase is zero as long as the Hamiltonian is real. The reason
is that real Hamiltonians have real eigenfunctions; then γ~n can only be real, as
it must be, if it is zero.
If the geometric phase is nonzero, you may be able to play games with it.
Suppose first that the Hamiltonian changes with time because some parameter $\lambda$
that it depends on changes. Then the geometric phase can be written as
$$\gamma_{\vec n} = i\int\left\langle\psi_{\vec n}\Big|\frac{\partial\psi_{\vec n}}{\partial\lambda}\right\rangle d\lambda \equiv \int f(\lambda)\,d\lambda$$
It follows that if you bring the system back to the state it started out at, the
total geometric phase is zero, because the limits of integration will be equal.
But now suppose that not one, but a set of parameters $\vec\lambda = (\lambda_1,\lambda_2,\ldots)$
changes during the evolution. Then the geometric phase is
$$\gamma_{\vec n} = i\int\langle\psi_{\vec n}|\nabla_{\vec\lambda}\psi_{\vec n}\rangle\cdot d\vec\lambda \equiv \int f_1(\lambda_1,\lambda_2,\ldots)\,d\lambda_1 + f_2(\lambda_1,\lambda_2,\ldots)\,d\lambda_2 + \ldots$$
and that is not necessarily zero when the system returns to the same state it
started out with. In particular, for two or three parameters, you can immediately see from Stokes’ theorem that the integral along a closed path will not
normally be zero unless $\nabla_{\vec\lambda}\times\vec f = 0$. The geometric phase that an adiabatic
system picks up during such a closed path is called “Berry’s phase.”
You might assume that it is irrelevant since the phase of the wave function
is not observable anyway. But if a beam of particles is sent along two different
paths, the phase difference between the paths will produce interference effects
when the beams merge again.
Systems that do not return to the same state when they are taken around a
closed loop are not just restricted to quantum mechanics. A classical example
is the Foucault pendulum, whose plane of oscillation picks up a daily angular
deviation when the motion of the earth carries it around a circle. Such systems
are called “nonholonomic” or “anholonomic.”
A.37 The two-state approximation of radiation
An atom is really an infinite state system, not a two state system, and the
wave function Ψ is a combination of all infinitely many eigenfunctions. But if
it is assumed that the perturbation is small, and that only the coefficients a
and b of ψL and ψH have non-negligible initial values, then you can ignore the
effects of the other infinitely many coefficients as quadratically small: the small
perturbation level insures that the other coefficients remain correspondingly
small, and in addition their effect on a and b is much smaller still since the
states hardly affect each other when the perturbation is small. (When the
perturbation level is zero, they are energy eigenstates that evolve completely
independently.)
While the other coefficients do not have a big effect on a and b if they
are small, still if you start from the ground state |a| = 1, then b will remain
small and the other coefficients will probably be comparably small. Also, there
is the likelihood that more than two coefficients have a significant magnitude.
Typically, to find out what really happens to a complete system, you need to
separately evaluate all possible transitions as two state systems, and then sum
all the effects you get together.
A.38 Selection rules
This note derives the selection rules for electric dipole transitions between two
hydrogen states $\psi_L = \psi_{n_Ll_Lm_L}{\updownarrow}$ and $\psi_H = \psi_{n_Hl_Hm_H}{\updownarrow}$, where $\updownarrow$ stands for the
spin state. Some selection rules for forbidden transitions are also derived.
Allowed electric dipole transitions must respond to at least one component
of a constant ambient electric field. That means that they must have a nonzero
value for at least one electrical dipole moment,
$$\langle\psi_L|r_i|\psi_H\rangle \ne 0$$
where $r_i$ can be one of $r_1 = x$, $r_2 = y$, or $r_3 = z$ for the three different
components of the electric field.
The trick in identifying when these inner products are zero is based on taking
inner products with cleverly chosen commutators. Since the hydrogen states are
eigenfunctions of $\hat L_z$, the following commutator is useful:
$$\langle\psi_L|[r_i,\hat L_z]|\psi_H\rangle = \langle\psi_L|r_i\hat L_z - \hat L_zr_i|\psi_H\rangle$$
For the $r_i\hat L_z$ term, the operator $\hat L_z$ acts on $\psi_H$ and produces a factor $m_H\hbar$, while
for the $\hat L_zr_i$ term, $\hat L_z$ can be taken to the other side of the inner product and
then acts on $\psi_L$, producing a factor $m_L\hbar$. So:
$$\langle\psi_L|[r_i,\hat L_z]|\psi_H\rangle = (m_H - m_L)\hbar\,\langle\psi_L|r_i|\psi_H\rangle \qquad \text{(A.38)}$$
The final inner product is the dipole moment of interest. Therefore, if a suitable
expression for the commutator in the left hand side can be found, it will fix the
dipole moment.
In particular, according to chapter 3.4.4, $[z,\hat L_z]$ is zero. That means, according
to equation (A.38) above, that the dipole moment $\langle\psi_L|z|\psi_H\rangle$ in the right hand
side will have to be zero too, unless $m_H = m_L$. So the first conclusion is that
the z component of the electric field does not do anything unless the magnetic
quantum numbers are equal. One down, two to go.
For the x and y components, from chapter 3.4.4,
$$[x,\hat L_z] = -i\hbar y \qquad [y,\hat L_z] = i\hbar x$$
Plugging that into (A.38) produces
$$-i\hbar\langle\psi_L|y|\psi_H\rangle = (m_H-m_L)\hbar\langle\psi_L|x|\psi_H\rangle \qquad i\hbar\langle\psi_L|x|\psi_H\rangle = (m_H-m_L)\hbar\langle\psi_L|y|\psi_H\rangle$$
From these equations it is seen that the y dipole moment is zero if the x one is,
and vice-versa. Further, plugging the y dipole moment from the first equation
into the second produces
$$i\hbar\langle\psi_L|x|\psi_H\rangle = \frac{(m_H-m_L)^2\hbar^2}{-i\hbar}\langle\psi_L|x|\psi_H\rangle$$
and if the x dipole moment is nonzero, that requires that (mH − mL )2 is one,
so mH = mL ± 1. It follows that dipole transitions can only occur if mH = mL ,
through the z component of the electric field, or if mH = mL ± 1, through the
x and y components.
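These rules can be spot-checked by direct integration. The sketch below (Python/sympy, with the hydrogen 1s and 2p wave functions written out in units $a_0 = 1$; purely illustrative) evaluates $\langle\psi_{100}|z|\psi_{21m}\rangle$ and finds it nonzero only for m = 0, as required:

    # Spot check of the m selection rule: <psi_100|z|psi_21m>, units a0 = 1.
    import sympy as sp

    r, th, ph = sp.symbols('r theta phi', positive=True)
    psi100 = sp.exp(-r) / sp.sqrt(sp.pi)
    psi21 = {                                # the three hydrogen 2p states
        0: r * sp.exp(-r/2) * sp.cos(th) / (4 * sp.sqrt(2 * sp.pi)),
        1: -r * sp.exp(-r/2) * sp.sin(th) * sp.exp(sp.I*ph) / (8*sp.sqrt(sp.pi)),
        -1: r * sp.exp(-r/2) * sp.sin(th) * sp.exp(-sp.I*ph) / (8*sp.sqrt(sp.pi)),
    }
    z = r * sp.cos(th)
    dV = r**2 * sp.sin(th)                   # spherical volume element

    for m, psi in psi21.items():
        dip = sp.integrate(psi100 * z * psi * dV,
                           (r, 0, sp.oo), (th, 0, sp.pi), (ph, 0, 2*sp.pi))
        print("m =", m, "  <100|z|21m> =", sp.simplify(dip))
    # Only m = 0 survives; m = +-1 die in the phi integration, as they must.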
To derive selection rules involving the azimuthal quantum numbers $l_H$ and $l_L$,
the obvious approach would be to use the commutator $[r_i,\hat L^2]$, since the quantum
number l is produced by $\hat L^2$. However, according to chapter 3.4.4, (3.50), this
commutator will bring in the $\hat{\vec r}\times\hat{\vec L}$ operator, which cannot be handled. The
commutator that works is the second of (3.55):
$$[[r_i,\hat L^2],\hat L^2] = 2\hbar^2\left(r_i\hat L^2 + \hat L^2r_i\right)$$
where by the definition of the commutator
$$[[r_i,\hat L^2],\hat L^2] = (r_i\hat L^2 - \hat L^2r_i)\hat L^2 - \hat L^2(r_i\hat L^2 - \hat L^2r_i) = r_i\hat L^2\hat L^2 - 2\hat L^2r_i\hat L^2 + \hat L^2\hat L^2r_i$$
Evaluating $\langle\psi_L|[[r_i,\hat L^2],\hat L^2]|\psi_H\rangle$ according to each expression and equating the
two gives
$$2\hbar^4[l_H(l_H+1) + l_L(l_L+1)]\langle\psi_L|r_i|\psi_H\rangle = \hbar^4[l_H(l_H+1) - l_L(l_L+1)]^2\langle\psi_L|r_i|\psi_H\rangle$$

For hψL |ri |ψH i to be nonzero, the numerical factors in the left and right hand
sides must be equal,

2[lH (lH + 1) + lL (lL + 1)] = [lH (lH + 1) − lL (lL + 1)]2

The right hand side is obviously zero for lH = lL , so lH − lL can be factored out
of it as
[lH (lH + 1) − lL (lL + 1)]2 = (lH − lL )2 (lH + lL + 1)2
and the left hand side can be written in terms of these same factors as

2[lH (lH + 1) + lL (lL + 1)] = (lH − lL )2 + (lH + lL + 1)2 − 1

Combining the two results and simplifying gives

[(lH − lL )2 − 1][(lH + lL + 1)2 − 1] = 0

The second factor is only zero if lH = lL = 0, but then hψL |ri |ψH i is still zero
because both states are spherically symmetric. It follows that the first factor
will have to be zero for dipole transitions to be possible, and that means that
lH = lL ± 1.
The spin is not affected by the perturbation Hamiltonian, so the dipole
moment inner products are still zero unless the spin magnetic quantum numbers
ms are the same, both spin-up or both spin-down. Indeed, if the electron spin
is not affected by the electric field to the approximations made, then obviously
it cannot change.
Now consider the effect of the magnetic field on transitions. Like the electric
field, the magnetic field can be approximated as spatially constant and quasi-steady. The perturbation Hamiltonian of a constant magnetic field is, according
to chapter 9.6,
$$H_1 = \frac{e}{2m_e}\vec B\cdot\left(\hat{\vec L} + 2\hat{\vec S}\right)$$
Note that now electron spin must be included in the discussion.
According to this perturbation Hamiltonian, the perturbation coefficient
$H_{HL}$ for the z-component of the magnetic field is proportional to
$$\langle\psi_L|\hat L_z + 2\hat S_z|\psi_H\rangle$$
and that is zero because $\psi_H$ (including its spin state) is an eigenfunction of both operators and orthogonal to $\psi_L$. So the z component of the magnetic field does not produce
transitions to different states.
However, the x-component (and similarly the y-component) produces a perturbation coefficient proportional to
$$\langle\psi_L|\hat L_x|\psi_H\rangle + 2\langle\psi_L|\hat S_x|\psi_H\rangle$$
According to chapter 9.1.8, the effect of $\hat L_x$ on a state with magnetic quantum
number $m_H$ is to turn it into a linear combination of two similar states with
magnetic quantum numbers $m_H + 1$ and $m_H - 1$. Therefore, for the first inner
product above to be nonzero, $m_L$ will have to be either $m_H + 1$ or $m_H - 1$. Also
the orbital azimuthal momentum numbers l will need to be the same, and so will
the spin magnetic quantum numbers $m_s$. For the second inner product, it is
the spin magnetic quantum numbers that have to be different by one unit, while
the orbital magnetic quantum numbers must now be equal. So, all together
$$l_H = l_L \qquad m_H = m_L \text{ or } m_L\pm1 \qquad m_{s,H} = m_{s,L} \text{ or } m_{s,L}\pm1 \qquad \text{(A.39)}$$
and either the orbital or the spin magnetic quantum numbers must be unequal.
The logical way to proceed to electric quadrupole transitions would be to
expand the electric field in a Taylor series in terms of y:
$$\vec E = \hat kE_0\cos\big(\omega(t - y/c) - \varphi\big) \approx \hat kE_0\cos(\omega t - \varphi) + \hat k\frac{\omega}{c}E_0\sin(\omega t - \varphi)\,y$$
The first term is the constant electric field of the electric dipole approximation,
and the second would then give the electric quadrupole approximation. However, an electric field in which $E_z$ is a multiple of y is not conservative, so the
electrostatic potential no longer exists.
It is necessary to retreat to the so-called vector potential $\vec A$. It is then
simplest to choose this potential to get rid of the electrostatic potential altogether.
In that case the typical electromagnetic wave is described by the vector potential
$$\vec A = -\hat k\frac1\omega E_0\sin\big(\omega(t - y/c) - \varphi\big) \qquad \vec E = -\frac{\partial\vec A}{\partial t} \qquad \vec B = \nabla\times\vec A$$
In terms of the vector potential, the perturbation Hamiltonian is, chapter
9.3 and assuming a weak field,
$$H_1 = \frac{e}{2m_e}\left(\vec A\cdot\hat{\vec p} + \hat{\vec p}\cdot\vec A\right) + \frac{e}{m_e}\hat{\vec S}\cdot\vec B$$
Ignoring the spatial variation of $\vec A$, this expression produces a Hamiltonian
perturbation coefficient
$$H_{HL} = -\frac{e}{m_e\omega}E_0\sin(\omega t - \varphi)\,\langle\psi_L|\hat p_z|\psi_H\rangle$$
That should be the same as for the electric dipole approximation, since the field
is now completely described by $\vec A$, but it is not quite. The earlier derivation
assumed that the electric field is quasi-steady. However, $\hat p_z$ is equal to the commutator $im_e[H_0,z]/\hbar$, where $H_0$ is the unperturbed hydrogen atom Hamiltonian.
If that is plugged in and expanded, it is found that the expressions are equivalent, provided that the perturbation frequency is close to the frequency of the
photon released in the transition, and that that frequency is sufficiently rapid
that the phase shift from sine to cosine can be ignored. Those are in fact the
normal conditions.
Now consider the second term in the Taylor series of $\vec A$ with respect to y. It
produces a perturbation Hamiltonian
$$\frac{e}{m_e}\frac1cE_0\cos(\omega t - \varphi)\,y\hat p_z$$
The factor $y\hat p_z$ can be trivially rewritten to give
$$\frac{e}{2m_e}\frac1cE_0\cos(\omega t - \varphi)(y\hat p_z - z\hat p_y) + \frac{e}{2m_e}\frac1cE_0\cos(\omega t - \varphi)(y\hat p_z + z\hat p_y)$$
The first term has already been accounted for in the magnetic dipole transitions
discussed above, because the factor within parentheses is $\hat L_x$. The second term
is the electric quadrupole Hamiltonian for the considered wave. As second terms
in the Taylor series, both Hamiltonians will be much smaller than the electric
dipole one as long as the atom is small compared to the wave length $c/\omega$ of
the wave. Therefore they will not make a difference unless the electric dipole
transition is forbidden.
The selection rules for the electric quadrupole Hamiltonian can be narrowed
down with a bit of simple reasoning. First, since the hydrogen eigenfunctions
are complete, applying any operator on an eigenfunction will always produce
a linear combination of eigenfunctions. Now reconsider the derivation of the
electric dipole selection rules above from that point of view. It is then seen
that z only produces eigenfunctions with the same values of m and the values
of l exactly one unit different. The operators x and y change both m and l by
exactly one unit. And the components of linear momentum do the same as the
corresponding components of position, since $\hat p_i = im_e[H_0,r_i]/\hbar$ and $H_0$ does not
change the eigenfunctions, just their coefficients. Therefore $y\hat p_z + z\hat p_y$ produces
only eigenfunctions with azimuthal quantum number l either equal to $l_H$ or to
$l_H \pm 2$, depending on whether the two unit changes reinforce or cancel each
other. Furthermore, it produces only eigenfunctions with m equal to $m_H \pm 1$.
However, $x\hat p_y + y\hat p_x$, corresponding to a wave along another axis, will produce
values of m equal to $m_H$ or to $m_H \pm 2$. Therefore the selection rules become:
$$l_H = l_L \text{ or } l_L\pm2 \qquad m_H = m_L \text{ or } m_L\pm1 \text{ or } m_L\pm2 \qquad m_{s,H} = m_{s,L} \qquad \text{(A.40)}$$
These arguments apply equally well to the magnetic dipole transition, but there
the possibilities are narrowed down much further because the angular momentum operators only produce a couple of eigenfunctions. It may be noted that in
addition, electric quadrupole transitions from $l_H = 0$ to $l_L = 0$ are not possible
because of spherical symmetry.
A.39 About spectral broadening
The fact that there is a frequency range that can be absorbed may seem to vi-
olate the postulate of quantum mechanics that only the eigenvalues are observ-
able. But actually an atom perturbed by an electromagnetic field is a slightly
different system than an unperturbed atom, and will have slightly different en-
ergy eigenvalues. Indeed, the frequency range ω1 is proportional to the strength
of the perturbation, and in the limit of the perturbation strength becoming zero,
only the exact unperturbed frequency will be absorbed.
For some reason, this spectral line broadening due to the strength of the
transmitted light is not mentioned in the references the author has seen. Pre-
sumably it is included in what is called Stark broadening.
The “natural broadening” due to the always present ground state electromagnetic field perturbation is mentioned, but usually ascribed to the energy-time uncertainty $\Delta E\,\Delta t \ge \tfrac12\hbar$, where $\Delta E$ is the uncertainty in energy and $\Delta t$
some sort of uncertainty in time that in this case is claimed to be the typical
life time of the excited state. And of course, a $\ge$ sign is readily changed into an
$\approx$ sign; they are both mathematical symbols, not?
Anyway, considered as a dimensional argument rather than a law of physics,
it does seem to work; if there was no ground state electromagnetic field per-
turbing the atom, then Schrödinger’s equation would have the excited state
surviving forever; ∆t would then be infinite, and the energy values would be
the exact unperturbed ones. And transitions like the 21 cm line of astronomy
that has a life time of 10 million years do indeed have a very small natural
width.
Of course, broadening affects both the absorption spectra (frequencies re-
moved from light that passes through the gas on its way towards us) and the
emission spectra (spontaneously emitted radiation, like the “scattered” radia-
tion re-emitted from absorbed light that passes through the gas not originally
headed in our direction.)
An important other effect that causes spectral line deviations is atom motion,
either thermal motion or global gas motion; it produces a Doppler shift in the
radiation. This is not necessarily bad news; line broadening can provide a hint
about the temperature of the gas you are looking at, while line displacement
can provide a hint of its motion away from you. Line deviations can also be
caused by surrounding atoms and other perturbations.
A.40 Derivation of the Einstein B coefficients
The purpose of this note is to derive the Einstein B coefficients that determine
the transition probability between the energy states of atoms. It is assumed the
atoms are subject to incoherent radiation and frequent elastic collisions with
other atoms. It is also again assumed that there are just two atom energy
eigenfunctions involved, a lower energy one $\psi_L$ and a higher energy one $\psi_H$.
It is assumed that the elastic collisions do not change the average energy
picture; that they do not affect the average probabilities |a|2 and |b|2 of the
eigenfunctions ψL and ψH . However, they are assumed to leave the wave func-
tion of an atom immediately after a collision in some state a0 ψL +b0 ψH in which
a0 and b0 are quite random, especially with respect to their phase. What is now
to be determined in this note is how, until the next collision, the wave function
of the atom will develop under the influence of the electromagnetic field and
how that changes the average probabilities |a|2 and |b|2 .
As noted in subsection 5.2.1, the Schrödinger equation simplifies if you switch
to new variables ā and b̄. These new variables have the same square magnitudes
and initial conditions as a and b. Further, because the Schrödinger equation
(5.17) is linear, the solution for the coefficients ā and b̄ can be written as a sum
of two contributions, one proportional to the initial value a0 and the other to
b0 :
ā = a0 āL + b0 āH b̄ = a0 b̄L + b0 b̄H
Here (āL , b̄L ) is the solution that starts out from the lower energy state (āL , b̄L ) =
(1, 0) while (āH , b̄H ) is the solution that starts out from the higher energy state
(āH , b̄H ) = (0, 1).
Now consider what happens to the probability of an atom to be in the excited
state in the time interval between collisions:

|b̄|2 − |b0 |2 = (b0 + a0 ∆b̄L + b0 ∆b̄H )∗ (b0 + a0 ∆b̄L + b0 ∆b̄H ) − b∗0 b0

Here ∆b̄L indicates the change in b̄L in the time interval between collisions; in
particular ∆b̄L = b̄L since this solution starts from the ground state with bL = 0.
Similarly, the change ∆b̄H equals b̄H − 1 since this solution starts out from the
excited state with bH = 1. Like in section 5.2.8, it will again be assumed that
the changes ∆b̄L and ∆b̄H are small; in view of the Schrödinger equation (5.17),
that is true as long as the typical value of the Hamiltonian coefficient H LH times
the time interval t between the collisions is small. Note also that ∆b̄H will be
quadratically small, since the corresponding solution starts out from aH = 0, so
aH is an additional small factor in the Schrödinger equation (5.17) for bH .
Therefore, if the change in probability $|\bar b|^2$ above is multiplied out, ignoring
terms that are cubically small or less, the result is (remember that for a complex
number c, $c + c^*$ is twice its real part):
$$|\bar b|^2 - |b_0|^2 = 2\Re\left(b_0^*a_0\,\Delta\bar b^L\right) + |a_0|^2|\Delta\bar b^L|^2 + |b_0|^2\,2\Re\left(\Delta\bar b^H\right)$$
Now if this is averaged over all atoms and time intervals between collisions, the
first term in the right hand side will average away. The reason is that it has a
random phase angle, for one since those of $a_0$ and $b_0$ are assumed to be random
after a collision. For a number with a random phase angle, the real part is
just as likely to be positive as negative, so it averages away. Also, for the final
term, $2\Re(\Delta\bar b^H)$ is the approximate change in $|\bar b^H|^2$ in the time interval, and that
equals $-|\Delta\bar a^H|^2$ because of the normalization condition $|\bar a^H|^2 + |\bar b^H|^2 = 1$. So
the relevant expression for the change in probability becomes
$$|\bar b|^2 - |b_0|^2 = |a_0|^2|\Delta\bar b^L|^2 - |b_0|^2|\Delta\bar a^H|^2$$

Summing the changes in the probabilities therefor means summing the changes
in the square magnitudes of ∆b̄L and ∆āH . According to the above, the Einstein
coefficient BL→H is the average change |∆b̄L |2 per unit time.
Now for a single electromagnetic wave, subsection 5.2.8 found that the
change ∆b̄L was given by

e−i(ω−ω0 )t − 1 E0 hψL |ez|ψH i


∆b̄L = b̄L = ω1 eiφ ω1 = (A.41)
2(ω − ω0 ) h̄
and $\Delta\bar a^H$ is given by a virtually identical expression. However, since it is assumed that the atoms are subject to incoherent radiation of all wave numbers $\vec k$
and polarizations p, here $\Delta\bar b^L$ will consist of the sum of all their contributions:
$$\Delta\bar b^L = \sum_{\vec k,p}\Delta\bar b^L_{\vec k,p}$$
(This really assumes that the particles are in a very large periodic box so that
the electromagnetic field is given by a Fourier series; in free space you would
need to integrate over the wave numbers instead of sum over them.) The square
magnitude is then
$$|\Delta\bar b^L|^2 = \sum_{\vec k,p}\sum_{\vec{\underline k},\underline p}\Delta\bar b^{L*}_{\vec k,p}\,\Delta\bar b^L_{\vec{\underline k},\underline p} = \sum_{\vec k,p}|\Delta\bar b^L_{\vec k,p}|^2$$
where the final equality comes from the assumption that the radiation is incoherent, so that the phases of different waves are uncorrelated and the corresponding
products average to zero.
The bottom line is that square magnitudes must be summed together to find
the total contribution of all waves. And the square magnitude of the contribu-
tion of a single wave is, according to (A.41) above,
 ³ ´ 2
1
sin 2
(ω − ω0 )t hψL |ez|ψH i
|∆b̄~Lk,p |2 = 41 |ω1 |2 t2  1
 ω1 ≡ E0
2
(ω − ω0 )t h̄

Now broadband radiation is described in terms of an electromagnetic energy
density ρ(ω); in particular ρ(ω) dω gives the energy per unit volume due to the
electromagnetic waves in an infinitesimal frequency range dω around a frequency
ω. For a single wave, this energy equals $\frac12\epsilon_0 E_0^2$, [20, p. 129]. And the square
amplitudes of different waves simply add up to the total energy; that is the
so-called Parseval equality of Fourier analysis. So to sum the expression above
over all the frequencies ω of the broadband radiation, make the substitution
$E_0^2 = 2\rho(\omega)\,d\omega/\epsilon_0$ and integrate:
 ³ ´ 2
2 Z sin 1
(ω − ω0 )t
|hψL |ez|ψH i| 2 ∞
2
|∆b̄L |2 = t ρ(ω)   dω
2h̄2 ǫ0 ω=0
1
2
(ω − ω0 )t

If a change of integration variable is made to $u = \frac12(\omega-\omega_0)t$, the integral
becomes
$$|\Delta\bar b^L|^2 = \frac{|\langle\psi_L|ez|\psi_H\rangle|^2}{\hbar^2\epsilon_0}\,t\int_{u=-\omega_0 t/2}^{\infty}\rho(\omega_0 + 2u/t)\left(\frac{\sin u}{u}\right)^2 du$$
Recall that a starting assumption underlying these derivations was that ω0 t was
large. So the lower limit of integration can be approximated as −∞. Also,
in the argument of the energy density ρ, the term 2u/t represents a negligible
change in ω0 and can be ignored. Then ρ(ω0 ) is a constant in the integration
and can be taken out. The remaining integral is in table books, [15, 18.36], and
the result is
$$|\Delta\bar b^L|^2 = \frac{\pi\,|\langle\psi_L|ez|\psi_H\rangle|^2}{\hbar^2\epsilon_0}\,\rho(\omega_0)\,t$$
This must still be averaged over all directions of wave propagation and po-
larization. That gives:

$$|\Delta\bar b^L|^2 = \frac{\pi\,|\langle\psi_L|e\vec r|\psi_H\rangle|^2}{3\hbar^2\epsilon_0}\,\rho(\omega_0)\,t$$
where
$$|\langle\psi_L|e\vec r|\psi_H\rangle|^2 = |\langle\psi_L|ex|\psi_H\rangle|^2 + |\langle\psi_L|ey|\psi_H\rangle|^2 + |\langle\psi_L|ez|\psi_H\rangle|^2.$$

To see why, consider the electromagnetic waves propagating along any axis,
not just the y axis, and polarized in either of the other two axial directions.
These waves will include ex and ey as well as ez in the transition probability,
making the average as shown above. And of course, waves propagating in an
oblique rather than axial direction are simply axial waves when seen in a rotated
coordinate system and produce the same average.
The Einstein coefficient $B_{L\to H}$ is the average change per unit time, so the
claimed (5.36) results from dividing by the time t between collisions. There is no
need to do $B_{H\to L}$ separately from ∆āL; it follows immediately from subsection
5.2.2 that it is the same.
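For the record, the table-book integral used above is easily double-checked on a computer. A minimal Python sketch (the grid spacing and cutoff are arbitrary choices):

import numpy as np

# check: integral of (sin u / u)^2 over all u equals pi
du = 1.0e-3
u = (np.arange(4_000_000) + 0.5) * du        # midpoints on (0, 2000)
f = (np.sin(u) / u)**2
print(2.0 * f.sum() * du, np.pi)             # ~3.1411; the rest is the tail beyond u = 2000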

A.41 Parseval and the Fourier inversion theorem
This note discusses Parseval's relation and its connection to the Fourier inversion
theorem. The analysis is in one dimension, but the extension to three dimensions
is straightforward.
The Fourier inversion theorem is most simply derived from the formulae
for Fourier series as discussed in note {A.5}. Rescale the Fourier series to an
arbitrary period, and then take the limit of the period going to infinity to get
the Fourier integral expressions. The Fourier series itself becomes an integral in
the limit.

The same way, you can verify that the inner product hΨ1 |Ψ2 i of any two
wave functions is the same as the inner product hΦ1 |Φ2 i of the correspond-
ing momentum space wave functions. This very important result is known as
“Parseval’s relation.”
In particular the norm of any wave function is the same as that of the
corresponding momentum space wave function. That is especially relevant for
quantum mechanics, where both norms must be one: the particle must be found
somewhere, and it must be found with some linear momentum.
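The discrete analog is easy to see on a computer too. The fast Fourier transform of numpy, called with norm="ortho", is a unitary operation, so it preserves the norm of any vector; the wave function below is just an arbitrary made-up example:

import numpy as np

x = np.linspace(-20.0, 20.0, 4096, endpoint=False)
Psi = np.exp(-x**2) * np.exp(3j * x)      # an arbitrary example wave function
Phi = np.fft.fft(Psi, norm="ortho")       # unitary discrete Fourier transform
# Parseval: the two norms agree to machine precision
print(np.sum(np.abs(Psi)**2), np.sum(np.abs(Phi)**2))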
To be sure, the mentioned derivations of these properties based on converting
Fourier series into integrals only work for well behaved functions. But to show
that it also works for nasty wave functions, you can set up a limiting process
in which you approximate the nasty functions increasingly accurately using
well behaved ones. And since the differences between functions are the same
for the corresponding momentum space wave functions because of Parseval,
the momentum space wave functions converge to the momentum space wave
function of the nasty wave function.
To show that rigorously is a messy exercise, to be sure, requiring the abstract
Lebesgue variant of the theory of integration. The resulting “official version”
of the inversion theorem is called “Plancherel’s theorem.” It applies as long as
the nasty functions are square integrable (in the Lebesgue sense).
That is a very suitable version of the inversion theorem for quantum mechan-
ics where the functions are normally square integrable because of the normal-
ization requirement. But it is hardly the last word, as you may have been told
elsewhere. A lot of functions that are not square integrable have meaningful,
invertible Fourier transforms. For example, functions whose square magnitude
integrals are infinite, but absolute value integrals are finite can still be mean-
ingfully transformed. That is more or less the classical version of the inversion
theorem, in fact. (See D.C. Champeney, A Handbook of Fourier Theorems, for
more.)

A.42 Derivation of group velocity


The objective of this note is to derive the wave function for a wave packet if
time is large.
To shorten the writing, the Fourier integral (5.54) for Ψ will be abbreviated
as:
$$\Psi = \int_{k_1}^{k_2} f(k)\,e^{i\varphi t}\,dk \qquad \varphi = \frac{x}{t}\,k - \omega \qquad \varphi' = \frac{x}{t} - v_g \qquad \varphi'' = -v_g'$$
where it will be assumed that ϕ is a well behaved function of k and f is at least
twice continuously differentiable. Note that the wave number k0 at which the
group velocity equals x/t is a stationary point for ϕ. That is the key to the
mathematical analysis.
The so-called “method of stationary phase” says that the integral is neg-
ligibly small as long as there are no stationary points ϕ′ = 0 in the range of
integration. Physically that means that the wave function is zero at large time
positions that cannot be reached with any group velocity within the range of
the packet. It therefore implies that the wave packet propagates with the group
velocity, within the variation that it has.
To see why the integral is negligible if there are no stationary points, just
integrate by parts:
$$\Psi = \left.\frac{f(k)}{i\varphi' t}\,e^{i\varphi t}\right|_{k_1}^{k_2} - \int_{k_1}^{k_2}\left(\frac{f(k)}{i\varphi' t}\right)' e^{i\varphi t}\,dk$$

This is small of order 1/t for large times. And if Φ0 (p) is chosen to smoothly
become zero at the edges of the wave packet, rather than abruptly, you can
keep integrating by parts to show that the wave function is much smaller still.
That is important if you have to plot a wave packet for some book on quantum
mechanics and want to have its surroundings free of visible perturbations.
For large time positions with x/t values within the range of packet group
velocities, there will be a stationary point to ϕ. The wave number at the station-
ary point will be indicated by k0 , and the value of ϕ and its second derivative by
ϕ0 and ϕ′′0 . (Note that the second derivative is minus the first derivative of the
group velocity, and will be assumed to be nonzero in the analysis. If it would
be zero, nontrivial modifications would be needed.)
Now split the exponential in the integral into two,
$$\Psi = e^{i\varphi_0 t}\int_{k_1}^{k_2} f(k)\,e^{i(\varphi-\varphi_0)t}\,dk$$

It is convenient to write the difference in ϕ in terms of a new variable $\bar k$:
$$\varphi - \varphi_0 = \tfrac12\varphi_0''\,\bar k^2 \qquad \bar k \sim k - k_0 \;\text{ for }\; k \to k_0$$
By Taylor series expansion it can be seen that $\bar k$ is a well behaved monotonous
function of k. The integral becomes in terms of $\bar k$:
$$\Psi = e^{i\varphi_0 t}\int_{\bar k_1}^{\bar k_2} g(\bar k)\,e^{i\frac12\varphi_0''\bar k^2 t}\,d\bar k \qquad g(\bar k) = f(k)\,\frac{dk}{d\bar k}$$
Now split the function g apart as in
$$g(\bar k) = g(0) + \left[g(\bar k) - g(0)\right]$$
The part within brackets produces an integral
$$e^{i\varphi_0 t}\int_{\bar k_1}^{\bar k_2}\frac{g(\bar k)-g(0)}{i\varphi_0''\,\bar k\,t}\;i\varphi_0''\,\bar k\,t\;e^{i\frac12\varphi_0''\bar k^2 t}\,d\bar k$$

and integration by parts shows that to be small of order 1/t.


That leaves the first part, g(0) = f (k0 ), which produces
$$\Psi = e^{i\varphi_0 t}\,f(k_0)\int_{\bar k_1}^{\bar k_2} e^{i\frac12\varphi_0''\bar k^2 t}\,d\bar k$$

Change to a new integration variable


$$u \equiv \sqrt{\frac{|\varphi_0''|\,t}{2}}\;\bar k$$
Note that since time is large, the limits of integration will be approximately
u1 = −∞ and u2 = ∞ unless the stationary point is right at an edge of the
wave packet. The integral becomes
$$\Psi = e^{i\varphi_0 t}\,f(k_0)\sqrt{\frac{2}{|\varphi_0''|\,t}}\int_{u_1}^{u_2} e^{\pm iu^2}\,du$$

where ± is the sign of ϕ′′0 . The remaining integral is a “Fresnel integral” that
can be looked up in a table book. Away from the edges of the wave packet, the
integration range can be taken as all u, and then
$$\Psi = e^{i\varphi_0 t}\,e^{\pm i\pi/4}\,f(k_0)\sqrt{\frac{2\pi}{|\varphi_0''|\,t}}$$

Convert back to the original variables and there you have the claimed expression
for the large time wave function.
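If you want to see the formula at work, the following Python sketch compares the integral, done numerically, with the asymptotic expression, for a made-up example with dispersion relation ω = k², so that vg = 2k and ϕ″₀ = −2 (the minus sign selecting e^{−iπ/4}); all the numbers are arbitrary choices:

import numpy as np

def f(k):                         # a smooth bump on (0.5, 1.5), zero beyond the edges
    s = 2.0 * (k - 1.0)
    return np.exp(-1.0 / (1.0 - s * s))

t = 400.0
k0 = 1.0                          # make k0 the stationary point: x/t = v_g(k0) = 2 k0
x = 2.0 * k0 * t

dk = 1.0 / 400_000.0
k = 0.5 + (np.arange(400_000) + 0.5) * dk        # midpoint rule over (0.5, 1.5)
psi_num = np.sum(f(k) * np.exp(1j * (k * x - k**2 * t))) * dk

phi0 = k0 * (x / t) - k0**2                      # phi at the stationary point
psi_asy = (np.exp(1j * phi0 * t) * np.exp(-1j * np.pi / 4)
           * f(1.0) * np.sqrt(2.0 * np.pi / (2.0 * t)))
print(psi_num, psi_asy)           # should agree closely; the error decays like 1/t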
Right at the edges of the wave packet, modified integration limits for u must
be used, and the result above is not valid. In particular it can be seen that the
wave packet spreads out a distance of order $\sqrt t$ beyond the stated wave packet
range; however, for large times $\sqrt t$ is small compared to the size of the wave
packet, which is proportional to t.
For the mathematically picky: the treatment above assumes that the wave
packet momentum range is not small in an asymptotic sense, (i.e. it does not
go to zero when t becomes infinite.) It is just small in the sense that the group
velocity must be monotonous. However, Kaplun’s extension theorem implies
that the packet size can be allowed to become zero at least slowly. And the
analysis is readily adjusted for faster convergence towards zero in any case.

A.43 Details of the animations
This note explains how the wave packet animations of sections 5.5 and 5.7
were obtained. If you want a better understanding of unsteady solutions of the
Schrödinger equation and their boundary conditions, this is a good place to
start. In fact, deriving such solutions is a popular item in quantum mechanics
books for physicists.
First consider the wave packet of the particle in free space, as shown in
subsection 5.5.1. An energy eigenfunction with energy E in free space takes the
general form
$$\psi_E = C_f\,e^{ipx/\hbar} + C_b\,e^{-ipx/\hbar} \qquad p = \sqrt{2mE}$$
where p is the momentum that a classical particle with that energy would have
and Cf and Cb are constants.
To study a single wave packet coming in from the far left, the coefficient Cb
has to be set to zero. The reason was worked out in section 5.4: combinations of
exponentials of the form Cb e−ipx/h̄ produce wave packets that propagate back-
wards in x, from right to left. Therefore, a nonzero value for Cb would add an
unwanted second wave packet coming in from the far right.

Figure A.4: Example energy eigenfunction $e^{ipx/\hbar}$ for the particle in free space.

With only the coefficient Cf of the forward moving part left, you may as well
scale the eigenfunction so that Cf = 1, simplifying it to
$$\psi_E = e^{ipx/\hbar}$$
A typical example is shown in figure A.4. Plus and minus the magnitude of the
eigenfunction are shown in black, and the real part is shown in red. This wave
function is an eigenfunction of linear momentum, with p the linear momentum.
To produce a coherent wave packet, eigenfunctions with somewhat different
energies E have to be combined together. Since the momentum is given by
$p = \sqrt{2mE}$, different energy means different momentum p; therefore the wave
packet can be written as
$$\Psi(x,t) = \int_{\text{all }p} c(p)\,e^{-iEt/\hbar}\,\psi_E(x)\,dp \tag{A.42}$$

where c(p) is some function that is only nonzero in a relatively narrow range of
momenta p around the nominal momentum. Except for that basic requirement,

the choice of the function c(p) is quite arbitrary. Choose some suitable function
c(p), then use a computer to numerically integrate the above integral at a large
number of plot points and times. Dump the results into your favorite animation
software and bingo, out comes the movie.
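In the units discussed near the end of this note (ħ = 1, nominal energy 1, particle mass ½, so that E = p² and ψE = e^{ipx}), a bare-bones version of that computation might look as follows; the function c(p) is the final choice given near the end of this note, and all grid sizes are arbitrary:

import numpy as np

p_nom, r = 1.0, 0.15
dp = 2.0 * p_nom / 2000.0
p = (np.arange(2000) + 0.5) * dp             # midpoints of (0, 2 p_nom)
c = np.exp(-(p - p_nom)**2
           / (r**2 * (p_nom**2 - (p - p_nom)**2)))   # compact-packet weights

x = np.linspace(-50.0, 50.0, 1000)
waves = np.exp(1j * np.outer(x, p))          # psi_E = e^{i p x} at every plot point
for t in (0.0, 10.0, 20.0, 30.0):            # a few frames of the "movie"
    Psi = waves @ (c * np.exp(-1j * p**2 * t) * dp)  # mid point rule for (A.42)
    print(t, float(np.abs(Psi).max()))       # in real life, plot |Psi| against x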

Figure A.5: Example energy eigenfunction for a particle entering a constant
accelerating force field. To the left of point A the eigenfunction is $C_f^l e^{ip_l x/\hbar} + C_b^l e^{-ip_l x/\hbar}$; to the right it is $C^r[\mathrm{Bi}(\bar x) + i\,\mathrm{Ai}(\bar x)]$.

Next consider the animation of subsection 5.5.2, where the particle acceler-
ates along a downward potential energy ramp starting from point A. A typical
energy eigenfunction is shown in figure A.5. Since to the left of point A, the
potential energy is still zero, in that region the energy eigenfunction is still of
the form

$$\psi_E = C_f^l\,e^{ip_l x/\hbar} + C_b^l\,e^{-ip_l x/\hbar} \;\text{ for } x < x_A \qquad p_l = \sqrt{2mE}$$

where pl is the momentum that a classical particle would have in the left region.
In this case, it can no longer be argued that the coefficient Cbl must be zero
to avoid a packet entering from the far right. After all, the Cbl e−ipl x/h̄ term does
not extend to the far right anymore. To the right of point A, the potential
changes linearly with position, and the exponentials are no longer valid.
In fact, it is known that the solution of the Hamiltonian eigenvalue problem
in a region with a linearly varying potential is a combination of two weird
functions Ai and Bi that are called the “Airy” functions. The bad news is
that if you are interested in learning more about their properties, you will need
an advanced mathematical handbook like [1] or at least look at note {A.45}.
The good news is that free software to evaluate these functions and their first
derivatives is readily available on the web. The general solution for a linearly
varying potential is of the form
$$\psi_E = C_B\,\mathrm{Bi}(\bar x) + C_A\,\mathrm{Ai}(\bar x) \qquad \bar x = \sqrt[3]{\frac{2mV'}{\hbar^2}}\;\frac{V-E}{V'} \qquad V' \equiv \frac{dV}{dx}$$

Note that (V − E)/V ′ is the x-position measured from the point where V = E.
It may be deduced from the approximate analysis of section 5.6 that to
prevent a second wave packet coming in from the far right, Ai and Bi must
appear together in the combination Bi + iAi as shown in figure A.5. The fact

that no second packet comes in from the far right in the animation can be taken
as an experimental confirmation of that result, so there seems little justification
to go over the messy argument.
To complete the determination of the eigenfunction for a given value of E,
the constants Cfl , Cbl and C r must still be determined. That goes as follows. For
now, assume that C r has the provisional value cr = 1. Then provisional values
cfl and clb for the other two constants may be found from the requirements that
the left and right regions give the same values for ψE and dψE /dx at the point
A in figure A.5 where they meet:

$$c_f^l\,e^{ip_l x_A/\hbar} + c_b^l\,e^{-ip_l x_A/\hbar} = c^r\left[\mathrm{Bi}(\bar x_A) + i\,\mathrm{Ai}(\bar x_A)\right]$$
$$\frac{ip_l}{\hbar}\,c_f^l\,e^{ip_l x_A/\hbar} - \frac{ip_l}{\hbar}\,c_b^l\,e^{-ip_l x_A/\hbar} = c^r\left[\mathrm{Bi}'(\bar x_A) + i\,\mathrm{Ai}'(\bar x_A)\right]\frac{d\bar x}{dx}$$
That is equivalent to two equations for the two constants cfl and clb , since every-
thing else can be evaluated, using the mentioned software. So cfl and clb can be
found from solving these two equations.
As the final step, it is desirable to normalize the eigenfunction ψE so that
Cfl = 1. To do so, the entire provisional eigenfunction can be divided by cfl ,
giving Cbl = clb /cfl and C r = cr /cfl . The energy eigenfunction has now been
found. And since Cfl = 1, the eipl x/h̄ term is exactly the same as the free space
energy eigenfunction of the first example. That means that if the eigenfunctions
ψE are combined into a wave packet in the same way as in the free space case,
(A.42) with p replaced by pl , the eipl x/h̄ terms produce the exact same wave
packet coming in from the far left as in the free space case.
For larger times, the Cbl e−ipl x/h̄ terms produce a “reflected” wave packet that
returns toward the far left. Note that e−ipl x/h̄ is the complex conjugate of eipl x/h̄ ,
and it can be seen from the unsteady Schrödinger equation that if the complex
conjugate of a wave function is taken, it produces a reversal of time. Wave
packets coming in from the far left at large negative times become wave packets
leaving toward the far left at large positive times. However, the constant Cbl
turns out to be very small in this case, so there is little reflection.
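As an illustration only, here is a sketch of the above procedure in Python, using the Airy function implementation of scipy rather than the netlib routines mentioned at the end of this note. The energy and ramp slope are made-up numbers, and the units are the ones suggested at the end of this note, ħ = 1 and m = ½:

import numpy as np
from scipy.special import airy

hbar, m = 1.0, 0.5
E, F, xA = 1.0, 0.05, 0.0            # energy, (gentle) ramp slope, location of A
pl = np.sqrt(2.0 * m * E)            # classical momentum left of point A

Vp = -F                              # V' on the ramp V = -F (x - xA), x > xA
s = np.cbrt(2.0 * m * Vp / hbar**2)  # stretching factor in x-bar; also d(x-bar)/dx
xbarA = s * (0.0 - E) / Vp           # x-bar at point A, where V = 0
Ai, Aip, Bi, Bip = airy(xbarA)       # Airy functions and their first derivatives

cr = 1.0                             # provisional value of C^r
# matching of psi_E and d psi_E/dx at point A: two equations, two unknowns
M = np.array([[np.exp(1j * pl * xA / hbar), np.exp(-1j * pl * xA / hbar)],
              [(1j * pl / hbar) * np.exp(1j * pl * xA / hbar),
               (-1j * pl / hbar) * np.exp(-1j * pl * xA / hbar)]])
rhs = np.array([cr * (Bi + 1j * Ai), cr * (Bip + 1j * Aip) * s])
cfl, clb = np.linalg.solve(M, rhs)
# renormalize so Cfl = 1; |Cbl| should come out small for a gentle ramp
print("Cbl =", clb / cfl, "  Cr =", cr / cfl)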

Figure A.6: Example energy eigenfunction for a particle entering a constant
decelerating force field. To the left of point A the eigenfunction is $C_f^l e^{ip_l x/\hbar} + C_b^l e^{-ip_l x/\hbar}$; to the right it is $C^r\,\mathrm{Ai}(\bar x)$.

Next consider the animation of subsection 5.5.3, where the particle is turned
back by an upward potential energy ramp. A typical energy eigenfunction for
this case is shown in figure A.6. Unlike in the previous example, where the
argument $\bar x$ of the Airy functions was negative at the far right, here it is positive.
Table books that cover the Airy functions will tell you that the Airy function
Bi blows up very strongly with increasing positive argument $\bar x$. Therefore, if
the solution in the right hand region would involve any amount of Bi, it would
locate the particle at infinite x for all times. For a particle not at infinity, the
solution in the right hand region can only involve the Airy function Ai. That
function decays rapidly with positive argument $\bar x$, as seen in figure A.6.
The further determination of the energy eigenfunctions proceeds along the
same lines as in the previous example: give C r a provisional value cr = 1, then
compute cfl and clb from the requirements that the left and right regions produce
the same values for ψ and dψ/dx at the point A where they meet. Finally divide
the eigenfunction by cfl . The big difference is that now Cbl is no longer small;
Cbl turns out to be of unit magnitude just like Cfl . It means that the incoming
wave packet is reflected back completely.

Figure A.7: Example energy eigenfunction $h_{50}(x)$ for the harmonic oscillator.

For the harmonic oscillator of subsection 5.5.4, the analysis is somewhat
different. In particular, chapter 2.6.2 showed that the energy levels of the one-dimensional harmonic oscillator are discrete,
$$E_n = \frac{2n+1}{2}\,\hbar\omega \quad\text{for } n = 0, 1, 2, \ldots$$
so that unlike the motions just discussed, the solution of the Schrödinger equa-
tion is a sum, rather than the integral (A.42),

$$\Psi(x,t) = \sum_{n=0}^{\infty} c_n\,e^{-iE_n t/\hbar}\,h_n(x)$$

However, for large n the difference between summation and integration is small.
Also, while the energy eigenfunctions hn (x) are not exponentials as for the
free particle, for large n they can be pairwise combined to approximate such

exponentials. For example, eigenfunction h50 , shown in figure A.7, behaves
near the center point much like a cosine if you scale it properly. Similarly, h51
behaves much like a sine. A cosine plus i times a sine gives an exponential,
according to the Euler formula (1.5). Create similar exponential combinations
of eigenfunctions with even and odd values of n for a range of n values, and
there are the approximate exponentials that allow you to create a wave packet
that is at the center point at time t = 0. In the animation, the range of n
values was centered around n = 50, making the nominal energy a hundred times
the ground state energy. The exponentials degenerate over time, since their
component eigenfunctions have slightly different energy, hence time evolution.
That explains why after some time, the wave packet can return to the center
point going the other way.
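For those who want to evaluate such high-n eigenfunctions themselves: it can be done stably by recursing on the normalized eigenfunctions directly, rather than evaluating the Hermite polynomials. A sketch in Python, in terms of the usual dimensionless coordinate ξ = x√(mω/ħ); the grid is an arbitrary choice:

import numpy as np

def h_functions(nmax, xi):
    # normalized oscillator eigenfunctions h_0 ... h_nmax on the grid xi, via the
    # stable recurrence h_{n+1} = sqrt(2/(n+1)) xi h_n - sqrt(n/(n+1)) h_{n-1}
    h = np.zeros((nmax + 1, xi.size))
    h[0] = np.pi**-0.25 * np.exp(-0.5 * xi**2)
    if nmax > 0:
        h[1] = np.sqrt(2.0) * xi * h[0]
    for n in range(1, nmax):
        h[n + 1] = (np.sqrt(2.0 / (n + 1)) * xi * h[n]
                    - np.sqrt(n / (n + 1.0)) * h[n - 1])
    return h

xi = np.linspace(-15.0, 15.0, 3001)
h = h_functions(52, xi)
print(np.sum(h[50]**2) * (xi[1] - xi[0]))   # ~1.0: h_50 stays normalized, no blow-up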

Figure A.8: Example energy eigenfunction for a particle encountering a brief
accelerating force. Left of point A: $C_f^l e^{ip_l x/\hbar} + C_b^l e^{-ip_l x/\hbar}$; between points A and B: $C_B^m\,\mathrm{Bi}(\bar x) + C_A^m\,\mathrm{Ai}(\bar x)$; right of point B: $C^r e^{ip_r x/\hbar}$.

For the particle of section 5.7.1 that encounters a brief accelerating force,
an example eigenfunction looks like figure A.8. In this case, the solution in
the far right region is similar to the one in the far left region. However, there
cannot be a term of the form e−ipr x/h̄ in the far right region, because when
the eigenfunctions are combined, it would produce an unwanted wave packet
coming in from the far right. In the middle region of linearly varying potential,
the wave function is again a combination of the two Airy functions. The way
to find the constants now has an additional step. First give the constant C r of
the far right exponential the provisional value cr = 1 and from that, compute
provisional values $c_A^m$ and $c_B^m$ by demanding that the Airy functions give the
same values for ψ and dψ/dx as the far-right exponential at point B, where
they meet. Next compute provisional values cfl and clb by demanding that the
far-left exponentials give the same values for ψ and dψ/dx as the Airy functions
at point A, where they meet. Finally, divide all the constants by cfl to make
Cfl = 1.
For the tunneling particle of section 5.7.2, an example eigenfunction is as
shown in figure A.9. In this case, the solution in the middle part is not a
combination of Airy functions, but of real exponentials. It is essentially the same
solution as in the left and right parts, but in the middle region the potential
energy is greater than the total energy, making $p_m = \sqrt{2m(E-V)}$ an imaginary
number. Therefore the arguments of the exponentials become real when written
in terms of the absolute value of the momentum $|p_m| = \sqrt{2m(V-E)}$. The rest
of the analysis is similar to that of the previous example.

Figure A.9: Example energy eigenfunction for a particle tunneling through a
barrier. Left of point A: $C_f^l e^{ip_l x/\hbar} + C_b^l e^{-ip_l x/\hbar}$; between points A and B: $C_p^m e^{|p_m|x/\hbar} + C_n^m e^{-|p_m|x/\hbar}$; right of point B: $C^r e^{ip_l x/\hbar}$.

Figure A.10: Example energy eigenfunction for tunneling through a delta function barrier. Left of point A: $C_f^l e^{ip_l x/\hbar} + C_b^l e^{-ip_l x/\hbar}$; right of A: $C^r e^{ip_l x/\hbar}$.

For the particle tunneling through the delta function potential in section
5.7.2, an example energy eigenfunction is shown in figure A.10. The potential
energy in this case is $V = \nu\,\delta(x-x_A)$, where $\delta(x-x_A)$ is a spike at A that
integrates to one and the strength ν is a chosen constant. In the example, ν was
chosen to be $\sqrt{2\hbar^2 E_{\rm nom}/m}$ with $E_{\rm nom}$ the nominal energy. For that strength,
half the wave packet will pass through.
For a delta function potential, a modification must be made in the analysis
as used so far. Examination of figure A.10 shows that there are kinks in the
energy eigenfunction at the location A of the delta function. Hence, the left
and right expressions for the eigenfunction do not predict the same value for its
derivative dψ/dx at point A. To find the difference, integrate the Hamiltonian
eigenvalue problem from a point a very short distance ε before point A to a
point the same very short distance behind it:

$$-\frac{\hbar^2}{2m}\int_{x=x_A-\varepsilon}^{x_A+\varepsilon}\frac{d^2\psi}{dx^2}\,dx \;+\; \nu\int_{x=x_A-\varepsilon}^{x_A+\varepsilon}\delta(x-x_A)\,\psi\,dx \;=\; \int_{x=x_A-\varepsilon}^{x_A+\varepsilon}E\,\psi\,dx$$

The integral in the right hand side is zero because of the vanishingly small inter-
val of integration. But the delta function spike in the left hand side integrates
to one regardless of the small integration range, so
$$-\frac{\hbar^2}{2m}\left.\frac{d\psi}{dx}\right|_{x_A-\varepsilon}^{x_A+\varepsilon} + \nu\,\psi(x_A) = 0$$

For vanishingly small ε, dψ/dx at xA + ε becomes what the right hand part of
the eigenfunction gives for dψ/dx at xA , while dψ/dx at xA − ε stands for what
the left hand part gives for it. As seen from the above equation, the difference
is not zero, but 2mνψ(xA )/h̄2 .
So the correct equations for the provisional constants are in this case

$$c_f^l\,e^{ip_l x_A/\hbar} + c_b^l\,e^{-ip_l x_A/\hbar} = c^r\,e^{ip_l x_A/\hbar}$$
$$\frac{ip_l}{\hbar}\,c_f^l\,e^{ip_l x_A/\hbar} - \frac{ip_l}{\hbar}\,c_b^l\,e^{-ip_l x_A/\hbar} = \frac{ip_l}{\hbar}\,c^r\,e^{ip_l x_A/\hbar} - \frac{2m\nu}{\hbar^2}\,c^r\,e^{ip_l x_A/\hbar}$$
Compared to the analysis as used previously, the difference is the final term in
the second equation that is added by the delta function.
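These two equations are trivially solved on a computer. A sketch in Python, in the units introduced below (ħ = 1, m = ½) and with the strength ν = √(2ħ²Enom/m) used in the example; the printed transmitted fraction comes out as ½, the "half passes through" mentioned above:

import numpy as np

hbar, m, E, xA = 1.0, 0.5, 1.0, 0.0
pl = np.sqrt(2.0 * m * E)                 # = 1 in these units
nu = np.sqrt(2.0 * hbar**2 * E / m)       # the strength used in the example

cr = 1.0                                  # provisional value of C^r
ex = np.exp(1j * pl * xA / hbar)
M = np.array([[ex, 1.0 / ex],
              [(1j * pl / hbar) * ex, (-1j * pl / hbar) / ex]])
rhs = np.array([cr * ex, ((1j * pl / hbar) - 2.0 * m * nu / hbar**2) * cr * ex])
cfl, clb = np.linalg.solve(M, rhs)
print("transmitted fraction:", abs(cr / cfl)**2)   # 0.5 for this strength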
The remainder of this note gives some technical details for if you are actually
planning to do your own animations. It is a good idea to assume that the units
of mass, length, and time are chosen such that h̄ and the nominal energy are one,
while the mass of the particle is one-half. That avoids having to guesstimate
suitable values for all sorts of very small numbers. The Hamiltonian eigenvalue
problem then simplifies to
$$-\frac{d^2\psi}{dx^2} + V\,\psi = E\,\psi$$
where the values of E of interest cluster around 1. The nominal momentum will
be one too. In those units, the length of the plotted range was one hundred in
all but the harmonic oscillator case.
It should be noted that to select a good function c(p) in (A.42) is somewhat
of an art. The simplest idea would be to choose c(p) equal to one in some limited
range around the nominal momentum, and zero elsewhere, as in

$$c(p) = 1 \;\text{ if }\; (1-r)\,p_{\rm nom} < p < (1+r)\,p_{\rm nom} \qquad c(p) = 0 \;\text{ otherwise}$$

where r is the relative deviation from the nominal momentum below which
c(p) is nonzero. However, it is known from Fourier analysis that the locations
where c(p) jumps from one to zero lead to lengthy wave packets when viewed
in physical space, {A.42}. Functions c(p) that do lead to nice compact wave
packets are known to be of the form
à !
(p − pnom )2
c(p) = exp −
r2 p2nom

And that is essentially the function c(p) used in this study. The typical width
of the momentum range was chosen to be r = 0.15, or 15%, by trial and error.
However, it is nice if c(p) becomes not just very small, but exactly zero beyond
some point, for one because it cuts down on the number of energy eigenfunctions

that have to be evaluated numerically. Also, it is nice not to have to worry about
the possibility of p being negative in writing energy eigenfunctions. Therefore,
the final function used was

à !
(p − pnom )2
c(p) = exp − 2 2 for 0 < p < 2pnom c(p) = 0 otherwise
r [pnom − (p − pnom )2 ]

The actual difference in numerical values is small, but it does make c(p) exactly
zero for negative momenta and those greater than twice the nominal value.
Strictly speaking, c(p) should still be multiplied by a constant to make the total
probability of finding the particle equal to one. But if you do not tell people
what numbers for Ψ are on the vertical axes, you do not need to bother.
In doing the numerical integrations to find Ψ(x, t), note that the mid point
and trapezium rules of numerical integration are exponentially accurate under
the given conditions, so there is probably not much motivation to try more
advanced methods. The mid point rule was used.
The animations in this book used the numerical implementations daie.f,
dbie.f, daide.f, and dbide.f from netlib.org for the Airy functions and
their first derivatives. These offer some basic protection against underflow and
overflow by splitting off an exponential for positive x. It may be a good idea to
check for underflow and overflow in general and use 64 bit precision.
For the harmonic oscillator, the larger the nominal energy is compared to the
ground state energy, the more the wave packet can resemble a single point com-
pared to the limits of motion. However, the computer program used to create
the animation computed the eigenfunctions by evaluating the analytical expres-
sion given in note {A.12}, and explicitly evaluating the Hermite polynomials is
very round-off sensitive. That limited it to a maximum of about a hundred times
the ground state energy when allowing for enough uncertainty to localize the
wave packet. Round-off is a general problem for power series, not just for the
Hermite polynomials. If you want to go to higher energies to get a smaller wave
packet, you will want to use a finite difference or finite element method to find
the eigenfunctions.
The plotting software used to produce the animations was a mixture of
different programs. There are no doubt much simpler and better ways of doing
it. In the animations presented here, first plots were created of Ψ versus x for
a large number of closely spaced times covering the duration of the animation.
These plots were converted to gifs using a mixture of personal software, netpbm,
and ghostview. The gifs were then combined into a single movie using gifsicle.

A.44 Derivation of the WKB approximation
The purpose in this note is to derive an approximate solution to the Hamiltonian
eigenvalue problem
$$\frac{d^2\psi}{dx^2} = -\frac{p_c^2}{\hbar^2}\,\psi$$
where the classical momentum $p_c = \sqrt{2m(E-V)}$ is a given function. The
approximation is to be valid when the values of pc /h̄ are large. In quantum
terms, you can think of that as due to an energy that is macroscopically large.
But to do the mathematics, it is easier to take a macroscopic point of view; in
macroscopic terms, pc /h̄ is large because Planck’s constant h̄ is so small.
Since either way pc /h̄ is a large quantity, for the left hand side of the Hamilto-
nian eigenvalue problem above to balance the right hand side, the wave function
must vary rapidly with position. Something that varies rapidly and nontrivially
with position tends to be hard to analyze, so it turns out to be a good idea to
write the wave function as an exponential,

$$\psi = e^{i\tilde\theta}$$

and then approximate the argument θ̃ of that exponential.


To do so, first the equation for θ̃ will be needed. Taking derivatives of ψ
using the chain rule gives in terms of θ̃:
$$\frac{d\psi}{dx} = e^{i\tilde\theta}\,i\,\frac{d\tilde\theta}{dx} \qquad \frac{d^2\psi}{dx^2} = -e^{i\tilde\theta}\left(\frac{d\tilde\theta}{dx}\right)^2 + e^{i\tilde\theta}\,i\,\frac{d^2\tilde\theta}{dx^2}$$

Then plugging ψ and its second derivative above into the Hamiltonian eigenvalue
problem and cleaning up gives:
à !2
dθ̃ p2c d2 θ̃
= + i (A.43)
dx h̄2 dx2

For a given energy, θ̃ will depend on both what x is and what h̄ is. Now,
since h̄ is small, mathematically it simplifies things if you expand θ̃ in a power
series with respect to h̄:
$$\tilde\theta = \frac{1}{\hbar}\left(f_0 + \hbar f_1 + \tfrac12\hbar^2 f_2 + \ldots\right)$$
You can think of this as writing $\hbar\tilde\theta$ as a Taylor series in ħ. The coefficients
f0 , f1 , f2 , . . . will depend on x. Since h̄ is small, the contribution of f2 and
further terms to ψ is small and can be ignored; only f0 and f1 will need to be
figured out.

Plugging the power series into the equation for θ̃ produces
$$\frac{1}{\hbar^2}\,f_0'^{\,2} + \frac{1}{\hbar}\,2f_0'f_1' + \ldots = \frac{1}{\hbar^2}\,p_c^2 + \frac{1}{\hbar}\,i f_0'' + \ldots$$
where primes denote x-derivatives and the dots stand for powers of ħ greater
than $\hbar^{-1}$ that will not be needed. Now for two power series to be equal, the
coefficients of each individual power must be equal. In particular, the coefficients
of $1/\hbar^2$ must be equal, $f_0'^{\,2} = p_c^2$, so there are two possible solutions
$$f_0' = \pm p_c$$
For the coefficients of $1/\hbar$ to be equal, $2f_0'f_1' = if_0''$, or plugging in the solution
for $f_0'$,
$$f_1' = i\,\frac{p_c'}{2p_c}$$
It follows that the x-derivative of θ̃ is given by
$$\tilde\theta' = \frac{1}{\hbar}\left(\pm p_c + \hbar\,i\,\frac{p_c'}{2p_c} + \ldots\right)$$
and integrating gives θ̃ as
$$\tilde\theta = \pm\frac{1}{\hbar}\int p_c\,dx + i\,\tfrac12\ln p_c + \tilde C + \ldots$$
where C̃ is an integration constant. Finally, $e^{i\tilde\theta}$ now gives the two terms in the
WKB solution, one for each possible sign, with $e^{i\tilde C}$ equal to the constant $C_f$ or $C_b$.
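Since $e^{i\tilde\theta}$ then works out to $e^{i\tilde C}\,e^{\pm i\int p_c\,dx/\hbar}/\sqrt{p_c}$, the WKB solution is easy to evaluate numerically once pc is known. A sketch in Python, for the made-up example V = x² with ħ = 1, m = ½, and E = 25:

import numpy as np

hbar, m, E = 1.0, 0.5, 25.0
x = np.linspace(-4.0, 4.0, 4001)        # safely inside the turning points at +-5
dx = x[1] - x[0]
pc = np.sqrt(2.0 * m * (E - x**2))      # classical momentum, real in this range
theta = np.cumsum(pc) * dx / hbar       # crude running integral of p_c dx / hbar
Cf, Cb = 1.0, 0.0                       # a single traveling WKB wave
psi = (Cf * np.exp(1j * theta) + Cb * np.exp(-1j * theta)) / np.sqrt(pc)
print(np.abs(psi[0]), 1.0 / np.sqrt(pc[0]))   # |psi| = |Cf|/sqrt(p_c) exactly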

A.45 WKB solution near the turning points


Both the classical and tunneling WKB approximations of section 5.6 fail near
so-called “turning points” where the classical kinetic energy E − V becomes
zero. This note explains how the problem can be fixed.
The trick is to use a different approximation near turning points. In a small
vicinity of a turning point, it can normally be assumed that the x-derivative
V ′ of the potential is about constant, so that the potential varies linearly with
position. Under that condition, the exact solution of the Hamiltonian eigenvalue
problem is known to be a combination of two special functions Ai and Bi that
are called the “Airy” functions. These functions are shown in figure A.11. The
general solution near a turning point is:
$$\psi = C_A\,\mathrm{Ai}(\bar x) + C_B\,\mathrm{Bi}(\bar x) \qquad \bar x = \sqrt[3]{\frac{2mV'}{\hbar^2}}\;\frac{V-E}{V'} \qquad V' \equiv \frac{dV}{dx}$$

Figure A.11: The Airy Ai and Bi functions that solve the Hamiltonian eigenvalue
problem for a linearly varying potential energy. Bi very quickly becomes too
large to plot for positive values of its argument.

Note that (V − E)/V ′ is the x-position measured from the point where V = E,
so that $\bar x$ is a local, stretched x-coordinate.
The second step is to relate this solution to the normal WKB approximations
away from the turning point. Now from a macroscopic point of view, the WKB
approximation follows from the assumption that Planck’s constant h̄ is very
small. That implies that the validity of the Airy functions normally extends
to region where |x| is relatively large. For example, if you focus attention on a
point where V − E is a finite multiple of h̄1/3 , V − E is small, so the value of V ′
will deviate little from its value at the turning point: the assumption of linearly
varying potential remains valid. Still, if V −E is a finite multiple of h̄1/3 , |x| will
be proportional to 1/h̄1/3 , and that is large. Such regions of large, but not too
large, |x| are called “matching regions,” because in them both the Airy function
solution and the WKB solution are valid. It is where the two meet and must
agree.
It is graphically depicted in figures A.12 and A.13. Away from the turning
points, the classical or tunneling WKB approximations apply, depending on
whether the total energy is more than the potential energy or less. In the vicinity
of the turning points, the solution is a combination of the Airy functions. If
you look up in a mathematical handbook like [1] how the Airy functions can be
approximated for large positive respectively negative x, you find the expressions
listed in the bottom lines of the figures. (After you rewrite what you find in
table books in terms of useful quantities, that is!)
The expressions in the bottom lines must agree with what the classical,
respectively tunneling WKB approximation say about the matching regions.
At one side of the turning point, that relates the coefficients Cp and Cn of the
tunneling approximation to the coefficients CA and CB of the Airy functions.

Figure A.12: Connection formulae for a turning point from classical to tunneling. In the classical region the WKB solution is $[C_f e^{i\theta} + C_b e^{-i\theta}]/\sqrt{p_c}$, or equivalently $[C_c\cos\theta + C_s\sin\theta]/\sqrt{p_c}$; around the turning point it is $C_B\,c\,\mathrm{Bi}(\bar x) + C_A\,c\,\mathrm{Ai}(\bar x)$; in the tunneling region it is $[C_p e^{\gamma} + C_n e^{-\gamma}]/\sqrt{|p_c|}$. The matching regions equate these to $[C_B\cos(\theta-\theta_t-\frac{\pi}{4}) - C_A\sin(\theta-\theta_t-\frac{\pi}{4})]/\sqrt{p_c}$ on the classical side and $[C_B e^{\gamma-\gamma_t} + \frac12 C_A e^{\gamma_t-\gamma}]/\sqrt{|p_c|}$ on the tunneling side.


Figure A.13: Connection formulae for a turning point from tunneling to classical. In the tunneling region the WKB solution is $[C_p e^{\gamma} + C_n e^{-\gamma}]/\sqrt{|p_c|}$; around the turning point it is $C_B\,c\,\mathrm{Bi}(\bar x) + C_A\,c\,\mathrm{Ai}(\bar x)$; in the classical region it is $[C_f e^{i\theta} + C_b e^{-i\theta}]/\sqrt{p_c}$, or equivalently $[C_c\cos\theta + C_s\sin\theta]/\sqrt{p_c}$. The matching regions equate these to $[C_B e^{\gamma_t-\gamma} + \frac12 C_A e^{\gamma-\gamma_t}]/\sqrt{|p_c|}$ on the tunneling side and $[C_B\cos(\theta-\theta_t+\frac{\pi}{4}) + C_A\sin(\theta-\theta_t+\frac{\pi}{4})]/\sqrt{p_c}$ on the classical side.

At the other side, it relates the coefficients Cf and Cb (or Cc and Cs ) of the
classical WKB approximation to CA and CB . The net effect of it all is to relate,
“connect,” the coefficients of the classical WKB approximation to those of the
tunneling one. That is why the formulae in figures A.12 and A.13 are called the
“connection formulae.”
You may have noted the appearance of an additional constant c in figures
A.12 and A.13. This nasty constant is defined as

$$c = \frac{\sqrt{\pi}}{\left(2m|V'|\,\hbar\right)^{1/6}} \tag{A.44}$$

and shows up uninvited when you approximate the Airy function solution for
large |x|. By cleverly absorbing it in a redefinition of the constants CA and CB ,
figures A.12 and A.13 achieve that you do not have to worry about it unless
you specifically need the actual solution at the turning points.
As an example of how the connection formulae are used, consider a right
turning point for the harmonic oscillator or similar. Near such a turning point,
the connection formulae of figure A.12 apply. In the tunneling region towards
the right, the term Cp eγ better be zero, because it blows up at large x, and that
would put the particle at infinity for sure. So the constant Cp will have to be
zero. Now the matching at the right side equates Cp to CB e−γt so CB will have
to be zero. That means that the solution in the vicinity of the turning point
will have to be a pure Ai function. Then the matching towards the left shows
that the solution in the classical WKB region must take the form of a sine that,
when extrapolated to the turning point θ = θt , stops short of reaching zero by
an angular amount π/4. Hence the assertion in section 5.6 that the angular
range of the classical WKB solution should be shortened by π/4 for each end
at which the particle is trapped by a gradually increasing potential instead of
an impenetrable wall.

Figure A.14: WKB approximation of tunneling. Left of turning point 1 the solution is $[C_f^l e^{i\theta} + C_b^l e^{-i\theta}]/\sqrt{p_c}$; between turning points 1 and 2, where the barrier rises above E, it is $[C_p^m e^{\gamma} + C_n^m e^{-\gamma}]/\sqrt{|p_c|}$; right of turning point 2 it is $C^r e^{i\theta}/\sqrt{p_c}$.

As another example, consider tunneling as discussed in sections 5.7 and 5.8.


Figure A.14 shows a sketch. The WKB approximation may be used if the barrier
through which the particle tunnels is high and wide. In the far right region, the
energy eigenfunction only involves a term C r eiθ with a forward wave speed. To

simplify the analysis, the constant C r can be taken to be one, because it does not
make a difference how the wave function is normalized. Also, the integration
constant in θ can be chosen such that θ = π/4 at turning point 2; then the
connection formulae of figure A.13 along with the Euler formula (1.5) show that
the coefficients of the Airy functions at turning point 2 are CB = 1 and CA = i.
Next, the integration constant in γ can be taken such that γ = 0 at turning
point 2; then the connection formulae of figure A.13 imply that $C_p^m = \frac12 i$ and
$C_n^m = 1$.
Next consider the connection formulae for turning point 1 in figure A.12.
Note that e−γ1 can be written as eγ12 , where γ12 = γ2 −γ1 , because the integration
constant in γ was chosen such that γ2 = 0. The advantage of using eγ12 instead of
e−γ1 is that it is independent of the choice of integration constant. Furthermore,
under the typical conditions that the WKB approximation applies, for a high
and wide barrier, eγ12 will be a very large number. It is then seen from figure
A.12 that near turning point 1, CA = 2eγ12 which is large while CB is small
and will be ignored. And that then implies, using the Euler formula to convert
Ai’s sine into exponentials, that |Cfl | = eγ12 . As discussed in section 5.8, the
transmission coefficient is given by

$$T = \frac{p_r\,\left|C^r/\sqrt{p_r}\right|^2}{p_l\,\left|C_f^l/\sqrt{p_l}\right|^2}$$
and plugging in $C^r = 1$ and $|C_f^l| = e^{\gamma_{12}}$, the transmission coefficient is found to
be $e^{-2\gamma_{12}}$.
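Numerically, γ12 is just one more integral. As a sketch, take the made-up barrier V = V0 − ½mω²x², an inverted parabola, for which the integral ∫|pc| dx/ħ between the turning points can also be done in closed form, γ12 = π(V0 − E)/ħω:

import numpy as np

hbar, m, w, V0, E = 1.0, 1.0, 1.0, 5.0, 3.0
xt = np.sqrt(2.0 * (V0 - E) / (m * w**2))        # turning points at +- xt
dx = 2.0 * xt / 200_000.0
x = -xt + (np.arange(200_000) + 0.5) * dx        # midpoints between turning points
pc_abs = np.sqrt(2.0 * m * (V0 - 0.5 * m * w**2 * x**2 - E))
gamma12 = np.sum(pc_abs) * dx / hbar

print("numeric  T ~", np.exp(-2.0 * gamma12))
print("analytic T ~", np.exp(-2.0 * np.pi * (V0 - E) / (hbar * w)))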

A.46 Three-dimensional scattering


This note introduces some of the general concepts of three dimensional scat-
tering, in case you run into them. For more details and actual examples, a
quantum mechanics text for physicists will need to be consulted; it is a big
thing for them.
The basic idea is as sketched in figure A.15. A beam of particles is sent in
from the far left towards a three-dimensional target. Part of the beam hits the
target and is scattered, to be picked up by surrounding detection equipment.
It will be assumed that the collision with the target is elastic, and that
the particles in the beam are sufficiently light that they scatter off the target
without transferring kinetic energy to it. In that case, the target can be modeled
as a steady potential energy field. And if the target and/or incoming particles
are electrically neutral, it can also be assumed that the potential decays fairly
quickly to zero away from the target.
It is convenient to use a spherical coordinate system (r, θ, φ) with its origin at
the scattering object and with its axis aligned with the direction of the incoming

Figure A.15: Scattering of a beam off a target. The incoming beam is $C_f^l e^{ip_\infty z/\hbar}$; the scattered waves are $C_f(\theta,\phi)\,e^{ip_\infty r/\hbar}/r$.

beam. Since the axis of a spherical coordinate system is usually called the z-
axis, the horizontal coordinate will now be indicated as z rather than x as in
the one-dimensional examples.
In the relevant (nominal) energy eigenfunction, the incoming beam can
still be represented as a one-dimensional wave. However, unlike for the one-
dimensional scattering of figure 5.16, now the wave is not just scattered to the
left and right, but in all directions, in other words to all angles θ and φ. The
far field behavior of the nominal energy eigenfunction is

$$\psi_E \sim C_f^l\,e^{ip_\infty z/\hbar} + C_f(\theta,\phi)\,\frac{e^{ip_\infty r/\hbar}}{r} \quad\text{for } r\to\infty \tag{A.45}$$
where $C_f$ is called the “scattering amplitude.” The first term in the far field
behavior allows the incoming wave packets to be described, as well as the same packets
going out again unperturbed. If some joker removes the target, that is all there
is. The second term describes outgoing scattered waves. This term must be
proportional to eip∞ r/h̄ , without a e−ip∞ r/h̄ term, since no wave packets should
come in from infinity except those in the incoming beam. The magnitude of the
second term decreases with r because the probability of finding a particle in a
given detection area should decrease with distance. Indeed, the total detection
area increases with its radius as 4πr2 and the total number of particles to detect
per unit time is the same wherever the detectors are, so the probability of finding
a particle per unit area should decrease as 1/r2 . Since the probability of finding
a particle is proportional to the square of the wave function, the wave function

itself must be proportional to 1/r. The second term above makes it so.
It follows that the number of particles in a small detection area dA is proportional to its angular extent, or “solid angle” $d\Omega = dA/r^2$. In spherical coordinates
$$d\Omega = \sin\theta\,d\theta\,d\phi$$
For a continuous beam of particles being directed at the target, the number of
scattered particles per unit solid angle and per unit time will be proportional
to the square magnitude of the scattered wave function:

$$\frac{d\dot I}{d\Omega} \propto |C_f(\theta,\phi)|^2$$
The number of particles in the incoming beam per unit beam cross-sectional
area and per unit time is called the “luminosity” of the beam; it is proportional
to the square magnitude of the incoming one-dimensional wave function,

$$\frac{d\dot I}{dA_b} \propto |C_f^l|^2$$
Physicists like to take the ratio of the two in order that the rate at which the
particles are sent in is scaled away, so they define
$$D(\theta,\phi) \equiv \frac{dA_b}{d\Omega} = \frac{|C_f(\theta,\phi)|^2}{|C_f^l|^2} \tag{A.46}$$

It is the infinitesimal area dAb of the incoming beam that is scattered into the
infinitesimal solid angle dΩ. So it is a scattered particle density expressed in
suitable terms.
However, “scattered particle density” would be understandable, so physicists
call it the “differential cross-section.” This is a particularly well chosen name,
because it is not a differential, but a differential quotient. It will greatly confuse
mathematically literate people. And “cross-section” is sufficiently vague that
it can mean anything; it does not at all give the secret away that the thing is
really a measure for the number of scattered particles.
The total area of the incoming beam that is scattered can be found by
integrating over all deflection angles:
$$\sigma \equiv A_{b,\rm total} = \int_{\rm all}\frac{dA_b}{d\Omega}\,d\Omega = \int_{\theta=0}^{\pi}\int_{\phi=0}^{2\pi} D(\theta,\phi)\,\sin\theta\,d\theta\,d\phi \tag{A.47}$$

Note that the integration should exclude the particles that pass through unde-
flected in figure A.15; in other words it should exclude θ = 0. Physicists call
σ the “total cross-section.” That is quite below their normal standards, since
it really is a total cross-section. Fortunately, physicists are clever enough not to
say what cross section it is, and cross-section can mean many things. Also, by
using the symbol σ instead of something logical like Ab for the cross-section,
and D instead of something like dAb /dΩ or A′b , or even σ ′ , for the differential
cross-section, they do their best to reduce the damage as well as possible.
If you remain disappointed in physicists, take some comfort in the following
term for scattering that can be described using classical mechanics: the “impact
parameter.” If you guess it describes the local physics of the particle impact
process, it is really hilarious to physicists. Instead, think “centerline offset;” it
describes the location relative to the centerline of the incoming beam that the
particles come in; it has no direct relation whatsoever to what sort of impact
(if any) these particles end up experiencing.

A.46.1 Partial wave analysis


Jim Napolitano from RPI and Cornell writes “The term ‘Partial Wave Analysis’
is poorly defined and over used.” Gee, what a surprise! For one, they are
component waves, not partial waves. But you already componently assumed
that they might be.
This discussion will restrict itself to spherically symmetric scattering poten-
tials. In that case, the analysis of the energy eigenfunctions can be done much
like the analysis of the hydrogen atom of chapter 3.2. However, the boundary
conditions at infinity will be quite different; the objective is not to describe
bound particles, but particles that come in from infinity with nonzero kinetic
energy and are scattered back to infinity. Also, the potential will of course not
normally be a Coulomb one.
But just like for the hydrogen atom, the energy eigenfunctions can be taken
to be radial functions times spherical harmonics $Y_l^m$:
$$\psi_{Elm}(r,\theta,\phi) = R_{El}(r)\,Y_l^m(\theta,\phi) \tag{A.48}$$
The reason is that any spherically symmetric potential commutes with both $\widehat L^2$
and $\widehat L_z$, so the energy eigenfunctions can be taken to be also eigenfunctions of
$\widehat L^2$ and $\widehat L_z$. However, the functions $R_{El}$ will not be the hydrogen ones $R_{nl}$.
In terms of the classical momentum
$$p_c \equiv \sqrt{2m(E-V)}$$
the Hamiltonian eigenvalue problem is
$$-\nabla^2\psi_{Elm} = \frac{p_c^2}{\hbar^2}\,\psi_{Elm}$$
Following chapter 3.2.2, for a spherically symmetric potential, this may be reduced to an equation for the $R_{El}$:
$$\frac{d}{dr}\left(r^2\,\frac{dR_{El}}{dr}\right) + \left[\frac{p_c^2}{\hbar^2}\,r^2 - l(l+1)\right]R_{El} = 0$$

Note that the classical momentum pc is constant wherever the potential
energy is constant. That includes the detection region far from the scattering
object, where the potential is zero and pc = p∞ . In such regions, the solution
for REl can be found in advanced mathematical handbooks, like for example
[1]. Depending on whatever is easiest, the solution can be written in two ways.
The first is as
$$R_{El} = c_s\,j_l(p_c r/\hbar) + c_c\,n_l(p_c r/\hbar)$$
where the functions $j_l$ and $n_l$ are called the “spherical Bessel functions” of the
first and second kinds. The $n_l$ are also called the “Neumann functions” and
might be indicated by $y_l$ or $\eta_l$. The other way to write the solution is as
$$R_{El} = c_f\,h_l^{(1)}(p_c r/\hbar) + c_b\,h_l^{(2)}(p_c r/\hbar)$$
where $h_l^{(1)}$ and $h_l^{(2)}$ are called the “spherical Hankel functions.”
The spherical Hankel functions can be found in advanced table books as
à !l
(1) l 1 d eix
hl (x) = −i(−x) = jl (x) + inl (x) (A.49)
x dx x

à !l
(2) 1 d e−ix
hl (x) = i(−x)l = jl (x) − inl (x) (A.50)
x dx x
These are convenient for large r since the Hankel functions of the first kind
represent the outgoing waves while those of the second kind are the incoming
waves. Indeed for large x,

(1) eix (2) e−ix


hl (x) ∼ (−i)l+1 hl (x) ∼ il+1 (A.51)
x x

The spherical Bessel functions are


à !l (1) (2)
1 d sin x h + hl
jl (x) = (−x) l
= l (A.52)
x dx x 2

à !l (1) (2)
1 d cos x h − hl
nl (x) = −(−x) l
= l (A.53)
x dx x 2i
These are often convenient in a region of constant potential that includes the
origin, because the Bessel function of the first kind $j_l$ gives the solution that is
finite at the origin. (Note that the Taylor series of sin x divided by x is a power
series in $x^2$, and that $x\,dx = \frac12\,d(x^2)$.) Also, they are real for real x. However, in a

region where the scattering potential is larger than the energy of the particles,
the argument x of the Bessel or Hankel functions will be imaginary.
(In case you demand a derivation of the spherical Bessel and Hankel func-
tions, well, OK. It is almost comically trivial compared to similar problems in
quantum mechanics. Start with a change of dependent variable from $f_l$ to $x^l g_l$
in the normalized ordinary differential equation to solve:
$$\frac{d}{dx}\left(x^2\,\frac{df_l}{dx}\right) + \left[x^2 - l(l+1)\right]f_l = 0 \qquad f_l \equiv x^l g_l \;\Longrightarrow\; x\,\frac{d^2g_l}{dx^2} + 2(l+1)\,\frac{dg_l}{dx} + x\,g_l = 0$$
Check, by simply plugging it in, that $e^{ix}/x$ is a solution for l = 0. Now make a
change in independent variable from x to $\xi = \frac12 x^2$ to give
$$2\xi\,\frac{d^2g_l}{d\xi^2} + (2l+3)\,\frac{dg_l}{d\xi} + g_l = 0$$
Note that the equation for l = 1 is obtained by differentiating the one for l = 0.
That implies that the ξ-derivative of the solution for l = 0 above is a solution for
l = 1. Keep differentiating to get solutions for all values of l. That produces the
spherical Hankel functions of the first kind; the remaining constant is just an
arbitrarily chosen normalization factor. Since the original differential equation
is real, the real and imaginary parts of these Hankel functions, as well as their
complex conjugates, must be solutions too. That gives the spherical Bessel
functions and Hankel functions of the second kind, respectively. Note that all
of them are just finite sums of elementary functions. And that physicists do not
even disagree over their definition, just their names.)
In case you are so inclined, it might be fun to create three-dimensional
scattering wave packet animations. As long as you assume that the potential
V (r) is piecewise constant, for each piece the solution is given by spherical Bessel
functions as above. You should then be able to tie these solutions together
wherever different pieces meet much like in note {A.43}. But there is now an
additional complication.
The problem is that the wave function that describes the incoming particle
beam is a purely Cartesian expression, and the current analysis is in spheri-
cal coordinates. The Cartesian one-dimensional wave eip∞ z/h̄ will need to be
converted to spherical coordinates to do the analysis.
Note that the one-dimensional wave is a solution to the Hamiltonian eigen-
value problem with zero potential. As a solution of that problem, it must be
possible to write it as spherical harmonics times spherical Bessel functions, as
discussed above. More specifically, the wave can be written as spherical har-
monics Yl0 times the Bessel functions of the first kind jl : there are no spherical
harmonics for m 6= 0 since z = r cos θ does not depend on φ, and there are no
Bessel functions of the second kind because the one-dimensional wave is finite

at the origin. Now the interaction of a one-dimensional wave with a three-
dimensional object is not at all unique to quantum mechanics, of course, and
Rayleigh worked out the correct multiples of these functions to use a very long
time ago:


$$e^{ip_\infty z/\hbar} = \sum_{l=0}^{\infty} c_{w,l}\,j_l(p_\infty r/\hbar)\,Y_l^0(\theta) \qquad c_{w,l} = i^l\sqrt{4\pi(2l+1)} \tag{A.54}$$

This is where the term “partial waves” comes in. In spherical coordinates, the
single Cartesian wave eip∞ z/h̄ falls apart in an infinite series of partial waves.
The scattering problem for each needs to be solved separately and the results
summed back together to get the total solution. It is the price to pay for training
a Cartesian particle beam on a potential that needs to be solved in spherical
coordinates.
(To derive the Rayleigh formula, set x = p∞ r/h̄ to make the one-dimensional
wave eix cos θ . Expand in a Taylor series around the origin. The generic term
(ix cos θ)l /l! must match cw,l times the term with the lowest power of x in jl (x)
times the term with the highest power of cos θ in Yl0 (θ). Terms with $\bar l \ne l$ are not
involved, since they do not have a low enough power of x or a high enough power
of cos θ. Deduce or look up the coefficients of the lowest power of x in jl (x) and
highest power of cos θ in Yl0 (θ) to get cw,l as claimed.)
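If scipy is available, the Rayleigh formula is painless to verify numerically, using $Y_l^0(\theta) = \sqrt{(2l+1)/4\pi}\,P_l(\cos\theta)$; the values of x = p∞r/ħ and θ below are arbitrary:

import numpy as np
from scipy.special import spherical_jn, eval_legendre

x, theta = 7.3, 0.9                         # arbitrary test values
lhs = np.exp(1j * x * np.cos(theta))
rhs = sum((1j)**l * np.sqrt(4.0 * np.pi * (2 * l + 1)) * spherical_jn(l, x)
          * np.sqrt((2 * l + 1) / (4.0 * np.pi)) * eval_legendre(l, np.cos(theta))
          for l in range(60))
print(lhs, rhs)                             # agree to machine precision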
The solution procedure is now as follows: given an energy E, for every l
(up to some sufficiently large value), find the energy eigenfunction of the form
$\psi_{El} = R_{El}Y_l^0$ that for large r behaves like
$$\psi_{El} \sim \left[c_{w,l}\,j_l(p_\infty r/\hbar) + c_{f,l}\,h_l^{(1)}(p_\infty r/\hbar)\right]Y_l^0(\theta) \;\text{ for } r\to\infty \qquad p_\infty = \sqrt{2mE} \tag{A.55}$$
Note that at large r, the deviations from the one-dimensional wave must be
outgoing reflected waves, so they must be Hankel functions of the first kind.
Also, for simplicity the wave function has been scaled to make Cfl one. As
long as V is piecewise constant, it should be possible to solve for ψEl using
Bessel or Hankel functions as above and so find the cf,l . Sum all these “partial
waves” together to find the energy eigenfunction ψE for an incoming wave of
that energy. (Or better, just sum the Hankel function terms together and add
eip∞ z/h̄ .) Sum such eigenfunctions together in a narrow range of energies to get
a one-dimensional incoming wave packet being scattered.
In terms of the asymptotic behavior above, using (A.51), the differential
cross section is
$$D(\theta) = \frac{\hbar^2}{p_\infty^2}\sum_{l=0}^{\infty}\sum_{\bar l=0}^{\infty} i^{\bar l-l}\,c_{f,\bar l}^{\,*}\,c_{f,l}\;Y_{\bar l}^0(\theta)\,Y_l^0(\theta) \tag{A.56}$$
The Bessel functions form the incoming wave and do not contribute. For the

total cross-section, note that the spherical harmonics are orthonormal, so
$$\sigma = \frac{\hbar^2}{p_\infty^2}\sum_{l=0}^{\infty}|c_{f,l}|^2$$

The magnitudes of the coefficients $c_{f,l}$ are not arbitrary, but constrained by
conservation of probability. That allows these complex numbers to be written
in terms of real phase shifts. But if you need that sort of detail, you will need
to consult a book for physics students.

A.46.2 The Born approximation


The Born approximation assumes that the scattering potential is weak to derive
approximate expressions for the scattering.
Consider first the case that the scattering potential is exactly zero. In that
case, the Hamiltonian eigenvalue problem takes the form
" #
p2
∇ + ∞
2
ψE = 0 (A.57)
h̄2
This equation is called the “Helmholtz equation.” The appropriate solution is
here the unperturbed one-dimensional wave

$$\psi_E = e^{ip_\infty z/\hbar}$$

Now consider the case that the potential is not zero, but small. In that case
the Hamiltonian eigenvalue problem becomes
" #
p2∞
2 2mV
∇ + 2 ψE = f f= ψE (A.58)
h̄ h̄2
The trick is now that as long as the potential is very small, ψE will be almost
the same as the one-dimensional wave. Substituting that into the expression for
f , the approximate right hand side in the Helmholtz equation becomes a known
function. The inhomogeneous Helmholtz equation may now be solved for ψE
much like the Poisson equation was solved in chapter 9.5.4.
In particular, using symbol k for the constant p∞ /h̄, the general solution to
the Helmholtz equation can be written as

$$\left(\nabla^2 + k^2\right)\psi = f \;\Longrightarrow\; \psi = \psi_h + \int_{\text{all }\vec r\,'} G(\vec r - \vec r\,')\,f(\vec r\,')\,d^3\vec r\,' \qquad G(\vec r) = -\frac{e^{ik|\vec r|}}{4\pi|\vec r|} \tag{A.59}$$
where ψh describes the effects of any waves that come in from infinity, and can
be any solution of the homogeneous Helmholtz equation. To see why this is the

solution of the Helmholtz equation, first consider the solution G for the special
case that f is a delta function at the origin:
$$\left(\nabla^2 + k^2\right)G = \delta^3(\vec r)$$
The solution G to this problem is called the “Green's function” of the Helmholtz
equation. Since the delta function in the right hand side is zero everywhere
except at the origin, everywhere except at the origin G is a solution of the ho-
mogeneous Helmholtz equation. According to the previous section, that means
it must be Bessel or Hankel functions times spherical harmonics. The solution
of interest is the one where no waves are propagating in from infinity, because
ψh takes care of that, so Hankel functions of the first kind only. Further, since
there is no angular preference in this problem at all, the solution must be spheri-
cally symmetric; that means the spherical harmonic must be $Y_0^0$ and the Hankel
function is then $h_0^{(1)}$. Also, the appropriate normalization factor multiplying
this solution is essentially the same as the one for the Green’s function of the
Laplace equation: it is the Laplacian ∇2 of G that produces the delta function;
the integral of k 2 G is vanishingly small over the immediate vicinity of the origin.
That gives the Green’s function as stated above. Next, to solve the Helmholtz
equation for an arbitrary right hand side f , just think of that right hand side
as made up of infinitely many “spikes” $f(\vec r\,')\,d^3\vec r\,'$. Each of these spikes produces
a solution given by the Green's function shifted to location $\vec r\,'$ and scaled. That
gives the general solution as stated.
If the solution of the Helmholtz equation is applied to the Schrödinger
equation in the form (A.58), it produces the so-called “integral form of the
Schrödinger equation”,
$$\psi_E(\vec r) = \psi_{E,0}(\vec r) - \frac{m}{2\pi\hbar^2}\int_{\text{all }\vec r\,'}\frac{e^{ip_\infty|\vec r-\vec r\,'|/\hbar}}{|\vec r-\vec r\,'|}\,V(\vec r\,')\,\psi_E(\vec r\,')\,d^3\vec r\,' \qquad p_\infty = \sqrt{2mE} \tag{A.60}$$
where ψE,0 is any free-space wave function of energy E. Whereas the normal
Schrödinger equation is called a partial differential equation, this is called an
integral equation, because the unknown wave function ψE appears in an integral
rather than in partial derivatives.
In the Born approximation, ψE,0 is the incoming one-dimensional wave and
so is the approximation for ψE inside the integral, so
$$\psi_E(\vec r) \approx e^{ip_\infty z/\hbar} - \frac{m}{2\pi\hbar^2}\int_{\text{all }\vec r\,'}\frac{e^{ip_\infty|\vec r-\vec r\,'|/\hbar}}{|\vec r-\vec r\,'|}\,V(\vec r\,')\,e^{ip_\infty z'/\hbar}\,d^3\vec r\,'$$
This can be cleaned up a bit by noting that the interest is really only in ψE (~r)
at large distances from the scattering region. In that case $\vec r\,'$ can be ignored
compared to $\vec r$ in the denominator, while in the argument of the exponential
$$|\vec r - \vec r\,'| \sim r - \frac{\vec r\cdot\vec r\,'}{r}$$
Also, the vector momenta of the incoming and scattered waves may be defined
as, respectively,
$$\vec p_\infty = p_\infty\,\hat k \qquad \vec p_\infty\,' = p_\infty\,\frac{\vec r}{r} \qquad p_\infty = \sqrt{2mE} \tag{A.61}$$

Then

$$\psi_E(\vec r) \sim e^{ip_\infty z/\hbar} - \frac{e^{ip_\infty r/\hbar}}{r}\,\frac{m}{2\pi\hbar^2}\int_{\text{all }\vec r\,'} e^{i(\vec p_\infty - \vec p_\infty\,')\cdot\vec r\,'/\hbar}\,V(\vec r\,')\,d^3\vec r\,' \tag{A.62}$$

The differential cross section is therefore
$$D(\theta,\phi) \approx \left|\frac{m}{2\pi\hbar^2}\int_{\text{all }\vec r\,'} e^{i(\vec p_\infty - \vec p_\infty\,')\cdot\vec r\,'/\hbar}\,V(\vec r\,')\,d^3\vec r\,'\right|^2 \tag{A.63}$$


Note that $\vec p_\infty\,'$ is a function of θ and φ because of its definition above. In the
further approximation that the energy of the incoming beam is small, the ex-
ponential can be approximated by one.
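For a spherically symmetric potential the integral reduces to a single radial one, $\int e^{i(\vec p_\infty-\vec p_\infty\,')\cdot\vec r\,'/\hbar}\,V\,d^3\vec r\,' = (4\pi/q)\int_0^\infty rV(r)\sin(qr)\,dr$ with $q = |\vec p_\infty - \vec p_\infty\,'|/\hbar = 2p_\infty\sin(\theta/2)/\hbar$. A sketch for a made-up Gaussian well $V = -V_0 e^{-r^2/a^2}$, with ħ = m = 1, whose transform is also known in closed form for comparison:

import numpy as np

hbar, m, p_inf = 1.0, 1.0, 2.0
V0, a = 0.1, 1.0                       # weak and smooth: Born should be decent
theta = 0.5                            # arbitrary scattering angle (not 0)
q = 2.0 * p_inf * np.sin(theta / 2.0) / hbar

r = np.linspace(0.0, 20.0 * a, 200_001)
dr = r[1] - r[0]
V = -V0 * np.exp(-(r / a)**2)
ft = (4.0 * np.pi / q) * np.sum(r * np.sin(q * r) * V) * dr  # 3D transform of V
D_num = (m / (2.0 * np.pi * hbar**2))**2 * ft**2             # Born D(theta), (A.63)

ft_exact = -V0 * np.pi**1.5 * a**3 * np.exp(-q**2 * a**2 / 4.0)
print(D_num, (m / (2.0 * np.pi * hbar**2))**2 * ft_exact**2)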

A.46.3 The Born series


Following Griffiths [10], this section takes a more philosophical view of the Born
approximation.
The basic idea of the Born approximation can very schematically be repre-
sented as Z
ψE ≈ ψE,0 + gV ψE,0
where g represents the Green’s function, absorbing the various constants, and
d3~r was left away for brevity. The error in this expression is because ψE in the
integral has been replaced by the unperturbed incoming wave ψE,0 . To reduce
the error, you could plug the above two-term approximation of the wave function
into the Green’s function integral instead, to give
Z Z Z
ψE ≈ ψE,0 + gV ψE,0 + gV gV ψE,0

Then you could go one better still by plugging the so obtained three-term ap-
proximation into the integral, and so on:
Z Z Z Z Z Z
ψE ≈ ψE,0 + gV ψE,0 + gV gV ψE,0 + gV gV gV ψE,0 + . . .

Graphically, these contributions to ψE can be represented as in figure A.16.


The first term in the series simply takes ψ at some location ~r from the un-
modified incoming wave. The second contribution takes the incoming wave at

643
Vq g
H
HqV Vq
¢g ¢@
g
sψ sψ ¢sψ Vq ¢g @ sψ

q¶ g g@q¢
V V

Figure A.16: Graphical interpretation of the Born series.

an intermediate location ~r, multiplies it by a “vertex factor” V , (the poten-


tial), and adds it to the wave function at ~r multiplied by some factor g, the
“propagator.” This must then be integrated over all possible locations of the
intermediate point. The third term takes the wave function at some location,
multiplies it by the vertex factor, propagates it to another location, multiplies
it again by a vertex factor, and then propagates that to the wave function at
~r. This is to be integrated over all possible locations of the two intermediate
points. And so on.
The Born series inspired Feynman to formulate relativistic quantum me-
chanics in terms of vertices connected together into “Feynman diagrams.” Since
there is a nontechnical, very readable discussion available from the master him-
self, [6], there does not seem much need to go into the details here.

A.47 The evolution of probability


This note looks at conservation of probability, and the resulting definitions of
the reflection and transmission coefficients in scattering. It also explains the
concept of the “probability current” that you may occasionally run into.
For the unsteady Schrödinger equation to provide a physically correct de-
scription of nonrelativistic quantum mechanics, particles should not be able to
disappear into thin air. In particular, during the evolution of the wave func-
tion of a single particle, the total probability of finding the particle if you look
everywhere should stay one at all times:
Z ∞
|Ψ|2 dx = 1 at all times
x=−∞

Fortunately, the Schrödinger equation

∂Ψ h̄2 ∂ 2 Ψ
ih̄ =− +VΨ
∂t 2m ∂x2
does indeed conserve this total probability, so all is well.

644
To verify this, note first that |Ψ|2 = Ψ∗ Ψ, where the star indicates the
complex conjugate, so
∂|Ψ|2 ∂Ψ ∂Ψ∗
= Ψ∗ +Ψ
∂t ∂t ∂t
To get an expression for that, take the Schrödinger equation above times Ψ∗ /ih̄
and add the complex conjugate of the Schrödinger equation,

∂Ψ∗ h̄2 ∂ 2 Ψ∗
−ih̄ =− + V Ψ∗ ,
∂t 2m ∂x2
times −Ψ/ih̄. The potential energy terms drop out, and what is left is
à !
∂|Ψ|2 ih̄ ∂2Ψ ∂ 2 Ψ∗
= Ψ∗ 2 − Ψ .
∂t 2m ∂x ∂x2

Now it can be verified by differentiating out that the right hand side can be
rewritten as a derivative:
à !
∂|Ψ|2 ∂J ih̄ ∂Ψ∗ ∂Ψ
=− where J = Ψ − Ψ∗ (A.64)
∂t ∂x 2m ∂x ∂x

For reasons that will become evident below, J is called the “probability current.”
Note that J, like Ψ, will be zero at infinite x for proper, normalized wave
functions.
If (A.64) is integrated over all x, the desired result is obtained:
d Z∞ ¯∞
|Ψ|2 dx = −J ¯¯ = 0.
dt x=−∞ x=−∞

Therefor, the total probability of finding the particle does not change with time.
If a proper initial condition is provided to the Schrödinger equation in which
the total probability of finding the particle is one, then it stays one for all time.
It gets a little more interesting to see what happens to the probability of
finding the particle in some given finite region a ≤ x ≤ b. That probability is
given by Z b
|Ψ|2 dx
x=a
and it can change with time. A wave packet might enter or leave the region. In
particular, integration of (A.64) gives
d Zb
|Ψ|2 dx = Ja − Jb
dt x=a
This can be understood as follows: Ja is the probability flowing out of the region
x < a into the interval [a, b] through the end a. That increases the probability

645
within [a, b]. Similarly, Jb is the probability flowing out of [a, b] at b into the
region x > b; it decreases the probability within [a, b]. Now you see why J
is called probability current; it is equivalent to a stream of probability in the
positive x-direction.
The probability current can be generalized to more dimensions using vector
calculus:
ih̄
J~ = (Ψ∇Ψ∗ − Ψ∗ ∇Ψ) (A.65)
2m
and the net probability flowing out of a region is given by
Z
J~ · ~n dA (A.66)

where A is the outside surface area of the region, and ~n is a unit vector normal
to the surface. A surface integral like this can often be simplified using the
divergence (Gauss or whatever) theorem of calculus.
Returning to the one-dimensional case, it is often desirable to relate conser-
vation of probability to the energy eigenfunctions of the Hamiltonian,

h̄2 d2 ψ
− + V ψ = Eψ
2m dx2
because the energy eigenfunctions are generic, not specific to one particular
example wave function Ψ.
To do so, first an important quantity called the “Wronskian” must be intro-
duced. Consider any two eigenfunctions ψ1 and ψ2 of the Hamiltonian:

h̄2 d2 ψ1
− + V ψ1 = Eψ1
2m dx2
h̄2 d2 ψ2
− + V ψ2 = Eψ2
2m dx2
If you multiply the first equation above by ψ2 , the second by ψ1 and then
subtract the two, you get
à !
h̄2 d2 ψ2 d2 ψ1
ψ1 2 − ψ2 2 =0
2m dx dx

The constant h̄2 /2m can be divided out, and by differentiation it can be verified
that the remainder can be written as
dW dψ2 dψ1
=0 where W = ψ1 − ψ2
dx dx dx
The quantity W is called the Wronskian. It is the same at all values of x.

646
As an application, consider the example potential of figure A.6 in note
{A.43} that bounces a particle coming in from the far left back to where it
came from. In the left region, the potential V has a constant value Vl . In this
region, an energy eigenfunction is of the form
q
ψE = Cfl eipl x/h̄ + Cbl e−ipl x/h̄ for x < xA where pl = 2m(E − Vl )

At the far right, the potential grows without bound and the eigenfunction be-
comes zero rapidly. To make use of the Wronskian, take the first solution ψ1
to be ψE itself, and ψ2 to be its complex conjugate ψE∗ . Since at the far right
the eigenfunction becomes zero rapidly, the Wronskian is zero there. And since
the Wronskian is constant, that means it must be zero everywhere. Next, if
you plug the above expression for the eigenfunction in the left region into the
definition of the Wronskian and clean up, you get
2ipl ³ l 2 ´
W = |Cb | − |Cfl |2 .

If that is zero, the magnitude of Cbl must be the same as that of Cfl .
This can be understood as follows: if a wave packet is created from eigen-
functions with approximately the same energy, then the terms Cfl eipl x/h̄ combine
for large negative times into a wave packet coming in from the far left. The
probability of finding the particle in that wave packet is proportional to the
integrated square magnitude of the wave function, hence proportional to the
square magnitude of Cfl . For large positive times, the Cbl e−ipl x/h̄ terms combine
in a similar wave packet, but one that returns towards the far left. The prob-
ability of finding the particle in that departing wave packet must still be the
same as that for the incoming packet, so the square magnitude of Cbl must be
the same as that of Cfl .
Next consider a generic scattering potential like the one in figure 5.16. To
the far left, the eigenfunction is again of the form
q
ψE = Cfl eipl x/h̄ + Cbl e−ipl x/h̄ for x << 0 where pl = 2m(E − Vl )

while at the far right it is now of the form


q
ψE = C r eipr x/h̄ for x >> 0 where pr = 2m(E − Vr )

The Wronskian can be found the same way as before:


2ipl ³ l 2 ´ 2ipr r 2
W = |Cb | − |Cfl |2 = − |C |
h̄ h̄
The fraction of the incoming wave packet that ends up being reflected back
towards the far left is called the “reflection coefficient” R. Following the same

647
reasoning as above, it can be computed from the coefficients in the far left region
of constant potential as:
|C l |2
R = bl 2
|Cf |
The reflection coefficient gives the probability that the particle can be found to
the left of the scattering region at large times.
Similarly, the fraction of the incoming wave packet that passes through the
potential barrier towards the far right is called the “transmission coefficient”
T . It gives the probability that the particle can be found to the right of the
scattering region at large times. Because of conservation of probability, T =
1 − R.
Alternatively, because of the Wronskian expression above, the transmission
coefficient can be explicitly computed from the coefficient of the eigenfunction
in the far right region as
pr |C r |2 q q
T = pl = 2m(E − Vl ) pr = 2m(E − Vr )
pl |Cfl |2
If the potential energy is the same at the far right and far left, the two classical
momenta are the same, pr = pl . Otherwise, the reason that the ratio of clas-
sical momenta appears in the transmission coefficient is because the classical
momenta in a wave packet have a different spacing with respect to energy if
the potential energy is different. (The above expression for the transmission
coefficient can also be derived explicitly using the Parseval equality of Fourier
analysis, instead of inferred from conservation of probability and the constant
Wronskian.)

A.48 A basic description of Lagrangian multi-


pliers
This note will derive the Lagrangian multipliers for an example problem. Only
calculus will be used. The example problem will be to find a stationary point
of a function f of four variables if there are two constraints. Different numbers
of variables and constraints would work out in similar ways as this example.
The four variables that example function f depends on will be denoted by
x1 , x2 , x3 , and x4 . The two constraints will be taken to be equations of the
form g(x1 , x2 , x3 , x4 ) = 0 and h(x1 , x2 , x3 , x4 ) = 0, for suitable functions g and
h. Constraints can always be brought in such a form by taking everything in
the constraint’s equation to the left-hand side of the equals sign.
So the example problem is:
stationarize: f (x1 , x2 , x3 , x4 )

648
subject to: g(x1 , x2 , x3 , x4 ) = 0, h(x1 , x2 , x3 , x4 ) = 0

Stationarize means to find locations where the function has a minimum or a


maximum, or any other point where it does not change under small changes of
the variables x1 , x2 , x3 , x4 as long as these satisfy the constraints.
The first thing to note is that rather than considering f to be a function of
x1 , x2 , x3 , x4 , you can consider it instead to be to be a function of g and h and
only two additional variables from x1 , x2 , x3 , x4 , say x3 and x4 :

f (x1 , x2 , x3 , x4 ) = f˜(g, h, x3 , x4 )

The reason you can do that is that you should in principle be able to reconstruct
the two missing variables x1 and x2 given g, h, x3 , and x4 .
As a result, any small change in the function f , regardless of constraints,
can be written using the expression for a total differential as:

∂ f˜ ∂ f˜ ∂ f˜ ∂ f˜
df = dg + dh + dx3 + dx4 .
∂g ∂h ∂x3 ∂x4
At the desired stationary point, acceptable changes in variables are those
that keep g and h constant at zero; they have dg = 0 and dh = 0. So for f
to be stationary under all acceptable changes of variables, you must have that
the final two terms are zero for any changes in variables. This means that the
partial derivatives in the final two terms must be zero since the changes dx3 and
dx4 can be arbitrary.
For changes in variables that do go out of bounds, the change in f will not
be zero; that change will be given by the first two terms in the right-hand side.
So, the erroneous changes in f due to going out of bounds are these first two
terms, and if we subtract them, we get zero net change for any arbitrary change
in variables:
∂ f˜ ∂ f˜
df − dg − dh = 0 always.
∂g ∂h
In other words, if we “penalize” the change in f for going out of bounds by
amounts dg and dh at the rate above, any change in variables will produce a
penalized change of zero, whether it stays within bounds or not.
The two derivatives at the stationary point in the expression above are the
Lagrangian multipliers or penalty factors, call them ǫ1 = ∂ f˜/∂g and ǫ2 = ∂ f˜/∂h.
In those terms
df − ǫ1 dg − ǫ2 dh = 0
for whatever is the change in the variables g, h, x3 , x4 , and that means for what-
ever is the change in the original variables x1 , x2 , x3 , x4 . Therefor, the change
in the penalized function
f − ǫ1 g − ǫ2 h

649
is zero whatever is the change in the variables x1 , x2 , x3 , x4 .
In practical application, explicitly computing the Lagrangian multipliers ǫ1
and ǫ2 as the derivatives of function f˜ is not needed. You get four equations by
putting the derivatives of the penalized f with respect to x1 through x4 equal
to zero, and the two constraints provide two more equations. Six equations is
enough to find the six unknowns x1 through x4 , ǫ1 and ǫ2 .

A.49 The generalized variational principle


The purpose of this note is to verify directly that the variation of the expectation
energy is zero at any energy eigenstate, not just the ground state.
Suppose that you are trying to find some energy eigenstate ψn with eigen-
value En , and that you are close to it, but no cigar. Then the wave function
can be written as

ψ = ε1 ψ1 + ε2 ψ2 + . . . + εn−1 ψn−1 + (1 + εn )ψn + εn+1 ψn+1 + . . .

where ψn is the one you want and the remaining terms together are the small
error in wave function, written in terms of the eigenfunctions. Their coefficients
ε1 , ε2 , . . . are small.
The normalization condition hψ|ψi = 1 is, using orthonormality:

1 = ε21 + ε22 + . . . + ε2n−1 + (1 + εn )2 + ε2n+1 + . . .

The expectation energy is

hEi = ε21 E1 + ε22 E2 + . . . + ε2n−1 En−1 + (1 + εn )2 En + ε2n+1 En+1 + . . .

or plugging in the normalization condition to eliminate (1 + εn )2

hEi = ε21 (E1 −En )+ε22 (E2 −En )+. . .+ε2n−1 (En−1 −En )+En +ε2n+1 (En+1 −En )+. . .

Assuming that the energy eigenvalues are arranged in increasing order, the terms
before En in this sum are negative and the ones behind En positive. So En is
neither a maximum nor a minimum; depending on conditions hEi can be greater
or smaller than En .
Now, if you make small changes in the wave function, the values of ε1 , ε2 , . . .
will slightly change, by small amounts that will be indicated by δε1 , δε2 , . . ., and
you get

δhEi = 2ε1 (E1 − En )δε1 + 2ε2 (E2 − En )δε2 + . . .


+ 2εn−1 (En−1 − En )δεn−1 + 2εn+1 (En+1 − En )δεn+1 + . . .

650
This is zero when ε1 = ε2 = . . . = 0, so when ψ is the exact eigenfunction
ψn . And it is nonzero as soon as any of ε1 , ε2 , . . . is nonzero; a change in
that coefficient will produce a nonzero change in expectation energy. So the
variational condition δhEi = 0 is satisfied at the exact eigenfunction ψn , but
not at any nearby different wave functions.
The bottom line is that if you locate the nearest wave function for which
δhEi = 0 for all acceptable small changes in that wave function, well, if you are in
the vicinity of an energy eigenfunction, you are going to find that eigenfunction.
One final note. If you look at the expression above, it seems like none of the
other eigenfunctions are eigenfunctions. For example, the ground state would
be the case that ε1 is one, and all the other coefficients zero. So a small change
in ε1 would seem to produce a change δhEi in expectation energy, and the
expectation energy is supposed to be constant at eigenstates.
The problem is the normalization condition, whose differential form says
that

0 = 2ε1 δε1 + 2ε2 δε2 + . . . + 2εn−1 δεn−1 + 2(1 + εn )δεn + 2εn+1 δεn+1 + . . .

At ε1 = 1 and ε2 = . . . = εn−1 = 1 + εn = εn+1 = . . . = 0, this implies that


the change δε1 must be zero. And that means that the change in expectation
energy is in fact zero. You see that you really need to eliminate ε1 from the
list of coefficients near ψ1 , rather than εn as the analysis for ψn did, for the
mathematics not to blow up. A coefficient that is not allowed to change at a
point in the vicinity of interest is a confusing coefficient to work with.

A.50 Spin degeneracy


To see that generally speaking the basic form of the Hamiltonian produces
energy degeneracy with respect to spin, but that it is not important for using
the Born-Oppenheimer approximation, consider the example of three electrons.
Any three-electron energy eigenfunction ψ E with Hψ E = E E ψ E can be split
into separate spatial functions for the distinct combinations of electron spin
values as

ψ E = ψ+++
E E
↑↑↑ + ψ+−− E
↑↓↓ + ψ−+− E
↓↑↓ + ψ−−+ ↓↓↑ +
E E E E
ψ−−− ↓↓↓ + ψ−++ ↓↑↑ + ψ+−+ ↑↓↑ + ψ++− ↑↑↓.

Since the assumed Hamiltonian H does not involve spin, each of the eight
spatial functions ψ±±± above will separately have to be an eigenfunction of the
Hamiltonian with eigenvalue E E if nonzero. In addition, since the first four
functions have an odd number of spin up states and the second four an even
number, the antisymmetry requirements apply only within the two sets, not

651
between them. The exchanges only affect the order of the spin states, not their
number. So the two sets satisfy the antisymmetry requirements individually.
It is now seen that given a solution for the first four wave functions, there is
an equally good solution for the second four wave functions that is obtained by
inverting all the spins. Since the spins are not in the Hamiltonian, inverting the
spins does not change the energy. They have the same energy, but are different
because they have different spins.
However, they are orthogonal because their spins are, and the spatial op-
erations in the derivation of the Born-Oppenheimer approximation in the next
note do not change that fact. So they turn out to lead to nuclear wave functions
that do not affect each other. More precisely, the inner products appearing in
the coefficients ann are zero because the spins are orthogonal.

A.51 Derivation of the approximation


This note gives a derivation of the Born-Oppenheimer Hamiltonian eigenvalue
problems (6.13) for the wave functions of the nuclei.
First consider an exact eigenfunction ψ of the complete system, including
both the electrons and the nuclei fully. Can it be related somehow to the simpler
electron eigenfunctions ψ1E , ψ2E , . . . that ignored nuclear kinetic energy? Yes it
can. For any given set of nuclear coordinates, the electron eigenfunctions are
complete; they are the eigenfunctions of an Hermitian electron Hamiltonian.
And that means that you can for any given set of nuclear coordinates write the
exact wave function as X
ψ= cn ψnE
n

You can do this for any set of nuclear coordinates that you like, but the coeffi-
cients cn will be different for different sets of nuclear coordinates. That is just
another way of saying that the cn are functions of the nuclear coordinates.
So, to be really precise, the wave function of I electrons and J nuclei can be
written as:

ψ(~r1 , Sz1 , . . . ,~rI , SzI ,~r1n , Sz1


n
, . . . ,~rJn , SzJ
n
)=
X
cn (~r1n , Sz1
n
, . . . ,~rJn , SzJ
n
)ψnE (~r1 , Sz1 , . . . ,~rI , SzI ; ~r1n , Sz1
n
, . . . ,~rJn , SzJ
n
)
n

where superscripts n indicate nuclear coordinates. (The nuclear spins are really
irrelevant, but it cannot hurt to keep them in.)
Consider what this means physically. By construction, the square electron
eigenfunctions |ψnE |2 give the probability of finding the electrons assuming that
they are in eigenstate n and that the nuclei are at the positions listed in the

652
final arguments of the electron eigenfunction. But then the probability that
the nuclei are actually at those positions, and that the electrons are actually in
eigenstate ψnE , will have to be |cn |2 . After all, the full wave function ψ must
describe the probability for the entire system to actually be in a specific state.
That means that cn must be the nuclear wave function ψnN for when the electrons
are in energy eigenstate ψnE . So from now on, just call it ψnN instead of cn . The
full wave function is then X
ψ= ψnN ψnE (A.67)

In the unsteady case, the cn , hence the ψnN , will also be functions of time.
The ψnE will remain time independent as long as no explicitly time-dependent
terms are added. The derivation then goes exactly the same way as the time-
independent Schrödinger equation (Hamiltonian eigenvalue problem) derived
below, with ih̄∂/∂t replacing E.
So far, no approximations have been made; the only thing that has been done
is to define the nuclear wave functions ψnN . But the objective is still to derive
P
the claimed equation (6.13) for them. To do so plug the expression ψ = ψnN ψnE
into the exact Hamiltonian eigenvalue problem:
h iX X
Tb N + Tb E + V NE + V EE + V NN ψnN ψnE = E ψnN ψnE
n n

Note first that the eigenfunctions can be taken to be real since the Hamilto-
nian is real. If the eigenfunctions were complex, then their real and imaginary
parts separately would be eigenfunctions, and both of these are real. This ar-
gument applies to both the electron eigenfunctions separately as well as to the
full eigenfunction. The trick is now to take an inner product of the equation
above with a chosen electron eigenfunction ψnE . More precisely, multiply the en-
tire equation by ψnE , and integrate/sum over the electron coordinates and spins
only, keeping the nuclear positions and spins at fixed values.
What do you get? Consider the terms in reverse order, from right to left. In
the right hand side, the electron-coordinate inner product hψnE |ψnE ie is zero unless
n = n, and then it is one, since the electron wave functions are orthonormal
for given nuclear coordinates. So all we have left in the right-hand side is
EψnN , Check, EψnN is the correct right hand side in the nuclear-wave-function
Hamiltonian eigenvalue problem (6.13).
Turning to the latter four terms in the left-hand side, remember that by
definition the electron eigenfunctions satisfy
h i
Tb E + V NE + V EE + V NN ψnE = (EnE + V NN )ψnE
P
and if you then take an inner product of ψnN (EnE + V NN )ψnE with ψnE , it is just
like the earlier term, and you get (EnE + V NN )ψnN . Check, that are two of the
terms in the left-hand side of (6.13) that you need.

653
That leaves only the nuclear kinetic term, and that one is a bit tricky. Re-
calling the definition (6.4) of the kinetic energy operator Tb N in terms of the
nuclear coordinate Laplacians, you have
J X
X 3 X
h̄2 ∂2 N E
− ψ ψ
n 2 n n
j=1 α=1 n 2mjn ∂rαj

Remember that not just the nuclear wave functions, but also the electron
wave functions depend on the nuclear coordinates. So, if you differentiate the
product, you get

X 3 X
J X
h̄2 ∂ 2 ψnN X 3 X
J X
h̄2 ∂ψnN ∂ψnE J X
X 3 X
h̄2 ∂ 2 ψnE
− ψE −
n 2 n
− ψN
n n n 2
j=1 α=1 n 2mjn ∂rαj j=1 α=1 n mjn ∂rαj
n n
∂rαj j=1 α=1 n 2mj ∂rαj

Now if you take the inner product with electron eigenfunction ψnE , the first
term gives you what you need, the expression for the kinetic energy of the nuclei.
But you do not want the other two terms; these terms have the nuclear kinetic
energy differentiations at least in part on the electron wave function instead of
on the nuclear wave function.
Well, whether you like it or not, the exact equation is, collecting all terms
and rearranging,
h i X
Tb N + V NN + EnE ψnN = EψnN + ann ψnN (A.68)
n

where
J X
X 3
h̄2 ∂2
Tb N = − n n 2
(A.69)
j=1 α=1 2mj ∂rαj

à !
J X
X 3
h̄2 D ¯ ∂ψ E E ∂ D ¯ ∂ 2ψE E
ψnE ¯¯ ψnE ¯¯
n n
ann = 2 + n 2
(A.70)
j=1 α=1 2mjn n
∂rαj n
∂rαj ∂rαj

The first thing to note is the final sum in (A.68). Unless you can talk away
this sum as negligible, (6.13) is not valid. The “off-diagonal” coefficients, the ann
for n 6= n, are particularly bad news, because they produce interactions between
the different potential energy surfaces, shifting energy from one value of n to
another. These off-diagonal terms are called “vibronic coupling terms.” (The
word is a contraction of “vibration” and “electronic,” if you are wondering.)
Let’s have a closer look at (A.69) and (A.70) to see how big the various terms
really are. At first appearance it might seem that both the nuclear kinetic
energy Tb N and the coefficients ann can be ignored, since both are inversely
proportional to the nuclear masses, hence apparently thousands of times smaller

654
than the electronic kinetic energy included in EnE . But do not go too quick here.
n
First ballpark the typical derivative, ∂/∂rαj when applied to the nuclear wave
function. You can estimate such a derivative as 1/ℓN , where ℓN is the typical
length over which there are significant changes in a nuclear wave function ψnN .
Well, there are significant changes in nuclear wave functions if you go from the
middle of a nucleus to its outside, and that is a very small distance compared to
the typical size of the electron blob ℓE . It means that the distance ℓN is small.
So the relative importance of the nuclear kinetic energy increases by a factor
(ℓE /ℓN )2 relative to the electron kinetic energy, compensating quite a lot for the
much higher nuclear mass. So keeping the nuclear kinetic energy is definitely a
good idea.
How about the coefficients ann ? Well, normally the electron eigenfunctions
only change appreciable when you vary the nuclear positions over a length com-
parable to the electron blob scale ℓE . Think back of the example of the hydrogen
molecule. The ground state separation between the nuclei was found as 0.87Å.
But you would not see a dramatic change in electron wave functions if you made
it a few percent more or less. To see a dramatic change, you would have to make
n
the nuclear distance 1.5Å, for example. So the derivatives ∂/∂rαj applied to the
electron wave functions are normally not by far as large as those applied to the
nuclear wave functions, hence the ann terms are relatively small compared to
the nuclear kinetic energy, and ignoring them is usually justified. So the final
conclusion is that equation (6.13) is usually justified.
But there are exceptions. If different energy levels get close together, the
electron wave functions become very sensitive to small effects, including small
changes in the nuclear positions. When the wave functions have become sensitive
enough that they vary significantly under nuclear position changes comparable
in size to the nuclear wave function blobs, you can no longer ignore the ann
terms and (6.13) becomes invalid.
You can be a bit more precise about that claim with a few tricks. Consider
the factors
D ¯ ∂ψ E E
ψnE ¯¯ n
n
∂rαj

appearing in the ann , (A.70). First of all, these factors are zero when n = n.
The reason is that because of orthonormality, hψnE |ψnE i = 1, and taking the
n
∂/∂rαj derivative of that, noting that the eigenfunctions are real, you see that
the factor is zero.
For n 6= n, the following trick works:

D ¯ ∂ ∂ ¯¯ E E D ¯ ∂ψ E E
ψnE ¯¯ = (En − En ) ψnE ¯¯ n
E E E E n
n
H −H n
¯ψn
∂rαj ∂rαj ∂rαj

655
I D
Zj e2 X ¯ r n − rαi ¯ E
= ψnE ¯¯ αj 3 ¯ E
¯ψn
4πǫ0 i=1 rij
The first equality is just a matter of the definition of the electron eigenfunc-
tions and taking the second H E to the other side, which you can do since it
is Hermitian. The second equality is a matter of looking up the Hamiltonian
in subsection 6.2.1 and then working out the commutator in the leftmost inner
product. (V NN does not commute with the derivative, but you can use orthog-
onality on the cleaned up expression.) The bottom line is that the final inner
product is finite, with no reason for it to become zero when energy levels ap-
proach. So, looking at the second equality, the first term in ann , (A.70), blows
up like 1/(EnE − EnE ) when those energy levels become equal.
As far as the final term in ann is concerned, like the second term, you would
expect it to become important when the scale of non-trivial changes in electron
wave functions with nuclear positions becomes comparable to the size of the
nuclear wave functions. You can be a little bit more precise by taking one more
derivative of the inner product expression derived above,
D ∂ψ E ¯ ∂ψ E E D ¯ ∂ 2 ψ E E I D
Zj e2 X ¯ n ¯ E
n¯ ∂ 1 E ¯ rαj − rαi ¯ E
+ ψnE ¯¯ n 2 =
n n
n
¯ n n
ψn ¯ ¯ψn
∂rαj ∂rαj ∂rαj ∂rαj EnE − EnE 4πǫ0 i=1 rij
The first term should not be large: while the left hand side of the inner product
has a large component along ψnE , the other side has zero component and vice-
versa. The final term should be of order 1/(EnE − EnE )2 , as you can see if you
first change the origin of the integration variable in the inner product to be at
the nuclear position, to avoid having to differentiate the potential derivative. So
you conclude that the second term of coefficient ann is of order 1/(EnE − EnE )2 .
In view of the fact that this term has one less derivative on the nuclear wave
function, that is just enough to allow it to become significant at about the same
time that the first term does.
The diagonal part of matrix ann , i.e. the ann terms, is somewhat interesting
since it produces a change in effective energy without involving interactions
with the other potential energy surfaces, i.e. without interaction with the ψnN
for n 6= n. The diagonal part is called the “Born-Oppenheimer diagonal correcti-
on.” Since as noted above, the first term in the expression (A.70) for the ann
does not have a diagonal part, the diagonal correction is given by the second
term.
Note that in a transient case that starts out as a single nuclear wave function
N
ψn , the diagonal term ann multiplies the predominant nuclear wave function
ψnN , while the off-diagonal terms only multiply the small other nuclear wave
functions. So despite not involving any derivative of the nuclear wave function,
the diagonal term will initially be the main correction to the Born-Oppenheimer
approximation. It will remain important at later times.

656
A.52 Why a single Slater determinant is not
exact
The simplest example that illustrates the problem with representing a general
wave function by a single Slater determinant is to try to write a general two-
variable function F (x, y) as a Slater determinant of two functions f1 and f2 .
You would write
a ³ ´
F (x, y) = √ f1 (x)f2 (y) − f2 (x)f1 (y)
2
A general function F (x, y) cannot be written as a combination of the same two
functions f1 (x) and f2 (x) at every value of y. However well chosen the two
functions are.
In fact, for a general antisymmetric function F , a single Slater determinant
can get F right at only two nontrivial values y = y1 and y = y2 . (Nontrivial
here means that functions F (x, y1 ) and F (x, y2 ) should not just be multiples of
each other.) Just take f1 (x) = F (x, y1 ) and f2 (x) = F (x, y2 ). You might object
that in general, you have
F (x, y1 ) = c11 f1 (x) + c12 f2 (x) F (x, y2 ) = c21 f1 (x) + c22 f2 (x)
where c11 , c12 , c21 , and c22 are some constants. (They are f1 or f2 values at
y1 or y2 , to be precise). But if you plug these two expressions into the Slater
determinant formed with F (x, y1 ) and F (x, y2 ) and multiply out, you get the
Slater determinant formed with f1 and f2 within a constant, so it makes no
difference.
If you add a second Slater determinant, you can get F right at two more
y values y3 and y4 . Just take the second Slater determinant’s functions to be
(2) (2)
f1 = ∆F (x, y3 ) and f2 = ∆F (x, y4 ), where ∆F is the deviation between the
true function and what the first Slater determinant gives. Keep adding Slater
determinants to get more and more y-values right. Since there are infinitely
many y-values to get right, you will in general need infinitely many determinants.
You might object that maybe the deviation ∆F from the single Slater de-
terminant must be zero for some reason. But you can use the same ideas to
explicitly construct functions F that show that this is untrue. Just select two
arbitrary but different functions f1 and f2 and form a Slater determinant. Now
choose two locations y1 and y2 so that f1 (y1 ), f2 (y1 ) and f1 (y2 ), f2 (y2 ) are not in
the same ratio to each other. Then add additional Slater determinants whose
(2) (2) (3) (3)
functions f1 , f2 , f1 , f2 , . . . you choose so that they are zero at y1 and y2 .
The so constructed function F is different from just the first Slater determinant.
However, if you try to describe this F by a single determinant, then it could
only be the first determinant since that is the only single determinant that gets
y1 and y2 right. So a single determinant cannot get F right.

657
A.53 Simplification of the Hartree-Fock energy
This note derives the expectation energy for a wave function given by a single
Slater determinant.
First note that if you multiply out a Slater determinant
1 ¯ E
Ψ = √ ¯¯det(ψ1e l1 , ψ2e l2 , ψ3e l3 , . . .)
I!
you are going to get terms, or Hartree products if you want, of the form
±
√ ψne 1 (~r1 )ln1 (Sz1 ) ψne 2 (~r2 )ln2 (Sz2 ) ψne 3 (~r3 )ln3 (Sz3 ) . . .
I!
where the numbers n1 , n2 , n3 , . . . of the single-electron states can have values
from 1 to I, but they must be all different. So there are I! such terms: there
are I possibilities among 1, 2, 3, . . . , I for the number n1 of the single-electron
state for electron 1, which leaves I − 1 remaining possibilities for the number
n2 of the single-electron state for electron 2, I − 2 remaining possibilities for
n3 , etcetera. That means a total of I(I − 1)(I − 2) . . . 2 1 = I! terms. As far as
the sign of the term is concerned, just don’t worry about it. The only thing to
remember is that whenever you exchange two n values, it changes the sign of
the term. It has to be, because exchanging n values is equivalent to exchanging
electrons, and the complete wave function must change sign under that.
To make the above more concrete, consider the example of a√Slater determi-
nant of three single-electron functions. It writes out to, taking I! to the other
side for convenience,
√ ¯¯ E
3! ¯det(ψ1e l1 , ψ2e l2 , ψ3e l3 ) =

+ψ1e (~r1 )l1 (Sz1 )ψ2e (~r2 )l2 (Sz2 )ψ3e (~r3 )l3 (Sz3 )
−ψ1e (~r1 )l1 (Sz1 )ψ3e (~r2 )l3 (Sz2 )ψ2e (~r3 )l2 (Sz3 )
−ψ2e (~r1 )l2 (Sz1 )ψ1e (~r2 )l1 (Sz2 )ψ3e (~r3 )l3 (Sz3 )
+ψ2e (~r1 )l2 (Sz1 )ψ3e (~r2 )l3 (Sz2 )ψ1e (~r3 )l1 (Sz3 )
+ψ3e (~r1 )l3 (Sz1 )ψ1e (~r2 )l1 (Sz2 )ψ2e (~r3 )l2 (Sz3 )
−ψ3e (~r1 )l3 (Sz1 )ψ2e (~r2 )l2 (Sz2 )ψ1e (~r3 )l1 (Sz3 )

The first row in the expansion covers the possibility that n1 = 1, with the
first term the possibility that n2 = 2 and the second term the possibility that
n2 = 3; note that there are then no choices left for n3 . The second row covers
the possibilities in which n1 = 2, and the third n1 = 3. You see that there are
3! = 6 Hartree product terms total.

658
Next, recall that the Hamiltonian consists of single-electron Hamiltonians
heiand electron-pair repulsion potentials viiee . The expectation value of a sin-
gle electron Hamiltonian hei will be done first. In forming the inner product
hΨ|hei |Ψi, and taking Ψ apart into its Hartree product terms as above, you are
going to end up with a large number of individual terms that all look like
D ± ¯
√ ψne 1 (~r1 )ln1 (Sz1 )ψne 2 (~r2 )ln2 (Sz2 ) . . . ψne i (~ri )lni (Szi ) . . . ψne I (~rI )lnI (SzI )¯¯
I!¯ E
¯ ±
hei ¯ √ ψne 1 (~r1 )ln1 (Sz1 )ψne 2 (~r2 )ln2 (Sz2 ) . . . ψne i (~ri )lni (Szi ) . . . ψne I (~rI )lnI (SzI )
I!
Note that overlines will be used to distinguish the wave function in the right
hand side of the inner product from the one in the left hand side. Also note
that to take this inner product, you have to integrate over 3I scalar position
coordinates, and sum over I spin values.
But multiple integrals, and sums, can be factored into single integrals, and
sums, as long as the integrands and limits only involve single variables. So you
can factor out the inner product as
± ± D ¯ E
√ √ ψne 1 (~r1 )ln1 (Sz1 )¯¯ψne 1 (~r1 )ln1 (Sz1 )
I! I! D ¯ E
× ψne 2 (~r2 )ln2 (Sz2 )¯¯ψne 2 (~r2 )ln2 (Sz2 )
×D
... ¯ ¯ E
¯ ¯
× ψne i (~ri )lni (Szi )¯hei ¯ψne i (~ri )lni (Szi )
×D
... ¯ E
× ψne I (~rI )lnI (SzI )¯¯ψne I (~rI )lnI (SzI )

Now you can start the weeding-out process, because the single-electron func-
tions are orthonormal. So factors in this product are zero unless all of the
following requirements are met:

n1 = n1 , n2 = n2 , . . . , ni−1 = ni−1 , ni+1 = ni+1 , . . . , nI = nI

Note that hψne i (~ri )lni (Szi )|hei |ψne i (~ri )lni (Szi )i does not require ni = ni for a
nonzero value, since the single-electron functions are most definitely not eigen-
functions of the single-electron Hamiltonians, (you would wish things were that
easy!) But now remember that the numbers n1 , n2 , n3 , . . . in an individual term
are all different. So the numbers n1 , n2 , . . . , ni−1 , ni+1 , . . . include all the num-
bers that are not equal to ni . Then so do n1 , n2 , . . . , ni−1 , ni+1 , . . ., because they
are the same. And since ni must be different from all of those, it can only be
equal to ni anyway.
So what is left? Well, with all the n values equal to the corresponding n-
values, all the plain inner products are one on account of orthonormality, and

659
the only thing left is:
± ± D ¯ ¯ E
√ √ ψne i (~ri )lni (Szi )¯¯hei ¯¯ψne i (~ri )lni (Szi )
I! I!
Also, the two signs are equal, because with all the n values equal to the
corresponding n values, the wave function term in the right side of the inner
product is the exact same one as in the left side. So the signs multiply to 1, and
you can further factor out the spin inner product, which is one since the spin
states are normalized:
1D e ¯ ¯ ED ¯ E 1D e ¯ ¯ E 1
ψni (~ri )¯¯hei ¯¯ψne i (~ri ) lni (Szi )¯¯lni (Szi ) = ψni (~ri )¯¯hei ¯¯ψne i (~ri ) ≡ Ene
I! I! I!
where for brevity the remaining inner product was called Ene . Normally you
would call it Ene i i , but an inner product integral does not care what the inte-
gration variable is called, so the thing has the same value regardless what the
electron i is. Only the value of the single-electron function number ni = n
makes a difference.
Next, how many such terms are there for a given electron i and single-
electron function number n? Well, for a given n value for electron i, there are
I − 1 possible values left among 1, 2, 3, . . . for the n-value of the first of the other
electrons, then I − 2 left for the second of the other electrons, etcetera. So there
are a total of (I − 1)(I − 2) . . . 1 = (I − 1)! such terms. Since (I − 1)!/I! = 1/I,
if you sum them all together you get a total contribution from terms in which
electron i is in state n equal to Ene /I. Summing over the I electrons kills off
the factor 1/I and so you finally get the total energy due to the single-electron
Hamiltonians as
I
X D ¯ ¯ E
Ene Ene = ψne (~r)¯¯he ¯¯ψne (~r)
n=1

You might have guessed that answer from the start. Since the inner product
integral is the same for all electrons, the subscripts i have been omitted.
The good news is that the reasoning to get the Coulomb and exchange
contributions is pretty much the same. A single electron to electron repulsion
term viiee between an electron numbered i and another numbered i makes a
contribution to the expectation energy equal to hΨ|viiee |Ψi, and if you multiply
out Ψ, you get terms of the general form:
1D e ¯
¯
ψn1 (~r1 )ln1 (Sz1 )ψne 2 (~r2 )ln2 (Sz2 ) . . . ψne i (~ri )lni (Szi ) . . . ψne i (~ri )lni (Szi ) . . . ¯
I! ¯ E
viiee ¯¯ψne 1 (~r1 )ln1 (Sz1 )ψne 2 (~r2 )ln2 (Sz2 ) . . . ψne i (~ri )lni (Szi ) . . . ψne i (~ri )lni (Szi ) . . .

You can again split into a product of individual inner products, except that
you cannot split between electrons i and i since viiee involves both electrons in

660
a nontrivial way. Still, you get again that all the other n-values must be the
same as the corresponding n-values, eliminating those inner products from the
expression:
1D e ¯ ¯ E
¯ ¯
ψni (~ri )lni (Szi )ψne i (~ri )lni (Szi )¯viiee ¯ψne i (~ri )lni (Szi )ψne i (~ri )lni (Szi )
I!
For given values of ni and ni , there are (I − 2)! equivalent terms, since that is
the number of possibilities left for the n = n-values of the other I − 2 electrons.
Next, ni and ni must together be the same pair of numbers as ni and ni ,
since they must be the two numbers left by the set of numbers not equal to ni
and ni . But that still leaves two possibilities, they can be in the same order or
in reversed order:
ni = ni , n i = n i or ni = ni , ni = ni .
The first possibility gives rise to the Coulomb terms, the second to the exchange
ones. Note that the former case represents an inner product involving a Hartree
product with itself, and the latter case an inner product of a Hartree product
with the Hartree product that is the same save for the fact that it has ni and
ni reversed, or equivalently, electrons i and i exchanged.
Consider the Coulomb terms first. For those the two Hartree products in the
inner product are the same, so their signs multiply to one. Also, their spin states
will be the same, so that inner product will be one too. And as noted there are
(I − 2)! equivalent terms for given ni and ni , so for each pair of electrons i and
i 6= i, and each pair of states n = ni and n = ni , you get one term
1
Jnn
I(I − 1)
with D ¯ ¯ E
Jnn ≡ ψne (~r)ψne (~r)¯¯v ee ¯¯ψne (~r)ψne (~r) .
Again, the Jnn are the same regardless of what i and i are; they depend only
on what n = ni and n = ni are. So the subscripts i and i were left out, after
setting ~r = ~ri and ~r = ~ri .
You now need to sum over all pairs of electrons with i 6= i and pairs of
single-electron function numbers n 6= n. Since there are a total of I(I − 1)
electron pairs, it takes out the factor 1/I(I − 1), and you get a contribution to
the energy
I X
X I
1
2
Jnn
n=1 n=1
n6=n

The factor 21 was added since for every electron pair, you are summing both viiee
and viiee , and that counts the same energy twice.

661
The exchange integrals go exactly the same way; the only differences are
that the Hartree product in the right hand side of the inner product has the
values of ni and ni reversed, producing a change of sign, and that the inner
product of the spins is not trivial. Define
D ¯ ¯ E
Knn ≡ ψne (~r)ψne (~r)¯¯v ee ¯¯ψne (~r)ψne (~r) .

and then the total contribution is


I X
X I
− 12 Knn hln |ln i2
n=1 n=1
n6=n

Finally, you can leave the constraint n 6= n on the sums away since Knn =
Jnn , so they cancel each other.

A.54 Integral constraints


This note verifies the mentioned constraints on the Coulomb and exchange in-
tegrals.
To verify that Jnn = Knn , just check their definitions.
The fact that

Jnn = hψne (~ri )ψne (~ri )|viiee |ψne (~ri )ψne (~ri )i
Z Z
e2 1 3
= |ψne (~ri )ψne (~ri )|2 d ~ri d3~ri .
all ~r i all ~ri 4πǫ0 rii
is real and positive is self-evident, since it is an integral of a real and positive
function.
The fact that

Knn = hψne (~ri )ψne (~ri )|viiee |ψne (~ri )ψne (~ri )i
Z Z
e2 1 e
= ψne (~ri )∗ ψne (~ri )∗ ψn (~ri )ψne (~ri ) d3~ri d3~ri
all ~r i all ~ri 4πǫ0 rii
is real can be seen by taking complex conjugate, and then noting that the names
of the integration variables do not make a difference, so you can swap them.
The same name swap shows that Jnn and Knn are symmetric matrices; Jnn =
Jnn and Knn = Knn .
That Knn is positive is a bit trickier; write it as
Z ÃZ !
−ef (~ri ) 1 3
−ef (~ri ) ∗
d ~ri d3~ri
all ~ri all ~r i 4πǫ0 rii

662

with f = ψne ψne . The part within parentheses is just the potential V (~ri ) of
a distribution of charges with density −ef . Sure, f may be complex but that
merely means that the potential is too. The electric field is minus the gradient of
the potential, E ~ = −∇V , and according to Maxwell’s equation, the divergence
~ = −∇2 V = −ef /ǫ0 .
of the electric field is the charge density divided by ǫ0 : divE
∗ 2 ∗
So −ef = −ǫ0 ∇ V and the integral is
Z
−ǫ0 V ∇2 V ∗ d3~ri
all ~ri

and integration by parts shows it is positive. Or zero, if ψne is zero wherever ψne
is not, and vice versa.
To show that Jnn ≥ Knn , note that

hψne (~ri )ψne (~ri ) − ψne (~ri )ψne (~ri )|v ee |ψne (~ri )ψne (~ri ) − ψne (~ri )ψne (~ri )i

is nonnegative, for the same reasons as Jnn but with ψne ψne − ψne ψne replacing
ψne ψne . If you multiply out the inner product, you get that 2Jnn − 2Knn is
nonnegative, so Jnn ≥ Knn .

A.55 Generalized orbitals


This note has a brief look at generalized orbitals of the form

ψne (~r, Sz ) = ψn+


e e
(~r)↑(Sz ) + ψn− (~r)↓(Sz ).

For such orbitals, the expectation energy can be worked out in exactly the
same way as in {A.53}, except without simplifying the spin terms. The energy
is
I D
X ¯ ¯ E I D
I X
X ¯ ¯ E I D
I X
X ¯ ¯ E
hEi = ψne ¯¯he ¯¯ψne + 12 ψne ψne ¯¯v ee ¯¯ψne ψne − 21 ψne ψne ¯¯v ee ¯¯ψne ψne
n=1 n=1 n=1 n=1 n=1
n6=n n6=n

To multiply out to the individual spin terms, it is convenient to normalize


the spatial functions, and write

ψne = cn+ ψn+,0


e e
↑ + cn− ψn−,0 ↓,
e e e e
hψn+,0 |ψn+,0 i = hψn−,0 |ψn−,0 i = 1, |cn+ |2 + |cn− |2 = 1
In that case, the expectation energy multiplies out to
I D
X ¯ ¯ E I D
X ¯ ¯ E
e ¯ e¯ e ¯ e¯ e
hEi = ψn+,0 ¯h ¯ψn+,0 |cn+ |2 + e
ψn−,0 ¯h ¯ψn−,0 |cn− |2
n=1 n=1

663
I µD
I X
X ¯ ¯ E
e e ¯ ee ¯ e e
+ 21 ψn+,0 ψn+,0 ¯v ¯ψn+,0 ψn+,0
n=1 n=1
n6=n

D ¯ ¯ E¶
e e ¯ ee ¯ e e
− ψn+,0 ψn+,0 ¯v ¯ψn+,0 ψn+,0 |cn+ |2 |cn+ |2

I X
X I D ¯ ¯ E
e e ¯ ee ¯ e e
+ 12 2 ψn+,0 ψn−,0 ¯v ¯ψn+,0 ψn−,0 |cn+ |2 |cn− |2
n=1 n=1
n6=n

I µD
I X
X ¯ ¯ E
e e ¯ ee ¯ e e
+ 21 ψn−,0 ψn−,0 ¯v ¯ψn−,0 ψn−,0
n=1 n=1
n6=n

D ¯ ¯ E¶
e e ¯ ee ¯ e e
− ψn−,0 ψn−,0 ¯v ¯ψn−,0 ψn−,0 |cn− |2 |cn− |2

I X
X I µD ¯ ¯ E ¶
e e ¯ ee ¯ e e
− 12 2ℜ ψn+,0 ψn−,0 ¯v ¯ψn+,0 ψn−,0 c∗n+ cn− c∗n− cn+
n=1 n=1
n6=n

where ℜ stands for the real part of its argument.


Now assume you have a normal unrestricted Hartree-Fock solution, and you
try to lower its ground-state energy by selecting, for example, a spin-up orbital
e e
ψm ↑ ≡ ψm+,0 ↑ and adding some amount of spin down to it. First note then that
the final sum above is zero, since at least one of cn+ , cn− , cn− , and cn+ must
be zero: all states except m are still either spin-up or spin-down, and m cannot
be both n and n 6= n. With the final sum gone, the energy is a linear function
of |cm− |2 , with |cm+ |2 = 1 − |cm− |2 . The maximum energy must therefor occur
for either |cm− |2 = 0, the original purely spin up orbital, or for |cm− |2 = 1.
(The latter case means that the unrestricted solution with the opposite spin for
orbital m must have less energy, so that the spin of orbital m was incorrectly
selected.) It follows from this argument that for correctly selected spin states,
the energy cannot be lowered by replacing a single orbital with a generalized
one.
Also note that for small changes, |cm− |2 is quadratically small and can be
ignored. So the variational condition of zero change in energy is satisfied for
all small changes in orbitals, even those that change their spin states. In other
words, the unrestricted solutions are solutions to the full variational problem
δhEi = 0 for generalized orbitals as well.
Since these simple arguments do not cover finite changes in the spin state
of more than one orbital, they do not seem to exclude the possibility that there
might be additional solutions in which two or more orbitals are of mixed spin.
But since either way the error in Hartree-Fock would be finite, there may not

664
be much justification for dealing with the messy problem of generalized orbitals
with dubious hopes of improvement. Procedures already exist that guarantee
improvements on standard Hartree-Fock results.

A.56 Derivation of the Hartree-Fock equations


This note derives the canonical Hartree-Fock equations. The derivation be-
low will be performed under the normally stated rules of engagement that the
orbitals are of the form ψne ↑ or ψne ↓, so that only the spatial orbitals ψne are
continuously variable. The derivations allow for the fact that some spatial spin
states may be constrained to be equal.
First, you can make things a lot less messy by a priori specifying the ordering
of the orbitals. The ordering makes no difference physically, and it simplifies
the mathematics. In particular, in restricted Hartree-Fock some spatial orbitals
appear in pairs, but you can only count each spatial orbital as one unknown
function. The easiest way to handle that is to push the spin-down versions of
the duplicated spatial orbitals to the end of the Slater determinant. Then the
start of the determinant comprises a list of unique spatial orbitals.
So, it will be assumed that the orbitals are ordered as follows:

1. the paired spatial states in their spin-up version; assume there are Np ≥ 0
of them;

2. unpaired spin-up states; assume there are Nu of them;

3. unpaired spin-down states; assume there are Nd of them;

4. and finally, the paired spatial states in their spin-down version.

That means that the Slater-determinant wave function looks like:


1 ¯ E
√ ¯¯det(ψ1e ↑, . . . , ψN
e
p
↑, ψ e
Np +1 ↑, . . . ψ e
N1 ↑, ψ e
N1 +1 ↓, . . . , ψ e
N ↓, ψ e
N +1 ↓, . . . , ψ e
I ↓)
I!
N1 = Np + Nu N = Np + Nu + Nd
e e e e
ψN +1 = ψ1 , ψN +2 = ψ2 , ..., ψIe = ψN
e
p

The total number of unknown spatial orbitals is N , and you need a correspond-
ing N equations for them.
The variational method discussed in section 6.1 says that the expectation
energy must be unchanged under small changes in the orbitals, provided that
penalty terms are added for changes that violate the orthonormality require-
ments on the orbitals.

665
The expectation value of energy was in subsection 6.3.3 found to be:
I
X
hEi = hψne |he |ψne i
n=1

X I
I X I X
X I
+ 21 hψne ψne |v ee |ψne ψne i − 1
2
hψne ψne |v ee |ψne ψne ihln |ln i2
n=1 n=1 n=1 n=1

(From here on, the argument of the first orbital of a pair in either side of an
inner product is taken to be the first inner product integration variable ~r and
the argument of the second orbital is the second integration variable ~r)
The penalty terms require penalty factors called Lagrangian variables. The
penalty factor for violations of the normalization requirement

hψne |ψne i = 1

will be called ǫnn for reasons evident in a second. Orthogonality between any
two spatial orbitals ψne and ψne requires
³ ´ ³ ´
1
2
hψne |ψne i + hψne |ψne i = 0, 1
2
i hψne |ψne i − hψne |ψne i = 0.

where the first constraint says that the real part of hψne |ψne i must be zero and
the second that its imaginary part must be zero too. (Remember that if you
switch sides in an inner product, you turn it into its complex conjugate.) To
avoid including the same orthogonality condition twice, the constraint will be
written only for n > n. The penalty factor for the first constraint will be called
2ǫnn,r , and the one for the second constraint 2ǫnn,i .
In those terms, the penalized change in expectation energy becomes, in the
restricted case that all unique spatial orbitals are mutually orthogonal,
N X
X N
δhEi − ǫnn δhψne |ψne i = 0
n=1 n=1

where ǫnn is an Hermitian matrix, with ǫnn = ǫnn,r + iǫnn,i . The notations for
the Lagrangian variables were chosen above to achieve this final result.
But for unrestricted Hartree-Fock, spatial orbitals are not required to be
orthogonal if they have opposite spin, because the spins will take care of orthog-
onality. You can remove the erroneously added constraints by simply specifying
that the corresponding Lagrangian variables are zero:

ǫnn = 0 if unrestricted Hartree-Fock and hln |ln i = 0

or equivalently, if n ≤ Nu , n > Nu or n > Nu , n ≤ Nu .

666
Now work out the penalized change in expectation energy due to a change
e
in the values of a selected spatial orbital ψm with m ≤ N . It is
X ³ ´
e
hδψm |he |ψm
e e
i + hψm |he |δψm
e
i
e =ψ e
ψn m

X I ³
X ´
e e ee e e e e ee e e
+ 12 hδψm ψn |v |ψm ψn i + hψm ψn |v |δψm ψn i
e =ψ e n=1
ψn m

I
X X ³ ´
+ 21 hψne δψm
e
|v ee |ψne ψm
e
i + hψne ψm
e
|v ee |ψne δψm
e
i
n=1 ψn
e =ψ e
m

X I ³
X ´
e e ee e e e e ee e e
− 21 hδψm ψn |v |ψn ψm i + hψm ψn |v |ψn δψm i hln |ln i2
e =ψ e n=1
ψn m

I
X X ³ ´
− 12 hψne δψm
e
|v ee |ψm
e e
ψn i + hψne ψm
e
|v ee |δψm ψn i hln |ln i2
e e

n=1 ψn
e =ψ e
m

N
X N
X
e
− ǫmn hδψm |ψne i − ǫnm hψne |δψm
e
i=0
n=1 n=1

OK, OK it is a mess. Sums like ψne = ψm e


are for the restricted Hartree-Fock
e
case, in which spatial orbital ψm may appear twice in the Slater determinant.
From now on, just write them as [2], meaning, put in a factor 2 if orbital
e
ψm appears twice. The exception is for the exchange integrals which produce
exactly one nonzero spin product; write that as [hln |lm i2 ], meaning take out
that product if the orbital appears twice.
Next, note that the second term in each row is just the complex conjugate of
e
the first. Considering iδψm as a second possible change in orbital, as was done
in the example in section 6.1, it is seen that the first terms by themselves must
be zero, so you can just ignore the second term in each row. And the integrals
with the factors 21 are pairwise the same; the difference is just a name swap of
the first and second summation and integration variables. So all that you really
have left is
e
[2]hδψm |he |ψm
e
i+
I n
X o
e e ee e e e e ee e e
[2]hδψm ψn |v |ψm ψn i − hδψm ψn |v |ψn ψm i[hln |lm i2 ] −
n=1

N
X
e
ǫmn hδψm |ψne i
n=1

667
Now note that if you write out the inner product over the first position
coordinate, you will get an integral of the general form
Z µ
e ∗
δψm [2]he ψm
e
all ~r

I n
X o N
X ¶
+ [2]hψne |v ee |ψne iψm
e
− hψne |v ee |ψm
e
iψne [hln |lm i2 ] − ǫmn ψne d3~r
n=1 n=1
e
If this integral is to be zero for whatever you take then the terms within δψm ,
e
the parentheses must be zero. (Just take δψm proportional to the parenthetical
expression; you would get the integral of an absolute square, only zero if the
square is.) Unavoidably, you must have that
I
X I
X N
X
[2]he ψm
e
+ [2]hψne |v ee |ψne iψm
e
− [hln |lm i2 ]hψne |v ee |ψm
e
iψne = ǫmn ψne
n=1 n=1 n=1

You can divide by [2]:


I I
( ) N
( )
e e
X X hln |lm i2 X 1
h ψm + hψne |v ee |ψne iψm
e
− 1 hψne |v ee |ψm
e
iψne = 1 ǫmn ψne
n=1 n=1 2 n=1 2
(A.71)
where you use the lower of the choices between the braces in the case that
e
spatial orbital ψm appears twice in the Slater determinant, or equivalently, if
m ≤ Np . There you have the Hartree-Fock equations, one for each m ≤ N .
Recall that they apply assuming the ordering of the Slater determinant given in
the beginning, and that for unrestricted Hartree-Fock, ǫmn is zero if hlm |ln i = 0
is.
e
How about those ǫmn , you say? Shouldn’t the right hand side just be ǫm ψm ?
Ah, you want the canonical Hartree-Fock equations, not just the plain vanilla
version.
OK, let’s do the restricted closed-shell Hartree-Fock case first, then, since it
is the easiest one. Every state is paired, so the lower choice in the curly brackets
always applies, and the number of unique unknown spatial states is N = I/2.
Also, you can reduce the summation upper limits n = I to n = N = I/2 if you
add a factor 2, since the second half of the spatial orbitals are the same as the
first half. So you get
N
X N
X N
X
he ψm
e
+2 hψne |v ee |ψne iψm
e
− hψne |v ee |ψm
e
iψne = 1
ǫ ψe
2 mn n
n=1 n=1 n=1

Now, suppose that you define a new set of orbitals, each a linear combination
of the current ones:
N
X
e
ψν ≡ uνn ψne for ν = 1, 2, . . . , N
n=1

668
where the uνn are the multiples of the original orbitals. Will the new ones still
be an orthonormal set? Well, they will be if
e e
hψ µ |ψ ν i = δµν

where δµν is the Kronecker delta, one if µ = ν, zero otherwise. Substituting in


the definition of the new orbitals, making sure not to use the same name for
two different indices,
N X
X N
u∗µm uνn hψm
e
|ψne i = δµν .
m=1 n=1

Now note that the ψne are orthonormal, so to get a nonzero value, m must
be n, and you get
N
X
u∗µn uνn = δµν .
n=1
Consider n to be the component index of a vector. Then this really says that
vectors ~uµ and ~uν must be orthonormal. So the “matrix of coefficients” must
consist of orthonormal vectors. Mathematicians call such matrices “unitary,”
rather than orthonormal, since it is easily confused with “unit,” and that keeps
mathematicians in business explaining all the confusion.
Call the complete matrix U . Then, according to the rules of matrix multipli-
cation and Hermitian adjoint, the orthonormality condition above is equivalent
to U U H = I where I is the unit matrix. That means U H is the inverse matrix
to U , U H = U −1 and then you also have U H U = I:
N
X
u∗νm uνn = δmn .
ν=1

Now premultiply the definition of the new orbitals above by U H ; you get
N
X N X
X N
e
u∗νm ψ ν = u∗νm uνn ψne
ν=1 ν=1 n=1

but the sum over ν in the right hand side is just δmn and you get
N
X N
X
e
u∗νm ψ ν = δmn ψne = ψm
e
.
ν=1 n=1

That gives you an expression for the original orbitals in terms of the new
ones. For aesthetic reasons, you might just as well renotate ν to µ, the Greek
equivalent of m, to get
N
X
e e
ψm = u∗µm ψ µ .
µ=1

669
Now plug that into the noncanonical restricted closed-shell Hartree-Fock
equations, with equivalent expressions for $\psi^e_n$ using whatever summation
variable is still available,
$$\psi^e_n = \sum_{\mu=1}^{N} u^*_{\mu n}\,\overline\psi^e_\mu \qquad \psi^e_n = \sum_{\kappa=1}^{N} u^*_{\kappa n}\,\overline\psi^e_\kappa \qquad \psi^e_n = \sum_{\lambda=1}^{N} u^*_{\lambda n}\,\overline\psi^e_\lambda$$

and use the reduction formula $UU^H = I$,
$$\sum_{m=1}^{N} u_{\nu m}\,u^*_{\mu m} = \delta_{\mu\nu} \qquad \sum_{n=1}^{N} u_{\kappa n}\,u^*_{\lambda n} = \delta_{\kappa\lambda}$$
premultiplying all by $U$, i.e. put $\sum_{m=1}^{N} u_{\nu m}$ before each term. You get
$$h^e\overline\psi^e_\nu + 2\sum_{\lambda=1}^{N}\langle\overline\psi^e_\lambda|v^{ee}|\overline\psi^e_\lambda\rangle\,\overline\psi^e_\nu - \sum_{\lambda=1}^{N}\langle\overline\psi^e_\lambda|v^{ee}|\overline\psi^e_\nu\rangle\,\overline\psi^e_\lambda = \sum_{m=1}^{N}\sum_{n=1}^{N}\sum_{\mu=1}^{N} u_{\nu m}\,\tfrac12\,\epsilon_{mn}\,u^*_{\mu n}\,\overline\psi^e_\mu$$

Note that the only thing that has changed more than just by symbol names
is the matrix in the right hand side. Now for each value of $\mu$, take $u^*_{\mu n}$ as the
$\mu$-th orthonormal eigenvector of the Hermitian matrix $\epsilon_{mn}$, calling the eigenvalue
$2\epsilon_\mu$. Then the right hand side becomes
$$\sum_{m=1}^{N}\sum_{\mu=1}^{N} u_{\nu m}\,\epsilon_\mu\,u^*_{\mu m}\,\overline\psi^e_\mu = \sum_{\mu=1}^{N}\delta_{\mu\nu}\,\epsilon_\mu\,\overline\psi^e_\mu = \epsilon_\nu\,\overline\psi^e_\nu$$

So, in terms of the new orbitals defined by the requirement that u∗µn gives the
eigenvectors of ǫmn , the right hand side simplifies to the canonical one.
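As a concrete illustration, here is a minimal Python/NumPy sketch, with an
arbitrary random Hermitian matrix standing in for $\epsilon_{mn}$, showing that
transforming with the unitary matrix of orthonormal eigenvectors leaves only
diagonal multipliers:

    import numpy as np

    # Diagonalize a Hermitian matrix of Lagrange multipliers eps_{mn}
    # using the unitary matrix U whose columns are its eigenvectors.
    rng = np.random.default_rng(2)
    M = rng.standard_normal((4, 4)) + 1j*rng.standard_normal((4, 4))
    eps = (M + M.conj().T)/2      # some Hermitian eps_{mn}
    w, U = np.linalg.eigh(eps)    # columns of U: orthonormal eigenvectors
    # In the transformed equations only diagonal multipliers survive:
    print(np.allclose(U.conj().T @ eps @ U, np.diag(w)))   # True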
Since you no longer care about the old orbitals, you can drop the overlines on
the new ones, and revert to sensible roman indices $n$ and $\underline{n}$ instead of the Greek
ones $\nu$ and $\lambda$. You then have the canonical restricted closed-shell Hartree-Fock
equations

$$h^e\psi^e_n + 2\sum_{\underline n=1}^{I/2}\langle\psi^e_{\underline n}|v^{ee}|\psi^e_{\underline n}\rangle\,\psi^e_n - \sum_{\underline n=1}^{I/2}\langle\psi^e_{\underline n}|v^{ee}|\psi^e_n\rangle\,\psi^e_{\underline n} = \epsilon_n\,\psi^e_n \qquad \text{(A.72)}$$

that, as noted, assume that the Slater determinant is ordered so that the I/2
spin-up orbitals are at the start of it. Note that the left hand side directly
provides a Hermitian Fock operator if you identify it as $F\psi^e_n$; there is no need
to involve spin here.
In the unrestricted case, the noncanonical equations are
$$h^e\psi^e_m + \sum_{n=1}^{I}\langle\psi^e_n|v^{ee}|\psi^e_n\rangle\,\psi^e_m - \sum_{n=1}^{I}\langle l_n|l_m\rangle^2\,\langle\psi^e_n|v^{ee}|\psi^e_m\rangle\,\psi^e_n = \sum_{n=1}^{I}\epsilon_{mn}\,\psi^e_n$$

In this case the spin-up and spin-down spatial states are not mutually orthonor-
mal, and you want to redefine the group of spin up states and the group of spin
down states separately.
The term in linear algebra is that you want to partition your U matrix.
What that means is simply that you separate the orbital numbers into two sets.
The set of numbers n ≤ Nu of spin-up orbitals will be indicated as U, and the
set of values n > Nu of spin-down ones by D. So you can partition (separate)
the noncanonical equations above into equations for m ∈ U (meaning m is one
of the values in set U):
$$h^e\psi^e_m + \sum_{n\in U}\langle\psi^e_n|v^{ee}|\psi^e_n\rangle\,\psi^e_m + \sum_{n\in D}\langle\psi^e_n|v^{ee}|\psi^e_n\rangle\,\psi^e_m - \sum_{n\in U}\langle\psi^e_n|v^{ee}|\psi^e_m\rangle\,\psi^e_n = \sum_{n\in U}\epsilon_{mn}\,\psi^e_n,$$

and equations for $m \in D$:
$$h^e\psi^e_m + \sum_{n\in U}\langle\psi^e_n|v^{ee}|\psi^e_n\rangle\,\psi^e_m + \sum_{n\in D}\langle\psi^e_n|v^{ee}|\psi^e_n\rangle\,\psi^e_m - \sum_{n\in D}\langle\psi^e_n|v^{ee}|\psi^e_m\rangle\,\psi^e_n = \sum_{n\in D}\epsilon_{mn}\,\psi^e_n.$$

In these two equations, the fact that the up and down spin states are orthogonal
was used to get rid of one pair of sums, and another pair was eliminated by the
fact that there are no Lagrangian variables ǫmn linking the sets, since the spatial
orbitals in the sets are allowed to be mutually nonorthogonal.
Now separately replace the orbitals of the up and down states by a modified
set just like for the restricted closed-shell case above, for each using the unitary
matrix of eigenvectors of the ǫmn coefficients appearing in the right hand side
of the equations for that set. It leaves the equations intact except for changes
in names, but gets rid of the equivalent of the ǫmn for m 6= n, leaving only ǫmm -
equivalent values. Then combine the spin-up and spin-down equations again
into a single expression. You get, in terms of revised names,
$$h^e\psi^e_n + \sum_{\underline n=1}^{I}\langle\psi^e_{\underline n}|v^{ee}|\psi^e_{\underline n}\rangle\,\psi^e_n - \sum_{\underline n=1}^{I}\langle l_{\underline n}|l_n\rangle^2\,\langle\psi^e_{\underline n}|v^{ee}|\psi^e_n\rangle\,\psi^e_{\underline n} = \epsilon_n\,\psi^e_n \qquad \text{(A.73)}$$

In the restricted open-shell Hartree-Fock method, the partitioning also needs
to include the set P of orbitals $n \le N_p$ whose spatial orbitals appear both with
spin-up and spin-down in the Slater determinant. In that case, the procedure
above to eliminate the $\epsilon_{mn}$ values for $m \ne n$ no longer works, since there are
coefficients relating the sets. This (even more) elaborate case will be left to the
references that you can find in [18].
Woof.

A.57 Why the Fock operator is Hermitian


To verify that the Fock operator is Hermitian, first note that $h^e$ is Hermitian
since it is a Hamiltonian. Next, if you form the inner product $\langle\psi^e l|v^{HF}\psi^e l\rangle$,

the first term in $v^{HF}$, the Coulomb term, can be taken to the other side since it
is just a real function. The second term, the exchange one, produces the inner
product,
$$-\sum_{n=1}^{I}\Big\langle\,\psi^e(\vec r)\,l(S_z)\,\Big|\,\big\langle\psi^e_n(\vec r_1)\,l_n(S_{z1})\big|v^{ee}\big|\psi^e(\vec r_1)\,l(S_{z1})\big\rangle\,\psi^e_n(\vec r)\,l_n(S_z)\Big\rangle$$
n=1

and if you take the operator to the other side, you get
$$-\sum_{n=1}^{I}\Big\langle\,\big\langle\psi^e_n(\vec r_1)\,l_n(S_{z1})\big|v^{ee}\big|\psi^e(\vec r_1)\,l(S_{z1})\big\rangle\,\psi^e_n(\vec r)\,l_n(S_z)\,\Big|\,\psi^e(\vec r)\,l(S_z)\Big\rangle$$

and writing out these inner products as six-dimensional spatial integrals and
sums over spin, you see that they are the same.
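The same conclusion can be checked numerically. The following Python/NumPy
sketch, a one-dimensional toy model with a softened repulsion kernel and two
made-up orbitals (all numbers arbitrary illustration choices), discretizes the
exchange part and verifies that the resulting matrix is Hermitian:

    import numpy as np

    # Discretized exchange operator:
    #   K[i,j] = sum_n psi_n(x_i) v(x_i,x_j) conj(psi_n(x_j)) dx
    x = np.linspace(-4.0, 4.0, 201)
    dx = x[1] - x[0]
    orbitals = [np.exp(-x**2/2), x*np.exp(-x**2/2)]       # two sample orbitals
    orbitals = [p/np.sqrt(np.sum(abs(p)**2)*dx) for p in orbitals]
    v = 1.0/np.sqrt((x[:, None] - x[None, :])**2 + 0.1)   # real symmetric kernel
    K = sum(np.outer(p, np.conj(p))*v for p in orbitals)*dx
    print(np.allclose(K, K.conj().T))                     # True: Hermitian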

A.58 “Correlation energy”


The error in Hartree-Fock is due to the single-determinant approximation only.
A term like “Hartree-Fock error” or “single-determinantal error” is therefore both
precise and immediately understandable by a general audience.
However, it is called “correlation energy,” and to justify that term, it would
have to be both clearer and equally correct mathematically. It fails both
requirements miserably. The term correlation energy is clearly confusing and
distracting for nonspecialists. But in addition, there does not seem to be any
theorem that proves that an independently defined correlation energy is iden-
tical to the Hartree-Fock single determinant error. That would not just make
the term correlation energy disingenuous, it would make it wrong.
Instead of finding a rigorous theorem, you are lucky if standard textbooks,
e.g., [18, 12, 13], and typical web references, offer a vague qualitative story
why Hartree-Fock underestimates the repulsions if a pair of electrons gets very
close. That is a symptom of the disease of having an incomplete function rep-
resentation, it is not the disease itself. Low-parameter function representations
have general difficulty with representing localized effects, whatever their phys-
ical source. If you make up a system where the Coulomb force vanishes both
at short and at long distance, such correlations do not exist, and Hartree-Fock
would still have a finite error.
The kinetic energy is not correct either; what is the correlation in that?
Some sources, like [12] and web sources, seem to suggest that these are “indi-
rect” results of having the wrong correlation energy, whatever correlation energy
may be. The idea is apparently that if you had the electron-electron repulsions
exact, you would compute the correct kinetic energy too. That is just like
saying, if you computed the correct kinetic energy term, you would compute the

correct potential too, so let’s rename the Hartree-Fock error “kinetic energy in-
teraction.” Even if you computed the potential energy correctly, you would still
have to convert the wave function to single-determinantal form before evaluat-
ing the kinetic energy, otherwise it is not Hartree-Fock, and that would produce
a finite error. Phrased differently, there is absolutely no way to get a general
wave function correct with a finite number of single-electron functions, whatever
corrections you make to the potential energy.
Szabo and Ostlund [18, p. 51ff,61] state that it is called correlation energy
since “the motion of electrons with opposite spins is not correlated within the
Hartree-Fock approximation.” That is incomprehensible, for one thing since it
seems to suggest that Hartree-Fock is exact for excited states with all electrons
in the same spin state, which would be ludicrous. In addition, the electrons do
not have motion; a stationary wave function is computed, and they do not have
spin; all electrons occupy all the states, spin up and down. It is the orbitals that
have spin, and the spin-up and spin-down orbitals are most definitely correlated.
However, the authors do offer a “clarification;” they take a Slater determi-
nant of two opposite spin orbitals, compute the probability of finding the two
electrons at given positions and find that it is correlated. They then explain:
that’s OK; the exchange requirements do not allow uncorrelated positions. This
really helps an engineer trying to figure out why the “motion” of the two elec-
trons is uncorrelated!
The unrestricted Hartree-Fock solution of the dissociated hydrogen molecule
is of this type. Since if one electron is around the left proton, the other is
around the right one, and vice versa, many people would call the positions of
the electrons strongly correlated. But now we engineers understand that this
“does not count,” because an uncorrelated state in which electron 1 is around
the left proton for sure and electron 2 is around the right one for sure is not
allowed.
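The correlation is easy to quantify. The following one-dimensional Python/NumPy
sketch, with made-up Gaussian blobs standing in for the atomic clouds, shows
that the positions in the two-term density have a large negative covariance,
something no uncorrelated product density can have:

    import numpy as np

    # Two-term density of the dissociated molecule:
    #   p(x1,x2) = 0.5 |L(x1)|^2 |R(x2)|^2 + 0.5 |R(x1)|^2 |L(x2)|^2
    x = np.linspace(-8.0, 8.0, 401)
    dx = x[1] - x[0]
    L = np.exp(-(x + 3)**2); L /= np.sum(L)*dx   # cloud around the left proton
    R = np.exp(-(x - 3)**2); R /= np.sum(R)*dx   # cloud around the right proton
    p = 0.5*(np.outer(L, R) + np.outer(R, L))    # joint position density
    mean1 = np.sum(p*x[:, None])*dx*dx           # <x1>, zero by symmetry
    cov = np.sum(p*np.outer(x, x))*dx*dx - mean1**2
    print(cov)   # about -9: if one electron is left, the other is right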
Having done so well, the authors offer us no further guidance on how we are
supposed to figure out whether or not electrons 1 and 2 are of opposite spin if
there are more than two electrons. It is true that if the wave function
$$\Psi(\vec r_1, \tfrac12\hbar, \vec r_2, -\tfrac12\hbar, \vec r_3, S_{z3}, \ldots)$$
is represented by a single small determinant (like for helium or lithium, say),
it leads to uncorrelated spatial probability distributions for electrons 1 and 2.
However, that stops being true as soon as there are at least two spin-up states
and two spin-down states. And of course it is again just a symptom of the
single-determinant disease, not the disease itself. Not a sliver of evidence is
given that the supposed lack of correlation is an important source of the error
in Hartree-Fock, let alone the only error.
Koch and Holthausen, [12, pp. 22-23], address the same two-electron example
as Szabo and Ostlund, but do not have the same problem of finding the electron

probabilities correlated. For example, if the spin-independent probability of
finding the electrons at positions ~r1 and ~r2 in the dissociated hydrogen molecule
is
$$\tfrac12\,|\psi_L(\vec r_1)|^2\,|\psi_R(\vec r_2)|^2 + \tfrac12\,|\psi_R(\vec r_1)|^2\,|\psi_L(\vec r_2)|^2$$
then, Koch and Holthausen explain to us, the second term must be the same
as the first. After all, if the two terms were different, the electrons would
be distinguishable: electron 1 would be the one that selected ψL in the first
term that Koch and Holthausen wrote down in their book. So, the authors
conclude, the second term above is the same as the first, making the probability
of finding the electrons equal to twice the first term, |ψL (~r1 )|2 |ψR (~r2 )|2 . That
is an uncorrelated product probability.
However, the assumption that electrons are indistinguishable with respect
to mathematical formulae in books is highly controversial. Many respected
references, and this book too, only see an empirical requirement that the wave
function, not books, be antisymmetric with respect to exchange of any two
electrons. And the wave function is antisymmetric even if the two terms above
are not the same.
Wikipedia, [[9]], Hartree-Fock entry June 2007, lists electron correlation
(defined here vaguely as “effects” arising from the mean-field approximation,
i.e. using the same $v^{HF}$ operator for all electrons) as an approximation
made in addition to using a single Slater determinant. Sorry, but Hartree-Fock
gives the best single-determinantal approximation; there is no additional ap-
proximation made. The mean “field” approximation is a consequence of the
single determinant, not an additional approximation. Then this reference pro-
ceeds to declare this correlation energy the most important of the set, in other
words, more important than the single-determinant approximation! And again,
even if the potential energy was computed exactly, instead of using the $v^{HF}$
operator, and only the kinetic energy was computed using a Slater determinant,
there would still be a finite error. It would therefore appear that the name
correlation energy is sufficiently impenetrable and poorly defined that even the
experts cannot necessarily figure it out.
Consider for a second the ground state of two electrons around a massive nu-
cleus. Because of the strength of the nucleus, the Coulomb interaction between
the two electrons can to first approximation be ignored. A reader of the various
vague qualitative stories listed above may then be forgiven for assuming that
Hartree-Fock should not have any error. But only the unrestricted Hartree-Fock
solution with those nasty, “uncorrelated” (true in this case), opposite-spin “elec-
trons” (orbitals) is the one that gets the energy right. An unrestricted solution
in terms of those perfect, correlated, aligned-spin “electrons” gets the energy all
wrong, since one orbital will have to be an excited one. In short the “correlation
energy” (error in energy) that, we are told, is due to the “motion” of electrons

of opposite spins not being “correlated” is in this case 100% due to the motion
of aligned-spin orbitals being correlated. Note that both solutions get the spin
wrong, but we are talking about energy.
And what happened to the word “error” in “correlation energy error?” If
you did a finite difference or finite element computation of the ground state,
you would not call the error in energy “truncation energy;” it would be called
“truncation error” or “energy truncation error.” Why does one suspect that the
appropriate and informative word “error” did not sound “hot” enough to the
physicists involved?
Many sources refer to a reference (Löwdin, P.-O., 1959, Adv. Chem. Phys.,
2, 207) instead of providing a solid justification of this widely-used key term
themselves. If one takes the trouble to look up the reference, does one find a
rigorously defined correlation energy and a proof it is identical in magnitude to
the Hartree-Fock error?
Not exactly. One finds a vague qualitative story about some perceived
“holes” whose mathematically rigorous definition remains restricted to the cen-
ter point of one of them. However, the lack of a defined hole size is not supposed
to deter the reader from agreeing wholeheartedly with all sorts of claims about
the size of their effects. Terms like “main error,”, “small error,” “large correla-
tion error” (qualified by “certainly”), “vanish or be very small,” (your choice),
are bandied around, even though there is no small parameter that would allow
any rigorous mathematical definition of small or big.
Then the author, who has already noted earlier that the references cannot
agree on what the heck correlation energy is supposed to mean in the first place,
states “In order to get at least a formal definition of the problem, . . . ” and
proceeds to redefine the Hartree-Fock error to be the “correlation energy.” In
other words, since correlation energy at this time seems to be a pseudo-scientific
concept, let’s just cross out the correct name Hartree-Fock error, and write in
“correlation energy!”
To this author’s credit, he does keep the word error in “correlation error in
the wave function” instead of using “correlation wave function.” But somehow,
that particular term does not seem to be cited much in literature.

A.59 Explanation of the London forces


To fully understand the details of the London forces, it helps to first understand
the popular explanation of them, and why it is all wrong. To keep things
simple, the example will be the London attraction between two neutral hydrogen
atoms that are well apart. (This will also correct a small error that the earlier
discussion of the hydrogen molecule made; that discussion implied incorrectly
that there is no attraction between two neutral hydrogen atoms that are far

apart. The truth is that there really is some Van der Waals attraction. It was
ignored because it is small compared to the chemical bond that forms when the
atoms are closer together and would distract from the real story.)

Figure A.17: Possible polarizations of a pair of hydrogen atoms (panels (a) through (d)).

The popular explanation for the London force goes something like this:
“Sure, there would not be any attraction between two distant hydrogen atoms
if they were perfectly spherically symmetric. But according to quantum me-
chanics, nature is uncertain. So sometimes the electron clouds of the two atoms
are somewhat to the left of the nuclei, like in figure A.17 (b). This polarization
[dipole creation] of the atoms turns out to produce some electrostatic attraction
between the atoms. At other times, the electron clouds are somewhat to the
right of the nuclei like in figure A.17 (c); it is really the same thing seen in the
mirror. In cases like figure A.17 (a), where the electron clouds move towards
each other, and (d), where they move away from each other, there is some repul-
sion between the atoms; however, the wave functions become correlated so that
(b) and (c) are more likely than (a) and (d). Hence a net attraction results.”
Before examining what is wrong with this explanation, first consider what
is right. It is perfectly right that figure A.17 (b) and (c) produce some net
attraction between the atoms, and that (a) and (d) produce some repulsion.
This follows from the net Coulomb potential energy between the atoms for
given positions of the electrons:
$$V_{LR} = \frac{e^2}{4\pi\epsilon_0}\left(\frac{1}{d} - \frac{1}{r_L} - \frac{1}{r_R} + \frac{1}{r_{LR}}\right)$$
where $e = 1.6\times10^{-19}$ C is the magnitude of the charges of the protons and
electrons, $\epsilon_0 = 8.85\times10^{-12}$ C²/J m is the permittivity of space, $d$ is the distance
between the nuclei, $r_L$ is the distance between the left electron and the right
nucleus, $r_R$ the one between the right electron and the left nucleus, and $r_{LR}$ is

the distance between the two electrons. If the electron charges are distributed
over space according to densities nL (~rL ) and nR (~rR ), the classical potential
energy is
$$V_{LR} = \frac{e^2}{4\pi\epsilon_0}\int_{\text{all }\vec r_L}\int_{\text{all }\vec r_R}\left(\frac{1}{d} - \frac{1}{r_L} - \frac{1}{r_R} + \frac{1}{r_{LR}}\right) n_L(\vec r_L)\,n_R(\vec r_R)\;d^3\vec r_L\,d^3\vec r_R$$
(Since the first, 1/d, term represents the repulsion between the nuclei, it may
seem strange to integrate it against the electron charge distributions, but the
charge distributions integrate to one, so they disappear. Similarly in the second
and third term, the charge distribution of the uninvolved electron integrates
away.)
Since it is assumed that the atoms are well apart, the integrand above can
be simplified using Taylor series expansions to give:

$$V_{LR} = \frac{e^2}{4\pi\epsilon_0}\int_{\text{all }\vec r_L}\int_{\text{all }\vec r_R}\frac{x_Lx_R + y_Ly_R - 2z_Lz_R}{d^3}\; n_L(\vec r_L)\,n_R(\vec r_R)\;d^3\vec r_L\,d^3\vec r_R$$
where the positions of the electrons are measured from their respective nuclei.
Also, the two z-axes are both taken horizontal and positive towards the left. For
charge distributions as shown in figure A.17, the xL xR and yL yR terms integrate
to zero because of odd symmetry. However, for a distribution like in figure A.17
(c), $n_L$ and $n_R$ are larger at positive $z_L$, respectively $z_R$, than at negative values, so
the integral will integrate to a negative number. That means that the potential
is lowered, there is attraction between the atoms. In a similar way, distribution
(b) produces attraction, while (a) and (d) produce repulsion.
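A quick numerical check of this expansion may be reassuring. The Python/NumPy
sketch below, in units where $e^2/4\pi\epsilon_0 = 1$ and with arbitrary sample electron
displacements (here the z axes of the two atoms are simply taken parallel along
the line connecting the nuclei; the product $z_Lz_R$ is the same either way),
compares the exact Coulomb sum with the dipole-dipole approximation at
increasing distances $d$:

    import numpy as np

    def v_exact(d, rL, rR):
        # Exact Coulomb energy; left nucleus at the origin, right nucleus at
        # d along z; rL, rR are the electron displacements from their nuclei.
        ez = np.array([0.0, 0.0, 1.0])
        return (1.0/d
                - 1.0/np.linalg.norm(d*ez - rL)         # left electron, right nucleus
                - 1.0/np.linalg.norm(d*ez + rR)         # right electron, left nucleus
                + 1.0/np.linalg.norm(d*ez + rR - rL))   # electron - electron

    def v_dipole(d, rL, rR):
        # Leading large-d term of the Taylor expansion.
        return (rL[0]*rR[0] + rL[1]*rR[1] - 2.0*rL[2]*rR[2])/d**3

    rL = np.array([0.3, -0.2, 0.4])
    rR = np.array([-0.1, 0.5, 0.2])
    for d in (10.0, 100.0, 1000.0):
        print(d, v_exact(d, rL, rR), v_dipole(d, rL, rR))

The two values agree ever better as $d$ grows, the relative error shrinking like $1/d$.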
So there is nothing wrong with the claim that (b) and (c) produce attraction,
while (a) and (d) produce repulsion. It is also perfectly right that the combined
quantum wave function gives a higher probability to (b) and (c) than to (a) and
(d).
So what is wrong? There are two major problems with the story.
1. Energy eigenstates are stationary. If the wave function oscillated in time
like the story suggests, it would require uncertainty in energy, which would
act to kill off the lowering of energy. True, states with the electrons at
the same side of their nuclei are more likely to show up when you measure
them, but to reap the benefits of this increased probability, you must not
do such a measurement and just let the electron wave function sit there
unchanging in time.

2. The numbers are all wrong. Suppose the wave functions in figures (b)
   and (c) shift (polarize) by a typical small amount $\varepsilon$. Then the attractive
   potential is of order $\varepsilon^2/d^3$. Since the distance $d$ between the atoms is
   assumed large, the energy gained is a small amount times $\varepsilon^2$. But to shift
   atom energy eigenfunctions by an amount $\varepsilon$ away from their ground state
   takes an amount of energy $C\varepsilon^2$ where $C$ is some constant that is not small.
   So it would take more energy to shift the electron clouds than the dipole
   attraction could recover. In the ground state, the electron clouds should
   therefore stick to their original centered positions.

On to the correct quantum explanation. First the wave function is needed. If
there were no Coulomb potentials linking the atoms, the combined ground-state
electron wave function would simply take the form
$$\psi(\vec r_L, \vec r_R) = \psi_{100}(\vec r_L)\,\psi_{100}(\vec r_R)$$

where ψ100 is the ground state wave function of a single hydrogen atom. To get
a suitable correlated polarization of the atoms, throw in a bit of the ψ210 “2pz ”
states, as follows:

$$\psi(\vec r_L, \vec r_R) = \sqrt{1-\varepsilon^2}\,\psi_{100}(\vec r_L)\psi_{100}(\vec r_R) + \varepsilon\,\psi_{210}(\vec r_L)\psi_{210}(\vec r_R).$$

For ε > 0, it produces the desired correlation between the wave functions: ψ100
is always positive, and ψ210 is positive if the electron is at the positive-z side
of its nucleus and negative otherwise. So if both electrons are at the same
side of their nucleus, the product ψ210 (~rL )ψ210 (~rR ) is positive, and the wave
function is increased, giving increased probability of such states. Conversely, if
the electrons are at opposite sides of their nucleus, ψ210 (~rL )ψ210 (~rR ) is negative,
and the wave function is reduced.
Now write the expectation value of the energy:
$$\langle E\rangle = \Big\langle\sqrt{1-\varepsilon^2}\,\psi_{100}\psi_{100} + \varepsilon\psi_{210}\psi_{210}\,\Big|\,H_L + H_R + V_{LR}\,\Big|\,\sqrt{1-\varepsilon^2}\,\psi_{100}\psi_{100} + \varepsilon\psi_{210}\psi_{210}\Big\rangle$$

where HL and HR are the Hamiltonians of the individual electrons and

$$V_{LR} = \frac{e^2}{4\pi\epsilon_0}\,\frac{x_Lx_R + y_Ly_R - 2z_Lz_R}{d^3}$$
is again the potential between atoms. Working out the inner product, noting
that the ψ100 and ψ210 are orthonormal eigenfunctions of the atom Hamiltonians
HL and HR with eigenvalues E1 and E2 , and that most VLR integrals are zero
on account of odd symmetry, you get
$$\langle E\rangle = 2E_1 + 2\varepsilon^2(E_2 - E_1) - 4\varepsilon\sqrt{1-\varepsilon^2}\,\frac{e^2}{4\pi\epsilon_0}\,\frac{1}{d^3}\,\langle\psi_{100}\psi_{100}|z_Lz_R|\psi_{210}\psi_{210}\rangle.$$
The final term is the savior for deriving the London force. For small values
of ε, for which the square root can be approximated as one, this energy-lowering

term dominates the energy $2\varepsilon^2(E_2-E_1)$ needed to distort the atom wave
functions. The best approximation to the true ground state is then obtained when
the quadratic in $\varepsilon$ is minimal. That happens when the energy has been lowered
by an amount
$$\frac{2}{E_2-E_1}\left(\frac{e^2}{4\pi\epsilon_0}\,\langle\psi_{100}|z|\psi_{210}\rangle^2\right)^2\frac{1}{d^6}.$$
Since the assumed eigenfunction is not exact, this variational approximation
will underestimate the actual London force. For example, it can be seen that
the energy can also be lowered similar amounts by adding some of the 2px and
2py states; these cause the atom wave functions to move in opposite directions
normal to the line between the nuclei.
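To see the minimization concretely, here is a minimal Python/NumPy sketch of
the small-$\varepsilon$ quadratic, with made-up illustration values for $E_2 - E_1$ and
for the savior-term coefficient:

    import numpy as np

    E21 = 0.375   # E2 - E1 (illustration value)
    K = 1.0e-3    # coefficient (e^2/4 pi eps0) <psi100|z|psi210>^2 / d^3
    eps = np.linspace(-0.02, 0.02, 20001)
    dE = 2*eps**2*E21 - 4*eps*K    # energy change for small eps
    i = np.argmin(dE)
    print(eps[i], dE[i])           # minimum near eps = K/E21
    print(-2*K**2/E21)             # the formula -2 K^2/(E2-E1): same lowering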
So what is the physical meaning of the savior term? Consider the inner
product that it represents:

$$\langle\psi_{100}\psi_{100}|V_{LR}|\psi_{210}\psi_{210}\rangle.$$

That is an energy that links the state in which both electrons are in the spherically
symmetric $\psi_{100}$ ground state and the state in which both electrons are in the
antisymmetric $2p_z$ state. The savior term is a twilight term, like the ones
discussed earlier in chapter 4.3 for chemical bonds. It reflects nature's habit
of doing business in terms of an unobservable wave function instead of observable
probabilities.

A.60 Ambiguities in the definition of electron affinity
The International Union of Pure and Applied Chemistry (IUPAC) Gold Book
defines electron affinity as “Energy required to detach an electron from the singly
charged negative ion [. . . ] The equivalent more common definition is the energy
released (Einitial − Efinal ) when an additional electron is attached to a neutral
atom or molecule.” This is also the definition given by Wikipedia. Chemguide
says “The first electron affinity is the energy released when 1 mole of gaseous
atoms each acquire an electron to form 1 mole of gaseous 1- ions.” HyperPhysics
says “The electron affinity is a measure of the energy change when an electron
is added to a neutral atom to form a negative ion.” Encyclopedia Britannica
says “in chemistry, the amount of energy liberated when an electron is added
to a neutral atom to form a negatively charged ion.” Chemed.chem.purdue.edu
says “The electron affinity of an element is the energy given off when a neutral
atom in the gas phase gains an extra electron to form a negatively charged ion.”
Another definition that can be found: “Electron affinity is the energy re-
leased when an electron is added to the valence shell of a gas-phase atom.”

Note the additional requirement here that the electron be added to the valence
shell of the atom. It may make a difference.
First note that it is not self-evident that a stable negative ion exists. Atoms,
even inert noble gases, can be weakly bound together by Van der Waals/London
forces. You might think that similarly, a distant electron could be weakly bound
to an atom or molecule through the dipole strength it induces in the atom or
molecule. The atom’s or molecule’s electron cloud would move a bit away from
the distant electron, allowing the nucleus to exert a larger attractive force on
the distant electron than the repulsive force by the electron cloud. Remember
that according to the variational principle, the energy of the atom or molecule
does not change due to small changes in wave function, while the dipole strength
does. So the electron would be weakly bound.
It sounds logical, but there is a catch. A theoretical electron at rest at infinity
would have an infinitely large wave function blob. If it moves slightly towards
the attractive side of the dipole, it would become somewhat localized. The
associated kinetic energy that the uncertainty principle requires, while small at
large distances, still dwarfs the attractive force by the induced dipole which is
still smaller at large distances. So the electron would not be bound. Note that if
the atom or molecule itself already has an inherent dipole strength, then
ballparking the kinetic energy shows that for small dipole strength, the kinetic
energy dominates and the electron will not be bound, while for larger dipole
strength, the electron will move in towards the electron cloud with increasing
binding energy, presumably until it hits the electron cloud.
In the case that there is no stable negative ion, the question is, what to make
of the definitions of electron affinity above. If there is a requirement that the
additional electron be placed in the valence shell, there would be energy needed
to do so for an unstable ion. Then the electron affinity would be negative.
If there is however no requirement to place the electron in the valence shell,
you could make the negative value of the electron affinity arbitrarily small by
placing the electron in a sufficiently highly-excited state. Then there would be
no meaningful value of the electron affinity, except maybe zero.
Various reputed sources differ greatly about what to make of the electron
affinities if there is no stable negative ion. The CRC Handbook of Chemistry
and Physics lists noble gases, metals with filled s shells, and nitrogen all as “not
stable” rather than giving a negative electron affinity for them. That seems to
agree with the IUPAC definition above, which does not require a valence shell
position. However, the Handbook does give a small negative value for ytterbium.
A 2001 professional review paper on electron affinity mentioned that it would
not discuss atoms with negative electron affinities, seemingly implying that they
do exist.
Quite a lot of web sources list specific negative electron affinity values for
atoms and molecules. For example, both Wikipedia and HyperPhysics give

specific negative electron affinity values for benzene. Though one web source
based on Wikipedia (!) claims the opposite.
Also note that references, like Wikipedia and HyperPhysics, differ over how
the sign of electron affinity should be defined, making things even more con-
fusing. Wikipedia however agrees with the IUPAC Gold Book on this point: if
a stable ion exists, there is a positive affinity. Which makes sense; if you want
to specify a negative value for a stable ion, you should not give it the name
“affinity.”
Wikipedia (July 2007) also claims: “All elements have a positive electron
affinity, but older texts mistakenly report that some elements such as inert
gases have negative [electron affinity], meaning they would repel electrons. This
is not recognized by modern chemists.” However, this statement is very hard to
believe in view of all the authoritative sources, like the CRC Handbook above,
that explicitly claim that various elements do not form stable ions, and often
give explicit negative values for the electron affinity of various elements. If the
2007 Handbook still misstated the affinity of many elements after all these years,
would not a lot of people have demanded their money back by now? It may
be noted that Wikipedia lists ytterbium as blank, and the various elements
listed as not stable by the CRC Handbook as stars; in other words, Wikipedia
itself does not even list the positive values it claims.

A.61 Why Floquet theory should be called so


At about the same time as Floquet, Hill appears to have formulated similar
ideas. However, he did not publish them, and the credit of publishing a publicly
scrutinizable exposition fairly belongs to Floquet.
Note that there is much more to Floquet theory than what is discussed here.
If you have done a course on differential equations, you can see why, since the
simplest case of periodic coefficients is constant coefficients. Constant coefficient
equations may have exponential solutions that do not have purely imaginary
arguments, and they may include algebraic factors if the set of exponentials is
not complete. The same happens to the variable coefficient case, with additional
periodic factors thrown in. But these additional solutions are not relevant to the
discussed periodic crystals. They can be relevant to describing simple crystal
boundaries, though.

A.62 Superfluidity versus BEC


Many texts and most web sources suggest quite strongly, without explicitly
saying so, that the so-called “lambda” phase transition at 2.17 K from normal

helium I to superfluid helium II indicates Bose-Einstein condensation.
One reason given is that the temperature at which it occurs is comparable
in magnitude to the temperature for Bose-Einstein condensation in a
corresponding system of noninteracting particles. However, that argument is
very weak; the similarity in temperatures merely suggests that the main en-
ergy scales involved are the classical energy kB T and the quantum energy scale
formed from h̄2 /2m and the number of particles per unit volume. There are
likely to be other processes that scale with those quantities besides macroscopic
amounts of atoms getting dumped into the ground state.
Still, there is not much doubt that the transition is due to the fact that
helium atoms are bosons. The isotope ³He that is missing a neutron in its
nucleus does not show a transition to a superfluid until 2.5 mK. The three
orders of magnitude difference can hardly be due to the minor difference in mass;
the isotope does condense into a normal liquid at a comparable temperature
as plain helium, 3.2 K versus 4.2 K. Surely, the vast difference in transition
temperature to a superfluid is due to the fact that normal helium atoms are
bosons, while the missing spin ½ neutron in ³He atoms makes them fermions.
(The eventual superfluid transition of ³He at 2.5 mK occurs because at extremely
low temperatures very small effects allow the atoms to combine into pairs that
act as bosons with net spin one.)
While the fact that the helium atoms are bosons is apparently essential to
the lambda transition, the conclusion that the transition should therefore be
Bose-Einstein condensation is simply not justified. For example, Feynman [7,
p. 324] shows that the boson character has a dramatic effect on the excited
states. (Distinguishable particles and spinless bosons have the same ground
state; however, Feynman shows that the existence of low energy excited states
that are not phonons is prohibited by the symmetrization requirement.) And
this effect on the excited states is a key part of superfluidity: it requires a finite
amount of energy to excite these states and thus mess up the motion of helium.
Another argument that is usually given is that the specific heat varies with
temperature near the Lambda point just like the one for Bose-Einstein con-
densation in a system of noninteracting bosons. This is certainly a good point
if you pretend not to see the dramatic, glaring, differences. In particular, the
Bose-Einstein specific heat is finite at the Bose-Einstein temperature, while the
one at the lambda point is infinite. How much more different can you get?
In addition, the specific heat curve of helium below the lambda point has a
logarithmic singularity at the lambda point. The specific heat curve of Bose-
Einstein condensation for a system with a unique ground state stays analytical
until the condensation terminates, since at that point, out of the blue, nature
starts enforcing the requirement that the number of particles in the ground state
cannot be negative, {A.68}.

Tilley and Tilley [19, p. 37] claim that the qualitative correspondence between
the curves for the number of atoms in the ground state in Bose-Einstein
condensation and the fraction of superfluid in a two-fluid description of liquid
helium “are sufficient to suggest that $T_\lambda$ marks the onset of Bose-Einstein
condensation in liquid ⁴He.” Sure, if you think that a curve reaching a maximum of
one exponentially has a similarity to one that reaches a maximum of one with
infinite curvature. And note that it is quite generally believed that the condensate
fraction in liquid helium, unlike that in true Bose-Einstein condensation,
does not reach one at zero temperature in the first place, but only about 10%
or so, [19, pp. 62-66].
Since the specific heat curves are completely different, Occam’s razor would
suggest that helium has some sort of different phase transition at the lambda
point. In that case, if the concept of Bose-Einstein condensation is still mean-
ingful for liquid helium, it would presumably imply that normal helium I would
condense at a temperature below the lambda point, and helium II at one
above the lambda point.
However, Tilley and Tilley [19, pp. 62-66] present data, their figure 2.17, that
suggests that the number of atoms in the ground state does indeed increase from
zero at the lambda point, if various models are to be believed and one does not
demand great accuracy. So, the best available knowledge seems to be that Bose-
Einstein condensation, whatever that means for liquid helium, does occur at the
lambda point. But the fact that many sources see “evidence” of condensation
where none exists is worrisome: obviously, the desire to believe despite the
evidence is strong and widespread, and might affect the objectivity of the data.
The question whether Bose-Einstein condensation occurs at the lambda
point seems to be academic anyway. The following points can be distilled from
Schmets and Montfrooij [14]:
1. Bose-Einstein condensation is a property of the ground state, while super-
fluidity is a property of the excited states.
2. Ideal Bose-Einstein condensates are not superfluid.
3. Below 1 K, essentially 100% of the helium atoms flow without viscosity,
even though only about 7% is in the ground state.
4. In fact, there is no reason why a system could not become a superfluid
even if only a very small fraction of the atoms were to form a condensate.
An undisputed Bose-Einstein condensation was achieved in 1995 by Cornell,
Wieman, et al. by cooling a dilute gas of rubidium atoms to below about 170
nK (nanokelvin). Based on the extremely low temperature and fragility of the
condensate, practical applications are very likely to be well into the future, and
even determination of the condensate’s basic properties will be hard.

A.63 Explanation of Hund’s first rule
Hund’s first rule of spin-alignment applies because electrons in atoms prefer to
go into spatial states that are antisymmetric with respect to electron exchange.
Spin alignment is then an unavoidable consequence of the weird antisymmetriza-
tion requirement.
To understand why electrons want to go into antisymmetric spatial states,
the interactions between the electrons need to be considered. Sweeping them
below the carpet as the discussion of atoms in chapter 4.9 did is not going to
cut it.
To keep it as simple as possible, the case of the carbon atom will be consid-
ered. As the crude model of chapter 4.9 did correctly deduce, the carbon atom
has two 1s electrons locked into a zero-spin singlet state, and similarly two 2s
electrons also in a singlet state. Hund’s rule is about the final two electrons
that are in 2p states. As far as the simple model of chapter 4.9 was concerned,
these electrons can do whatever they want within the 2p subshell.
To go one better than that, the correct interactions between the two 2p
electrons will need to be considered. To keep the arguments manageable, it will
still be assumed that the effects of the 1s and 2s electrons are independent of
where the 2p electrons are.
Call the 2p electrons α and β. Under the stated conditions, their Hamilto-
nian takes the form
Hα + Hβ + Vαβ
where Hα and Hβ are the single-electron Hamiltonians for the electrons α and
β, consisting of their kinetic energy, their attraction to the nucleus, and the
repulsion by the 1s and 2s electrons. Note that in the current analysis, it is not
required that the 1s and 2s electrons are treated as located in the nucleus. Lack
of shielding can be allowed now, but it must still be assumed that the 1s and
2s electrons are unaffected by where the 2p electrons are. In particular, $H_\alpha$ is
assumed to be independent of the position of electron $\beta$, and $H_\beta$ independent
of the position of electron $\alpha$. The mutual repulsion of the two 2p electrons is
given by $V_{\alpha\beta} = e^2/4\pi\epsilon_0|\vec r_\alpha - \vec r_\beta|$.
Now assume that electrons α and β appropriate two single-electron spatial
2p states for themselves, call them ψ1 and ψ2 . For carbon, ψ1 can be thought
of as the $2p_z$ state and $\psi_2$ as the $2p_x$ state. The general spatial wave function
describing the two electrons takes the generic form

$$a\,\psi_1(\vec r_1)\psi_2(\vec r_2) + b\,\psi_2(\vec r_1)\psi_1(\vec r_2).$$

The two states $\psi_1$ and $\psi_2$ will be taken to be orthonormal, like $p_z$ and $p_x$ are,
and then the normalization requirement is that $|a|^2 + |b|^2 = 1$.

The expectation value of energy is

$$\langle a\psi_1\psi_2 + b\psi_2\psi_1\,|\,H_\alpha + H_\beta + V_{\alpha\beta}\,|\,a\psi_1\psi_2 + b\psi_2\psi_1\rangle.$$

That can be multiplied out and then simplified by noting that in the various
inner product integrals involving the single-electron Hamiltonians, the integral
over the coordinate unaffected by the Hamiltonian is either zero or one because
of orthonormality. Also, the inner product integrals involving Vαβ are pairwise
the same, the difference being just a change of names of integration variables.
The simplified expectation energy is then:

$$E_{\psi_1} + E_{\psi_2} + \langle\psi_1\psi_2|V_{\alpha\beta}|\psi_1\psi_2\rangle + (a^*b + b^*a)\,\langle\psi_1\psi_2|V_{\alpha\beta}|\psi_2\psi_1\rangle.$$

The first two terms are the single-electron energies of states $\psi_1$ and $\psi_2$. The third
term is the classical repulsion between two electron charge distributions
of strengths $|\psi_1|^2$ and $|\psi_2|^2$. The electrons minimize this third term by going
into spatially separated states like the 2px and 2pz ones, rather than into the
same spatial state or into greatly overlapping ones.
The final one of the four terms is the interesting one for Hund’s rule; it de-
termines how the two electrons occupy the two states ψ1 and ψ2 , symmetrically
or antisymmetrically. Consider the detailed expression for the inner product
integral appearing in the term:
$$\langle\psi_1\psi_2|V_{\alpha\beta}|\psi_2\psi_1\rangle = \int_{\text{all }\vec r_1}\int_{\text{all }\vec r_2} V_{\alpha\beta}\,f(\vec r_1,\vec r_2)\,f^*(\vec r_2,\vec r_1)\;d^3\vec r_1\,d^3\vec r_2$$

where $f(\vec r_1,\vec r_2) = \psi_2(\vec r_1)\psi_1(\vec r_2)$.


The sign of this inner product can be guesstimated. If $V_{\alpha\beta}$ were the
same for all electron separation distances, the integral would be zero because of
the orthonormality of $\psi_1$ and $\psi_2$. However, $V_{\alpha\beta}$ favors positions where $\vec r_1$ and $\vec r_2$ are
close to each other; in fact $V_{\alpha\beta}$ is infinitely large if $\vec r_1 = \vec r_2$. At such a location
$f(\vec r_1,\vec r_2)f^*(\vec r_2,\vec r_1)$ is a positive real number, so the integrand tends to have a positive real
part in the regions where it really counts. That means the inner product integral should
have the same sign as $V_{\alpha\beta}$; it should be repulsive.
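The guesstimate can be backed up by a one-dimensional toy computation. In the
Python/NumPy sketch below, two made-up orthogonal orbitals and a softened
repulsion stand in for the real thing; the computed exchange integral indeed
comes out positive:

    import numpy as np

    x = np.linspace(-5.0, 5.0, 401)
    dx = x[1] - x[0]
    psi1 = np.exp(-x**2/2);   psi1 /= np.sqrt(np.sum(psi1**2)*dx)  # even orbital
    psi2 = x*np.exp(-x**2/2); psi2 /= np.sqrt(np.sum(psi2**2)*dx)  # odd orbital
    X1, X2 = np.meshgrid(x, x, indexing='ij')
    V = 1.0/np.sqrt((X1 - X2)**2 + 0.1)   # softened repulsion, peaks at x1 = x2
    f = np.outer(psi2, psi1)              # f(x1,x2) = psi2(x1) psi1(x2)
    exchange = np.sum(V*f*f.T)*dx*dx      # <psi1 psi2|V|psi2 psi1>
    print(exchange)   # positive, so the antisymmetric choice a = -b wins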
And since this integral is multiplied by $a^*b + b^*a$, the energy is smallest when
that is most negative, which is for the antisymmetric spatial state $a = -b$. Since
this state takes care of the sign change in the antisymmetrization requirement,
the spin state must be unchanged under particle exchange; the spins must be
aligned. More precisely, the spin state must be some linear combination of
the three triplet states with net spin one. There you have Hund’s rule, as an
accidental byproduct of the Coulomb repulsion.
This leaves the philosophical question why for the two electrons of the hydro-
gen molecule in chapter 4.2 the symmetric state is energetically most favorable,

while the antisymmetric state is the one for the 2p electrons. The real differ-
ence is in the kinetic energy. In both cases, the antisymmetric combination
reduces the Coulomb repulsion energy between the electrons, and in the hydro-
gen molecule model, it also increases the nuclear attraction energy. But in the
hydrogen molecule model, the symmetric state achieves a reduction in kinetic
energy that is more than enough to make up for it all. For the 2p electrons, the
reduction in kinetic energy is nil. When the positive component wave functions
of the hydrogen molecule model are combined into the symmetric state, they
allow greater access to fringe areas farther away from the nuclei. Because of the
uncertainty principle, less confined electrons tend to have less indeterminacy in
momentum, hence less kinetic energy. On the other hand, the 2p states are
half positive and half negative, and even their symmetric combination reduces
spatial access for the electrons in half the locations.

A.64 The mechanism of ferromagnetism


It should be noted that in solids, not just spatial antisymmetry, but also sym-
metry can give rise to spin alignment. In particular, in many ferrites, there is
an opposite spin coupling between the iron atoms and the oxygen ones. If two
iron atoms are opposite in spin to the same oxygen atom, it implies that they
must have aligned spins even if their electrons do not interact directly.
It comes as somewhat of a surprise to discover that in this time of high-
temperature superconductors, the mechanism of plain old ferromagnetism is
still not understood that well if the magnetic material is a conductor, such as a
piece of iron.
For a conductor, the description of the exclusion effect should really be at
least partly in terms of band theory, rather than electrons localized at atoms.
More specifically, Aharoni [2, p. 48] notes “There is thus no doubt in anybody’s
mind that neither the itinerant electron theory nor the localized electron one
can be considered to be a complete picture of the physical reality, and that they
both should be combined into one theory.”
Sproull notes that in solid iron, most of the 4s electrons move to the 3d
bands. That reduces the magnetization by reducing the number of unpaired
electrons.
While Sproull [16, p. 282] in 1956 describes ferromagnetism as an interac-
tion between electrons localized at neighboring atoms, Feynman [9, p. 37-2] in
1965 notes that calculations using such a model produce the wrong sign for the
interaction. According to Feynman, the interaction is thought to occur with
[4s] conduction band electrons acting as intermediaries. More recently, Aharoni
[2, p. 44] notes: “It used to be stated [. . . ] that nobody has been able to com-
pute a positive exchange integral for Fe, and a negative one for Cu [. . . ]. More

modern computations [. . . ] already have the right sign, but the magnitude of
the computed exchange still differs considerably from the experimental value.
Improving the techniques [. . . ] keeps improving the results, but not sufficiently
yet.”
Batista, Bonča, and Gubernatis note that “After seven decades of intense
effort we still do not know what is the minimal model of itinerant ferromag-
netism and, more importantly, the basic mechanism of ordering.” (Phys. Rev.
Lett. 88, 2002, 187203-1) and “Even though the transition metals are the most
well studied itinerant ferromagnets, the ultimate reason for the stabilization of
the FM phase is still unknown.” (Phys. Rev. B 68, 2003, 214430-11)

A.65 Number of system eigenfunctions


This note derives the number of energy eigenfunctions $Q_{\vec I}$ for a given set $\vec I =
(I_1, I_2, I_3, \ldots)$ of bucket occupation numbers, $I_b$ being the number of particles
in bucket number $b$. The number of single-particle eigenfunctions in bucket
number $b$ is indicated by $N_b$.
Consider first the case of distinguishable particles, referring to figure 8.1 for
an example. The question is how many different eigenfunctions can be created
with the given bucket numbers. What are the ways to create different ones?
Well, the first choice that can be made is what are the I1 particles that go into
bucket 1. If you pick out I1 particles from the I total particles, you have I
choices for particle 1, next there are I − 1 choices left for particle 2, then I − 2
for particle 3. The total number of possible ways of choosing the I1 particles is
then
I × (I − 1) × (I − 2) × . . . × (I − I1 + 1)
However, this overestimates the number of variations in eigenfunctions that you
can create by selecting the I1 particles: the only thing that makes a difference
for the eigenfunctions is what particles you pick to go into bucket 1; the order
in which you chose to pick them out of the total set of I makes no difference.
If you chose a set of I1 particles in an arbitrary order, you get no difference
in eigenfunction compared to the case that you pick out the same particles
sorted by number. To correct for this, the number of eigenfunction variations
above must be divided by the number of different orderings in which a set of
I1 particles can come out of the total collection. That will give the number of
different sets of particles, sorted by number, that can be selected. The number
of ways that a set of I1 particles can be ordered is
I1 ! = I1 × (I1 − 1) × (I1 − 2) × . . . × 3 × 2 × 1;
there are I1 possibilities for the particle from the sorted set that comes first in
the unsorted one, then I1 − 1 possibilities left for the particle that comes second,

etcetera. Dividing the earlier expression by I1 !, the number of different sets of
I1 particles that can be selected for bucket 1 becomes

$$\frac{I \times (I-1) \times (I-2) \times \cdots \times (I-I_1+1)}{I_1 \times (I_1-1) \times (I_1-2) \times \cdots \times 3 \times 2 \times 1}.$$

But further variations in eigenfunctions are still possible in the way these I1
particles are distributed over the N1 single-particle states inside bucket 1. There
are N1 possible single-particle states for the first particle of the sorted set, times
N1 possible single-particle states for the second particle, etcetera, making a total
of $N_1^{I_1}$ variations. That number of variations exists for each of the individual
sorted sets of particles, so the total number of variations in eigenfunctions is
the product:

$$\frac{I \times (I-1) \times (I-2) \times \cdots \times (I-I_1+1)}{I_1 \times (I_1-1) \times (I_1-2) \times \cdots \times 3 \times 2 \times 1}\; N_1^{I_1}.$$

This can be written more concisely by noting that the bottom of the fraction is
per definition I1 ! while the top equals I!/(I − I1 )!: note that the terms missing
from I! in the top are exactly (I − I1 )!. (In the special case that I = I1 , all
particles in bucket 1, this still works since mathematics defines 0! = 1.) So, the
number of variations in eigenfunctions so far is:
$$\frac{I!}{I_1!\,(I-I_1)!}\; N_1^{I_1}.$$

The fraction is known in mathematics as “I choose I1 .”


Further variations in eigenfunctions are possible in the way that the I2 par-
ticles in bucket 2 are chosen and distributed over the single-particle states in
that bucket. The analysis is just like the one for bucket 1, except that bucket
1 has left only $I - I_1$ particles for bucket 2 to choose from. So the number of
additional variations related to bucket 2 becomes
$$\frac{(I-I_1)!}{I_2!\,(I-I_1-I_2)!}\; N_2^{I_2}.$$

The same way the number of eigenfunction variations for buckets 3, 4, . . . can
be found, and the grand total of different eigenfunctions is

$$\frac{I!}{I_1!\,(I-I_1)!} N_1^{I_1} \times \frac{(I-I_1)!}{I_2!\,(I-I_1-I_2)!} N_2^{I_2} \times \frac{(I-I_1-I_2)!}{I_3!\,(I-I_1-I_2-I_3)!} N_3^{I_3} \times \cdots$$

This terminates at the bucket number B beyond which there are no more parti-
cles left, when I − I1 − I2 − I3 − . . . − IB = 0. All further buckets will be empty.
Empty buckets might just as well not exist, they do not change the eigenfunction

count. Fortunately, there is no need to exclude empty buckets from the math-
ematical expression above, it can be used either way. For example, if bucket 2
would be empty, i.e. $I_2 = 0$, then $N_2^{I_2} = 1$ and $I_2! = 1$, and the factors $(I-I_1)!$
and $(I-I_1-I_2)!$ cancel each other. So the factor due to empty bucket 2 becomes
a multiplication by one; it does not change the eigenfunction count.
Note that various factors cancel in the eigenfunction count above; it simplifies
to the final expression
$$Q^d_{\vec I} = I!\;\frac{N_1^{I_1}}{I_1!}\times\frac{N_2^{I_2}}{I_2!}\times\frac{N_3^{I_3}}{I_3!}\times\cdots$$
Mathematicians like to symbolically write a product of indexed factors like this
using the product symbol Π:
$$Q^d_{\vec I} = I! \prod_{\text{all }b} \frac{N_b^{I_b}}{I_b!}.$$
It means exactly the same as the written-out product.
Next the eigenfunction count for fermions. Refer now to figure 8.3. For
any bucket b, it is given that there are Ib particles in that bucket, and the
only variations in eigenfunctions that can be achieved are in the way that these
particles are distributed over the Nb single-particle eigenfunctions in that bucket.
The fermions are identical, but to simplify the reasoning, for now assume that
you stamp numbers on them from 1 to Ib . Then fermion 1 can go into Nb single-
particle states, leaving Nb − 1 states that fermion 2 can go into, then Nb − 2
states that fermion 3 can go into, etcetera. That produces a total of
$$N_b \times (N_b-1) \times (N_b-2) \times \cdots \times (N_b-I_b+1) = \frac{N_b!}{(N_b-I_b)!}$$
variations. But most of these differ only in the order of the numbers stamped
on the fermions; differences in the numbers stamped on the electrons do not
constitute a difference in eigenfunction. The only difference is in whether a
state is occupied by a fermion or not, not what number is stamped on it. Since,
as explained under distinguishable particles, the number of ways Ib particles
can be ordered is Ib !, it follows that the formula above over-counts the number
of variations in eigenfunctions by that factor. To correct, divide by Ib !, giving
the number of variations as Nb !/(Nb − Ib )!Ib !, or “Nb choose Ib .” The combined
number of variations in eigenfunctions for all buckets then becomes
$$Q^f_{\vec I} = \frac{N_1!}{(N_1-I_1)!\,I_1!}\times\frac{N_2!}{(N_2-I_2)!\,I_2!}\times\frac{N_3!}{(N_3-I_3)!\,I_3!}\times\cdots = \prod_{\text{all }b}\frac{N_b!}{(N_b-I_b)!\,I_b!}.$$

If a bucket is empty, it makes again no difference; the corresponding factor is
again one. But another restriction applies for fermions: there should not be any
eigenfunctions if any bucket number Ib is greater than the number of states Nb
in that bucket. There can be at most one particle in each state. Fortunately,
mathematics defines factorials of negative integer numbers to be infinite, and
the infinite factor (Nb − Ib )! in the bottom will turn the eigenfunction count into
zero as it should. The formula can be used whatever the bucket numbers are.


Figure A.18: Schematic of an example boson distribution in a bucket.

Last but not least, the eigenfunction count for bosons. Refer now to figure
8.2. This one is tricky, but a trick solves it. To illustrate the idea, take bucket 2
in figure 8.2 as an example. It is reproduced in condensed form in figure A.18;
things have been drawn horizontally instead of vertically up to save paper. Also,
the figure merely shows the particles and the lines separating the single-particle
states. Like for the fermions, the question is, how many ways can the Ib bosons
be arranged inside the Nb single-particle states? In other words, how many
variations are there on a schematic like the one shown in figure A.18? To figure
it out, stamp identifying numbers on all the elements, particles and single-state
separating lines alike, ranging from 1 to Ib +Nb −1. Following the same reasoning
as before, there are (Ib +Nb −1)! different ways to order these numbered objects.
As before, now back off. All the different orderings of the numbers stamped on
the bosons, Ib ! of them, produce no difference in eigenfunction, so divide by
Ib ! to fix it up. Similarly, all the different orderings of the single-particle state
boundaries produce no difference in eigenfunction, so divide by (Nb − 1)!. The
number of variations in eigenfunctions possible by rearranging the particles in
a single bucket b is then (Ib + Nb − 1)!/Ib !(Nb − 1)!. The total for all buckets is

$$Q^b_{\vec I} = \frac{(I_1+N_1-1)!}{I_1!\,(N_1-1)!}\times\frac{(I_2+N_2-1)!}{I_2!\,(N_2-1)!}\times\frac{(I_3+N_3-1)!}{I_3!\,(N_3-1)!}\times\cdots = \prod_{\text{all }b}\frac{(I_b+N_b-1)!}{I_b!\,(N_b-1)!}.$$
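For those who want to play with these counts, here is a minimal Python sketch
evaluating the three formulae; the specific $I_b$ and $N_b$ values are just made-up
examples:

    from math import comb, factorial, prod

    def q_dist(I_b, N_b):
        # distinguishable particles: I! times the product of N_b^I_b / I_b!
        I = sum(I_b)
        return factorial(I)*prod(N**i for N, i in zip(N_b, I_b)) \
            // prod(factorial(i) for i in I_b)

    def q_fermi(I_b, N_b):
        # fermions: product of "N_b choose I_b" (zero when I_b > N_b)
        return prod(comb(N, i) for N, i in zip(N_b, I_b))

    def q_bose(I_b, N_b):
        # bosons: product of "(I_b + N_b - 1) choose I_b"
        return prod(comb(i + N - 1, i) for N, i in zip(N_b, I_b))

    I_b = [2, 3, 1]   # example occupation numbers per bucket
    N_b = [4, 5, 2]   # single-particle states per bucket
    print(q_dist(I_b, N_b), q_fermi(I_b, N_b), q_bose(I_b, N_b))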

A.66 The fundamental assumption of quantum statistics
The assumption that all energy eigenstates with the same energy are equally
likely is simply stated as an axiom in typical books, [3, p. 92], [7, p. 1], [10,
p. 230], [20, p. 177]. Some of these sources quite explicitly suggest that the fact
should be self-evident to the reader.
However, why could not an energy eigenstate, call it A, in which all parti-
cles have about the same energy, have a wildly different probability from some

eigenstate B in which one particle has almost all the energy and the rest has
very little? The two wave functions are wildly different. (Note that if the prob-
abilities are only somewhat different, it would not affect various conclusions
much because of the vast numerical superiority of the most probable energy
distribution.)
The fact that it does not take any energy to go from one state to the other
[7, p. 1] does not imply that the system must spend equal time in each state,
or that each state must be equally likely. It is not difficult at all to construct
nonlinear systems of evolution equations that preserve energy and in which the
system runs exponentially away towards specific states.
However, the coefficients of the energy eigenfunctions do not satisfy some
arbitrary nonlinear system of evolution equations. They evolve according to
the Schrödinger equation, and the interactions between the energy eigenstates
are determined by a Hamiltonian matrix of coefficients. The Hamiltonian is a
Hermitian matrix; it has to be to preserve energy. That means that the coupling
constant that allows state A to increase or reduce the probability of state B is
just as big as the coupling constant that allows B to increase or reduce the
probability of state A. More specifically, the rate of increase of the probability
of state A due to state B and vice-versa is seen to be
$$\left(\frac{d|c_A|^2}{dt}\right)_{\text{due to }B} = \frac{1}{\hbar}\,\Im\big(c_A^*\,H_{AB}\,c_B\big)
\qquad
\left(\frac{d|c_B|^2}{dt}\right)_{\text{due to }A} = -\frac{1}{\hbar}\,\Im\big(c_A^*\,H_{AB}\,c_B\big)$$

where $H_{AB}$ is the perturbation Hamiltonian coefficient between A and B. (In
the absence of perturbations, the energy eigenfunctions do not interact and
$H_{AB} = 0$.) Assuming that the phase of the Hamiltonian coefficient is random
compared to the phase difference between A and B, the transferred probability
can go at random one way or the other regardless of which state is initially
more likely. Even if A is currently very improbable, it is just as likely to pick
up probability from B as B is from A. Also note that eigenfunctions of the same
energy are unusually effective in exchanging probability, since their coefficients
evolve approximately in phase.
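A crude numerical experiment illustrates the point. The Python/NumPy sketch
below takes just two states of equal energy and an arbitrarily chosen complex
coupling, and shows an almost empty state A happily picking up probability
from B:

    import numpy as np

    hbar = 1.0
    E, V = 1.0, 0.05*np.exp(0.7j)   # equal energies; arbitrary coupling phase
    H = np.array([[E, V], [np.conj(V), E]])             # Hermitian Hamiltonian
    c = np.array([0.1, np.sqrt(0.99)], dtype=complex)   # state A almost empty
    dt = 0.01
    for step in range(20001):
        c = c - 1j*dt/hbar*(H @ c)   # crude Euler step of i hbar dc/dt = H c
        c /= np.linalg.norm(c)       # renormalize to curb the Euler drift
        if step % 5000 == 0:
            print(step, abs(c[0])**2, abs(c[1])**2)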
This note would argue that under such circumstances, it is simply no longer
reasonable to think that the difference in probabilities between eigenstates of
the same energy is enough to make a difference. How could energy eigenstates
that readily and randomly exchange probability, in either direction, end up
in a situation where some eigenstates have absolutely nothing, to incredible
precision?
Feynman [7, p. 8] gives an argument based on time-dependent perturbation
theory, subsection 8.10. However, time-dependent perturbation theory relies
heavily on approximation, and worse, the measurement wild card. Until scien-
tists, while maybe not agreeing exactly on what measurement is, start laying

down rigorous, unambiguous, mathematical ground rules on what measurements
can do and cannot do, measurement is like astrology: anything goes.

A.67 A problem if the energy is given


Examining all bucket number combinations with the given energy and then
picking out the combination that has the most energy eigenfunctions seems
straightforward enough, but it runs into a problem. The problem arises when
it is required that the set of bucket numbers agrees with the given energy to
mathematical precision. To see the problem, recall the simple model system of
subsection 8.3 that had only three energy buckets. Now assume that the energy of the second bucket is not $\sqrt{9} = 3$ as assumed there, (still arbitrary units), but slightly less at $\sqrt{8}$. The difference is small, and all figures of subsection
8.3 are essentially unchanged. However, if the average energy per particle is
still assumed equal to 2.5, so that the total system energy equals the number
of particles $I$ times that amount, then $I_2$ must be zero: it is impossible to take a nonzero multiple of an irrational number like $\sqrt{8}$ and end up with a rational number like $2.5I - I_1 - 4I_3$. What this means graphically is that the oblique
energy line in the equivalent of figure 8.5 does not hit any of the centers of the
squares mathematically exactly, except for the one at $I_2 = 0$. So the conclusion
would be that the system must have zero particles in the middle bucket.
Of course, physically this is absolute nonsense; the energy of a large number
of perturbed particles is not going to be certain to be 2.5I to mathematical
precision. There will be some uncertainty in energy, and the correct bucket
numbers are still those of the darkest square, even if its energy is 2.4999. . .I
instead of 2.5I exactly. Here typical textbooks will pontificate about the accu-
racy of your system-energy measurement device. However, this book shudders
to contemplate what happens physically in your glass of icewater if you have
three system-energy measurement devices, but your best one is in the shop, and
you are uncertain whether to believe the unit you got for cheap at Wal-Mart or
your backup unit with the sticking needle.
To avoid these conundrums, in this book it will simply be assumed that the
right combination of bucket occupation numbers is still the one at the maximum
in figure 8.6, i.e. the maximum when the number of energy eigenfunctions is
mathematically interpolated by a continuous function. Sure, that may mean
that the occupation numbers are no longer exact integers. But who is going to
count $10^{20}$ particles to check that it is exactly right? (And note that those other
books end up doing the same thing anyway in the end, since the mathematics
of an integer-valued function defined on a strip is so much more impossible than
that of a continuous function defined on a line.)
If fractional particles bother you, even among $10^{20}$ of them, just fix things

after the fact. After finding the fractional bucket numbers that have the largest number of energy eigenfunctions, select the whole bucket numbers nearest to it and then change the
“given” energy to be 2.4999999. . . or whatever it turns out to be at those whole
bucket numbers. Then you should have perfectly correct bucket numbers with
the highest number of eigenfunctions for the new given energy.

A.68 Derivation of the particle energy distributions
This note derives the Maxwell-Boltzmann, Fermi-Dirac, and Bose-Einstein en-
ergy distributions of weakly interacting particles for a system for which the net
energy is precisely known.
The objective is to find the bucket numbers $\vec I = (I_1, I_2, I_3, \ldots)$ for which the number of eigenfunctions $Q_{\vec I}$ is maximal. Actually, it is mathematically easier to find the maximum of $\ln(Q_{\vec I})$, and that is the same thing: if $Q_{\vec I}$ is as big as it can be, then so is $\ln(Q_{\vec I})$. The advantage of working with $\ln(Q_{\vec I})$ is that it simplifies all the products in the expressions for the $Q_{\vec I}$ derived in note {A.65} into sums: mathematics says that $\ln(ab)$ equals $\ln(a)$ plus $\ln(b)$ for any (positive) $a$ and $b$.
It will be assumed, following note {A.67}, that if the maximum value is
found among all bucket occupation numbers, whole numbers or not, it suffices.
More daringly, errors less than a particle are not going to be taken seriously.
In finding the maximum of $\ln(Q_{\vec I})$, the bucket numbers cannot be completely arbitrary; they are constrained by the conditions that the sum of the bucket numbers must equal the total number of particles $I$, and that the particle energies must sum together to the given total energy $E$:
$$\sum_b I_b = I \qquad \sum_b I_b E^p_b = E.$$

Mathematicians call this a constrained maximization problem.


According to calculus, without the constraints, you can just put the derivatives of $\ln(Q_{\vec I})$ with respect to all the bucket numbers $I_b$ to zero to find the maximum. With the constraints, you have to add “penalty terms” that correct for any going out of bounds, {A.48}, and the correct function whose derivatives must be zero is
$$F = \ln(Q_{\vec I}) - \epsilon_1\left(\sum_b I_b - I\right) - \epsilon_2\left(\sum_b I_b E^p_b - E\right)$$

where the constants $\epsilon_1$ and $\epsilon_2$ are unknown penalty factors called the Lagrangian multipliers.

At the bucket numbers for which the number of eigenfunctions is largest,
the derivatives $\partial F/\partial I_b$ must be zero. However, that condition is difficult to apply exactly, because the expressions for $Q_{\vec I}$ as given in the text involve the
factorial function, or rather, the gamma function. The gamma function does
not have a simple derivative. Here typical textbooks will flip out the Stirling
approximation of the factorial, but this approximation is simply incorrect in
parts of the range of interest, and where it applies, the error is unknown.
It is a much better idea to approximate the differential quotient by a differ-
ence quotient, as in

$$0 = \frac{\partial F}{\partial I_b} \approx \frac{\Delta F}{\Delta I_b} \equiv \frac{F(I_1, I_2, \ldots, I_{b-1}, I_b + 1, I_{b+1}, \ldots) - F(I_1, I_2, \ldots, I_{b-1}, I_b, I_{b+1}, \ldots)}{(I_b + 1) - I_b}$$

This approximation is very minor, since according to the so-called mean value theorem of mathematics, the location where $\Delta F/\Delta I_b$ is zero is at most one particle away from the desired location where $\partial F/\partial I_b$ is zero. Better still, $I_b + \frac12 \equiv I_{b,\text{best}}$ will be no more than half a particle off, and the analysis already had to commit itself to ignoring fractional parts of particles anyway. The difference quotient leads to simple formulae because the gamma function satisfies the condition $(n+1)! = (n+1)\,n!$ for any value of $n$, compare the notations section under “!”.
Now consider first distinguishable particles. The function F to differentiate
is defined above, and plugging in the expression for $Q^d_{\vec I}$ as found in note {A.65} produces
$$F = \ln(I!) + \sum_b\left[I_b \ln(N_b) - \ln(I_b!)\right] - \epsilon_1\left(\sum_b I_b - I\right) - \epsilon_2\left(\sum_b I_b E^p_b - E\right)$$

For any value of the bucket number b, in the limit Ib ↓ −1, F tends to negative
infinity because Ib ! tends to positive infinity in that limit and its logarithm
appears with a minus sign. In the limit Ib ↑ +∞, F tends once more to negative
infinity, since ln(Ib !) for large values of Ib is according to the so-called Stirling
formula approximately equal to Ib ln(Ib ) − Ib , so the − ln(Ib !) term in F goes to
minus infinity more strongly than the terms proportional to Ib might go to plus
infinity. If F tends to minus infinity at both ends of the range −1 < Ib < ∞,
there must be a maximum value of F somewhere within that range where the
derivative with respect to Ib is zero. More specifically, working out the difference
quotient:
$$\frac{\Delta F}{\Delta I_b} = \ln(N_b) - \ln(I_b + 1) - \epsilon_1 - \epsilon_2 E^p_b = 0$$

and $-\ln(I_b+1)$ is infinity at $I_b = -1$ and minus infinity at $I_b = \infty$. Somewhere in between, $\Delta F/\Delta I_b$ will cross zero. In particular, combining the logarithms and then taking an exponential, the best estimate for the bucket occupation number is
$$I_{b,\text{best}} = I_b + \tfrac12 = \frac{N_b}{e^{\epsilon_2 E^p_b + \epsilon_1}} - \tfrac12$$

The correctness of the final half particle is clearly doubtful within the made
approximations. In fact, it is best ignored since it only makes a difference at
high energies where the number of particles per bucket becomes small, and
surely, the correct probability of finding a particle must go to zero at infinite
energies, not to minus half a particle! Therefore, the best estimate $\iota^d \equiv I_{b,\text{best}}/N_b$
for the number of particles per single-particle energy state becomes the Maxwell-
Boltzmann distribution. Note that the derivation might be off by a particle for
the lower energy buckets. But there are a lot of particles in a macroscopic
system, so it is no big deal.
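As a numerical sanity check on this procedure, the following Python sketch (bucket counts, energies, and totals all made up) maximizes $\ln(Q^d_{\vec I})$ directly under the two constraints; the printed $\ln(I_b/N_b)$ values come out as an affine function of the $E^p_b$, which is exactly the Maxwell-Boltzmann form:

```python
import numpy as np
from math import lgamma
from scipy.optimize import minimize

Nb = np.array([4.0, 9.0, 16.0])      # single-particle states per bucket (made up)
Ep = np.array([1.0, 2.0, 4.0])       # bucket energies (made up)
Itot, Etot = 40.0, 80.0              # given particle number and energy

def neg_lnQ(I):                      # minus ln Q for distinguishable particles
    return -(lgamma(Itot + 1) +
             sum(i*np.log(n) - lgamma(i + 1) for i, n in zip(I, Nb)))

cons = ({'type': 'eq', 'fun': lambda I: I.sum() - Itot},
        {'type': 'eq', 'fun': lambda I: (I*Ep).sum() - Etot})
res = minimize(neg_lnQ, np.full(3, Itot/3), constraints=cons,
               bounds=[(0.5, None)]*3)
print(res.x)                         # the most probable bucket numbers
print(np.log(res.x/Nb))              # affine in Ep: the Maxwell-Boltzmann form
```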
The case of identical fermions is next. The function to differentiate is now
$$F = \sum_b\left[\ln(N_b!) - \ln(I_b!) - \ln((N_b - I_b)!)\right] - \epsilon_1\left(\sum_b I_b - I\right) - \epsilon_2\left(\sum_b I_b E^p_b - E\right)$$

This time $F$ is minus infinity when a bucket number reaches $I_b = -1$ or $I_b = N_b + 1$. So there must be a maximum to $F$ when $I_b$ varies between those limits.
The difference quotient approximation produces
$$\frac{\Delta F}{\Delta I_b} = -\ln(I_b + 1) + \ln(N_b - I_b) - \epsilon_1 - \epsilon_2 E^p_b = 0$$
which can be solved to give
$$I_{b,\text{best}} = I_b + \tfrac12 = \frac{N_b}{e^{\epsilon_2 E^p_b + \epsilon_1} + 1} + \tfrac12\,\frac{1 - e^{\epsilon_2 E^p_b + \epsilon_1}}{1 + e^{\epsilon_2 E^p_b + \epsilon_1}}.$$
The final term, less than half a particle, is again best left away, to ensure that
0 ≤ Ib,best ≤ Nb as it should. That gives the Fermi-Dirac distribution.
Finally, the case of identical bosons is, once more, the tricky one. The
function to differentiate is now
$$F = \sum_b\left[\ln((I_b + N_b - 1)!) - \ln(I_b!) - \ln((N_b - 1)!)\right] - \epsilon_1\left(\sum_b I_b - I\right) - \epsilon_2\left(\sum_b I_b E^p_b - E\right)$$

For now, assume that $N_b > 1$ for all buckets. Then $F$ is again minus infinity for $I_b = -1$. For $I_b \uparrow \infty$, however, $F$ will behave like $-(\epsilon_1 + \epsilon_2 E^p_b)I_b$. This

tends to minus infinity if $\epsilon_1 + \epsilon_2 E^p_b$ is positive, so for now assume it is. Then the difference quotient approximation produces
$$\frac{\Delta F}{\Delta I_b} = \ln(I_b + N_b) - \ln(I_b + 1) - \epsilon_1 - \epsilon_2 E^p_b = 0$$
which can be solved to give
$$I_{b,\text{best}} = I_b + \tfrac12 = \frac{N_b - 1}{e^{\epsilon_2 E^p_b + \epsilon_1} - 1} - \tfrac12.$$
The final half particle is again best ignored to get the number of particles to
become zero at large energies. Then, if it is assumed that the number Nb of
single-particle states in the buckets is large, the Bose-Einstein distribution is
obtained. If Nb is not large, the number of particles could be less than the
predicted one by up to a factor 2, and if $N_b$ is one, the entire story comes apart. And so it does if $\epsilon_1 + \epsilon_2 E^p_b$ is not positive.
Before addressing these nasty problems, first the physical meaning of the La-
grangian multiplier ǫ2 needs to be established. It can be inferred from examining
the case that two different systems, call them A and B, are in thermal contact.
Since the interactions are assumed weak, the eigenfunctions of the combined
system are the products of those of the separate systems. That means that the
number of eigenfunctions of the combined system $Q_{\vec I_A \vec I_B}$ is the product of those of the individual systems. Therefore the function to differentiate becomes

$$F = \ln(Q_{\vec I_A} Q_{\vec I_B}) - \epsilon_{1,A}\left(\sum_{b_A} I_{b_A} - I_A\right) - \epsilon_{1,B}\left(\sum_{b_B} I_{b_B} - I_B\right) - \epsilon_2\left(\sum_{b_A} I_{b_A} E^p_{b_A} + \sum_{b_B} I_{b_B} E^p_{b_B} - E\right)$$

Note the constraints: the number of particles in system A must be the correct
number IA of particles in that system, and similar for system B. However, since
the systems are in thermal contact, they can exchange energy through the weak
interactions and there is no longer a constraint on the energy of the individual
systems. Only the combined energy must equal the given total. That means the
two systems share the same Lagrangian variable $\epsilon_2$. For the rest, the equations
for the two systems are just like if they were not in thermal contact, because
the logarithm in F separates, and then the differentiations with respect to the
bucket numbers IbA and IbB give the same results as before.
It follows that two systems that have the same value of $\epsilon_2$ can be brought into thermal contact and nothing happens, macroscopically. However, if two

systems with different values of $\epsilon_2$ are brought into contact, the systems will adjust, and energy will transfer between them, until the two $\epsilon_2$ values have become equal. That means that $\epsilon_2$ is a temperature variable. From here on, the temperature will be defined as $T = 1/\epsilon_2 k_B$, so that $\epsilon_2 = 1/k_B T$, with $k_B$ the Boltzmann constant. The same way, for now the chemical potential $\mu$ will simply be defined to be the constant $-\epsilon_1/\epsilon_2$. Subsection 8.14.4 will eventually
establish that the temperature defined here is the ideal gas temperature, while
note {A.73} will establish that µ is the Gibbs free energy per atom that is
normally defined as the chemical potential.
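Written out in terms of $T$ and $\mu$, the three distributions then take the forms evaluated by this minimal Python sketch (units with $k_B = 1$, all numbers arbitrary):

```python
import numpy as np

def maxwell_boltzmann(E, mu, T):     # iota = e^{-(E - mu)/kB T}
    return np.exp(-(E - mu)/T)

def fermi_dirac(E, mu, T):           # iota = 1/(e^{(E - mu)/kB T} + 1)
    return 1.0/(np.exp((E - mu)/T) + 1.0)

def bose_einstein(E, mu, T):         # iota = 1/(e^{(E - mu)/kB T} - 1), needs E > mu
    return 1.0/(np.exp((E - mu)/T) - 1.0)

E = np.linspace(0.5, 4.0, 8)
print(fermi_dirac(E, mu=2.0, T=0.1))     # nearly 1 below mu, nearly 0 above it
print(bose_einstein(E, mu=0.4, T=1.0))   # blows up as E comes down toward mu
```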
Returning now to the nasty problems of the distribution for bosons, first
assume that every bucket has at least two states, and that $(E^p_b - \mu)/k_B T$ is positive even for the ground state. In that case there is no problem with the derived solution. However, Bose-Einstein condensation will occur when either the number density is increased by putting more particles in the system, or the temperature is decreased. Increasing particle density is associated with increasing chemical potential $\mu$ because
$$I_b = \frac{N_b - 1}{e^{(E^p_b - \mu)/k_B T} - 1}$$
implies that every bucket particle number increases when µ increases. Decreas-
ing temperature by itself decreases the number of particles, and to compensate
and keep the number of particles the same, µ must then once again increase.
When µ gets very close to the ground state energy, the exponential in the ex-
pression for the number of particles in the ground state bucket b = 1 becomes
very close to one, making the total denominator very close to zero, so the num-
ber of particles I1 in the ground state blows up. When it becomes a finite
fraction of the total number of particles I even when I is macroscopically large,
Bose-Einstein condensation is said to have occurred.
Note that under reasonable assumptions, it will only be the ground state
bucket that ever acquires a finite fraction of the particles. For, assume the
contrary, that bucket 2 also contains a finite fraction of the particles. Using
Taylor series expansion of the exponential for small values of its argument, the
bucket occupation numbers are
$$I_1 = \frac{(N_1 - 1)k_B T}{E^p_1 - \mu} \qquad I_2 = \frac{(N_2 - 1)k_B T}{E^p_1 - \mu + (E^p_2 - E^p_1)} \qquad I_3 = \frac{(N_3 - 1)k_B T}{E^p_1 - \mu + (E^p_2 - E^p_1) + (E^p_3 - E^p_2)} \qquad \ldots$$

For $I_2$ to also be a finite fraction of the total number of particles, $E^p_2 - E^p_1$ must be similarly small as $E^p_1 - \mu$. But then, reasonably assuming that the energy
levels are at least roughly equally spaced, and that the number of states will
not decrease with energy, so must I3 be a finite fraction of the total, and so
on. You cannot have a large number of buckets each having a finite fraction of
the particles, because there are not so many particles. More precisely, a sum roughly like $\sum_{b=2}^{\infty} \mathrm{const}/(b\,\Delta E)$, (or worse), sums to an amount that is much larger than the term for $b = 2$ alone. So if $I_2$ would be a finite fraction of $I$, then the sum would be much larger than $I$.
What happens during condensation is that $\mu$ becomes much closer to $E^p_1$ than $E^p_1$ is to the next energy level $E^p_2$, and only the ground state bucket ends up with a finite fraction of the particles. The remainder is spread out so much that the bucket numbers immediately above the ground state only contain a negligible fraction of the particles. It also follows that for all buckets except the ground state one, $\mu$ may be approximated as being $E^p_1$. (Specific data for
particles in a box is given in section 8.14.1. The entire story may of course need
to be modified in the presence of confinement, compare chapter 7.6.5.)
The other problem with the analysis of the occupation numbers for bosons
is that the number of single-particle states in the buckets had to be at least two.
There is no reason why a system of weakly-interacting spinless bosons could not
have a unique single-particle ground state. And combining the ground state with
the next one in a single bucket is surely not an acceptable approximation in the
presence of potential Bose-Einstein condensation. Fortunately, the mathematics
still partly works:
$$\frac{\Delta F}{\Delta I_1} = \ln(I_1 + 1) - \ln(I_1 + 1) - \epsilon_1 - \epsilon_2 E^p_1 = 0$$
implies that $\epsilon_1 + \epsilon_2 E^p_1 = 0$. In other words, $\mu$ is equal to the ground state energy $E^p_1$ exactly, rather than just extremely closely as above.
That then is the condensed state. Without a chemical potential that can be
adjusted, for any given temperature the states above the ground state contain
a number of particles that is completely unrelated to the actual number of
particles that is present. Whatever is left can be dumped into the ground state,
since there is no constraint on I1 .
Condensation stops when the number of particles in the states above the
ground state wants to become larger than the actual number of particles present.
Now the mathematics changes, because nature says “Wait a minute, there is no
such thing as a negative number of particles in the ground state!” Nature now
adds the constraint that $I_1 = 0$ rather than negative. That adds another penalty term, $\epsilon_3 I_1$, to $F$, and $\epsilon_3$ takes care of satisfying the equation for the ground state
bucket number. It is a sad story, really: below the condensation temperature,
the ground state was awash in particles, above it, it has zero. None.

A system of weakly interacting helium atoms, spinless bosons, would have
a unique single-particle ground state like this. Since below the condensation
temperature, the elevated energy states have no clue about an impending lack
of particles actually present, physical properties such as the specific heat stay
analytical until condensation ends.
It may be noted that above the condensation temperature it is only the most
probable set of the occupation numbers that have exactly zero particles in the
unique ground state. The expectation value of the number in the ground state
will include neighboring sets of occupation numbers to the most probable one,
and the number has nowhere to go but up, compare {A.73}.

A.69 The canonical probability distribution


This note deduces the canonical probability distribution. Since the derivations
in typical textbooks seem crazily convoluted and the made assumptions not at
all as self-evident as the authors suggest, a more mathematical approach will
be followed here.
Consider a big system consisting of many smaller subsystems A, B, . . . with
a given total energy E. Call the combined system the collective. Following the
same reasoning as in note {A.68} for two systems, the thermodynamically stable
equilibrium state has bucket occupation numbers of the subsystems satisfying
$$\frac{\partial \ln Q_{\vec I_A}}{\partial I_{b_A}} - \epsilon_{1,A} - \epsilon_2 E^p_{b_A} = 0 \qquad \frac{\partial \ln Q_{\vec I_B}}{\partial I_{b_B}} - \epsilon_{1,B} - \epsilon_2 E^p_{b_B} = 0 \qquad \ldots$$
where $\epsilon_2$ is a shorthand for $1/k_B T$.
An individual system, take A as the example, no longer has an individual
energy that is certain. Only the collective has that. That means that when
A is taken out of the collective, its bucket occupation numbers will have to be
described in terms of probabilities. There will still be an expectation value for
the energy of the system, but system energy eigenfunctions $\psi^I_{q_A}$ with somewhat different energy $E^I_{q_A}$ can no longer be excluded with certainty. However, still assume, following the fundamental assumption of quantum statistics, {A.66}, that the physical differences between the system energy eigenfunctions do not make (enough of) a difference to affect which ones are likely or not. So, the probability $P_{q_A}$ of a system eigenfunction $\psi^I_{q_A}$ will be assumed to depend only on its energy $E^I_{q_A}$:
$$P_{q_A} = P(E^I_{q_A}).$$

where P is some as yet unknown function.
For the isolated example system A, the question is now no longer “What
bucket numbers have the most eigenfunctions?” but “What bucket numbers
have the highest probability?” Note that all system eigenfunctions $\psi^I_{q_A}$ for a given set of bucket numbers $\vec I_A$ have the same system energy $E^I_{\vec I_A} = \sum_{b_A} I_{b_A} E^p_{b_A}$. Therefore, the probability of a given set of bucket numbers $P_{\vec I_A}$ will be the number of eigenfunctions with those bucket numbers times the probability of each individual eigenfunction:
$$P_{\vec I_A} = Q_{\vec I_A} P(E^I_{\vec I_A}).$$

Mathematically, the function whose partial derivatives must be zero to find the most probable bucket numbers is
$$F = \ln\left(P_{\vec I_A}\right) - \epsilon_{1,A}\left(\sum_{b_A} I_{b_A} - I_A\right).$$

The maximum is now to be found for the bucket number probabilities, not their
eigenfunction counts, and there is no longer a constraint on energy.
Substituting $P_{\vec I_A} = Q_{\vec I_A} P(E^I_{\vec I_A})$, taking apart the logarithm, and differentiating, produces
$$\frac{\partial \ln Q_{\vec I_A}}{\partial I_{b_A}} + \frac{d \ln(P)}{dE^I_{\vec I_A}}\, E^p_{b_A} - \epsilon_{1,A} = 0$$

That is exactly like the equation for the bucket numbers of system A when it
was part of the collective, except that the derivative of the as yet unknown
function $\ln(P_A)$ takes the place of $-\epsilon_2$, i.e. $-1/k_B T$. It follows that the two
must be the same, because the bucket numbers cannot change when the system
A is taken out of the collective it is in thermal equilibrium with. For one, the
net energy would change if that happened, and energy is conserved.
It follows that $d\ln P/dE^I_{\vec I_A} = -1/k_B T$ at least in the vicinity of the most probable energy $E^I_{\vec I_A}$. Hence in the vicinity of that energy
$$P(E^I_A) = \frac{1}{Z_A}\, e^{-E^I_A/k_B T}$$

which is the canonical probability. Note that the given derivation only ensures
it to be true in the vicinity of the most probable energy. Nothing says it gives
the correct probability for, say, the ground state energy. But then the question
becomes “What difference does it make?” Suppose the ground state has a
probability of 0. followed by only 100 zeros instead of the predicted 200 zeros?
What would change in the price of eggs?

Note that the canonical probability is self-consistent: if two systems at the
same temperature are combined, the probabilities of the combined eigenfunc-
tions multiply, as in
$$P_{AB} = \frac{1}{Z_A Z_B}\, e^{-(E^I_A + E^I_B)/k_B T}.$$
That is still the correct expression for the combined system, since its energy is the sum of those of the two separate systems. Also for the partition functions
$$Z_A Z_B = \sum_{q_A}\sum_{q_B} e^{-(E^I_{q_A} + E^I_{q_B})/k_B T} = Z_{AB}.$$
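That self-consistency can be confirmed numerically in a few lines of Python (two made-up spectra, $k_B = 1$):

```python
import numpy as np

def Z(E, T):
    """Partition function of a list of system energy eigenvalues (kB = 1)."""
    return np.exp(-np.asarray(E)/T).sum()

EA = [0.0, 1.0, 2.5]                     # made-up spectrum of system A
EB = [0.3, 1.7]                          # made-up spectrum of system B
T = 0.8
EAB = [a + b for a in EA for b in EB]    # combined eigenvalues are the sums
print(Z(EA, T)*Z(EB, T), Z(EAB, T))      # the same number twice
```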

A.70 Analysis of the ideal gas Carnot cycle


Refer to figure A.19 for the physical device to be analyzed. The refrigerant
circulating through the device is an ideal gas with constant specific heats, like
a thin gas of helium atoms. Section 8.14 will examine ideal gases in detail,
but for now some reminders from introductory classical physics classes about
ideal gases must do. The internal energy of the gas is $E = m_I C_v T$ where $m_I$ is its mass and $C_v$ is a constant for a gas like helium whose atoms only have translational kinetic energy. Also, the ideal gas law says that $PV = m_I R T$, where $P$ is the pressure, $V$ the volume, and the constant $R$ is the gas constant, equal to the universal gas constant divided by the molecular mass.
The differential version of the first law, energy conservation, (8.11), says that

$$dE = \delta Q - P\, dV$$
or getting rid of internal energy and pressure using the given expressions,
$$m_I C_v\, dT = \delta Q - m_I R T\, \frac{dV}{V}.$$
Now for the transitions through the heat exchangers, from 1 to 2 or from 3 to 4 in figure A.19, the temperature is approximated to be constant. The first law above can then be integrated to give the heat added to the substance as:
$$Q_L = m_I R T_L(\ln V_2 - \ln V_1) \qquad Q_H = -m_I R T_H(\ln V_4 - \ln V_3).$$

Remember that unlike QL , QH is taken positive if it comes out of the substance.


On the other hand, for the transitions through the adiabatic turbine and
compressor, the heat δQ added is zero. Then the first law can be divided
through by T and integrated to give

$$m_I C_v(\ln T_H - \ln T_L) = -m_I R(\ln V_3 - \ln V_2)$$
$$m_I C_v(\ln T_L - \ln T_H) = -m_I R(\ln V_1 - \ln V_4)$$

[Figure A.19: Schematic of the Carnot refrigeration cycle. The refrigerant passes from state 1 to 2 through the low temperature heat exchanger (fridge side, $T_L$, heat $Q_L$ in), from 2 to 3 through the compressor, from 3 to 4 through the high temperature heat exchanger (kitchen side, $T_H$, heat $Q_H$ out), and from 4 back to 1 through the turbine.]

Adding these two expressions shows that
$$\ln V_3 - \ln V_2 + \ln V_1 - \ln V_4 = 0 \implies \ln V_3 - \ln V_4 = \ln V_2 - \ln V_1$$
and plugging that into the expressions for the exchanged heats shows that $Q_H/T_H = Q_L/T_L$.
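The following Python sketch (all values arbitrary) picks volumes that satisfy the two adiabatic relations above and confirms the result numerically:

```python
import numpy as np

mI, R = 1.0, 1.0                 # mass and gas constant, arbitrary units
Cv = 1.5*R                       # monatomic ideal gas like helium
TL, TH = 250.0, 300.0
V1, V2 = 1.0, 2.0                # volumes at the low temperature heat exchanger
V3 = V2*(TH/TL)**(-Cv/R)         # from mI Cv ln(TH/TL) = -mI R ln(V3/V2)
V4 = V1*(TH/TL)**(-Cv/R)         # from mI Cv ln(TL/TH) = -mI R ln(V1/V4)
QL = mI*R*TL*np.log(V2/V1)
QH = -mI*R*TH*np.log(V4/V3)
print(QL/TL, QH/TH)              # identical: QH/TH = QL/TL
```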

A.71 The recipe of life


Religious nuts, “creationists,” “intelligent designers,” or whatever they are call-
ing themselves at the time you are reading this, call them CIDOWs for short,
would like to believe that the universe was created literally like it says in the
bible. The bible contains two creation stories, the Genesis story and the Adam
and Eve story, and they conflict. At some time in the past they were put in
together for simplicity, without ironing out their contradictions. CIDOWs feel
that with two conflicting creation stories, surely at least one should be right?
This is the bible, you know?
Now if you want to believe desperately enough, you are willing to accept
anything that seems to reasonably support your point, without looking too hard
at any opposing facts. (Critically examining facts is what a scientist would do,
but you can reasonably pass yourself off as a scientist in the court system and
popular press without worrying about it. You do have to pass yourself off as a
scientist in the United States, since it is unconstitutional to force your religious
beliefs upon the public education system unless you claim they are scientific

instead of religious.) Now CIDOWs had a look at life, and it seemed to be quite
non-messy to them. So they felt its entropy was obviously low. (Actually, a
human being may be a highly evolved form of life, but being largely water well
above absolute zero temperature, its entropy is not particularly low.) Anyway,
since the earth has been around for quite some time, they reasoned that the
entropy of its surface must have been increasing for a long time, and non-messy
human beings could not possibly be true. Hence the conventional scientific
explanation of the evolution of life violated the second law and could not be
true. It followed that the universe just had to be created by God. The Christian
God of course, don’t assume now that Allah or Buddha need apply.
Hello CIDOWs! The surface of the earth is hardly an adiabatic system. See
that big fireball in the sky? What do you think all that plant life is doing with
all those green leaves? Baierlein [3, pp. 128-130] works out some of the rough
details. Since the surface of the sun is very hot, the photons of light that reach
us from the sun are high energy ones. Despite the influx of solar energy, the
surface of the earth does not turn into an oven because the earth emits about the
same energy back into space as it receives from the sun. But since the surface
of the earth is not by far as hot as that of the sun, the photons emitted by the
earth are low energy ones. Baierlein estimates that the earth emits about 20
of these low energy photons for every high energy one it receives from the sun.
Each photon carries one unit of entropy on average, (8.59). So the earth loses
20 units of messiness for every one it receives. So, evolution towards less messy
systems is exactly what you would expect for the earth surface, based on the
overall entropy picture. Talk about an argument blowing up in your face!

A.72 Checks on the expression for entropy


According to the microscopic definition, the differential of the entropy S should
be
$$dS = -k_B\, d\left[\sum_q P_q \ln P_q\right]$$

where the sum is over all system energy eigenfunctions $\psi^I_q$ and $P_q$ is their probability. The differential can be simplified to
$$dS = -k_B \sum_q \left[\ln P_q + 1\right] dP_q = -k_B \sum_q \ln P_q\, dP_q,$$
the latter equality since the sum of the probabilities is always one, so $\sum_q dP_q = 0$.
This is to be compared with the macroscopic differential for the entropy.
Since the macroscopic expression requires thermal equilibrium, $P_q$ in the microscopic expression above can be equated to the canonical value $e^{-E^I_q/k_B T}/Z$ where $E^I_q$ is the energy of system eigenfunction $\psi^I_q$. It simplifies the microscopic differential of the entropy to
$$dS = -k_B \sum_q \left[-\frac{E^I_q}{k_B T} - \ln Z\right] dP_q = -k_B \sum_q \left[-\frac{E^I_q}{k_B T}\right] dP_q = \frac{1}{T} \sum_q E^I_q\, dP_q, \tag{A.74}$$
the second equality since $Z$ is a constant in the summation and $\sum_q dP_q = 0$.
The macroscopic expression for the differential of entropy is given by (8.18),
$$dS = \frac{\delta Q}{T}.$$
Substituting in the differential first law (8.11),
$$dS = \frac{1}{T}\, dE + \frac{1}{T}\, P\, dV$$
and plugging into that the definitions of $E$ and $P$,
$$dS = \frac{1}{T}\, d\left[\sum_q P_q E^I_q\right] - \frac{1}{T}\left[\sum_q P_q \frac{dE^I_q}{dV}\right] dV$$

and differentiating out the product in the first term, one part drops out versus
the second term and what is left is the differential for S according to the micro-
scopic definition (A.74). So, the macroscopic and microscopic definitions agree
to within a constant on the entropy. That means that they agree completely,
because the macroscopic definition has no clue about the constant.
Now consider the case of a system with zero indeterminacy in energy. Ac-
cording to the fundamental assumption, all the eigenfunctions with the correct
energy should have the same probability in thermal equilibrium. From the en-
tropy’s point of view, thermal equilibrium should be the stable most messy state,
having the maximum entropy. For the two views to agree, the maximum of the
microscopic expression for the entropy should occur when all eigenfunctions of
the given energy have the same probability. Restricting attention to only the
energy eigenfunctions ψqI with the correct energy, the maximum entropy occurs
when the derivatives of
à !
X X
F = −kB Pq ln Pq − ǫ Pq − 1
q q

with respect to the $P_q$ are zero. Note that the constraint that the sum of the probabilities must be one has been added as a penalty term with a Lagrangian multiplier, {A.48}. Taking derivatives produces
$$-k_B \ln(P_q) - k_B - \epsilon = 0$$

showing that, yes, all the Pq have the same value at the maximum entropy.
(Note that the minima in entropy, all Pq zero except one, do not show up in the
derivation; Pq ln Pq is zero when Pq = 0, but its derivative does not exist there.
In fact, the infinite derivative can be used to verify that no maxima exist with
any of the Pq equal to zero if you are worried about that.)
If the energy is uncertain, and only the expectation energy is known, the
penalized function becomes
$$F = -k_B \sum_q P_q \ln P_q - \epsilon_1\left(\sum_q P_q - 1\right) - \epsilon_2\left(\sum_q E^I_q P_q - E\right)$$

and the derivatives become
$$-k_B \ln(P_q) - k_B - \epsilon_1 - \epsilon_2 E^I_q = 0$$
which can be solved to show that
$$P_q = C_1 e^{-E^I_q/C_2}$$
with $C_1$ and $C_2$ constants. The requirement to conform with the given definition of temperature identifies $C_2$ as $k_B T$ and the fact that the probabilities must sum to one identifies $C_1$ as $1/Z$.
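The same maximization can be handed to a general-purpose optimizer; in this Python sketch (four made-up energy levels and a prescribed expectation energy, $k_B = 1$), the maximizer indeed comes out canonical:

```python
import numpy as np
from scipy.optimize import minimize

E = np.array([0.0, 1.0, 2.0, 3.0])       # made-up system energy levels
Eexp = 1.2                               # prescribed expectation energy

negS = lambda P: np.sum(P*np.log(P))     # minus the entropy (kB = 1)
cons = ({'type': 'eq', 'fun': lambda P: P.sum() - 1.0},
        {'type': 'eq', 'fun': lambda P: (P*E).sum() - Eexp})
res = minimize(negS, np.full(4, 0.25), constraints=cons,
               bounds=[(1e-9, 1.0)]*4)
print(np.diff(np.log(res.x))/np.diff(E))  # constant: -1/C2, so P is canonical
```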
For two systems A and B in thermal contact, the probabilities of the com-
bined system energy eigenfunctions are found as the products of the probabili-
ties of those of the individual systems. The maximum of the combined entropy,
constrained by the given total energy E, is then found by differentiating
$$F = -k_B \sum_{q_A}\sum_{q_B} P_{q_A} P_{q_B} \ln(P_{q_A} P_{q_B}) - \epsilon_{1,A}\left(\sum_{q_A} P_{q_A} - 1\right) - \epsilon_{1,B}\left(\sum_{q_B} P_{q_B} - 1\right) - \epsilon_2\left(\sum_{q_A} P_{q_A} E^I_{q_A} + \sum_{q_B} P_{q_B} E^I_{q_B} - E\right).$$

$F$ can be simplified by taking apart the logarithm and noting that the probabilities $P_{q_A}$ and $P_{q_B}$ sum to one to give
$$F = -k_B \sum_{q_A} P_{q_A} \ln(P_{q_A}) - k_B \sum_{q_B} P_{q_B} \ln(P_{q_B}) - \epsilon_{1,A}\left(\sum_{q_A} P_{q_A} - 1\right) - \epsilon_{1,B}\left(\sum_{q_B} P_{q_B} - 1\right) - \epsilon_2\left(\sum_{q_A} P_{q_A} E^I_{q_A} + \sum_{q_B} P_{q_B} E^I_{q_B} - E\right)$$

Differentiation now produces
$$-k_B \ln(P_{q_A}) - k_B - \epsilon_{1,A} - \epsilon_2 E^I_{q_A} = 0 \qquad -k_B \ln(P_{q_B}) - k_B - \epsilon_{1,B} - \epsilon_2 E^I_{q_B} = 0$$
which produces $P_{q_A} = C_{1,A} e^{-E^I_{q_A}/C_2}$ and $P_{q_B} = C_{1,B} e^{-E^I_{q_B}/C_2}$, and the common constant $C_2$ then implies that the two systems have the same temperature.

A.73 Chemical potential and distribution functions
The following convoluted derivation of the distribution functions comes fairly straight from Baierlein [3, pp. 170-]. Let it not deter you from reading the rest of this otherwise very clearly written and engaging little book. Even a non-engineering author should be allowed one mistake.
The derivations of the Maxwell-Boltzmann, Fermi-Dirac, and Bose-Einstein
distributions given previously, {A.68} and {A.69}, were based on finding the
most numerous or most probable distribution. That implicitly assumes that
significant deviations from the most numerous/probable distributions will be so
rare that they can be ignored. This note will bypass the need for such an as-
sumption since it will directly derive the actual expectation values of the single-
particle state occupation numbers ι. In particular for fermions, the derivation
will be solid as a rock.
The mission is to derive the expectation number $\iota_n$ of particles in an arbitrary single-particle state $\psi^p_n$. This expectation value, as any expectation value, is given by the possible values times their probability:
$$\iota_n = \sum_q i_n P_q$$

where $i_n$ is the number of particles that system energy eigenfunction $\psi^I_q$ has in single-particle state $\psi^p_n$, and $P_q$ the probability of the eigenfunction. Since thermal equilibrium is assumed, the canonical probability value $e^{-E^I_q/k_B T}/Z$ can be substituted for $P_q$. Then, if the energy $E^I_q$ is written as the sum of the ones of the single particle states times the number of particles in that state, it gives:
$$\iota_n = \frac{1}{Z} \sum_q i_n\, e^{-(i_1 E^p_1 + i_2 E^p_2 + \cdots + i_{n-1} E^p_{n-1} + i_n E^p_n + i_{n+1} E^p_{n+1} + \cdots)/k_B T}.$$

Note that $i_n$ is the occupation number of single-particle state $\psi^p_n$, just like $I_b$ was the occupation number of bucket $b$. Dealing with single-particle state

occupation numbers has an advantage over dealing with bucket numbers: you
do not have to figure out how many system eigenfunctions there are. For a given
set of single-particle state occupation numbers $\vec\imath = |i_1, i_2, \ldots\rangle$, there is exactly
one system energy eigenfunction. Compare figures 8.2 and 8.3: if you know
how many particles there are in each single-particle state, you know everything
there is to know about the eigenfunction depicted. (This does not apply to dis-
tinguishable particles, figure 8.1, because for them the numbers on the particles
can still vary for given occupation numbers, but as noted in subsection 8.11,
there is no such thing as identical distinguishable particles anyway.)
It has the big consequence that the sum over the eigenfunctions can be
replaced by sums over all sets of occupation numbers:
$$\iota_n = \frac{1}{Z} \underbrace{\sum_{i_1}\sum_{i_2}\cdots\sum_{i_{n-1}}\sum_{i_n}\sum_{i_{n+1}}\cdots}_{i_1+i_2+\cdots+i_{n-1}+i_n+i_{n+1}+\cdots = I} i_n\, e^{-(i_1 E^p_1 + i_2 E^p_2 + \cdots + i_{n-1} E^p_{n-1} + i_n E^p_n + i_{n+1} E^p_{n+1} + \cdots)/k_B T}$$

Each set of single-particle state occupation numbers corresponds to exactly one


eigenfunction, so each eigenfunction is still counted exactly once. Of course, the
occupation numbers do have to add up to the correct number of particles in the
system.
Now consider first the case of I identical bosons. For them the occupation
numbers may have values up to a maximum of I:
$$\iota_n = \frac{1}{Z} \underbrace{\sum_{i_1=0}^{I}\sum_{i_2=0}^{I}\cdots\sum_{i_{n-1}=0}^{I}\sum_{i_n=0}^{I}\sum_{i_{n+1}=0}^{I}\cdots}_{i_1+i_2+\cdots+i_{n-1}+i_n+i_{n+1}+\cdots = I} i_n\, e^{-(i_1 E^p_1 + i_2 E^p_2 + \cdots + i_{n-1} E^p_{n-1} + i_n E^p_n + i_{n+1} E^p_{n+1} + \cdots)/k_B T}$$

One simplification that is immediately evident is that all the terms that have
$i_n = 0$ are zero and can be ignored. Now apply a trick that only a mathematician would think of: define a new summation index $i'_n$ by setting $i_n = 1 + i'_n$. Then the summation over $i'_n$ can start at 0 and will run up to $I-1$. Plugging $i_n = 1 + i'_n$ into the sum above gives
$$\iota_n = \frac{1}{Z} \underbrace{\sum_{i_1=0}^{I}\cdots\sum_{i_{n-1}=0}^{I}\sum_{i'_n=0}^{I-1}\sum_{i_{n+1}=0}^{I}\cdots}_{i_1+\cdots+i_{n-1}+i'_n+i_{n+1}+\cdots = I-1} (1 + i'_n)\, e^{-(i_1 E^p_1 + \cdots + i_{n-1} E^p_{n-1} + E^p_n + i'_n E^p_n + i_{n+1} E^p_{n+1} + \cdots)/k_B T}$$

This can be simplified by taking the constant part of the exponential out of
the summation. Also, the constraint in the bottom shows that the occupation

numbers can no longer be any larger than $I-1$ (since the original $i_n$ is at least one), so the upper limits can be reduced to $I-1$. Finally, the prime on $i'_n$ may as well be dropped, since it is just a summation index and it does not make a
difference what name you give it. So, altogether,

$$\iota_n = \frac{1}{Z}\, e^{-E^p_n/k_B T} \underbrace{\sum_{i_1=0}^{I-1}\cdots\sum_{i_{n-1}=0}^{I-1}\sum_{i_n=0}^{I-1}\sum_{i_{n+1}=0}^{I-1}\cdots}_{i_1+\cdots+i_{n-1}+i_n+i_{n+1}+\cdots = I-1} (1 + i_n)\, e^{-(i_1 E^p_1 + \cdots + i_{n-1} E^p_{n-1} + i_n E^p_n + i_{n+1} E^p_{n+1} + \cdots)/k_B T}$$

The right hand side falls apart into two sums: one for the 1 in $1 + i_n$ and one for the $i_n$ in $1 + i_n$. The first sum is essentially the partition function $Z^-$ for a system with $I-1$ particles instead of $I$. The second sum is essentially $Z^-$ times the expectation value $\iota^-_n$ for such a system. To be precise
$$\iota_n = \frac{1}{Z}\, e^{-E^p_n/k_B T}\, Z^- \left[1 + \iota^-_n\right]$$
This equation is exact, no approximations have been made yet.
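Since the relation is exact, it can be verified by brute force for a system small enough to enumerate; the Python sketch below (three made-up single-particle energies, $k_B = 1$) reproduces it to machine precision:

```python
from itertools import product
import numpy as np

Ep = np.array([0.0, 1.0, 2.0])           # made-up single-particle energies
T, n = 1.0, 1                            # temperature (kB = 1); state to examine

def stats(I):
    """Partition function Z and occupation iota_n for I bosons, by enumeration."""
    Z = wsum = 0.0
    for occ in product(range(I + 1), repeat=len(Ep)):
        if sum(occ) != I:
            continue
        w = np.exp(-np.dot(occ, Ep)/T)
        Z += w
        wsum += occ[n]*w
    return Z, wsum/Z

Z, iota = stats(5)
Zm, iotam = stats(4)
print(iota, np.exp(-Ep[n]/T)*(Zm/Z)*(1.0 + iotam))   # the same number twice
```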
The system with I − 1 particles is the same in all respects to the one for I
particles, except that it has one less particle. In particular, the single-particle
energy eigenfunctions are the same, which means the volume of the box is the
same, and the expression for the canonical probability is the same, meaning
that the temperature is the same.
But when the system is macroscopic, the occupation counts for $I-1$ particles must be virtually identical to those for $I$ particles. Clearly the physics should not change noticeably depending on whether $10^{20}$ or $10^{20}+1$ particles are present. If $\iota^-_n = \iota_n$, then the above equation can be solved to give:
$$\iota_n = 1\Big/\left[\frac{Z}{Z^-}\, e^{E^p_n/k_B T} - 1\right]$$
The final formula is the Bose-Einstein distribution with
$$e^{-\mu/k_B T} = \frac{Z}{Z^-}$$
Solve for $\mu$:
$$\mu = -k_B T \ln\left(\frac{Z}{Z^-}\right) = \frac{-k_B T \ln(Z) + k_B T \ln(Z^-)}{I - (I-1)}$$

The final fraction is a difference quotient approximation for the derivative of the Helmholtz free energy with respect to the number of particles. Now a single

particle change is an extremely small change in the number of particles, so the difference quotient will to very great accuracy be equal to the derivative of the Helmholtz free energy with respect to the number of particles. And as noted earlier, in the obtained expressions, volume and temperature are held constant. So, $\mu = (\partial F/\partial I)_{T,V}$, and (8.39) identified that as the chemical potential. Do note that $\mu$ is on a single-particle basis, while $\bar\mu$ was taken to be on a molar basis. The Avogadro number $I_A = 6.0221\cdot10^{26}$ particles per kmol converts between the two.
Now consider the case of I identical fermions. Then, according to the ex-
clusion principle, there are only two allowed possibilities for the occupation
numbers: they can be zero or one:
$$\iota_n = \frac{1}{Z} \underbrace{\sum_{i_1=0}^{1}\cdots\sum_{i_{n-1}=0}^{1}\sum_{i_n=0}^{1}\sum_{i_{n+1}=0}^{1}\cdots}_{i_1+\cdots+i_{n-1}+i_n+i_{n+1}+\cdots = I} i_n\, e^{-(i_1 E^p_1 + \cdots + i_{n-1} E^p_{n-1} + i_n E^p_n + i_{n+1} E^p_{n+1} + \cdots)/k_B T}$$

Again, all terms with $i_n = 0$ are zero, so you can set $i_n = 1 + i'_n$ and get
$$\iota_n = \frac{1}{Z} \underbrace{\sum_{i_1=0}^{1}\cdots\sum_{i_{n-1}=0}^{1}\sum_{i'_n=0}^{0}\sum_{i_{n+1}=0}^{1}\cdots}_{i_1+\cdots+i_{n-1}+i'_n+i_{n+1}+\cdots = I-1} (1 + i'_n)\, e^{-(i_1 E^p_1 + \cdots + i_{n-1} E^p_{n-1} + E^p_n + i'_n E^p_n + i_{n+1} E^p_{n+1} + \cdots)/k_B T}$$

But now there is a difference: even for a system with $I-1$ particles $i'_n$ can still have the value 1, but the upper limit is zero. Fortunately, since the above sum only sums over the single value $i'_n = 0$, the factor $(1 + i'_n)$ can be replaced by $(1 - i'_n)$ without changing the answer. And then the summation can include $i'_n = 1$ again, because $(1 - i'_n)$ is zero when $i'_n = 1$. This sign change produces
the sign change in the Fermi-Dirac distribution compared to the Bose-Einstein
one; the rest of the analysis is the same.
Here are some additional remarks about the only approximation made, that
the systems with I and I − 1 particles have the same expectation occupation
numbers. For fermions, this approximation is justified to the gills, because it
can easily be seen that the obtained value for the occupation number is in between those of the systems with $I-1$ and $I$ particles. Since nobody is going to count whether a macroscopic system has $10^{20}$ particles or $10^{20}+1$, this is
truly as good as any theoretical prediction can possibly get.
But for bosons, it is a bit trickier because of the possibility of condensation.
Assume, reasonably, that when a particle is added, the occupation numbers
will not go down. Then the derived expression overestimates both expectation
occupation numbers $\iota_n$ and $\iota^-_n$. However, it could at most be wrong, (i.e. have

a finite relative error) for a finite number of states, and the number of single-
particle states will be large. (In the earlier derivation using bucket numbers,
the actual ιn was found to be lower than the Bose-Einstein value by a factor
(Nb − 1)/Nb with Nb the number of states in the bucket.)
If the factor $Z e^{E^p_1/k_B T}/Z^-$ is one exactly, which definitely means Bose-Einstein condensation, then $\iota_1 = 1 + \iota^-_1$. In that case, the additional particle that the system with $I$ particles has goes with certainty into the ground state. So the ground state better be unique then; the particle cannot go into two ground states.

A.74 The Fermi-Dirac integral for small temperature
If you must know, to do the Fermi-Dirac particle integral for small but nonzero temperature, change integration variable to $v = (u/u_0) - 1$, then take the integral apart as
$$\int_{-1}^{0} \sqrt{1+v}\, dv - \int_{-1}^{0} \frac{\sqrt{1+v}\, e^{u_0 v}\, dv}{e^{u_0 v} + 1} + \int_{0}^{\infty} \frac{\sqrt{1+v}\, dv}{e^{u_0 v} + 1}$$
and clean it up, by dividing top and bottom of the center integral by the exponential and then changing the sign of $v$ in it, to
$$\int_{-1}^{0} \sqrt{1+v}\, dv + \int_{0}^{1} \frac{\left(\sqrt{1+v} - \sqrt{1-v}\right) dv}{e^{u_0 v} + 1} + \int_{1}^{\infty} \frac{\sqrt{1+v}\, dv}{e^{u_0 v} + 1}$$
In the second integral, the range that is not killed off by the exponential in the bottom is very small for large $u_0$, and you can therefore approximate $\sqrt{1+v} - \sqrt{1-v}$ as $v$, or use a Taylor series if still higher precision is required. The range of integration can be extended to infinity, since the exponential is extremely large beyond $v = 1$. For the same reason, the third integral can be ignored completely. Note that $\int_0^\infty x\, dx/(e^x + 1) = \pi^2/12$, see [15, 18.81-82, p. 132] for this and additional integrals.
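The quoted integral is easy to confirm numerically, as in this little Python fragment:

```python
import numpy as np
from scipy.integrate import quad

val, err = quad(lambda x: x/(np.exp(x) + 1.0), 0.0, np.inf)
print(val, np.pi**2/12)          # both print 0.822467...
```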

A.75 Physics of the fundamental commutation relations
The fundamental commutation relations look much like a mathematical axiom.
Surely, there should be some other reasons for physicists to believe that they
apply to nature, beyond that they seem to produce the right answers.
The background chapter 11.4 explains that the angular momentum operators correspond to small rotations of the axis system through space. So, the commutator $[\hat L_x, \hat L_y]$ really corresponds to the difference between a small rotation around the $y$-axis followed by a small rotation around the $x$-axis, versus a small rotation around the $x$-axis followed by a small rotation around the $y$-axis. And it works out that this difference is equivalent to a small rotation about the $z$-axis. (If you know a bit of linear algebra, you can verify this by
writing down the matrices that describe the effects that coordinate-system ro-
tations around each of the axes have on an arbitrary radius vector $\vec r$.) So, the
fundamental commutator relations do have physical meaning; they say that this
basic relationship between rotations around different axes continues to apply in
the presence of spin.
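Here is that linear algebra verification written out as a small Python sketch: for a small angle $\epsilon$, the difference between the two orders of rotation approaches $\epsilon^2$ times the generator of a rotation around the $z$-axis:

```python
import numpy as np

def Rx(a):                        # rotation of the coordinate system around x
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Ry(a):                        # rotation of the coordinate system around y
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

eps = 1e-4
D = Rx(eps) @ Ry(eps) - Ry(eps) @ Rx(eps)
print(np.round(D/eps**2, 6))      # the z-rotation generator [[0,-1,0],[1,0,0],[0,0,0]]
```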

A.76 Multiple angular momentum components

Suppose that an eigenstate, call it $|m\rangle$, of $\hat L_z$ is also an eigenstate of $\hat L_x$. Then $[\hat L_z, \hat L_x]|m\rangle$ must be zero, and the commutator relations say that this is equivalent to $\hat L_y|m\rangle = 0$, which makes $|m\rangle$ also an eigenvector of $\hat L_y$, and with the eigenvalue zero to boot. So the angular momentum in the $y$-direction must be zero. Repeating the same argument using the $[\hat L_x, \hat L_y]$ and $[\hat L_y, \hat L_z]$ commutator pairs shows that the angular momentum in the other two directions is zero too. So there is no angular momentum at all; $|m\rangle$ is an $|0\,0\rangle$ state.
So there is no angular momentum at all, |mi is an |0 0i state.

A.77 Components of vectors are less than the total vector

You might wonder whether the fact that the square components of angular momentum must be less than total square angular momentum still applies in the quantum case. After all, those components do not exist at the same time. But it does not make a difference: just evaluate them using expectation values. Since states $|l\,m\rangle$ are eigenstates, the expectation value of total square angular momentum is the actual value, and so is the square angular momentum in the $z$-direction. And while the $|l\,m\rangle$ states are not eigenstates of $\hat L_x$ and $\hat L_y$, the expectation values of square Hermitian operators such as $\hat L_x^2$ and $\hat L_y^2$ are always positive anyway (as can be seen from writing them out in terms of the eigenstates of those operators.)

A.78 The spherical harmonics with ladder operators
One application of ladder operators is to find the spherical harmonics, which as
noted in chapter 3.1.3 is not an easy problem. To do it with ladder operators,
show that
$$\hat L_x = \frac{\hbar}{i}\left(-\sin\phi\, \frac{\partial}{\partial\theta} - \frac{\cos\theta\cos\phi}{\sin\theta}\, \frac{\partial}{\partial\phi}\right) \qquad \hat L_y = \frac{\hbar}{i}\left(\cos\phi\, \frac{\partial}{\partial\theta} - \frac{\cos\theta\sin\phi}{\sin\theta}\, \frac{\partial}{\partial\phi}\right) \tag{A.75}$$
then that
$$L^+ = \hbar e^{i\phi}\left(\frac{\partial}{\partial\theta} + i\,\frac{\cos\theta}{\sin\theta}\, \frac{\partial}{\partial\phi}\right) \qquad L^- = \hbar e^{-i\phi}\left(-\frac{\partial}{\partial\theta} + i\,\frac{\cos\theta}{\sin\theta}\, \frac{\partial}{\partial\phi}\right) \tag{A.76}$$
Note that the spherical harmonics are of the form $Y_l^m = e^{im\phi}\,\Theta_l^m(\theta)$, so
$$L^+ Y_l^m = \hbar e^{i(m+1)\phi} \sin^m\theta\, \frac{d(\Theta_l^m/\sin^m\theta)}{d\theta} \qquad L^- Y_l^m = -\hbar e^{i(m-1)\phi}\, \frac{1}{\sin^m\theta}\, \frac{d(\Theta_l^m \sin^m\theta)}{d\theta}$$
Find the $Y_l^l$ harmonic from $L^+ Y_l^l = 0$, then apply $L^-$ to find the rest of the ladder.
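For instance, this small sympy sketch (with $\hbar$ set to one and normalization constants ignored) starts from $Y_2^2 \propto e^{2i\phi}\sin^2\theta$ and steps down the ladder twice:

```python
from sympy import symbols, exp, sin, cos, I, diff, simplify

theta, phi = symbols('theta phi')

def Lminus(f):
    """The step-down operator L^- of (A.76), with hbar = 1."""
    return exp(-I*phi)*(-diff(f, theta) + I*(cos(theta)/sin(theta))*diff(f, phi))

Y22 = exp(2*I*phi)*sin(theta)**2      # Y_2^2 up to normalization
Y21 = simplify(Lminus(Y22))           # proportional to e^{i phi} sin(theta) cos(theta)
Y20 = simplify(Lminus(Y21))           # proportional to 3 cos(theta)^2 - 1
print(Y21)
print(Y20)
```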
Interestingly enough, the solution of the one-dimensional harmonic oscillator
problem can also be found using ladder operators. It turns out that, in the
notation of that problem,
$$H^+ = -i\hat p + m\omega \hat x \qquad H^- = i\hat p + m\omega \hat x$$
are commutator eigenoperators of the harmonic oscillator Hamiltonian, with eigenvalues $\pm\hbar\omega$. So, you can play the same games of constructing ladders. Eas-
ier, really, since there is no equivalent to square angular momentum to worry
about in that problem: there is only one ladder. See [10, pp. 42-47] for de-
tails. An equivalent derivation is given in chapter 10.2.2 based on quantum
field theory.

A.79 Why angular momenta components can be added

The fact that net angular momentum components can be obtained by summing the single-particle angular momentum operators is clearly following the Newtonian analogy: in classical physics each particle has its own independent angular momentum, and you just add them up. See also chapter 11.4.

A.80 Why the Clebsch-Gordan tables are bidirectional
The fact that you can read the tables either by rows or by columns is due to the
orthonormality of the states involved. In terms of the real vectors of physics,
it is simply an expression of the fact that the component of one unit vector in
the direction of another unit vector is the same as the component of the second
unit vector in the direction of the first.

A.81 How to make Clebsch-Gordan tables


The procedure of finding the Clebsch-Gordan coefficients for the combination
of any two spin ladders is exactly the same as for electron ones, so it is simple
enough to program.
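Indeed, sympy comes with such a program built in; in the sketch below the coefficients come out as (signed) square roots of rational numbers, as the next paragraph explains:

```python
from sympy import S
from sympy.physics.quantum.cg import CG

half = S(1)/2
# <1/2 1/2; 1/2 -1/2 | 1 0>: two electron spins combining into the triplet
print(CG(half, half, half, -half, 1, 0).doit())    # sqrt(2)/2
# <2 1; 1 0 | 2 1>: again a (signed) square root of a rational number
print(CG(2, 1, 1, 0, 2, 1).doit())                 # sqrt(6)/6
```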
To further simplify things, it turns out that the coefficients are all square
roots of rational numbers (i.e. ratios of integers such as 102/38.) The step-
up and step-down operators by themselves produce square roots of rational
numbers, so at first glance it would appear that the individual Clebsch-Gordan
coefficients would be sums of square roots. But the square roots of a given
coefficient are all compatible and can be summed into one. To see why, consider
the coefficients that result from applying the combined step-down operator $\hat L^-_{ab}$ a few times on the top of the ladder $|l\,l\rangle_a |l\,l\rangle_b$. Every contribution to the coefficient of a state $|l\,m\rangle_a |l\,m\rangle_b$ comes from applying $\hat L^-_a$ for $l_a - m_a$ times and $\hat L^-_b$ for $l_b - m_b$ times, so all contributions have compatible square roots. $\hat L^-_{ab}$ merely adds an $m_{ab}$-dependent normalization factor.


You might think this pattern would be broken when you start defining the
tops of lower ladders, since that process uses the step up operators. But because
$\hat L^+ \hat L^-$ and $\hat L^- \hat L^+$ are rational numbers (not square roots), applying the up
b + are rational numbers (not square roots), applying the up
operators is within a rational number the same as applying the down ones, and
the pattern turns out to remain.

A.82 Machine language version of the Clebsch-Gordan tables
The usual “machine language” form of the tables leaves out the $a$, $b$, and $ab$ identifiers, the $l_a =$ and $l_b =$ clarifications from the header, all square root signs, the $l$ values of particles $a$ and $b$ from the kets, and all ket terminator bars and brackets, but combines the two $m$-values with missing $l$ values together in a frame to resemble an $lm$ ket as well as possible, and then puts it all in a font that is very easy to read with a magnifying glass or microscope.

A.83 The triangle inequality


The normal triangle inequality continues to apply for expectation values in
quantum mechanics.
The way to show that is, like other triangle inequality proofs, rather curious: examine the combination of $\vec{\hat L}_a$, not with $\vec{\hat L}_b$, but with an arbitrary multiple $\lambda$ of $\vec{\hat L}_b$:
$$\left\langle\left(\vec L_a + \lambda \vec L_b\right)^2\right\rangle = \left\langle\left(L_{x,a} + \lambda L_{x,b}\right)^2\right\rangle + \left\langle\left(L_{y,a} + \lambda L_{y,b}\right)^2\right\rangle + \left\langle\left(L_{z,a} + \lambda L_{z,b}\right)^2\right\rangle$$
For $\lambda = 1$ this produces the expectation value of $\left(\vec L_a + \vec L_b\right)^2$, for $\lambda = -1$, the one for $\left(\vec L_a - \vec L_b\right)^2$. In addition, it is positive for all values of $\lambda$, since it consists of expectation values of square Hermitian operators. (Just examine each term in terms of its own eigenstates.)
If you multiply out, you get
$$\left\langle\left(\vec L_a + \lambda \vec L_b\right)^2\right\rangle = L_a^2 + 2M\lambda + L_b^2\lambda^2$$
where $L_a \equiv \sqrt{\left\langle L_{xa}^2 + L_{ya}^2 + L_{za}^2\right\rangle}$, $L_b \equiv \sqrt{\left\langle L_{xb}^2 + L_{yb}^2 + L_{zb}^2\right\rangle}$, and $M$ represents mixed terms that do not need to be written out. In order for this quadratic form in $\lambda$ to always be positive, the discriminant must be negative:
$$M^2 - L_a^2 L_b^2 \leq 0$$

which means, taking square roots,
$$-L_a L_b \leq M \leq L_a L_b$$
and so
$$L_a^2 - 2L_a L_b + L_b^2 \leq \left\langle\left(\vec L_a + \vec L_b\right)^2\right\rangle \leq L_a^2 + 2L_a L_b + L_b^2$$
or
$$|L_a - L_b|^2 \leq \left\langle\left(\vec L_a + \vec L_b\right)^2\right\rangle \leq |L_a + L_b|^2$$
and taking square roots gives the triangle inequality.
Note that this derivation does not use any properties specific to angular
momentum and does not require the simultaneous existence of the components.

With a bit of messing around, the azimuthal quantum number relation $|l_a - l_b| \leq l_{ab} \leq l_a + l_b$ can be derived from it if a unique value for $l_{ab}$ exists; the key is to recognize that $L = l + \delta$ where $\delta$ is an increasing function of $l$ that stays below $\frac12$, and the $l$ values must be half-integers. This derivation is not as elegant as using the ladder operators, but the result is the same.

A.84 Awkward questions about spin


Now of course you ask: how do you know how the mathematical expressions
for spin states change when the coordinate system is rotated around some axis?
Darn.
If you did a basic course in linear algebra, they will have told you how the
components of normal vectors change when the coordinate system is rotated,
but not spin vectors, or spinors, which are two-dimensional vectors in three-
dimensional space.
You need to go back to the fundamental meaning of angular momentum. The
effect of rotations of the coordinate system around the z-axis was discussed in
chapter 11.4. The expressions given there can be straightforwardly generalized
to rotations around a line in the direction of an arbitrary unit vector $(n_x, n_y, n_z)$. Rotation by an angle $\varphi$ multiplies the $n$-direction angular momentum eigenstates by $e^{im\varphi}$ if $m\hbar$ is the angular momentum in the $n$-direction. For electron spin, the values for $m$ are $\pm\frac12$, so, using the Euler formula (1.5) for the exponential, the eigenstates change by a factor
$$\cos\left(\tfrac12\varphi\right) \pm i \sin\left(\tfrac12\varphi\right)$$
For arbitrary combinations of the eigenstates, the first of the two terms above still represents multiplication by the number $\cos\left(\tfrac12\varphi\right)$.
The second term may be compared to the effect of the $n$-direction angular momentum operator $\hat L_n$, which multiplies the angular momentum eigenstates by $\pm\frac12\hbar$; it is seen to be $2i\sin\left(\tfrac12\varphi\right)\hat L_n/\hbar$. So the operator that describes rotation of the coordinate system over an angle $\varphi$ around the $n$-axis is
$$R_{n,\varphi} = \cos\left(\tfrac12\varphi\right) + i\sin\left(\tfrac12\varphi\right)\,\frac{2}{\hbar}\,\hat L_n \tag{A.77}$$

Further, in terms of the $x$, $y$, and $z$ angular momentum operators, the angular momentum in the $n$-direction is
$$\hat L_n = n_x \hat L_x + n_y \hat L_y + n_z \hat L_z$$
If you put it in terms of the Pauli spin matrices, $\hbar$ drops out:
$$R_{n,\varphi} = \cos\left(\tfrac12\varphi\right) + i\sin\left(\tfrac12\varphi\right)\left(n_x\sigma_x + n_y\sigma_y + n_z\sigma_z\right)$$

Using this operator, you can find out how the spin-up and spin-down states
are described in terms of correspondingly defined basis states along the x- or
y-axis, and then deduce these correspondingly defined basis states in terms of
the z-ones.
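In Python, with the conventions of (A.77), that computation looks as follows (which $\sigma_x$ eigenvector you land on depends on the sense of rotation):

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def R(n, phi):
    """The rotation operator of (A.77) written with the Pauli matrices."""
    nx, ny, nz = n
    return np.cos(phi/2)*np.eye(2) + 1j*np.sin(phi/2)*(nx*sx + ny*sy + nz*sz)

up = np.array([1, 0], dtype=complex)          # spin-up along z
chi = R((0, 1, 0), np.pi/2) @ up              # rotate 90 degrees around the y-axis
print(chi, sx @ chi)                          # chi is an eigenvector of sigma_x
print(np.round(R((0, 0, 1), 2*np.pi)))        # a full 360 degree turn: minus identity
```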
Note however that the very idea of defining the positive x and y angular
momentum states from the z-ones by rotating the coordinate system over 90◦
is somewhat specious. If you rotate the coordinate system over 450◦ instead,
you get a different answer! Off by a factor −1, to be precise. But that is as bad
as the indeterminacy gets; whatever way you rotate the axis system to the new
position, the basis vectors you get will either be the same or only a factor −1
different {A.85}.
More awkwardly, the negative momentum states obtained by rotation do not
lead to real positive numerical factors for the corresponding ladder operators.
Presumably, this reflects the fact that at the wave function level, nature does
not have the rotational symmetry that it has for observable quantities. Anyway,
if nature does not bother to obey such symmetry, then there seems no point in
pretending it does. Especially since the non positive ladder factors would mess
up various formulae. The negative spin states found by rotation go out of the
window. Bye, bye.

A.85 More awkwardness about spin


How about that? A note on a note.
The previous note brought up the question: why can you only change the
spin states you find in a given direction by a factor −1 by rotating your point
of view? Why not by i, say?
With a bit of knowledge of linear algebra and some thought, you can see
that this question is really: how can you change the spin states if you perform
an arbitrary number of coordinate system rotations that end up in the same
orientation as they started?
One way to answer this is to show that the effect of any two rotations of the
coordinate system can be achieved by a single rotation over a suitably chosen
net angle around a suitably chosen net axis. (Mathematicians call this showing
the “group” nature of the rotations.) Applied repeatedly, any set of rotations of
the starting axis system back to where it was becomes a single rotation around a
single axis, and then it is easy to check that at most a change of sign is possible.
(To show that any two rotations are equivalent to one, just crunch out the
multiplication of two rotations, which shows that it takes the algebraic form of
a single rotation, though with a unit vector $\vec n$ not immediately evident to be of
length one. By noting that the determinant of the rotation matrix must be one,
it follows that the length is in fact one.)

A.86 Emergence of spin from relativity
This note will give a (relatively) simple derivation of the Dirac equation to show
how relativity naturally gives rise to spin. The equation will be derived without
ever mentioning the word spin while doing it, just to prove it can be done. Only
Dirac’s assumption that Einstein’s square root disappears,
$$\sqrt{\left(m_0 c^2\right)^2 + \sum_{i=1}^{3}\left(\hat p_i c\right)^2} = \alpha_0\, m_0 c^2 + \sum_{i=1}^{3} \alpha_i\, \hat p_i c,$$

will be used and a few other assumptions that have nothing to do with spin.
The conditions on the coefficient matrices αi for the linear combination to
equal the square root can be found by squaring both sides in the equation above
and then comparing sides. They turn out to be:

$$\alpha_i^2 = 1 \text{ for every } i \qquad \alpha_i\alpha_j + \alpha_j\alpha_i = 0 \text{ for } i \neq j \tag{A.78}$$

Now assume that the matrices αi are Hermitian, as appropriate for measur-
able energies, and choose to describe the wave function vector in terms of the
eigenvectors of matrix α0 . Under those conditions α0 will be a diagonal matrix,
and its diagonal elements must be ±1 for its square to be the unit matrix. So,
choosing the order of the eigenvectors suitably,
$$\alpha_0 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

where the sizes of the positive and negative unit matrices in α0 are still unde-
cided; one of the two could in principle be of zero size.
However, since α0 αi + αi α0 must be zero for the three other Hermitian αi
matrices, it is seen from multiplying that out that they must be of the form
$$\alpha_1 = \begin{pmatrix} 0 & \sigma_1^H \\ \sigma_1 & 0 \end{pmatrix} \qquad \alpha_2 = \begin{pmatrix} 0 & \sigma_2^H \\ \sigma_2 & 0 \end{pmatrix} \qquad \alpha_3 = \begin{pmatrix} 0 & \sigma_3^H \\ \sigma_3 & 0 \end{pmatrix}.$$

The σi matrices, whatever they are, must be square in size or the αi matrices
would be singular and could not square to one. This then implies that the
positive and negative unit matrices in α0 must be the same size.
Now try to satisfy the remaining conditions on α1 , α2 , and α3 using just
complex numbers, rather than matrices, for the σi . By multiplying out the
conditions (A.78), you see that

$$\alpha_i\alpha_i = 1 \implies \sigma_i^H\sigma_i = \sigma_i\sigma_i^H = 1$$
$$\alpha_i\alpha_j + \alpha_j\alpha_i = 0 \implies \sigma_i^H\sigma_j + \sigma_j^H\sigma_i = \sigma_i\sigma_j^H + \sigma_j\sigma_i^H = 0.$$

The first condition above would require each $\sigma_i$ to be a number of magnitude one, in other words, a number that can be written as $e^{i\phi_i}$ for some real angle $\phi_i$. The second condition is then according to the Euler formula (1.5) equivalent to the requirement that
$$\cos(\phi_i - \phi_j) = 0 \text{ for } i \neq j;$$
this implies that all three angles would have to be 90 degrees apart. That is
impossible: if φ2 and φ3 are each 90 degrees apart from φ1 , then φ2 and φ3 are
either the same or apart by 180 degrees; not by 90 degrees.
It follows that the components σi cannot be numbers, and must be matrices
too. Assume, reasonably, that they correspond to some measurable quantity
and are Hermitian. In that case the conditions above on the σi are the same as
those on the αi , with one critical difference: there are only three σi matrices,
not four. And so the analysis repeats.
Choose to describe the wave function in terms of the eigenvectors of the σ3
matrix; this does not conflict with the earlier choice since all half wave function
vectors are eigenvectors of the positive and negative unit matrices in α0 . So you
have à !
1 0
σ3 =
0 −1
and the other two matrices must then be of the form
à ! à !
0 τ1H 0 τ2H
σ1 = σ2 = .
τ1 0 τ2 0

But now the components τ1 and τ2 can indeed be just complex numbers, since
there are only two, and two angles can be apart by 90 degrees. You can take
τ1 = eiφ1 and then τ2 = ei(φ1 +π/2) or ei(φ1 −π/2) . The existence of two possibilities
for τ2 implies that on the wave function level, nature is not mirror symmetric;
momentum in the positive y-direction interacts differently with the x- and z
momenta than in the opposite direction. Since the observable effects are mirror
symmetric, do not worry about it and just take the first possibility.
So, the goal of finding a formulation in which Einstein’s square root falls
apart has been achieved. However, you can clean up some more, by redefining
the value of τ1 away. If the 4-dimensional wave function vector takes the form
(a1 , a2 , a3 , a4 ), define ā1 = eiφ1 /2 a1 , ā2 = e−iφ1 /2 a2 and similar for ā3 and ā4 .
In that case, the final cleaned-up σ matrices are
à ! à ! à !
1 0 0 1 0 −i
σ3 = σ1 = σ2 = (A.79)
0 −1 1 0 i 0

The “s” word has not been mentioned even once in this derivation. So, now
please express audible surprise that the σi matrices turn out to be the Pauli (it
can now be said) spin matrices of section 9.1.7.

718
But there is more. Suppose you define a new coordinate system rotated 90
degrees around the z-axis. This turns the old y-axis into a new x-axis. Since
τ2 has an additional factor eiπ/2 , to get the normalized coefficients, you must
include an additional factor eiπ/4 in ā1 , which by the fundamental definition of
angular momentum discussed in chapter 11.4 means that it describes a state
with angular momentum 1/2h̄. Similarly a3 corresponds to a state with angular
momentum 1/2h̄ and a2 and a4 to ones with −1/2h̄.
For nonzero momentum, the relativistic evolution of spin and momentum
becomes coupled. But still, if you look at the eigenstates of positive energy,
they take the form: Ã !
~a
ε(~p · ~σ )~a
where εp is a small number in the non-relativistic limit and ~a is the two-
component vector (a1 , a2 ). The operator corresponding to rotation of the co-
ordinate system around the momentum vector commutes with p~ · ~σ , hence the
entire four-dimensional vector transforms as a combination of a spin 1/2h̄ state
and a spin −1/2h̄ state for rotation around the momentum vector.

A.87 Electromagnetic evolution of expectation


values
The purpose of this note is to identify the two commutators of subsection 9.3;
the one that produces the velocity (or rather, the rate of change in expectation
position), and the one that produces the force (or rather the rate of change in
expectation linear momentum). All basic properties of commutators used in the
derivations below are described in chapter 3.4.4.
The Hamiltonian is
1 ³b ´ ³ ´
~ + qϕ = 1
~ · ~pb − q A
X3
H= ~p − q A (pbj − qAj )2 + qϕ
2m 2m j=1

when the dot product is written out in index notation.


The rate of change in the expectation value of a position vector component
ri is according to chapter 5.1.4 given by
¿ À
dhri i i
= [H, ri ]
dt h̄
so you need the commutator
 
3
1 X
[H, ri ] =  (pbj − qAj )2 + qϕ, ri 
2m j=1

719
Now the term qϕ can be dropped, since functions of position commute with
each other. On the remainder, use the fact that each of the two factors pbj − qAj
comes out at its own side of the commutator, to give
3 ½ ¾
1 X
[H, ri ] = (pbj − qAj )[pbj − qAj , ri ] + [pbj − qAj , ri ](pbj − qAj )
2m j=1

and then again, since the vector potential is just a function of position too, the
qAj can be dropped from the commutators. What is left is zero unless j is the
same as i, since different components of position and momentum commute, and
when j = i, it is minus the fundamental commutator, (minus since the order of
ri and pbi is inverted), and the fundamental commutator has value ih̄, so
1
[H, ri ] = − ih̄(pbi − qAi )
m
Plugging this in the time derivative of the expectation value of position, you get
dhri i 1
= hpbi − qAi i
dt m
so the normal momentum mvi is indeed given by the operator pbi − qAi .
On to the other commutator! The i-th component of Newton’s second law
in expectation form,
¿ À * +
dhvi i i ∂Ai
m = [H, pbi − qAi ] − q
dt h̄ ∂t
requires the commutator
 
3
1 X
[H, pi − qAi ] =
b  (pbj − qAj )2 + qϕ, pi − qAi 
2m j=1

The easiest is the term qϕ, since both ϕ and Ai are functions of position and
commute. And the commutator with pbi is the generalized fundamental operator
of chapter 3.4.4,
∂ϕ
[qϕ, pi ] = ih̄q
∂ri
and plugging that into Newton’s equation, you can verify that the electric field
term of the Lorentz law has already been obtained.
In what is left of the desired commutator, again take each factor pbj − qAj
to its own side of the commutator:
3 ½ ¾
1 X
(pbj − qAj )[pbj − qAj , pi − qAi ] + [pbj − qAj , pi − qAi ](pbj − qAj )
2m j=1

720
Work out the simpler commutator appearing here first:

∂Ai ∂Aj
[pbj − qAj , pi − qAi ] = −q[pj , Ai ] − q[Aj , pi ] = ih̄q − ih̄q
∂rj ∂ri

the first equality because momentum operators and functions commute, and the
second equality is again the generalized fundamental commutator.
Note that by assumption the derivatives of A ~ are constants, so the side
of pbj − qAj that this result appears is not relevant and what is left of the
Hamiltonian becomes
3
( )
qih̄ X ∂Ai ∂Aj
− (pbj − qAj )
m j=1 ∂rj ∂ri

Now let ı be the index following i in the sequence 123123 . . . and ı the one
preceding it (or the second following). Then the sum above will have a term
where j = i, but that term is seen to be zero, a term where j = ı, and a term
where j = ı. The total is then:
( Ã ! Ã !)
qih̄ ∂Ai ∂Aı ∂Aı ∂Ai
(pbı − qAı ) − − (pbı − qAı ) −
m ∂rı ∂ri ∂ri ∂rı

and that is
qih̄ n ³ ´ ³
~ − (pb − qA ) ∇ × A
~
´o
− (pbı − qAı ) ∇ × A ı ı
m ı ı
³ ´
~ × ∇×A
and the expression in brackets is the i-th component of (~pb − q A) ~ and
~ term in Newton’s equation provided that B
produces the q~v × B ~ = ∇ × A.
~

A.88 Existence of magnetic monopoles


Actually, various advanced quantum theories really require the existence of mag-
netic monopoles. But having never been observed experimentally with confi-
dence despite the big theoretical motivation for the search, they are clearly not
a significant factor in real life. Classical electromagnetodynamics assumes that
they do not exist at all.

A.89 More on Maxwell’s third law


Since the voltage is minus the integral of the electric field, it might seem that
there is a plus and minus mixed up in figure 9.11.

721
But actually, it is a bit more complex. The initial effect of the induced
electric field is to drive the electrons towards the pole marked as negative.
(Recall that the charge of electrons is negative, so the force on the electrons is
in the direction opposite to the electric field.) The accumulation of electrons
at the negative pole sets up a counter-acting electric field that stops further
motion of the electrons. Since the leads to the load will be stranded together
rather than laid out in a circle, they are not affected by the induced electric
field, but only by the counter-acting one. If you want, just forget about voltages
and consider that the induced electric field will force the electrons out of the
negative terminal and through the load.

A.90 Various electrostatic derivations.


This section gives various derivations for the electromagnetostatic solutions of
section 9.5.

A.90.1 Existence of a potential


~ (or of any other
This subsection shows that if the curl of the electric field E
vector field, like the magnetic one or a force field), is zero, it is minus the
gradient of some potential.
That potential can be defined to be
Z ~r
ϕ(~r) = − ~ r) d~r
E(~ (A.80)
~r 0

where ~r0 is some arbitrarily chosen reference point. You might think that the
value of ϕ(~r) would depend on what integration path you took from the reference
point to ~r, but the Stokes’ theorem of calculus says that the difference between
integrals leading to the same path must be zero since ∇ × E ~ is zero.
Now if you evaluate ϕ at a neighboring point ~r + ı̂∂x by a path first going
to ~r and from there straight to ~r + ı̂∂x, the difference in integrals is just the
integral over the final segment:
Z ~r +ı̂∂x
ϕ(~r + ı̂∂x) − ϕ(~r) = − ~ r) d~r
E(~ (A.81)
~r

Dividing by ∂x and then taking the limit ∂x → 0 shows that minus the x-
derivative of ϕ gives the x-component of the electric field. The same of course
for the other components, since the x-direction is arbitrary.
Note that if regions are multiply connected, the potential may not be quite
unique. The most important example of that is the magnetic potential of an
infinite straight electric wire. Since the curl of the magnetic field is nonzero

722
inside the wire, the path of integration must stay clear of the wire. It then
turns out that the value of the potential depends on how many times the chosen
integration path wraps around the wire. Indeed, the magnetic potential is ϕm =
−Iθ/2πǫ0 c2 . and as you know, an angle like θ is indeterminate by any integer
multiple of 2π.

A.90.2 The Laplace equation


The homogeneous Poisson equation,

∇2 ϕ = 0 (A.82)

for some unknown function ϕ is called the Laplace equation. It is very important
in many areas of physics and engineering. This note derives some of its generic
properties.
The so-called mean-value property says that the average of ϕ over the surface
of any sphere in which the Laplace equation holds is the value of ϕ at the center
of the sphere. To see why, for convenience take the center of the sphere as the
origin of a spherical coordinate system. Now
Z
0 = ∇2 ϕ d3~r
sphere
Z Z
∂ϕ 2
= r sin θ dθdφ
∂r
1 Z Z ∂ϕ
= sin θ dθdφ
4π ∂r
∂ 1 Z Z
= ϕ sin θ dθdφ
∂r 4π
the first equality since ϕ satisfies the Laplace equation, the second because of the
divergence theorem, the third because the integral is zero, so a constant factor
does not make a difference, and the fourth by changing the order of integration
and differentiation. It follows that the average of ϕ is the same on all spherical
surfaces centered around the origin. Since this includes as a limiting case the
origin and the average of ϕ over the single point at the origin is just ϕ at the
origin, the mean value property follows.
The so called maximum-minimum principle says that either ϕ is constant
everywhere or its maximum and minimum are on a boundary or at infinity.
The reason is the mean-value property above. Suppose there is an absolute
maximum in the interior of the region in which the Laplace equation applies.
Enclose the maximum by a small sphere. Since the values of ϕ would be less
than the maximum on the surface of the sphere, the average value on the surface

723
must be less than the maximum too. But the mean value theorem says it must
be the same. The only way around that is if ϕ is completely constant in the
sphere, but then the “maximum” is not a true maximum. And then you can
start “sphere-hopping” to show that ϕ is constant everywhere. Minima go the
same way.
The only solution of the Laplace equation in all of space that is zero at
infinity is zero everywhere. In more general regions, as long as the solution is
zero on all boundaries, including infinity where relevant, then the solution is
zero everywhere. The reason is the maximum-minimum principle: if there was
a point where the solution was positive/negative, then there would have to be
an interior maximum/minimum somewhere.
The solution of the Laplace equation for given boundary values is unique.
The reason is that the difference between any two solutions must satisfy the
Laplace equation with zero boundary values, hence must be zero.

A.90.3 Egg-shaped dipole field lines


The egg shape of the ideal dipole field lines can be found by assuming that the
dipole is directed along the z-axis. Then the field lines in the x, z-plane satisfy
dz Ez 2z 2 − x2
= =
dx Ex 3zx
Change to a new variable u by replacing z by xu to get:
Z Z
du 1 + u2 3u du dx
x =− =⇒ 2
=−
dx 3u 1+u x
Integrating and replacing u again by z/x gives

(x2 + z 2 )3/2 = Cx2

where C represents the integration constant from the integration. Near the
origin, x ∼ z 3/2 /C; therefor the field line has infinite curvature at the origin,
explaining the pronounced egg shape. Rewritten in spherical coordinates, the
field lines are given by r = C sin2 θ and φ constant, and that is also valid outside
the x, z-plane.

A.90.4 Ideal charge dipole delta function


Next is the delta function in the electric field generated by a charge distribution
that is contracted to an ideal dipole. To find the precise delta function, the
electric field can be integrated over a small sphere, but still large enough that
on its surface the ideal dipole potential is valid. The integral will give the

724
strength of the delta function. Since the electric field is minus the gradient of
the potential, an arbitrary component Ei integrates to
Z Z Z
Ei d3~r = − ∇ · (ϕı̂i ) d3~r = − ϕni dA
sphere sphere sphere surface

where ı̂i is the unit vector in the i-direction and the divergence theorem of
calculus was used to convert the integral to an integral over the surface area
A of the sphere. Noting that the vector ~n normal to the surface of the sphere
equals ~r/r, and that the potential is the ideal dipole one, you get
Z
1 Z ~ · ~r ri

Ei d3~r = − dA
sphere 4πǫ0 sphere surface r3 r
For simplicity, take the z-axis along the dipole moment; then ℘ ~ · ~r = ℘z. For
the x-component Ex , ri = x so that the integrand is proportional to xz, and
that integrates to zero over the surface of the sphere because the negative x-
values cancel the positive ones at the same z. The same for the y component
of the field, so only the z-component, or more generally, the component in the
same direction as ℘ ~ , has a delta function. For Ez , you are integrating z 2 , and by
symmetry that is the same as integrating x2 or y 2 , so it is the same as integrating
1 2
3
r . Since the surface of the sphere equals 4πr2 , the delta function included in
the expression for the field of a dipole as listed in table 9.2 is obtained.

A.90.5 Integrals of the current density


In subsequent derivations, various integrals of the current density ~ are needed.
In all cases it is assumed that the current density vanishes strongly outside some
region. Of course, normally an electric motor or electromagnet has electrical
leads going towards and away from of it; it is assumed that these are stranded
so tightly together that their Rnet effect can be ignored.
Consider an integral like ri m rın ji d3~r where ji is any component j1 , j2 , or
j3 of the current density, ı is the index following i in the sequence . . . 123123 . . .,
m and n are nonnegative integers, and the integration is over all of space.
By integration by parts in the i-direction, and using the fact that the current
densities vanish at infinity,
Z Z
rim+1 n ∂ji 3
rim rın ji d3~r = − r d ~r
m + 1 ı ∂ri
Now use the fact that the divergence of the current density is zero since the
charge density is constant for electromagnetostatic solutions:
Z Z Z
rim+1 n ∂jı 3 rim+1 n ∂jı 3
rim rın ji d3~r = rı d ~r + r d ~r
m + 1 ∂rı m + 1 ı ∂rı

725
where ı is the index preceding i in the sequence . . . 123123 . . .. The final integral
can be integrated in the ı-direction and is then seen to be zero because ~ vanishes
at infinity.
The first integral in the right hand side can be integrated by parts in the
ı-direction to give the final result:
Z Z
3 rim+1 n−1 3
rim rın ji d ~r = − nr jı d ~r (A.83)
m+1 ı
It follows from this equation with m = 0, n = 1 that
Z Z Z
ri jı d3~r = − rı ji d3~r = µı ~µ ≡ 1
2
~r × ~ d3~r (A.84)

with ~µ the current distribution’s dipole moment. In these expressions, you can
swap indices as

(i, ı, ı) → (ı, ı, i) or (i, ı, ı) → (ı, i, ı)

because only the relative ordering of the indices in the sequence . . . 123123 . . .
is relevant.
In quantum applications, it is often necessary to relate the dipole moment to
the angular momentum of the current carriers. Since the current density is the
charge per unit volume times its velocity, you get the linear momentum per unit
volume by multiplying by the ratio mc /qc of current carrier mass over charge.
Then the angular momentum is
Z
~ = mc 3 2mc
L ~r × ~ d ~r = ~µ
qc qc

A.90.6 Lorentz forces on a current distribution


Next is the derivation of the Lorentz forces on a given current distribution ~ in
a constant external magnetic field B~ ext . The Lorentz force law says that the
force F~ on a charge q moving with speed ~v equals

F~ = q~v × B
~ ext

In terms of a current distribution, the moving charge per unit volume times
its velocity is the current density, so the force on a volume element d3~r is:

dF~ = ~ × B
~ ext d3~r

The net force on the current distribution is therefor zero, because according
to (A.83) with m = n = 0, the integrals of the components of the current
distribution are zero.

726
The moment is not zero, however. It is given by
Z ³ ´
~ =
M ~ ext d3~r
~r × ~ × B

According to the vectorial triple product rule, that is


Z ³ ´ Z
~ =
M ~ ext ~ d3~r −
~r · B ~ ext d3~r
(~r · ~) B

The second integral is zero because of (A.83) with m = 1, n = 0. What is left


is can be written in index notation as
Z Z Z
3 3
Mi = ri Bext,i ji d ~r + rı Bext,ı ji d ~r + rı Bext,ı ji d3~r

The first of the three integrals is zero because of (A.83) with m = 1, n = 0. The
other two can be rewritten using (A.84):

Mi = −µı Bext,ı + µı Bext,ı

and in vector notation that reads


~ = ~µ × B
M ~ ext

When the (frozen) current distribution is slowly rotated around the axis
aligned with the moment vector, the work done is

−M dα = −µBext sin α dα = d(µBext cos α)

where α is the angle between ~µ and B~ ext . By integration, it follows that the
work done corresponds to a change in energy for an energy given by
~ ext
Eext = −~µ · B

A.90.7 Field of a current dipole


A current density ~ creates a magnetic field because of Maxwell’s second and
fourth equations for the divergence and curl of the magnetic field:

~ =0
∇·B ~ = 1 ~
∇×B
ǫ0 c 2

where B~ vanishes at infinity assuming there is no additional ambient magnetic


field.
A magnetic vector potential A ~ will now be defined as the solution of the
Poisson equation
~ = − 1 ~
∇2 A
ǫ0 c 2

727
that vanishes at infinity. Taking the divergence of this equation shows that
the divergence of the vector potential satisfies a homogeneous Poisson equation,
because the divergence of the current density is zero, with zero boundary condi-
tions at infinity. Therefor the divergence of the vector potential is zero. It then
follows that
B~ =∇×A ~
~ the divergence of any curl is zero, and
because it satisfies the equations for B:
the curl of the curl of the vector potential is according to the vectorial triple
product its Laplacian, hence the correct curl of the magnetic field.
You might of course wonder whether there might not be more than one
magnetic field that has the given divergence and curl and is zero at infinity.
The answer is no. The difference between any two such fields must have zero
divergence and curl. Therefor the curl of the curl of the difference is zero too,
and the vectorial triple product shows that equal to minus the Laplacian of the
difference. If the Laplacian of the difference is zero, then the difference is zero,
since the difference is zero at infinity (subsection 2). So the solutions must be
the same.
Since the integrals of the current density are zero, (A.83) with m = n = 0,
the asymptotic expansion (9.47) of the Green’s function integral shows that at
large distances, the components of A ~ behave as a dipole potential. Specifically,
X3 Z
1
Ai ∼ r i ri ji d3~r
4πǫ0 c2 r3 i=1

Now the term i = i in the sum does not give a contribution, because of (A.83)
with m = 1, n = 0. The other two terms are
·Z Z ¸
1
Ai ∼ 2 3
rı rı ji d ~r + rı rı ji d3~r
3
4πǫ0 c r
with ı following i in the sequence . . . 123123 . . . and ı preceding it. These two
integrals can be rewritten using (A.84) to give
1
Ai ∼ − [rı µı − rı µı ]
4πǫ0 c2 r3
Note that the expression between brackets is just the i-th component of ~r × ~µ.
~ so
The magnetic field is the curl of A,
∂Aı ∂Aı
Bi = −
∂rı ∂rı
and substituting in for the vector potential from above, differentiating, and
cleaning up produces
~ r2
3(~µ · ~r)~r − µ
Bi =
4πǫ0 c2 r5

728
This is the same asymptotic field as a charge dipole with strength ~µ would have.
However, for an ideal current dipole, the delta function at the origin will be
different than that derived for a charge dipole in the first subsection. Integrate
the magnetic field over a sphere large enough that on its surface, the asymptotic
field is accurate: Z Z Z
3 ∂Aı 3 ∂Aı 3
Bi d ~r = d ~r − d ~r
∂rı ∂rı
Using the divergence theorem, the right hand side becomes an integral over the
surface of the sphere:
Z Z Z
3 rı r
Bi d ~r = Aı dA − Aı ı dA
r r
Substituting in the asymptotic expression for Ai above,
Z ·Z Z ¸
1
Bi d3~r = − (ri µı − rı µi )rı dA − (rı µi − ri µı )rı dA
4πǫ0 c2 r4
The integrals of ri rı and ri rı are zero, for one because the integrand is odd in
ri . The integrals of rı rı and rı rı are each one third of the integral of r2 because
of symmetry. So, noting that the surface area A of the spherical surface is 4πr2 ,
Z
2
Bi d3~r = µi
3ǫ0 c2
That gives the strength of the delta function for an ideal current dipole.

A.90.8 Biot-Savart law


In the previous section, it was noted that the magnetic field of a current dis-
~ This vector potential satisfies the
tribution is the curl of a vector potential A.
Poisson equation
~ = − 1 ~
∇2 A
ǫ0 c 2
The solution for the vector potential can be written explicitly in terms of the
current density using the Green’s function integral (9.45):

1 Z 1
Ai = 2
ji (~r) d3~r
4πǫ0 c |~r − ~r|

~
The magnetic field is the curl of A,

∂Aı ∂Aı
Bi = −
∂rı ∂rı

729
or substituting in and differentiating under the integral
1 Z rı − r ı r − rı
Bi = − 2 3
jı (~r) − ı jı (~r) d3~r
4πǫ0 c |~r − ~r| |~r − ~r|3
In vector notation that gives the Biot-Savart law

~ =− 1 Z ~r − ~r
B × ~ d3~r
4πǫ0 c2 |~r − ~r|3
Now assume that the current distribution is limited to one or more thin wires,
as it usually is. In that case, a volume element of nonzero current distribution
can be written as
~ d3~r = I d~r
where in the right hand side ~r describes the position of the centerline of the wire
and I is the current through the wire. More specifically, I is the integral of the
current density over the cross section of the wire. The Biot-Savart law becomes

~ =− 1 Z ~r − ~r
B × I(~r) d~r
4πǫ0 c2 |~r − ~r|3
where the integration is over all infinitesimal segments d~r of the wires.

A.91 Energy due to orbital motion in a mag-


netic field
This note derives the energy of a charged particle in an external magnetic field.
The field is assumed constant.
According to subsection 9.3, the Hamiltonian is
1 ³b ´
~ 2+V
H= ~p − q A
2m
where m and q are the mass and charge of the particle and the vector potential
~ is related to the magnetic field B
A ~ by B ~ = ∇ × A.~ The potential energy V is
of no particular interest in this note. The first term is, and it can be multiplied
out as:
1 b2 q ³ b ~ ~ b´ q 2 ³ ~ ´2
H= ~p − ~p · A + A · ~p + A +V
2m 2m 2m
The middle two terms in the right hand side are the changes in the Hamiltonian
due to the magnetic field; they will be denoted as:

q ³ b ~ ~ b´ q 2 ³ ~ ´2
HBL ≡ − ~p · A + A · ~p HBD ≡ A
2m 2m
730
~ so that B
Now to simplify the analysis, align the z-axis with B ~ = k̂Bz . Then
~
an appropriate vector potential A is
~ = −ı̂ 1 yBz + ̂ 1 xBz .
A 2 2

~ =
The vector potential is not unique, but a check shows that indeed ∇ × A
~
k̂Bz = B for the one above. Also, the canonical momentum is
h̄ h̄ ∂ h̄ ∂ h̄ ∂
~pb = ∇ = ı̂ + ̂ + k̂
i i ∂x i ∂y i ∂z
Therefor, in the term HBL above,
à !
q b ~ ~ b q h̄ ∂ h̄ ∂ q b
HBL =− (~p · A + A · ~p) = − Bz x −y =− Bz L z
2m 2m i ∂y i ∂x 2m

the latter equality being true because of the definition of angular momentum as
~ Bz L
~r × ~pb. Because the z-axis was aligned with B, b =B
z
~b so, finally,
~ · L,
q ~ ~b
HBL = − B · L.
2m
Similarly, in the part HBD of the Hamiltonian, substitution of the expression
~ produces
for A
q 2 ³ ~ ´2 q2 2 ³ 2 ´
A = Bz x + y 2 ,
2m 8m
or writing it so that it is independent of how the z-axis is aligned,

q2 ³ ~ ´2
HBD = B × ~r
8m

A.92 Energy due to electron spin in a magnetic


field
If you are curious how the magnetic dipole strength of the electron can just pop
out of the relativistic Dirac equation, this note gives a quick derivation.
First, a problem must be addressed. Dirac’s equation, section 9.2, assumes
that Einstein’s energy square root falls apart in a linear combination of terms:
v
u 3 3
u X X
H = t(m0 c2 )2 + (pbi c)2 = α0 m0 c2 + αi pbi c
i=1 i=1

which works for the 4 × 4 α matrices given in that section. For an electron in
~
a magnetic field, according to subsection 9.3 you want to replace ~pb with ~pb − q A

731
where A ~ is the magnetic vector potential. But where should you do that, in the
square root or in the linear combination? It turns out that the answer you get
for the electron energy is not the same.
If you believe that the Dirac linear combination is the way physics really
works, and its description of spin leaves little doubt about that, then the answer
~ in the linear combination, not in the square
is clear: you need to put ~pb − q A
root.
So, what are now the energy levels? That would be hard to say directly from
the linear form, so square it down to H 2 , using the properties of the α matrices,
as given in section 9.2 and its note. You get, in index notation,
³ ´ 3 ³
X ´2 3
X
2 2 2
H = m0 c I+ (pbi − qAi )c I+ [pbı − qAı , pbı − qAı ]c2 αı αı
i=1 i=1

where I is the four by four unit matrix, ı is the index following i in the sequence
123123. . . , and ı is the one preceding i. The final sum represents the additional
squared energy that you get by substituting ~pb − q A ~ in the linear combination
instead of the square root. The commutator arises because αı αı + αı αı = 0,
giving the terms with the indices reversed the opposite sign. Working out the
commutator using the formulae of chapter 3.4.4, and the definition of the vector
potential A,~

³ ´2 3 ³
X ´2 3
X
H 2 = m0 c2 I+ (pbi − qAi )c I + qh̄c2 i Bi αı αı .
i=1 i=1

By multiplying out the expressions for the αi of section 9.2, using the fun-
damental commutation relation for the Pauli spin matrices that σı σı = iσi ,
à !
³ ´ 3 ³
X ´2 3
X σi 0
2 2 2 2
H = m0 c I+ (pbi − qAi )c I − qh̄c Bi
i=1 i=1
0 σi

It it seen that due to the interaction of the spin with the magnetic field, the
square energy changes by an amount −qhc2 σi Bi . Since 12 h̄ times the Pauli spin
matrices gives the spin S,~b the square energy due to the magnetic field acting on
~b · B.
the spin is −2qc2 S ~
In the nonrelativistic case, the rest mass energy m0 c2 is much larger than
the other terms, and in that case, if the change in square energy is −2qc2 S ~b · B,
~
2
the change in energy itself is smaller by a factor 2m0 c , so the energy due to
the magnetic field is
q ~b ~
HSB = − S ·B (A.85)
m
which is what was to be proved.

732
A.93 Setting the record straight on alignment
Some sources claim the spin is under an angle with the magnetic field; this
is impossible since, as pointed out in chapter 3.1.4, the angular momentum
vector does not exist. However, the angular momentum component along the
magnetic field does have measurable values, and these component values, being
one-dimensional, can only be aligned or anti-aligned with the magnetic field.
To intuitively grab the concept of Larmor precession, it may be useful any-
way to think of the various components of angular momentum as having definite
nonzero values, rather than just being uncertain. But the latter is the truth.

A.94 Solving the NMR equations


To solve the two coupled ordinary differential equations for the spin up and
down probabilities, first get rid of the time dependence of the right-hand-side
matrix by defining new variables A and B by

a = Aeiωt/2 , b = Be−iωt/2 .

Then find the eigenvalues and eigenvectors of the now constant matrix. The
eigenvalues can be written as ±iω1 /f , where f is the resonance factor given in
the main text. The solution is then
à !
A
= C1~v1 eiω1 t/f + C2~v2 e−iω1 t/f
B

where ~v1 and ~v2 are the eigenvectors. To find the constants C1 and C2 , apply
the initial conditions A(0) = a(0) = a0 and B(0) = b(0) = b0 and clean up
as well as possible, using the definition of the resonance factor and the Euler
formula.
It’s a mess.

A.95 Derivation of perturbation theory


This note derives the perturbation theory results for the solution of the eigen-
value problem (H0 + H1 )ψ = Eψ where H1 is small. The considerations for
degenerate problems use linear algebra.
First, “small” is not a valid mathematical term. There are no small num-
bers in mathematics, just numbers that become zero in some limit. Therefor,
to mathematically analyze the problem, the perturbation Hamiltonian will be
written as
H1 ≡ εHε

733
where ε is some chosen number that physically indicates the magnitude of the
perturbation potential. For example, if the perturbation is an external elec-
tric field, ε could be taken as the reference magnitude of the electric field. In
perturbation analysis, ε is assumed to be vanishingly small.
The idea is now to start with a good eigenfunction ψ~n,0 of H0 , (where “good”
is still to be defined), and correct it so that it becomes an eigenfunction of
H = H0 + H1 . To do so, both the desired energy eigenfunction and its energy
eigenvalue are expanded in a power series in terms of ε:

ψ~n = ψ~n,0 + εψ~n,ε + ε2 ψ~n,ε2 + . . .


E~n = E~n,0 + εE~n,ε + ε2 E~n,ε2 + . . .

If ε is a small quantity, then ε2 will be much smaller still, and can probably be
ignored. If not, then surely ε3 will be so small that it can be ignored. A result
that forgets about powers of ε higher than one is called first order perturbation
theory. A result that also includes the quadratic powers, but forgets about
powers higher than two is called second order perturbation theory, etcetera.
Before proceeding with the practical application, a disclaimer is needed.
While it is relatively easy to see that the eigenvalues expand in whole powers
of ε, (note that they must be real whether ε is positive or negative), it is much
more messy to show that the eigenfunctions must expand in whole powers. In
fact, for degenerate energies E~n,0 they only do if you choose good states ψ~n,0 .
See Rellich’s lecture notes on Perturbation Theory [Gordon & Breach, 1969]
for a proof. As a result the problem with degeneracy becomes that the good
unperturbed eigenfunction ψ~n,0 is initially unknown. It leads to lots of messiness
in the procedures for degenerate eigenvalues described below.
When the above power series are substituted into the eigenvalue problem to
be solved,
(H0 + εHε ) ψ~n = E~n ψ~n
the net coefficient of every power of ε must be equal in the left and right hand
sides. Collecting these coefficients and rearranging them appropriately pro-
duces:

ε0 : (H0 − E~n,0 )ψ~n,0 = 0


ε1 : (H0 − E~n,0 )ψ~n,ε = −Hε ψ~n,0 + E~n,ε ψ~n,0
ε2 : (H0 − E~n,0 )ψ~n,ε2 = −Hε ψ~n,ε + E~n,ε ψ~n,ε + E~n,ε2 ψ~n,0
ε3 : (H0 − E~n,0 )ψ~n,ε3 = −Hε ψ~n,ε2 + E~n,ε ψ~n,ε2 + E~n,ε2 ψ~n,ε + E~n,ε3 ψ~n,0
..
. ···

734
These are the equations to be solved in succession to give the various terms in
the expansion for the wave function ψ~n and the energy E~n . The further you go
down the list, the better your combined result should be.
Note that all it takes is to solve problems of the form

(H0 − E~n,0 )ψ~n,... = . . .

The equations for the unknown functions are in terms of the unperturbed Hamil-
tonian H0 , with some additional but in principle knowable terms.
For difficult perturbation problems like you find in engineering, the use of a
small parameter ε is essential to get the mathematics right. But in the simple
applications in quantum mechanics, it is usually overkill. So most of the time
the expansions are written without, like

ψ~n = ψ~n,0 + ψ~n,1 + ψ~n,2 + . . .


E~n = E~n,0 + E~n,1 + E~n,2 + . . .

where you are assumed to just imagine that ψ~n,1 and E~n,1 are “first order small,”
ψ~n,2 and E~n,2 are “second order small,” etcetera. In those terms, the successive
equations to solve are:

(H0 − E~n,0 )ψ~n,0 = 0 (A.86)


(H0 − E~n,0 )ψ~n,1 = −H1 ψ~n,0 + E~n,1 ψ~n,0 (A.87)
(H0 − E~n,0 )ψ~n,2 = −H1 ψ~n,1 + E~n,1 ψ~n,1 + E~n,2 ψ~n,0 (A.88)
(H0 − E~n,0 )ψ~n,3 = −H1 ψ~n,2 + E~n,1 ψ~n,2 + E~n,2 ψ~n,1 + E~n,3 ψ~n,0 (A.89)
···

Now consider each of these equations in turn. First, (A.86) is just the
Hamiltonian eigenvalue problem for H0 and is already satisfied by the chosen
unperturbed solution ψ~n,0 and its eigenvalue E~n,0 . However, the remaining equa-
tions are not trivial. To solve them, write their solutions in terms of the other
eigenfunctions ψ~n,0 of the unperturbed Hamiltonian H0 . In particular, to solve
(A.87), write X
ψ~n,1 = c~n,1 ψ~n,0
~
n6=~
n

where the coefficients c~n,1 are still to be determined. The coefficient of ψ~n,0 is
zero on account of the normalization requirement. (And in fact, it is easiest to
take the coefficient of ψ~n,0 also zero for ψ~n,2 , ψ~n,3 , . . . , even if it means that the
resulting wave function will no longer be normalized.)

735
The problem (A.87) becomes
X
c~n,1 (E~n,0 − E~n,0 )ψ~n,0 = −H1 ψ~n,0 + E~n,1 ψ~n,0
n6=~
~ n

where the left hand side was cleaned up using the fact that the ψ~n,0 are eigen-
functions of H0 . To get the first order energy correction E~n,1 , the trick is now to
take an inner product of the entire equation with hψ~n,0 |. Because of the fact that
the energy eigenfunctions of H0 are orthonormal, this inner product produces
zero in the left hand side, and in the right hand side it produces:

0 = −H~n~n,1 + E~n,1 H~n~n,1 = hψ~n,0 |H1 ψ~n,0 i

And that is exactly the first order correction to the energy claimed in subsec-
tion 10.1.1; E~n,1 equals the Hamiltonian perturbation coefficient H~n~n,1 . If the
problem is not degenerate or ψ~n,0 is good, that is.
To get the coefficients c~n,1 , so that you know what is the first order correction
ψ~n,1 to the wave function, just take an inner product with each of the other
eigenfunctions hψ~n,0 | of H0 in turn. In the left hand side it only leaves the
coefficient of the selected eigenfunction because of orthonormality, and for the
same reason, in the right hand side the final term drops out. The result is

c~n,1 (E~n,0 − E~n,0 ) = −H~n~n,1 for ~n 6= ~n H~n~n,1 = −hψ~n,0 |H1 ψ~n,0 i

The coefficients c~n,1 can normally be computed from this.


Note however that if the problem is degenerate, there will be eigenfunctions
ψ~n,0 that have the same energy E~n,0 as the eigenfunction ψ~n,0 being corrected.
For these the left hand side in the equation above is zero, and the equation
cannot in general be satisfied. If so, it means that the assumption that an
eigenfunction ψ~n of the full Hamiltonian expands in a power series in ε starting
from ψ~n,0 is untrue. Eigenfunction ψ~n,0 is bad. And that means that the first
order energy correction derived above is simply wrong. To fix the problem, what
needs to be done is to identify the submatrix of all Hamiltonian perturbation
coefficients in which both unperturbed eigenfunctions have the energy E~n,0 , i.e.
the submatrix
all H~ni~nj ,1 with E~ni ,0 = E~nj ,0 = E~n,0
The eigenvalues of this submatrix are the correct first order energy changes.
So, if all you want is the first order energy changes, you can stop here. Oth-
erwise, you need to replace the unperturbed eigenfunctions that have energy
E~n,0 . For each orthonormal eigenvector (c1 , c2 , . . .) of the submatrix, there is a
corresponding replacement unperturbed eigenfunction

c1 ψ~n1 ,0,old + c2 ψ~n2 ,0,old + . . .

736
You will need to rewrite the Hamiltonian perturbation coefficients in terms
of these new eigenfunctions. (Since the replacement eigenfunctions are linear
combinations of the old ones, no new integrations are needed.) You then need
to reselect the eigenfunction ψ~n,0 whose energy to correct from among these
replacement eigenfunctions. Choose the first order energy change (eigenvalue
of the submatrix) E~n,1 that is of interest to you and then choose ψ~n,0 as the
replacement eigenfunction corresponding to a corresponding eigenvector. If the
first order energy change E~n,1 is not degenerate, the eigenvector is unique, so
ψ~n,0 is now good. If not, the good eigenfunction will be some combination of
the replacement eigenfunctions that have that first order energy change, and
the good combination will have to be figured out later in the analysis. In any
case, the problem with the equation above for the c~n,1 will be fixed, because the
new submatrix will be a diagonal one: H~n~n,1 will be zero when E~n,0 = E~n,0 and
~n 6= ~n. The coefficients c~n,1 for which E~n,0 = E~n,0 remain indeterminate at this
stage. They will normally be found at a later stage in the expansion.
With the coefficients c~n,1 as found, or not found, the sum for the first order
perturbation ψ~n,1 in the wave function becomes
X H~n~n,1 X
ψ~n,1 = − ψ~n,0 + c~n,1 ψ~n,0
E~n,0 6=E~n,0 E~n,0 − E~n,0 E~n,0 =E~n,0
n6=~
~ n

The entire process repeats for higher order. In particular, to second order
(A.88) gives, writing ψ~n,2 also in terms of the unperturbed eigenfunctions,
X X H~n~n,1
c~n,2 (E~n,0 − E~n,0 )ψ~n,0 = (H1 − E~n,1 ) ψ~n,0
~
n E~n,0 6=E~n,0 E~n,0 − E~n,0
X
− c~n,1 (H1 − E~n,1 ) ψ~n,0 + E~n,2 ψ~n,0
E~n,0 =E~n,0
~n6=~
n

To get the second order contribution to the energy, take again an inner product
with hψ~n,0 |. That produces, again using orthonormality, (and diagonality of the
submatrix discussed above if degenerate),
X H~n~n,1 H~n~n,1
0= + E~n,2
E~n,0 6=E~n,0 E~n,0 − E~n,0

This gives the second order change in the energy stated in subsection 10.1.1, if
ψ~n,0 is good. Note that since H1 is Hermitian, the product of the two Hamilto-
nian perturbation coefficients in the expression is just the square magnitude of
either.

737
In the degenerate case, when taking an inner product with a hψ~n,0 | for which
E~n,0 = E~n,0 , the equation can be satisfied through the still indeterminate c~n,1
provided that the corresponding diagonal coefficient H~n~n,1 of the diagonalized
submatrix is unequal to E~n,1 = H~n~n,1 . In other words, provided that the first
order energy change is not degenerate. If that is untrue, the higher order sub-
matrix
X H~ni~n,1 H~n~nj ,1
all with E~ni ,0 = E~nj ,0 = E~n,0 E~ni ,1 = E~nj ,1 = E~n,1
E~n,0 6=E~n,0 E~n,0 − E~n,0

will need to be diagonalized, (the rest of the equation needs to be zero). Its
eigenvalues give the correct second order energy changes. To proceed to still
higher energy, reselect the eigenfunctions following the same general lines as
before. Obviously, in the degenerate case the entire process can become very
messy. And you may never become sure about the good eigenfunction.
This problem can often be eliminated or greatly reduced if the eigenfunctions
of H0 are also eigenfunctions of another operator A, and H1 commutes with A.
Then you can arrange the eigenfunctions ψ~n,0 into sets that have the same
value for the “good” quantum number a of A. You can analyze the perturbed
eigenfunctions in each of these sets while completely ignoring the existence of
eigenfunctions with different values for quantum number a.
To see why, consider two example eigenfunctions ψ1 and ψ2 of A that have
different eigenvalues a1 and a2 . Since H0 and H1 both commute with A, their
sum H does, so

0 = hψ2 |(HA − AH)ψ1 i = hψ2 |HAψ1 i + hAψ2 |Hψ1 i = (a1 − a2 )hψ2 |H|ψ1 i

and since a1 − a2 is not zero, hψ2 |H|ψ1 i must be. Now hψ2 |H|ψ1 i is the amount
of eigenfunction ψ2 produced by applying H on ψ1 . It follows that applying H
on an eigenfunction with an eigenvalue a1 does not produce any eigenfunctions
with different eigenvalues a. Thus an eigenfunction of H satisfying
   
X X X X
H c~n ψ~n,0 + c~n ψ~n,0  = E~n  c~n ψ~n,0 + c~n ψ~n,0 
a=a1 a6=a1 a=a1 a6=a1
P
can be replaced by just a=a1 c~n ψ~n,0 , since this by itself must satisfy the eigen-
value problem: the Hamiltonian of the second sum does not produce any amount
of eigenfunctions in the first sum and vice-versa. (There must always be at least
one value of a1 for which the first sum at ε = 0 is independent of the other eigen-
functions of H.) Reduce every eigenfunction of H to an eigenfunction of A in
this way. Now the existence of eigenfunctions with different values of a than
the one being analyzed can be ignored since the Hamiltonian does not produce
them. In terms of linear algebra, the Hamiltonian has been reduced to block

738
diagonal form, with each block corresponding to a set of eigenfunctions with a
single value of a. If the Hamiltonian also commutes with another operator B
that the ψ~n,0 are eigenfunctions of, the argument repeats for the subsets with a
single value for b.
The Hamiltonian perturbation coefficient hψ2 |H1 |ψ1 i is zero whenever two
good quantum numbers a1 and a2 are unequal. The reason is the same as for
hψ2 |H|ψ1 i above. Only perturbation coefficients for which all good quantum
numbers are the same can be nonzero.

A.96 Stark effect on the hydrogen ground state


This note derives the Stark effect on the hydrogen ground state. Since spin is
irrelevant for the Stark effect, it will be ignored.
The unperturbed ground state of hydrogen was derived in chapter 3.2. Fol-
lowing the convention in perturbation theory to append a subscript zero to the
unperturbed state, it can be summarized as:
h̄2 2 1
H0 ψ100,0 = E100,0 ψ100,0 H0 = − ∇ +V ψ100,0 = q e−r/a0
2me πa30
where H0 is the unperturbed hydrogen atom Hamiltonian, ψ100,0 the unper-
turbed ground state wave function, E100,0 the unperturbed ground state energy,
13.6 eV, and a0 is the Bohr radius, 0.53 Å.
The Stark perturbation produces a change ψ100,1 in this wave function that
satisfies, from (10.5),
(H0 − E100,0 )ψ100,1 = −(H1 − E100,1 )ψ100,0 H1 = eEext z
The first order energy change E100,1 is zero and can be dropped. The solution
for ψ100,1 will now simply be guessed to be ψ100,0 times some spatial function f
still to be found:
h̄2 2
(H0 − E100,0 ) (f ψ100,0 ) = −eEext zψ100,0 H0 = − ∇ +V
2me
Differentiating out the Laplacian ∇2 of the product f ψ100,0 into individual terms
using Cartesian coordinates, the equation becomes
h̄2 h̄2
f (H0 − E100,0 ) ψ100,0 − (∇f ) · (∇ψ100,0 ) − (∇2 f )ψ100,0 = −eEext zψ100,0
me 2me
The first term in this equation is zero since H0 ψ100,0 = E100,0 ψ100,0 . Also, now
using spherical coordinates, the gradients are, e.g. [15, 20.74, 20.82],
∂f 1 ∂f 1 ∂f 1
∇f = ı̂r + ı̂θ + ı̂φ ∇ψ100,0 = −ψ100,0 ı̂r
∂r r ∂θ r sin θ ∂φ a0

739
Substituting that into the equation, it reduces to
à !
h̄2 1 ∂f 1
− ∇2 f ψ100,0 = −eEext zψ100,0
me a0 ∂r 2

Now z = r cos θ in polar coordinates, and for the r-derivative of f to pro-


duce something that is proportional to r, f must be proportional to r2 . (The
Laplacian in the second term always produces lower powers of r than the r-
derivative and can for now be ignored.) So, to balance the right hand side, f
should contain a highest power of r equal to:
me eEext a0 2
f =− r cos θ + . . .
2h̄2
but then, using [15, 20.83], the ∇2 f term in the left hand side produces an
eEext a0 cos θ term. So add another term to f for its r-derivative to eliminate it:
me eEext a0 2 me eEext a20
f =− r cos θ − r cos θ
2h̄2 h̄2
The Laplacian of r cos θ = z is zero so no further terms need to be added. The
change f ψ100,0 in wave function is therefor
me eEext a0 ³ ´
ψ100,1 = − q r2 + 2a0 r e−r/a0 cos θ
2h̄2 πa30

(This “small perturbation” becomes larger than the unperturbed wave function
far from the atom because of the growing value of r2 . It is implicitly assumed
that the electric field terminates before a real problem arises. This is related
to the possibility of the electron tunneling out of the atom if the potential far
from the atom is less than its energy in the atom: if the electron can tunnel
out, there is strictly speaking no bound state.)
Now according to (10.5), the second order energy change can be found as

E100,2 = hψ100,0 |H1 ψ100,1 i H1 = eEext r cos θ

Doing the inner product integration in spherical coordinates produces


9me e2 Eext
2
a40
E100,2 = −
4h̄2

A.97 Dirac fine structure Hamiltonian


This note derives the fine structure Hamiltonian of the hydrogen atom. This
Hamiltonian fixes up the main relativistic errors in the classical solution of

740
chapter 3.2. The derivation is based on the relativistic Dirac equation from
chapter 9.2 and uses nontrivial linear algebra.
According to the Dirac equation, the relativistic Hamiltonian and wave func-
tion take the form
à ! à ! à ! à !
1 0 3
X 0 σi 1 0 ~p
ψ
HD = m e c 2
+ cpbi +V ~D =
ψ
0 −1 σi 0 0 1 ~n
ψ
i=1

where me is the mass of the electron when at rest, c the speed of light, and the
σi are the 2×2 Pauli spin matrices of chapter 9.1.7. Similarly the ones and zeros
in the shown matrices are 2 × 2 unit and zero matrices. The wave function is a
four-dimensional vector whose components depend on spatial position. It can be
subdivided into the two-dimensional vectors ψ ~ p and ψ
~ n . The two components of
~ p correspond to the spin up and spin down components of the normal classical
ψ
electron wave function; as noted in chapter 4.5.1, this can be thought of as a
vector if you want. The two components of the other vector ψ ~ n are very small
for the solutions of interest. These components would be dominant for states
that would have negative rest mass. They are associated with the anti-particle
of the electron, the positron.
The Dirac equation is solvable in closed form, but that solution is not some-
thing you want to contemplate if you can avoid it. And there is really no need
for it, since the Dirac equation is not exact anyway. To the accuracy it has, it
can easily be solved using perturbation theory in essentially the same way as
in note {A.95}. In this case, the small parameter is 1/c: if the speed of light
is infinite, the nonrelativistic solution is exact. And if you ballpark a typical
velocity for the electron in a hydrogen atom, it is only about one percent or so
of the speed of light.
So, following note {A.95}, take the Hamiltonian apart into successive powers
of 1/c as HD = HD,0 + HD,1 + HD,2 with
à ! 3
à ! à !
me c2 0 X 0 cpbi σi V 0
HD,0 = HD,1 = HD,2 =
0 −me c2 i=1
cpbi σi 0 0 V

and similarly for the wave function vector:


à ! à ! à ! à ! à !
~0p
ψ ~1p
ψ ~2p
ψ ~3p
ψ ~4p
ψ
~D =
ψ + + + + + ...
~n
ψ ~n
ψ ~n
ψ ~n
ψ ~n
ψ
0 1 2 3 4

and the energy:

ED = ED,0 + ED,1 + ED,2 + ED,3 + ED,4 + . . .


~D = ED ψ
Substitution into the Hamiltonian eigenvalue problem HD ψ ~D and then
collecting equal powers of 1/c together produces again a system of successive

741
equations, just like in note {A.95}:
"Ã ! Ã !# Ã !
2 me c2 0 ED,0 0 ~0p
ψ
c : − ~n =0
0 −me c2 0 ED,0 ψ0

"Ã ! Ã !# Ã !
1 me c2 0 ED,0 0 ~1p
ψ
c : − ~n =
0 −me c2 0 ED,0 ψ1
" 3 Ã ! Ã !# Ã !
X 0 cpbi σi ED,1 0 ~0p
ψ
− − ~n
i=1
cpbi σi 0 0 ED,1 ψ0

"Ã ! Ã !# Ã !
0 me c2 0 ED,0 0 ~2p
ψ
c : − ~n =
0 −me c2 0 ED,0 ψ2
" 3 Ã ! Ã !# Ã !
X 0 cpbi σi ED,1 0 ~1p
ψ
− − ~n
i=1
cpbi σi 0 0 ED,1 ψ1
"Ã ! Ã !# Ã !
V 0 ED,2 0 ~0p
ψ
− − ~n
0 V 0 ED,2 ψ0

"Ã ! Ã !# Ã !
−1 me c2 0 ED,0 0 ~3p
ψ
c : − ~n =
0 −me c2 0 ED,0 ψ3
" Ã ! Ã !# Ã !
3
X 0 cpbi σi ED,1 0 ~2p
ψ
− − ~n
i=1
cpbi σi 0 0 ED,1 ψ2
"Ã ! Ã !# Ã !
V 0 ED,2 0 ~1p
ψ
− − ~n
0 V 0 ED,2 ψ1
à !à !
ED,3 0 ~0p
ψ
+ ~n
0 ED,3 ψ0

"Ã ! Ã !# Ã !
me c2 0 ED,0 0 ~4p
ψ
c−2 : − ~n =
0 −me c2 0 ED,0 ψ4
" Ã ! Ã !# Ã !
3
X 0 cpbi σi ED,1 0 ~3p
ψ
− − ~n
i=1
cpi σi
b 0 0 ED,1 ψ3
"Ã ! Ã !# Ã !
V 0 ED,2 0 ~2p
ψ
− − ~n
0 V 0 ED,2 ψ2
à !à ! à !à !
ED,3 0 ~1p
ψ ED,4 0 ~0p
ψ
+ ~n + ~n
0 ED,3 ψ1
0 ED,4 ψ0

742
c−3 : ···

The first, order c2 , eigenvalue problem has energy eigenvalues ±me c2 , in


other words, plus or minus the rest mass energy of the electron. The solution
of interest is the physical one with a positive rest mass, so the desired solution
is
ED,0 = me c2 ψ~0p = still arbitrary ~n = 0
ψ0

Plug that into the order c1 equation to give, for top and bottom subvectors
X
~0p
0 = ED,1 ψ ~n = −
− 2me c2 ψ ~0p
cpbi σi ψ
1
i

It follows from the first of those that the first order energy change must be zero
because ψ ~0p cannot be zero; otherwise there would be nothing left. The second
equation gives the leading order values of the secondary components, so in total
X 1
ED,1 = 0 ~1p = still arbitrary
ψ ~n =
ψ ~0p
pbj σj ψ
1
j 2me c

where the summation index i was renamed to j to avoid ambiguity later.


Plug all that in the order c0 equation to give
1 XX ~0p − V ψ ~0p
~0p + ED,2 ψ ~n =
X 1 ~1p
0=− pbi pbj σi σj ψ ψ2 pbj σj ψ
2me i j j 2me c

The first of these two equations is the non-relativistic Hamiltonian eigenvalue


problem of chapter 3.2. To see that, note that in the double sum the terms
with j 6= i pairwise cancel since for the Pauli matrices, σi σj + σj σi = 0 when
j 6= i. For the remaining terms in which j = i, the relevant property of the
Pauli matrices is that σi σi is one (or the 2 × 2 unit matrix, really,) giving
1 XX 1 X 2
pbi pbj σi σj + V = pb + V ≡ H0
2me i j 2me i i

where H0 is the nonrelativistic hydrogen Hamiltonian of chapter 3.2.


So the first part of the order c0 equation takes the form
~0p = ED,2 ψ
H0 ψ ~0p

The energy ED,2 will therefor have to be a Bohr energy level En and each
~0p will have to be a non-relativistic energy eigenfunction with
component of ψ
that energy:
XX XX
ED,2 = En ~0p =
ψ clm+ ψnlm ↑ + clm− ψnlm ↓
l m l m

743
The sum multiplying ↑ is the first component of vector ψ ~0p and the sum multiply-
ing ↓ the second. The nonrelativistic analysis in chapter 3.2 was indeed correct
as long as the speed of light is so large compared to the relevant velocities that
1/c can be ignored.
To find out the error in it, the relativistic expansion must be taken to higher
order. To order c−1 , you get for the top vector
~1p + ED,3 ψ
0 = −(H0 − En )ψ ~0p

Now if ψ ~1p is written as a sum of the eigenfunctions of H0 , including ψ


~0p , the
~0 since (H0 − En )ψ
first term will produce zero times ψ p ~0 = 0. That means that
p

ED,3 must be zero. The expansion must be taken one step further to identify
the relativistic energy change. The bottom vector gives
X X 1
~n =
ψ
1 ~2p + V − En
pbj σj ψ ~0p
pbj σj ψ
3 2
j 2me c 2me c j 2me c

To order c−2 , you get for the top vector


XX V − En
~2p −
0 = −(H0 − En )ψ pbi σi ~0p + ED,4 ψ
pbj σj ψ ~0p
2
4me c 2
i j

and that determines the approximate relativistic energy correction.


Now recall from note {A.95} that if you do a nonrelativistic expansion of an
eigenvalue problem (H0 + H1 )ψ = Eψ, the equations to solve are (A.86) and
(A.87);

(H0 − E~n,0 )ψ~n,0 = 0 (H0 − E~n,0 )ψ~n,1 = −(H1 − E~n,1 )ψ~n,0

The first equation was satisfied by the solution for ψ~0p obtained above. However,
the second equation presents a problem. Comparison with the final Dirac result
suggests that the fine structure Hamiltonian correction H1 should be identified
as
? X X b V − En b
H1 = pi σi pj σj
i j 4m2e c2
but that is not right, since En is not a physical operator, but an energy eigen-
value for the selected eigenfunction. So mapping the Dirac expansion straight-
forwardly onto a classical one has run into a snag.
It is maybe not that surprising that a two-dimensional wave function can-
not correctly represent a truly four dimensional one. But clearly, whatever is
selected for the fine structure Hamiltonian H1 must at least get the energy
eigenvalues right. To see how this can be done, the operator obtained from the
Dirac equation will have to be simplified. Now for any given i, the sum over j

744
includes a term j = i, a term j = ı, where ı is the number following i in the
cyclic sequence . . . 123123 . . ., and it involves a term j = ı where ı precedes i in
the sequence. So the Dirac operator falls apart into three pieces:

? X b V − En b X V − En X V − En
H1 = pi σi p i σi + p
bi σi bı σı +
p p
bi σi pb σ
i 4m2e c2 i 4m2e c2 i 4m2e c2 ı ı

or using the properties of the Pauli matrices that σi σi = 1, σi σı = iσı , and


σi σı = −iσı for any i,

? X b V − En b X V − En
bı σı − i
X V − En
H1 = pi 2 c2
p i + i p
bi
2 c2
p pbi pb σ
2 c2 ı ı
(A.90)
i 4m e i 4m e i 4m e

The approach will now be to show first that the final two terms are the
spin-orbit interaction in the fine structure Hamiltonian. After that, the much
more tricky first term will be discussed. Renotate the indices in the last two
terms as follows:
X V − En X V − En
H1,spin-orbit = i pbı 2 2
pbı σi − i pbı pbı σi
i 4me c i 4m2e c2

Since the relative order of the subscripts in the cycle was maintained in the
renotation, the sums still contain the exact same three terms, just in a different
order. Take out the common factors;
i X
H1,spin-orbit = [pbı (V − En )pbı − pbı (V − En )pbı ] σi
4m2e c2 i

Now according to the generalized canonical commutator of chapter 3.4.4:


∂(V − En )
pbi (V − En ) = (V − En )pbi − ih̄
∂ri
where En is a constant that produces a zero derivative. So pbı , respectively pbı
can be taken to the other side of V − En as long as the appropriate derivatives
of V are added. If that is done, (V − En )pbı pbı and −(V − En )pbı pbı cancel since
linear momentum operators commute. What is left are just the added derivative
terms: " #
h̄ X ∂V ∂V
H1,spin-orbit = pb − pbı σi
4m2e c2 i ∂rı ı ∂rı
Note that the errant eigenvalue En mercifully dropped out. Now the hydrogen
Hamiltonian V only depends on the distance r from the origin, as 1/r, so
∂V V
= − 2 ri
∂ri r

745
and plugging that into the operator, you get
h̄V X
H1,spin-orbit = − [rı pbı − rı pbı ] σi
4m2e c2 r2 i
The term between the square brackets can be recognized as the ith component
of the angular momentum operator; also the Pauli spin matrix σi is defined as
Sbi / 21 h̄, so
V X
H1,spin-orbit = − 2 2 2 b S
L b
i i
2me c r i
Get rid of c2 using |E1 | = 21 α2 me c2 , of V using V = −2|E1 |a0 /r, and me using
|E1 | = h̄2 /2me a20 to get the spin-orbit interaction as claimed in the section on
fine structure.
That leaves the term
X V − En
pbi pb
2 c2 i
i 4m e

in (A.90). Since V = H0 − pb2 /2me , it can be written as


2
X H0 − En (pb 2 )
pbi bi −
p
i 4m2e c2 8m3e c2
The final term is the claimed Einstein correction in the fine structure Hamilto-
nian, using |E1 | = 12 α2 me c2 to get rid of c2 .
The first term,
? X b H0 − En b
H1,Darwin = pi pi
i 4m2e c2
is the sole remaining problem. It cannot be transformed into a decent physical
operator. The objective is just to get the energy correction right. And to achieve
that requires only that the Hamiltonian perturbation coefficients are evaluated
correctly at the En energy level. Specifically, what is needed is that
1 X
H~n~n,1,Darwin ≡ hψ~n,0 |H1,Darwin ψ~n,0 i = hψ~n,0 |pbi (H0 − En )pbi ψ~n,0 i
4m2e c2 i
for any arbitrary pair of unperturbed hydrogen energy eigenfunctions ψ~n,0 and
ψ~n,0 with energy En . To see what that means, the leading Hermitian operator
pbi can be taken to the other side of the inner product, and in half of that result,
H0 − En will also be taken to the other side:
1 X³ ´
H~n~n,1,Darwin = h bi ψ~n,0 |(H0 − En )pbi ψ~n,0 i + h(H0 − En )pbi ψ~n,0 |pbi ψ~n,0 i
p
8m2e c2 i

Now if you simply swap the order of the factors in $(H_0 - E_n)\hat p_i$ in this expression, you get zero, because both eigenfunctions have energy $E_n$. However, swapping the order of $(H_0 - E_n)\hat p_i$ brings in the generalized canonical commutator $[V,\hat p_i]$ that equals $i\hbar\,\partial V/\partial r_i$. Therefore, writing out the remaining inner product you get

$$H_{\vec n\underline{\vec n},1,\text{Darwin}} = \frac{-\hbar^2}{8m_e^2c^2}\sum_i\int_{\text{all }\vec r} \frac{\partial V}{\partial r_i}\,\frac{\partial\bigl(\psi_{\vec n,0}^*\psi_{\underline{\vec n},0}\bigr)}{\partial r_i}\,{\rm d}^3\vec r$$

Now, the potential $V$ becomes infinite at $r = 0$, and that makes mathematical manipulation difficult. Therefore, assume for now that the nuclear charge $e$ is not a point charge, but spread out over a very small region around the origin. In that case, the inner product can be rewritten as

$$H_{\vec n\underline{\vec n},1,\text{Darwin}} = \frac{-\hbar^2}{8m_e^2c^2}\sum_i\int_{\text{all }\vec r}\left[\frac{\partial}{\partial r_i}\!\left(\frac{\partial V}{\partial r_i}\psi_{\vec n,0}^*\psi_{\underline{\vec n},0}\right) - \frac{\partial^2V}{\partial r_i^2}\,\psi_{\vec n,0}^*\psi_{\underline{\vec n},0}\right]{\rm d}^3\vec r$$

and the first term integrates away since $\psi_{\vec n,0}^*\psi_{\underline{\vec n},0}$ vanishes at infinity. In the final term, use the fact that the derivatives of the potential energy $V$ give $e$ times the electric field of the nucleus, and therefore the second order derivatives give $e$ times the divergence of the electric field. Maxwell's first equation (9.23) says that that is $e/\epsilon_0$ times the nuclear charge density. Now if the region of nuclear charge is allowed to contract back to a point, the charge density must still integrate to the net proton charge $e$, so the charge density becomes $e\delta^3(\vec r)$ where $\delta^3(\vec r)$ is the three-dimensional delta function. Therefore the Darwin term produces Hamiltonian perturbation coefficients as if its Hamiltonian is

$$H_{1,\text{Darwin}} = \frac{\hbar^2e^2}{8m_e^2c^2\epsilon_0}\,\delta^3(\vec r)$$

Get rid of $c^2$ using $|E_1| = \frac12\alpha^2m_ec^2$, of $e^2/\epsilon_0$ using $e^2/4\pi\epsilon_0 = 2|E_1|a_0$, and of $m_e$ using $|E_1| = \hbar^2/2m_ea_0^2$ to get the Darwin term as claimed in the section on fine structure. It will give the right energy correction for the nonrelativistic solution. But you may rightly wonder what to make of the implied wave function.
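The unit conversions used repeatedly in this note are easy to confirm numerically. The following sketch is my addition, not part of the original text; it checks with standard constant values that $|E_1| = \frac12\alpha^2m_ec^2 = \hbar^2/2m_ea_0^2 = e^2/8\pi\epsilon_0a_0$ all give the same 13.6 eV hydrogen energy scale:

    # Check the identities used to eliminate c^2, e^2/eps0 and me above.
    # Illustrative sketch only; constants rounded to the values in this book.
    import math

    hbar = 1.054572e-34   # J s
    me   = 9.109382e-31   # kg
    c    = 2.997925e8     # m/s
    e    = 1.602177e-19   # C
    eps0 = 8.854188e-12   # C^2/(J m)

    alpha = e**2 / (4*math.pi*eps0*hbar*c)      # fine structure constant
    a0    = 4*math.pi*eps0*hbar**2 / (me*e**2)  # Bohr radius

    for E1 in (0.5*alpha**2*me*c**2,            # (1/2) alpha^2 me c^2
               hbar**2 / (2*me*a0**2),          # hbar^2/(2 me a0^2)
               e**2 / (8*math.pi*eps0*a0)):     # e^2/(8 pi eps0 a0)
        print(E1 / e)                           # each about 13.6 (eV)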

A.98 Classical spin-orbit derivation


This note derives the spin-orbit Hamiltonian from a more intuitive, classical
point of view than the Dirac equation mathematics.
Picture the magnetic electron as containing a pair of positive and negative magnetic monopoles of a large strength $q_m$. The very small vector from the negative to the positive pole is denoted by $\vec d$, and the product $\vec\mu = q_m\vec d$ is the magnetic dipole strength, which is finite.
Next imagine this electron smeared out in some orbit encircling the nucleus with a speed $\vec v$. The two poles will then be smeared out into two parallel "magnetic currents" that are very close together. The two currents have opposite directions because the velocity $\vec v$ of the poles is the same while their charges are opposite. These magnetic currents will be encircled by electric field lines just like the electric currents in figure 9.21 were encircled by magnetic field lines.

Now assume that seen from up very close, a segment of these currents will seem almost straight and two-dimensional, so that two-dimensional analysis can be used. Take a local coordinate system such that the $z$-axis is aligned with the negative magnetic current and in the direction of positive velocity. Rotate the $x,y$-plane around the $z$-axis so that the positive current is to the right of the negative one. The picture is then just like figure 9.21, except that the currents are magnetic and the field lines electric. In this coordinate system, the vector from negative to positive pole takes the form $\vec d = d_x\hat\imath + d_z\hat k$.
The magnetic current strength is defined as $q_m'v$, where $q_m'$ is the moving magnetic charge per unit length of the current. So, according to table 9.2, the negative current along the $z$-axis generates a two-dimensional electric field whose potential is

$$\varphi_\ominus = -\frac{q_m'v}{2\pi\epsilon_0c^2}\,\theta = -\frac{q_m'v}{2\pi\epsilon_0c^2}\arctan\left(\frac{y}{x}\right)$$

To get the field of the positive current a distance $d_x$ to the right of it, shift $x$ and change sign:

$$\varphi_\oplus = \frac{q_m'v}{2\pi\epsilon_0c^2}\arctan\left(\frac{y}{x - d_x}\right)$$
If these two potentials are added, the difference between the two arctan functions can be approximated as $-d_x$ times the $x$-derivative of the unshifted arctan. That can be seen from either recalling the very definition of the partial derivative, or from expanding the second arctan in a Taylor series in $x$. The bottom line is that the monopoles of the moving electron generate a net electric field with a potential

$$\varphi = \frac{q_m'd_xv}{2\pi\epsilon_0c^2}\,\frac{y}{x^2 + y^2}$$
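That leading-order approximation of the arctan difference can be confirmed symbolically. A minimal sympy sketch, my addition rather than the book's:

    # Expand arctan(y/(x - dx)) - arctan(y/x) for small dx; the leading
    # term should be dx * y/(x^2 + y^2), as used in the potential above.
    from sympy import symbols, atan, series, simplify

    x, y, dx = symbols('x y d_x', positive=True)
    difference = atan(y/(x - dx)) - atan(y/x)
    leading = series(difference, dx, 0, 2).removeO()
    print(simplify(leading))          # prints d_x*y/(x**2 + y**2)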
Now compare that with the electric field generated by a couple of opposite electric line charges like in figure 9.18, a negative one along the $z$-axis and a positive one above it at a position $y = d_c$. The electric dipole moment per unit length of such a pair of line charges is by definition $\vec\wp' = q'd_c\hat\jmath$, where $q'$ is the electric charge per unit length. According to table 9.1, a single electric charge along the $z$-axis creates an electric field whose potential is

$$\varphi = \frac{q'}{2\pi\epsilon_0}\ln\frac1r = -\frac{q'}{4\pi\epsilon_0}\ln\left(x^2 + y^2\right)$$

For an electric dipole consisting of a negative line charge along the $z$-axis and a positive one above it at $y = d_c$, the field is then

$$\varphi = -\frac{q'}{4\pi\epsilon_0}\ln\left(x^2 + (y - d_c)^2\right) + \frac{q'}{4\pi\epsilon_0}\ln\left(x^2 + y^2\right)$$

and the difference between the two logarithms can be approximated as $-d_c$ times the $y$-derivative of the unshifted one. That gives

$$\varphi = \frac{q'd_c}{2\pi\epsilon_0}\,\frac{y}{x^2 + y^2}$$

Comparing this with the potential of the monopoles, it is seen that the magnetic currents create an electric dipole in the $y$-direction whose strength is $\vec\wp' = q_m'd_xv/c^2\,\hat\jmath$. And since in this coordinate system the magnetic dipole moment is $\vec\mu' = q_m'(d_x\hat\imath + d_z\hat k)$ and the velocity is $\vec v = v\hat k$, it follows that the generated electric dipole strength is

$$\vec\wp' = -\vec\mu'\times\vec v/c^2$$

Since both dipole moments are per unit length, the same relation applies between the actual magnetic dipole strength of the electron and the electric dipole strength generated by its motion. The primes can be omitted.

Now the energy of the electric dipole is $-\vec\wp\cdot\vec E$ where $\vec E$ is the electric field of the nucleus, $e\vec r/4\pi\epsilon_0r^3$ according to table 9.1. So the energy is:

$$\frac{e}{4\pi\epsilon_0c^2}\,\frac{1}{r^3}\,\vec r\cdot(\vec\mu\times\vec v)$$
and the order of the triple product of vectors can be changed and then the angular momentum can be substituted:

$$-\frac{e}{4\pi\epsilon_0c^2}\,\frac{1}{r^3}\,\vec\mu\cdot(\vec r\times\vec v) = -\frac{e}{4\pi\epsilon_0c^2m_e}\,\frac{1}{r^3}\,\vec\mu\cdot\vec L$$

To get the correct spin-orbit interaction, the magnetic dipole moment $\vec\mu$ used in this expression must be the classical one, $-e\vec S/2m_e$. The additional factor $g_e = 2$ for the energy of the electron in a magnetic field does not apply here. There does not seem to be a really good reason to give for that, except for saying that the same Dirac equation that says that the additional $g$-factor is there in the magnetic interaction also says it is not in the spin-orbit interaction. The expression for the energy becomes

$$\frac{e^2}{8\pi\epsilon_0m_e^2c^2}\,\frac{1}{r^3}\,\vec S\cdot\vec L$$

Getting rid of $c^2$ using $|E_1| = \frac12\alpha^2m_ec^2$, of $e^2/\epsilon_0$ using $e^2/4\pi\epsilon_0 = 2|E_1|a_0$, and of $m_e$ using $|E_1| = \hbar^2/2m_ea_0^2$, the claimed expression for the spin-orbit energy is found.
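The vector algebra used in the last few steps can be spot-checked numerically; a small sketch of my own, not part of the original text:

    # Spot-check: r.(mu x v) = -mu.(r x v), and mu' x v = -qm' dx v j-hat
    # for mu' = qm'(dx i-hat + dz k-hat) and v = v k-hat.
    import numpy as np

    rng = np.random.default_rng(1)
    r, mu, v = rng.standard_normal((3, 3))
    assert np.isclose(np.dot(r, np.cross(mu, v)),
                      -np.dot(mu, np.cross(r, v)))

    qm, dx, dz, vz = 2.0, 0.3, 0.5, 4.0
    mu_p = qm * np.array([dx, 0.0, dz])
    assert np.allclose(np.cross(mu_p, [0.0, 0.0, vz]),
                       [0.0, -qm*dx*vz, 0.0])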

A.99 Expectation powers of r for hydrogen

This note derives the expectation values of the powers of $r$ for the hydrogen energy eigenfunctions $\psi_{nlm}$. The various values to be derived are:

$$\vdots$$

$$\langle\psi_{nlm}|(a_0/r)^3\psi_{nlm}\rangle = \frac{1}{l(l + \frac12)(l + 1)n^3}$$

$$\langle\psi_{nlm}|(a_0/r)^2\psi_{nlm}\rangle = \frac{1}{(l + \frac12)n^3}$$

$$\langle\psi_{nlm}|(a_0/r)\psi_{nlm}\rangle = \frac{1}{n^2} \qquad (A.91)$$

$$\langle\psi_{nlm}|1\,\psi_{nlm}\rangle = 1$$

$$\langle\psi_{nlm}|(r/a_0)\psi_{nlm}\rangle = \frac{3n^2 - l(l + 1)}{2}$$

$$\langle\psi_{nlm}|(r/a_0)^2\psi_{nlm}\rangle = \frac{n^2\bigl(5n^2 - 3l(l + 1) + 1\bigr)}{2}$$

$$\vdots$$

where $a_0$ is the Bohr radius, about 0.53 Å. Note that you can get the expectation value of a more general function of $r$ by summing terms, provided that the function can be expanded into a Laurent series. Also note that the value of $m$ does not make a difference: you can combine $\psi_{nlm}$ of different $m$ values together and it does not change the above expectation values. And watch it: when the power of $r$ becomes too negative, the expectation value will cease to exist. For example, for $l = 0$ the expectation values of $(a_0/r)^3$ and higher powers are infinite.
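Before diving into the derivations, the values in (A.91) can be checked independently by brute-force integration. A sketch using sympy's built-in hydrogen radial wave functions (my addition; R_nl takes $r$ in Bohr radii, so $r$ below plays the role of $r/a_0$):

    # Direct check of some of the (A.91) expectation values.
    from sympy import symbols, integrate, oo, Rational, simplify
    from sympy.physics.hydrogen import R_nl

    r = symbols('r', positive=True)

    def expect(q, n, l):
        """<psi_nlm|(r/a0)^q psi_nlm>; the angular integral equals one."""
        R = R_nl(n, l, r)                   # normalized over r^2 dr
        return integrate(r**q * R**2 * r**2, (r, 0, oo))

    n, l = 3, 1
    assert expect(0, n, l) == 1
    assert expect(-1, n, l) == Rational(1, n**2)
    assert simplify(expect(-2, n, l) - 1/((l + Rational(1, 2))*n**3)) == 0
    assert expect(1, n, l) == Rational(3*n**2 - l*(l + 1), 2)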
The trickiest to derive is the expectation value of $(a_0/r)^2$, and that one will be done first. First recall the hydrogen Hamiltonian from chapter 3.2,

$$H = -\frac{\hbar^2}{2m_er^2}\left\{\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right\} - \frac{e^2}{4\pi\epsilon_0}\frac1r$$

Its energy eigenfunctions of given square and $z$ angular momentum and their energy are

$$\psi_{nlm} = R_{nl}(r)Y_l^m(\theta,\phi) \qquad E_n = -\frac{\hbar^2}{2n^2m_ea_0^2} \qquad a_0 = \frac{4\pi\epsilon_0\hbar^2}{m_ee^2}$$

where the $Y_l^m$ are called the spherical harmonics.

When this Hamiltonian is applied to an eigenfunction $\psi_{nlm}$, it produces the exact same result as the following "dirty trick Hamiltonian" in which the angular derivatives have been replaced by $l(l + 1)$:

$$H_{\rm DT} = -\frac{\hbar^2}{2m_er^2}\left\{\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) - l(l + 1)\right\} - \frac{e^2}{4\pi\epsilon_0}\frac1r$$
The reason is that the angular derivatives are essentially the square angular momentum operator of chapter 3.1.3. Now, while in the hydrogen Hamiltonian the quantum number $l$ has to be an integer because of its origin, in the dirty trick one $l$ can be allowed to assume any value. That means that you can differentiate the Hamiltonian and its eigenvalues $E_n$ with respect to $l$. And that allows you to apply the Hellmann-Feynman theorem of section 10.1.1:

$$\frac{\partial E_{n,\rm DT}}{\partial l} = \left\langle\psi_{nlm}\left|\frac{\partial H_{\rm DT}}{\partial l}\psi_{nlm}\right.\right\rangle$$

(Yes, the eigenfunctions $\psi_{nlm}$ are good, because the purely radial $H_{\rm DT}$ commutes with both $\hat L_z$ and $\hat L^2$, which are angular derivatives.) Substituting in the dirty trick Hamiltonian,

$$\frac{\partial E_{n,\rm DT}}{\partial l} = \frac{\hbar^2(2l + 1)}{2m_ea_0^2}\left\langle\psi_{nlm}\left|\left(\frac{a_0}{r}\right)^2\psi_{nlm}\right.\right\rangle$$
So, if you can figure out how the dirty trick energy changes with $l$ near some desired integer value $l = l_0$, the desired expectation value of $(a_0/r)^2$ at that integer value of $l$ follows. Note that the eigenfunctions of $H_{\rm DT}$ can still be taken to be of the form $R_{nl}(r)Y_{l_0}^m(\theta,\phi)$, where $Y_{l_0}^m$ can be divided out of the eigenvalue problem to give $H_{\rm DT}R_{nl} = E_{\rm DT}R_{nl}$. If you skim back through chapter 3.2 and its note, you see that that eigenvalue problem was solved in note {A.17}. Now, of course, $l$ is no longer an integer, but if you skim through the note, it really makes almost no difference. The energy eigenvalues are still $E_{n,\rm DT} = -\hbar^2/2n^2m_ea_0^2$. If you look near the end of the note, you see that the requirement on $n$ is that $n = q + l + 1$ where $q$ must remain an integer for valid solutions, hence must stay constant under small changes. So $dn/dl = 1$, and then according to the chain rule the derivative of $E_{\rm DT}$ is $\hbar^2/n^3m_ea_0^2$. Substitute it in and there you have that nasty expectation value as given in (A.91).
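To make that chain-rule step completely explicit, the differentiation can also be done symbolically; a sketch of my own, in units where $\hbar = m_e = a_0 = 1$:

    # E_DT = -1/(2 n^2) with n = q + l + 1, q a fixed integer.
    from sympy import symbols, Rational, diff, simplify

    l, q = symbols('l q', positive=True)
    n = q + l + 1
    E_DT = -Rational(1, 2) / n**2
    dEdl = diff(E_DT, l)                 # chain rule gives 1/n^3
    # Hellmann-Feynman: dE/dl = ((2l+1)/2) <(a0/r)^2>, so
    print(simplify(2*dEdl / (2*l + 1)))  # 2/((2*l + 1)*(l + q + 1)**3),
                                         # i.e. 1/((l + 1/2) n^3)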
All other expectation values of $(r/a_0)^q$ for integer values of $q$ may be found from the "Kramers relation," or "(second) Pasternack relation:"

$$4(q + 1)\langle q\rangle - 4n^2(2q + 1)\langle q - 1\rangle + n^2q\left[(2l + 1)^2 - q^2\right]\langle q - 2\rangle = 0 \qquad (A.92)$$

where $\langle q\rangle$ is shorthand for the expectation value $\langle\psi_{nlm}|(r/a_0)^q\psi_{nlm}\rangle$.
Substituting $q = 0$ into the Kramers-Pasternack relation produces the expectation value of $a_0/r$ as in (A.91). It may be noted that this one can instead be derived from the virial theorem of chapter 5.1.4, or from the Hellmann-Feynman theorem by differentiating the hydrogen Hamiltonian with respect to the charge $e$. Substituting in $q = 1, 2, \ldots$ produces the expectation values for $r/a_0$, $(r/a_0)^2$, . . . . Substituting in $q = -1$ and the expectation value for $(a_0/r)^2$ from the Hellmann-Feynman theorem gives the expectation value for $(a_0/r)^3$. The remaining negative integer values $q = -2, -3, \ldots$ produce the remaining expectation values for the negative integer powers of $r/a_0$ as the $\langle q - 2\rangle$ term in the equation.
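Running the recurrence mechanically is straightforward. A minimal sketch of my own (exact rational arithmetic, seeded with $\langle0\rangle = 1$ and $\langle-1\rangle = 1/n^2$):

    # Generate <(r/a0)^q> for q = 1, 2, ... from the relation (A.92).
    from fractions import Fraction

    def expectations(n, l, qmax):
        e = {-1: Fraction(1, n*n), 0: Fraction(1)}
        for q in range(1, qmax + 1):
            # 4(q+1)<q> = 4n^2(2q+1)<q-1> - n^2 q((2l+1)^2 - q^2)<q-2>
            e[q] = (4*n*n*(2*q + 1)*e[q-1]
                    - n*n*q*((2*l + 1)**2 - q*q)*e[q-2]) / (4*(q + 1))
        return e

    e = expectations(n=2, l=1, qmax=2)
    assert e[1] == Fraction(3*2**2 - 1*2, 2)           # (3n^2 - l(l+1))/2
    assert e[2] == Fraction(2**2*(5*2**2 - 3*1*2 + 1), 2)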
Note that for sufficiently negative powers of $r$, the expectation value becomes infinite. Specifically, since $\psi_{nlm}$ is proportional to $r^l$, {A.17}, it can be seen that $\langle q - 2\rangle$ becomes infinite when $q = -2l - 1$. When that happens, the coefficient of that expectation value in the Kramers-Pasternack relation becomes zero, making it impossible to compute the expectation value. The relationship can be used until it crashes and then the remaining expectation values are all infinite.
The remainder of this note derives the Kramers-Pasternack relation. First note that the expectation values are defined as

$$\langle q\rangle \equiv \langle\psi_{nlm}|(r/a_0)^q\psi_{nlm}\rangle = \int_{\text{all }\vec r}(r/a_0)^q|\psi_{nlm}|^2\,{\rm d}^3\vec r = \int_{\text{all }\vec r}(r/a_0)^q|R_{nl}Y_l^m|^2\,{\rm d}^3\vec r$$

When this integral is written in spherical coordinates, the integration of the square spherical harmonic over the angular coordinates produces one. So the expectation value simplifies to

$$\langle q\rangle = \int_{r=0}^\infty (r/a_0)^qR_{nl}^2\,r^2\,{\rm d}r$$

To simplify the notations, a non-dimensional


q radial coordinate ρ = r/a0 will be
3
used. Also, a new radial function f ≡ a0 ρRnl will be defined. In those terms,
the expression above for the expectation value shortens to
Z ∞
hqi = ρq f 2 dρ
0

To further shorten the notations, from now on the limits of integration and dρ
will be omitted throughout. In those notations, the expectation value of (r/a0 )q
is Z
hqi = ρq f 2
Also note that the integrals are improper. It is to be assumed that the integra-
tions are from a very small value of r to a very large one, and that only at the
end of the derivation, the limit is taken that the integration limits become zero
and infinity.
According to note {A.17}, the function $R_{nl}$ satisfies in terms of $\rho$ the ordinary differential equation

$$-\rho^2R_{nl}'' - 2\rho R_{nl}' + \left[l(l + 1) - 2\rho + \frac{1}{n^2}\rho^2\right]R_{nl} = 0$$

where primes indicate derivatives with respect to $\rho$. Substituting in $R_{nl} = f/\sqrt{a_0^3}\,\rho$, you get in terms of the new unknown function $f$ that

$$f'' = \left[\frac{1}{n^2} - \frac{2}{\rho} + \frac{l(l + 1)}{\rho^2}\right]f \qquad (A.93)$$
Since this makes $f''$ proportional to $f$, forming the integral $\int\rho^qf''f$ produces a combination of terms of the form $\int\rho^{\rm power}f^2$, hence of expectation values of powers of $\rho$:

$$\int\rho^qf''f = \frac{1}{n^2}\langle q\rangle - 2\langle q - 1\rangle + l(l + 1)\langle q - 2\rangle \qquad (A.94)$$

The idea is now to apply integration by parts on $\int\rho^qf''f$ to produce a different combination of expectation values. The fact that the two combinations must be equal will then give the Kramers-Pasternack relation.
Before embarking on this, first note that since

$$\int\rho^qf'f = \int\rho^q\left({\textstyle\frac12}f^2\right)' = \rho^q\,{\textstyle\frac12}f^2\Big|\; - \int q\rho^{q-1}\,{\textstyle\frac12}f^2,$$

the latter from integration by parts, it follows that

$$\int\rho^qf'f = \frac12\rho^qf^2\Big|\; - \frac{q}{2}\langle q - 1\rangle \qquad (A.95)$$
This result will be used routinely in the manipulations below to reduce integrals of that form.

Now an obvious first integration by parts on $\int\rho^qf''f$ produces

$$\int\rho^qf''f = \rho^qf'f\Big|\; - \int(\rho^qf)'f' = \rho^qf'f\Big|\; - \int q\rho^{q-1}f'f - \int\rho^qf'f'$$

The first of the two integrals reduces to an expectation value of $\rho^{q-2}$ using (A.95). For the final integral, use another integration by parts, but make sure you do not run around in a circle, because if you do you will get a trivial expression. What works is integrating $\rho^q$ and differentiating $f'f'$:

$$\int\rho^qf''f = \rho^qf'f\Big|\; - \frac{q}{2}\rho^{q-1}f^2\Big|\; + \frac{q(q - 1)}{2}\langle q - 2\rangle - \frac{\rho^{q+1}}{q + 1}f'^2\Big|\; + 2\int\frac{\rho^{q+1}}{q + 1}f'f'' \qquad (A.96)$$
In the final integral, according to the differential equation (A.93), the factor $f''$ can be replaced by powers of $\rho$ times $f$:

$$2\int\frac{\rho^{q+1}}{q + 1}f'f'' = 2\int\frac{\rho^{q+1}}{q + 1}\left[\frac{1}{n^2} - \frac{2}{\rho} + \frac{l(l + 1)}{\rho^2}\right]ff'$$

and each of the terms is of the form (A.95), so you get

$$2\int\frac{\rho^{q+1}}{q + 1}f'f'' = \frac{1}{(q + 1)n^2}\rho^{q+1}f^2\Big|\; - \frac{2}{q + 1}\rho^qf^2\Big|\; + \frac{l(l + 1)}{q + 1}\rho^{q-1}f^2\Big|\; - \frac{1}{n^2}\langle q\rangle + \frac{2q}{q + 1}\langle q - 1\rangle - \frac{l(l + 1)(q - 1)}{q + 1}\langle q - 2\rangle$$

Plugging this into (A.96) and then equating that to (A.94) produces the Kramers-Pasternack relation. It also gives an additional right hand side

$$\rho^qf'f\Big|\; - \frac{q\rho^{q-1}}{2}f^2\Big|\; - \frac{\rho^{q+1}}{q + 1}f'^2\Big|\; + \frac{\rho^{q+1}}{(q + 1)n^2}f^2\Big|\; - \frac{2\rho^q}{q + 1}f^2\Big|\; + \frac{l(l + 1)\rho^{q-1}}{q + 1}f^2\Big|$$

but that term becomes zero when the integration limits take their final values zero and infinity. In particular, the upper limit values always become zero in the limit of the upper bound going to infinity; $f$ and its derivative then go to zero exponentially, beating out any power of $\rho$. The lower limit values also become zero in the region of applicability where $\langle q - 2\rangle$ exists, because that requires that $\rho^{q-1}f^2$ is for small $\rho$ proportional to a power of $\rho$ greater than zero.

The above analysis is not valid when $q = -1$, since then the final integration by parts would produce a logarithm; but since the expression is valid for any other $q$, not just integer ones, you can just take a limit $q \to -1$ to cover that case.

A.100 Symmetry eigenvalue preservation

Since a symmetry operator like $R_\varphi$ commutes with the Hamiltonian $H$, the two have a common set of eigenfunctions. Hence, if $\rho$ is an eigenfunction of $R_\varphi$ with eigenvalue $r$, it can always be written as a linear combination of eigenfunctions $\rho_1, \rho_2, \ldots$ with that same eigenvalue that are also eigenfunctions of $H$. So the wave function evolves in time as

$$c_1e^{-iE_1t/\hbar}\rho_1 + c_2e^{-iE_2t/\hbar}\rho_2 + \ldots$$

which remains a linear combination of eigenfunctions with eigenvalue $r$, hence an eigenfunction with eigenvalue $r$.

(A mathematical condition for the property that two commuting operators have a complete set of common eigenvectors is that they are both diagonalizable. While $R_\varphi$ is not Hermitian, it is still diagonalizable since it is unitary.)
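A toy numerical illustration of this preservation, with matrices standing in for the operators (my addition, not part of the original text):

    # A unitary R commuting with a Hermitian H: a state that starts as an
    # R-eigenvector with eigenvalue r stays one under time evolution.
    import numpy as np

    hbar = 1.0
    H = np.diag([1.0, 2.0, 3.0])       # energy eigenvalues E_1, E_2, E_3
    R = np.diag([1j, 1j, -1j])         # unitary; commutes with H
    assert np.allclose(H @ R, R @ H)

    psi0 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)   # R psi0 = 1j psi0
    t = 0.7
    psi_t = np.exp(-1j*np.diag(H)*t/hbar) * psi0  # c_k e^{-i E_k t/hbar}
    assert np.allclose(R @ psi_t, 1j * psi_t)     # eigenvalue preserved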

A.101 Everett's theory and vacuum energy

It is interesting to conjecture about relations between Everett's theory and the hot potato of relativistic quantum mechanics, "vacuum energy."

As mentioned in chapter 5.2, spontaneous emission of radiation by atoms is really due to interactions with photons that pop into and out of existence due to the quantum fluctuations of the ground state electromagnetic field. If there are on average a given number of "virtual" photons present, should then those photons not represent a net energy for empty space? It looks like a logical assumption, but when physicists compute the total energy in those quantum fluctuations, they find that it is infinite! Even if they cut off the infinity by some ad-hoc assumptions, they still end up with an amount of energy that is beyond all reason. The gravitational effects of all that mass would be appalling.

However, Everett's theory suggests: "Not so fast. Einstein's nonquantum equations for mass and gravity have been developed and verified for energy that is observable. The virtual fluctuations are not observable, since they disappear before they can be observed (i.e. before a universe can be established in which there is no doubt that the fluctuation exists). So it is not automatic that you can use Einstein's theories on this energy.

"Having, say, 100 virtual photons present on average is not the same as having 100 photons, because at the end of the day, you still end up with zero actual photons whose presence has been established and that you can examine. Any observed universe has not been blown to smithereens by the pressure of all those virtual particles acting a microsecond ago, since an observed universe has 'measured' all fluctuations a microsecond ago and has found that every one of them did not exist."

The difference from the conventional interpretation is of course that Everett's theory thinks in terms of the creation of universes in which there is no doubt that the virtual photons exist, while conventionally, the wave function is simply being taken as the probability of a photon being present, period. Everett gives some sort of "it must be so" argument to argue that the square amplitude of the eigenfunction is a measure of the generated universes with a given eigenvalue. Virtual particles would presumably need another argument. Intuition suggests that for virtual particles, there should be a rapidly decreasing measure of universes in which these particles seem to persist for any time.

A.102 A tenth of a googol in universes

There is an oft-cited story going around that the many-worlds interpretation implies the existence of $10^{99}$ worlds, and this number apparently comes from Everett, III himself. It is often used to argue that the many-worlds interpretation is just not credible. However, the truth is that the existence of infinitely many worlds, (or practically speaking infinitely many of them, maybe, if space and time would turn out to be discrete and finite), is a basic requirement of quantum mechanics itself, regardless of interpretation. Everett, III cannot be blamed for that, just for coming up with the ludicrous number of $10^{99}$ to describe infinity.

Bibliography

[1] M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions. Dover, third edition, 1965.

[2] A. Aharoni. Introduction to the Theory of Ferromagnetism. Oxford University Press, second edition, 2000.

[3] R. Baierlein. Thermal Physics. Cambridge University Press, Cambridge, UK, 1999.

[4] A. M. Ellis. Spectroscopic selection rules: The role of photon states. J. Chem. Educ., 76:1291–1294, 1999.

[5] Hugh Everett, III. The theory of the universal wave function. In Bryce S. DeWitt and Neill Graham, editors, The Many-Worlds Interpretation of Quantum Mechanics, pages 3–140. Princeton University Press, 1973.

[6] R. P. Feynman. QED, the Strange Theory of Light and Matter. Princeton, expanded edition, 2006.

[7] R. P. Feynman. Statistical Mechanics. Westview/Perseus, 1998.

[8] R. P. Feynman, R. B. Leighton, and M. Sands. The Feynman Lectures on Physics, volume I. Addison-Wesley, 1965.

[9] R. P. Feynman, R. B. Leighton, and M. Sands. The Feynman Lectures on Physics, volume III. Addison-Wesley, 1965.

[10] David J. Griffiths. Introduction to Quantum Mechanics. Pearson Prentice-Hall, second edition, 2005.

[11] C. Kittel. Introduction to Solid State Physics. Wiley, 7th edition, 1996.

[12] W. Koch and M. C. Holthausen. A chemist's guide to density functional theory. Wiley-VCH, Weinheim, second edition, 2000.

[13] R. G. Parr and W. Yang. Density Functional Theory of Atoms and Molecules. Oxford, New York, 1989.

[14] A. J. M. Schmets and W. Montfrooij. Teaching superfluidity at the introductory level, 2008. URL http://arxiv.org/abs/0804.3086.

[15] M. R. Spiegel and J. Liu. Mathematical Handbook of Formulas and Tables. Schaum's Outline Series. McGraw-Hill, second edition, 1999.

[16] R. L. Sproull. Modern Physics, a textbook for engineers. Wiley, first edition, 1956.

[17] M. Srednicki. Quantum Field Theory. Cambridge University Press, Cambridge, UK, 2007.

[18] A. Szabo and N. S. Ostlund. Modern Quantum Chemistry. Dover, first, revised edition, 1996.

[19] D. R. Tilley and J. Tilley. Superfluidity and Superconductivity. Institute of Physics Publishing, Bristol and Philadelphia, third edition, 1990.

[20] A. Yariv. Theory and Applications of Quantum Mechanics. Wiley & Sons, 1982.

[21] A. Zee. Quantum Field Theory in a Nutshell. Princeton University Press, Princeton, NJ, 2003.
Web Pages

Below is a list of relevant web pages. Some of the discussions were based on them.

1. Amber Schilling's page (http://wulfenite.fandm.edu/Intro_to_Chem/table_of_contents.htm). One of the info sources for chemical bonds, with lots of good pictures.

2. chemguide.co.uk (http://www.chemguide.co.uk/). Jim Clarke's UK site with lots of solid info.

3. Citizendium (http://en.citizendium.org/). The Citizen's Compendium. Had a rather good write up on the quantization of the electromagnetic field.

4. Hyperphysics (http://hyperphysics.phy-astr.gsu.edu/hbase/hph.html). An extensive source of info on chemical bonds and the periodic table.

5. Middlebury College Modern Physics Laboratory Manual (http://cat.middlebury.edu/~PHManual/). Gives a very understandable introduction to NMR with actual examples (item XIX).

6. Purdue chemistry review (http://chemed.chem.purdue.edu/genchem/topicreview/index.html). This book's source for the electronegativity values.

7. The Quantum Exchange (http://www.compadre.org/quantum/). Lots of stuff.

8. University of Michigan (http://www.umich.edu/~chem461/). Invaluable source on the hydrogen molecule and chemical bonds. Have a look at the animated periodic table for actual atom energy levels.

9. Wikipedia (http://wikipedia.org). Probably this book's primary source of information on about every loose end, though somewhat uneven. Some great, some confusing, some overly technical.
Notations

The below are the simplest possible descriptions of various symbols, just to help
you keep reading if you do not remember/know what they stand for. Don’t cite
them on a math test and then blame this book for your grade.
Watch it. There are so many ad hoc usages of symbols, some will have been
overlooked here. Always use common sense first in guessing what a symbol
means in a given context.

· A dot might indicate


• A dot product between vectors, if in between them.
• A time derivative of a quantity, if on top of it.
And also many more prosaic things (punctuation signs, decimal points,
. . . ).
× Multiplication symbol. May indicate:

• An emphatic multiplication.
• Multiplication continued on the next line / from the previous line.
• A vectorial product between vectors. In index notation, the $i$-th component of $\vec v\times\vec w$ equals

$$(\vec v\times\vec w)_i = v_{\bar\imath}w_{\bar{\bar\imath}} - v_{\bar{\bar\imath}}w_{\bar\imath}$$

where $\bar\imath$ is the index following $i$ in the sequence 123123. . . , and $\bar{\bar\imath}$ the one preceding it (or second following). Alternatively, evaluate the determinant

$$\vec v\times\vec w = \begin{vmatrix} \hat\imath & \hat\jmath & \hat k \\ v_x & v_y & v_z \\ w_x & w_y & w_z \end{vmatrix}$$

! Might be used to indicate a factorial. Example: $5! = 1\times2\times3\times4\times5 = 120$.

The function that generalizes $n!$ to noninteger values of $n$ is called the gamma function; $n! = \Gamma(n + 1)$. The gamma function generalization is due to, who else, Euler. (However, the fact that $n! = \Gamma(n + 1)$ instead of $n! = \Gamma(n)$ is due to the idiocy of Legendre.) In Legendre-resistant notation,

$$n! = \int_0^\infty t^ne^{-t}\,{\rm d}t$$

Straightforward integration shows that $0!$ is 1 as it should be, and integration by parts shows that $(n + 1)! = (n + 1)\,n!$, which ensures that the integral also produces the correct value of $n!$ for any higher integer value of $n$ than 0. The integral, however, exists for any real value of $n$ above $-1$, not just integers. The values of the integral are always positive, tending to positive infinity for both $n\downarrow-1$ (because the integral then blows up at small values of $t$), and for $n\uparrow\infty$ (because the integral then blows up at medium-large values of $t$). In particular, Stirling's formula says that for large positive $n$, $n!$ can be approximated as

$$n! \sim \sqrt{2\pi n}\;n^ne^{-n}\left[1 + \ldots\right]$$

where the value indicated by the dots becomes negligibly small for large $n$. The function $n!$ can be extended further to any complex value of $n$, except the negative integer values of $n$, where $n!$ is infinite, but it is then no longer positive. Euler's integral can be done for $n = -\frac12$ by making the change of variables $t = u^2$, producing the integral $\int_0^\infty 2e^{-u^2}\,{\rm d}u$, or $\int_{-\infty}^\infty e^{-u^2}\,{\rm d}u$, which equals $\sqrt{\int_{-\infty}^\infty e^{-x^2}\,{\rm d}x\int_{-\infty}^\infty e^{-y^2}\,{\rm d}y}$, and the double integral under the square root can be done analytically using polar coordinates. The result is that

$$-\frac12! = \int_{-\infty}^\infty e^{-u^2}\,{\rm d}u = \sqrt\pi$$

To get $\frac12!$, multiply by $\frac12$, since $n! = n(n - 1)!$.
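These facts are easy to verify numerically with Python's math module; a quick sketch of my own:

    # n! = Gamma(n+1); (-1/2)! = sqrt(pi); (1/2)! = sqrt(pi)/2; Stirling.
    import math

    assert math.gamma(6) == math.factorial(5) == 120
    assert abs(math.gamma(0.5) - math.sqrt(math.pi)) < 1e-12
    assert abs(math.gamma(1.5) - math.sqrt(math.pi)/2) < 1e-12

    def stirling(n):   # sqrt(2 pi n) n^n e^-n, the leading approximation
        return math.sqrt(2*math.pi*n) * n**n * math.exp(-n)

    print(stirling(20) / math.factorial(20))   # about 0.996, tending to 1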

| May indicate:

• The magnitude or absolute value of the number or vector, if enclosed


between a pair of them.
• The determinant of a matrix, if enclosed between a pair of them.
• The norm of the function, if enclosed between two pairs of them.
• The end of a bra or start of a ket.
• A visual separator in inner products.

|…⟩ A "ket" is used to indicate some state. For example, $|l\,m\rangle$ indicates an angular momentum state with azimuthal quantum number $l$ and magnetic quantum number $m$. Similarly, $|1/2\;1/2\rangle$ is the spin-up state of a particle with spin $\frac12$. Other common ones are $|x\rangle$ for the position eigenfunction, the delta function $\delta(x - \underline x)$ centered at position $\underline x$; $|1{\rm s}\rangle$ for the 1s or $\psi_{100}$ hydrogen state; $|2{\rm p}_z\rangle$ for the 2p$_z$ or $\psi_{210}$ state; etcetera. In short, whatever can indicate some state can be pushed into a ket.
⟨…| A "bra" is like a ket $|\ldots\rangle$, but appears in the left side of inner products, instead of the right one.

↑ Indicates the "spin up" state. Mathematically, equals the function $\uparrow(S_z)$ which is by definition equal to 1 at $S_z = \frac12\hbar$ and equal to 0 at $S_z = -\frac12\hbar$. A spatial wave function multiplied by $\uparrow$ is a particle in that spatial state with its spin up. For multiple particles, the spins are listed with particle 1 first.

↓ Indicates the "spin down" state. Mathematically, equals the function $\downarrow(S_z)$ which is by definition equal to 0 at $S_z = \frac12\hbar$ and equal to 1 at $S_z = -\frac12\hbar$. A spatial wave function multiplied by $\downarrow$ is a particle in that spatial state with its spin down. For multiple particles, the spins are listed with particle 1 first.
∑ Summation symbol. Example: if in three-dimensional space a vector $\vec f$ has components $f_1 = 2$, $f_2 = 1$, $f_3 = 4$, then $\sum_{\text{all }i}f_i$ stands for $2 + 1 + 4 = 7$.

∫ Integration symbol, the continuous version of the summation symbol. For example,

$$\int_{\text{all }x} f(x)\,{\rm d}x$$

is the summation of $f(x)\,{\rm d}x$ over all little fragments ${\rm d}x$ that make up the entire $x$-range.
→ May indicate:
• An approaching process. limε→0 indicates for practical purposes the
value of the expression following the lim when ε is extremely small,
limr→∞ the value of the following expression when r is extremely
large.
• The fact that the left side leads to, or implies, the right-hand side.

~ Vector symbol. An arrow above a letter indicates it is a vector. A vector is a quantity that requires more than one number to be characterized. Typical vectors in physics include position $\vec r$, velocity $\vec v$, linear momentum $\vec p$, acceleration $\vec a$, force $\vec F$, angular momentum $\vec L$, etcetera.

ˆ A hat over a letter in this book indicates that it is the operator, turning functions into other functions.


′ May indicate:

• A derivative of a function. Examples: $1' = 0$, $x' = 1$, $\sin'(x) = \cos(x)$, $\cos'(x) = -\sin(x)$, $(e^x)' = e^x$.
• A small or modified quantity.
• A quantity per unit length.
∇ The spatial differentiation operator nabla. In Cartesian coordinates:

$$\nabla \equiv \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right) = \hat\imath\frac{\partial}{\partial x} + \hat\jmath\frac{\partial}{\partial y} + \hat k\frac{\partial}{\partial z}$$

Nabla can be applied to a scalar function $f$, in which case it gives a vector of partial derivatives called the gradient of the function:

$$\text{grad } f = \nabla f = \hat\imath\frac{\partial f}{\partial x} + \hat\jmath\frac{\partial f}{\partial y} + \hat k\frac{\partial f}{\partial z}.$$

Nabla can be applied to a vector in a dot product multiplication, in which case it gives a scalar function called the divergence of the vector:

$$\text{div } \vec v = \nabla\cdot\vec v = \frac{\partial v_x}{\partial x} + \frac{\partial v_y}{\partial y} + \frac{\partial v_z}{\partial z}$$

or in index notation

$$\text{div } \vec v = \nabla\cdot\vec v = \sum_{i=1}^3 \frac{\partial v_i}{\partial x_i}$$

Nabla can also be applied to a vector in a vectorial product multiplication, in which case it gives a vector function called the curl or rot of the vector. In index notation, the $i$-th component of this vector is

$$(\text{curl } \vec v)_i = (\text{rot } \vec v)_i = (\nabla\times\vec v)_i = \frac{\partial v_{\bar{\bar\imath}}}{\partial x_{\bar\imath}} - \frac{\partial v_{\bar\imath}}{\partial x_{\bar{\bar\imath}}}$$

where $\bar\imath$ is the index following $i$ in the sequence 123123. . . , and $\bar{\bar\imath}$ the one preceding it (or second following).

The operator $\nabla^2$ is called the Laplacian. In Cartesian coordinates:

$$\nabla^2 \equiv \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$$

In non-Cartesian coordinates, don't guess; look these operators up in a table book.

∗ A superscript star normally indicates a complex conjugate. In the complex conjugate of a number, every $i$ is changed into $-i$.

< Less than.

⟨…⟩ May indicate:

• An inner product.
• An expectation value.

> Greater than.

[. . .] May indicate:

• A grouping of terms in a formula.


• A commutator. For example, [A, B] = AB − BA.

≡ Emphatic equals sign. Typically means “by definition equal” or “everywhere


equal.”

∼ Indicates approximately equal when something is small or large. I suggest


you read it as “is approximately equal to.”

∝ Proportional to. The two sides are equal except for some unknown factor.

α May indicate:

• The fine structure constant, $e^2/4\pi\epsilon_0\hbar c$, about 1/137.036 in value.


• A Dirac equation matrix.
• Some constant.
• Some angle.
• An eigenfunction of a generic operator A.
• A summation index.
• Component index of a vector.

β May indicate:

• Some constant.
• Some angle.
• An eigenfunction of a generic operator B.
• A summation index.

γ May indicate:

• Gyromagnetic ratio.
• Summation index.
• Integral in the tunneling WKB approximation.

∆ May indicate:

• An increment in the quantity following it.


• A delta particle.
• Often used to indicate the Laplacian ∇2 .

δ May indicate:

• With two subscripts, the “Kronecker delta”, which by definition is


equal to one if its two subscripts are equal, and zero in all other cases.
• Without two subscripts, the “Dirac delta function”, which is infinite
when its argument is zero, and zero if it is not. In addition the infinity
is such that the integral of the delta function over its single nonzero
point is unity. The delta function is not a normal function, but a
distribution. It is best to think of it as the approximate function
shown in the right hand side of figure 5.3 for a very, very, small
positive value of ε.
One often important way to create a three-dimensional delta func-
tion in spherical coordinates is to take the Laplacian of the function
−1/4πr. Chapter 9.5 explains why. In two dimensions, take the
Laplacian of ln(r)/2π to get a delta function.
• Often used to indicate a small amount of the following quantity,
or of a small change in the following quantity. There are nuanced
differences in the usage of δ, ∂ and d that are too much to go in here.
• Often used to indicate a second small quantity in addition to ε.

∂ Indicates a vanishingly small change or interval of the following variable. For


example, ∂f /∂x is the ratio of a vanishingly small change in function f di-
vided by the vanishingly small change in variable x that causes this change
in f . Such ratios define derivatives, in this case the partial derivative of
f with respect to x.

ǫ May indicate:

• Scaled energy.

• Orbital energy.
• Lagrangian multiplier.
• A small quantity, if symbol ε is not available.

ǫ0 Permittivity of space. Equal to $8.85419\times10^{-12}$ C²/(J m).

ε The Greek symbol that is conventionally used to indicate very small quanti-
ties.

η y-position of a particle.

Θ Used in this book to indicate some function of θ to be determined.

θ May indicate:

• In spherical coordinates, the angle from the chosen z axis, with apex
at the origin.
• z-position of a particle.
• A generic angle, like the one between the vectors in a cross or dot
product.
• Integral acting as an angle in the classical WKB approximation.
• Integral acting as an angle in the adiabatic approximation.

ϑ An alternate symbol for θ.

κ May indicate:

• A constant that physically corresponds to some wave number.


• A summation index.

λ May indicate:

• Wave length.
• Some multiple of something.
• A generic eigenvalue.
• Scaled square momentum.

µ May indicate:

• Magnetic dipole moment.

• $\mu_B = e\hbar/2m_e = 9.27\times10^{-24}$ J/T or $5.788\times10^{-5}$ eV/T is the Bohr magneton.
• A summation index.
• Chemical potential/molar Gibbs free energy.

ν May indicate:

• Scaled energy eigenfunction number in solids.


• A summation index.
• Strength of a delta function potential.

ξ May indicate:

• Scaled argument of the one-dimensional harmonic oscillator eigen-


functions.
• x-position of a particle.
• A summation or integration index.

π May indicate:

• The area of a circle of unit radius. Value 3.141592...


• Half the perimeter of a circle of unit radius. Value 3.141592...
• A 180◦ angle expressed in radians. Note that e±iπ = −1. Value
3.141592...
• A bond that looks from the side like a p state.
• A particle involved in the forces keeping the nuclei of atoms together
(π-meson).

ρ May indicate:

• Electric charge per unit volume.


• Scaled radial coordinate.
• Radial coordinate.
• Eigenfunction of a rotation operator R.
• Mass-base density.
• Energy density of electromagnetic radiation.

σ May indicate:

• A standard deviation of a value.
• A chemical bond that looks like an s state when seen from the side.
• Pauli spin matrix.

τ May indicate:

• Life time or half life.


• Some coefficient.

Φ May indicate:

• Some function of φ to be determined.


• The momentum-space wave function.
• Relativistic electromagnetic potential.

φ May indicate:

• In spherical coordinates, the angle around the chosen z axis. Increas-


ing φ by 2π encircles the z-axis exactly once.
• A phase angle.
• Something equivalent to an angle.
• Field operator φ(~r) annihilates a particle at position ~r while φ† (~r)
creates one at that position.

ϕ May indicate:

• A change in angle φ.
• An alternate symbol for φ.
• An electric potential.

χ Spinor component.

Ψ Upper case psi is used for the wave function.

ψ Lower case psi is typically used to indicate an energy eigenfunction. Depend-


ing on the system, indices may be added to distinguish different ones.
In some cases ψ might be used instead of Ψ to indicate a system in an
energy eigenstate. Let me know and I will change it. A system in an
energy eigenstate should be written as Ψ = cψ, not ψ, with c a constant
of magnitude 1.

ω May indicate:

• Angular frequency of the classical harmonic oscillator. Equal to $\sqrt{c/m}$ where $c$ is the spring constant and $m$ the mass.
• Angular frequency of a system.
• Angular frequency of light waves.
• Perturbation frequency,
• Any quantity having units of frequency, 1/s.

A May indicate:

• Repeatedly used to indicate the operator for a generic physical quan-


tity a, with eigenfunctions α.
• Electromagnetic vector potential.
• Einstein A coefficient.
• Some generic matrix.
• Some constant.
• Area.

Å Ångstrom. Equal to $10^{-10}$ m.

a May indicate:

• The value of a generic physical quantity with operator A


• The amplitude of the spin-up state
• The amplitude of the first state in a two-state system.
• Acceleration.
• Start point of an integration interval.
• The first of a pair of particles.
• Some coefficient.
• Some constant.
• Absorptivity of electromagnetic radiation.
• Annihilation operator $\hat a$ or creation operator $\hat a^\dagger$.
• Bohr radius of helium ion.

a0 May indicate:

• Bohr radius, $4\pi\epsilon_0\hbar^2/m_ee^2$ or 0.529177 Å. Comparable in size to atoms, and a good size to use to simplify various formulae.
• The initial value of a coefficient a.

adiabatic An adiabatic process is a process in which there is no heat transfer


with the surroundings. If the process is also reversible, it is called isen-
tropic. Typically, these processes are fairly quick, in order not to give heat
conduction enough time to do its stuff, but not so excessively quick that
they become irreversible.
Adiabatic processes in quantum mechanics are defined quite differently, to keep students on their toes. See chapter 5.1.7. These processes are very slow, to give the system all possible time to adjust to its surroundings. Of course, quantum physicists were not aware that the same term had already been used for a hundred years or so for relatively fast processes. They assumed they had just invented a great new term!

adjoint The adjoint $A^H$ or $A^\dagger$ of an operator is the one you get if you take it to the other side of an inner product. (While keeping the value of the inner product the same regardless of whatever two vectors or functions may be involved.) Hermitian operators are "self-adjoint;" they do not change if you take them to the other side of an inner product. "Skew-Hermitian" operators just change sign. "Unitary operators" change into their inverse when taken to the other side of an inner product. Unitary operators generalize rotations of vectors: an inner product of vectors is the same whether you rotate the first vector one way, or the second vector the opposite way. Unitary operators preserve inner products (when applied to both vectors or functions). Fourier transforms are unitary operators on account of the Parseval equality that says that inner products are preserved.

angle According to trigonometry, if the length of a segment of a circle is divided


by its radius, it gives the total angular extent of the circle segment. More
precisely, it gives the angle, in radians, between the line from the center
to the start of the circle segment and the line from the center to the
end of the segment. The generalization to three dimensions is called the
“solid angle;” the total solid angle over which a segment of a spherical
surface extends, measured from the center of the sphere, is the area of
that segment divided by the square radius of the sphere.

B May indicate:

• Repeatedly used to indicate a generic second operator or matrix.

• Magnetic field strength.
• Einstein B coefficient.
• Some constant.

b May indicate:

• Repeatedly used to indicate the amplitude of the spin-down state


• Repeatedly used to indicate the amplitude of the second state in a
two-state system.
• End point of an integration interval.
• The second of a pair of particles.
• Some coefficient.
• Some constant.

basis A basis is a minimal set of vectors or functions that you can write all other
vectors or functions in terms of. For example, the unit vectors ı̂, ̂, and k̂
are a basis for normal three-dimensional space. Every three-dimensional
vector can be written as a linear combination of the three.

C Degrees Centigrade. A commonly used temperature scale that has the value $-273.15\,^\circ$C instead of zero when systems are in their ground state. Recommendation: use degrees Kelvin (K) instead. However, differences in temperature are the same in Centigrade as in Kelvin.

C May indicate:

• A third operator.
• A variety of different constants.

c May indicate:

• The speed of light, about $2.99792\times10^8$ m/s.


• A variety of different constants.
• Speed of sound.

Cauchy-Schwartz inequality The Cauchy-Schwartz inequality describes a limitation on the magnitude of inner products. In particular, it says that for any $f$ and $g$,

$$|\langle f|g\rangle| \le \sqrt{\langle f|f\rangle}\sqrt{\langle g|g\rangle}$$

In words, the magnitude of an inner product $\langle f|g\rangle$ is at most the magnitude (i.e. the length or norm) of $f$ times the one of $g$. For example, if $f$ and $g$ are real vectors, the inner product is the dot product and we have $f\cdot g = |f||g|\cos\theta$, where $|f|$ is the length of vector $f$ and $|g|$ the one of $g$, and $\theta$ is the angle in between the two vectors. Since a cosine is less than one in magnitude, the Cauchy-Schwartz inequality is therefore true for vectors.

But it is true even if $f$ and $g$ are functions. To prove it, first recognize that $\langle f|g\rangle$ may in general be a complex number, which according to (1.6) must take the form $e^{i\alpha}|\langle f|g\rangle|$ where $\alpha$ is some real number whose value is not important, and that $\langle g|f\rangle$ is its complex conjugate $e^{-i\alpha}|\langle f|g\rangle|$. Now, (yes, this is going to be some convoluted reasoning), look at

$$\langle f + \lambda e^{-i\alpha}g|f + \lambda e^{-i\alpha}g\rangle$$

where $\lambda$ is any real number. The above inner product gives the square magnitude of $f + \lambda e^{-i\alpha}g$, so it can never be negative. But if we multiply out, we get

$$\langle f|f\rangle + 2|\langle f|g\rangle|\lambda + \langle g|g\rangle\lambda^2$$

and if this quadratic form in $\lambda$ is never negative, its discriminant must be less or equal to zero:

$$|\langle f|g\rangle|^2 \le \langle f|f\rangle\langle g|g\rangle$$

and taking square roots gives the Cauchy-Schwartz inequality.
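A quick numerical sanity check with complex vectors (my addition; numpy's vdot conjugates its first argument, matching the inner product convention of this book):

    import numpy as np

    rng = np.random.default_rng(0)
    f = rng.standard_normal(10) + 1j*rng.standard_normal(10)
    g = rng.standard_normal(10) + 1j*rng.standard_normal(10)

    lhs = abs(np.vdot(f, g))                      # |<f|g>|
    rhs = np.sqrt(np.vdot(f, f).real * np.vdot(g, g).real)
    assert lhs <= rhs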

Classical Can mean any older theory. In this work, most of the time it either
means “nonquantum,” or “nonrelativistic.”

cos The cosine function, a periodic function oscillating between 1 and -1 as


shown in [15, pp. 40-...].

D Difference in wave number values.


~ Primitive (translation) vector of a reciprocal lattice.
D

d Indicates a vanishingly small change or interval of the following variable. For


example, dx can be thought of as a small segment of the x-axis.

d May indicate:

• The distance between the protons of a hydrogen molecule.


• The distance between the atoms or lattice points in a crystal.
• A constant.

d~ Primitive (translation) vector of a crystal lattice.

derivative A derivative of a function is the ratio of a vanishingly small change


in a function divided by the vanishingly small change in the independent
variable that causes the change in the function. The derivative of f (x)
with respect to x is written as df /dx, or also simply as f ′ . Note that the
derivative of function f (x) is again a function of x: a ratio f ′ can be found
at every point x. The derivative of a function f (x, y, z) with respect to
x is written as ∂f /∂x to indicate that there are other variables, y and z,
that do not vary.

determinant The determinant of a square matrix $A$ is a single number indicated by $|A|$. If this number is nonzero, $A\vec v$ can be any vector $\vec w$ for the right choice of $\vec v$. Conversely, if the determinant is zero, $A\vec v$ can only produce a very limited set of vectors, though if it can produce a vector $\vec w$, it can do so for multiple vectors $\vec v$.

There is a recursive algorithm that allows you to compute determinants of increasingly bigger matrices in terms of determinants of smaller matrices. For a $1\times1$ matrix consisting of a single number, the determinant is simply that number:

$$|a_{11}| = a_{11}$$

(This determinant should not be confused with the absolute value of the number, which is written the same way. Since we normally do not deal with $1\times1$ matrices, there is normally no confusion.) For $2\times2$ matrices, the determinant can be written in terms of $1\times1$ determinants:

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = +a_{11}\begin{vmatrix} a_{22} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} \end{vmatrix}$$

so the determinant is $a_{11}a_{22} - a_{12}a_{21}$ in short. For $3\times3$ matrices, we have

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = +a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}$$

and we already know how to work out those $2\times2$ determinants, so we now know how to do $3\times3$ determinants. Written out fully:

$$a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31})$$

For $4\times4$ determinants,

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{vmatrix} = +a_{11}\begin{vmatrix} a_{22} & a_{23} & a_{24} \\ a_{32} & a_{33} & a_{34} \\ a_{42} & a_{43} & a_{44} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} & a_{24} \\ a_{31} & a_{33} & a_{34} \\ a_{41} & a_{43} & a_{44} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} & a_{24} \\ a_{31} & a_{32} & a_{34} \\ a_{41} & a_{42} & a_{44} \end{vmatrix} - a_{14}\begin{vmatrix} a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{vmatrix}$$

Etcetera. Note the alternating sign pattern of the terms.

As you might infer from the above, computing a good size determinant takes a large amount of work. Fortunately, it is possible to simplify the matrix to put zeros in suitable locations, and that can cut down the work of finding the determinant greatly. We are allowed to use the following manipulations without seriously affecting the computed determinant:

1. We may "transpose" the matrix, i.e. change its columns into its rows.
2. We can create zeros in a row by subtracting a suitable multiple of another row.
3. We may also swap rows, as long as we remember that each time that we swap two rows, it will flip over the sign of the computed determinant.
4. We can also multiply an entire row by a constant, but that will multiply the computed determinant by the same constant.

Applying these tricks in a systematic way, called "Gaussian elimination" or "reduction to lower triangular form", we can eliminate all matrix coefficients $a_{ij}$ for which $j$ is greater than $i$, and that makes evaluating the determinant pretty much trivial.
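As an illustration, here is the recursive expansion transcribed directly into Python and checked against numpy (my addition). The recursion does O(n!) work, so for anything beyond toy sizes the Gaussian elimination route above, which numpy uses internally, is the practical one:

    import numpy as np

    def det(a):
        """Determinant by cofactor expansion along the first row."""
        n = len(a)
        if n == 1:
            return a[0][0]
        total = 0.0
        for j in range(n):
            minor = [row[:j] + row[j+1:] for row in a[1:]]
            total += (-1)**j * a[0][j] * det(minor)
        return total

    a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]
    assert abs(det(a) - np.linalg.det(np.array(a))) < 1e-12  # both -3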

E May indicate:

• The total energy. Possible values are the eigenvalues of the Hamilto-
nian.

• $E_n = -\hbar^2/2m_ea_0^2n^2 = E_1/n^2$ may indicate the nonrelativistic (Bohr) energy levels of the hydrogen atom. The ground state energy $E_1$ equals $-13.6057$ eV.
• Electric field strength. To keep electric field apart from energy, note
that the electric field is a vector while energy is a scalar. A vector
sign over the E or a subscript denoting a component indicate it refers
to the electric field.
• Internal energy of a substance.

$E = mc^2$ Einstein's famous relationship between mass and energy, with $c$ the speed of light. Actually, Einstein swiped the mass-energy relation and the other fundamental ideas of special relativity from Poincaré, without mentioning his name even once.

e May indicate:

• The basis for the natural logarithms. Equal to 2.718281828459... This number produces the "exponential function" $e^x$, or $\exp(x)$, or in words "$e$ to the power $x$", whose derivative with respect to $x$ is again $e^x$. If $a$ is a constant, then the derivative of $e^{ax}$ is $ae^{ax}$. Also, if $a$ is an ordinary real number, then $e^{ia}$ is a complex number with magnitude 1.
• The magnitude of the charge of an electron or proton, equal to $1.60218\times10^{-19}$ C.
• Often used to indicate a unit vector.
• A superscript e may indicate a single-electron quantity.
• Specific internal energy of a substance.

$e^{iax}$ Assuming that $a$ is an ordinary real number, and $x$ a real variable, $e^{iax}$ is a complex function of magnitude one. The derivative of $e^{iax}$ with respect to $x$ is $iae^{iax}$.

eigenvector A concept from linear algebra. A vector $\vec v$ is an eigenvector of a matrix $A$ if $\vec v$ is nonzero and $A\vec v = \lambda\vec v$ for some number $\lambda$ called the corresponding eigenvalue.

The basic quantum mechanics section of this book avoids linear algebra completely, and the advanced part almost completely. The few exceptions are almost all two-dimensional matrix eigenvalue problems. In case you did not have any linear algebra, here is the solution: the two-dimensional matrix eigenvalue problem

$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\vec v = \lambda\vec v$$

has eigenvalues that are the two roots of the quadratic equation

$$\lambda^2 - (a_{11} + a_{22})\lambda + a_{11}a_{22} - a_{12}a_{21} = 0$$

The corresponding eigenvectors are

$$\vec v_1 = \begin{pmatrix} a_{12} \\ \lambda_1 - a_{11} \end{pmatrix} \qquad \vec v_2 = \begin{pmatrix} \lambda_2 - a_{22} \\ a_{21} \end{pmatrix}$$

On occasion you may have to swap $\lambda_1$ and $\lambda_2$ to use these formulae. If $\lambda_1$ and $\lambda_2$ are equal, there might not be two eigenvectors that are not multiples of each other; then the matrix is called defective. However, Hermitian matrices are never defective.

See also "matrix" and "determinant."
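A quick check of these formulae against numpy, for a sample matrix (my addition):

    import numpy as np

    a11, a12, a21, a22 = 2.0, 1.0, 1.0, 3.0
    disc = np.sqrt((a11 + a22)**2 - 4*(a11*a22 - a12*a21))
    lam1, lam2 = (a11 + a22 + disc)/2, (a11 + a22 - disc)/2

    A = np.array([[a11, a12], [a21, a22]])
    v1 = np.array([a12, lam1 - a11])
    v2 = np.array([lam2 - a22, a21])
    assert np.allclose(A @ v1, lam1 * v1)
    assert np.allclose(A @ v2, lam2 * v2)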

eV The electron volt, a commonly used unit of energy equal to 1.60218 10−19
J.

exponential function A function of the form e... , also written as exp(. . .). See
function and e.

F May indicate:

• The force in Newtonian mechanics. Equal to the negative gradient


of the potential. Quantum mechanics is formulated in terms of po-
tentials, not forces.
• The anti-derivative of some function f .
• Some function.
• Helmholtz free energy.

f May indicate:

• A generic function.
• A generic vector.
• A fraction.
• The resonance factor.

• Specific Helmholtz free energy.

function A mathematical object that associates values with other values. A function $f(x)$ associates every value of $x$ with a value $f$. For example, the function $f(x) = x^2$ associates $x = 0$ with $f = 0$, $x = \frac12$ with $f = \frac14$, $x = 1$ with $f = 1$, $x = 2$ with $f = 4$, $x = 3$ with $f = 9$, and more generally, any arbitrary value of $x$ with the square of that value $x^2$. Similarly, function $f(x) = x^3$ associates any arbitrary $x$ with its cube $x^3$, $f(x) = \sin(x)$ associates any arbitrary $x$ with the sine of that value, etcetera. A wave function $\Psi(x,y,z)$ associates each spatial position $(x,y,z)$ with a wave function value. Going beyond mathematics, its square magnitude associates any spatial position with the relative probability of finding the particle near there.

functional A functional associates entire functions with single numbers. For


example, the expectation energy is mathematically a functional: it as-
sociates any arbitrary wave function with a number: the value of the
expectation energy if physics is described by that wave function.

G Gibbs free energy.

g May indicate:

• A second generic function or a second generic vector.


• The strength of gravity, 9.80665 m/s2 exactly under standard condi-
tions on the surface of the earth.
• The g-factor, a nondimensional constant that indicates the gyro-
magnetic ratio relative to charge and mass. For the electron ge =
2.002319304362. For the proton gp = 5.5856947. For the neutron,
based on the mass and charge of the proton, gn = −3.826085.
• Specific Gibbs free energy/chemical potential.

Gauss' Theorem This theorem, also called divergence theorem or Gauss-Ostrogradsky theorem, says that for a continuously differentiable vector field $\vec v$,

$$\int_V \nabla\cdot\vec v\,{\rm d}V = \int_A \vec v\cdot\vec n\,{\rm d}A$$

where the first integral is over the volume of an arbitrary region and the second integral is over all the surface area of that region; $\vec n$ is at each point found as the unit vector that is normal to the surface at that point.
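For a concrete feel, the theorem can be verified symbolically on a simple case; a sketch of my own for $\vec v = (xy,\,yz,\,zx)$ on the unit cube:

    from sympy import symbols, integrate, diff

    x, y, z = symbols('x y z')
    vx, vy, vz = x*y, y*z, z*x

    div = diff(vx, x) + diff(vy, y) + diff(vz, z)
    vol = integrate(div, (x, 0, 1), (y, 0, 1), (z, 0, 1))

    # flux through the six faces; outward normals are +/- x, y, z
    flux = (integrate(vx.subs(x, 1) - vx.subs(x, 0), (y, 0, 1), (z, 0, 1))
          + integrate(vy.subs(y, 1) - vy.subs(y, 0), (x, 0, 1), (z, 0, 1))
          + integrate(vz.subs(z, 1) - vz.subs(z, 0), (x, 0, 1), (y, 0, 1)))
    assert vol == flux                    # both equal 3/2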

H May indicate:

• The Hamiltonian, or total energy, operator. Its eigenvalues are indi-
cated by E.
• Hn stands for the n-th order Hermite polynomial.
• Enthalpy.

h May indicate:

• The original Planck constant h = 2πh̄.


• hn is a one-dimensional harmonic oscillator eigenfunction.
• Single-electron Hamiltonian.
• Specific enthalpy.

h̄ The reduced Planck constant, equal to $1.05457\times10^{-34}$ J s. A measure of the
uncertainty of nature in quantum mechanics. Multiply by 2π to get the
original Planck constant h.

I May indicate:

• The number of electrons or particles.


• Electrical current.
• Unit matrix.
• $I_A$ is Avogadro's number, $6.0221\times10^{26}$ particles per kmol. (More
standard symbols are NA or L, but they are incompatible with the
general notations in this book.)

ℑ The imaginary part of a complex number. If $c = c_r + ic_i$ with $c_r$ and $c_i$ real numbers, then $\Im(c) = c_i$. Note that $c - c^* = 2i\Im(c)$.

I Radiation energy intensity.

i May indicate:

• The number of a particle.


• A summation index.
• A generic index or counter.

Not to be confused with i.



i The standard square root of minus one: $i = \sqrt{-1}$, $i^2 = -1$, $1/i = -i$, $i^* = -i$.

index notation A more concise and powerful way of writing vector and matrix
components by using a numerical index to indicate the components. For
Cartesian coordinates, we might number the coordinates x as 1, y as 2,
and $z$ as 3. In that case, a sum like $v_x + v_y + v_z$ can be more concisely written as $\sum_i v_i$. And a statement like $v_x \neq 0$, $v_y \neq 0$, $v_z \neq 0$ can be more compactly written as $v_i \neq 0$. To really see how it simplifies the notations,
have a look at the matrix entry. (And that one shows only 2 by 2 matrices.
Just imagine 100 by 100 matrices.)

iff Emphatic “if.” Should be read as “if and only if.”

integer Integer numbers are the whole numbers: . . . , −2, −1, 0, 1, 2, 3, 4, . . ..

inverse (Of matrices or operators.) If an operator A converts a vector or


function f into a vector or function g, then the inverse of the operator
$A^{-1}$ converts $g$ back into $f$. For example, the operator 2 converts vectors or functions into two times themselves, and its inverse operator $\frac12$ converts these back into the originals. Some operators do not have inverses. For
example, the operator 0 converts all vectors or functions into zero. But
given zero, there is no way to figure out what function or vector it came
from; the inverse operator does not exist.

iso Means “equal” or “constant.”

• Isenthalpic: constant enthalpy.


• Isentropic: constant entropy. This is a process that is both adiabatic
and reversible.
• Isobaric: constant pressure.
• Isochoric: constant (specific) volume.
• Isothermal: constant temperature.

isolated An isolated system is one that does not interact with its surroundings
in any way. No heat is transfered with the surroundings, no work is done
on or by the surroundings.

J May indicate:

• Total angular momentum.


• Number of nuclei in a quantum computation.

j May indicate:

• The quantum number of total square angular momentum.

• ~j is electrical current density.
• The number of a nucleus in a quantum computation.
• A summation index.
• A generic index or counter.

K May indicate:

• The atomic states or orbitals with theoretical Bohr energy E1


• Degrees Kelvin.

K May indicate:

• An exchange integral in Hartree-Fock.


• Maximum wave number value.

k May indicate:

• A wave number. A wave number is a measure for how fast a periodic


function oscillates with variations in spatial position.
• A summation index.

kB Boltzmann constant. Equal to $1.38065\times10^{-23}$ J/K. Relates absolute temperature to a typical unit of heat motion energy.

kmol A kilo mole refers to $6.0221\times10^{26}$ atoms or molecules. The weight of this
many particles is about the number of protons and neutrons in the atom
nucleus/molecule nuclei. So a kmol of hydrogen atoms has a mass of about
1 kg, and a kmol of hydrogen molecules about 2 kg. A kmol of helium
atoms has a mass of about 4 kg, since helium has two protons and two
neutrons in its nucleus. These numbers are not very accurate, not just
because the electron masses are ignored, and the free neutron and proton
masses are somewhat different, but also because of relativity effects that
cause actual nuclear masses to deviate from the sum of the free proton
and neutron masses.

L The atomic states or orbitals with theoretical Bohr energy E2

L Angular momentum.

l May indicate:

• The azimuthal quantum number.

• A generic summation index.

ℓ May indicate:

• The typical length in the harmonic oscillator problem.


• The dimensions of a solid block (with subscripts).
• A length.

lim Indicates the final result of an approaching process. limε→0 indicates for
practical purposes the value of the following expression when ε is extremely
small.

linear combination A very generic concept indicating sums of objects times


coefficients. For example, a position vector ~r in basic physics is the linear
combination xı̂ + y̂ + z k̂ with the objects the unit vectors ı̂, ̂, and k̂ and
the coefficients the position coordinates x, y, and z.

M The atomic states or orbitals with theoretical Bohr energy E3

M May indicate:

• Molecular mass.
• Mirror operator.

m May indicate:

• Mass.
  – $m_e$: electron mass. Equal to $9.109382\times10^{-31}$ kg. The rest mass energy is 0.510999 MeV.
  – $m_p$: proton mass. Equal to $1.672621\times10^{-27}$ kg. The rest mass energy is 938.272 MeV.
  – $m_n$: neutron mass. Equal to $1.674927\times10^{-27}$ kg. The rest mass energy is 939.566 MeV.
  – $m$: particle mass.
• The magnetic quantum number.
• Number of a single-electron wave function.
• A generic summation index or generic integer.

matrix A table of numbers.
As a simple example, a two-dimensional matrix A is a table of four num-
bers called a11 , a12 , a21 , and a22 :
à !
a11 a12
a21 a22

unlike a two-dimensional (ket) vector ~v , which would consist of only two


numbers v1 and v2 arranged in a column:
à !
v1
v2

(Such a vector can be seen as a “rectangular matrix” of size 2 × 1, but


let’s not get into that.)
In index notation, a matrix $A$ is a set of numbers $\{a_{ij}\}$ indexed by two
indices. The first index $i$ is the row number, the second index $j$ is the
column number. A matrix turns a vector $\vec v$ into another vector $\vec w$
according to the recipe

$$w_i = \sum_{\mbox{all }j} a_{ij} v_j \qquad \mbox{for all } i$$

where $v_j$ stands for “the $j$-th component of vector $\vec v$,” and $w_i$ for
“the $i$-th component of vector $\vec w$.”
As an example, the product of $A$ and $\vec v$ above is by definition

$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} =
\begin{pmatrix} a_{11} v_1 + a_{12} v_2 \\ a_{21} v_1 + a_{22} v_2 \end{pmatrix}$$

which is another two-dimensional ket vector.
Note that in matrix multiplications like the example above, in geometric
terms we take dot products between the rows of the first factor and the
columns of the second factor.
To multiply two matrices together, just think of the columns of the second
matrix as separate vectors. For example:

$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} =
\begin{pmatrix} a_{11} b_{11} + a_{12} b_{21} & a_{11} b_{12} + a_{12} b_{22} \\
a_{21} b_{11} + a_{22} b_{21} & a_{21} b_{12} + a_{22} b_{22} \end{pmatrix}$$

which is another two-dimensional matrix. In index notation, the $ij$ com-
ponent of the product matrix has value $\sum_k a_{ik} b_{kj}$.

The zero matrix is like the number zero; it does not change a matrix it is
added to, and it turns whatever it is multiplied with into zero. A zero matrix
is zero everywhere. In two dimensions:

$$\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$$

A unit matrix is the equivalent of the number one for matrices; it does
not change the quantity it is multiplied with. A unit matrix is one on its
“main diagonal” and zero elsewhere. The 2 by 2 unit matrix is:

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

More generally the coefficients, $\{\delta_{ij}\}$, of a unit matrix are one if $i = j$ and
zero otherwise.
The transpose of a matrix $A$, $A^T$, is what you get if you switch the two
indices. Graphically, it turns its rows into its columns and vice versa. The
Hermitian “adjoint” $A^H$ is what you get if you switch the two indices and
then take the complex conjugate of every element. If you want to take a
matrix to the other side of an inner product, you will need to change it to
its Hermitian adjoint. “Hermitian matrices” are equal to their Hermitian
adjoint, so this does nothing for them.
See also “determinant” and “eigenvector.”
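As a minimal numerical sketch of these rules (assuming the standard Python
NumPy library, which is not part of this book):

    import numpy as np

    # A two-dimensional matrix and a ket vector, as in the example above.
    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    v = np.array([5.0, 6.0])

    # Matrix times vector: w_i is the sum over all j of a_ij times v_j.
    w = A @ v                  # [1*5 + 2*6, 3*5 + 4*6] = [17, 39]

    # Transpose and Hermitian adjoint of a complex matrix.
    B = np.array([[1.0, 1.0j],
                  [-1.0j, 2.0]])
    BT = B.T                   # switch the two indices
    BH = B.conj().T            # switch the indices and complex conjugate
    print(np.allclose(B, BH))  # True: this particular B is Hermitian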

metric prefixes In the metric system, the prefixes Y, Z, E, P, T, G, M, and k
stand for $10^i$ with $i$ = 24, 21, 18, 15, 12, 9, 6, and 3, respectively. Similarly,
d, c, m, µ, n, p, f, a, z, y stand for $10^{-i}$ with $i$ = 1, 2, 3, 6, 9, 12, 15, 18,
21, and 24 respectively. For example, 1 ns is $10^{-9}$ seconds. Corresponding
names are yotta, zetta, exa, peta, tera, giga, mega, kilo, deci, centi, milli,
micro, nano, pico, femto, atto, zepto, and yocto.

N The atomic states or orbitals with theoretical Bohr energy $E_4$.

N May indicate:

• Number of states.
• Number of single-particle states.
• Number of neutrons in a nucleus.

n May indicate:

• The principal quantum number for hydrogen atom energy eigenfunc-
tions.
• A quantum number for harmonic oscillator energy eigenfunctions.
• Number of a single-electron or single-particle wave function.
• Generic summation index over energy eigenfunctions.
• Generic summation index over other eigenfunctions.
• Integer factor in Fourier wave numbers.
• Probability density.
• A generic index.
• A natural number.
• ns is the number of spin states.

and maybe some other stuff.

natural Natural numbers are the numbers: 1, 2, 3, 4, . . ..

normal A normal operator or matrix is one that has orthonormal eigenfunc-
tions or eigenvectors. Since eigenfunctions and eigenvectors are not orthonor-
mal in general, a normal operator or matrix is abnormal! Another example of
a highly confusing term. If it had been called, say, orthonormal, you would
have a clue what meaning of “normal” was being referred to. To be fair, the
author is not aware of any physicists being involved in this particular term;
it may be the mathematicians that are to blame here. For an operator
or matrix $A$ to be (ortho)normal, it must commute with its Hermitian
adjoint, $[A, A^\dagger] = 0$. Therefore, Hermitian, skew-Hermitian, and unitary
operators or matrices are (ortho)normal.
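For example, the 90-degree rotation matrix with rows $(0, 1)$ and $(-1, 0)$ is
unitary, hence normal. Its eigenvalues are $i$ and $-i$, and the corresponding
eigenvectors $(1, i)/\sqrt2$ and $(1, -i)/\sqrt2$ are indeed orthonormal with
respect to the complex inner product.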

P May indicate:

• The linear momentum eigenfunction.


• A power series solution.
• Probability.
• Pressure.
• Hermitian part of an annihilation operator.

p May indicate:

• Linear momentum.
• Linear momentum in the x-direction.

• Integration variable with units of linear momentum.
• A superscript p may indicate a single-particle quantity.

p Energy state with orbital azimuthal quantum number l = 1.


perpendicular bisector For two given points P and Q, the perpendicular
bisector consists of all points R that are equally far from P as they are
from Q. In two dimensions, the perpendicular bisector is the line that
passes through the point exactly half way in between P and Q, and that
is orthogonal to the line connecting P and Q. In three dimensions, the
perpendicular bisector is the plane that passes through the point exactly
half way in between P and Q, and that is orthogonal to the line connecting
P and Q. In vector notation, the perpendicular bisector of points P and
Q is all points R whose radius vector ~r satisfies the equation:
$$(\vec r - \vec r_P)\cdot(\vec r_Q - \vec r_P) = {\textstyle\frac12}(\vec r_Q - \vec r_P)\cdot(\vec r_Q - \vec r_P)$$

(Note that the halfway point, $\vec r - \vec r_P = \frac12(\vec r_Q - \vec r_P)$, is included in this
formula, as is the halfway point plus any vector that is normal to $\vec r_Q - \vec r_P$.)
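For example, take $P = (0, 0)$ and $Q = (2, 0)$ in two dimensions. For a point
$R = (x, y)$ the equation above becomes $(x, y)\cdot(2, 0) = \frac12(2, 0)\cdot(2, 0)$, or
$2x = 2$. So the perpendicular bisector is the vertical line $x = 1$, which indeed
passes through the halfway point $(1, 0)$ orthogonally to the line connecting
$P$ and $Q$.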
phase angle Any complex number can be written in “polar form” as $c = |c|e^{i\alpha}$
where both the magnitude $|c|$ and the phase angle $\alpha$ are real numbers.
Note that when the phase angle varies from zero to $2\pi$, the complex num-
ber $c$ varies from positive real to positive imaginary to negative real to
negative imaginary and back to positive real. When the complex number
is plotted in the complex plane, the phase angle is the direction of the
number relative to the origin. The phase angle $\alpha$ is often called the argu-
ment, but so is about everything else in mathematics, so that is not very
helpful.
In complex time-dependent waves of the form $e^{i(\omega t - \varphi)}$, and their real equiva-
lent $\cos(\omega t - \varphi)$, the phase angle $\varphi$ gives the angular argument of the wave
at time zero.
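For example, $c = 1 + i$ has magnitude $|c| = \sqrt2$ and phase angle $\alpha = \pi/4$,
since $1 + i = \sqrt2\,e^{i\pi/4}$.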
photon Unit of electromagnetic radiation (which includes light, x-rays, mi-
crowaves, etcetera). A photon has an energy $\hbar\omega$, where $\omega$ is its angular
frequency, and a wavelength $2\pi c/\omega$ where $c$ is the speed of light.
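For example, green light with a wavelength of 500 nm has an angular frequency
$\omega = 2\pi c/\lambda \approx 3.8\times10^{15}$/s, so each of its photons carries an energy
$\hbar\omega \approx 4.0\times10^{-19}$ J, or about 2.5 eV.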
px Linear momentum in the x-direction. (In the one-dimensional cases at the
end of the unsteady evolution chapter, the x subscript is omitted.) Compo-
nents in the y- and z-directions are $p_y$ and $p_z$. Classical Newtonian physics
has $p_x = mu$ where $m$ is the mass and $u$ the velocity in the x-direction. In
quantum mechanics, the possible values of $p_x$ are the eigenvalues of the op-
erator $\hat p_x$, which equals $\hbar\partial/i\partial x$. (But which becomes canonical momentum
in a magnetic field.)
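For example, the wave $e^{ikx}$, with $k$ a real constant, is an eigenfunction of
$\hat p_x$: applying $\hbar\partial/i\partial x$ to it gives back $\hbar k\,e^{ikx}$, so the corresponding
measurable value of linear momentum is $\hbar k$.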

Q May indicate:

• Number of energy eigenfunctions of a system of particles.


• Anti-Hermitian part of an annihilation operator divided by i.

q May indicate:

• Charge.
• The number of an energy eigenfunction of a system of particles.
• Generic index.

R May indicate:

• Some function of r to be determined.


• Some function of (x, y, z) to be determined.
• Rotation operator.
• Ideal gas constant.
• Transition rate.
• Rnl is a hydrogen radial wave function.
• $R_u$ = 8.314472 kJ/kmol K is the universal gas constant, the equiva-
lent of Boltzmann’s constant for a kmol instead of a single atom or
molecule.

ℜ The real part of a complex number. If $c = c_r + ic_i$ with $c_r$ and $c_i$ real numbers,
then $\Re(c) = c_r$. Note that $c + c^* = 2\Re(c)$.

relativity The special theory of relativity accounts for the experimental obser-
vation that the speed of light c is the same in all local coordinate systems.
It necessarily drops the basic concepts of absolute time and length that
were cornerstones of Newtonian physics.
Albert Einstein should be credited with the boldness to squarely face up
to the unavoidable where others wavered. However, he should also be
credited for the boldness of swiping the basic ideas from Lorentz and
Poincaré without giving them proper, or any, credit. The evidence is very
strong he was aware of both works, and his various arguments are almost
carbon copies of those of Poincaré, but in his paper it looks like it all came
from Einstein, with the existence of the earlier works not mentioned. (Note
that the general theory of relativity, which is of no interest to this book, is
almost surely properly credited to Einstein. But he was a lot less hungry
then.)

Relativity implies that a length seen by an observer moving at a speed $v$ is
shorter than the one seen by a stationary observer by a factor $\sqrt{1-(v/c)^2}$,
assuming the length is in the direction of motion. This is called Lorentz-
Fitzgerald contraction. It makes galactic travel somewhat more conceiv-
able because the size of the galaxy will contract for an astronaut in a
rocket ship moving close to the speed of light. Relativity also implies that
the time that an event takes seems to be slower by a factor $1/\sqrt{1-(v/c)^2}$
if the event is seen by an observer in motion compared to the location
where the event occurs. That is called time dilation. Some high-energy
particles generated in space move so fast that they reach the surface of
the earth even though this takes much more time than the particles would
last at rest in a laboratory. The decay time increases because of the motion
of the particles. (Of course, as far as the particles themselves see it, the
distance to travel is a lot shorter than it seems to be to earth. For them,
it is a matter of length contraction.)
The following formulae give the relativistic mass, momentum, and kinetic
energy of a particle in motion:

$$m = \frac{m_0}{\sqrt{1-(v/c)^2}} \qquad p = mv \qquad T = mc^2 - m_0c^2$$

where $m_0$ is the rest mass of the particle, i.e. the mass as measured by
an observer to whom the particle seems at rest. The formula for kinetic
energy reflects the fact that even if a particle is at rest, it still has an
amount of “built-in” energy equal to $m_0c^2$ left. The total energy of a
particle in empty space, being kinetic plus rest mass energy, is given by

$$E = mc^2 = \sqrt{(m_0c^2)^2 + c^2p^2}$$

as can be verified by substituting in the expression for the momentum, in
terms of the rest mass, and then taking both terms inside the square root
under a common denominator. For small linear momentum $p$, the kinetic
energy $T$ can be approximated as $\frac12 m_0v^2$.
Relativity seemed a quite dramatic departure from Newtonian physics when
it was developed. Then quantum mechanics started to emerge. . .
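As a worked example, consider a particle moving at $v = 0.6c$, so that
$\sqrt{1-(v/c)^2} = 0.8$. Lengths along the direction of motion contract to 80%
of their rest values, moving clocks run slow by a factor $1/0.8 = 1.25$, and the
relativistic mass is $m = m_0/0.8 = 1.25\,m_0$. The total energy is then
$E = mc^2 = 1.25\,m_0c^2$ and the momentum $p = mv = 0.75\,m_0c$; as a check
on the final formula above, $(m_0c^2)^2 + c^2p^2 = (1 + 0.5625)(m_0c^2)^2 =
(1.25\,m_0c^2)^2$ indeed.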

r May indicate:

• The radial distance from the chosen origin of the coordinate system.
• ri typically indicates the i-th Cartesian component of the radius vec-
tor ~r.
• Some ratio.

$\vec r$ The position vector. In Cartesian coordinates $(x, y, z)$ or $x\hat\imath + y\hat\jmath + z\hat k$. In spher-
ical coordinates $r\hat\imath_r$. Its three Cartesian components may be indicated by
$r_1, r_2, r_3$ or by $x, y, z$ or by $x_1, x_2, x_3$.
S May indicate:
• Number of states per unit volume.
• Number of states at a given energy level.
• Spin angular momentum (as an alternative to using L for generic
angular momentum.)
• Entropy.

s Energy state with orbital azimuthal quantum number $l = 0$. Spherically
symmetric.
s May indicate:
• Spin value of a particle. Equals 1/2 for electrons, protons, and neu-
trons, is also half an odd natural number for other fermions, and is a
nonnegative integer for bosons. It is the azimuthal quantum number
l due to spin.
• Specific entropy.

scalar A quantity that is not a vector, but just a single number.
sin The sine function, a periodic function oscillating between 1 and $-1$ as shown
in [15, pp. 40-]. Good to remember: $\cos^2\alpha + \sin^2\alpha = 1$.
Stokes’ Theorem This theorem, first derived by Kelvin and first published
by someone else I cannot recall, says that for any reasonably smoothly
varying vector $\vec v$,

$$\int_A (\nabla\times\vec v)\cdot d\vec A = \oint \vec v\cdot d\vec r$$

where the first integral is over any smooth surface area $A$ and the second
integral is over the edge of that surface. How did Stokes get his name on
it? He tortured his students with it, that’s why!
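As a quick check, take $\vec v = -y\hat\imath + x\hat\jmath$, for which $\nabla\times\vec v = 2\hat k$. Over
the unit disk in the $x, y$-plane the left hand side is then $2\pi$. And going
around the unit circle, $\vec v$ is a unit vector pointing along the edge, so the
right hand side is the circumference $2\pi$ too.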
symmetry Symmetries are operations under which an object does not change.
For example, a human face is almost, but not completely, mirror sym-
metric: it looks almost the same in a mirror as when seen directly. The
electrical field of a single point charge is spherically symmetric; it looks
the same from whatever angle you look at it, just like a sphere does. A
simple smooth glass (like a glass of water) is cylindrically symmetric; it
looks the same whatever way you rotate it around its vertical axis.

T May indicate:
• Kinetic energy. A hat indicates the associated operator. The opera-
tor is given by the Laplacian times $-\hbar^2/2m$.
• Absolute temperature. The absolute temperature in kelvin equals
the temperature in centigrade plus 273.15. When the absolute tem-
perature is zero (i.e. at $-273.15\,^\circ$C), nature is in the state of lowest
possible energy.
• Tesla. The unit of magnetic field strength, kg/C-s.

t The time.
temperature A measure of the heat motion of the particles making up macro-
scopic objects. At absolute zero temperature, the particles are in the
“ground state” of lowest possible energy.
triple product A product of three vectors. There are two different versions:
• The scalar triple product $\vec a\cdot(\vec b\times\vec c)$. In index notation,

$$\vec a\cdot(\vec b\times\vec c) = \sum_i a_i \left(b_{\bar\imath} c_{\bar{\bar\imath}} - b_{\bar{\bar\imath}} c_{\bar\imath}\right)$$

where $\bar\imath$ is the index following $i$ in the sequence 123123. . . , and $\bar{\bar\imath}$ the
one preceding it. This triple product equals the determinant $|\vec a\,\vec b\,\vec c|$
formed with the three vectors. Geometrically, it is plus or minus the
volume of the parallelepiped that has vectors $\vec a$, $\vec b$, and $\vec c$ as edges.
Either way, as long as the vectors are normal vectors and not opera-
tors,

$$\vec a\cdot(\vec b\times\vec c) = \vec b\cdot(\vec c\times\vec a) = \vec c\cdot(\vec a\times\vec b)$$

and you can change the two sides of the dot product without changing
the triple product, and/or you can change the sides in the vectorial
product with a change of sign. If any of the vectors is an operator,
use the index notation expression to work it out.
• The vectorial triple product $\vec a\times(\vec b\times\vec c)$. In index notation, component
number $i$ of this triple product is

$$a_{\bar\imath}\left(b_i c_{\bar\imath} - b_{\bar\imath} c_i\right) - a_{\bar{\bar\imath}}\left(b_{\bar{\bar\imath}} c_i - b_i c_{\bar{\bar\imath}}\right)$$

which may be rewritten as

$$a_i b_i c_i + a_{\bar\imath} b_i c_{\bar\imath} + a_{\bar{\bar\imath}} b_i c_{\bar{\bar\imath}} - a_i b_i c_i - a_{\bar\imath} b_{\bar\imath} c_i - a_{\bar{\bar\imath}} b_{\bar{\bar\imath}} c_i$$

In particular, as long as the vectors are normal ones,

$$\vec a\times(\vec b\times\vec c) = (\vec a\cdot\vec c)\,\vec b - (\vec a\cdot\vec b)\,\vec c$$
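As a simple check on both formulae, take $\vec a = \hat\imath$, $\vec b = \hat\jmath$, and $\vec c = \hat k$.
The scalar triple product is $\hat\imath\cdot(\hat\jmath\times\hat k) = \hat\imath\cdot\hat\imath = 1$, the volume of
the unit cube. The vectorial triple product is $\hat\imath\times(\hat\jmath\times\hat k) =
\hat\imath\times\hat\imath = 0$, which agrees with $(\vec a\cdot\vec c)\vec b - (\vec a\cdot\vec b)\vec c = 0 - 0 = 0$.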

u May indicate:

• The first velocity component in a Cartesian coordinate system.


• A complex coordinate in the derivation of spherical harmonics.
• An integration variable.

V May indicate:

• The potential energy. V is used interchangeably for the numerical
values of the potential energy and for the operator that corresponds
to multiplying by $V$. In other words, $\widehat V$ is simply written as $V$.
• Volume.

v May indicate:

• The second velocity component in a Cartesian coordinate system.


• A complex coordinate in the derivation of spherical harmonics.
• As $v^{ee}$, a single electron pair potential.

~v May indicate:

• Velocity vector.
• Generic vector.
• Summation index of a lattice potential.

vector A list of numbers. A vector ~v in index notation is a set of numbers


{vi } indexed by an index i. In normal three-dimensional Cartesian space,
i takes the values 1, 2, and 3, making the vector a list of three numbers,
v1 , v2 , and v3 . These numbers are called the three components of ~v . The
list of numbers can be visualized as a column, and is then called a ket
vector, or as a row, in which case it is called a bra vector. This convention
indicates how multiplication should be conducted with them. A bra times
a ket produces a single number, the dot product or inner product of the
vectors:

$$(1,\ 3,\ 5)\begin{pmatrix} 7 \\ 11 \\ 13 \end{pmatrix} = 1\cdot7 + 3\cdot11 + 5\cdot13 = 105$$
To turn a ket into a bra for purposes of taking inner products, write the
complex conjugates of its components as a row.
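For example, the bra corresponding to the ket with components $1 + i$ and 2
is the row $(1 - i,\ 2)$. The inner product of that ket with itself is then
$(1 - i)(1 + i) + 2\cdot2 = 6$, the square of its length.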

vectorial product A vectorial product, or cross product, is a product of vec-
tors that produces another vector. If

$$\vec c = \vec a\times\vec b,$$

it means in index notation that the $i$-th component of vector $\vec c$ is

$$c_i = a_{\bar\imath} b_{\bar{\bar\imath}} - a_{\bar{\bar\imath}} b_{\bar\imath}$$

where $\bar\imath$ is the index following $i$ in the sequence 123123. . . , and $\bar{\bar\imath}$ the one
preceding it. For example, $c_1$ will equal $a_2 b_3 - a_3 b_2$.
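As a numerical example, for $\vec a = (1, 2, 3)$ and $\vec b = (4, 5, 6)$ the cross product
works out to $\vec a\times\vec b = (2\cdot6 - 3\cdot5,\ 3\cdot4 - 1\cdot6,\ 1\cdot5 - 2\cdot4) = (-3, 6, -3)$.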

w May indicate:

• The third velocity component in a Cartesian coordinate system.


• Weight factor.

$\vec w$ Generic vector.

X Used in this book to indicate a function of x to be determined.

x May indicate:

• First coordinate in a Cartesian coordinate system.


• A generic argument of a function.
• An unknown value.

Y Used in this book to indicate a function of y to be determined.

$Y_l^m$ Spherical harmonic. Eigenfunction of both angular momentum in the z-
direction and of total square angular momentum.

y May indicate:

• Second coordinate in a Cartesian coordinate system.


• A generic argument of a function.

Z May indicate:

• Number of particles.
• Atomic number (number of protons in the nucleus).
• Partition function.
• Used in this book to indicate a function of z to be determined.

z May indicate:

• Third coordinate in a Cartesian coordinate system.


• A generic argument of a function.

Index

F , 777 ν, 768
·, 761 ξ, 768
×, 761 π, 768
!, 761 ρ, 768
|, 762 σ, 768
| . . .i, 762 τ , 769
h. . . |, 763 21 cm line
↑, 763 derivation, 478
↓, 763 intro, 469
∑, 8 Φ, 769
∫, 763 φ, 769
, 763 ϕ, 769
→, 763 χ, 769
~, 763 Ψ, 769
ˆ, 763 ψ, 769

, 763 ω, 769
∇, 764

, 764 A, 770
<, 765 Å, 770
h. . .i, 765 a, 770
>, 765 a0 , 770
[. . .], 765 absolute value, 3
≡, 765 absolute zero
∼, 765 nonzero energy, 52
∝, 765 requires ground state, 339
α, 765 absorption
β, 765 incoherent radiation, 191
γ, 765 single weak wave, 189
∆, 766 absorptivity, 392
δ, 766 acceleration
∂, 766 in quantum mechanics, 175
ǫ, 766 action, 545
ǫ0 , 767 relativistic, see special relativity
ε, 767 activation energy, 119
η, 767 adiabatic
Θ, 767 disambiguation, 771
θ, 767 quantum mechanics, 178
ϑ, 767 thermodynamics, 370
κ, 767 adiabatic surfaces, 237
λ, 767 adiabatic theorem
µ, 767 derivation and implications, 604

intro, 178 number of terms, 142
adjoint, 771 using groupings, 138
matrices, 784 using occupation numbers, 484
Aharonov-Bohm effect, 416 using Slater determinants, 139
Airy functions atomic number, 147
application, 622 atoms
connection formulae, 631 eigenfunctions, 148
graphs, 630 eigenvalues, 148
software, 628 ground state, 150
alpha decay, 219 Hamiltonian, 147
Gamow/Gurney and Condon theory, 221 average
NUBASE data, 219 versus expectation value, 85
alpha particle, 219 Avogadro’s number, 779
ammonia molecule, 120 azimuthal quantum number, 68
angle, 771
angular frequency, 203 B, 771
angular momentum, 63 b, 772
addition, 405 Balmer transitions, 77
Clebsch-Gordan coefficients, 406 band gap, 267
intro, 186 band structure
advanced treatment, 395 crossing bands, 286
component, 64 intro, 266
eigenfunctions, 65 nearly-free electrons, 309
eigenvalues, 65 widely spaced atoms, 270
conservation, 185 baryon, 124
definition, 63 basis, 772
fundamental commutation relations crystal, see lattice, basis
as an axiom, 396 spin states, 133
intro, 97 vectors or functions, 14
ladder operators, 397 bcc, see lattice
ladders, 398 Bell’s theorem, 513
normalization factors, 402 cheat, 517
operator benzene molecular ring, 119
Cartesian, 63 Berry’s phase, 608
possible values, 400 Bessel functions
spin, 123 spherical, 638
square angular momentum, 66 beta decay
eigenfunctions, 67 intro, 218
eigenvalues, 68 binding energy
symmetry and conservation, 518 definition, 105
intro, 188 hydrogen molecular ion, 105
triangle inequality hydrogen molecule, 118
intro, 187 lithium hydride, 122
uncertainty, 69 Biot-Savart law, 438
anomalous magnetic moment, 443 derivation, 729
anti-bonding, 287 blackbody radiation, 389
anticommutator, 489 intro, 320
antisymmetrization requirement, 135 blackbody spectrum, 390
graphical depiction, 343 Bloch function
indistinguishable particles, 343 free electron gas, 301

nearly-free electrons, 309 symmetrization requirement, 135
one-dimensional lattice, 276 bound states
three-dimensional lattice, 284 hydrogen
Bloch’s theorem, 276 energies, 76
body-centered cubic, see lattice boundary conditions
Bohm acceptable singularity, 574
EPR experiment, 514 hydrogen atom, 588
Bohr energies, 76 across delta function potential, 626
relativistic corrections, 467 at infinity
Bohr magneton, 442 harmonic oscillator, 577
Bohr radius, 79 hydrogen atom, 587
Boltzmann constant, 781 impenetrable wall, 31
Boltzmann factor, 352 radiation, 621
bond accelerating potential, 622
covalent, 159 three-dimensional, 635
hydrogen, 161 unbounded potential, 623
ionic, 165 bra, 9, 763
pi, 160 Bragg diffraction
polar, 161 electrons, 336
sigma, 159 Bragg planes
Van der Waals, 259 Brillouin fragment boundaries, 285
bond length energy singularities, 314
definition, 105 one-dimensional (Bragg points), 278
hydrogen molecular ion, 107 X-ray diffraction, 336
hydrogen molecule, 118 Bragg’s law, 330
Born Brillouin zone
approximation, 641 one-dimensional, 277
series, 643 three-dimensional, 285
Born statistical interpretation, 20

Born-Oppenheimer approximation C, 772
basic idea, 233 C, 772
derivation, 231 c, 772
diagonal correction, 656 canonical commutation relation, 94
hydrogen molecular ion, 99 canonical Hartree-Fock equations, 248
hydrogen molecule, 111 canonical momentum, see special relativity
include nuclear motion, 235 relation to Heisenberg picture, 177
relation to adiabatic theorem, 178 with a magnetic field, 416
spin degeneracy, 651 canonical probability distribution, 351
vibronic coupling terms, 654 Carnot cycle, 362
Bose-Einstein condensation cat, Schrödinger’s, 512
derivation, 384 Cauchy-Schwartz inequality, 772
intro, 320 charge
superfluidity, 320, 681 electrostatics, 425
Bose-Einstein distribution, 319 chemical bonds, 159
blackbody radiation, 320, 389 covalent pi bonds, 160
canonical probability distribution, 351 covalent sigma bonds, 159
for given energy, 349 hybridization, 162
identify chemical potential, 382 ionic bonds, 165
bosons, 124 polar covalent bonds, 161
statistics, 319 promotion, 162

spn hybridization, 162 lattice, see lattice
chemical equilibrium one-dimensional
constant pressure, 380 primitive translation vector, 275
constant volume, 380 three-dimensional
chemical potential, 318, 378 primitive vectors, 282
and distributions, 382 curl, 764
microscopic, 382
classical, 773 d, 773
Clausius-Clapeyron equation, 379 D, 773
Clebsch-Gordan coefficients, 405 d, 773
~ 773
D,
coefficient of performance, 364
~
d, 773
coefficients of eigenfunctions
evaluating, 59 Darwin term, 471
give probabilities, 27 Debye model, 392
time variation, 169 Debye temperature, 322, 392
collapse of the wave function, 26 degeneracy, 57
collision-dominated regime, 191 degeneracy pressure, 289, 294
delta function, 195
collisionless regime, 190
three-dimensional, 196
commutation relation
density
canonical, 94
mass, 356
commutator, 91
molar, 356
definition, 93
particle, 356
commutator eigenvalue problems, 398
density of states, 295
commuting operators, 92
derivative, 774
common eigenfunctions, 92
determinant, 774
complete set, 14
diamagnetic contribution, 444
complex conjugate, 4
diamond
complex numbers, 3
band gap, 286
component waves, 200
differential cross-section, 636
components of a vector, 6
dipole, 429
conduction band, 267
dipole strength
conduction of electricity
molecules, 260
intro, 266
Dirac delta function, 195
confined electrons, 289 Dirac equation, 412
confinement, 43, 295 Dirac notation, 16
density of states, 295 dispersion relation, 203
connection formulae, 630, 633 div, 764
conservation laws, 518 divergence, 764
conventional cell, 283 divergence theorem, 778
Copenhagen Interpretation, 26 dot product, 8
correlation energy, 255 doublet states, 134
cos, 773 dynamic phase, 607
Coulomb integrals, 247
Coulomb potential, 70 E, 775
covalent bond e, 776
hydrogen molecular ion, 98 E = mc2 , 776
covalent solids, 286 effective mass
creationists, 702 hydrogen atom electron, 71
cross product, 792 Ehrenfest’s theorem, 175
crystal eiax , 776

eigenfunction, 12 Hamiltonian, 611
eigenfunctions intro, 185
angular momentum component, 65 selection rules, see selection rules
atoms, 148 electricity
free electron gas, 292 conduction
harmonic oscillator, 53 intro, 266
hydrogen atom, 79 electromagnetic field
linear momentum, 197 energy, 496
position, 195 Hamiltonian, 414
square angular momentum, 67 Maxwell’s equations, 417
eigenvalue, 12 quantized, 498
eigenvalue problems electromagnetic potentials
commutator type, 398 gauge transformation, 568
ladder operators, 398 electromagnetic waves
eigenvalues quantized, 502
angular momentum component, 65 electron
atoms, 148 in magnetic field, 442
free electron gas, 292 electron affinity, 262
harmonic oscillator, 51 Hartree-Fock, 252
hydrogen atom, 76 electron capture
linear momentum, 197 intro, 219
position, 195 electronegativity, 155, 262
square angular momentum, 68 atoms, 152
eigenvector, 12, 776 emission of radiation, 180
Einstein emissivity, 392
dice, 27 energy conservation, 170
mass-energy relation, see special relativ- energy spectrum
ity banded, 266
summation convention, 560 free electron gas, 293
swiped special relativity, 548 harmonic oscillator, 51
Einstein A and B coefficients, 192 hydrogen atom, 76
Einstein’s derivation, 193 solids, 266
Einstein A coefficients energy-time uncertainty principle, 173
quantum derivation, 502 enthalpy, 358
Einstein B coefficients enthalpy of vaporization, 380
quantum derivation, 614 entropy, 367
Einstein Podolski Rosen, 514 descriptive, 360
electric charge EPR, 514
electron and proton, 70 equipartition theorem, 322
electric dipole transitions Euler formula, 4
intro, 184 eV, 777
selection rules, see selection rules Everett, III, 525
electric potential every possible combination, 109, 126
classical derivation exchange integrals, 247
emph, 417, 722 exchange operator, 118
quantum derivation exchange terms, see twilight terms
emph, 415 excited determinants, 256
relativistic derivation exclusion-principle repulsion, 157
emph, 567 expectation value, 84
electric quadrupole transitions definition, 87

simplified expression, 88 energy spectrum, 293
versus average, 85 ground state, 293
exponential function, 777 Hamiltonian, 290
exponential of an operator, 175 function, 6, 7, 778
extended zone scheme, 305 functional, 778
extensive variable, 356 fundamental commutation relations
as an axiom, 396
f , 777 orbital angular momentum, 97
face centered cubic, see lattice spin
factorial, 761 introduction, 129
fcc, see lattice
Fermi surface G, 778
confined boundary conditions, 293 g, 778
periodic boundary conditions, 305 g-factor, 443
periodic zone scheme, 308 Galilean transformation, 553
reduced zone scheme, 308 gamma decay
Fermi temperature, 386 intro, 219
gamma function, 761
Fermi-Dirac distribution, 318
Gamow theory, 221
canonical probability distribution, 351
gauge transformation
for given energy, 349
electromagnetic potentials, 568
identify chemical potential, 382
Gauss’ theorem, 778
fermions, 124
generalized coordinates, 543
antisymmetrization requirement, 135
geometric phase, 607
statistics, 318
Gibbs free energy, 375
Feynman diagrams, 644
microscopic, 382
field operators, 505
grad, 764
Fine structure, 468
gradient, 764
fine structure
grain, 270
hydrogen atom, 467
grain boundaries, 270
fine structure constant, 468
Green’s function
first law of thermodynamics, 339, 358
Laplacian, 436
fission
ground state
intro, 218 atoms, 150
flopping frequency, 452 free electron gas, 293
Floquet theory, 275 harmonic oscillator, 53
Fock operator, 249 hydrogen atom, 77, 79
Fock state, 485 hydrogen molecular ion, 106
forbidden transitions, 184 hydrogen molecule, 118, 132, 136
force nonzero energy, 52
in quantum mechanics, 175 group property
four-vectors, see special relativity coordinate system rotations, 716
Fourier analysis, 276 Lorentz transform, 560
Fourier coefficients, 569 group theory, 522
Fourier integral, 570 group velocity, 204
Fourier series, 569 gyromagnetic ratio, 442
Fourier transform, 204, 570
free electron gas, 289 H, 778
Bloch function, 301 h, 779
eigenfunctions, 292 half-life
eigenvalues, 292 atomic nuclei, 218

excited atoms, 194 Hermitian matrices, 784
Hamiltonian, 25 Hermitian operators, 13
and physical symmetry, 519 hidden variables, 28, 515
atoms, 147 hidden versus nonexisting, 69
classical, 547 hieroglyph, 329, 477
electromagnetic field, 414 Hund’s rules, 328
free electron gas, 290 hybridization, 162
gives time variation, 169 hydrogen atom, 70
harmonic oscillator, 47 eigenfunctions, 79
partial, 49 eigenvalues, 76
hydrogen atom, 70 energy spectrum, 76
hydrogen molecular ion, 99 ground state, 77, 79
hydrogen molecule, 111 Hamiltonian, 70
in matrix form, 145 relativistic corrections, 467
numbering of eigenfunctions, 25 hydrogen bonds, 161, 261
one-dimensional free space, 199 hydrogen molecular ion, 98
relativistic, non quantum, 568 bond length, 107
Hamiltonian dynamics experimental binding energy, 106
relation to Heisenberg picture, 177 ground state, 106
Hamiltonian perturbation coefficients, 456 Hamiltonian, 99
Hankel functions shared states, 101
spherical, 638 hydrogen molecule, 111
harmonic oscillator, 46 binding energy, 118
classical frequency, 47 bond length, 118
eigenfunctions, 53 ground state, 118, 132, 136
eigenvalues, 51 Hamiltonian, 111
energy spectrum, 51 hyperfine splitting, 467
ground state, 53
Hamiltonian, 47 I, 779
partial Hamiltonian, 49 i, 3, 779
particle motion, 207 inverse, 4
Hartree product, 138, 239 i index, 6
Hartree-Fock, 238 ℑ, 779
Coulomb integrals, 247 i, 779
exchange integrals, 247 I, 779
restricted ideal gas
closed shell, 242 quantum derivation, 387
open shell, 242 thermodynamic properties, 377
unrestricted, 242 ideal gas law, 388
h̄, 779 ideal magnetic dipole, 431
heat, 359 identical particles, 134
Heisenberg iff, 10, 780
uncertainty principle, 22 imaginary part, 3
uncertainty relationship, 94 impact parameter, 637
helium ionization energy, 457 incompressibility
Hellmann-Feynman theorem, 457 emph, 294
Helmholtz equation, 641 index notation, 779
Green’s function solution, 641 indistinguishable particles, 343
Helmholtz free energy, 375 inner product
microscopic, 381 multiple variables, 17

inner product of functions, 9 L, 781
inner product of vectors, 8 L, 781
insulated system, 370 l, 781
integer, 780 ℓ, 782
intelligent designers, 702 ladder operators
intensive variable, 356 angular momentum, 397
internal energy, 357 Lagrangian
interpretation relativistic, 567
interpretations, 26 simplest case, 542
many worlds, 525 Lagrangian mechanics, 541
orthodox, 25 Lagrangian multipliers
relative state, 524 derivations, 648
statistical, 25 for variational statements, 230
inverse, 780 Lamb shift, 467, 477
inverse beta decay Landé g-factor, 476
intro, 219 Laplace equation, 723
inversion operator, 188, see parity Laplacian, 764
ionic bonds, 165 Larmor frequency
ionic molecules, 261 definition, 448
ionic solids, 261 Larmor precession, 450
ionization, 77 laser, 180
ionization energy, 262 latent heat of vaporization, 380
atoms, 152 lattice, 264
Hartree-Fock, 251 basis, 264
helium, 457 diamond, 287
hydrogen atom, 77 lithium, 269
iso, 780 NaCl, 264
isolated, 780 bcc, 269
isolated system, 370 diamond, 287
isotope, 218 fcc, 265
lithium, 269
J, 780 NaCl, 264
j, 780 one-dimensional, 270
primitive translation vector, 275
K, 781 reciprocal
K, 781 lithium, 285
k, 781 NaCl, 285
kB , 781 one-dimensional, 277
ket, 9, 762 primitive vectors, 285
ket notation three-dimensional, 285
spherical harmonics, 67 three-dimensional
spin states, 125 primitive vectors, 282
kinetic energy unit cell, 265
operator, 24 bcc, 269
kinetic energy operator fcc, 265
in spherical coordinates, 71 law of Dulong and Petit, 322
Klein-Gordon equation, 412 Lebesgue integration, 618
kmol, 781 length of a vector, 9
Koopman’s theorem, 251 Lennard-Jones potential, 259
Kramers relation, 751 lifetime

atomic nuclei, 218 need for quantum field theory, 480
excited atom, 194 nuclear physics, 223
light waves matching regions, 631
classical, 424 mathematicians, 560, 669
light-cone, see special relativity matrix, 11, 782
lim, 782 maximum principle
linear combination, 782 Laplace equation, 723
linear momentum Maxwell relations, 376
classical, 22 Maxwell’s equations, 417
eigenfunctions, 197 Maxwell-Boltzmann distribution, 321
eigenvalues, 197 canonical probability distribution, 351
operator, 23 for given energy, 349
symmetry and conservation, 518 mean value property
localization Laplace equation, 723
absence of, 201 measurable values, 25
London forces, 259 measurement, 26
Lorentz factor, 552 meson, 119, 124
Lorentz transformation, see special relativity metals, 268
Lorentz-Fitzgerald, see special relativity method of stationary phase, 619
luminosity, 636 metric prefixes, 784
Lyman transitions, 77 mole, 356
molecular mass, 357
M, 782 molecular solids, 259
M , 782 molecules
m, 782 ionic, 261
me , 782 momentum space wave function, 198
mn , 782 moving mass
mp , 782 seespecial relativity, 549
Madelung constant, 264 multipole expansion, 435
magnetic dipole moment, 442
magnetic dipole transitions N, 784
Hamiltonian, 611 N , 784
intro, 185 n, 784
selection rules, see selection rules nabla, 764
magnetic quantum number, 65 natural, 785
magnetic spin anomaly, 443 nearly free electron model, 309
magnetic vector potential Neumann functions
classical derivation, 727 spherical, 638
in the Dirac equation, 731 neutron stars, 219
quantum derivation, 415 Newton’s second law
relativistic derivation, 567 in quantum mechanics, 175
magnitude, 3 Newtonian analogy, 24
mass number, 217 Maxwellian analogy, 498
mass-energy equation Newtonian mechanics, 19
credit, 776 in quantum mechanics, 174
mass-energy relation, see special relativity noble gas, 151
Dirac equation noncanonical Hartree-Fock equations, 668
emph, 413 nonexisting versus hidden, 69
fine-structure nonholonomic, 608
emph, 469 norm of a function, 9

normal, 785 common phrasing, 153
normalized, 9 free electron gas, 289
normalized wave functions, 21 Pauli repulsion, 157
nuclear magnetic resonance, 446 Pauli spin matrices, 409
nuclear magneton, 444 generalized, 411
periodic zone scheme, 308
observable values, 25 permanents, 140
occupation numbers permittivity of space, 70
single-state, 484 perpendicular bisector, 786
one-dimensional free space perturbation theory
Hamiltonian, 199 helium ionization energy, 457
operator second order, 456
exponential of an operator, 175 time dependent, 190
operators, 11 time-independent, 455
angular momentum component, 64 weak lattice potential, 310
Hamiltonian, 25 phase angle, 786
kinetic energy, 24 phase equilibrium, 379
in spherical coordinates, 71
phonons, 322
linear momentum, 23
photon, 77, 786
position, 23
energy, 77
potential energy, 24
spin value, 124
quantum mechanics, 23
photon packet, 500
square angular momentum, 66
physical symmetry
total energy, 24
commutes with Hamiltonian, 519
orbitals, 239
physicists, 27, 28, 178, 183, 237, 256, 275, 285,
orthodox interpretation, 25
328, 424, 425, 443, 485, 498, 557–
orthogonal, 10
559, 582, 636, 637, 639, 675
orthonormal, 10
pi bonds, 160
P , 785 Plancherel’s theorem, 618
p states, 80 Planck’s blackbody spectrum, 390
p, 785 Planck’s constant, 24
p-state, 786 Planck-Einstein relation, 77
parity, 188 point charge
as a symmetry, 522 static, 425
conservation in atom transitions, 189 pointer states, 81
inversion operator, 188 Poisson bracket, 177
multiplies instead of adds, 188 Poisson equation, 436
orbital angular momentum, 188 polar bonds, 161
photons, 189 poly-crystalline, 270
spherical harmonics population inversion, 181
derivation, 583 position
symmetry and conservation, 518 eigenfunctions, 195
intro, 188 eigenvalues, 195
Parseval’s relation, 618 operator, 23
partial wave analysis, 637 possible values, 25
partition function, 352 potential
Paschen transitions, 77 existence, 722
Pasternack relation, 751 potential energy
Pauli exclusion principle, 141 operator, 24
atoms, 152 potential energy surfaces, 237

Poynting vector, 496 radiation
prefixes emission and absorption, 178
YZEPTGMkmunpfazy, 784 radioactivity
pressure, 357 intro, 217
primitive cell, 282 random number generator, 27
primitive translation vector real part, 3
one-dimensional, 275 reciprocal lattice, see lattice, reciprocal
reciprocal lattice, 285 reduced zone scheme, 307
three-dimensional, 282 reflection coefficient, 216, 647
principal quantum number, 73 relative state formulation, 528
principle of relativity, 550 relative state interpretation, 524
probabilities relativistic corrections
evaluating, 59 hydrogen atom, 467
from coefficients, 27 Relativistic effects
probability current, 645 Dirac equation, 412
probability density, 113 relativistic mass
probability to find the particle, 20 seespecial relativity, 549
promotion, 162 relativity, see special relativity, 787
proper distance, see special relativity resonance factor, 451
proper time, see special relativity rest mass
pure substance, 337 seespecial relativity, 549
px , 786 restricted Hartree-Fock, 242
Pythagorean theorem, 555 reversibility, 362
RHF, 242
Q, 786
rot, 764
q, 787
quantum confinement, 43, 295
S, 789
quantum dot, 46
s state, 789
quantum electrodynamics
s states, 80
electron g factor, 443
saturated, 379
Feynman’s book, 481
scalar, 789
quantum mechanics
scattering, 213
acceleration, 175
force, 175 one-dimensional coefficients, 216
Newton’s second law, 175 three-dimensional coefficients, 634
Newtonian mechanics, 174 scattering amplitude, 635
velocity, 175 Schrödinger equation, 169
wave packet velocity, 202 failure?, 522
quantum well, 45 integral form, 642
quantum wire, 45 Schrödinger’s cat, 512
quark second law of thermodynamics, 360
spin, 124 second quantization, 498
quarks selection rules
Dirac equation, 413 angular momentum conservation, 185
proton and neutron, 444 derivation, 609
electric dipole transitions, 184
R, 787 electric quadrupole transitions, 185
ℜ, 787 magnetic dipole transitions, 185
r, 788 parity conservation, 188
~r, 789 self-adjoint, 771
Rabi flopping frequency, 452 self-consistent field method, 250

semi-conductors derivation, 564
band gap, 286 Lagrangian derivation, 568
conduction, 267 mechanics
lattice, see lattice, diamond intro, 562
manipulation, 267 Lagrangian, 565
separation of variables, 48 momentum four-vector, 563
for atoms, 148 moving mass, 549
for free electron gas, 290 derivation, 563
linear momentum, 197 Lagrangian derivation, 566
position, 195 proper distance, 556
shielding approximation, 148 as dot product, 558
SI prefixes, 784 proper time, 555
sigma bonds, 159 relativistic mass, 549
simple cubic lattice, 303 rest mass energy, 550
sin, 789 derivation, 564
singlet state, 133 space-like, 556
derivation, 403 space-time, 558
skew-Hermitian, 771 space-time interval, 556
Slater determinants, 139 causality, 557
small perturbation theory, 310 superluminal interaction, 556
solid angle, 771 time dilation
solids, 259 derivation, 554
band structure time-dilation, 551
intro, 266 time-like, 556
covalent, 286 velocity transformation, 553
energy spectrum, 266 derivation, 554
ionic, 261 warp factor, 556
molecular, 259 specific heat
spn hybridization, 162 constant pressure, 359
space-like, see special relativity constant volume, 359
space-time, see special relativity values, 322
space-time interval, see special relativity specific volume, 356
special relativity, 548 molar, 356
action, 567 spectral line broadening, 191
canonical momentum, 567 spectrum
causality and proper time, 557 hydrogen, 78
four-vectors, 558 spherical Bessel functions, 638
dot product, 558 spherical coordinates, 64
in terms of momentum, 549 spherical Hankel functions, 638
light-cone, 557 spherical harmonics
Lorentz force, 567 derivation, 712
derivation, 568 derivation from the ODE, 581
Lorentz transformation, 552 derivation using ladders, 712
basic derivation, 553 generic expression, 583
group derivation, 561 intro, 67
group property, 560 Laplace equation derivation, 583
index notation, 559 parity, 583
Lorentz-Fitzgerald contraction, 552 spherical Neumann functions, 638
derivation, 554 spin, 123
mass-energy relation, 549 fundamental commutation relations

introduction, 129 t, 790
value, 124 temperature, 338, 790
x- and y-eigenstates, 410 definition, 350
spin down, 124 Carnot, 367
spin orbitals, 239 definition using entropy, 378
spin states temperatures above absolute zero, 317
ambiguity in sign, 716 thermal de Broglie wavelength, 383
axis rotation, 715 thermal efficiency, 365
spin up, 124 thermal equilibrium, 338
spinor, 126 thermodynamics
spontaneous emission first law, 358
quantum derivation, 502 second law, 360
spontaneous emission rate, 192 third law, 372
standard deviation, 84 third law of thermodynamics, 372
definition, 85 throw the dice, 27
simplified expression, 88 time variation
Stark effect, 464 Hamiltonian, 169
stationary states, 171 time-dilation, see special relativity
statistical interpretation, 25 time-like, see special relativity
statistical mechanics, 317 total energy
statistics operator, 24
bosons, 319 transition probability, 190
fermions, 318 transition rate, 192
Stefan-Boltzmann formula, 390 transitions
Stefan-Boltzmann law, 391 hydrogen atom, 77
Stern-Gerlach apparatus, 445 transmission coefficient, 216, 217, 648
stoichiometric coefficient, 380 transpose of a matrix, 775
Stokes’ theorem, 789 triangle inequality, 186
superfluidity triple alpha process, 219
Bose-Einstein condensation, 320 triple product, 790
superluminal interaction triplet states, 133
Bell’s theorem, 513 derivation, 403
hidden variables, 515 tunneling, 214
many worlds interpretation, 528 Stark effect, 467
quantum, 21 WKB approximation, 216
do not allow communication, 516 turning point, 207
produce paradoxes, 517 turning points
relativistic paradoxes, 556 WKB approximation, 211
symmetrization requirement twilight terms, 121
graphical depiction, 342 exchange terms, 122
identical bosons, 135 Hartree-Fock, 247
identical fermions, % seeantisymmetriza- Lennard-Jones/London force, 679
tion requirement135 lithium hydride, 122
indistinguishable particles, 343 particle exchange rate, 173
using groupings, 140 spontaneous emission, 504
using occupation numbers, 484 two state systems
using permanents, 140 ground state energy, 119
symmetry, 789 time variation, 172
unsteady perturbations, 178
T , 789 two-state systems

atom-photon model, 502 w, 792
w,
~ 792
u, 791 warp factor, see special relativity
UHF, 242 wave function, 19
uncertainty principle multiple particles, 109
angular momentum, 69 multiple particles with spin, 130
energy, 55, 171 with spin, 125
Heisenberg, 22 wave number, 203
position and linear momentum, 22 Floquet, 276
uncertainty relationship Fourier versus Floquet, 277
generalized, 93 free electron gas, 290
Heisenberg, 94 wave number vector
unit cell Bloch function, 285
seelattice, 265 wave packet
unit matrix, 784 accelerated motion, 207
unitary definition, 201
matrix, 669 free space, 199, 206
time advance operator, 176 harmonic oscillator, 207
unitary operators, 771 partial reflection, 213
universal gas constant, 322, 377 physical interpretation, 202
unrestricted Hartree-Fock, 242 reflection, 207
Wigner-Seitz cell, 283
V , 791 WKB approximation
v, 791 connection formulae, 630
~v , 791 WKB connection formulae, 633
vacuum energy, 193, 755 WKB theory, 209
vacuum state, 487 Wronskian, 646
valence band, 267
X, 792
values
x, 792
measurable, 25
X-ray diffraction, 330
observable, 25
possible, 25 Y , 792
Van der Waals forces, 259 y, 792
variational method, 105 Ylm , 792
helium ionization energy, 460
hydrogen molecular ion, 104 Z, 792
hydrogen molecule, 117 z, 792
variational principle, 227 Zeeman effect, 463
basic statement, 227 intermediate, 475
differential form, 228 weak, 475
Lagrangian multipliers, 229 zero matrix, 784
vector, 6, 791 zero point energy, 236
vectorial product, 791 zeroth law of thermodynamics, 338
velocity
in quantum mechanics, 175
wave packet, 202
vibronic coupling terms, 654
virial theorem, 173
virtual work, 544
viscosity, 363

