
Geographical Information Systems and Science

2nd Edition

Paul A. Longley University College London, UK


Michael F. Goodchild University of California, Santa Barbara, USA
David J. Maguire ESRI Inc., Redlands, USA
David W. Rhind City University, London, UK
Copyright  2005 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester,
West Sussex PO19 8SQ, England

Telephone (+44) 1243 779777

Email (for orders and customer service enquiries): cs-books@wiley.co.uk


Visit our Home Page on www.wiley.com

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or
transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or
otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a
licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK,
without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the
Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex
PO19 8SQ, England, or emailed to permreq@wiley.co.uk, or faxed to (+44) 1243 770620.

This publication is designed to provide accurate and authoritative information in regard to the subject
matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional
services. If professional advice or other expert assistance is required, the services of a competent
professional should be sought.

ESRI Press logo is the trademark of ESRI and is used herein under licence.

Main cover image and first box from bottom, courtesy of NASA.
Second box, reproduced from Ordnance Survey.
Third box, reproduced by permission of National Geographic Maps.
Fourth box, reproduced from Ordnance Survey, courtesy @Last.

Other Wiley Editorial Offices

John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA

Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA

Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany

John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia

John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809

John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1

Wiley also publishes its books in a variety of electronic formats. Some content that appears
in print may not be available in electronic books.

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

ISBN 0-470-87000-1 (HB)


ISBN 0-470-87001-X (PB)

Typeset in 9/10.5pt Times by Laserwords Private Limited, Chennai, India


Printed and bound in Spain by Grafos S.A., Barcelona, Spain
This book is printed on acid-free paper responsibly manufactured from sustainable forestry
in which at least two trees are planted for each one used for paper production.
Contents

Foreword
Addendum
Preface
List of Acronyms and Abbreviations

I Introduction

1 Systems, science, and study
1.1 Introduction: why does GIS matter?
1.2 Data, information, evidence, knowledge, wisdom
1.3 The science of problem solving
1.4 The technology of problem solving
1.5 The business of GIS
1.6 GISystems, GIScience, and GIStudies
1.7 GIS and geography
Questions for further study
Further reading

2 A gallery of applications
2.1 Introduction
2.2 Science, geography, and applications
2.3 Representative application areas and their foundations
2.4 Concluding comments
Questions for further study
Further reading

II Principles

3 Representing geography
3.1 Introduction
3.2 Digital representation
3.3 Representation for what and for whom?
3.4 The fundamental problem
3.5 Discrete objects and continuous fields
3.6 Rasters and vectors
3.7 The paper map
3.8 Generalization
3.9 Conclusion
Questions for further study
Further reading

4 The nature of geographic data
4.1 Introduction
4.2 The fundamental problem revisited
4.3 Spatial autocorrelation and scale
4.4 Spatial sampling
4.5 Distance decay
4.6 Measuring distance effects as spatial autocorrelation
4.7 Establishing dependence in space
4.8 Taming geographic monsters
4.9 Induction and deduction and how it all comes together
Questions for further study
Further reading

5 Georeferencing
5.1 Introduction
5.2 Placenames
5.3 Postal addresses and postal codes
5.4 Linear referencing systems
5.5 Cadasters and the US Public Land Survey System
5.6 Measuring the Earth: latitude and longitude
5.7 Projections and coordinates
5.8 Measuring latitude, longitude, and elevation: GPS
5.9 Converting georeferences
5.10 Summary
Questions for further study
Further reading

6 Uncertainty
6.1 Introduction
6.2 U1: Uncertainty in the conception of geographic phenomena
6.3 U2: Further uncertainty in the measurement and representation of geographic phenomena
6.4 U3: Further uncertainty in the analysis of geographic phenomena
6.5 Consolidation
Questions for further study
Further reading

III Techniques

7 GIS Software
7.1 Introduction
7.2 The evolution of GIS software
7.3 Architecture of GIS software
7.4 Building GIS software systems
7.5 GIS software vendors
7.6 Types of GIS software systems
7.7 GIS software usage
7.8 Conclusion
Questions for further study
Further reading

8 Geographic data modeling
8.1 Introduction
8.2 GIS data models
8.3 Example of a water-facility object data model
8.4 Geographic data modeling in practice
Questions for further study
Further reading

9 GIS data collection
9.1 Introduction
9.2 Primary geographic data capture
9.3 Secondary geographic data capture
9.4 Obtaining data from external sources (data transfer)
9.5 Capturing attribute data
9.6 Managing a data collection project
Questions for further study
Further reading

10 Creating and maintaining geographic databases
10.1 Introduction
10.2 Database management systems
10.3 Storing data in DBMS tables
10.4 SQL
10.5 Geographic database types and functions
10.6 Geographic database design
10.7 Structuring geographic information
10.8 Editing and data maintenance
10.9 Multi-user editing of continuous databases
10.10 Conclusion
Questions for further study
Further reading

11 Distributed GIS
11.1 Introduction
11.2 Distributing the data
11.3 The mobile user
11.4 Distributing the software: GIServices
11.5 Prospects
Questions for further study
Further reading

IV Analysis

12 Cartography and map production
12.1 Introduction
12.2 Maps and cartography
12.3 Principles of map design
12.4 Map series
12.5 Applications
12.6 Conclusions
Questions for further study
Further reading

13 Geovisualization
13.1 Introduction: uses, users, messages, and media
13.2 Geovisualization and spatial query
13.3 Geovisualization and transformation
13.4 Immersive interaction and PPGIS
13.5 Consolidation
Questions for further study
Further reading

14 Query, measurement, and transformation
14.1 Introduction: what is spatial analysis?
14.2 Queries
14.3 Measurements
14.4 Transformations
14.5 Conclusion
Questions for further study
Further reading

15 Descriptive summary, design, and inference
15.1 More spatial analysis
15.2 Descriptive summaries
15.3 Optimization
15.4 Hypothesis testing
15.5 Conclusion
Questions for further study
Further reading

16 Spatial modeling with GIS
16.1 Introduction
16.2 Types of model
16.3 Technology for modeling
16.4 Multicriteria methods
16.5 Accuracy and validity: testing the model
16.6 Conclusion
Questions for further study
Further reading

V Management and Policy

17 Managing GIS
17.1 The big picture
17.2 The process of developing a sustainable GIS
17.3 Sustaining a GIS – the people and their competences
17.4 Conclusions
Questions for further study
Further reading

18 GIS and management, the Knowledge Economy, and information
18.1 Are we all in ‘managed businesses’ now?
18.2 Management is central to the successful use of GIS
18.3 The Knowledge Economy, knowledge management, and GIS
18.4 Information, the currency of the Knowledge Economy
18.5 GIS as a business and as a business stimulant
18.6 Discussion
Questions for further study
Further reading

19 Exploiting GIS assets and navigating constraints
19.1 GIS and the law
19.2 GIS people and their skills
19.3 Availability of ‘core’ geographic information
19.4 Navigating the constraints
19.5 Conclusions
Questions for further study
Further reading

20 GIS partnerships
20.1 Introduction
20.2 Collaborations at the local level
20.3 Working together at the national level
20.4 Multi-national collaborations
20.5 Nationalism, globalization, politics, and GIS
20.6 Extreme events can change everything
20.7 Conclusions
Questions for further study
Further reading

21 Epilog
21.1 Introduction
21.2 A consolidation of some recurring themes
21.3 Ten ‘grand challenges’ for GIS
21.4 Conclusions
Questions for further study
Further reading

Index
Foreword

At the time of writing, the first edition of Geographic Information Systems and Science (GIS&S)
has sold well over 25 000 copies – the most, it seems, of any GIS textbook. Its novel structure,
content, and ‘look and feel’ expanded the very idea of what a GIS is, what it involves, and its
pervasive importance. In so doing, the book introduced thousands of readers to the field in which
we have spent much of our working lifetimes. Being human, we take pleasure in that achievement –
but it is not enough. Convinced as we are of the benefits of thinking and acting geographically,
we are determined to enthuse and involve many more people. This and the high rate of change in
GIS&S (Geographic Information Systems and Science) demands a new edition that benefits from the
feedback we have received on the first one.

Setting aside the (important) updates, the major changes reflect our changing world. The use of
GIS was pioneered in the USA, Canada, various countries in Europe, and Australia. But it is
expanding rapidly – and in innovative ways – in South East Asia, Latin America and Eastern Europe,
for example. We have recognized this by broadening our geography of examples. The world of 2005 is
not the same as that prior to 11 September 2001. Almost all countries are now engaged in seeking to
protect their citizens against the threat of terrorism. Whilst we do not seek to exaggerate the
contribution of GIS, there are many ways in which these systems and our geographic knowledge can
help in this, the first duty of a national government. Finally, the sheen has come off much
information technology and information systems: they have become consumer goods, ubiquitous in the
market place. Increasingly they are recognized as a necessary underpinning of government and
commerce – but one where real advantage is conferred by their ease of use and low price, rather
than the introduction of exotic new functions. As we demonstrate in this book, GIS&S was never
simply hardware and software. It has also always been about people and, in preparing this second
edition, we have taken the decision to present an entirely new set of current GIS protagonists.
This has inevitably meant that all boxes from the first edition pertaining to living individuals
have been removed in order to create space: we hope that the individuals concerned will
understand, and we congratulate them on their longevity! This second edition, then, remains about
hardware, software, people – and also about geographic information, some real science, a clutch of
partnerships, and much judgment. Yet we recognize the progressive ‘consumerization’ of our basic
tool set and welcome it, for it means more can be done for greater numbers of beneficiaries for
less money. Our new book reflects the continuing shift from tools to understanding and coping with
the fact that, in the real world, ‘everything is connected to everything else’!

We asked Joe Lobley, an individual unfamiliar with political correctness and with a healthy
scepticism about the utterances of GIS gurus, to write the foreword for the first edition. To our
delight, he is now cited in various academic papers and reviews as a stimulating, fresh, and
lateral thinker. Sadly, at the time of going to press, Joe had not responded to our invitation to
repeat his feat. He was last heard of on location as a GIS consultant in Afghanistan. So this
Foreword is somewhat less explosive than last time. We hope the book is no less valuable.

Paul A. Longley
Michael F. Goodchild
David J. Maguire
David W. Rhind
October 2004
Addendum

Hi again! Greetings from Afghanistan, where I am temporarily resident in the sort of hotel that
offers direct access to GPS satellite signals through the less continuous parts of its roof
structure. Global communications mean I can stay in touch with the GIS world from almost anywhere.
Did you know that when Abraham Lincoln was assassinated it took 16 days for the news to reach
Britain? But when William McKinley was assassinated only 36 years later the telegraph (the first
Internet) ensured it took only seconds for the news to reach Old and New Europe. Now I can pull
down maps and images of almost anything I want, almost anywhere. Of course, I get lots of crap as
well – the curse of the age – and some of the information is rubbish. What does Kabul’s premier
location prospector want with botox? But technology makes good (and bad) information available,
often without payment (which I like), to all those with telecoms and access to a computer. Sure, I
know that’s still a small fraction of mankind but boy is that fraction growing daily. It’s helped
of course by the drop in price of hardware and even software: GIS tools are increasingly becoming
like washing machines – manufactured in bulk and sold on price though there is a lot more to
getting success than buying the cheapest.

I’ve spent lots of time in Asia since last we communicated and believe me there are some smart
things going on there with IS and GIS. Fuelled by opportunism (and possibly a little beer) the guys
writing this book have seen the way the wind is blowing and made a good stab at representing the
whole world of GIS. So what else is new in this revised edition of what they keep telling me is the
world’s best-selling GIS textbook? I like the way homeland security issues are built in. All of us
have to live with terrorist threats these days and GIS can help as a data and intelligence
integrator. I like the revised structure, the continuing emphasis on business benefit and
institutions and the new set of role models they have chosen (though ‘new’ is scarcely the word I
would have used for Roger Tomlinson...). I like the same old unstuffy ways these guys write in
proper American English, mostly avoiding jargon.

On the down-side, I still think they live in a rose-tinted world where they believe government and
academia actually do useful things. If you share their strange views, tell me what the great
National Spatial Data Infrastructure movement has really achieved worldwide except hype and
numerous meetings in nice places? Wise up guys! You don’t have to pretend. Now I do like the way
that the guys recognize that places are unique (boy, my hotel is...), but don’t swallow the line
that digital representations of space are any less valid, ethical or usable than digital measures
of time or sound. Boast a little more, and, while you’re at it, say less about ‘the’ digital divide
and more about digital differentiation. And keep well clear of patronizing, social theory stroking,
box-ticking, self-congratulatory claptrap. The future of that is just people with spectacles who
write books in garden sheds. Trade up from the caves of the pre-digital era and educate the
wannabes that progress can be a good thing. And wise up that the real benefits of GIS do not depend
on talking shops or gravy trains. What makes GIS unstoppable is what we can do with the tools, with
decent data and with our native wit and training to make the world a better and more efficient
place. Business and markets (mostly) will do that for you!

Joe Lobley
Preface

The field of geographic information systems (GIS) is concerned with the description, explanation,
and prediction of patterns and processes at geographic scales. GIS is a science, a technology, a
discipline, and an applied problem solving methodology. There are perhaps 50 other books on GIS
now on the world market. We believe that this one has become one of the fastest selling and most
used because we see GIS as providing a gateway to science and problem solving (geographic
information systems ‘and science’ in general), and because we relate available software for
handling geographic information to the scientific principles that should govern its use
(geographic information: ‘systems and science’). GIS is of enduring importance because of its
central coordinating principles, the specialist techniques that have been developed to handle
spatial data, the special analysis methods that are key to spatial data, and because of the
particular management issues presented by geographic information (GI) handling. Each section of
this book investigates the unique, complex, and difficult problems that are posed by geographic
information, and together they build into a holistic understanding of all that is important about
GIS.

Our approach

GIS is a proven technology and the basic operations of GIS today provide secure and established
foundations for measurement, mapping, and analysis of the real world. GIScience provides us with
the ability to devise GIS-based analysis that is robust and defensible. GI technology facilitates
analysis, and continues to evolve rapidly, especially in relation to the Internet, and its likely
successors and its spin-offs. Better technology, better systems, and better science make better
management and exploitation of GI possible.

Fundamentally, GIS is an applications-led technology, yet successful applications need appropriate
scientific foundations. Effective use of GIS is impossible if they are simply seen as black boxes
producing magic. GIS is applied rarely in controlled, laboratory-like conditions. Our messy,
inconvenient, and apparently haphazard real world is the laboratory for GIS, and the science of
real-world application is the difficult kind – it can rarely control for, or assume away, things
that we would prefer were not there and that get in the way of almost any given application.
Scientific understanding of the inherent uncertainties and imperfections in representing the world
makes us able to judge whether the conclusions of our analysis are sustainable, and is essential
for everything except the most trivial use of GIS. GIScience is also founded on a search for
understanding and predictive power in a world where human factors interact with those relating to
the physical environment. Good science is also ethical and clearly communicated science, and thus
the ways in which we analyze and depict geography also play an important role.

Digital geographic information is central to the practicality of GIS. If it does not exist, it is
expensive to collect, edit, or update. If it does exist, it cuts costs and time – assuming it is
fit for the purpose, or good enough for the particular task in hand. It underpins the rapid growth
of trading in geographic information (g-commerce). It provides possibilities not only for local
business but also for entering new markets or for forging new relationships with other
organizations. It is a foolish individual who sees it only as a commodity like baked beans or
shaving foam. Its value relies upon its coverage, on the strengths of its representation of
diversity, on its truth within a constrained definition of that word, and on its availability.

Few of us are hermits. The way in which geographic information is created and exploited through
GIS affects us as citizens, as owners of enterprises, and as employees. It has increasingly been
argued that GIS is only a part – albeit a part growing in importance and size – of the
Information, Communications, and Technology (ICT) industry. This is a limited perception, typical
of the ICT supply-side industry which tends to see itself as the sole progenitor of change in the
world (wrongly). Actually, it is much more sensible to take a balanced demand- and supply-side
perspective: GIS and geographic information can and do underpin many operations of many
organizations, but how GIS works in detail differs between different cultures, and can often also
partly depend on whether an organization is in the private or public sector. Seen from this
perspective, management of GIS facilities is crucial to the success of organizations – businesses
as we term them later. The management of the organizations using our tools, information, knowledge,
skills, and commitment is therefore what will ensure the ultimate local and global success of GIS.
For this reason we devote an entire section of this book to management issues. We go far beyond
how to choose, install, and run a GIS; that is only one part of the enterprise. We try to show how
to use GIS and geographic information to contribute to the business success of your organization
(whatever it is), and have it recognized as doing just that. To achieve that, you need to know
what drives organizations and how they operate in the reality of their business environments. You
need to know something about assets, risks, and constraints on actions – and how to avoid the last
two and
nurture the first. And you need to be exposed – for that is reality – to the inter-dependencies in
any organization and the tradeoffs in decision making in which GIS can play a major role.

Our audience

Originally, we conceived this book as a ‘student companion’ to a very different book that we also
produced as a team – the second edition of the ‘Big Book’ of GIS (Longley et al 1999). This
reference work on GIS provided a defining statement of GIS at the end of the last millennium: many
of the chapters that are of enduring relevance are now available as an advanced reader in GIS
(Longley et al 2005). These books, along with the first ‘Big Book’ of GIS (Maguire et al 1991),
were designed for those who were already very familiar with GIS, and desired an advanced
understanding of enduring GIS principles, techniques, and management practices. They were not
designed as books for those being introduced to the subject.

This book is the companion for everyone who desires a rich understanding of how GIS is used in the
real world. GIS today is both an increasingly mature technology and a strategically important
interdisciplinary meeting place. It is taught as a component of a huge range of undergraduate
courses throughout the world, to students that already have different skills, that seek different
disciplinary perspectives on the world, and that assign different priorities to practical problem
solving and the intellectual curiosities of science. This companion can be thought of as a
textbook, though not in a conventionally linear way. We have not attempted to set down any kind of
rigid GIS curriculum beyond the core organizing principles, techniques, analysis methods, and
management practices that we believe to be important. We have structured the material in each of
the sections of the book in a cumulative way, yet we envisage that very few students will start at
Chapter 1 and systematically work through to Chapter 21 – much of learning is not like that any
more (if ever it was), and most instructors will navigate a course between sections and chapters
of the book that serves their particular disciplinary, curricular, and practical priorities. The
ways in which three of us use the book in our own undergraduate and postgraduate settings are
posted on the book’s website (www.wiley.com/go/longley), and we hope that other instructors will
share their best practices with us as time goes on (please see the website for instructions on how
to upload instructor lists and offer feedback on those that are already there!). Our Instructor
Manual (see www.wiley.com/go/longley) provides suggestions as to the use of this book in a range
of disciplines and educational settings. The linkage of the book to reference material
(specifically Longley et al (2005) and Maguire et al (1991) at www.wiley.com/go/longley) is a
particular strength for GIS postgraduates and professionals. Such users might desire an up-to-date
overview of GIS to locate their own particular endeavors, or (particularly if their previous
experience lies outside the mainstream geographic sciences) a fast track to get up-to-speed with
the range of principles, techniques, and practice issues that govern real-world application.

The format of the book is intended to make learning about GIS fun. GIS is an important
transferable skill because people successfully use it to solve real-world problems. We thus convey
this success through use of real (not contrived, conventional textbook-like) applications, in
clearly identifiable boxes throughout the text. But even this does not convey the excitement of
learning about GIS that only comes from doing. With this in mind, an on-line series of laboratory
classes has been created to accompany the book. These are available, free of charge, to any
individual working in an institution that has an ESRI site license (see www.esri.com). They are
cross-linked in detail to individual chapters and sections in the book, and provide learners with
the opportunity to refresh the concepts and techniques that they have acquired through classes and
reading, and the opportunity to work through extended examples using ESRI ArcGIS. This is by no
means the only available software for learning GIS: we have chosen it for our own lab exercises
because it is widely used, because one of us works for ESRI Inc. (Redlands, CA, USA) and because
ESRI’s cooperation enabled us to tailor the lab exercises to our own material. There are, however,
many other options for lab teaching and distance learning from private and publicly funded bodies
such as the UNIGIS consortium, the Worldwide Universities Network, and Pennsylvania State
University in its World Campus (www.worldcampus.psu.edu/pub/index.shtml).

GIS is not just about machines, but also about people. It is very easy to lose touch with what is
new in GIS, such is the scale and pace of development. Many of these developments have been, and
continue to be, the outcome of work by motivated and committed individuals – many an idea or
implementation of GIS would not have taken place without an individual to champion it. In the
first edition of this book, we used boxes highlighting the contributions of a number of its
champions to convey that GIS is a living, breathing subject. In this second edition, we have
removed all of the living champions of GIS and replaced them with a completely new set – not as
any intended slight upon the remarkable contributions that these individuals have made, but as a
necessary way of freeing up space to present vignettes of an entirely new set of committed,
motivated individuals whose contributions have also made a difference to GIS.

As we say elsewhere in this book, human attention is valued increasingly by business, while
students are also seemingly required to digest ever-increasing volumes of material. We have tried
to summarize some of the most important points in this book using short ‘factoids’, such as that
below, which we think assist students in recalling core points.

Short, pithy, statements can be memorable.

We hope that instructors will be happy to use this book as a core teaching resource. We have tried
to provide a number of ways in which they can encourage their
students to learn more about GIS through a range of assessments. At the end of each chapter we
provide four questions in the following sequence that entail:
■ Student-centred learning by doing.
■ A review of material contained in the chapter.
■ A review and research task – involving integration of issues discussed in the chapter with those
discussed in additional external sources.
■ A compare and research task – similar to the review and research task above, but additionally
entailing linkage with material from one or more other chapters in the book.

The on-line lab classes have also been designed to allow learning in a self-paced way, and there
are self-test exercises at the end of each section for use by learners working alone or by course
evaluators at the conclusion of each lab class.

As the title implies, this is a book about geographic information systems, the practice of science
in general, and the principles of geographic information science (GIScience) in particular. We
remain convinced of the need for high-level understanding and our book deals with ideas and
concepts – as well as with actions. Just as scientists need to be aware of the complexities of
interactions between people and the environment, so managers must be well-informed by a wide range
of knowledge about issues that might impact upon their actions. Success in GIS often comes from
dealing as much with people as with machines.

The new learning paradigm

This is not a traditional textbook because:
■ It recognizes that GISystems and GIScience do not lend themselves to traditional classroom
teaching alone. Only by a combination of approaches can such crucial matters as principles,
technical issues, practice, management, ethics, and accountability be learned. Thus the book is
complemented by a website (www.wiley.com/go/longley) and by exercises that can be undertaken in
laboratory or self-paced settings.
■ It brings the principles and techniques of GIScience to those learning about GIS for the first
time – and as such represents part of the continuing evolution of GIS.
■ The very nature of GIS as an underpinning technology in huge numbers of applications, spanning
different fields of human endeavor, ensures that learning has to be tailored to individual or
small-group needs. These are addressed in the Instructor Manual to the book
(www.wiley.com/go/longley).
■ We have recognized that GIS is driven by real-world applications and real people, that respond
to real-world needs. Hence, information on a range of applications and GIS champions is threaded
throughout the text.
■ We have linked our book to online learning resources throughout, notably the ESRI Virtual
Campus.
■ The book that you have in your hands has been completely restructured and revised, while
retaining the best features of the (highly successful) first edition published in 2001.

Summary

This is a book that recognizes the growing commonality between the concerns of science,
government, and business. The examples of GIS people and problems that are scattered through this
book have been chosen deliberately to illuminate this commonality, as well as the interplay
between organizations and people from different sectors. To differing extents, the five sections
of the book develop common concerns with effectiveness and efficiency, by bringing together
information from disparate sources, acting within regulatory and ethical frameworks, adhering to
scientific principles, and preserving good reputations. This, then, is a book that combines the
basics of GIS with the solving of problems which often have no single, ideal solution – the world
of business, government, and interdisciplinary, mission-orientated holistic science.

In short, we have tried to create a book that remains attuned to the way the world works now, that
understands the ways in which most of us increasingly operate as knowledge workers, and that
grasps the need to face complicated issues that do not have ideal solutions. As with the first
edition of the book, this is an unusual enterprise and product. It has been written by a
multi-national partnership, drawing upon material from around the world. One of the authors is an
employee of a leading software vendor and two of the other three have had business dealings with
ESRI over many years. Moreover, some of the illustrations and examples come from the customers of
that vendor. We wish to point out, however, that neither ESRI (nor Wiley) has ever sought to
influence our content or the way in which we made our judgments, and we have included references
to other software and vendors throughout the book. Whilst our lab classes are part of ESRI’s
Virtual Campus, we also make reference to similar sources of information in both paper and digital
form. We hope that we have again created something novel but valuable by our lateral thinking in
all these respects, and would very much welcome feedback through our website
(www.wiley.com/go/longley).

Conventions and organization

We use the acronym GIS in many ways in the book, partly to emphasize one of our goals, the
interplay between geographic information systems and geographic information science; and at times
we use two other possible interpretations of the three-letter acronym: geographic information
studies and geographic information services. We distinguish between the various meanings where appropriate, or where the context fails to make the meaning clear, especially in Section 1.6 and in the Epilog. We also use the acronym in both singular and plural senses, following what is now standard practice in the field, to refer as appropriate to a single geographic information system or to geographic information systems in general. To complicate matters still further, we have noted the increasing use of ‘geospatial’ rather than ‘geographic’. We use ‘geospatial’ where other people use it as a proper noun/title, but elsewhere use the more elegant and readily intelligible ‘geographic’.

We have organized the book in five major but interlocking sections: after two chapters that establish the foundations to GI Systems and Science and the real world of applications, the sections appear as Principles (Chapters 3 through 6), Techniques (Chapters 7 through 11), Analysis (Chapters 12 through 16) and Management and Policy (Chapters 17 through 20). We cap the book off with an Epilog that summarizes the main topics and looks to the future. The boundaries between these sections are in practice permeable, but remain in large part predicated upon providing a systematic treatment of enduring principles – ideas that will be around long after today’s technology has been relegated to the museum – and the knowledge that is necessary for an understanding of today’s technology, and likely near-term developments. In a similar way, we illustrate how many of the analytic methods have had reincarnations through different manual and computer technologies in the past, and will doubtless metamorphose further in the future.

We hope you find the book stimulating and helpful. Please tell us – either way!

Acknowledgments

We take complete responsibility for all the material contained herein. But much of it draws upon contributions made by friends and colleagues from across the world, many of them outside the academic GIS community. We thank them all for those contributions and the discussions we have had over the years. We cannot mention all of them but would particularly like to mention the following.

We thanked the following for their direct and indirect inputs to the first edition of this book: Mike Batty, Clint Brown, Nick Chrisman, Keith Clarke, Andy Coote, Martin Dodge, Danny Dorling, Jason Dykes, Max Egenhofer, Pip Forer, Andrew Frank, Rob Garber, Gayle Gaynor, Peter Haggett, Jim Harper, Rich Harris, Les Hepple, Sophie Hobbs, Andy Hudson-Smith, Karen Kemp, Chuck Killpack, Robert Laurini, Vanessa Lawrence, John Leonard, Bob Maher, Nick Mann, David Mark, David Martin, Elanor McBay, Ian McHarg, Scott Morehouse, Lou Page, Peter Paisley, Cath Pyke, Jonathan Raper, Helen Ridgway, Jan Rigby, Christopher Roper, Garry Scanlan, Sarah Sheppard, Karen Siderelis, David Simonett, Roger Tomlinson, Carol Tullo, Dave Unwin, Sally Wilkinson, David Willey, Jo Wood, Mike Worboys.

Many of those listed above also helped us in our work on the second edition. But this time around we additionally acknowledge the support of: Tessa Anderson, David Ashby, Richard Bailey, Brad Baker, Bob Barr, Elena Besussi, Dick Birnie, John Calkins, Christian Castle, David Chapman, Nancy Chin, Greg Cho, Randy Clast, Rita Colwell, Sonja Curtis, Jack Dangermond, Mike de Smith, Steve Evans, Andy Finch, Amy Garcia, Hank Gerie, Muki Haklay, Francis Harvey, Denise Lievesley, Daryl Lloyd, Joe Lobley, Ian Masser, David Miller, Russell Morris, Doug Nebert, Hugh Neffendorf, Justin Norry, Geof Offen, Larry Orman, Henk Ottens, Jonathan Rhind, Doug Richardson, Dawn Robbins, Peter Schaub, Sorin Scortan, Duncan Shiell, Alex Singleton, Aidan Slingsby, Sarah Smith, Kevin Schürer, Josef Strobl, Larry Sugarbaker, Fraser Taylor, Bethan Thomas, Carolina Tobón, Paul Torrens, Nancy Tosta, Tom Veldkamp, Peter Verburg, and Richard Webber. Special thanks are also due to Lyn Roberts and Keily Larkins at John Wiley and Sons for successfully guiding the project to fruition. Paul Longley’s contribution to the book was carried out under ESRC AIM Fellowship RES-331-25-0001, and he also acknowledges the guiding contribution of the CETL Center for Spatial Literacy in Teaching (Splint).

Each of us remains indebted in different ways to Stan Openshaw, for his insight, his energy, his commitment to GIS, and his compassion for geography.

Finally, thanks go to our families, especially Amanda, Fiona, Heather, and Christine.

Paul Longley, University College London
Michael Goodchild, University of California, Santa Barbara
David Maguire, ESRI Inc., Redlands CA
David Rhind, City University, London

October 2004

Further reading

Maguire D.J., Goodchild M.F., and Rhind D.W. (eds) 1991 Geographical Information Systems. Harlow: Longman.
Longley P.A., Goodchild M.F., Maguire D.J., and Rhind D.W. (eds) 1999 Geographical Information Systems: Principles, Techniques, Management and Applications (two volumes). New York: Wiley.
Longley P.A., Goodchild M.F., Maguire D.J., and Rhind D.W. (eds) 2005 Geographical Information Systems: Principles, Techniques, Management and Applications (abridged edition). Hoboken, NJ: Wiley.
List of Acronyms and Abbreviations

AA Automobile Association
ABM agent-based model
AGI Association for Geographic Information
AGILE Association of Geographic Information Laboratories in Europe
AHP Analytical Hierarchy Process
AM automated mapping
AML Arc Macro Language
API application programming interface
ARPANET Advanced Research Projects Agency Network
ASCII American Standard Code for Information Interchange
ASP Active Server Pages
AVIRIS Airborne Visible InfraRed Imaging Spectrometer
BBC British Broadcasting Corporation
BLM Bureau of Land Management
BLOB binary large object
CAD Computer-Aided Design
CAMA Computer Assisted Mass Appraisal
CAP Common Agricultural Policy
CASA Centre for Advanced Spatial Analysis
CASE computer-aided software engineering
CBD central business district
CD compact disc
CEN Comité Européen de Normalisation
CERN Conseil Européen pour la Recherche Nucléaire
CGIS Canada Geographic Information System
CGS Czech Geological Survey
CIA Central Intelligence Agency
CLI Canada Land Inventory
CLM collection-level metadata
COGO coordinate geometry
COM component object model
COTS commercial off-the-shelf
CPD continuing professional development
CSDGM Content Standards for Digital Geospatial Metadata
CSDMS Centre for Spatial Database Management and Solutions
CSO color separation overlay
CTA Chicago Transit Authority
DARPA Defense Advanced Research Projects Agency
DBA database administrator
DBMS database management system
DCL data control language
DCM digital cartographic model
DCW Digital Chart of the World
DDL data definition language
DEM digital elevation model
DGPS Differential Global Positioning System
DHS Department of Homeland Security
DIME Dual Independent Map Encoding
DLG digital line graph
DLM digital landscape model
DML data manipulation language
DRG digital raster graphic
DST Department of Science and Technology
DXF drawing exchange format
EBIS ESRI Business Information Solutions
EC European Commission
ECU Experimental Cartography Unit
EDA exploratory data analysis
EOSDIS Earth Observing System Data and Information System
EPA Environmental Protection Agency
EPS encapsulated postscript
ERDAS Earth Resource Data Analysis System
ERP Enterprise Resource Planning
ERTS Earth Resources Technology Satellite
ESDA exploratory spatial data analysis
ESRI Environmental Systems Research Institute
EU European Union
EUROGI European Umbrella Organisation for Geographic Information
FAO Food and Agriculture Organization
FEMA Federal Emergency Management Agency
FGDC Federal Geographic Data Committee
FIPS Federal Information Processing Standard
FM facility management
FOIA Freedom of Information Act
FSA Forward Sortation Area
GAO General Accounting Office
GBF-DIME Geographic Base Files – Dual Independent Map Encoding
GDI GIS data industry
GIO Geographic Information Officer
GIS geographic(al) information system
GIScience geographic(al) information science
GML Geography Markup Language
GNIS Geographic Names Information System
GOS geospatial one-stop
GPS Global Positioning System
GRASS Geographic Resources Analysis Support System
GSDI global spatial data infrastructure
GUI graphical user interface
GWR geographically weighted regression
HLS hue, lightness, and saturation
HTML hypertext markup language
HTTP hypertext transfer protocol
ICMA International City/County Management Association
ICT Information and Communication Technology
ID identifier
IDE Integrated Development Environment
IDW inverse-distance weighting
IGN Institut Géographique National

IMW International Map of the World
INSPIRE Infrastructure for Spatial Information in Europe
IP Internet protocol
IPR intellectual property rights
IS information system
ISCGM International Steering Committee for Global Mapping
ISO International Organization for Standardization
IT information technology
ITC International Training Centre for Aerial Survey
ITS intelligent transportation systems
JSP Java Server Pages
KE knowledge economy
KRIHS Korea Research Institute for Human Settlements
KSUCTA Kyrgyz State University of Construction, Transportation and Architecture
LAN local area network
LBS location-based services
LiDAR light detection and ranging
LISA local indicators of spatial association
LMIS Land Management Information System
MAT point of minimum aggregate travel
MAUP Modifiable Areal Unit Problem
MBR minimum bounding rectangle
MCDM multicriteria decision making
MGI Masters in Geographic Information
MIT Massachusetts Institute of Technology
MOCT Ministry of Construction and Transportation
MrSID Multiresolution Seamless Image Database
MSC Mapping Science Committee
NASA National Aeronautics and Space Administration
NATO North Atlantic Treaty Organization
NAVTEQ Navigation Technologies
NCGIA National Center for Geographic Information and Analysis
NGA National Geospatial-Intelligence Agency
NGIS National GIS
NILS National Integrated Land System
NIMA National Imagery and Mapping Agency
NIMBY not in my back yard
NMO national mapping organization
NMP National Mapping Program
NOAA National Oceanic and Atmospheric Administration
NPR National Performance Review
NRC National Research Council
NSDI National Spatial Data Infrastructure
NSF National Science Foundation
OCR optical character recognition
ODBMS object database management system
OEM Office of Emergency Management
OGC Open Geospatial Consortium
OLM object-level metadata
OLS ordinary least squares
OMB Office of Management and Budget
ONC Operational Navigation Chart
ORDBMS object-relational database management system
PAF postcode address file
PASS Planning Assistant for Superintendent Scheduling
PCC percent correctly classified
PCGIAP Permanent Committee on GIS Infrastructure for Asia and the Pacific
PDA personal digital assistant
PE photogrammetric engineering
PERT Program Evaluation and Review Technique
PLSS Public Land Survey System
PPGIS public participation in GIS
RDBMS relational database management system
RFI Request for Information
RFP Request for Proposals
RGB red-green-blue
RMSE root mean square error
ROMANSE Road Management System for Europe
RRL Regional Research Laboratory
RS remote sensing
SAP spatially aware professional
SARS severe acute respiratory syndrome
SDE Spatial Database Engine
SDI spatial data infrastructure
SDSS spatial decision support systems
SETI Search for Extraterrestrial Intelligence
SIG Special Interest Group
SOHO small office/home office
SPC State Plane Coordinates
SPOT Système Probatoire d’Observation de la Terre
SQL Structured/Standard Query Language
SWMM Storm Water Management Model
SWOT strengths, weaknesses, opportunities, threats
TC technical committee
TIGER Topologically Integrated Geographic Encoding and Referencing
TIN triangulated irregular network
TINA there is no alternative
TNM The National Map
TOID Topographic Identifier
TSP traveling-salesman problem
TTIC Traffic and Travel Information Centre
UCAS Universities and Colleges Admissions Service
UCGIS University Consortium for Geographic Information Science
UCSB University of California, Santa Barbara
UDDI Universal Description, Discovery, and Integration
UDP Urban Data Processing
UKDA United Kingdom Data Archive
UML Unified Modeling Language
UN United Nations
UNIGIS UNIversity GIS Consortium
UPS Universal Polar Stereographic
URISA Urban and Regional Information Systems Association
USGS United States Geological Survey
USLE Universal Soil Loss Equation
UTC urban traffic control
UTM Universal Transverse Mercator
VBA Visual Basic for Applications
VfM value for money
VGA video graphics array
ViSC visualization in scientific computing
VPF vector product format
WAN wide area network
WIMP windows, icons, menus, and pointers
WIPO World Intellectual Property Organization
WSDL Web Services Description Language
WTC World Trade Center
WTO World Trade Organization
WWF World Wide Fund for Nature
WWW World Wide Web
WYSIWYG what you see is what you get
XML extensible markup language
I

Introduction
1 Systems, science, and study
2 A gallery of applications
1 Systems, science, and study

This chapter introduces the conceptual framework for the book, by addressing several major questions:

■ What exactly is geographic information, and why is it important? What is special about it?
■ What is information generally, and how does it relate to data,
knowledge, evidence, wisdom, and understanding?
■ What kinds of decisions make use of geographic information?
■ What is a geographic information system, and how would I know one if I
saw one?
■ What is geographic information science, and how does it relate to the use
of GIS for scientific purposes?
■ How do scientists use GIS, and why do they find it helpful?
■ How do companies make money from GIS?

Geographic Information Systems and Science, 2nd edition Paul Longley, Michael Goodchild, David Maguire, and David Rhind.
© 2005 John Wiley & Sons, Ltd. ISBNs: 0-470-87000-1 (HB); 0-470-87001-X (PB)

Learning Objectives

At the end of this chapter you will:

■ Know definitions of the terms used throughout the book, including GIS itself;
■ Be familiar with a brief history of GIS;
■ Recognize the sometimes invisible roles of GIS in everyday life, and the roles of GIS in business;
■ Understand the significance of geographic information science, and how it relates to geographic information systems;
■ Understand the many impacts GIS is having on society, and the need to study those impacts.

1.1 Introduction: why does GIS matter?

Almost everything that happens, happens somewhere. Largely, we humans are confined in our activities to the surface and near-surface of the Earth. We travel over it and in the lower levels of the atmosphere, and through tunnels dug just below the surface. We dig ditches and bury pipelines and cables, construct mines to get at mineral deposits, and drill wells to access oil and gas. Keeping track of all of this activity is important, and knowing where it occurs can be the most convenient basis for tracking. Knowing where something happens is of critical importance if we want to go there ourselves or send someone there, to find other information about the same place, or to inform people who live nearby. In addition, most (perhaps all) decisions have geographic consequences, e.g., adopting a particular funding formula creates geographic winners and losers, especially when the process entails zero-sum gains. Therefore geographic location is an important attribute of activities, policies, strategies, and plans. Geographic information systems are a special class of information systems that keep track not only of events, activities, and things, but also of where these events, activities, and things happen or exist.

Almost everything that happens, happens somewhere. Knowing where something happens can be critically important.

Because location is so important, it is an issue in many of the problems society must solve. Some of these are so routine that we almost fail to notice them – the daily question of which route to take to and from work, for example. Others are quite extraordinary occurrences, and require rapid, concerted, and coordinated responses by a wide range of individuals and organizations – such as the events of September 11 2001 in New York (Box 1.1). Problems that involve an aspect of location, either in the information used to solve them, or in the solutions themselves, are termed geographic problems. Here are some more examples:

■ Health care managers solve geographic problems (and may create others) when they decide where to locate new clinics and hospitals.
■ Delivery companies solve geographic problems when they decide the routes and schedules of their vehicles, often on a daily basis.
■ Transportation authorities solve geographic problems when they select routes for new highways.
■ Geodemographics consultants solve geographic problems when they assess and recommend where best to site retail outlets.
■ Forestry companies solve geographic problems when they determine how best to manage forests, where to cut, where to locate roads, and where to plant new trees.
■ National Park authorities solve geographic problems when they schedule recreational path maintenance and improvement (Figure 1.3).
■ Governments solve geographic problems when they decide how to allocate funds for building sea defenses.
■ Travelers and tourists solve geographic problems when they give and receive driving directions, select hotels in unfamiliar cities, and find their way around theme parks (Figure 1.4).
■ Farmers solve geographic problems when they employ new information technology to make better decisions about the amounts of fertilizer and pesticide to apply to different parts of their fields.

If so many problems are geographic, what distinguishes them from each other? Here are three bases for classifying geographic problems. First, there is the question of scale, or level of geographic detail. The architectural design of a building can present geographic problems, as in disaster management (Box 1.1), but only at a very detailed or local scale. The information needed to service the building is also local – the size and shape of the parcel, the vertical and subterranean extent of the building, the slope of the land, and its accessibility using normal and emergency infrastructure. The global diffusion of the 2003 severe acute respiratory syndrome (SARS) epidemic, or of bird flu in 2004, were problems at a much broader and coarser scale, involving information about entire national populations and global transport patterns.

Scale or level of geographic detail is an essential property of any GIS project.
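The observation that a GIS keeps track of where things happen, as well as what they are, can be illustrated with a minimal Python sketch. It is not taken from the book: the `Event` record, the `events_near` helper, the sample coordinates, and the spherical-Earth (haversine) distance are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


@dataclass(frozen=True)
class Event:
    """What happened, plus where it happened (hypothetical schema)."""
    name: str
    lat: float  # degrees north
    lon: float  # degrees east


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points on a spherical Earth."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def events_near(events, lat, lon, radius_km):
    """Return the events located within radius_km of the given point."""
    return [
        e for e in events
        if great_circle_km(e.lat, e.lon, lat, lon) <= radius_km
    ]


# Illustrative records only; the coordinates are approximate.
events = [
    Event("road closure", 40.7128, -74.0060),    # lower Manhattan
    Event("clinic opening", 40.7306, -73.9866),  # a few km to the north-east
    Event("flood warning", 51.5074, -0.1278),    # London
]

# A local-scale question: what is recorded within 5 km of this point?
nearby = events_near(events, 40.7127, -74.0134, radius_km=5.0)
print([e.name for e in nearby])
```

Changing `radius_km` changes the scale of the question being asked: a few kilometers suits a local emergency, while a continental or global problem would need a far larger search distance and, in practice, coarser data.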
Second, geographic problems can be distinguished on the basis of intent, or purpose. Some problems are strictly practical in nature – they must often be solved as quickly as possible and/or at minimum cost, in order to achieve such practical objectives as saving money, avoiding fines by regulators, or coping with an emergency. Others are better characterized as driven by human curiosity. When geographic data are used to verify the theory of continental drift, or to map distributions of glacial deposits, or to analyze the historic movements of people in anthropological or archaeological research (Box 1.2 and Figure 1.5), there is no sense of an immediate problem that needs to be solved – rather, the intent is the advancement of human understanding of the world, which we often recognize as the intent of science.

Although science and practical problem solving are often seen as distinct human activities, it is often argued that there is no longer any effective distinction between their methods. The tools and methods used by a scientist in a government agency to ensure the protection of an endangered species are essentially the same as the tools used by an academic ecologist to advance our scientific knowledge of biological systems. Both use the most accurate measurement devices, use terms whose meanings have been widely shared and agreed, insist that their results be replicable by others, and in general follow all of the principles of science that have evolved over the past centuries.

The use of GIS for both forms of activity certainly reinforces this idea that science and practical problem solving are no longer distinct in their methods, as does the fact that GIS is used widely in all kinds of organizations, from academic institutions to government agencies and corporations. The use of similar tools and methods right across science and problem solving is part of a shift from the pursuit of curiosity within traditional academic disciplines to solution-centered, interdisciplinary teamwork.

Applications Box 1.1

September 11 2001
Almost everyone remembers where they were when they learned of the terrorist atrocities in New York on September 11 2001. Location was crucial in the immediate aftermath and the emergency response, and the attacks had locational repercussions at a range of spatial

Figure 1.1 GIS in the Office of Emergency Management (OEM), first set up in the World Trade Center (WTC) complex immediately following the 2001 terrorist attacks on New York (Courtesy ESRI)
[Map label: Original OEM in WTC Complex, Bldg 7]




(geographic) and temporal (short, medium, and long time periods) scales. In the short term, the incidents triggered local emergency evacuation and disaster recovery procedures and global shocks to the financial system through the suspension of the New York Stock Exchange; in the medium term they blocked part of the New York subway system (that ran underneath the Twin Towers), profoundly changed regional work patterns (as affected workers became telecommuters) and had calamitous effects on the local retail economy; and in the


Figure 1.2 GIS usage in emergency management following the 2001 terrorist attacks on New York: (A) subway, pedestrian
and vehicular traffic restrictions; (B) telephone outages; and (C) surface dust monitoring three days after the disaster.
(Courtesy ESRI)

long term, they have profoundly changed the way that we think of emergency response in our heavily networked society. Figures 1.1 and 1.2 depict some of the ways in which GIS was used for emergency management in New York in the immediate aftermath of the attacks. But the events also have much wider implications for the handling and management of geographic information, that we return to in Chapter 20.

At some points in this book it will be useful to distinguish between applications of GIS that focus on design, or so-called normative uses, and applications that advance science, or so-called positive uses (a rather confusing meaning of that term, unfortunately, but the one commonly used by philosophers of science – its use implies that science confirms theories by finding positive evidence in support of them, and rejects theories when negative evidence is found). Finding new locations for retailers is an example of a normative application of GIS, with its focus on design. But in order to predict how consumers will respond to new locations it is necessary for retailers to analyze and model the actual patterns of behavior they exhibit. Therefore, the models they use will be grounded in observations of messy reality that have been tested in a positive manner.

With a single collection of tools, GIS is able to bridge the gap between curiosity-driven science and practical problem-solving.

Third, geographic problems can be distinguished on the basis of their time scale. Some decisions are operational, and are required for the smooth functioning of an organization, such as how to control electricity inputs into grids that experience daily surges and troughs in usage (see Section 10.6). Others are tactical, and concerned with medium-term decisions, such as where to cut trees in next year’s forest harvesting plan. Others are strategic, and are required to give an organization long-term direction, as when retailers decide to expand or rationalize their store networks (Figure 1.7). These terms are explored in the context of logistics applications of GIS in Section 2.3.4.6. The real world is somewhat more complex than this, of course, and these distinctions may blur – what is theoretically and statistically the 1000-year flood influences strategic and tactical considerations but may possibly arrive a year after the previous one! Other problems that interest geophysicists, geologists, or evolutionary biologists may occur on time scales that are much longer than a human lifetime, but are still geographic in nature, such as predictions about the future physical environment of Japan, or about the animal populations of Africa. Geographic databases are often transactional (see Sections 10.2.1 and 10.9.1), meaning

that they are constantly being updated as new information arrives, unlike maps, which stay the same once printed.

Chapter 2 contains a more detailed discussion of the range and remits of GIS applications, and a view of how GIS pervades many aspects of our daily lives. Other applications are discussed to illustrate particular principles, techniques, analytic methods, and management practices as these arise throughout the book.

Figure 1.3 Maintaining and improving footpaths in National Parks is a geographic problem

Figure 1.4 Navigating tourist destinations is a geographic problem

1.1.1 Spatial is special

The adjective geographic refers to the Earth’s surface and near-surface, and defines the subject matter of this book, but other terms have similar meaning. Spatial refers to any space, not only the space of the Earth’s surface, and it is used frequently in the book, almost always with the same meaning as geographic. But many of the methods used in GIS are also applicable to other non-geographic spaces, including the surfaces of other planets, the space of the cosmos, and the space of the human body that is captured by medical images. GIS techniques have even been applied to the analysis of genome sequences on DNA. So the discussion of analysis in this book is of spatial analysis (Chapters 14 and 15), not geographic analysis, to emphasize this versatility.

Another term that has been growing in usage in recent years is geospatial – implying a subset of spatial applied specifically to the Earth’s surface and near-surface. The former National Imagery and Mapping Agency was renamed as the National Geospatial-Intelligence Agency in late 2003 by the US President and the Web portal for US Federal Government data is called Geospatial One-Stop. In this book we have tended to avoid geospatial, preferring geographic, and spatial where we need to emphasize generality (see Section 21.2.2).

People who encounter GIS for the first time are sometimes driven to ask why geography is so important – why is spatial special? After all, there is plenty of information around about geriatrics, for example, and in principle one could create a geriatric information system. So why has geographic information spawned an entire industry, if geriatric information hasn’t to anything like the same extent? Why are there no courses in universities specifically in geriatric information systems? Part of the answer should be clear already – almost all human

Applications Box 1.2

Where did your ancestors come from?


As individuals, many of us are interested in where we came from – socially and geographically. Some of the best clues to our ancestry come from our (family) surnames, and Western surnames have different types of origins – many of which are explicitly or implicitly geographic in origin (such clues are less important in some Eastern societies where family histories are generally much better documented). Research at

University College London is using GIS and historic censuses and records to investigate the changing local and regional geographies of surnames within the UK since the late 19th century (Figure 1.5). This tells us quite a lot about migration, changes in local and regional economies, and even about measures of local economic health and vitality. Similar GIS-based analysis can be used to generalize about

Figure 1.5 The UK geography of the Longleys, the Goodchilds, the Maguires, and the Rhinds in (A) 1881 and (B) 1998 (Reproduced with permission of Daryl Lloyd)
[Panel (A): one map per surname (Longley, Goodchild, Maguire, Rhind), shaded by Surname Index in the classes 0–100, 101–150, 151–200, 201–250, 251–500, 501–1000, 1001–1500, and 1501–2000; scale bar in kilometres. Source: 1881 Census of Population]




the characteristics of international emigrants (for example to North America, Australia, and New Zealand: Figure 1.6), or the regional naming patterns of immigrants to the US from the Indian sub-continent or China. In all kinds of senses, this helps us understand our place in the world. Fundamentally, this is curiosity-driven research: it is interesting to individuals to understand more about their origins, and it is interesting to everyone with planning or policy concerns with any particular place to understand the social and cultural mix of people that live there. But it is not central to resolving any specific problem within a specific timescale.

Figure 1.5 (continued)
[Panel (B): one map per surname (Longley, Goodchild, Maguire, Rhind), shaded by the same Surname Index classes as panel (A); scale bar in kilometres. Source: 1998 Electoral Register]



Figure 1.6 The geography of British emigrants to Australia (bars beneath the horizontal line indicate low numbers of migrants to the corresponding destination) (Reproduced with permission of Daryl Lloyd)
[Map labels: Darwin (NT), Brisbane (QLD), Canberra (ACT), Perth (WA), Adelaide (SA), Sydney (NSW), Melbourne (VI), Hobart (TAS). Legend: surname index based on GB 1881 regions (North, Midlands, Scotland, SE, Wales, SW, Other); state population classes 47 612 – 150 000, 150 001 – 650 000, 650 001 – 1 500 000, and 1 500 001 – 2 615 975; scale bar in kilometres]

Figure 1.7 Store location principles are very important in the developing markets of Europe, as with Tesco’s successful investment in Budapest, Hungary

activities and decisions involve a geographic component, and the geographic component is important. Another reason will become apparent in Chapter 3 – working with geographic information involves complex and difficult choices that are also largely unique. Other, more-technical reasons will become clear in later chapters, and are briefly summarized in Box 1.3.

1.2 Data, information, evidence, knowledge, wisdom

Information systems help us to manage what we know, by making it easy to organize and store, access and retrieve, manipulate and synthesize, and apply knowledge to the solution of problems. We use a variety of terms to describe what we know, including the five that head this section and that are shown in Table 1.2. There are no universally agreed definitions of these terms, the first two of which are used frequently in the GIS arena. Nevertheless it is worth trying to come to grips with their various meanings, because the differences between them can often be significant, and what follows draws upon many sources, and thus provides the basis for the use of these terms throughout the book. Data clearly refers to the most mundane kind of information, and wisdom to the most substantive.

Data consist of numbers, text, or symbols which are in some sense neutral and almost context-free. Raw geographic facts (see Box 18.7), such as the temperature at a specific time and location, are examples of data. When data are transmitted, they are treated as a stream of bits; a crucial requirement is to preserve the integrity of the dataset. The internal meaning of the data is irrelevant in

Technical Box 1.3

Some technical reasons why geographic information is special

■ It is multidimensional, because two coordinates must be specified to define a location, whether they be x and y or latitude and longitude.
■ It is voluminous, since a geographic database can easily reach a terabyte in size (see Table 1.1).
■ It may be represented at different levels of spatial resolution, e.g., using a representation equivalent to a 1:1 million scale map and a 1:24 000 scale one (see Box 4.2).
■ It may be represented in different ways inside a computer (Chapter 3) and how this is done can strongly influence the ease of analysis and the end results.
■ It must often be projected onto a flat surface, for reasons identified in Section 5.7.
■ It requires many special methods for its analysis (see Chapters 14 and 15).
■ It can be time-consuming to analyze.
■ Although much geographic information is static, the process of updating is complex and expensive.
■ Display of geographic information in the form of a map requires the retrieval of large amounts of data.

such considerations. Data (the noun is the plural of datum) are assembled together in a database (see Chapter 10), and the volumes of data that are required for some typical applications are shown in Table 1.1.

The term information can be used either narrowly or broadly. In a narrow sense, information can be treated as devoid of meaning, and therefore as essentially synonymous with data, as defined in the previous paragraph. Others define information as anything which can be digitized, that is, represented in digital form (Chapter 3), but also argue that information is differentiated from data by implying some degree of selection, organization, and preparation for particular purposes – information is data serving some purpose, or data that have been given some degree of interpretation. Information is often costly to produce, but once digitized it is cheap to reproduce and distribute. Geographic datasets, for example, may be very expensive to collect and assemble, but very cheap to copy and disseminate. One other characteristic of information is that it is easy to add value to it through processing, and through merger with other information. GIS provides an excellent example of the latter, because of the tools it provides for combining information from different sources (Section 18.3).

GIS does a better job of sharing data and information than knowledge, which is more difficult to detach from the knower.

Knowledge does not arise simply from having access to large amounts of information. It can be considered as information to which value has been added by interpretation based on a particular context, experience, and purpose. Put simply, the information available in a book or on the Internet or on a map becomes knowledge only when it has been read and understood. How the information is interpreted and used will be different for different readers depending on their previous experience, expertise, and needs. It is important to distinguish two types of knowledge: codified and tacit. Knowledge is codifiable if it can be written down and transferred relatively easily to others. Tacit knowledge is often slow to acquire and much more difficult to transfer. Examples include the knowledge built up during an apprenticeship, understanding of how a particular market works, or familiarity with using a particular technology or language. This difference in transferability means that codified and tacit knowledge need to be managed and rewarded quite differently. Because of its nature, tacit knowledge is often a source of competitive advantage.

Some have argued that knowledge and information are fundamentally different in at least three important respects:

■ Knowledge entails a knower. Information exists independently, but knowledge is intimately related to people.

Table 1.1 Potential GIS database volumes for some typical applications (volumes estimated to the nearest order of magnitude). Strictly, bytes are counted in powers of 2 – 1 kilobyte is 1024 bytes, not 1000

Unit | Bytes | Typical application
1 megabyte | 1 000 000 | Single dataset in a small project database
1 gigabyte | 1 000 000 000 | Entire street network of a large city or small country
1 terabyte | 1 000 000 000 000 | Elevation of entire Earth surface recorded at 30 m intervals
1 petabyte | 1 000 000 000 000 000 | Satellite image of entire Earth surface at 1 m resolution
1 exabyte | 1 000 000 000 000 000 000 | A future 3-D representation of entire Earth at 10 m resolution?
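
The orders of magnitude in Table 1.1 can be checked with simple arithmetic. The sketch below does this for the terabyte and petabyte rows and also shows the gap between decimal and binary prefixes. It rests on two assumptions that are not in the text: a round figure of about 510 million square kilometres for the Earth's surface area, and two bytes stored per grid cell.

```python
import math

# Assumed round figure for the Earth's surface area (not from the text).
EARTH_SURFACE_M2 = 510e6 * 1e6   # about 5.1e14 square metres
# Assumed storage cost: one 16-bit value per grid cell.
BYTES_PER_CELL = 2


def grid_bytes(cell_size_m, bytes_per_cell=BYTES_PER_CELL):
    """Approximate storage for a global grid of square cells."""
    cells = EARTH_SURFACE_M2 / (cell_size_m ** 2)
    return cells * bytes_per_cell


def order_of_magnitude(n_bytes):
    """Nearest power of ten, the precision used in Table 1.1."""
    return round(math.log10(n_bytes))


elevation_30m = grid_bytes(30)   # elevation at 30 m intervals
image_1m = grid_bytes(1)         # imagery at 1 m resolution

# Decimal versus binary prefixes: 1 kilobyte is strictly 1024 bytes.
KB_DECIMAL, KB_BINARY = 10 ** 3, 2 ** 10
TB_DECIMAL, TB_BINARY = 10 ** 12, 2 ** 40

print(order_of_magnitude(elevation_30m), order_of_magnitude(image_1m))
print(TB_BINARY / TB_DECIMAL)
```

Under these assumptions the two grids come out at roughly 10^12 and 10^15 bytes, in line with the terabyte and petabyte rows, and a binary terabyte is about 10% larger than a decimal one.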
Table 1.2 A ranking of the support infrastructure for decision making

Decision-making support infrastructure | Ease of sharing with everyone | GIS example
Wisdom | Impossible | Policies developed and accepted by stakeholders
↑
Knowledge | Difficult, especially tacit knowledge | Personal knowledge about places and issues
↑
Evidence | Often not easy | Results of GIS analysis of many datasets or scenarios
↑
Information | Easy | Contents of a database assembled from raw facts
↑
Data | Easy | Raw geographic facts

■ Knowledge is harder to detach from the knower than information; shipping, receiving, transferring it between people, or quantifying it are all much more difficult than for information.
■ Knowledge requires much more assimilation – we digest it rather than hold it. While we may hold conflicting information, we rarely hold conflicting knowledge.

Evidence is considered a half way house between information and knowledge. It seems best to regard it as a multiplicity of information from different sources, related to specific problems and with a consistency that has been validated. Major attempts have been made in medicine to extract evidence from a welter of sometimes contradictory sets of information, drawn from worldwide sources, in what is known as meta-analysis, or the comparative analysis of the results of many previous studies.

Wisdom is even more elusive to define than the other terms. Normally, it is used in the context of decisions made or advice given which is disinterested, based on all the evidence and knowledge available, but given with some understanding of the likely consequences. Almost invariably, it is highly individualized rather than being easy to create and share within a group. Wisdom is in a sense the top level of a hierarchy of decision-making infrastructure.

1.3 The science of problem solving

How are problems solved, and are geographic problems solved any differently from other kinds of problems? We humans have accumulated a vast storehouse about the world, including information both on how it looks, or its forms, and how it works, or its dynamic processes. Some of those processes are natural and built into the design of the planet, such as the processes of tectonic movement that lead to earthquakes, and the processes of atmospheric circulation that lead to hurricanes. Others are human in origin, reflecting the increasing influence that we have on our natural environment, through the burning of fossil fuels, the felling of forests, and the cultivation of crops (Figure 1.8). Others are imposed by us, in the form of laws, regulations, and practices. For example, zoning regulations affect the ways in which specific parcels of land can be used.

Figure 1.8 Social processes, such as carbon dioxide emissions, modify the Earth’s environment

Knowledge about how the world works is more valuable than knowledge about how it looks, because such knowledge can be used to predict.

These two types of information differ markedly in their degree of generality. Form varies geographically, and the Earth’s surface looks dramatically different in different places – compare the settled landscape of northern England with the deserts of the US Southwest (Figure 1.9). But processes can be very general. The ways in which the burning of fossil fuels affects the atmosphere are essentially the same in China as in Europe, although the two landscapes look very different. Science has always valued such general knowledge over knowledge of the specific, and hence has valued process knowledge over knowledge of form. Geographers in particular have witnessed a longstanding debate, lasting

Figure 1.9 The form of the Earth’s surface shows enormous variability, for example, between the deserts of the southwest USA and
the settled landscape of northern England

centuries, between the competing needs of idiographic
geography, which focuses on the description of form
and emphasizes the unique characteristics of places,
and nomothetic geography, which seeks to discover
general processes. Both are essential, of course, since
knowledge of general process is only useful in solving
specific problems if it can be combined effectively with
knowledge of form. For example, we can only assess the
impact of soil erosion on agriculture in New South Wales
if we know both how soil erosion is generally impacted
by such factors as slope and specifically how much of
New South Wales has steep slopes, and where they are
located (Figure 1.10).
One of the most important merits of GIS as a tool
for problem solving lies in its ability to combine the
general with the specific, as in this example from New
South Wales. A GIS designed to solve this problem would
contain knowledge of New South Wales’s slopes, in the
form of computerized maps, and the programs executed
by the GIS would reflect general knowledge of how
slopes affect soil erosion. The software of a GIS captures
and implements general knowledge, while the database
of a GIS represents specific information. In that sense
a GIS resolves the old debate between nomothetic and
idiographic camps, by accommodating both.
GIS solves the ancient problem of combining
general scientific knowledge with specific
information, and gives practical value to both.
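
The division of labour described above, with general knowledge in the software and specific information in the database, can be sketched in a few lines of Python. The function stands for the general rule and the list of parcels stands for the database. As an illustration the rule is the multiplicative form of the Universal Soil Loss Equation (A = R × K × LS × C × P), which is mentioned later in this section. The parcel names and every factor value are hypothetical, chosen only to make the example run.

```python
from dataclasses import dataclass


@dataclass
class Parcel:
    """Specific information: one record in a hypothetical GIS database."""
    name: str
    r: float   # rainfall erosivity factor
    k: float   # soil erodibility factor
    ls: float  # slope length and steepness factor
    c: float   # cover management factor
    p: float   # support practice factor


def usle_soil_loss(parcel):
    """General knowledge: USLE estimate A = R * K * LS * C * P."""
    return parcel.r * parcel.k * parcel.ls * parcel.c * parcel.p


# Hypothetical parcels; the values are illustrative, not measured.
parcels = [
    Parcel("gentle slope", r=100.0, k=0.3, ls=0.5, c=0.2, p=1.0),
    Parcel("steep slope", r=100.0, k=0.3, ls=4.0, c=0.2, p=1.0),
]

# Apply the same general rule to every specific record.
losses = {parcel.name: usle_soil_loss(parcel) for parcel in parcels}
worst = max(losses, key=losses.get)
print(losses, worst)
```

The same function could be applied unchanged to parcels anywhere; only the records would differ from place to place.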
General knowledge comes in many forms. Classification is perhaps the simplest and most rudimentary, and is widely used in geographic problem solving. In many parts of the USA and other countries efforts have been made to limit development of wetlands, in the interests of preserving them as natural habitats and avoiding excessive impact on water resources. To support these efforts, resources have been invested in mapping wetlands, largely from aerial photography and satellite imagery. These maps simply classify land, using established rules that define what is and what is not a wetland (Figure 1.11).

Figure 1.10 Predicting landslides requires general knowledge of processes and specific knowledge of the area – both are available in a GIS (Reproduced with permission of PhotoDisc, Inc.)

More sophisticated forms of knowledge include rule sets – for example, rules that determine what use can be made of wetlands, or what areas in a forest can be legally logged. Rules are used by the US Forest Service

Figure 1.11 A wetland map of part of Erie County, Ohio, USA. The map has been made by classifying Landsat imagery at 30 m
resolution. Brown = woods on hydric soil, dark blue = open water (excludes Lake Erie), green = shallow marsh, light blue =
shrub/scrub wetland, blue-green = wet meadow, pink = farmed wetland. Source: Ohio Department of Natural Resources,
www.dnr.state.oh.us

to define wilderness, and to impose associated regulations regarding the use of wilderness, including prohibition on logging and road construction.

Much of the knowledge gathered by the activities of scientists suggests the term law. The work of Sir Isaac Newton established the Laws of Motion, according to which all matter behaves in ways that can be perfectly predicted. From Newton’s laws we are able to predict the motions of the planets almost perfectly, although Einstein later showed that certain observed deviations from the predictions of the laws could be explained with his Theory of Relativity. Laws of this level of predictive quality are few and far between in the geographic world of the Earth’s surface. The real world is the only geographic-scale ‘laboratory’ that is available for most GIS applications, and considerable uncertainty is generated when we are unable to control for all conditions. These problems are compounded in the socioeconomic realm, where the role of human agency makes it almost inevitable that any attempt to develop rigid laws will be frustrated by isolated exceptions. Thus, while market researchers use spatial interaction models, in conjunction with GIS, to predict how many people will shop at each shopping center in a city, substantial errors will occur in the predictions. Nevertheless the results are of great value in developing location strategies for retailing. The Universal Soil Loss Equation, used by soil scientists in conjunction with GIS to predict soil erosion, is similar in its relatively low predictive power, but again the results are sufficiently accurate to be very useful in the right circumstances.

Solving problems involves several distinct components and stages. First, there must be an objective, or a goal that the problem solver wishes to achieve. Often this is a desire to maximize or minimize – find the solution of least cost, or shortest distance, or least time, or greatest profit; or to make the most accurate prediction possible. These objectives are all expressed in tangible form, that is, they can be measured on some well-defined scale. Others are said to be intangible, and involve objectives that are much harder, if not impossible to measure. They include maximizing quality of life and satisfaction, and minimizing environmental impact. Sometimes the only way to work with such intangible objectives is to involve human subjects, through surveys or focus groups, by asking them to express a preference among alternatives. A large body of knowledge has been acquired about such human-subjects research, and much of it has been employed in connection with GIS. For an example of the use of such mixed objectives see Section 16.4.

Often a problem will have multiple objectives. For example, a company providing a mobile snack service to construction sites will want to maximize the number of sites that can be visited during a daily operating schedule, and will also want to maximize the expected returns by visiting the most lucrative sites. An agency charged with locating a corridor for a new power transmission line may decide to minimize cost, while at the same time seeking to minimize environmental impact. Such problems employ methods known as multicriteria decision making (MCDM).
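
One simple way to weigh two objectives against each other is a weighted sum of rescaled scores, sketched below for the transmission-line case. This is only one of many MCDM techniques and is not necessarily the one the authors have in mind. The corridor names, costs, impact scores, and weights are all hypothetical.

```python
# Hypothetical candidate corridors for a power transmission line.
# cost is in arbitrary money units; impact is an arbitrary 0-100 score.
corridors = {
    "north": {"cost": 12.0, "impact": 70.0},
    "central": {"cost": 15.0, "impact": 40.0},
    "south": {"cost": 20.0, "impact": 10.0},
}


def rescale(values):
    """Min-max rescale to 0..1 so that unlike units can be combined."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {name: 0.0 for name in values}
    return {name: (v - lo) / (hi - lo) for name, v in values.items()}


def weighted_scores(options, weights):
    """Weighted sum of rescaled criteria; lower is better for both."""
    scaled = {
        crit: rescale({name: attrs[crit] for name, attrs in options.items()})
        for crit in weights
    }
    return {
        name: sum(weights[crit] * scaled[crit][name] for crit in weights)
        for name in options
    }


# The weights encode a hypothetical stakeholder judgement.
scores = weighted_scores(corridors, {"cost": 0.5, "impact": 0.5})
best = min(scores, key=scores.get)
print(scores, best)
```

Changing the weights changes the preferred corridor, which is the practical meaning of objectives that cannot be expressed in commensurate terms: the trade-off has to be supplied by people, not derived from the data.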
Many geographic problems involve multiple goals and objectives, which often cannot be expressed in commensurate terms.

1.4 The technology of problem solving

The previous sections have presented GIS as a technology to support both science and problem solving, using both specific and general knowledge about geographic reality. GIS has now been around for so long that it is, in many senses, a background technology, like word processing. This may well be so, but what exactly is this technology called GIS, and how does it achieve its objectives? In what ways is GIS more than a technology, and why does it continue to attract such attention as a topic for scientific journals and conferences?

Many definitions of GIS have been suggested over the years, and none of them is entirely satisfactory, though many suggest much more than a technology. Today, the label GIS is attached to many things: amongst them, a software product that one can buy from a vendor to carry out certain well-defined functions (GIS software); digital representations of various aspects of the geographic world, in the form of datasets (GIS data); a community of people who use and perhaps advocate the use of these tools for various purposes (the GIS community); and the activity of using a GIS to solve problems or advance science (doing GIS). The basic label works in all of these ways, and its meaning surely depends on the context in which it is used.

Nevertheless, certain definitions are particularly helpful (Table 1.3). As we describe in Chapter 3, GIS is much more than a container of maps in digital form. This can be a misleading description, but it is a helpful definition to give to someone looking for a simple explanation – a guest at a cocktail party, a relative, or a seat neighbor on an airline flight. We all know and appreciate the value of maps, and the notion that maps could be processed by a computer is clearly analogous to the use of word processing or spreadsheets to handle other types of information. A GIS is also a computerized tool for solving geographic problems, a definition that speaks to the purposes of GIS, rather than to its functions or physical form – an idea that is expressed in another definition, a spatial decision support system. A GIS is a mechanized inventory of geographically distributed features and facilities, the definition that explains the value of GIS to the utility industry, where it is used to keep track of such entities as underground pipes, transformers, transmission lines, poles, and customer accounts. A GIS is a tool for revealing what is otherwise invisible in geographic information (see Section 2.3.4.4), an interesting definition that emphasizes the power of a GIS as an analysis engine, to examine data and reveal its patterns, relationships, and anomalies – things that might not be apparent to someone looking at a map. A GIS is a tool for performing operations on geographic data that are too tedious or expensive or inaccurate if performed by hand, a definition that speaks to the problems associated with manual analysis of maps, particularly the extraction of simple measures, of area for example.

Table 1.3 Definitions of a GIS, and the groups who find them useful

Definition | Groups who find it useful
A container of maps in digital form | The general public
A computerized tool for solving geographic problems | Decision makers, community groups, planners
A spatial decision support system | Management scientists, operations researchers
A mechanized inventory of geographically distributed features and facilities | Utility managers, transportation officials, resource managers
A tool for revealing what is otherwise invisible in geographic information | Scientists, investigators
A tool for performing operations on geographic data that are too tedious or expensive or inaccurate if performed by hand | Resource managers, planners

Everyone has their own favorite definition of a GIS, and there are many to choose from.

1.4.1 A brief history of GIS

As might be expected, there is some controversy about the history of GIS since parallel developments occurred in North America, Europe, and Australia (at least). Much of the published history focuses on the US contributions. We therefore do not yet have a well-rounded history of our subject. What is clear, though, is that the extraction of simple measures largely drove the development of the first real GIS, the Canada Geographic Information System or CGIS, in the mid-1960s (see Box 17.1). The Canada Land Inventory was a massive effort by the federal and provincial governments to identify the nation’s land resources and their existing and potential uses. The most useful results of such an inventory are measures of area, yet area is notoriously difficult to measure accurately from a map (Section 14.3). CGIS was planned and developed as a measuring tool, a producer of tabular information, rather than as a mapping tool.

The first GIS was the Canada Geographic Information System, designed in the mid-1960s as a computerized map measuring system.

A second burst of innovation occurred in the late 1960s in the US Bureau of the Census, in planning the
tools needed to conduct the 1970 Census of Population. The DIME program (Dual Independent Map Encoding) created digital records of all US streets, to support automatic referencing and aggregation of census records. The similarity of this technology to that of CGIS was recognized immediately, and led to a major program at Harvard University’s Laboratory for Computer Graphics and Spatial Analysis to develop a general-purpose GIS that could handle the needs of both applications – a project that led eventually to the ODYSSEY GIS of the late 1970s.

Early GIS developers recognized that the same basic needs were present in many different application areas, from resource management to the census.

In a largely separate development during the latter half of the 1960s, cartographers and mapping agencies had begun to ask whether computers might be adapted to their needs, and possibly to reducing the costs and shortening the time of map creation. The UK Experimental Cartography Unit (ECU) pioneered high-quality computer mapping in 1968; it published the world’s first computer-made map in a regular series in 1973 with the British Geological Survey (Figure 1.12); the ECU also pioneered GIS work in education, post and zip codes as geographic references, visual perception of maps, and much else. National mapping agencies, such as Britain’s Ordnance Survey, France’s Institut Géographique National, and the US Geological Survey and the Defense Mapping Agency (now the National Geospatial-Intelligence Agency) began to investigate the use of computers to support the editing of maps, to avoid the expensive and slow process of hand correction and redrafting. The first automated cartography developments occurred in the 1960s, and by the late 1970s most major cartographic agencies were already computerized to some degree. But the magnitude of the task ensured that it was not until 1995 that the first country (Great Britain) achieved complete digital map coverage in a database.

Remote sensing also played a part in the development of GIS, as a source of technology as well as a source of data. The first military satellites of the 1950s were developed and deployed in great secrecy to gather intelligence, but the declassification of much of this material in recent years has provided interesting insights into the role played by the military and intelligence communities in the development of GIS. Although the early spy satellites used conventional film cameras to record images, digital remote sensing began to replace them in the 1960s, and by the early 1970s civilian remote sensing systems such as Landsat were beginning to provide vast new data resources on the appearance of the planet’s surface from space, and to exploit the technologies of image classification and pattern recognition that had been developed earlier for military applications. The military was also responsible for the development in the 1950s of the world’s first uniform system of measuring location, driven by the need for accurate targeting of intercontinental ballistic missiles, and this development led directly to the methods of

Figure 1.12 Section of the 1:63 360 scale geological map of Abingdon – the first known example of a map produced by automated
means and published in a standard map series to established cartographic standards. (Reproduced by permission of the British
Geological Survey and Ordnance Survey © NERC. All rights reserved. IPR/59-13C)
positional control in use today (Section 5.6). Military needs were also responsible for the initial development of the Global Positioning System (GPS; Section 5.8).

Many technical developments in GIS originated in the Cold War.

GIS really began to take off in the early 1980s, when the price of computing hardware had fallen to a level that could sustain a significant software industry and cost-effective applications. Among the first customers were forestry companies and natural-resource agencies, driven by the need to keep track of vast timber resources, and to regulate their use effectively. At the time a modest computing system – far less powerful than today’s personal computer – could be obtained for about $250 000, and the associated software for about $100 000. Even at these prices the benefits of consistent management using GIS, and the decisions that could be made with these new tools, substantially exceeded the costs. The market for GIS software continued to grow, computers continued to fall in price and increase in power, and the GIS software industry has been growing ever since.

The modern history of GIS dates from the early 1980s, when the price of sufficiently powerful computers fell below a critical threshold.

As indicated earlier, the history of GIS is a complex story, much more complex than can be described in this brief history, but Table 1.4 summarizes the major events of the past three decades.

1.4.2 Views of GIS

It should be clear from the previous discussion that GIS is a complex beast, with many distinct appearances. To some it is a way to automate the production of maps, while to others this application seems far too mundane compared to the complexities associated with solving geographic problems and supporting spatial decisions, and with the power of a GIS as an engine for analyzing data and revealing new insights. Others see a GIS as a tool for maintaining complex inventories, one that adds geographic perspectives to existing information systems, and allows the geographically distributed resources of a forestry or utility company to be tracked and managed. The sum of all of these perspectives is clearly too much for any one software package to handle, and GIS has grown from its initial commercial beginnings as a simple off-the-shelf package to a complex of software, hardware, people, institutions, networks, and activities that can be very confusing to the novice. A major software vendor such as ESRI today sells many distinct products, designed to serve very different needs: a major GIS workhorse (ArcInfo), a simpler system designed for viewing, analyzing, and mapping data (ArcView), an engine for supporting GIS-oriented websites (ArcIMS), an information system with spatial extensions (ArcSDE), and several others. Other vendors specialize in certain niche markets, such as the utility industry, or military and intelligence applications. GIS is a dynamic and evolving field, and its future is sure to be exciting, but speculations on where it might be headed are reserved for the final chapter.

Today a single GIS vendor offers many different products for distinct applications.

1.4.3 Anatomy of a GIS

1.4.3.1 The network

Despite the complexity noted in the previous section, a GIS does have its well-defined component parts. Today, the most fundamental of these is probably the network, without which no rapid communication or sharing of digital information could occur, except between a small group of people crowded around a computer monitor. GIS today relies heavily on the Internet, and on its limited-access cousins, the intranets of corporations, agencies, and the military. The Internet was originally designed as a network for connecting computers, but today it is rapidly becoming society’s mechanism of information exchange, handling everything from personal messages to massive shipments of data, and increasing numbers of business transactions.

It is no secret that the Internet in its many forms has had a profound effect on technology, science, and society in the last few years. Who could have foreseen in 1990 the impact that the Web, e-commerce, digital government, mobile systems, and information and communication technologies would have on our everyday lives (see Section 18.4.4)? These technologies have radically changed forever the way we conduct business, how we communicate with our colleagues and friends, the nature of education, and the value and transitory nature of information.

The Internet began life as a US Department of Defense communications project called ARPANET (Advanced Research Projects Agency Network) in 1972. In 1980 Tim Berners-Lee, a researcher at CERN, the European organization for nuclear research, developed the hypertext capability that underlies today’s World Wide Web – a key application that has brought the Internet into the realm of everyday use. Uptake and use of Web technologies have been remarkably quick, diffusion being considerably faster than almost all comparable innovations (for example, the radio, the telephone, and the television: see Figure 18.5). By 2004, 720 million people worldwide used the Internet (see Section 18.4.4 and Figure 18.8), and the fastest growth rates were to be found in the Middle East, Latin America, and Africa (www.internetworldstats.com). However, the global penetration of the medium remained very uneven – for example 62% of North Americans used the medium, but only 1% of Africans (Figure 1.13). Other Internet maps are available at the Atlas of Cybergeography maintained by Martin Dodge (www.geog.ucl.ac.uk/casa/martin/atlas/atlas.html).

Geographers were quick to see the value of the Internet. Users connected to the Internet could zoom in to parts of the map, or pan to other parts, using simple
Table 1.4 Major events that shaped GIS

Date | Type | Event | Notes

The Era of Innovation

1957 | Application | First known automated mapping produced | Swedish meteorologists and British biologists
1963 | Technology | CGIS development initiated | Canada Geographic Information System is developed by Roger Tomlinson and colleagues for Canadian Land Inventory. This project pioneers much technology and introduces the term GIS.
1963 | General | URISA established | The Urban and Regional Information Systems Association founded in the US. Soon becomes point of interchange for GIS innovators.
1964 | Academic | Harvard Lab established | The Harvard Laboratory for Computer Graphics and Spatial Analysis is established under the direction of Howard Fisher at Harvard University. In 1966 SYMAP, the first raster GIS, is created by Harvard researchers.
1967 | Technology | DIME developed | The US Bureau of Census develops DIME-GBF (Dual Independent Map Encoding – Geographic Database Files), a data structure and street-address database for 1970 census.
1967 | Academic and general | UK Experimental Cartography Unit (ECU) formed | Pioneered in a range of computer cartography and GIS areas.
1969 | Commercial | ESRI Inc. formed | Jack Dangermond, a student from the Harvard Lab, and his wife Laura form ESRI to undertake projects in GIS.
1969 | Commercial | Intergraph Corp. formed | Jim Meadlock and four others that worked on guidance systems for Saturn rockets form M&S Computing, later renamed Intergraph.
1969 | Academic | ‘Design With Nature’ published | Ian McHarg’s book was the first to describe many of the concepts in modern GIS analysis, including the map overlay process (see Chapter 14).
1969 | Academic | First technical GIS textbook | Nordbeck and Rystedt’s book detailed algorithms and software they developed for spatial analysis.
1972 | Technology | Landsat 1 launched | Originally named ERTS (Earth Resources Technology Satellite), this was the first of many major Earth remote sensing satellites to be launched.
1973 | General | First digitizing production line | Set up by Ordnance Survey, Britain’s national mapping agency.
1974 | Academic | AutoCarto 1 Conference | Held in Reston, Virginia, this was the first in an important series of conferences that set the GIS research agenda.
1976 | Academic | GIMMS now in worldwide use | Written by Tom Waugh (a Scottish academic), this vector-based mapping and analysis system was run at 300 sites worldwide.
1977 | Academic | Topological Data Structures conference | Harvard Lab organizes a major conference and develops the ODYSSEY GIS.

The Era of Commercialization

1981 | Commercial | ArcInfo launched | ArcInfo was the first major commercial GIS software system. Designed for minicomputers and based on the vector and relational database data model, it set a new standard for the industry.

20 PART I INTRODUCTION

1984 | Academic | ‘Basic Readings in Geographic Information Systems’ published | This collection of papers published in book form by Duane Marble, Hugh Calkins, and Donna Peuquet was the first accessible source of information about GIS.
1985 | Technology | GPS operational | The Global Positioning System gradually becomes a major source of data for navigation, surveying, and mapping.
1986 | Academic | ‘Principles of Geographical Information Systems for Land Resources Assessment’ published | Peter Burrough’s book was the first specifically on GIS principles. It quickly became a worldwide reference text for GIS students.
1986 | Commercial | MapInfo Corp. formed | MapInfo software develops into first major desktop GIS product. It defined a new standard for GIS products, complementing earlier software systems.
1987 | Academic | International Journal of Geographical Information Systems, now IJGI Science, introduced | Terry Coppock and others published the first journal on GIS. The first issue contained papers from the USA, Canada, Germany, and UK.
1987 | General | Chorley Report | ‘Handling Geographical Information’ was an influential report from the UK government that highlighted the value of GIS.
1988 | General | GISWorld begins | GISWorld, now GeoWorld, the first worldwide magazine devoted to GIS, was published in the USA.
1988 | Technology | TIGER announced | TIGER (Topologically Integrated Geographic Encoding and Referencing), a follow-on from DIME, is described by the US Census Bureau. Low-cost TIGER data stimulate rapid growth in US business GIS.
1988 | Academic | US and UK Research Centers announced | Two separate initiatives, the US NCGIA (National Center for Geographic Information and Analysis) and the UK RRL (Regional Research Laboratory) Initiative show the rapidly growing interest in GIS in academia.
1991 | Academic | Big Book 1 published | Substantial two-volume compendium Geographical Information Systems; principles and applications, edited by David Maguire, Mike Goodchild, and David Rhind documents progress to date.
1992 | Technical | DCW released | The 1.7 GB Digital Chart of the World, sponsored by the US Defense Mapping Agency, (now NGA), is the first integrated 1:1 million scale database offering global coverage.
1994 | General | Executive Order signed by President Clinton | Executive Order 12906 leads to creation of US National Spatial Data Infrastructure (NSDI), clearinghouses, and Federal Geographic Data Committee (FGDC).
1994 | General | OpenGIS Consortium born | The OpenGIS Consortium of GIS vendors, government agencies, and users is formed to improve interoperability.
1995 | General | First complete national mapping coverage | Great Britain’s Ordnance Survey completes creation of its initial database – all 230 000 maps covering country at largest scale (1:1250, 1:2500 and 1:10 000) encoded.
1996 | Technology | Internet GIS products introduced | Several companies, notably Autodesk, ESRI, Intergraph, and MapInfo, release new generation of Internet-based products at about the same time.

1996 | Commercial | MapQuest | Internet mapping service launched, producing over 130 million maps in 1999. Subsequently purchased by AOL for $1.1 billion.
1999 | General | GIS Day | First GIS Day attracts over 1.2 million global participants who share an interest in GIS.

The Era of Exploitation

1999 | Commercial | IKONOS | Launch of new generation of satellite sensors: IKONOS claims 90 centimeter ground resolution; Quickbird (launched 2001) claims 62 cm resolution.
2000 | Commercial | GIS passes $7 bn | Industry analyst Daratech reports GIS hardware, software, and services industry at $6.9 bn, growing at more than 10% per annum.
2000 | General | GIS has 1 million users | GIS has more than 1 million core users, and there are perhaps 5 million casual users of GI.
2002 | General | Launch of online National Atlas of the United States | Online summary of US national-scale geographic information with facilities for map making (www.nationalatlas.gov)
2003 | General | Launch of online national statistics for the UK | Exemplar of new government websites describing economy, population, and society at local and regional scales (www.statistics.gov.uk)
2003 | General | Launch of Geospatial One-Stop | A US Federal E-government initiative providing access to geospatial data and information (www.geodata.gov/gos)
2004 | General | National Geospatial-Intelligence Agency (NGA) formed | Biggest GIS user in the world, National Imagery and Mapping Agency (NIMA), renamed NGA to signify emphasis on geo-intelligence

mouse clicks in their desktop WWW browser, without ever needing to install specialized software or download large amounts of data. This research project soon gave way to industrial-strength Internet GIS software products from mainstream software vendors (see Chapter 7).

The use of the WWW to give access to maps dates from 1993.

The recent histories of GIS and the Internet have been heavily intertwined; GIS has turned out to be a compelling application that has prompted many people to take advantage of the Web. At the same time, GIS has benefited greatly from adopting the Internet paradigm and the momentum that the Web has generated. Today there are many successful applications of GIS on the Internet, and we have used some of them as examples and illustrations at many points in this book. They range from using GIS on the Internet to disseminate information – a type of electronic yellow pages – (e.g., www.yell.com), to selling goods and services (e.g., www.landseer.com.sg, Figure 1.14), to direct revenue generation through subscription services (e.g., www.mapquest.com/solutions/main.adp), to helping members of the public to participate in important local, regional, and national debates.
The Internet has proven very popular as a vehicle for delivering GIS applications for several reasons. It is an established, widely used platform and accepted standard for interacting with information of many types. It also offers a relatively cost-effective way of linking together distributed users (for example, telecommuters and office workers, customers and suppliers, students and teachers). The interactive and exploratory nature of navigating linked information has also been a great hit with users. The availability of geographically enabled multi-content site gateways (geoportals) with powerful search engines has been a stimulus to further success. Internet technology is also increasingly portable – this means not only that portable GIS-enabled devices can be used in conjunction with the wireless networks available in public places such as airports and railway stations, but also that such devices may be connected through broadband in order to deliver GIS-based representations on the move. This technology is being exploited in the burgeoning GIService (yet another use of the three-letter acronym GIS) sector, which offers distributed users access to centralized GIS capabilities. Later (Chapter 18 and onwards) we use the term g-business to cover all the myriad applications carried out in enterprises in different sectors that have a strong geographical component. The

Figure 1.13 (A) The density of Internet hosts (routers) in 2002, a useful surrogate for Internet activity. The bar next to the map
gives the range of values encoded by the color code per box (pixel) in the map. (B) This can be compared with the density of
population, showing a strong correlation with Internet access in economically developed countries: elsewhere Internet access is
sparse and is limited to urban areas. Both maps have a resolution of 1° × 1°. (Courtesy Yook S.-H., Jeong H. and Barabási A.-L.
2002. ‘Modeling the Internet’s large-scale topology,’ Proceedings of the National Academy of Sciences 99, 13382–13386. See
www.nd.edu/~networks/PDF/Modeling%202002.pdf) (Reproduced with permission of National Academy of Sciences, USA)

more restrictive term g-commerce is also used to describe types of electronic commerce (e-commerce) that include location as an essential element. Many GIServices are made available for personal use through mobile and handheld applications as location-based services (see Chapter 11). Personal devices, from pagers to mobile phones to Personal Digital Assistants, are now filling the briefcases and adorning the clothing of people in many walks of life (Figure 1.15). These devices are able to provide real-time geographic services such as mapping, routing, and geographic yellow pages. These services are often funded through advertisers, or can be purchased on a pay-as-you-go or subscription basis, and are beginning to change the business GIS model for many types of applications.
A further interesting twist is the development of themed geographic networks, such as the US Geospatial One-Stop (www.geo-one-stop.gov/: see Box 11.4), which is one of 24 federal e-government initiatives to improve the coordination of government at local, state, and Federal levels. Its geoportal (www.geodata.gov/gos) identifies an integrated collection of geographic information providers and users that interact via the medium of the Internet. On-line content can be located using the interactive search capability of the portal and then content can be directly used over the Internet. This form of Internet application is explored further in Chapter 11.

The Internet is increasingly integrated into many aspects of GIS use, and the days of standalone GIS are mostly over.

1.4.3.2 The other five components of the GIS anatomy

The second piece of the GIS anatomy (Figure 1.16) is the user’s hardware, the device that the user interacts with

directly in carrying out GIS operations, by typing, pointing, clicking, or speaking, and which returns information by displaying it on the device’s screen or generating meaningful sounds. Traditionally this device sat on an office desktop, but today’s user has much more freedom, because GIS functions can be delivered through laptops, personal digital assistants (PDAs), in-vehicle devices, and even cellular telephones. Section 11.3 discusses the currently available technologies in greater detail. In the language of the network, the user’s device is the client, connected through the network to a server that is probably handling many other user clients simultaneously. The client may be thick, if it performs a large part of the work locally, or thin if it does little more than link the user to the server. A PC or Macintosh is an instance of a thick client, with powerful local capabilities, while devices attached to TVs that offer little more than Web browser capabilities are instances of thin clients.

Figure 1.14 Niche marketing of residential property in Singapore (www.landseer.com.sg) (Reproduced with permission of Landseer Property Services Pte Ltd.)

Figure 1.15 Wearable computing and personal digital assistants are key to the diffusion and use of location-based services

The third piece of the GIS anatomy is the software that runs locally in the user’s machine. This can be as simple as a standard Web browser (Microsoft Explorer or Netscape) if all work is done remotely using assorted digital services offered on large servers. More likely it is a package bought from one of the GIS vendors, such as Intergraph Corp. (Huntsville, Alabama, USA; www.ingr.com), Environmental Systems Research Institute (ESRI; Redlands, California,

USA; www.esri.com), Autodesk Inc. (San Rafael, California, USA; www.autodesk.com), or MapInfo Corp. (Troy, New York, USA; www.mapinfo.com). Each vendor offers a range of products, designed for different levels of sophistication, different volumes of data, and different application niches. IDRISI (Clark University, Worcester, Massachusetts, USA, www.clarklabs.org) is an example of a GIS produced and marketed by an academic institution rather than by a commercial vendor.

Figure 1.16 The six component parts of a GIS (people, software, data, hardware, procedures, network)

Many GIS tasks must be performed repeatedly, and GIS designers have created tools for capturing such repeated sequences into easily executed scripts or macros (Section 16.3.1). For example, the agency that needs to predict erosion of New South Wales’s soils (Section 1.3) would likely establish a standard script written in the scripting language of its favorite GIS. The instructions in the script would tell the GIS how to model erosion given required data inputs and parameters, and how to output the results in suitable form. Scripts can be used repeatedly, for different areas or for the same area at different times. Support for scripts is an important aspect of GIS software.
GIS software can range from a simple package designed for a PC and costing a few hundred dollars, to a major industrial-strength workhorse designed to serve an entire enterprise of networked computers, and costing tens of thousands of dollars. New products are constantly emerging, and it is beyond the scope of this book to provide a complete inventory.
The fourth piece of the anatomy is the database, which consists of a digital representation of selected aspects of some specific area of the Earth’s surface or near-surface, built to serve some problem solving or scientific purpose. A database might be built for one major project, such as the location of a new high-voltage power transmission corridor, or it might be continuously maintained, fed by the daily transactions that occur in a major utility company (installation of new underground pipes, creation of new customer accounts, daily service crew activities). It might be as small as a few megabytes (a few million bytes, easily stored on a few diskettes), or as large as a terabyte (roughly a trillion bytes, occupying a storage unit as big as a small office). Table 1.1 gives some sense of potential GIS database volumes.

GIS databases can range in size from a megabyte to a petabyte.

In addition to these four components – network, hardware, software, and database – a GIS also requires management. An organization must establish procedures, lines of reporting, control points, and other mechanisms for ensuring that its GIS activities stay within budgets, maintain high quality, and generally meet the needs of the organization. These issues are explored in Chapters 18, 19, and 20.
Finally, a GIS is useless without the people who design, program, and maintain it, supply it with data, and interpret its results. The people of GIS will have various skills, depending on the roles they perform. Almost all will have the basic knowledge needed to work with geographic data – knowledge of such topics as data sources, scale and accuracy, and software products – and will also have a network of acquaintances in the GIS community. We refer to such people in this book as spatially aware professionals, or SAPs, and the humor in this term is not intended in any way to diminish their importance, or our respect for what they know – after all, we would like to be recognized as SAPs ourselves! The next section outlines some of the roles played by the people of GIS, and the industries in which they work.

1.5 The business of GIS

Very many people play many roles in GIS, from software development to software sales, and from teaching about GIS to using its power in everyday activities. GIS is big business, and this section looks at the diverse roles that people play in the business of GIS, and is organized by the major areas of human activity associated with GIS.

1.5.1 The software industry

Perhaps the most conspicuous sector, although by no means the largest either in economic or human terms, is the GIS software industry. Some GIS vendors have their roots in other, larger computer applications: thus Intergraph and Autodesk have roots in computer-assisted design software developed for engineering and architectural applications; and Leica Geosystems (ERDAS IMAGINE: gis.leica-geosystems.com) and PCI (www.pcigeomatics.com) have roots in remote sensing and image processing. Others began as specialists in GIS. Measured in economic terms, the GIS software industry currently accounts for over $1.8 billion in annual sales, although estimates vary, in part because of the
difficulty of defining GIS precisely. The software industry employs several thousand programmers, software designers, systems analysts, application specialists, and sales staff, with backgrounds that include computer science, geography, and many other disciplines.

The GIS software industry accounts for about $1.8 billion in annual sales.

1.5.2 The data industry

The acquisition, creation, maintenance, dissemination, and sale of GIS data also account for a large volume of economic activity. Traditionally, a large proportion of GIS data have been produced centrally, by national mapping agencies such as Great Britain’s Ordnance Survey. In most countries the funds needed to support national mapping come from sales of products to customers, and sales now account for almost all of the Ordnance Survey’s annual turnover of approximately $200 million. But federal government policy in the US requires that prices be set at no more than the cost of reproduction of data, and sales are therefore only a small part of the income of the US Geological Survey, the nation’s premier civilian mapping agency.

In value of annual sales, the GIS data industry is much more significant than the software industry.

In recent years improvements in GIS and related technologies, and reductions in prices, along with various kinds of government stimulus, have led to the rapid growth of a private GIS data industry, and to increasing interest in data sales to customers in the public sector. In the socioeconomic realm, there is continuing investment in the creation and updating of general-purpose geodemographic indicators (Section 2.3.3), created using private sector datasets alongside traditional socioeconomic sources such as the Census. For example, UK data warehouse Experian’s (Nottingham, UK) 2003 Mosaic product comprises 54% census data, with the balance of 46% coming from private sector sources and spatial indicators created using GIS. Data may also be packaged with software in order to offer integrated solutions, as with ESRI’s Business Analyst product. Private companies are now also licensed to collect high-resolution data using satellites, and to sell it to customers – Space Imaging (www.spaceimaging.com) and its IKONOS satellite are a prominent instance (see Table 1.4). Other companies collect similar data from aircraft. Still other companies specialize in the production of high-quality data on street networks, a basic requirement of many delivery companies. Tele Atlas (www.teleatlas.com), with its North American subsidiary Geographic Data Technology (www.geographic.com), is an example of this industry, employing some 1850 staff in producing, maintaining, and marketing high-quality street network data in Europe and North America.
As developments in the information economy gather pace, many organizations are becoming focused upon delivering integrated business solutions rather than raw or value-added data. The Internet makes it possible for GIS users to access routinely collected data from sites that may be remote from locations where more specialized analysis and interpretation functions are performed. In these circumstances, it is no longer incumbent upon an organization to manage either its own data, or those that it buys in from value-added resellers. For example, ESRI offers a data management service, in which client data are managed and maintained for a range of clients that are at liberty to analyze them in quite separate locations. This may lead to greater vertical integration of the software and data industry – ESRI has developed an e-bis division and acquired its own geodemographic system (called Tapestry) to service a range of business needs. As GIS-based data handling becomes increasingly commonplace, so GIS is finding increasing application in new areas of public sector service provision, particularly where large amounts of public money are disbursed at the local level – as in policing, education provision, and public health. Many data warehouses and start-up organizations are beginning to develop public sector data infrastructures particularly where greater investment in public services is taking place.

1.5.3 The GIService industry

The Internet also allows GIS users to access specific functions that are provided by remote sites. For example, the US MapQuest site (www.mapquest.com) or the UK Yellow Pages site (www.yell.com) provide routing services that are used by millions of people every day to find the best driving route between two points. By typing a pair of street addresses, the user can execute a routing analysis (see Section 15.3.2) and receive the results in the form of a map and a set of written driving or walking directions (see Figure 1.17B). This has several advantages over performing the same analysis on one’s own PC – there is no need to buy software to perform the analysis, there is no need to buy the necessary data, and the data are routinely updated by the GIService provider. There are clear synergies of interest between GIService providers and organizations providing location-based services (Section 1.4.3.1 and Chapter 11), and both activities are part of what we will describe as g-business in Chapter 19. Many sites that provide access to raw GIS data also provide GIServices.

GIServices are a rapidly growing form of electronic commerce.

GIServices continue to develop rapidly. In today’s world one of the most important commodities is attention – the fraction of a second of attention given to a billboard, or the audience attention that a TV station sells to its advertisers. The value of attention also depends on the degree of fit between the message and the recipient – an advertiser will pay more for the attention of a small number of people if it knows that they include a large proportion of its typical customers. Advertising directed at the individual, based on an individual

Figure 1.17 A GIS-enabled London electronic yellow pages: (A) location map of a dentist near St. Paul’s Cathedral; and
(B) written directions of how to get there from University College London Department of Geography

profile, is even more attractive to the advertiser. Direct-mail companies have exploited the power of geographic location to target specific audiences for many years, basing their strategies on neighborhood profiles constructed from census records. But new technologies offer to take this much further. For example, the technology already exists to identify the buying habits of a customer who stops at a gas pump and uses a credit card, and to direct targeted advertising through a TV screen at the pump.
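
The routing analysis that GIServices such as MapQuest run on the user's behalf (described earlier in this section) is, at its core, a search for the least-cost path through a street network. The Python sketch below is illustrative only: it assumes a tiny hypothetical network with made-up travel times in minutes, and it uses Dijkstra's algorithm as a stand-in, since the text does not say which method any particular provider uses. The function name shortest_route and the junction labels are likewise assumptions.

```python
import heapq


def shortest_route(graph, origin, destination):
    """Return (total_cost, path) for the cheapest route, or (inf, []) if none exists."""
    # Priority queue of (cost so far, current junction, path taken to reach it).
    queue = [(0.0, origin, [origin])]
    best = {origin: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # a cheaper way to this junction was already processed
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return float("inf"), []


# Hypothetical street network: junction -> {adjacent junction: travel time in minutes}.
streets = {
    "A": {"B": 4.0, "C": 2.0},
    "B": {"A": 4.0, "C": 1.0, "D": 5.0},
    "C": {"A": 2.0, "B": 1.0, "D": 8.0},
    "D": {"B": 5.0, "C": 8.0},
}

minutes, route = shortest_route(streets, "A", "D")
print(minutes, route)  # 8.0 ['A', 'C', 'B', 'D']
```

A real service would first geocode the two street addresses to network locations and would draw on a far larger, routinely updated street database, which is precisely the data and software burden that the GIService model lifts from the user.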

1.5.4 The publishing industry

Much smaller, but nevertheless highly influential in the world of GIS, is the publishing industry, with its magazines, books, and journals. Several magazines are directed at the GIS community, as well as some increasingly significant news-oriented websites (see Box 1.4).
Several journals have appeared to serve the GIS community, by publishing new advances in GIS research. The oldest journal specifically targeted at the community is the International Journal of Geographical Information Science, established in 1987. Other older journals in areas such as cartography now regularly accept GIS articles, and several have changed their names and shifted focus significantly. Box 1.5 gives a list of the journals that emphasize GIS research.

1.5.5 GIS education

The first courses in GIS were offered in universities in the early 1970s, often as an outgrowth of courses

Technical Box 1.4

Magazines and websites offering GIS news and related services


ArcNews and ArcUser Magazine (published by ESRI), see www.esri.com
Directions Magazine (Internet-centered and weekly newsletter publication by directionsmag.com), available online at www.directionsmag.com
GEO:connexion UK Magazine published quarterly by GEO:connexion Ltd., with website at www.geoconnexion.com
GEOInformatics published eight times a year by Cmedia Productions BV, with website at www.geoinformatics.com
GeoSpatial Solutions (published monthly by Advanstar Communications), and see their website www.geospatial-online.com. The company also publishes GPSWorld
GeoWorld (published monthly by GEOTEC Media), available online at www.geoplace.com
GIS@development (published monthly for an Asian readership by GIS Development, India), with website at www.GISDevelopment.net
Spatial Business Online (published fortnightly in hard and electronic copy form by South Pacific Science Press), available online at www.gisuser.com.au

Some websites offering online resources for the GIS community:
www.gisdevelopment.net
www.geoconnexion.com
www.gis.com
www.giscafe.com
gis.about.com
www.geocomm.com and www.spatialnews.com
www.directionsmag.com
www.opengis.org/press/

Technical Box 1.5

Some scholarly journals emphasizing GIS research


Annals of the Association of American Geographers
Cartography and Geographic Information Science
Cartography – The Journal
Computers and Geosciences
Computers, Environment and Urban Systems
Geographical Analysis
GeoInformatica
International Journal of Geographical Information Science (formerly International Journal of Geographical Information Systems)
ISPRS Journal of Photogrammetry and Remote Sensing
Journal of Geographical Systems
Photogrammetric Engineering and Remote Sensing (PE&RS)
Terra Forum
The Photogrammetric Record
Transactions in GIS
URISA Journal

Technical Box 1.6

Sites offering Web-based education and training programs in GIS


Birkbeck College (University of London) GIScOnline M.Sc. in Geographic Information Science at www.bbk.ac.uk
City University (London) MGI – Masters in Geographic Information – a course with face-to-face or distance learning options at www.city.ac.uk
Curtin University’s distance learning programs in geographic information science at www.cage.curtin.edu.au
ESRI’s Virtual Campus at campus.esri.com
Kingston Centre for GIS, Distance Learning Programme at www.kingston.ac.uk
Pennsylvania State University Certificate Program in Geographic Information Systems at www.worldcampus.psu.edu
UNIGIS International, Postgraduate Courses in GIS at www.unigis.org
University of Southern California GIS distance learning certificate program at www.usc.edu

in cartography or remote sensing. Today, thousands of courses can be found in universities and colleges all over the world. Training courses are offered by the vendors of GIS software, and increasing use is made of the Web in various forms of remote GIS education and training (Box 1.6).
Often, a distinction is made between education and training in GIS – training in the use of a particular software product is contrasted with education in the fundamental principles of GIS. In many university courses, lectures are used to emphasize fundamental principles while computer-based laboratory exercises emphasize training. In our view, an education should be for life, and the material learned during an education should be applicable for as far into the future as possible. Fundamental principles tend to persist long after software has been replaced with new versions, and the skills learned in running one software package may be of very little value when a new technology arrives. On the other hand much of the fun and excitement of GIS comes from actually working with it, and fundamental principles can be very dry and dull without hands-on experience.

1.6 GISystems, GIScience, and GIStudies

Geographic information systems are useful tools, helping everyone from scientists to citizens to solve geographic problems. But like many other kinds of tools, such as computers themselves, their use raises questions that are sometimes frustrating, and sometimes profound. For example, how does a GIS user know that the results obtained are accurate? What principles might help a GIS user to design better maps? How can location-based services be used to help users to navigate and understand human and natural environments? Some of these are questions of GIS design, and others are about GIS data and methods. Taken together, we can think of them as questions that arise from the use of GIS – that are stimulated by exposure to GIS or to its products. Many of them are addressed in detail at many points in this book, and the book’s title emphasizes the importance of both systems and science.
The term geographic information science was coined in a paper by Michael Goodchild published in 1992. In it, the author argued that these questions and others like them were important, and that their systematic study constituted a science in its own right. Information science studies the fundamental issues arising from the creation, handling, storage, and use of information – similarly, GIScience should study the fundamental issues arising from geographic information, as a well-defined class of information in general. Other terms have much the same meaning: geomatics and geoinformatics, spatial information science, geoinformation engineering. All suggest a scientific approach to the fundamental issues raised by the use of GIS and related technologies, though they all have different roots and emphasize different ways of thinking about problems (specifically geographic or more generally spatial, emphasizing engineering or science, etc.).
GIScience has evolved significantly in recent years. It is now part of the title of several renamed research journals (see Box 1.5), and the focus of the US University Consortium for Geographic Information Science (www.ucgis.org), an organization of roughly 60 research universities that engages in research agenda setting (Box 1.7), lobbying for research funding, and related activities. An international conference series on GIScience has been held in the USA biennially since 2000 (see www.giscience.org). The Varenius Project (www.ncgia.org) provides one disarmingly simple way to view developments in GIScience (Figure 1.18). Here, GIScience is viewed as anchored by three concepts – the individual, the computer, and society. These form the vertices of a triangle, and GIScience lies at its core. The various terms that are used to describe GIScience activity can be used to populate this triangle. Thus research about the individual is dominated by cognitive science, with its concern for understanding of spatial concepts,

Technical Box 1.7

The 2002 research agenda of the US University Consortium for Geographic Information Science (www.ucgis.org), and related chapters in this book

1. Long-term research challenges
a. Spatial ontologies (Chapters 3 and 6)
b. Geographic representation (Chapter 3)
c. Spatial data acquisition and integration (Chapters 9 and 10)
d. Scale (Chapter 4)
e. Spatial cognition (Chapter 3)
f. Space and space/time analysis and modeling (Chapters 4, 14, 15, and 16)
g. Uncertainty in geographic information (Chapter 6)
h. Visualization (Chapters 12 and 13)
i. GIS and society (Chapters 1 and 17)
j. Geographic information engineering (Chapters 11 and 20)

2. Short-term research priorities
a. GIS and decision making (Chapters 2, 17, and 18)
b. Location-based services (Chapters 7 and 11)
c. Social implications of LBS (Chapter 11)
d. Identification of spatial clusters (Chapters 13 and 14)
e. Geospatial semantic Web (Chapters 1 and 11)
f. Incorporating remotely sensed data and information in GIS (Chapters 3 and 9)
g. Geographic information resource management (Chapters 17 and 18)
h. Emergency data acquisition and analysis (Chapter 9)
i. Gradation and indeterminate boundaries (Chapter 6)
j. Geographic information security (Chapter 17)
k. Geospatial data fusion (Chapters 2 and 11)
l. Institutional aspects of SDIs (Chapters 19 and 20)
m. Geographic information partnering (Chapter 20)
n. Geocomputation (Chapter 16)
o. Global representation and modeling (Chapter 3)
p. Spatialization (Chapters 3 and 13)
q. Pervasive computing (Chapter 11)
r. Geographic data mining and knowledge discovery (Chapter 14)
s. Dynamic modeling (Chapter 16)

More detail on all of these topics, and additional topics added at more recent UCGIS assemblies, can be found at www.ucgis.org/priorities/research/2002researchagenda.htm

learning and reasoning about geographic data, and interaction with the computer. Research about the computer is dominated by issues of representation, the adaptation of new technologies, computation, and visualization. And finally, research about society addresses issues of impacts and societal context. Others have developed taxonomies of challenges facing the nascent discipline of GIScience, such as the US University Consortium for Geographic Information Science (Box 1.7). It is possible to imagine how the themes presented in Box 1.7 could be used to populate Figure 1.18 in relation to the three vertices of this triangle.

Figure 1.18 The remit of GIScience, according to Project Varenius (www.ncgia.org) [triangle with vertices Individual, Computer, and Society, and GI Science at its center]

There are important respects in which GIScience is about using the software environment of GIS to redefine, reshape, and resolve pre-existing problems. Many of the research topics in GIScience are actually much older than GIS. The need for methods of spatial analysis, for example, dates from the first maps, and many methods were developed long before the first GIS appeared on the scene in the mid-1960s. Another way to look at GIScience is to see it as the body of knowledge that GISystems implement and exploit. Map projections (Chapter 5), for example, are part of GIScience, and are used and transformed in GISystems. Another area of great importance to GIS is cognitive science, and

Biographical Box 1.8

Reg Golledge, Behavioral Geographer


Reg Golledge was born in Australia but has worked in the US since
completing his Ph.D. at the University of Iowa in 1966. He has worked
at The Ohio State University (1967–1977), and since 1977 at the University
of California, Santa Barbara (UCSB).
GIScience revisits many of the classic problems of spatial analysis,
most of which assumed that people were rational and were optimizers
in a very narrow sense. Over the last four decades, Reg’s work has
contributed much to our understanding of individual spatial behavior
by relaxing these restrictive assumptions yet retaining the power of
scientific generalization. Golledge’s analytical behavioral geography has
examined individual behavior using statistical and computational process
models, particularly within the domain of transportation GIS (GIS-T: see
Section 2.3.4), and has done much to make sense of the complexities and
constraints that govern movement within urban systems. Related to this,
analytical behavioral geography has also developed our understanding of
individual cognitive awareness of urban networks and landmarks.
Figure 1.19 Reg Golledge, behavioral geographer
Reg’s work is avowedly interdisciplinary. He has undertaken extensive
work with cognitive psychologists at UCSB to develop personal guidance
systems for use by visually-impaired travelers. This innovative work has
linked GPS (for location and tracking) and GIS (for performing operations such as shortest path calculation,
buffering, and orientation: see Chapters 14 and 15) with a novel auditory virtual system that presents users
with the spatial relations between nearby environmental features. The device also allows users to personalize
their representations of the environment.
Reg’s enduring contribution to GIScience has been in modeling, explaining, and predicting disaggregate
behaviors of individuals. This has been achieved through researching spatial cognition and cognitive science
through GIS applications. He has established the importance of cognitive mapping to reasoning through
GIScience, developed our understanding of the ways in which spatial concepts are embedded in GIS
technology, and made vital contributions to the development of multimodal interfaces to GIS. These efforts
have helped to develop new links to information science, information technology, and multimedia, and
suggest ways of bridging the digital divides that threaten to further disadvantage disabled and elderly
people. As a visually-impaired individual himself, Reg firmly believes that GIS technology and GIScience
research are the most significant contributions that geography can make to truly integrated human and
physical sciences, and sees a focus upon cognition as the natural bridge between these approaches to
scientific inquiry.

particularly the scientific understanding of how people think about their geographic surroundings. If GISystems are to be easy to use they must fit with human ideas about such topics as driving directions, or how to construct useful and understandable maps. Box 1.8 introduces Reg Golledge, a quantitative and behavioral geographer who has brought diverse threads of cognitive science, transportation modeling, and analysis of geography and disability together under the umbrella of GIScience.

Many of the roots to GIS can be traced to the spatial analysis tradition in the discipline of geography.

In the 1970s it was easy to define or delimit a geographic information system – it was a single piece of software residing on a single computer. With time, and particularly with the development of the Internet and new approaches to software engineering, the old monolithic nature of GIS has been replaced by something much more fluid, and GIS is no longer an activity confined to the desktop (Chapter 11). The emphasis throughout this book is on this new vision of GIS, as the set of coordinated parts discussed earlier in Section 1.4. Perhaps the system part of GIS is no longer necessary – certainly the phrase GIS data suggests some redundancy, and many people have suggested that we could drop the ‘S’ altogether in favor of GI, for geographic information. GISystems are only one part of the GI whole, which also includes the fundamental issues of GIScience. Much of this book is really about GIStudies, which can be defined as the systematic study of society’s use of geographic information, including its institutions, standards, and procedures, and many of these
topics are addressed in the later chapters. Several of the UCGIS research topics suggest this kind of focus, including GIS and society and geographic information partnering. In recent years the role of GIS in society – its impacts and its deeper significance – has become the focus of extensive writing in the academic literature, particularly in the discipline of geography, and much of it has been critical of GIS. We explore these critiques in detail in the next section.

The importance of social context is nicely expressed by Nick Chrisman’s definition of GIS which might also serve as an appropriate final comment on the earlier discussion of definitions:

The organized activity by which people: 1) measure aspects of geographic phenomena and processes; 2) represent these measurements, usually in the form of a computer database, to emphasize spatial themes, entities, and relationships; 3) operate upon these representations to produce more measurements and to discover new relationships by integrating disparate sources; and 4) transform these representations to conform to other frameworks of entities and relationships. These activities reflect the larger context (institutions and cultures) in which these people carry out their work. In turn, the GIS may influence these structures. (Chrisman 2003, p. 13)

Chrisman’s social structures are clearly part of the GIS whole, and as students of GIS we should be aware of the ethical issues raised by the technology we study. This is the arena of GIStudies.

1.7 GIS and geography

GIS has always had a special relationship to the academic discipline of geography, as it has to other disciplines that deal with the Earth’s surface, including planning and landscape architecture. This section explores that special relationship and its sometimes tense characteristics. Non-geographers can conveniently skip this section, though much of its material might still be of interest.

Chapter 2 presents a gallery of successful GIS applications. This paints a picture of a field built around low-order concepts that actually stands in rather stark contrast to the scientific tradition in the academic discipline of geography. Here, the spatial analysis tradition has developed during the past 40 years around a range of more-sophisticated operations and techniques, which have a much more elaborate conceptual structure (see Chapters 14 through 16). One of the foremost proponents of the spatial analysis approach is Stewart Fotheringham, whose contribution is discussed in Box 1.9. As we will begin to see in Chapter 14, spatial analysis is the process by which we turn raw spatial data into useful spatial information. For the first half of its history, the principal focus of spatial analysis in most universities was upon development of theory, rather than working applications. Actual data were scarce, as were the means to process and analyze them.

In the 1980s GIS technology began to offer a solution to the problems of inadequate computation and limited data handling. However, the quite sensible priorities of vendors at the time might be described as solving the problems of 80% of their customers 80% of the time, and the integration of techniques based upon higher-order concepts was a low priority. Today’s GIS vendors can probably be credited with solving the problems of at least 90% of their customers 90% of the time, and much of the remit of GIScience is to diffuse improved, curiosity-driven scientific understanding into the knowledge base of existing successful applications. But the drive towards improved applications has also been propelled to a significant extent by the advent of GPS and other digital data infrastructure initiatives by the late 1990s. New data handling technologies and new rich sources of digital data open up prospects for refocusing and reinvigorating academic interest in applied scientific problem solving.

Although repeat purchases of GIS technology leave the field with a buoyant future in the IT mainstream, there is enduring unease in some academic quarters about GIS applications and their social implications. Much of this unease has been expressed in the form of critiques, notably from geographers. John Pickles has probably contributed more to the debate than almost anyone else, notably through his 1993 edited volume Ground Truth: The Social Implications of Geographic Information Systems. Several types of arguments have surfaced:

■ The ways in which GIS represents the Earth’s surface, and particularly human society, favor certain phenomena and perspectives, at the expense of others. For example, GIS databases tend to emphasize homogeneity, partly because of the limited space available and partly because of the costs of more accurate data collection (see Chapters 3, 4, and 8). Minority views, and the views of individuals, can be submerged in this process, as can information that differs from the official or consensus view. For example, a soil map represents the geographic variation in soils by depicting areas of constant class, separated by sharp boundaries. This is clearly an approximation, and in Chapter 6 we explore the role of uncertainty in GIS. GIS often forces knowledge into forms more likely to reflect the view of the majority, or the official view of government, and as a result marginalizes the opinions of minorities or the less powerful.
■ Although in principle it is possible to use GIS for any purpose, in practice it is often used for purposes that may be ethically questionable or may invade individual privacy, such as surveillance and the gathering of military and industrial intelligence. The technology may appear neutral, but it is always used in a social context. As with the debates over the

Biographical Box 1.9

Stewart Fotheringham, geocomputation specialist


There are many close synonyms for geographic information science
(GIScience), one of which is geocomputation – a term first coined by
the geographer Stan Openshaw to describe the scientific application
of computationally-intensive techniques to problems with a spatial
dimension. A. Stewart Fotheringham is Science Foundation Ireland Research
Professor and Director of the National Centre for Geocomputation at the
National University of Ireland in Maynooth. He is a spatial scientist who
has considerable previous experience of the Anglo-American university
systems – he has worked and studied at the Universities of Newcastle and
Aberdeen in the UK, the State University of New York at Buffalo, the
University of Florida, and Indiana University in the US, and McMaster
University in Canada (Figure 1.20).
Figure 1.20 Stewart Fotheringham, quantitative geographer
Like GIScience, geocomputation is fundamentally about satisfying
human curiosity through systematic, scientific problem solving. Many of
the roots of the use of GIS in scientific problem solving can be traced to the ‘Quantitative
Revolution’ in geography of the 1960s, which had the effect of popularizing systematic techniques of spatial
analysis throughout the discipline – an approach that had its detractors then as well as now (Section 1.7).
The Quantitative Revolution has not only bequeathed GIS a rich legacy of methods and techniques, but has
also developed into a sustained concern for understanding the nature of spatial variations in relationships.
The range of these methods and techniques is described in Stewart’s 2000 book Quantitative Geography:
Perspectives on Spatial Data Analysis (Sage, London: with co-researchers Chris Brunsdon and Martin Charlton),
while spatial variations in relationships are considered in detail in his 2002 book Geographically Weighted
Regression: The Analysis of Spatially Varying Relationships (Wiley, Chichester: also with the same co-authors).
The methods and techniques that Stewart has developed and applied permeate the world of GIS
applications that we consider in Chapter 2. Stewart remains evangelical about the importance of space and
our need to use GIS to make spatial analysis sensitive to context. He says: ‘We know that many spatially
aggregated statements, such as the average temperature of the entire US on any given day, actually tell
us very little. Yet when we seek to establish relationships between data, we all too often hypothesize
that relationships are the same everywhere – that relationships are spatially invariant’. Stewart’s work
on geographically weighted regression (GWR) is part of a growing realization that relationships, or our
measurements of them, can vary over space and that we need to investigate this potential non-stationarity
further (see Chapter 4). Stewart’s geocomputational approach is closely linked to GIS because it uses
locational information as inputs and produces geocoded results as outputs that can be mapped and further
analyzed. GWR exploits the property of spatial location to the full, and has led to geocomputational analysis
of relationships by researchers working in many disciplines.
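
The idea that a relationship can vary over space may be easier to grasp with a small numerical sketch. The code below is not the GWR method as published by Fotheringham and his co-authors; it is a simplified illustration that fits a separate distance-weighted regression around each observation. The Gaussian kernel, the fixed bandwidth of 1.5 units, the function name `local_fit`, and the synthetic data (in which the true slope increases from west to east) are all assumptions made for the example.

```python
import math
import random

def local_fit(points, i, bandwidth):
    """Weighted least-squares fit of y = a + b*x centered on points[i].

    points: list of (east, north, x, y) tuples
    bandwidth: Gaussian kernel bandwidth in coordinate units (assumed fixed)
    Returns (intercept, slope) estimated for the location of points[i].
    """
    e0, n0 = points[i][0], points[i][1]
    # Gaussian distance-decay weights: nearer observations count for more
    w = [math.exp(-0.5 * (math.hypot(e - e0, n - n0) / bandwidth) ** 2)
         for e, n, _, _ in points]
    sw = sum(w)
    xbar = sum(wi * p[2] for wi, p in zip(w, points)) / sw
    ybar = sum(wi * p[3] for wi, p in zip(w, points)) / sw
    sxy = sum(wi * (p[2] - xbar) * (p[3] - ybar) for wi, p in zip(w, points))
    sxx = sum(wi * (p[2] - xbar) ** 2 for wi, p in zip(w, points))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Synthetic data (hypothetical): the x-y relationship strengthens eastward
random.seed(0)
points = []
for _ in range(300):
    e, n = random.uniform(0, 10), random.uniform(0, 10)
    x = random.gauss(0, 1)
    y = 2.0 + (1.0 + 0.3 * e) * x + random.gauss(0, 0.1)
    points.append((e, n, x, y))

# One local slope per observation, then compare the two ends of the study area
slopes = [local_fit(points, i, bandwidth=1.5)[1] for i in range(len(points))]
west_slopes = [s for s, p in zip(slopes, points) if p[0] < 3]
east_slopes = [s for s, p in zip(slopes, points) if p[0] > 7]
mean_west = sum(west_slopes) / len(west_slopes)
mean_east = sum(east_slopes) / len(east_slopes)
print(f"mean local slope: west {mean_west:.2f}, east {mean_east:.2f}")
```

A single global regression on these data would report one slope for the whole study area, which is the 'spatially invariant' assumption that the quotation above questions. Mapping the local slopes instead shows how the relationship changes from place to place; the choice of bandwidth controls how local each estimate is.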

atomic bomb in the 1940s and 1950s, the scientists who develop and promote the use of GIS surely bear some responsibility for how it is eventually used. The idea that a tool can be inherently neutral, and its developers therefore immune from any ethical debates, is strongly questioned in this literature.
■ The very success of GIS is a cause of concern. There are qualms about a field that appears to be led by technology and the marketplace, rather than by human need. There are fears that GIS has become too successful in modeling socioeconomic distributions, and that as a consequence GIS has become a tool of the ‘surveillance society’.
■ There are concerns that GIS remains a tool in the hands of the already powerful – notwithstanding the diffusion of technology that has accompanied the plummeting cost of computing and wide adoption of the Internet. As such, it is seen as maintaining the status quo in terms of power structures. By implication, any vision of GIS for all of society is seen as unattainable.
■ There appears to be an absence of applications of GIS in critical research. This academic perspective is centrally concerned with the connections between human agency and particular social structures and contexts. Some of its protagonists are of the view that
such connections are not amenable to digital representation in whole or in part.
■ Some view the association of GIS with the scientific and technical project as fundamentally flawed. More narrowly, there is a view that GIS applications are (like spatial analysis before them) inextricably bound to the philosophy and assumptions of the approach to science known as logical positivism (see also the reference to ‘positive’ in Section 1.1). As such, the argument goes, GIS can never be more than a logical positivist tool and a normative instrument, and cannot enrich other more critical perspectives in geography.

Many geographers remain suspicious of the use of GIS in geography.

We wonder where all this discussion will lead. For our own part, we have chosen a title that includes both systems and science, and certainly much more of this book is about the broader concept of geographic information than about isolated, monolithic software systems per se. We believe strongly that effective users of GIS require some awareness of all aspects of geographic information, from the basic principles and techniques to concepts of management and familiarity with applications. We hope this book provides that kind of awareness. On the other hand, we have chosen not to include GIStudies in the title. Although the later chapters of the book address many aspects of the social context of GIS, including issues of privacy, the context to GIStudies is rooted in social theory. GIStudies need the kind of focused attention that we cannot give, and we recommend that students interested in more depth in this area explore the specialized texts listed in the guide to further reading.

Questions for further study

1. Examine the geographic data available for the area within 50 miles (80 km) of either where you live or where you study. Use it to produce a short (2500 word) illustrated profile of either the socioeconomic or the physical environment. (See for example www.geodata.gov/gos; www.geographynetwork.com; eu-geoportal.jrc.it; or www.magic.gov.uk).
2. What are the distinguishing characteristics of the scientific method? Discuss the relevance of each to GIS.
3. We argued in Section 1.4.3.1 that the Internet has dramatically changed GIS. What are the arguments for and against this view?
4. Locate each of the issues identified in Box 1.7 in two triangular ‘GIScience’ diagrams like that shown in Figure 1.18 – one for long-term research challenges and one for short-term research priorities. Give short written reasons for your assignments. Compare the distribution of issues within each of your triangles in order to assess the relative importance of the individual, the computer, and society in the development of GIScience over both the short- and long-term.

Further reading

Chrisman N.R. 2003 Exploring Geographical Information Systems (2nd edn). Hoboken, NJ: Wiley.
Curry M.R. 1998 Digital Places: Living with Geographic Information Technologies. London: Routledge.
Foresman T.W. (ed) 1998 The History of Geographic Information Systems: Perspectives from the Pioneers. Upper Saddle River, NJ: Prentice Hall.
Goodchild M.F. 1992 ‘Geographical information science’. International Journal of Geographical Information Systems 6: 31–45.
Longley P.A. and Batty M. (eds) 2003 Advanced Spatial Analysis: The CASA Book of GIS. Redlands, CA: ESRI Press.
Pickles J. 1993 Ground Truth: The Social Implications of Geographic Information Systems. New York: Guilford Press.
University Consortium for Geographic Information Science 1996 ‘Research priorities for geographic information science’. Cartography and Geographic Information Systems 23(3): 115–127.
2 A gallery of applications

Fundamentally, GIS is about workable applications. This chapter gives a flavor of the breadth and depth of real-world GIS implementations. It considers:

■ How GIS affects our everyday lives;
■ How GIS applications have developed, and how the field compares with scientific practice;
■ The goals of applied problem solving;
■ How GIS can be used to study and solve problems in transportation, the environment, local government, and business.

Geographic Information Systems and Science, 2nd edition. Paul Longley, Michael Goodchild, David Maguire, and David Rhind.
© 2005 John Wiley & Sons, Ltd. ISBNs: 0-470-87000-1 (HB); 0-470-87001-X (PB)

Learning Objectives

After studying this chapter you will:

■ Grasp the many ways in which we interact with GIS in everyday life;
■ Appreciate the range and diversity of GIS applications in environmental and social science;
■ Be able to identify many of the scientific assumptions that underpin real-world applications;
■ Understand how GIS is applied in the representative application areas of transportation, the environment, local government, and business.

2.1 Introduction

2.1.1 One day of life with GIS

7:00 My alarm goes off. . . The energy to power the alarm comes from the local energy company, which uses a GIS to manage all its assets (e.g., electrical conductors, devices, and structures) so that it can deliver electricity continuously to domestic and commercial customers (Figure 2.1).

7:05 I jump in the shower. . . The water for the shower is provided by the local water company, which uses a hydraulic model linked to its GIS to predict water usage and ensure that water is always available to its valuable customers (Figure 2.2).

7:35 I open the mail. . . A property tax bill comes from a local government department that uses a GIS to store property data and automatically produce annual tax bills. This has helped the department to peg increases in property taxes to levels below retail price inflation. There are also a small number of circulars addressed to me, sometimes called ‘junk mail’. We spent our

Figure 2.1 An electrical utility application of GIS



Figure 2.2 Application of a GIS for managing the assets of a water utility

vacation in Southlands and Santatol last year, and the holiday company uses its GIS to market similar destinations to its customer base – there are good deals for the Gower and Northampton this season. A second item is a special offer for property insurance, from a firm that uses its GIS to target neighborhoods with low past-claims histories. We get less junk mail than we used to (and we don’t want to opt out of all programs), because geodemographic and lifestyles GIS is used to target mailings more precisely, thus reducing waste and saving time.

8:00 The other half leaves for work. . . He teaches GIS at one of the city community colleges. As a lecturer on one of the college’s most popular classes he has a full workload and likes to get to work early.

8:15 I walk the kids to the bus stop. . . Our children attend the local middle school that is three miles away. The school district administrators use a GIS to optimize the routing of school buses (Figure 2.3). Introduction of this service enabled the district to cut their annual school busing costs by 16% and the time it takes the kids to get to school has also been reduced.

8:45 I catch a train to work. . . At the station the current location of trains is displayed on electronic maps on the platforms using a real-time feed from global positioning (GPS) receivers mounted on the trains. The same information is also available on the Internet so I was able to check the status of trains before I left the house.

9:15 I read the newspaper on the train. . . The paper for the newspaper comes from sustainable forests managed by a GIS. The forestry information system used by the forest products company indicates which areas are available for logging, the best access routes, and the likely yield (Figure 2.4).

9:30 I arrive at work. . . I am GIS Manager for the local City government. Today I have meetings to review annual budgets, plan for the next round of hardware and software acquisition, and deal with a nasty copyright infringement claim.

12:00 I grab a sandwich for lunch. . . The price of bread fell in real terms for much of the past decade. In some small part this is because of the increasing use of GIS in precision agriculture. This has allowed real-time mapping of soil nutrients and yield, and means that farmers can apply just the right amount of fertilizer in the right location and at the right time.

6:30 Shop till you drop. . . After work we go shopping and use some of the discount coupons that were in the morning mail. The promotion is to entice customers back to the renovated downtown Tesbury Center. We usually go to MorriMart on the far side of town, but thought we’d participate in the promotion. We actually bump into a few of our neighbors at Tesbury – I suspect the promotion was targeted by linking a marketing GIS to Tesbury’s own store loyalty card data.

10:30 The kids are in bed. . . I’m on the Internet to try and find a new house. . . We live in a good neighborhood with many similarly articulate, well-educated folk, but it has become noisier since the new distributor road was routed close by. Our resident association mounted a vociferous campaign of protest, and its members filed numerous complaints to the website where the draft proposals were posted. But

Figure 2.3 A GIS used for school bus routing

Figure 2.4 Forestry management GIS

the benefit-cost analysis carried out using the local authority’s GIS clearly demonstrated that it was either a bit more noise for us, or the physical dissection of a vast swathe of housing elsewhere, and that we would have to grin and bear it. Post GIS, I guess that narrow interest NIMBY (Not In My Back Yard) protests don’t get such a free run as they once did. So here I am using one of the on-line GIS-powered websites to find properties that match our criteria (similar to that in Figure 1.14). Once we have found a property, other mapping sites provide us with details about the local and regional facilities.
GIS is used to improve many of our day-to-day working and living arrangements.

This diary is fictitious of course, but most of the things described in it are everyday occurrences repeated hundreds and thousands of times around the world. It highlights a number of key things about GIS. GIS

■ affects each of us, every day;
■ can be used to foster effective short- and long-term decision making;
■ has great practical importance;
■ can be applied to many socio-economic and environmental problems;
■ supports mapping, measurement, management, monitoring, and modeling operations;
■ generates measurable economic benefits;
■ requires key management skills for effective implementation;
■ provides a challenging and stimulating educational experience for students;
■ can be used as a source of direct income;
■ can be combined with other technologies; and
■ is a dynamic and stimulating area in which to work.

At the same time, the examples suggest some of the elements of the critique that has been leveled at GIS in recent years (see Section 1.7). Only a very small fraction of the world’s population has access to information technologies of any kind, let alone high-speed access to the Internet. At the global scale, information technology can exacerbate the differences between developed and less-developed nations, across what has been called the digital divide, and there is also digital differentiation between rich and poor communities within nations. Uses of GIS for marketing often involve practices that border on invasion of privacy, since they allow massive databases to be constructed from what many would regard as personal information. It is important that we understand and reflect on issues like these while exploring GIS.

2.1.2 Why GIS?

Our day of life with GIS illustrates the unprecedented frequency with which, directly or indirectly, we interact with digital machines. Today, more and more individuals and organizations find themselves using GIS to answer the fundamental question, where? This is because of:

■ Wider availability of GIS through the Internet, as well as through organization-wide local area networks.
■ Reductions in the price of GIS hardware and software, because economies of scale are realized by a fast-growing market.
■ Greater awareness that decision making has a geographic dimension.
■ Greater ease of user interaction, using standard windowing environments.
■ Better technology to support applications, specifically in terms of visualization, data management and analysis, and linkage to other software.
■ The proliferation of geographically referenced digital data, such as those generated using Global Positioning System (GPS) technology or supplied by value-added resellers (VARs) of data.
■ Availability of packaged applications, which are available commercially off-the-shelf (COTS) or ‘ready to run out of the box’.
■ The accumulated experience of applications that work.

2.2 Science, geography, and applications

2.2.1 Scientific questions and GIS operations

As we saw in Section 1.3, one objective of science is to solve problems that are of real-world concern. The range and complexity of scientific principles and techniques that are brought to bear upon problem solving will clearly vary between applications. Within the spatial domain, the goals of applied problem solving include, but are not restricted to:

■ Rational, effective, and efficient allocation of resources, in accordance with clearly stated criteria – whether, for example, this entail physical construction of infrastructure in utilities applications, or scattering fertilizer in precision agriculture.
■ Monitoring and understanding observed spatial distributions of attributes – such as variation in soil nutrient concentrations, or the geography of environmental health.
■ Understanding the difference that place makes – identifying which characteristics are inherently similar between places, and what is distinctive and possibly unique about them. For example, there are regional and local differences in people’s surnames (see Box 1.2), and regional variations in voting patterns are the norm in most democracies.
■ Understanding of processes in the natural and human environments, such as processes of coastal erosion or river delta deposition in the natural environment, and understanding of changes in residential preferences or store patronage in the social.
■ Prescription of strategies for environmental maintenance and conservation, as in national park management.

Understanding and resolving these diverse problems entails a number of general data handling operations – such
as inventory compilation and analysis, mapping, and spatial database management – that may be
successfully undertaken using GIS.

GIS is fundamentally about solving real-world problems.

GIS has always been fundamentally an applications-led area of activity. The accumulated
experience of applications has led to borrowing and creation of particular conventions for
representing, visualizing, and to some extent analyzing data for particular classes of
applications. Over time, some of these conventions have become useful in application areas
quite different from those for which they were originally intended, and software vendors have
developed general-purpose routines that may be customized in application-specific ways, as in
the way that spatial data are visualized. The way that accumulated experience and borrowed
practice becomes formalized into standard conventions makes GIS essentially an inductive field.

In terms of the definition and remit of GIScience (Section 1.6) the conventions used in
applications are based on very straightforward concepts. Most data-handling operations are
routine and are available as adjuncts to popular word-processing packages (e.g., Microsoft
MapPoint: www.microsoft.com/mappoint). They work and are very widely used (e.g., see
Figure 2.5), yet may not always be readily adaptable to scientific problem solving in the sense
developed in Section 1.3.

2.2.2 GIScience applications

Early GIS was successful in depicting how the world looks, but shied away from most of the
bigger questions concerning how the world works. Today GIScience is developing this extensive
experience of applications into a bigger agenda – and is embracing a full range of conceptual
underpinnings to successful problem solving.

GIS nevertheless remains fundamentally an applications-led technology, and many applications
remain modest in both the technology that they utilize and the scientific tasks that they set
out to accomplish. There is nothing fundamentally wrong with this, of course, as the most
important test of geographic science and technology is whether or not it is useful for
exploring and understanding the world around us. Indeed the broader relevance of geography as a
discipline can only be sustained in relation to this simple goal, and no amount of scientific
and technological ingenuity can salvage geographic representations of the world that are too
inaccurate, expensive, cumbersome, or opaque to reveal anything new. In practice, this means
that GIS applications must be grounded in sound concepts and theory if they are to resolve any
but the most trivial of questions.

GIS applications need to be grounded in sound concepts and theory.

Figure 2.5 Microsoft MapPoint Europe mapping of spreadsheet data of burglary rates in Exeter,
England using an adjunct to a standard office software package (courtesy D. Ashby. © 1988–2001
Microsoft Corp. and/or its suppliers. All rights reserved. © 2000 Navigation Technologies B.V.
and its suppliers. All rights reserved. Selected Road Maps © 2000 by AND International
Publishers N.V. All rights reserved. © Crown Copyright 2000. All rights reserved. License
number 100025500. Additional demographic data courtesy of Experian Limited. © 2004 Experian
Limited. All rights reserved.)
2.3 Representative application areas and their foundations

2.3.1 Introduction and overview

There is, quite simply, a huge range of applications of GIS, and indeed several pages of this
book could be filled with a list of application areas. They include topographic base mapping,
socio-economic and environmental modeling, global (and interplanetary!) modeling, and
education. Applications generally set out to fulfill the five Ms of GIS: mapping, measurement,
monitoring, modeling, management.

The five Ms of GIS application are mapping, measurement, monitoring, modeling, and management.

In very general terms, GIS applications may be classified as traditional, developing, and new.
Traditional GIS application fields include military, government, education, and utilities. The
mid-1990s saw the wide development of business uses, such as banking and financial services,
transportation logistics, real estate, and market analysis. The early years of the 21st century
are seeing new forward-looking application areas in small office/home office (SOHO) and
personal or consumer applications, as well as applications concerned with security,
intelligence, and counter-terrorism measures. This is a somewhat rough-and-ready
classification, however, because the applications of some agencies (such as utilities) fall
into more than one class.

A further way to examine trends in GIS applications is to examine the diffusion of GIS use.
Figure 2.6 shows the classic model of GIS diffusion originally developed by Everett Rogers.
Rogers’ model divides the adopters of an innovation into five categories:

■ Venturesome Innovators – willing to accept risks and sometimes regarded as oddballs.
■ Respectable Early Adopters – regarded as opinion formers or ‘role models’.
■ Deliberate Early Majority – willing to consider adoption only after peers have adopted.
■ Skeptical Late Majority – overwhelming pressure from peers is needed before adoption occurs.
■ Traditional Laggards – people oriented to the past.

Figure 2.6 The classic Rogers model of innovation diffusion applied to GIS (After Rogers E.M.
2003 Diffusion of Innovations (5th edn). New York: Simon and Schuster.) [Chart: GIS sales
plotted against time, 1970 to 2010, with successive stages labeled Innovators, Early Adopters,
Early Majority, Late Majority, and Laggards.]

GIS is moving into the Late Majority stage, although some areas of application are more
comprehensively developed than others. The Innovators who dominated the field in the 1970s were
typically based in universities and research organizations. The Early Adopters were the users
of the 1980s, many of whom were in government and military establishments. The Early Majority,
typically in private businesses, came to the fore in the mid-1990s. The current question for
potential users appears to be: do you want to gain competitive advantage by being part of the
Majority user base or wait until the technology is completely accepted and contemplate joining
the GIS community as a Laggard?

A wide range of motivations underpins the use of GIS, although it is possible to identify a
number of common themes. Applications dealing with day-to-day issues typically focus on very
practical concerns such as cost effectiveness, service provision, system performance,
competitive advantage, and database creation, access, and use. Other, more strategic
applications are more concerned with creating and evaluating scenarios under a range of
circumstances.

Many applications involve use of GIS by large numbers of people. It is not uncommon for a large
government agency, university, or utility to have more than 100 GIS seats, and a significant
number have more than 1000. Once GIS applications become established within an organization,
usage often spreads widely. Integration of GIS with corporate information system (IS) policy
and with forward planning policy is an essential prerequisite for success in many
organizations.

The scope of these applications is best illustrated with respect to representative application
areas, and in the remainder of this chapter we consider:

1. Government and public service (Section 2.3.2)
2. Business and service planning (Section 2.3.3)
3. Logistics and transportation (Section 2.3.4)
4. Environment (Section 2.3.5)

We begin by identifying the range of applications within each of the four domains. Next, we go
on to focus upon one application within each domain. Each application is chosen, first, for
simplicity of exposition but also, second, for the scientific questions that it raises. In this
book, we try to relate science and application in two ways. First, we flag the sections
elsewhere in the book where the scientific issues raised by the applications are discussed.
Second, the applications discussed here, and others like them, provide the illustrative
material for our discussion of principles, techniques, analysis, and practices in the other
chapters of the book.

A recurrent theme in each of the application classes is the importance of geographic location,
and hence
what is special about the handling of georeferenced data (Section 1.1.1). The gallery of
applications that we set out here intends to show how geographic data can provide crucial
context to decision making.

2.3.2 Government and public service

2.3.2.1 Applications overview

Government users were among the first to discover the value of GIS. Indeed the first recognized
GIS – the Canadian Geographic Information System (CGIS) – was developed for natural resource
inventory and management by the Canadian government (see Section 1.4.1). CGIS was a national
system and, unlike now, in the early days of GIS it was only national or federal organizations
that could afford the technology. Today GIS is used at all levels of government from the
national to the neighborhood, and government users still comprise the biggest single group of
GIS professionals. It is helping to supplement traditional ‘top down’ government decision
making with ‘bottom up’ representation of real communities in government decision making at all
levels (Figure 2.7). We will see in later chapters how this deployment of GIS applications is
consistent with greater supplementation of ‘top down’ deductivism with ‘bottom up’ inductivism
in science. The importance of spatial variation to government and public service should not be
underestimated – 70–80% of local government work should involve GIS in some way.

As GIS has become cheaper, so it has come to be used in government decision making at all
levels from the nation to the neighborhood.

Figure 2.7 The use of GIS at different levels of government decision making [Diagram labels:
Unified control; Top Down; Bottom up; Acknowledges heterogeneity; Federal/Central Government;
Central Government Organizations; State/Regional Government; Local Government; Real
Communities; national audit/inspection, drafting legislation; regional and state policy;
neighborhood service provision.]

Today, local government organizations are acutely aware of the need to improve the quality of
their products, processes, and services through ever-increasing efficiency of resource usage
(see also Section 15.3). Thus GIS is used to inventory resources and infrastructure, plan
transportation routing, improve public service delivery, manage land development, and generate
revenue by increasing economic activity.

Local governments also use GIS in unique ways. Because governments are responsible for the
long-term health, safety, and welfare of citizens, wider issues need to be considered,
including incorporating public values in decision making, delivering services in a fair and
equitable manner, and representing the views of citizens by working with elected officials.
Typical GIS applications thus include monitoring public health risk, managing public housing
stock, allocating welfare assistance funds, and tracking crime. Allied to analysis using
geodemographics (see Section 2.3.3) they are also used for operational, tactical, and strategic
decision making in law enforcement, health care planning, and managing education systems.

It is convenient to group local government GIS applications on the basis of their contribution
to asset inventory, policy analysis, and strategic modeling/planning. Table 2.1 summarizes GIS
applications in this way.

These applications can be implemented as centralized GIS or distributed desktop applications.
Some will be designed for use by highly trained GIS professionals, while citizens will access
others as ‘front counter’ or Internet systems. Chapter 8 discusses the different implementation
models for GIS.

2.3.2.2 Case study application: GIS in tax assessment

Tax mapping and assessment is a classic example of the value of GIS in local government. In
many countries local
Table 2.1 GIS applications in local government (simplified from O’Looney 2000)

Columns:
  Inventory Applications (locating property information such as ownership and tax assessments
    by clicking on a map)
  Policy Analysis Applications (e.g., number of features per area, proximity to a feature or
    land use, correlation of demographic features with geological features)
  Management/Policy-Making Applications (e.g., more efficient routing, modeling alternatives,
    forecasting future needs, work scheduling)

Economic Development
  Inventory: Location of major businesses and their primary resource demands
  Policy analysis: Analysis of resource demand by potential local supplier
  Management/policy-making: Informing businesses of availability of local suppliers

Transportation and Services Routing
  Inventory: Identification of sanitation truck routes, capacities and staffing by area;
    identification of landfill and recycling sites
  Policy analysis: Analysis of potential capacity strain given development in certain areas;
    analysis of accident patterns by type of site
  Management/policy-making: Identification of ideal high-density development areas based on
    criteria such as established transportation capacity

Housing
  Inventory: Inventory of housing stock age, condition, status (public, private, rental, etc.),
    durability, and demographics
  Policy analysis: Analysis of public support for housing by geographic area, drive time from
    low-income areas to needed service facilities, etc.
  Management/policy-making: Analysis of funding for housing rehabilitation, location of related
    public facilities; planning for capital investment in housing based on population growth
    projections

Infrastructure
  Inventory: Inventory of roads, sidewalks, bridges, utilities (locations, names, conditions,
    foundations, and most recent maintenance)
  Policy analysis: Analysis of infrastructure conditions by demographic variables such as
    income and population change
  Management/policy-making: Analysis to schedule maintenance and expansion

Health
  Inventory: Locations of persons with particular health problems
  Policy analysis: Spatial, time-series analysis of the spread of disease; effects of
    environmental conditions on disease
  Management/policy-making: Analysis to pinpoint possible sources of disease

Tax Maps
  Inventory: Identification of ownership data by land plot
  Policy analysis: Analysis of tax revenues by land use within various distances from the city
    center
  Management/policy-making: Projecting tax revenue change due to land-use changes

Human Services
  Inventory: Inventory of neighborhoods with multiple social risk indicators; location of
    existing facilities and services designated to address these risks
  Policy analysis: Analysis of match between service facilities and human services needs and
    capacities of nearby residents
  Management/policy-making: Facility siting, public transportation routing, program planning,
    and place-based social intervention

Law Enforcement
  Inventory: Inventory of location of police stations, crimes, arrests, convicted perpetrators,
    and victims; plotting police beats and patrol car routing; alarm and security system
    locations
  Policy analysis: Analysis of police visibility and presence; officers in relation to density
    of criminal activity; victim profiles in relation to residential populations; police
    experience and beat duties
  Management/policy-making: Reallocation of police resources and facilities to areas where they
    are likely to be most efficient and effective; creation of random routing maps to decrease
    predictability of police beats

Land-use Planning
  Inventory: Parcel inventory of zoning areas, floodplains, industrial parks, land uses, trees,
    green space, etc.
  Policy analysis: Analysis of percentage of land used in each category, density levels by
    neighborhoods, threats to residential amenities, proximity to locally unwanted land uses
  Management/policy-making: Evaluation of land-use plan based on demographic characteristics of
    nearby population (e.g., will a smokestack industry be sited upwind of a respiratory
    disease hospital?)

Parks and Recreation
  Inventory: Inventory of park holdings/playscapes, trails by type, etc.
  Policy analysis: Analysis of neighborhood access to parks and recreation opportunities,
    age-related proximity to relevant playscapes
  Management/policy-making: Modeling population growth projections and potential future
    recreational needs/playscape uses

Environmental Monitoring
  Inventory: Inventory of environmental hazards in relation to vital resources such as
    groundwater; layering of nonpoint pollution sources
  Policy analysis: Analysis of spread rates and cumulative pollution levels; analysis of
    potential years of life lost in a particular area due to environmental hazards
  Management/policy-making: Modeling potential environmental harm to specific local areas;
    analysis of place-specific multilayered pollution abatement plans

Emergency Management
  Inventory: Location of key emergency exit routes, their traffic flow capacity and critical
    danger points (e.g., bridges likely to be destroyed by an earthquake)
  Policy analysis: Analysis of potential effects of emergencies of various magnitudes on exit
    routes, traffic flow, etc.
  Management/policy-making: Modeling effect of placing emergency facilities and response
    capacities in particular locations

Citizen Information/Geodemographics
  Inventory: Location of persons with specific demographic characteristics such as voting
    patterns, service usage and preferences, commuting routes, occupations
  Policy analysis: Analysis of voting characteristics of particular areas
  Management/policy-making: Modeling effect of placing information kiosks at particular
    locations

government agencies have a mandate to raise revenue from property taxes. The amount of tax
payable is partly or wholly determined by the value of taxable land and property. A key part of
this process is evaluating the value of land and property fairly to ensure equitable
distribution of a community’s tax burden. In the United States the task of determining the
taxable value of land and property is performed by the Tax Assessor’s Office, which is usually
a separate local government department. The Valuation Office Agency fulfills a similar role in
the UK. The tax department can quickly get overwhelmed with requests for valuation of new
properties and protests about existing valuations.

The Tax Assessor’s Office is often the first home of GIS in local government.

Essentially, a Tax Assessor’s role is to assign a value to properties using three basic
methods: cost, income, and market. The cost method is based on the replacement cost of the
property and the value of the land. The Tax Assessor must examine data on construction costs
and vacant land values. The income method takes into consideration how much income a property
would generate if it were rented. This requires details on current market rents, vacancy rates,
operating expenses, taxes, insurance, maintenance, and other costs. The market method is the
most popular. It compares the property to other recent sales that have a similar location,
size, condition, and quality.

Collecting, storing, managing, analyzing, and displaying all this information is a very
time-consuming activity and not surprisingly GIS has had a major impact on the way Tax Assessors
go about their business.

2.3.2.3 Method

Tax Assessors, working in a Tax Assessor’s Office, are responsible for accurately, uniformly,
and fairly judging the value of all taxable properties in their jurisdiction. Details about
properties are maintained on a tax assessment roll that includes information such as ownership,
address, land and building value, and tax exemptions. The Assessor’s Office is also responsible
for processing applications for tax abatement, in cases of overvaluation, and exemptions for
surviving spouses, veterans, and the elderly. Figure 2.8 shows some aspects of a tax assessment
GIS in Ohio, USA.

A GIS is used to collect and manage the geographic boundaries and associated information about
properties. Typically, data associated with properties is held in a Computer Assisted Mass
Appraisal (CAMA) system that is responsible for sale analysis, evaluation, data management, and
administration, and for generating notices to owners. CAMA systems are usually implemented on
top of a database management system (DBMS) and can be linked to the parcel database using a
common key (see Section 10.2 for further discussion of how this works).

The basic tax assessment task involves a geographic database query to locate all sales of
similar properties within a predetermined distance of a given property. The property to be
valued is first identified in the property database. Next, a geographic query is used to
ascertain the values of all comparable properties within a predetermined search radius
(typically one mile) of the property. These properties are then displayed on the assessor’s
screen. The assessor can then compare the

characteristics of these properties (lot size, sales price and date of sale, neighborhood
status, property improvements, etc.) and value the property.

Figure 2.8 Lucas County, Ohio, USA tax assessment GIS: (A) tax map; (B) property attributes and
photograph

2.3.2.4 Scientific foundations: principles, techniques, and analysis

Scientific foundations
Critical to the success of the tax assessment process is a high-quality, up-to-date geographic
database that can be linked to a CAMA system. Considerable effort must be expended to design,
implement, and maintain the geographic database. Even for a small community of 50 000
properties it can take several months to assemble the geographic descriptions of property
parcels with their associated attributes. Chapters 9 and 10 explain the processes involved in
managing geographic databases such as this. Linking GIS and CAMA systems can be quite
straightforward providing that both systems are based on DBMS technology and use a common
identifier to effect linkage between a map feature and a property record. Typically, a unique
parcel number (in the US) or unique property reference number (in the UK) is used.

A high-quality geographic database is essential to tax assessment.

Clearly, the system is dependent on an unambiguous definition of parcels, and common standards
about how different characteristics (such as size, age, and value of improvements) are
represented. The GIS can help enforce coding standards and can be used to derive some
characteristics automatically in an objective fashion. For example, GIS makes it
straightforward to calculate the area of properties using boundary coordinates.

Fundamentally, this application, like many others in GIS, depends upon an unambiguous and
accurate inventory of geographic extent. To be effective it must link this with clear,
intelligible, and stable attribute descriptors. These are all core characteristics of
scientific investigation, and although the application is driven by results rather than
scientific curiosity, it nevertheless follows scientific procedures of controlled comparison.

Principles
Tax assessment makes the assumption that, other things being equal, properties close together
in space will have similar values. This is an application of Tobler’s First Law of Geography,
introduced in Section 3.1. However, it is left to the Assessor to identify comparator
properties and to weight their relative importance. This seems rather straightforward, but in
practice can prove very difficult – particularly where the exact extent of the effects of good
and bad neighborhood attributes cannot be precisely delineated. In practice the value of
location in a given neighborhood is often assumed to be uniform (see Section 4.7), and
properties of a given construction type are also assumed to be identical. This assumption may
be valid in areas where houses were constructed at the same time according to common standards;
however, in older areas where infill has been common, properties of a given type vary radically
in quality over short distances.

Techniques
Tax assessment requires a good database, a plan for system management and administration, and a
workflow design. These procedures are set out in Chapters 9, 10 and 17. The alternative of
manually sorting paper records, or even tabular data in a CAMA system, is very laborious and
time-consuming, and thus the automated approach of GIS is very cost-effective.

Analysis
Tax assessment actually uses standard GIS techniques such as proximity analysis, and geographic
and attribute query, mapping, and reporting. These must be robust and defensible when
challenged by individuals seeking reductions in assessments. Chapter 14 sets out appropriate
procedures, while Chapter 12 describes appropriate conventions for representing properties and
neighborhoods cartographically.

2.3.2.5 Generic scientific questions arising from the application

This is not perhaps the most glamorous application of GIS, but its operational value in tax
assessment cannot be
overestimated. It requires an up-to-date inventory of prop-
erties and information from several sources about sales
and sale prices, improvements, and building programs.
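
The comparable-sales search described in Section 2.3.2.3, the linkage of map features to
property records through a common key, and the derivation of parcel area from boundary
coordinates can be sketched in a few lines of Python. This is only an illustrative sketch
under stated assumptions, not the workings of any particular CAMA or GIS product: the parcel
numbers, coordinates, sale prices, and the names PARCELS, CAMA, polygon_area, centroid, and
find_comparables are all hypothetical, coordinates are assumed to be in a projected system
measured in meters, and the mean of the boundary vertices stands in for a true polygon
centroid.

```python
import math

# Hypothetical parcel layer: parcel number -> boundary vertices (meters),
# listed in order around the boundary. All values are invented.
PARCELS = {
    "P-001": [(0, 0), (30, 0), (30, 20), (0, 20)],
    "P-002": [(500, 0), (540, 0), (540, 25), (500, 25)],
    "P-003": [(3000, 0), (3030, 0), (3030, 30), (3000, 30)],
}

# Hypothetical CAMA-style attribute records, linked by the same parcel number.
CAMA = {
    "P-001": {"type": "detached", "sale_price": None},      # property to value
    "P-002": {"type": "detached", "sale_price": 210_000},
    "P-003": {"type": "detached", "sale_price": 450_000},
}

ONE_MILE_M = 1609.344  # the text's "typically one mile" search radius, in meters


def polygon_area(vertices):
    """Area of a simple polygon from its boundary coordinates (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def centroid(vertices):
    """Mean of the vertices; a rough stand-in for a true polygon centroid."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def find_comparables(subject_id, radius=ONE_MILE_M):
    """Return (parcel number, distance, sale price) for sold properties of the
    same type whose centroid lies within `radius` of the subject parcel."""
    subject = CAMA[subject_id]
    sx, sy = centroid(PARCELS[subject_id])
    matches = []
    for pid, record in CAMA.items():
        if pid == subject_id or record["sale_price"] is None:
            continue  # skip the subject itself and unsold properties
        if record["type"] != subject["type"]:
            continue  # attribute query: same property type only
        px, py = centroid(PARCELS[pid])  # join to the map feature by parcel number
        dist = math.hypot(px - sx, py - sy)
        if dist <= radius:
            matches.append((pid, round(dist, 1), record["sale_price"]))
    return sorted(matches, key=lambda m: m[1])  # nearest first


if __name__ == "__main__":
    print(polygon_area(PARCELS["P-001"]))
    print(find_comparables("P-001"))
```

A production system would run the distance test inside a spatial database with a spatial
index rather than looping over every record, and the choice and weighting of comparables
would remain a matter for the Assessor's judgment.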
Figure 2.9 A geodemographic profile: Town Gown Transition (a Type within the Urban
Intelligence Group of the 2001 MOSAIC classification). (Courtesy of Experian Limited. © 2004
Experian Limited. All rights reserved)

To help tax assessors understand geographic variations in property characteristics it is also
possible to use GIS for more strategic modeling activities. The many tools in GIS for charting,
reporting, mapping, and exploratory data analysis help assessors to understand the variability
of property value within their jurisdictions. Some assessors have also built models of property
valuations and have clustered properties based on multivariate criteria (see Section 4.7).
These help assessors to gain knowledge of the structure of communities and highlight unusually
high or low valuations. Once a property database has been created, it becomes a very valuable
asset, not just for the tax assessor’s department, but also for many other departments in a
local government agency. Public works departments may seek to use it to label access points for
repairs and meter reading, housing departments may use it to maintain data on property
condition, and many other departments may like shared access to a common address list for
record keeping and mailings.
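
One simple way to highlight unusually high or low valuations is to compare each property with
the others in its neighborhood. The sketch below is a deliberately simple stand-in for the
valuation models and multivariate clustering mentioned above, not a description of them: it
uses the median and the median absolute deviation (MAD) within each neighborhood, and the
function name flag_unusual, the threshold of 3, and the sample data are all assumptions made
for illustration.

```python
from statistics import median


def flag_unusual(valuations, threshold=3.0):
    """Return parcel ids whose value lies more than `threshold` median absolute
    deviations from the median value of their neighborhood.

    `valuations` maps parcel id -> (neighborhood, assessed value).
    """
    by_hood = {}
    for hood, value in valuations.values():
        by_hood.setdefault(hood, []).append(value)

    flagged = []
    for pid, (hood, value) in valuations.items():
        values = by_hood[hood]
        med = median(values)
        mad = median(abs(v - med) for v in values)
        if mad == 0:
            continue  # no spread in this neighborhood, so nothing to compare against
        if abs(value - med) / mad > threshold:
            flagged.append(pid)
    return flagged


# Invented sample: values in thousands, two hypothetical neighborhoods.
SAMPLE = {
    "A-1": ("A", 200), "A-2": ("A", 205), "A-3": ("A", 210),
    "A-4": ("A", 195), "A-5": ("A", 400),
    "B-1": ("B", 300), "B-2": ("B", 310), "B-3": ("B", 305),
}

if __name__ == "__main__":
    print(flag_unusual(SAMPLE))
```

Grouping by a fixed neighborhood label echoes the assumption, noted under Principles, that the
value of location is uniform within a neighborhood; where that assumption is weak, a flagged
property may simply reflect genuine local variation rather than a doubtful valuation.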
A property database is useful for many purposes besides tax assessment.

2.3.2.6 Management and policy

Tax assessment is a key local government application because it is a direct revenue generator.
It is easy to develop a cost-benefit case for this application (Chapter 17) and it can pay for
a complete department or corporate GIS implementation quickly (Chapter 18). Tax assessment is a
service offered directly to members of the public. As such, the service must be reliable and
achieve a quick turnaround (usually within one week). It is quite common for citizens to
question the assessed value for their property, since this is the principal determinant of the
amount of tax they will have to pay. A tax assessor must, therefore, be able to justify the
method and data used to determine property values. A GIS is a great help and often convinces
people of the objectivity involved (sometimes over-impressing people that it is totally
scientific). As such, GIS is an important tool for efficient and equitable local government.

2.3.3 Business and service planning

2.3.3.1 Applications overview

Business and service planning (sometimes called retailing) applications focus upon the use of
geographic data to provide operational, tactical, and strategic context to decisions that
involve the fundamental question, where? Geodemographics is a shorthand term for composite
indicators of consumer behavior that are available at the small-area level (e.g., census output
area, or postal zone). Figure 2.9 illustrates the profile of one geodemographic type from a UK
classification called Mosaic, developed by market researcher and academic Richard Webber. The
current version of Mosaic divides the UK population into 11 Groups (such as ‘Happy Families’,
‘Urban Intelligence’, and ‘Blue Collar Enterprise’), which in turn are subdivided into a total
of 61 Types. Geodemographic data are frequently used in business applications to identify
geographic variations in the incidences of customer types. They are often supplemented by
lifestyles data on the consumption choices and shopping habits of individuals who fill out
questionnaires or participate in store loyalty programs. The term market area analysis
describes the activity of assessing the distribution of retail outlets relative to the greatest
concentrations of potential customers. The approach is increasingly being adapted to improving
public service planning, in areas such as health, education, and law enforcement (see Box 2.1
and Section 2.3.2).

Geodemographic data are the basis for much market area analysis.

The tools of business applications typically range from simple desktop mapping to sophisticated
decision support systems. Tools are used to analyze and inform the range of operational,
tactical, and strategic functions of an organization. These tools may be part of standard GIS
software, or they may be developed in-house by the organization, or they may be purchased (with
or without accompanying data) as a ‘business solution’ product. We noted in Section 1.1 that
operational functions concern the day-to-day processing of routine transactions and inventory
analysis in an organization, such as stock management. Tactical functions require the
allocation of resources to address specific (usually short term) problems, such as store sales
promotions. Strategic functions contribute to the organization’s longer-term goals and mission,
and entail problems such as opening new stores or rationalizing existing store networks. Early
business applications were simply concerned with mapping spatially referenced data, as a
general descriptive indicator of the retail environment. This remains the first stage in most
business applications, and in itself adds an important dimension to analysis of organizational
function. More

Biographical Box 2.1

Marc Farr, geodemographer


‘City Adventurers are young, well educated and open to new ideas and
influences. They are cosmopolitan in their tastes and liberal in their
social attitudes. Few have children. Many are in further education while
others are moving into full-time employment. Most do not feel ready to
make permanent commitments, whether to partners, professions or to
specific employers. As higher education has become internationalized, the
City Adventurers group has acquired many foreign-born residents, which
further encourages ethnic and cultural variety.’
This is the geodemographic profile of the neighborhood in Hove, UK,
where Marc Farr lives. Marc read economics and marketing at Lancaster
University before going to work as a market researcher in London, first
at the TMS Partnership and then at Experian. His work involved use of
geodemographic data to analyze retail catchments, measure insurance
risk, and analyze household expenditure patterns.
Figure 2.10 Marc Farr, geodemographer
Over time, Marc gained increasing consultancy responsibilities for public
sector clients in education, health, and law enforcement. As a consequence
of his developing interests in the problems that they face, five years after
graduating, he began to work on a Ph.D. that used geodemographics to analyze the ways in which
prospective students in the UK choose the universities at which they want to study. He did this work in
association with the UK Universities Central Admissions Service (UCAS). Speaking about his Ph.D., which
was completed after five years’ part time study, Marc says: ‘I question the assumptions that the massive
increases in numbers of people entering UK higher education during the late 1990s will reduce inequality
between different socio-economic groups, or that they will necessarily improve economic and social mobility.
Geodemographic analysis also suggests that we need to better understand the relationship between the
geography of demand for higher education and its physical supply.’
Marc now works for the Dr. Foster consultancy firm, where he has responsibilities for the calculation of
hospital and health authority performance statistics.

recently, decision support tools used by Spatially Aware Professionals (SAPs, Section 1.4.3.2)
have created mainstream research and development roles for business GIS applications.

Some of the operational roles of GIS in business are discussed under the heading of logistics
applications in Section 2.3.4. These include stock flow management systems and distribution
network management, the specifics of which vary from industry sector to sector. Geodemographic
analysis is an important operational tool in market area analysis, where it is used to plan
marketing campaigns. Each of these applications can be described as assessing the circumstances
of an organization.

The most obvious strategic application concerns the spatial expansion of a new entrant across a
retail market. Expansion in a market poses fundamental spatial problems – such as whether to
expand through contagious diffusion across space, or hierarchical diffusion down a settlement
structure, or to pursue some combination of the two (Figure 2.11). Many organizations
periodically experience spatial consolidation and branch rationalization. Consolidation and
rationalization may occur: (a) when two organizations with overlapping networks merge; (b) in
response to competitive threat; or (c) in response to changes in the retail environment. Changes
in the retail environment may be short term and cyclic, as in the response to the recession phase
of business cycles, or structural, as with the rationalization of clearing bank branches
following the development of personal, telephone, and Internet-based banking (see Section
18.4.4). Still other organizations undergo spatial restructuring, as in the market repositioning
of bank branches to supply a wider range of more profitable financial services.

Spatial restructuring is often the consequence of technological change. For example, a ‘clicks
and mortar’ strategy might be developed by a chain of conventional bookstores, whereby their
retail outlets might be reconfigured to offer reliable pick-up points for Internet and telephone
orders – perhaps in association with location-based services (Sections 1.4.3.1 and 11.3.2). This
may confer advantage over new, purely Internet-based entrants. A final type of strategic
operation involves distribution of goods and services, as in the case of so-called ‘e-tailers’,
who use the Internet for merchandizing, but must create or buy into viable distribution networks.
These various strategic operations require a range of spatial analytic

tools and data types, and entail a move from ‘what-is’ visualization to ‘what-if’ forecasts and
predictions.

Figure 2.11 (A) Hierarchical and (B) contagious spatial diffusion

2.3.3.2 Case study application: hierarchical diffusion and convenience shopping
Tesco is, by some margin, the most successful grocery (food) retailer in the UK, and has used its
knowledge of the home market to launch successful initiatives in Asia and the developing markets
of Eastern Europe (see Figure 1.7). Achieving real sales growth in its core business of groceries
is difficult, particularly in view of a strict national planning regime that prevents widespread
development of new stores, and legislation to prevent the emergence, through acquisitions, of
local spatial monopolies of supply. One way in which Tesco has succeeded in sustaining market
growth in the domestic market in these circumstances is through strategic diversification into
consumer durables and clothing in its largest (high order, colored dark red in Figure 2.11)
stores. A second driver to growth has been the successful development of a store loyalty card
program, which rewards members with money-off coupons or leisure experiences according to their
weekly spend. This program generates lifestyles data as a very useful by-product, which enables
Tesco to identify consumption profiles of its customers, not unlike the Mosaic geodemographic
system. This enables the company to identify, for example, whether customers are ‘value driven’
and should be directed to budget food offerings, or whether they are principally motivated by
quality and might be encouraged to purchase goods from the company’s ‘Finest’ range. This is a
very powerful marketing tool, although unlike geodemographic discriminators these data tell the
company rather little about those households that are not their customers, or the products that
their own customers buy elsewhere.

A third driver to growth entails the creation or acquisition of much smaller neighborhood stores
(low order, in the lightest red in Figure 2.11). These provide a local community service and are
not very intrusive on the retail landscape, and are thus much easier to create within the
constraints of the planning system. The ‘Express’ format store shown in Figure 2.12A opened in
2003 and was planned by Tesco’s in-house store location team using GE Smallworld GIS. Figure
2.12B shows its location (labeled T3) in Bournemouth, UK, in relation to the edge of the town and
the locations of five competitor chains.

Figure 2.12 (A) The site and (B) the location of a new Tesco Express store

GIS can be used to predict the success of a retailer in penetrating a local market area.

2.3.3.3 Method
The location is in a suburban residential neighborhood, and as such it was anticipated that its
customer base would be mainly local – that is, resident within a 1 km radius of the store. A
budget was allocated for promoting the new establishment, in order to encourage repeat patronage.
An established means of promoting new stores is through leaflet drops or enclosures with free
local newspapers. However, such tactical interventions are limited by the coarseness of
distribution networks – most organizations that deliver circulars will only undertake deliveries
for complete postal sectors (typically 20 000 population size) and so this represents a rather
crude and wasteful medium.

A second strategy would be to use the GIS to identify all of the households resident within a
1 km radius of the store. Each UK unit postcode (roughly equivalent to a US Zip+4 code, and
typically comprising 18–22 addresses) is assigned the grid reference of the first mail delivery
point on the ‘mailman’s walk’. Thus one of the quickest ways of identifying the relevant
addresses entails plotting


the unit postcode addresses and selecting those that lie within the search radius. Matching the
unit postcodes with the full postcode address file (PAF) then suggests that there are
approximately 3236 households resident in the search area. Each of these addresses might then be
mailed a circular, thus eliminating the largely wasteful activity of contacting around 16 800
households that were unlikely to use the store.

Yet even this tactic can be refined. Sending the same packet of money-off coupons to all 3236
households assumes that each has identical disposable incomes and consumption habits. There may
be little point, for example, incentivizing a domestic-beer drinker to buy premium champagne, or
vice-versa. Thus it makes sense to overlay the pattern of geodemographic profiles onto the target
area, in order to tailor the coupon offerings to the differing consumption patterns of ‘blue
collar enterprise’ neighborhoods versus those classified as belonging to ‘suburban comfort’, for
example.

There is a final stage of refinement that can be developed for this analysis. Using its
lifestyles (storecard) data, Tesco can identify those households who already prefer to use the
chain, despite the previous nonavailability of a local convenience store. Some of these customers
will use Tesco for their main weekly shop, but may ‘top up’ with convenience or perishable goods
(such as bread, milk, or cut flowers) from a competitor. Such households might be offered
stronger incentives to purchase particular ranges of goods from the new store, without the
wasteful ‘cannibalizing’ activity of offering coupons towards purchases that are already made
from Tesco in the weekly shop.

2.3.3.4 Scientific foundations: geographic principles, techniques, and analysis
The following assumptions and organizing principles are inherent to this case study.

Scientific foundations
Fundamental to the application is the assumption that the closer a customer lives to the store,
the more likely he or she is to patronize it. This is formalized as Tobler’s First Law of
Geography in Section 3.1 and is accommodated into our representations as a distance decay effect.
The nature (Chapter 4) of such distance decay effects does not have to be linear – in Section 4.5
we will introduce a range of non-linear effects.

The science of geodemographic profiling can be stated succinctly as ‘birds of a feather flock
together’ – that is, the differences in the observed social and economic characteristics of
residents between neighborhoods are greater than differences observed within them. The use of
small-area geodemographic profiles to mix the coupon incentives that might be sent to prospective
customers assumes that each potential customer is equally and utterly typical of the post code in
which he or she resides. The individual resident in an area is thus assigned the characteristics
of the area. In practice, of course, individuals within households have different
characteristics, as do households within streets, zones, and any other aggregation. The practice
of confounding characteristics of areas with individuals resident in them is known as committing
the ecological fallacy – the term does not refer to any branch of biology, but shares with
ecology a primary concern with describing the linkage of living organisms (individuals) to their
geographical surroundings. This is inevitable in most socio-economic GIS applications because
data that reveal sensitive characteristics of individuals must be kept confidential (see the
point about GIS and the surveillance society in Section 1.7). Whilst few could take offence at
error in mis-targeting money-off coupons as in this example, ecological analysis has the
potential to cause distinctly unethical outcomes if individuals are penalized because of where
they live – for example if individuals find it difficult to gain credit because of the credit
histories of their neighborhoods. Such discriminatory activity is usually prevented by industry
codes of conduct or even legislation. The use of lifestyles data culled from store loyalty card
records enables individuals to be targeted precisely, but such individuals might well be
geographically or socially unrepresentative of the population at large or the market as a whole.

More generally, geography is a science that has very few natural units of analysis – what, for
example, is the natural unit for measuring a soil profile? In socio-economic applications, even
if we have disaggregate data we might remain uncertain as to whether we should consider the
individual or the household as the basic unit of analysis – sometimes one individual in a
household always makes the important decisions, while in others this is a shared responsibility.
We return to this issue in our discussion of uncertainty (Chapter 6).

The use of lifestyle data from store loyalty programs allows the retailer to enrich
geodemographic profiling with information about its own customers. This is a cutting-edge
marketing activity, but one where there is plenty of scope for relevant research that is able to
take ‘what is’ information about existing customer characteristics and use it to conduct ‘what
if’ analysis of behavior given a different constellation of retail outlets. We return to the
issue of defining appropriate predictor variables and measurement error in our discussions of
spatial dependence (Section 4.7) and uncertainty (Section 6.3). More fundamentally still, is it
acceptable (predictively and ethically) to represent the behavior of consumers using any
measurable socio-economic variables?

Principles
The definition of the primary market area that is to receive incentives assumes that a linear
radial distance measure is intrinsically meaningful in terms of defining market area. In
practice, there are a number of severe shortcomings in this. The simplest is that spatial
structure will distort the radial measure of market area – the market is likely to extend further
along the more important travel arteries, for example, and will be restricted by physical
obstacles such as blocked-off streets and rivers, and by traffic management devices such as stop
lights. Impediments to access may be perceived as well as real – it may be that residents of West
Parley (Figure 2.12B) would never think of going into north Bournemouth to shop, for example, and
that the store’s customers will remain overwhelmingly drawn from the area south of the store.
Such perceptions of psychological distance can be very important yet difficult to accommodate in
representations. We return to the issue of appropriate distance metrics in Sections 4.6 and
14.3.1.

Techniques
The assignment of unit postcode coordinates to the catchment zone is performed through a
procedure known as point in polygon analysis, which is considered in our discussion of
transformation (Section 14.4.2).

The analysis as described here assumes that the principles that underpin consumer behavior in
Bournemouth, UK, are essentially the same as those operating anywhere else on Planet Earth. There
is no attempt to accommodate regional and local factors. These might include: adjusting the
attenuating effect of distance (see above) to accommodate the different distances people are
prepared to travel to find a convenience store (e.g., as between an urban and a rural area); and
adjusting the likely attractiveness of the outlet to take account of ease of access, forecourt
size, or a range of qualitative factors such as layout or branding. A range of spatial techniques
is now available for making the general properties of spatial analysis more sensitive to context.

Analysis
Our stripped down account of the store location problem has not considered the competition from
the other stores shown in Figure 2.12B – despite the fact that all residents almost certainly
already purchase convenience goods somewhere! Our description does, however, address the
phenomenon of cannibalizing – whereby new outlets of a chain poach customers from its existing
sites. In practice, both of these issues may be addressed through analysis of the spatial
interactions between stores. Although this is beyond the scope of this book, Mark Birkin and
colleagues have described how the tradition of spatial interaction modeling is ideally suited to
the problems of defining realistic catchment areas and estimating store revenues. A range of
analytic solutions can be devised in order to accommodate the fact that store catchments often
overlap.

2.3.3.5 Generic scientific questions arising from the application
A dynamic retail sector is fundamental to the functioning of all advanced economies, and many
investments in location are so huge that they cannot possibly be left to chance. Doing nothing is
simply not an option. Intuition tells us that the effects of distance to outlet, and the
organization of existing outlets in the retail hierarchy must have some kind of impact upon
patterns of store patronage. But, in intensely competitive consumer-led markets, the important
question is how much impact?

Human decision making is complex, but predicting even a small part of it can be very important to
a retailer.

Consumers are sophisticated beings and their shopping behavior is often complex. Understanding
local patterns of convenience shopping is perhaps quite straightforward, when compared with other
retail decisions that involve stores that have a wider range of attributes, in terms of floor
space, range and quality of goods and services, price, and customer services offered. Different
consumer groups
find different retailer attributes attractive, and hence it is the mix of individuals with
particular characteristics that largely determines the likely store turnover of a particular
location. Our example illustrates the kinds of simplifying assumptions that we may choose to make
using the best available data in order to represent consumer characteristics and store
attributes. However, it is important to remember that even blunt-edged tools can increase the
effectiveness of operational and strategic R&D (research and development) activities many-fold.
An untargeted leafleting campaign might typically achieve a 1% hit rate, while one informed by
even quite rudimentary market area analysis might conceivably achieve a rate that is five times
higher. The pessimist might dwell on the 95% failure rate that a supposedly scientific approach
entails, yet the optimist should be more than happy with the fivefold increase in the efficiency
of use of the marketing budget!

2.3.3.6 Management and policy
The geographic development of retail and business organizations has sometimes taken place in a
haphazard way. However, the competitive pressures of today’s markets require an understanding of
branch location networks, as well as their abilities to anticipate and respond to threats from
new entrants. The role of Internet technologies in the development of ‘e-tailing’ is important
too, and these introduce further spatial problems to retailing – for example, in developing an
understanding of the geographies of engagement with new information and communications
technologies and in working out the logistics of delivering goods and services ordered in
cyberspace to the geographic locations of customers (see Section 2.3.4).

Thus the role of the Spatially Aware Professional is increasingly as a mainstream manager
alongside accountants, lawyers, and general business managers. SAPs complement understanding of
corporate performance in national and international markets with performance at the regional and
local levels. They have key roles in such areas of organizational activity as marketing, store
revenue predictions, new product launch, improving retail networks, and the assimilation of
pre-existing components into combined store networks following mergers and acquisitions.

Spatially Aware Professionals do much more than simple mapping of data.

Simple mapping packages alone provide insufficient scientific grounding to resolve retail
location problems. Thus a range of GIServices have been developed – some in house, by large
retail corporations (such as Tesco, above), some by software vendors that provide analytical and
data services to retailers, and some by specialist consultancy services. There is ongoing debate
as to which of these solutions is most appropriate to retail applications. The resolution of this
debate lies in understanding the nature of particular organizations, their range of goods and
services, and the priority that organizations assign to operational, tactical, and strategic
concerns.

2.3.4 Logistics and transportation

2.3.4.1 Applications overview
Knowing where things are can be of enormous importance for the fields of logistics and
transportation, which deal with the movement of goods and people from one place to another, and
the infrastructure (highways, railroads, canals) that moves them. Highway authorities need to
decide what new routes are needed and where to build them, and later need to keep track of
highway condition. Logistics companies (e.g., parcel delivery companies, shipping companies) need
to organize their operations, deciding where to place their central sorting warehouses and the
facilities that transfer goods from one mode to another (e.g., from truck to ship), how to route
parcels from origins to destinations, and how to route delivery trucks. Transit authorities need
to plan routes and schedules, to keep track of vehicles and to deal with incidents that delay
them, and to provide information on the system to the traveling public. All of these fields
employ GIS, in a mixture of operational, tactical, and strategic applications.

The field of logistics addresses the shipping and transportation of goods.

Each of these applications has two parts: the static part that deals with the fixed
infrastructure, and the dynamic part that deals with the vehicles, goods, and people that move on
the static part. Of course, not even a highway network is truly static, since highways are often
rebuilt, new highways are added, and highways are even sometimes moved. But the minute-to-minute
timescale of vehicle movement is sharply different from the year-to-year changes in the
infrastructure. Historically, GIS has been easier to apply to the static part, but recent
developments in the technology are making it much more powerful as a tool to address the dynamic
part as well. Today, it is possible to use GPS (Section 5.8) to track vehicles as they move
around, and transit authorities increasingly use such systems to inform their users of the
locations of buses and trains (Section 11.3.2 and Box 13.4).

GPS is also finding applications in dealing with emergency incidents that occur on the
transportation network (Figure 2.13). The OnStar system (www.onstar.com) is one of several
products that make use of the ability of GPS to determine location accurately virtually anywhere.
When installed in a vehicle, the system is programmed to transmit location automatically to a
central office whenever the vehicle is involved in an accident and its airbags deploy. This can
be life-saving if the occupants of the vehicle do not know where they are, or are otherwise
unable to call for help.

Many applications in transportation and logistics involve optimization, or the design of
solutions to meet specified objectives. Section 15.3 discusses this type of analysis in detail,
and includes several examples dealing with transportation and logistics. For example, a delivery
company may need to deliver parcels to 200 locations in a given shift, dividing the work between
10 trucks. Different ways of dividing the work, and routing the

vehicles, can result in substantial differences in time and cost, so it is important for the
company to use the most efficient solution (see Box 15.4 for an example of the daily workload of
an elevator repair company). Logistics and related applications of GIS have been known to save
substantially over traditional, manual ways of determining routes.

GIS has helped many service and delivery companies to substantially reduce their operating costs
in the field.

Figure 2.13 Systems such as OnStar allow information on the location of an accident, determined
by a GPS unit in the vehicle, to be sent to a central office and compared to a GIS database of
highways and streets, to determine the incident location so that emergency teams can respond

2.3.4.2 Case study application: planning for emergency evacuation
Modern society is at risk from numerous types of disasters, including terrorist attacks, extreme
weather events such as hurricanes, accidental spills of toxic chemicals resulting from truck
collisions or train derailments, and earthquakes. In recent years several major events have
required massive evacuation of civilian populations – for example, 800 000 people evacuated in
Florida in advance of Hurricane Frances in 2004 (Figure 2.14).

Figure 2.14 Hurricane Frances approaching the coast of Florida, USA, September 3, 2004 (Courtesy
US National Oceanic and Atmospheric Administration, NOAA)

In response to the threat of such events, most communities attempt to plan. But planning is made
particularly difficult because the magnitude and location of the event can rarely be anticipated.
Suppose, for example, that we attempt to develop a plan for dealing with a spill of a volatile
toxic chemical resulting from a train derailment. It might make sense to plan for the worst case,
for example the spillage of the entire contents of several cars loaded with chlorine gas. But the
derailment might occur anywhere on the rail network, and the impact will depend on the strength
and direction of the wind. Possible scenarios might involve people living within tens of
kilometers of any point on the track network (see Section 14.4.1 for details of the buffer
operation, which would be used in such cases to determine areas lying within a specified
distance). Locations can be anticipated for some disasters, such as those resulting from fire in
buildings known to be storing toxic chemicals, but hurricanes and earthquakes can impact almost
anywhere within large areas.

The magnitude and location of a disaster can rarely be anticipated.

To illustrate the value of GIS in evacuation planning, we have chosen the work of Tom Cova, an
academic expert on GIS in emergency management. Tom’s early work was strongly motivated by the
problems that occurred in the Oakland Hills fire of October 1991 in Northern California, USA,
which destroyed approximately 1580 acres and over 2700 structures in the East Bay Hills. This
became the most expensive fire disaster in Californian history (Figure 2.15), taking 25 lives and
causing over $1.68 billion in damages.

Figure 2.15 The Oakland Hills fire of October, 1991, which took 25 lives, in part because of the
difficulty of evacuation

Cova has developed a planning tool that allows neighborhoods to rate the potential for problems
associated with evacuation, and to develop plans accordingly. The tool
uses a GIS database containing information on the distribution of population in the neighborhood,
and the street pattern. The result is an evacuation vulnerability map. Because the magnitude of a
disaster cannot be known in advance, the method works by identifying the worst-case scenario that
could affect a given location.

Suppose a specific household is threatened by an event that requires evacuation, such as a
wildfire, and assume for the moment that one vehicle is needed to evacuate each household. If the
house is in a cul-de-sac, the number of vehicles needing to exit the cul-de-sac will be equal to
the number of households on the street. If the entire neighborhood of streets has only one exit,
all vehicles carrying people from the neighborhood will need to use that one exit. Cova’s method
works by looking further and further from the household location, to find the most important
bottleneck – the one that has to handle the largest amount of traffic. In an area with a dense
network of streets traffic will disperse among several exits, reducing the bottleneck effect. But
a densely packed neighborhood with only a single exit can be the source of massive evacuation
problems, if a disaster requires the rapid evacuation of the entire neighborhood. In the Oakland
Hills fire there were several critical bottlenecks – one-lane streets that normally carry traffic
in both directions, but became hopelessly clogged in the emergency.

Figure 2.16 shows a map of Santa Barbara, California, USA, with streets colored according to
Cova’s measure of evacuation vulnerability. The color assigned to any location indicates the
number of vehicles that would have to pass through the critical bottleneck in the worst-case
evacuation, with red indicating that over 500 vehicles per lane would have to pass through the
bottleneck. The red area near the shore in the lower left is a densely packed area of student
housing, with very few routes out of the neighborhood. An evacuation of the entire neighborhood
would produce a very heavy flow of vehicles on these exit routes. The red area in the upper left
has a much lower population density, but has only one narrow exit.

Figure 2.16 Evacuation vulnerability map of the area of Santa Barbara, California, USA. Colors
denote the difficulty of evacuating an area based on the area’s worst-case scenario (Reproduced
by permission of Tom Cova)

2.3.4.3 Method
Two types of data are required for the analysis. Census data are used to determine population and
household counts, and to estimate the number of vehicles involved in an evacuation. Census data
are available as aggregate counts for areas of a few city blocks, but not for individual houses,
so there will be some uncertainty regarding the exact numbers of vehicles needing to leave a
specific street, though estimates for entire neighborhoods will be much more accurate. The
locations of streets are obtained from so-called street centerline files, which give the
geographic locations, names, and other details of individual streets (see Sections 9.4 and 10.8
for overviews of geographic data sources). The TIGER (Topologically Integrated Geographic
Encoding and Referencing) files, produced by the US Bureau of the Census and the US Geological
Survey and readily available from many sites on the Internet, are one free source of such data
for the USA, and many private companies also offer such data, many adding new information such as
traffic flow volumes or directions (for US sources, see, for example, GDT Inc., Lebanon, New
Hampshire, now part of Tele Atlas, www.geographic.com; and NAVTEQ, formerly Navigation
Technologies, Chicago, Illinois, www.navteq.com).

Street centerline files are essential for many applications in transportation and logistics.

The analysis proceeds by beginning at every street intersection, and working outwards following
the street connections to reach new intersections. Every connection is tested to see if it
presents a bottleneck, by dividing the total number of vehicles that would have to move out
of the neighborhood by the number of exit lanes. After all streets have been searched out to a
specified distance from the start, the worst-case value (vehicles per lane) is assigned to the
starting intersection. Finally, the entire network is colored by the worst-case value.

of them in Section 15.3.3. Many WWW sites will find shortest paths between two street addresses
(Figure 1.17). In practice, people will often not use the shortest path, preferring routes that
may be quicker but longer, or routes that are more scenic.

Techniques
2.3.4.4 Scientific foundations: geographic
The techniques used in this example are widely available
principles, techniques, and analysis in GIS. They include spatial interpolation techniques,
Scientific foundations which are needed to assign worst-case values to the
Cova’s example is one of many applications that have streets, since the analysis only produces values for the
been found for GIS in the general areas of logistics and intersections. Spatial interpolation is widely applied in
transportation. As a planning tool, it provides a way GIS to use information obtained at a limited number
of rating areas against a highly uncertain form of risk, of sample points to guess values at other points, and
a major evacuation. Although the worst-case scenario is discussed in general in Box 4.3, and in detail in
that might affect an area may never occur, the tool Section 14.4.4.
nevertheless provides very useful information to planners The shortest path methods used to route traffic are
who design neighborhoods, giving them graphic evidence also widely available in GIS, along with other functions
of the problems that can be caused by lack of foresight needed to create, manage, and visualize information
in street layout. Ironically, the approach points to a about networks.
major problem with the modern style of street layout
in subdivisions, which limits the number of entrances to Analysis
subdivisions from major streets in the interests of creating Cova’s technique is an excellent example of the use of
a sense of community, and of limiting high-speed through GIS analysis to make visible what is otherwise invisible.
traffic. Cova’s analysis shows that such limited entrances By processing the data and mapping the results in
can also be bottlenecks in major evacuations. ways that would be impossible by hand, he succeeds
The analysis demonstrates the value of readily avail- in exposing areas that are difficult to evacuate and
able sources of geographic data, since both major draws attention to potential problems. This idea is so
inputs – demographics and street layout – are available in central to GIS that it has sometimes been claimed as the
digital form. At the same time we should note the limita- primary purpose of the technology, though that seems a
tions of using such sources. Census data are aggregated to little strong, and ignores many of the other applications
areas that, while small, nevertheless provide only aggre- discussed in this chapter.
gated counts of population. The street layouts of TIGER
and other sources can be out of date and inaccurate, par-
ticularly in new developments, although users willing to 2.3.4.5 Generic scientific questions arising
pay higher prices can often obtain current data from the
from the application
private sector. And the essentially geometric approach
cannot deal with many social issues: evacuation of the Logistic and transportation applications of GIS rely heav-
disabled and elderly, and issues of culture and language ily on representations of networks, and often must ignore
that may impede evacuation. In Chapter 16 we look at off-network movement. Drivers who cut through parking
this problem using the tools of dynamic simulation mod- lots, children who cross fields on their way to school,
eling, which are much more powerful and provide ways houses in developments that are not aligned along linear
of addressing such issues. streets, and pedestrians in underground shopping malls all
confound the network-based analysis that GIS makes pos-
Principles sible. Humans are endlessly adaptable, and their behavior
Central to Cova’s analysis is the concept of connectivity. will often confound the simplifying assumptions that are
Very little would change in the analysis if the input inherent to a GIS model. For example, suppose a system is
maps were stretched or distorted, because what matters developed to warn drivers of congestion on freeways, and
is how the network of streets is connected to the rest to recommend alternative routes on neighborhood streets.
of the world. Connectivity is an instance of a topological While many drivers might follow such recommendations,
property, a property that remains constant when the spatial others will reason that the result could be severe conges-
framework is stretched or distorted. Other examples of tion on neighborhood streets, and reduced congestion on
topological properties are adjacency and intersection, the freeway, and ignore the recommendation. Residents
both of which cannot be destroyed by stretching a map. of the neighborhood streets might also be tempted to try
We discuss the importance of topological properties and to block the use of such systems, arguing that they result
their representation in GIS in Section 10.7.1. in unwanted and inappropriate traffic, and risk to them-
The analysis also relies on being able to find the selves. Arguments such as these are based on the notion
shortest path from one point to another through a street that the transportation system can only be addressed as
network, and it assumes that people will follow such paths a whole, and that local modifications based on limited
when they evacuate. Many forms of GIS analysis rely on perspectives, such as the addition of a new freeway or
being able to find shortest paths, and we discuss some bypass, may create more problems than they solve.
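
The bottleneck search described for Cova’s analysis (start at an intersection, work outwards along street connections, and divide the vehicles that must leave by the available exit lanes) can be illustrated with a minimal sketch. This is not Cova’s implementation: the toy network, the vehicle counts, the use of a hop limit in place of a distance limit, and every function and variable name are assumptions made purely for illustration.

```python
from collections import deque


def evacuation_vulnerability(start, neighbors, vehicles, lanes, max_hops):
    """Worst-case vehicles per exit lane over neighborhoods grown outward from start."""
    inside = set()
    seen = {start}
    queue = deque([(start, 0)])
    worst = 0.0
    while queue:
        node, hops = queue.popleft()
        inside.add(node)
        # Exit lanes are the lanes on connections that cross the neighborhood boundary.
        exit_lanes = sum(
            lanes[frozenset((u, v))]
            for u in inside
            for v in neighbors[u]
            if v not in inside
        )
        if exit_lanes > 0:
            load = sum(vehicles[n] for n in inside)
            worst = max(worst, load / exit_lanes)
        # Assumption: the search limit is counted in hops, not in street distance.
        if hops < max_hops:
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
    return worst


# Hypothetical cul-de-sac A-B-C feeding a junction D that has two wider exits.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
             "D": ["C", "E", "F"], "E": ["D"], "F": ["D"]}
vehicles = {"A": 100, "B": 100, "C": 100, "D": 50, "E": 0, "F": 0}
lanes = {frozenset(("A", "B")): 1, frozenset(("B", "C")): 1,
         frozenset(("C", "D")): 1, frozenset(("D", "E")): 2,
         frozenset(("D", "F")): 2}

score_a = evacuation_vulnerability("A", neighbors, vehicles, lanes, max_hops=3)
print(score_a)  # 300.0: A, B and C must all leave through the single lane C-D
```

Running the function from every intersection and coloring each one by its score would give a map in the spirit of Figure 2.16, under the simplifications stated above.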

2.3.4.6 Management and policy
GIS is used in all three modes – operational, tactical, and strategic – in logistics and transportation. This section concludes with some examples in all three categories.
In operational systems, GIS is used:
■ To monitor the movement of mass transit vehicles, in order to improve performance and to provide improved information to system users.
■ To route and schedule delivery and service vehicles on a daily basis to improve efficiency and reduce costs.

In tactical systems:
■ To design and evaluate routes and schedules for public bus systems, school bus systems, garbage collection, and mail collection and delivery.
■ To monitor and inventory the condition of highway pavement, railroad track, and highway signage, and to analyze traffic accidents.

In strategic systems:
■ To plan locations for new highways and pipelines, and associated facilities.
■ To select locations for warehouses, intermodal transfer points, and airline hubs.

2.3.5 Environment

2.3.5.1 Applications overview
Although it is the last area to be discussed here, the environment drove some of the earliest applications of GIS, and was a strong motivating force in the development of the very first GIS in the mid-1960s (Section 1.4.1). Environmental applications are the subject of several GIS texts, so only a brief overview will be given here for the purposes of illustration.
The development of the Canada Geographic Information System in the 1960s was driven by the need for policies over the use of land. Every country’s land base is strictly limited (although the Dutch have managed to expand theirs very substantially by damming and draining), and alternative uses must compete for space. Measures of area are critical to effective strategy – for example, how much agricultural land is being lost to urban development and sprawl, and how will this impact upon the ability of future generations to feed themselves? Today, we have very effective ways of monitoring land use change through remote sensing from space, and are able to get frequent updates on the loss of tropical forest in the Amazon basin (see Figure 15.14). GIS is also allowing us to devise measures of urban sprawl in historically separate national settlement systems in Europe (Figure 2.17).

GIS allows us to compare the environmental conditions prevailing in different nations.

Generally, it is understood that the 21st century will see increasing proportions of the world’s population resident in cities and towns, and so understanding of the environmental impacts of urban settlements is an increasingly important focus of attention in science and policy. Researchers have used GIS to investigate and understand how urban sprawl occurs, in order to understand the environmental consequences of sprawl and to predict its future consequences. Such predictions can be based on historic patterns of growth, together with information on the locations of roads, steeply sloping land unsuitable for development, land that is otherwise protected from urban use, and other factors that encourage or restrict urban development. Each of these factors may be represented in map form, as a layer in the GIS, while specialist software can be designed to simulate the processes that drive growth. These urban growth models are examples of dynamic simulation models, or computer programs designed to simulate the operation of some part of the human or environmental system. Figure 2.18, taken from the work of geographer Paul Torrens, presents a simple simulation of urban growth in the American Mid-West under four rather different growth scenarios: (A) uncontrolled suburban sprawl; (B) growth restricted to existing travel arteries; (C) ‘leap-frog’ development, occurring because of local zoning controls; and (D) development that is constrained to some extent.
Other applications are concerned with the simulation of processes principally in the natural environment. Many models have been coupled with GIS in the past decade, to simulate such processes as soil erosion, forest growth, groundwater movement, and runoff. Dynamic simulation modeling is discussed in detail in Chapter 16.

2.3.5.2 Case study application: deforestation on Sibuyan Island, the Philippines
If the increasing extent of urban areas, described above, is one side of the development coin, then the reduction in the extent of natural land cover is frequently the other. Deforestation is one important manifestation of land use change, and poses a threat to the habitat of many species in tropical and temperate forest areas alike. Ecologists, environmentalists, and urban geographers are therefore using GIS in interdisciplinary investigations to understand the local conditions that lead to deforestation, and to understand its consequences. Important evidence of the rate and patterning of deforestation has been provided through analysis of remote sensing images (again, see Figure 15.14 for the case of the Amazon), and these analyses of pattern need to be complemented by analysis at detailed levels of the causes and underlying driving factors of the processes that lead to deforestation. The negative environmental impacts of deforestation can be ameliorated by adequate spatial planning of natural parks and land development schemes. But the more strategic objective of sustainable development can only be achieved if a holistic approach is taken to ecological, social, and economic needs. GIS provides the medium of choice for integrating knowledge of natural and social processes in the interests of integrated environmental planning.
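
The urban growth models mentioned in Section 2.3.5.1, in which constraint layers restrict where growth can occur, can be illustrated with a deliberately simple sketch. It is a toy cellular rule and not the model behind Figure 2.18: the grid, the rule that a free cell urbanizes when a direct neighbor is already urban, and all names are assumptions chosen for illustration.

```python
def grow_one_step(urban, blocked):
    """Return the next urban grid: a free cell urbanizes if a 4-neighbor is already urban."""
    rows, cols = len(urban), len(urban[0])
    nxt = [row[:] for row in urban]
    for r in range(rows):
        for c in range(cols):
            # Existing urban cells stay urban; blocked cells (for example
            # protected or steeply sloping land) can never urbanize.
            if urban[r][c] or blocked[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and urban[rr][cc]:
                    nxt[r][c] = True
                    break
    return nxt


# Hypothetical 3 x 3 study area: one urban cell in the middle,
# one protected cell directly above it.
urban = [[False, False, False],
         [False, True, False],
         [False, False, False]]
blocked = [[False, True, False],
           [False, False, False],
           [False, False, False]]

step1 = grow_one_step(urban, blocked)
step2 = grow_one_step(step1, blocked)
urban_after_1 = sum(cell for row in step1 for cell in row)
urban_after_2 = sum(cell for row in step2 for cell in row)
print(urban_after_1, urban_after_2)  # 4 8
```

Each input grid plays the role of one GIS layer; a more realistic model would add further layers (roads, slope) and probabilistic transition rules.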

Figure 2.17 panels (A) to (F): case studies of Bristol (1991), Brussels (2001), Helsinki (1999), Milan (2001), Stuttgart (2000), and Rennes (1999). Legend in each panel: Local Moran I, inhabitants per km^2, in five classes from negative to positive values; scale bars 0 to 20 kilometres.

Figure 2.17 GIS enables standardized measures of sprawl for the different nation states of Europe (Reproduced by permission of
Guenther Haag & Elena Besussi)

Working at the University of Wageningen in the Netherlands, Peter Verburg and Tom Veldkamp coordinate a research program that is using GIS to understand the sometimes complex interactions that exist between socio-economic and environmental systems, and to gauge their impact upon land use change in a range of different regions of the world (see www.cluemodel.nl). One of their case study areas is Sibuyan Island in the Philippines (Figure 2.19A), where deforestation poses a major threat to biodiversity. Sibuyan is a small island (area 456 km²) of steep forested mountain slopes (Figure 2.19B) and gently sloping coastal land that is used mainly for agriculture, mining, and human settlement. The island has remarkable biodiversity – there are an estimated 700 plant species, of which 54 occur only on Sibuyan Island, and a unique local fauna. The objective of this case study was to identify a range of different development scenarios that make it possible to anticipate future land use and habitat change, and hence also anticipate changes in biodiversity.

2.3.5.3 Method
The initial stage of Verburg and Veldkamp’s research was a qualitative investigation, involving interviews with different stakeholders on the island to identify a list of factors that are likely to influence land use patterns. Table 2.2 lists the data that provided direct or indirect indicators of pressure for land use change. For example, the suitability of the soil for agriculture or the accessibility of a location to local markets can increase the likelihood that a location will be stripped of forest and used for agriculture. They then used these data in a quantitative GIS-based analysis to calculate the probabilities of land use transition under three different scenarios of land use change – each of which was based on different spatial planning policies. Scenario 1 assumes no effective protection of the forests on the island (and a consequent piecemeal pattern of illegal logging), Scenario 2 assumes protection of the designated natural park area alone, and Scenario 3 assumes protection not only of the natural park but also a GIS-defined buffer zone. Figure 2.20 illustrates the forecasted remaining forest area under each of the scenarios at the end of a twenty-year simulation period (1999–2019). The three different scenarios not only resulted in different forest areas by 2019 but also different spatial patterning of the remaining forest. For example, gaps in the forest area under Scenario 1 were mainly caused by shifting cultivation and illegal logging within the area of primary rainforest, while most deforestation under Scenario 2 occurred in the lowland areas. Qualitative interpretations of the outcomes and aggregate statistics are supplemented by numerical spatial indices such as fractal dimensions in order to anticipate the effects of changes upon ecological processes – particularly the effects of disturbance at the edges of the remaining forest area. Such statistics make it possible to define the relative sizes of core and fragmented forest areas (for example, Scenario 1 in Figure 2.20 leads to the greatest fragmentation of the forest area), and this in turn makes it possible to measure the effects of development on biodiversity. Fragmentation statistics are discussed in Section 15.2.5.

Figure 2.18 Growth in the American Mid-West under four different urban growth scenarios, panels (A) to (D). Horizontal extent of image is 400 km (Source: P. Torrens 2005 ‘Simulating sprawl with geographic automata models’, reproduced with permission of Paul Torrens)
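
The distinction between core and fragmented forest drawn in Section 2.3.5.3 can be made concrete with a small sketch. It uses a simple core-versus-edge cell count, which is a stand-in chosen for brevity and not the fractal dimension or other indices used by the researchers; the grids and all names are hypothetical.

```python
def core_and_edge(forest):
    """Count forest cells whose four neighbors are all forest (core) and the rest (edge)."""
    rows, cols = len(forest), len(forest[0])
    core = edge = 0
    for r in range(rows):
        for c in range(cols):
            if not forest[r][c]:
                continue
            around = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            # Cells on the grid border count as edge: the land beyond is unknown.
            if all(0 <= rr < rows and 0 <= cc < cols and forest[rr][cc]
                   for rr, cc in around):
                core += 1
            else:
                edge += 1
    return core, edge


# Two hypothetical 4 x 4 rasters (1 = forest, 0 = cleared).
compact = [[1, 1, 1, 1],
           [1, 1, 1, 1],
           [1, 1, 1, 1],
           [1, 1, 1, 1]]
fragmented = [[1, 1, 0, 1],
              [1, 1, 0, 1],
              [0, 0, 0, 0],
              [1, 1, 1, 1]]

print(core_and_edge(compact))     # (4, 12)
print(core_and_edge(fragmented))  # (0, 10)
```

The fragmented raster keeps most of its forest cells but loses all of its core, which is the kind of contrast that an area total alone would hide.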

Figure 2.19A map labels: the Philippines, with Luzon, Manila, Sibuyan Island, Visayas, Cebu, Palawan, and Mindanao marked (scale bar in hundreds of kilometers); inset of Sibuyan Island showing the natural park and buffer zone (scale bar in kilometers).

Figure 2.19 (A) Location of Sibuyan Island in the Philippines, showing location of the park and buffer zone, and (B) typical
forested mountain landscape of Sibuyan Island
Table 2.2 Data sources used in Verburg and Veldkamp’s ecological analysis

Land use
    Mangroves
    Coconut plantations
    Wetland rice cultivation
    Grassland
    Secondary forest
    Swidden agriculture
    Primary rainforest and mossy forest

Location factors
    Accessibility of roads, rivers and populated places
    Altitude
    Slope
    Aspect
    Geology
    Geomorphology
    Population density
    Population pressure
    Land tenure
    Spatial policies

Figure 2.20 Forest area (dark green) in 1999 and at the end of the land use change simulations (2019) for three different scenarios

2.3.5.4 Scientific foundations: geographic principles, techniques, and analysis
Scientific foundations
The goal of the research was to predict changes in the biodiversity of the island. Existing knowledge of a set of ecological processes led the researchers to the view that biodiversity would be compromised by changes in overall size of natural forest, and by changes in the patterning of the forest that remained. It was hypothesized that changes in patterning would be caused by the combined effects of a further set of physical, biological, and human processes. Thus the researchers take the observed existing land use pattern, and use understanding of the physical, biological, and human processes to predict future land use changes. The different forecasts of land use (based on different scenarios) can then be used to see whether the functioning of ecological processes on the island will be changed in the future. This theme of inferring process from pattern, or function from form, is a common characteristic of GIScience applications.
Of course, it is not possible to identify a uniform set of physical, biological, and human processes that is valid in all regions of the world (the pure nomothetic approach of Section 1.3). But conversely it does not make sense to treat every location as unique (the idiographic approach of Section 1.3) in terms of the processes extant upon it. The art and science of ecological modeling requires us to make a good call not just on the range of relevant determining factors, but also their importance in the specific case study, with due consideration to the appropriate scale range at which each is relevant.

Geographic principles
GIS makes it possible to incorporate diverse physical, biological, and human elements, and to forecast the size, shape, scale, and dimension of land use parcels. Therefore, it is possible in this case study to predict habitat fragmentation and changes in biodiversity. Fundamental to the end use of this analysis is the assumption that the ecological consequences of future deforestation can be reliably predicted using a forecast land use map. The forecasting procedure also assumes that the various indicators of land development pressure are robust, accurate, and reliable. There are inevitable uncertainties in the ways in which these indicators are conceived and measured. Further uncertainties are generated by the scale of analysis that is carried out (Section 6.4), and, as in our retailing example above, local context may be qualitatively important.
Land use change is deemed to be a measurable response to a wide range of locationally variable factors. These factors have traditionally been the remit of different disciplines that have different intellectual traditions of measurement and analysis. As Peter Verburg says: ‘The research assumes that GIS can provide a sort of “Geographic Esperanto” – that is, a common language to integrate diverse, geographically variable factors. It makes use of the core GIS idea that the world can be understood as a series of layers of different types of information, that can be added together meaningfully through overlay analysis to arrive at conclusions.’

Techniques
The multicriteria techniques used to harmonize the different location factors into a composite spatial indicator of development pressure are widely available in GIS, and are discussed in detail in Section 16.4. The individual component indicators are acquired using techniques such as on-screen digitizing and classification of imagery to obtain a land use map. All data are converted to a raster data structure with common resolution and extent. Relations between the location factors and land use are quantified using correlation and regression analysis based on the spatial dataset (Section 4.7).

Analysis
The GIS is used to simulate scenarios of future land use change based on different spatial policies. The application is predicated on the premise that changes in ecological process can be reliably inferred from predicted changes in land use pattern. Process is inferred not just through size measures, but also through spatial measures of connectivity and fragmentation – since these latter aspects affect the ability of a species to mix and breed without disturbance. The analysis of the extent and ways in which different land uses fill space is performed using specialized software (see Section 15.2.5). Although such spatial indices are useful tools, they may be more relevant to some aspects of biodiversity than others, since different species are vulnerable to different aspects of habitat change.

2.3.5.5 Generic scientific questions arising from the application
GIS applications need to be based on sound science. In environmental applications, this knowledge base is unlikely to be the preserve of any single academic discipline. Many environmental applications require the use of GIS in the field, and field researchers often require multidisciplinary understanding of the full range of processes leading to land use change.
Irrespective of the quality of the measurement process, uncertainty will always creep into any prediction, for a number of reasons. Data are never perfect, being subject to measurement error (Chapter 6), and uncertainty arising out of the need selectively to generalize, abstract, and approximate (Chapter 3). Furthermore, simulations of land use change are subject to changes in exogenous forces such as the world economy. Any forecast can only be a selectively simplified representation of the real world and the processes operating within it. GIS users need to be aware of this, because the forecasts produced by a GIS will always appear to be precise in numerical terms, and spatial representations will usually be displayed using crisp lines and clear mapped colors.
GIS users should not think of systems as black boxes, and should be aware that explicit spatial forecasts may have been generated by invoking assumptions about process and data that are not as explicit. User awareness of these important issues can be improved through appropriate metadata and documentation of research procedures – particularly when interdisciplinary teams may be unaware of the disciplinary conventions that govern data creation and analysis in parts of the research. Interdisciplinary science and the cumulative development of algorithms and statistical procedures can lead GIS applications to conflict with an older principle of scientific reporting, that the results of analysis should always be reported in sufficient detail to allow someone else to replicate them. Today’s science is complex, and all of us from time to time may find ourselves using tools developed by others that we do not fully understand. It is up to all of us to demand to know as many of the details of GIS analysis as is reasonably possible.
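
The idea in Section 2.3.5.4 of combining location factors, held as rasters of common resolution and extent, into one composite indicator of development pressure can be sketched as a weighted overlay. This is a generic weighted linear combination and not the researchers’ actual procedure: the layer names, the cell values (assumed already rescaled to the range 0 to 1), and the weights are all hypothetical.

```python
def weighted_overlay(layers, weights):
    """Combine equally sized rasters into one composite score per cell."""
    names = list(layers)
    rows = len(layers[names[0]])
    cols = len(layers[names[0]][0])
    # Overlay is only meaningful when every layer shares the same extent and resolution.
    for name in names:
        grid = layers[name]
        if len(grid) != rows or any(len(row) != cols for row in grid):
            raise ValueError(f"layer {name!r} does not match the common grid")
    total = sum(weights[name] for name in names)
    return [
        [sum(weights[n] * layers[n][r][c] for n in names) / total
         for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical 2 x 2 layers, already rescaled so that 1.0 means highest pressure.
layers = {
    "soil_suitability": [[1.0, 0.0], [0.5, 0.5]],
    "market_access": [[1.0, 1.0], [0.0, 0.5]],
}
weights = {"soil_suitability": 3, "market_access": 1}  # assumed, not from the study

pressure = weighted_overlay(layers, weights)
print(pressure)  # [[1.0, 0.25], [0.375, 0.5]]
```

In the study the relative importance of each factor was estimated by correlation and regression analysis; fixed weights are used here only to keep the example short.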
Users of GIS should always know exactly what the system is doing to their data.

2.3.5.6 Management and policy
GIS is now widely used in all areas of environmental science, from ecology to geology, and from oceanography to alpine geomorphology. GIS is also helping to reinvent environmental science as a discipline grounded in field observation, as data can be captured using battery-powered personal digital assistants (PDAs) and notebooks, before being analyzed on a battery-powered laptop in a field tent, and then uploaded via a satellite link to a home institution. The art of scientific forecasting (by no means a contradiction in terms) is developing in a cumulative way, as interdisciplinary teams collaborate in the development and sharing of applications that range in sophistication from simple composite mapping projects to intensive numerical and statistical simulation experiments.

2.4 Concluding comments
This chapter has presented a selection of GIS application areas and specific instances within each of the selected areas. Throughout, the emphasis has been on the range of contexts, from day-to-day problem solving to curiosity-driven science. The principles of the scientific method have been stressed throughout – the need to maintain an enquiring mind, constantly asking questions about what is going on, and what it means; the need to use terms that are well-defined and understood by others, so that knowledge can be communicated; the need to describe procedures in sufficient detail so that they can be replicated by others; and the need for accuracy, in observations, measurements, and predictions. These principles are valid whether the context is a simple inventory of the assets of a utility company, or the simulation of complex biological systems.

Questions for further study

1. Devise a diary for your own activity patterns for a typical (or a special) day, like that described in Section 2.1.1, and speculate how GIS might affect your own daily activities. What activities are not influenced by GIS, and how might its use in some of these contexts improve your daily quality of life?
2. Compare and contrast the operational, tactical, and strategic priorities of the GIS specialists responsible for the specific applications described in Sections 2.3.2, 2.3.3, 2.3.4 and 2.3.5.
3. Look at one of the applications chapters in the CD in the Longley et al (2005) volume in the references below. To what extent do you believe that the author of your chapter has demonstrated that GIS has been ‘successful’ in application? Suggest some of the implicit and explicit assumptions that are made in order to achieve a ‘successful’ outcome.
4. Look at one of the applications areas in the CD in the Longley et al (2005) volume in the references below. Then re-examine the list of critiques of GIS at the end of Section 1.7. To what extent do you think that the critiques are relevant to the applications that you have studied?

Further reading

Birkin M., Clarke G.P., and Clarke M. 2002 Retail Geography and Intelligent Network Planning. Chichester, UK: Wiley.
Chainey S., and Ratcliffe J. 2005 GIS and Crime Mapping. Chichester, UK: Wiley.
Greene R.W. 2000 GIS in Public Policy. Redlands, CA: ESRI Press.
Harris R., Sleight P., and Webber R. 2005 Geodemographics, GIS and Neighbourhood Targeting. Chichester, UK: Wiley.
Johnston C.A. 1998 Geographic Information Systems in Ecology. Oxford: Blackwell.
Longley P.A., Goodchild M.F., Maguire D.J. and Rhind D.W. (eds) 2005 Geographical Information Systems: Principles; Techniques; Management and Applications (abridged edition). Hoboken, N.J.: Wiley.
O’Looney J. 2000 Beyond Maps: GIS and Decision Making in Local Government. Redlands, CA: ESRI Press.
II

Principles
3 Representing geography
4 The nature of geographic data
5 Georeferencing
6 Uncertainty
3 Representing geography

This chapter introduces the concept of representation, or the construction of
a digital model of some aspect of the Earth’s surface. Representations have
many uses, because they allow us to learn, think, and reason about places
and times that are outside our immediate experience. This is the basis of
scientific research, planning, and many forms of day-to-day problem solving.
The geographic world is extremely complex, revealing more detail the
closer one looks, almost ad infinitum. So in order to build a representation
of any part of it, it is necessary to make choices, about what to represent,
at what level of detail, and over what time period. The large number of
possible choices creates many opportunities for designers of GIS software.
Generalization methods are used to remove detail that is unnecessary for an
application, in order to reduce data volume and speed up operations.

Geographic Information Systems and Science, 2nd edition Paul Longley, Michael Goodchild, David Maguire, and David Rhind.
© 2005 John Wiley & Sons, Ltd. ISBNs: 0-470-87000-1 (HB); 0-470-87001-X (PB)
64 PART II PRINCIPLES

Learning Objectives

After reading this chapter you will know:
■ The importance of understanding representation in GIS;
■ The concepts of fields and objects and their fundamental significance;
■ Raster and vector representation and how they affect many GIS principles, techniques, and applications;
■ The paper map and its role as a GIS product and data source;
■ The importance of generalization methods and the concept of representational scale;
■ The art and science of representing real-world phenomena in GIS.

3.1 Introduction
We live on the surface of the Earth, and spend most of our lives in a relatively small fraction of that space. Of the approximately 500 million square kilometers of surface, only one third is land, and only a fraction of that is occupied by the cities and towns in which most of us live. The rest of the Earth, including the parts we never visit, the atmosphere, and the solid ground under our feet, remains unknown to us except through the information that is communicated to us through books, newspapers, television, the Web, or the spoken word. We live lives that are almost infinitesimal in comparison with the 4.5 billion years of Earth history, or the over 10 billion years since the universe began, and know about the Earth before we were born only through the evidence compiled by geologists, archaeologists, historians, etc. Similarly, we know nothing about the world that is to come, where we have only predictions to guide us.
Because we can observe so little of the Earth directly, we rely on a host of methods for learning about its other parts, for deciding where to go as tourists or shoppers, choosing where to live, running the operations of corporations, agencies, and governments, and many other activities. Almost all human activities at some time require knowledge (Section 1.2) about parts of the Earth that are outside our direct experience, because they occur either elsewhere in space, or elsewhere in time. Sometimes this knowledge is used as a substitute for directly sensed information, creating a virtual reality (see Section 11.3.1). Increasingly it is used to augment what we can see, touch, hear, feel, and smell, through the use of mobile information systems that can be carried around.
Our knowledge of the Earth is not created entirely freely, but must fit with the mental concepts we began to develop as young children – concepts such as containment (Paris is in France) or proximity (Dallas and Fort Worth are close). In digital representations, we formalize these concepts through data models (Chapter 8), the structures and rules that are programmed into a GIS to accommodate data. These concepts and data models together constitute our ontologies, the frameworks that we use for acquiring knowledge of the world.

Almost all human activities require knowledge about the Earth – past, present, or future.

One such ontology, a way to structure knowledge of movement through time, is a three-dimensional diagram, in which the two horizontal axes denote location on the Earth’s surface, and the vertical axis denotes time. In Figure 3.1, the daily lives of a sample of residents of Lexington, Kentucky, USA are shown as they move by car through space and time, from one location to another, while going about their daily business of shopping, traveling to work, or dropping children at school. The diagram is crude, because each journey is represented by a series of straight lines between locations measured with GPS, and if we were able to examine each track or trajectory in more detail we would see the effects of having to follow streets, stopping at traffic lights, or slowing for congestion. If we looked even closer we might see details of each person’s walk to and from the car. Each closer perspective would display more information, and a vast storehouse would be required to capture the precise trajectories of all humans throughout even a single day.
The real trajectories of the individuals shown in Figure 3.1 are complex, and the figure is only a representation of them – a model on a piece of paper, generated by a computer from a database. We use the terms representation and model because they imply a simplified relationship between the contents of the figure and the database, and the real-world trajectories of the individuals. Such representations or models serve many useful purposes, and occur in many different forms. For example, representations occur:
■ in the human mind, when our senses capture information about our surroundings, such as the images captured by the eye, or the sounds captured by the ear, and memory preserves such representations for future use;
■ in photographs, which are two-dimensional models of the light emitted or reflected by objects in the world into the lens of a camera;
■ in spoken descriptions and written text, in which people describe some aspect of the world in language, in the form of travel accounts or diaries; or

Figure 3.1 Schematic representation of the daily journeys of a sample of residents of Lexington, Kentucky, USA. The horizontal
dimensions represent geographic space and the vertical dimension represents time of day. Each person’s track plots as a
three-dimensional line, beginning at the base in the morning and ending at the top in the evening. (Reproduced with permission of
Mei-Po Kwan)

■ in the numbers that result when aspects of the world are measured, using such devices as thermometers, rulers, or speedometers.

By building representations, we humans can assemble far more knowledge about our planet than we ever could as individuals. We can build representations that serve such purposes as planning, resource management and conservation, travel, or the day-to-day operations of a parcel delivery service.

Representations help us assemble far more knowledge about the Earth than is possible on our own.

Representations are reinforced by the rules and laws that we humans have learned to apply to the unobserved world around us. When we encounter a fallen log in a forest we are willing to assert that it once stood upright, and once grew from a small shoot, even though no one actually observed or reported either of these stages. We predict the future occurrence of eclipses based on the laws we have discovered about the motions of the Solar System. In GIS applications, we often rely on methods of spatial interpolation to guess the conditions that exist in places where no observations were made, based on the rule (often elevated to the status of a First Law of Geography and attributed to Waldo Tobler) that all places are similar, but nearby places are more similar than distant places.

Tobler’s First Law of Geography: Everything is related to everything else, but near things are more related than distant things.

3.2 Digital representation

This book is about one particular form of representation that is becoming increasingly important in our society – representation in digital form. Today, almost all communication between people through such media as the telephone, FAX, music, television, newspapers and magazines, or email is at some time in its life in digital form. Information technology based on digital representation is moving into all aspects of our lives, from science to commerce to daily existence. Almost half of all households in some industrial societies now own at least one powerful digital information processing device (a computer); a large proportion of all work in offices now occurs using digital computing technology; and digital technology has invaded many devices that we use every day, from the microwave oven to the automobile.

One interesting characteristic of digital technology is that the representation itself is rarely if ever seen by the user, because only a few technical experts ever see the individual elements of a digital representation. What we see instead are views, designed to present the contents of the representation in a form that is meaningful to us.

The term digital derives from digits, or the fingers, and our system of counting based on the ten digits of the human hand. But while the counting system has ten symbols (0 through 9), the representation system in digital computers uses only two (0 and 1). In a sense, then, the term digital is a misnomer for a system that represents all information using some combination of the two symbols 0 and 1, and the more exact term binary is more appropriate. In this book we follow the convention
of using digital to refer to electronic technology based on binary representations.

Computers represent phenomena as binary digits. Every item of useful information about the Earth’s surface is ultimately reduced by a GIS to some combination of 0s and 1s.

Over the years many standards have been developed for converting information into digital form. Box 3.1 shows the standards that are commonly used in GIS to store data, whether they consist of whole or decimal numbers or text. There are many competing coding standards for images and photographs (GIF, JPEG, TIFF, etc.) and for movies (e.g., MPEG) and sound (e.g., MIDI, MP3). Much of this book is about the coding systems used to represent geographic data, especially Chapter 8, and as you might guess that turns out to be comparatively complicated.

Digital technology is successful for many reasons, not the least of which is that all kinds of information share a common basic format (0s and 1s), and can be handled in ways that are largely independent of their actual meaning. The Internet, for example, operates on the basis of packets of information, consisting of strings of 0s and 1s, which are sent through the network based on the information contained in the packet’s header. The network needs to know only what the header means, and how to read the instructions it contains regarding the packet’s destination. The rest of the contents are no more than a collection of bits, representing anything from an email message to a short burst of music or highly secret information on its way from one military installation to another, and are almost never examined or interpreted during transmission. This allows one digital communications network to serve every need, from electronic commerce to chatrooms, and it allows manufacturers to build processing and storage technology for vast numbers of users who have very different applications in mind. Compare this to earlier ways of communicating, which required printing presses and delivery trucks for one application (newspapers) and networks of copper wires for another (telephone).

Digital representations of geography hold enormous advantages over previous types – paper maps, written reports from explorers, or spoken accounts. We can use

Technical Box 3.1

The binary counting system


The binary counting system uses only two symbols, 0 and 1, to represent numerical information. A group of eight binary digits is known as a byte, and volume of storage is normally measured in bytes rather than bits (Table 1.1). There are only two options for a single digit, but there are four possible combinations for two digits (00, 01, 10, and 11), eight possible combinations for three digits (000, 001, 010, 011, 100, 101, 110, 111), and 256 combinations for a full byte. Digits in the binary system (known as binary digits, or bits) behave like digits in the decimal system but using powers of two. The rightmost digit denotes units, the next digit to the left denotes twos, the next to the left denotes fours, etc. For example, the binary number 11001 denotes one unit, no twos, no fours, one eight, and one sixteen, and is equivalent to 25 in the normal (decimal) counting system. We call this the integer digital representation of 25, because it represents 25 as a whole number, and is readily amenable to arithmetic operations. Whole numbers are commonly stored in GIS using either short (2-byte or 16-bit) or long (4-byte or 32-bit) options. In the usual signed (two's-complement) form, short integers can range from −32768 to +32767, and long integers from −2147483648 to +2147483647.

The 8-bit ASCII (American Standard Code for Information Interchange) system assigns codes to each symbol of text, including letters, numbers, and common symbols. The number 2 is assigned ASCII code 50 (00110010 in binary), and the number 5 is 53 (00110101), so if 25 were coded as two characters using 8-bit ASCII its digital representation would be 16 bits long (0011001000110101). The characters 2 = 2 would be coded as 50, 61, 50 (001100100011110100110010). ASCII is used for coding text, which consists of mixtures of letters, numbers, and punctuation symbols.

Numbers with decimal places are coded using real or floating-point representations. A number such as 123.456 (three decimal places and six significant digits) is first transformed by powers of ten so that the decimal point is in a standard position, such as the beginning (e.g., 0.123456 × 10^3). The fractional part (0.123456) and the power of 10 (3) are then stored in separate sections of a block of either 4 bytes (32 bits, single precision) or 8 bytes (64 bits, double precision). Most computers apply the same idea using powers of two rather than ten. This gives enough precision to store roughly 7 significant digits in single precision, or about 15 in double precision.

Integer, ASCII, and real conventions are adequate for most data, but in some cases it is desirable to associate images or sounds with places in GIS, rather than text or numbers. To allow for this GIS designers have included a BLOB option (standing for binary large object), which simply allocates a sufficient number of bits to store the image or sound, without specifying what those bits might mean.
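
As a rough illustration of the conventions described in Box 3.1, the following Python sketch prints the binary form of a small whole number, the 8-bit codes of a short text string, the value ranges of signed 16-bit and 32-bit integers, and the effect of storing 123.456 in single and double precision. The helper names (`to_bits`, `ascii_bits`, `signed_range`, `roundtrip`) are invented for this example and do not come from any GIS product. The sketch assumes two's-complement integers, IEEE 754 floating point as exposed by Python's standard `struct` module, and standard ASCII, in which the character 2 has code 50 and the character 5 has code 53.

```python
import struct


def to_bits(value, width=8):
    """Return the unsigned binary form of value, zero-padded to width bits."""
    return format(value, f"0{width}b")


def ascii_bits(text):
    """Concatenate the 8-bit codes of each ASCII character in text."""
    return "".join(to_bits(code) for code in text.encode("ascii"))


def signed_range(bits):
    """Smallest and largest values of a two's-complement integer of this width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def roundtrip(fmt, value):
    """Store value in the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# Integer representation: 25 = 16 + 8 + 1
print(to_bits(25, 5), int("11001", 2))

# Text representation: one 8-bit code per character
print([ord(ch) for ch in "25"], ascii_bits("25"))
print([ord(ch) for ch in "2=2"], ascii_bits("2=2"))

# Ranges of short (16-bit) and long (32-bit) signed integers
print(signed_range(16), signed_range(32))

# Floating point: single precision keeps about 7 significant digits,
# so the value read back differs slightly from 123.456.
print(roundtrip("<f", 123.456), roundtrip("<d", 123.456))
```

Treating text as a sequence of character codes is why the two characters of "25" occupy 16 bits, whereas the integer 25 fits in the five bits 11001.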
the same cheap digital devices – the components of PCs, the Internet, or mass storage devices – to handle every type of information, independent of its meaning. Digital data are easy to copy, they can be transmitted at close to the speed of light, they can be stored at high density in very small spaces, and they are less subject to the physical deterioration that affects paper and other physical media. Perhaps more importantly, data in digital form are easy to transform, process, and analyze. Geographic information systems allow us to do things with digital representations that we were never able to do with paper maps: to measure accurately and quickly, to overlay and combine, and to change scale, zoom, and pan without respect to map sheet boundaries. The vast array of possibilities for processing that digital representation opens up is reviewed in Chapters 14 through 16, and is also covered in the applications that are distributed throughout the book.

Digital representation has many uses because of its simplicity and low cost.

3.3 Representation for what and for whom?

Thus far we have seen how humans are able to build representations of the world around them, but we have not yet discussed why representations are useful, and why humans have become so ingenious at creating and sharing them. The emphasis here and throughout the book is on one type of representation, termed geographic, and defined as a representation of some part of the Earth’s surface or near-surface, at scales ranging from the architectural to the global.

Geographic representation is concerned with the Earth’s surface or near-surface.

Geographic representations are among the most ancient, having their roots in the needs of very early societies. The tasks of hunting and gathering can be much more efficient if hunters are able to communicate the details of their successes to other members of their group – the locations of edible roots or game, for example. Maps must have originated in the sketches early people made in the dirt of campgrounds or on cave walls, long before language became sufficiently sophisticated to convey equivalent information through speech. We know that the peoples of the Pacific built representations of the locations of islands, winds, and currents out of simple materials to guide each other, and that very simple forms of representation are used by social insects such as bees to communicate the locations of food resources.

Hand-drawn maps and speech are effective media for communication between members of a small group, but much wider communication became possible with the invention of the printing press in the 15th century. Now large numbers of copies of a representation could be made and distributed, and for the first time it became possible to imagine that something could be known by every human being – that knowledge could be the common property of humanity. Only one major restriction affected what could be distributed using this new mechanism: the representation had to be flat. If one were willing to accept that constraint, however, paper proved to be enormously effective; it was cheap, light and thus easily transported, and durable. Only fire and water proved to be disastrous for paper, and human history is replete with instances of the loss of vital information through fire or flood, from the burning of the Alexandria Library in the 7th century that destroyed much of the accumulated knowledge of classical times to the major conflagrations of London in 1666, San Francisco in 1906, or Tokyo in 1945, and the flooding of the Arno that devastated Florence in 1966.

One of the most important periods for geographic representation began in the early 15th century in Portugal. Henry the Navigator (Box 3.2) is often credited with originating the Age of Discovery, the period of European history that led to the accumulation of large amounts of information about other parts of the world through sea voyages and land explorations. Maps became the medium for sharing information about new discoveries, and for administering vast colonial empires, and their value was quickly recognized. Although detailed representations now exist of all parts of the world, including Antarctica, in a sense the spirit of the Age of Discovery continues in the explorations of the oceans, caves, and outer space, and in the process of re-mapping that is needed to keep up with constant changes in the human and natural worlds.

It was the creation, dissemination, and sharing of accurate representations that distinguished the Age of Discovery from all previous periods in human history (and it would be unfair to ignore its distinctive negative consequences, notably the spread of European diseases and the growth of the slave trade). Information about other parts of the world was assembled in the form of maps and journals, reproduced in large numbers using the recently invented printing press, and distributed on paper. Even the modest costs associated with buying copies were eventually addressed through the development of free public lending libraries in the 19th century, which gave access to virtually everyone. Today, we benefit from what is now a longstanding tradition of free and open access to much of humanity’s accumulated store of knowledge about the geographic world, in the form of paper-based representations, through the institution of libraries and the copyright doctrine that gives people rights to material for personal use (see Chapter 18 for a discussion of laws affecting ownership and access). The Internet has already become the delivery mechanism for providing distributed access to geographic information.

In the Age of Discovery maps became extremely valuable representations of the state of geographic knowledge.

It is not by accident that the list of important applications for geographic representations closely follows the list of applications of GIS (see Section 1.1 and Chapter 2),

Biographical Box 3.2

Prince Henry the Navigator


Prince Henry of Portugal, who died in 1460, was known as Henry the
Navigator because of his keen interest in exploration. In 1433 Prince Henry
sent a ship from Portugal to explore the west coast of Africa in an attempt
to find a sea route to the Spice Islands. This ship was the first to travel
south of Cape Bojador (latitude 26 degrees 20 minutes N). To make this
and other voyages Prince Henry assembled a team of map-makers, sea
captains, geographers, ship builders, and many other skilled craftsmen.
Prince Henry showed the way for Vasco da Gama and other famous 15th
century explorers. His management skills could be applied in much the
same way in today’s GIS projects.

Figure 3.2 Prince Henry the Navigator, originator of the Age of Discovery in the 15th century, and promoter of a systematic approach to the acquisition, compilation, and dissemination of geographic knowledge

since representation is at the heart of our ability to solve problems using digital tools. Any application of GIS requires clear attention to questions of what should be represented, and how. There is a multitude of possible ways of representing the geographic world in digital form, none of which is perfect, and none of which is ideal for all applications.

The key GIS representation issues are what to represent and how to represent it.

One of the most important criteria for the usefulness of a representation is its accuracy. Because the geographic world is seemingly of infinite complexity, there are always choices to be made in building any representation – what to include, and what to leave out. When US President Thomas Jefferson dispatched Meriwether Lewis to explore and report on the nature of the lands from the upper Missouri to the Pacific, he said Lewis possessed ‘a fidelity to the truth so scrupulous that whatever he should report would be as certain as if seen by ourselves’. But he clearly didn’t expect Lewis to report everything he saw in complete detail: Lewis exercised a large amount of judgment about what to report, and what to omit. The question of accuracy is taken up at length in Chapter 6.

One more vital interest drives our need for representations of the geographic world, and also the need for representations in many other human activities. When a pilot must train to fly a new type of aircraft, it is much cheaper and less risky for him or her to work with a flight simulator than with the real aircraft. Flight simulators can represent a much wider range of conditions than a pilot will normally experience in flying. Similarly, when decisions have to be made about the geographic world, it is effective to experiment first on models or representations, exploring different scenarios. Of course this works only if the representation behaves as the real aircraft or world does, and a great deal of knowledge must be acquired about the world before an accurate representation can be built that permits such simulations. But the use of representations for training, exploring future scenarios, and recreating the past is now common in many fields, including surgery, chemistry, and engineering, and with technologies like GIS is becoming increasingly common in dealing with the geographic world.

Many plans for the real world can be tried out first on models or representations.

3.4 The fundamental problem

Geographic data are built up from atomic elements, or facts about the geographic world. At its most primitive, an atom of geographic data (strictly, a datum) links a place, often a time, and some descriptive property. The first of these, place, is specified in one of several ways that are discussed at length in Chapter 5, and there are also many ways of specifying the second, time. We often use the term attribute to refer to the last of these three. For example, consider the statement ‘The temperature at local noon on December 2nd 2004 at latitude 34 degrees
45 minutes north, longitude 120 degrees 0 minutes west, was 18 degrees Celsius’. It ties location and time to the property or attribute of atmospheric temperature.

Geographic data link place, time, and attributes.

Other facts can be broken down into their primitive atoms. For example, the statement ‘Mount Everest is 8848 m high’ can be derived from two atomic geographic facts, one giving the location of Mt Everest in latitude and longitude, and the other giving the elevation at that latitude and longitude. Note, however, that the statement would not be a geographic fact to a community that had no way of knowing where Mt Everest is located.

Many aspects of the Earth’s surface are comparatively static and slow to change. Height above sea level changes slowly because of erosion and movements of the Earth’s crust, but these processes operate on scales of hundreds or thousands of years, and for most applications except geophysics we can safely omit time from the representation of elevation. On the other hand atmospheric temperature changes daily, and dramatic changes sometimes occur in minutes with the passage of a cold front or thunderstorm, so time is distinctly important, though such climatic variables as mean annual temperature can be represented as static.

The range of attributes in geographic information is vast. We have already seen that some vary slowly and some rapidly. Some attributes are physical or environmental in nature, while others are social or economic. Some attributes simply identify a place or an entity, distinguishing it from all other places or entities – examples include street addresses, social security numbers, or the parcel numbers used for recording land ownership. Other attributes measure something at a location and perhaps at a time (e.g., atmospheric temperature or elevation), while others classify into categories (e.g., the class of land use, differentiating between agriculture, industry, or residential land). Because attributes are important outside the domain of GIS there are standard terms for the different types (see Box 3.3).

Geographic attributes are classified as nominal, ordinal, interval, ratio, and cyclic.

But this idea of recording atoms of geographic information, combining location, time, and attribute, misses a fundamental problem, which is that the world is in effect infinitely complex, and the number of atoms required for a complete representation is similarly infinite. The closer we look at the world, the more detail it reveals – and it seems that this process extends ad infinitum. The shoreline of Maine appears complex on a map, but even more complex when examined in greater detail, and as more detail is revealed the shoreline appears to get longer and longer,

Technical Box 3.3

Types of attributes
The simplest type of attribute, termed nominal, is one that serves only to identify or distinguish one entity from another. Placenames are a good example, as are names of houses, or the numbers on a driver’s license – each serves only to identify the particular instance of a class of entities and to distinguish it from other members of the same class. Nominal attributes include numbers, letters, and even colors. Even though a nominal attribute can be numeric it makes no sense to apply arithmetic operations to it: adding two nominal attributes, such as two drivers’ license numbers, creates nonsense.

Attributes are ordinal if their values have a natural order. For example, Canada rates its agricultural land by classes of soil quality, with Class 1 being the best, Class 2 not so good, etc. Adding or taking ratios of such numbers makes little sense, since 2 is not twice as much of anything as 1, but at least ordinal attributes have inherent order. Averaging makes no sense either, but the median, or the value such that half of the attributes are higher-ranked and half are lower-ranked, is an effective substitute for the average for ordinal data as it gives a useful central value.

Attributes are interval if the differences between values make sense. The scale of Celsius temperature is interval, because it makes sense to say that 30 and 20 are as different as 20 and 10. Attributes are ratio if the ratios between values make sense. Weight is ratio, because it makes sense to say that a person of 100 kg is twice as heavy as a person of 50 kg; but Celsius temperature is only interval, because 20 is not twice as hot as 10 (and this argument applies to all scales that are based on similarly arbitrary zero points, including longitude).

In GIS it is sometimes necessary to deal with data that fall into categories beyond these four. For example, data can be directional or cyclic, including flow direction on a map, or compass direction, or longitude, or month of the year. The special problem here is that the number following 359 degrees is 0. Averaging two directions such as 359 and 1 yields 180, so the average of two directions close to North can appear to be South. Because cyclic data occur sometimes in GIS, and few designers of GIS software have made special arrangements for them, it is important to be alert to the problems that may arise.
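
The two cautions in Box 3.3 about averaging can be sketched in a few lines of Python. This is only an illustration: the function names and the sample soil classes are invented for the example, and the circular mean shown (summing unit vectors and taking the direction of the result) is one common way of handling directional data, not a method prescribed by the text.

```python
import math
from statistics import median_low


def naive_mean(values):
    """Ordinary arithmetic mean, which ignores the wrap-around at 360 degrees."""
    return sum(values) / len(values)


def circular_mean_degrees(angles):
    """Mean direction in degrees, found by summing unit vectors.

    The result is undefined when the directions cancel out exactly
    (for example 0 and 180 degrees).
    """
    sin_sum = sum(math.sin(math.radians(a)) for a in angles)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles)
    mean = math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0
    # Rounding can leave a value a hair below north at 360.0; report it as 0.
    return 0.0 if math.isclose(mean, 360.0) else mean


# Cyclic data: two compass directions just either side of north
directions = [359, 1]
print(naive_mean(directions))             # arithmetic mean points south
print(circular_mean_degrees(directions))  # circular mean points (almost exactly) north

# Ordinal data: hypothetical soil-quality classes for five land parcels.
# median_low always returns one of the observed classes.
soil_classes = [1, 2, 2, 3, 5]
print(median_low(soil_classes))
```

Using `median_low` rather than an average keeps the summary within the set of valid classes, which matters because a value such as 2.6 has no meaning on an ordinal scale.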
and more and more convoluted (see Figure 4.18). To characterize the world completely we would have to specify the location of every person, every blade of grass, and every grain of sand – in fact, every subatomic particle, clearly an impossible task, since the Heisenberg uncertainty principle places limits on the ability to measure precise positions of subatomic particles. So in practice any representation must be partial – it must limit the level of detail provided, or ignore change through time, or ignore certain attributes, or simplify in some other way.

The world is infinitely complex, but computer systems are finite. Representations must somehow limit the amount of detail captured.

One very common way of limiting detail is by throwing away or ignoring information that applies only to small areas, in other words not looking too closely. The image you see on a computer screen is composed of a million or so basic elements or pixels, and if the whole Earth were displayed at once each pixel would cover an area roughly 10 km on a side, or about 100 sq km. At this level of detail the island of Manhattan occupies roughly 10 pixels, and virtually everything on it is a blur. We would say that such an image has a spatial resolution of about 10 km, and know that anything much less than 10 km across is virtually invisible. Figure 3.3 shows Manhattan at a spatial resolution of 250 m, detailed enough to pick out the shape of the island and Central Park.

It is easy to see how this helps with the problem of too much information. The Earth’s surface covers about 500 million sq km, so if this level of detail is sufficient for an application, a property of the surface such as elevation can be described with only 5 million pieces of information, instead of the 500 million it would take to describe elevation with a resolution of 1 km, and the 500 trillion (500 000 000 000 000) it would take to describe elevation with 1 m resolution.

Another strategy for limiting detail is to observe that many properties remain constant over large areas. For example, in describing the elevation of the Earth’s surface we could take advantage of the fact that roughly two-thirds of the surface is covered by water, with its surface at sea level. Of the 5 million pieces of information needed to describe elevation at 10 km resolution, approximately 3.4 million will be recorded as zero, a colossal waste. If we could find an efficient way of identifying the area covered by water, then we would need only 1.6 million real pieces of information.

Humans have found many ingenious ways of describing the Earth’s surface efficiently, because the problem we are addressing is as old as representation itself, and as important for paper-based representations as it is for binary representations in computers. But this ingenuity is itself the source of a substantial problem for GIS: there are many ways of representing the Earth’s surface, and users of GIS thus face difficult and at times confusing choices. This chapter discusses some of those choices, and the issues are pursued further in subsequent chapters on uncertainty (Chapter 6) and data modeling (Chapter 8). Representation remains a major concern of GIScience, and researchers are constantly looking for ways to extend GIS representations to accommodate new types of information (Box 3.5).

3.5 Discrete objects and continuous fields

3.5.1 Discrete objects

Mention has already been made of the level of detail as a fundamental choice in representation. Another, perhaps even more fundamental, choice is between two conceptual schemes. There is good evidence that we as humans like to simplify the world around us by naming things, and seeing individual things as instances of broader categories. We prefer a world of black and white, of good guys and bad guys, to the real world of shades of gray.

The two fundamental ways of representing geography are discrete objects and continuous fields.

This preference is reflected in one way of viewing the geographic world, known as the discrete object view. In this view, the world is empty, except where it is occupied by objects with well-defined boundaries that are instances of generally recognized categories. Just as the desktop is littered with books, pencils, or computers, the geographic world is littered with cars, houses, lampposts, and other discrete objects. Thus the landscape of Minnesota is littered with lakes, and the landscape of Scotland is littered with mountains. One characteristic of the discrete object view is that objects can be counted, so license plates issued by the State of Minnesota carry

Figure 3.3 An image of Manhattan taken by the MODIS instrument on board the TERRA satellite on September 12, 2001. MODIS has a spatial resolution of about 250 m, detailed enough to reveal the coarse shape of Manhattan and to identify the Hudson and East Rivers, the burning World Trade Center (white spot), and Central Park (the gray blur with the Jacqueline Kennedy Onassis Reservoir visible as a black dot)
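
The resolution arithmetic in Section 3.4 can be reproduced with a short Python sketch. It assumes the round figure of 500 million sq km for the Earth's surface used in the text, treats the surface as if it could be tiled by equal square cells, and takes the water-covered share as exactly two-thirds; the names `cells_needed` and `WATER_FRACTION` are invented for the example.

```python
EARTH_SURFACE_SQ_KM = 500_000_000  # approximate value quoted in the text
SQ_M_PER_SQ_KM = 1_000_000
WATER_FRACTION = 2 / 3             # "roughly two-thirds", taken literally here


def cells_needed(resolution_m, area_sq_km=EARTH_SURFACE_SQ_KM):
    """Count square cells of side resolution_m (in meters) needed to cover the area."""
    return (area_sq_km * SQ_M_PER_SQ_KM) // (resolution_m ** 2)


# One elevation value per cell at three spatial resolutions
for label, resolution_m in [("10 km", 10_000), ("1 km", 1_000), ("1 m", 1)]:
    print(f"{label:>5} resolution: {cells_needed(resolution_m):,} values")

# At 10 km resolution, cells over water all hold the same value (sea level),
# so only the remaining cells carry real information.
total_cells = cells_needed(10_000)
water_cells = round(total_cells * WATER_FRACTION)
land_cells = total_cells - water_cells
print(f"water: {water_cells:,}  land: {land_cells:,}")
```

With exactly two-thirds water the split comes out at about 3.3 million and 1.7 million cells, close to the rounded 3.4 million and 1.6 million quoted in the text.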

Figure 3.4 The problems of representing a three-dimensional world using a two-dimensional technology. The intersection of links A, B, C, and D is an overpass, so no turns are possible between such pairs as A and B

the legend ‘10 000 lakes’, and climbers know that there
are exactly 284 mountains in Scotland over 3000 ft (the
so-called Munros, from Sir Hugh Munro who originally
listed 277 of them in 1891 – the count was expanded to
284 in 1997).
The discrete object view represents the geographic
world as objects with well-defined boundaries in
otherwise empty space.
Biological organisms fit this model well, and this
allows us to count the number of residents in an area
of a city, or to describe the behavior of individual bears.
Manufactured objects also fit the model, and we have
little difficulty counting the number of cars produced in
a year, or the number of airplanes owned by an airline.
But other phenomena are messier. It is not at all clear
what constitutes a mountain, for example, or exactly how
a mountain differs from a hill, or when a mountain with
two peaks should be counted as two mountains.
Geographic objects are identified by their dimensionality.
Objects that occupy area are termed two-dimensional,
and generally referred to as areas. The term polygon is
also common for technical reasons explained later. Other
objects are more like one-dimensional lines, including
roads, railways, or rivers, and are often represented as
one-dimensional objects and generally referred to as lines.
Other objects are more like zero-dimensional points, such
as individual animals or buildings, and are referred to
as points.
Of course, in reality, all objects that are perceptible to
humans are three dimensional, and their representation in
fewer dimensions can be at best an approximation. But the
ability of GIS to handle truly three-dimensional objects
as volumes with associated surfaces is very limited.
Some GIS allow for a third (vertical) coordinate to be
specified for all point locations. Buildings are sometimes
represented by assigning height as an attribute, though if
this option is used it is impossible to distinguish flat roofs
from any other kind. Various strategies have been used for
representing overpasses and underpasses in transportation
networks, because this information is vital for navigation
but not normally represented in strictly two-dimensional
network representations. One common strategy is to
represent turning options at every intersection – so an
overpass appears in the database as an intersection with
no turns (Figure 3.4).

Figure 3.5 Bears are easily conceived as discrete objects,
maintaining their identity as objects through time and
surrounded by empty space

The discrete object view leads to a powerful way of
representing geographic information about objects. Think
of a class of objects of the same dimensionality – for
example, all of the Brown bears (Figure 3.5) in the Kenai
Peninsula of Alaska. We would naturally think of these
objects as points. We might want to know the sex of
each bear, and its date of birth, if our interests were in
monitoring the bear population. We might also have a
collar on each bear that transmitted the bear’s location
at regular intervals. All of this information could be
expressed in a table, such as the one shown in Table 3.1,
with each row corresponding to a different discrete object,
and each column to an attribute of the object. To reinforce
a point made earlier, this is a very efficient way of
capturing raw geographic information on Brown bears.
But it is not perfect as a representation for all
geographic phenomena. Imagine visiting the Earth from
another planet, and asking the humans what they chose as
a representation for the infinitely complex and beautiful
environment around them. The visitor would hardly be
impressed to learn that they chose tables, especially when
the phenomena represented were natural phenomena such
as rivers, landscapes, or oceans. Nothing on the natural
Earth looks remotely like a table. It is not at all clear how
the properties of a river should be represented as a table,
or the properties of an ocean. So while the discrete object
view works well for some kinds of phenomena, it misses
the mark badly for others.

72 PART II PRINCIPLES

Table 3.1 Example of representation of geographic
information as a table: the locations and attributes of each of
four Brown bears in the Kenai Peninsula of Alaska. Locations
have been obtained from radio collars. Only one location is
shown for each bear, at noon on July 31 2003 (imaginary data)

Bear ID  Sex  Estimated year of birth  Date of collar installation  Location, noon on 31 July 2003
001      M    1999                     02242003                     −150.6432, 60.0567
002      F    1997                     03312003                     −149.9979, 59.9665
003      F    1994                     04212003                     −150.4639, 60.1245
004      F    1995                     04212003                     −150.4692, 60.1152

3.5.2 Continuous fields

While we might think of terrain as composed of discrete
mountain peaks, valleys, ridges, slopes, etc., and think
of listing them in tables and counting them, there are
unresolvable problems of definition for all of these
objects. Instead, it is much more useful to think of terrain
as a continuous surface, in which elevation can be defined
rigorously at every point (see Box 3.4). Such continuous
surfaces form the basis of the other common view of
geographic phenomena, known as the continuous field
view (and not to be confused with other meanings of
the word field). In this view the geographic world can
be described by a number of variables, each measurable
at any point on the Earth’s surface, and changing in value
across the surface.

The continuous field view represents the real
world as a finite number of variables, each one
defined at every possible position.

Objects are distinguished by their dimensions, and
naturally fall into categories of points, lines, or areas.
Continuous fields, on the other hand, can be distinguished
by what varies, and how smoothly. A continuous field
of elevation, for example, varies much more smoothly
in a landscape that has been worn down by glaciation
or flattened by blowing sand than one recently created
by cooling lava. Cliffs are places in continuous fields
where elevation changes suddenly, rather than smoothly.
Population density is a kind of continuous field, defined
everywhere as the number of people per unit area, though
the definition breaks down if the field is examined
so closely that the individual people become visible.
Continuous fields can also be created from classifications
of land, into categories of land use, or soil type. Such
fields change suddenly at the boundaries between different
classes. Other types of fields can be defined by continuous
variation along lines, rather than across space. Traffic
density, for example, can be defined everywhere on a
road network, and flow volume can be defined everywhere
on a river. Figure 3.6 shows some examples of field-like
phenomena.
Continuous fields can be distinguished by what is
being measured at each point. Like the attribute types
discussed in Box 3.3, the variable may be nominal,
ordinal, interval, ratio, or cyclic. A vector field assigns
two variables, magnitude and direction, at every point in
space, and is used to represent flow phenomena such as
winds or currents; fields of only one variable are termed
scalar fields.
Here is a simple example illustrating the difference
between the discrete object and field conceptualizations.
Suppose you were hired for the summer to count the
number of lakes in Minnesota, and promised that your
answer would appear on every license plate issued by the
state. The task sounds simple, and you were happy to
get the job. But on the first day you started to run into
difficulty (Figure 3.7). What about small ponds, do they
count as lakes? What about wide stretches of rivers? What
about swamps that dry up in the summer? What about a
lake with a narrow section connecting two wider parts, is
it one lake or two? Your biggest dilemma concerns the
scale of mapping, since the number of lakes shown on a
map clearly depends on the map’s level of detail – a more
detailed map almost certainly will show more lakes.
Your task clearly reflects a discrete object view of the
phenomenon. The action of counting implies that lakes are
discrete, two-dimensional objects littering an otherwise
empty geographic landscape. In a continuous field view,
on the other hand, all points are either lake or non-lake.
Moreover, we could refine the scale a little to take account

Technical Box 3.4

2.5 dimensions
Areas are two-dimensional objects, and volumes
are three dimensional, but GIS users sometimes
talk about ‘2.5-D’. Almost without exception the
elevation of the Earth’s surface has a single value
at any location (exceptions include overhanging
cliffs). So elevation is conveniently thought of
as a continuous field, a variable with a value
everywhere in two dimensions, and a full 3-D
representation is only necessary in areas with
an abundance of overhanging cliffs or caves,
if these are important features. The idea of
dealing with a three-dimensional phenomenon
by treating it as a single-valued function of two
horizontal variables gives rise to the term ‘2.5-D’.
Figure 3.6B shows an example, in this case an
elevation surface.
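
The ‘2.5-D’ idea can be made concrete with a small sketch: elevation is stored as one value per horizontal location, so a lookup by (x, y) returns exactly one z. The grid values, the 30 m cell size, and the function name below are assumptions chosen for illustration.

```python
# Elevation as a 2.5-D field: exactly one z value per (row, col) location.
# The grid values and the 30 m cell size are made up for illustration.
ELEVATION_M = [
    [120.0, 125.0, 131.0],
    [118.0, 122.0, 129.0],
    [115.0, 119.0, 124.0],
]
CELL_SIZE_M = 30.0


def elevation_at(x_m, y_m):
    """Return the single elevation stored for the cell containing (x_m, y_m).

    x_m is measured east from the grid's west edge and y_m south from its
    north edge, so the mapping to (row, col) is a plain integer division.
    """
    col = int(x_m // CELL_SIZE_M)
    row = int(y_m // CELL_SIZE_M)
    if not (0 <= row < len(ELEVATION_M) and 0 <= col < len(ELEVATION_M[0])):
        raise ValueError("location lies outside the grid")
    return ELEVATION_M[row][col]


print(elevation_at(45.0, 10.0))   # cell (row 0, col 1) -> 125.0
```

A structure like this has no way to record two elevations at one location, which is why overhanging cliffs and caves need a full 3-D representation.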

Figure 3.6 Examples of field-like phenomena. (A) Image of part of the Dead Sea in the Middle East. The lightness of the image at
any point measures the amount of radiation captured by the satellite’s imaging system. (B) A simulated image derived from the
Shuttle Radar Topography Mission, a new source of high-quality elevation data. The image shows the Carrizo Plain area of Southern
California, USA, with a simulated sky and with land cover obtained from other satellite sources (Courtesy NASA/JPL–Caltech)

of marginal cases; for example, we might define the scale
shown in Table 3.2, which has five degrees of lakeness.
The complexity of the view would depend on how closely
we looked, of course, and so the scale of mapping would
still be important. But all of the problems of defining
a lake as a discrete object would disappear (though there
would still be problems in defining the levels of the scale).
Instead of counting, our strategy would be to lay a grid
over the map, and assign each grid cell a score on the
lakeness scale. The size of the grid cell would determine
how accurately the result approximated the value we could
theoretically obtain by visiting every one of the infinite
were released from molecules of silver nitrate when the
unstable molecules were exposed to light, thus darkening
the image in proportion to the amount of incident light.
We think of the image as a field of continuous variation
in color or darkness. But when we look at the image,
the eye and brain begin to infer the presence of discrete
objects, such as people, rivers, fields, cars, or houses, as
they interpret the content of the image.
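
The grid-based lakeness strategy described in this section can be sketched in a few lines. The grid below is hypothetical, and treating scores of 4 and 5 on the Table 3.2 scale as ‘lake’ is an assumption made here for the percentage figure, not a rule from the text.

```python
from collections import Counter

# Hypothetical 4 x 5 grid of lakeness scores (1-5, as defined in Table 3.2).
LAKENESS = [
    [1, 1, 2, 3, 1],
    [1, 2, 4, 5, 3],
    [1, 3, 5, 5, 4],
    [1, 1, 2, 3, 1],
]


def tabulate(grid):
    """Count the cells holding each lakeness score."""
    return Counter(score for row in grid for score in row)


def mean_lakeness(grid):
    """Average the lakeness score over all cells."""
    cells = [score for row in grid for score in row]
    return sum(cells) / len(cells)


def percent_lake(grid, threshold=4):
    """Share of cells scoring at least `threshold` (assumed cut-off)."""
    cells = [score for row in grid for score in row]
    return 100.0 * sum(1 for s in cells if s >= threshold) / len(cells)


print(dict(tabulate(LAKENESS)))
print(f"{percent_lake(LAKENESS):.0f}% lake, "
      f"average lakeness {mean_lakeness(LAKENESS):.2f}")
```

Halving the cell size would change the scores cell by cell, so the summary figures depend on the grid resolution, just as a lake count depends on map scale.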

Figure 3.7 Lakes are difficult to conceptualize as discrete
objects because it is often difficult to tell where a lake begins
and ends, or to distinguish a wide river from a lake

Table 3.2 A scale of lakeness suitable for defining lakes as a
continuous field

Lakeness  Definition
1         Location is always dry under all circumstances
2         Location is sometimes flooded in Spring
3         Location supports marshy vegetation
4         Water is always present to a depth of less than 1 m
5         Water is always present to a depth of more than 1 m

3.6 Rasters and vectors

Continuous fields and discrete objects define two conceptual
views of geographic phenomena, but they do not
solve the problem of digital representation. A continuous
field view still potentially contains an infinite amount of
information if it defines the value of the variable at every
point, since there is an infinite number of points in any
defined geographic area. Discrete objects can also require
an infinite amount of information for full description – for
example, a coastline contains an infinite amount of information
if it is mapped in infinite detail. Thus continuous
fields and discrete objects are no more than conceptualizations,
or ways in which we think about geographic
phenomena; they are not designed to deal with the limitations
of computers.
Two methods are used to reduce geographic phenomena
to forms that can be coded in computer databases,
and we call these raster and vector. In principle, both can
be used to code both fields and discrete objects, but in
practice there is a strong association between raster and
fields, and between vector and discrete objects.

Raster and vector are two methods of representing
geographic data in digital computers.

3.6.1 Raster data

In a raster representation space is divided into an array
of rectangular (usually square) cells (Figure 3.8). All
geographic variation is then expressed by assigning
properties or attributes to these cells. The cells are
sometimes called pixels (short for picture elements).

Raster representations divide the world into arrays
of cells and assign attributes to the cells.
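
As a minimal sketch of this idea, a raster can be held as a two-dimensional array in which every cell carries one attribute value, here a nominal land cover code. The codes, class names, and the 30 m cell size are assumptions chosen for illustration.

```python
# A raster as a 2-D array of cells, each holding one nominal land cover code.
# Codes, class names, and the 30 m cell size are assumptions for illustration.
LAND_COVER = [
    ["W", "W", "F", "F"],
    ["W", "F", "F", "U"],
    ["F", "F", "U", "U"],
]
CLASS_NAMES = {"W": "water", "F": "forest", "U": "urban"}
CELL_SIZE_M = 30.0


def area_by_class_ha(raster, cell_size_m):
    """Total area per class in hectares (1 hectare = 10 000 square metres)."""
    cell_area_ha = cell_size_m ** 2 / 10_000.0
    counts = {}
    for row in raster:
        for code in row:
            counts[code] = counts.get(code, 0) + 1
    return {code: n * cell_area_ha for code, n in counts.items()}


for code, area in area_by_class_ha(LAND_COVER, CELL_SIZE_M).items():
    print(f"{CLASS_NAMES[code]}: {area:.2f} ha")
```

All geographic variation is expressed through the cell values, so summaries such as area per class reduce to counting cells and multiplying by the cell area.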
number of points in the state. At the end, we would
tabulate the resulting scores, counting the number of cells
having each value of lakeness, or averaging the lakeness
score. We could even design a new and scientifically
more reasonable license plate – ‘Minnesota, 12% lake’ or
‘Minnesota, average lakeness 2.02’.
The difference between objects and fields is also
illustrated well by photographs (e.g., Figure 3.6A). The
image in a photograph is created by variation in the
chemical state of the material in the photographic
film – in early photography, minute particles of silver

One of the commonest forms of raster data comes
from remote-sensing satellites, which capture information
in this form and send it to ground to be distributed and
analyzed. Data from the Landsat Thematic Mapper, for
example, which are commonly used in GIS applications,
come in cells that are 30 m a side on the ground, or
approximately 0.1 hectare in area. Other similar data can
be obtained from sensors mounted on aircraft. Imagery
varies according to the spatial resolution (expressed as
the length of a cell side as measured on the ground), and
also according to the timetable of image capture by the
sensor. Some satellites are in geostationary orbit over a
fixed point on the Earth, and capture images constantly.
Others pass over a fixed point at regular intervals (e.g.,
every 12 days). Finally, sensors vary according to the
part or parts of the spectrum that they sense. The visible
parts of the spectrum are most important for remote
sensing, but some invisible parts of the spectrum are
particularly useful in detecting heat, and the phenomena
that produce heat, such as volcanic activities. Many
sensors capture images in several areas of the spectrum,
or bands, simultaneously, because the relative amounts of
radiation in different parts of the spectrum are often useful
indicators of certain phenomena, such as green leaves,
or water, on the Earth’s surface. The AVIRIS (Airborne
Visible InfraRed Imaging Spectrometer) captures no fewer
than 224 different parts of the spectrum, and is being
used to detect particular minerals in the soil, among other
applications. Remote sensing is a complex topic, and
further details are available in Chapter 9.

Figure 3.8 Raster representation. Each color represents a
different value of a nominal-scale variable denoting land
cover class

Square cells fit together nicely on a flat table or a
sheet of paper, but they will not fit together neatly on the
curved surface of the Earth. So just as representations on
paper require that the Earth be flattened, or projected, so
too do rasters (because of the distortions associated with
flattening, the cells in a raster can never be perfectly equal
in shape or area on the Earth’s surface). Projections, or
ways of flattening the Earth, are described in Section 5.7.
Many of the terms that describe rasters suggest the laying
of a tile floor on a flat surface – we talk of raster cells
tiling an area, and a raster is said to be an instance of
a tessellation, derived from the word for a mosaic. The
mirrored ball hanging above a dance floor recalls the
impossibility of covering a spherical object like the Earth
perfectly with flat, square pieces.
When information is represented in raster form all
detail about variation within cells is lost, and instead
the cell is given a single value. Suppose we wanted to
represent the map of the counties of Texas as a raster.
Each cell would be given a single value to identify a
county, and we would have to decide the rule to apply
when a cell falls in more than one county. Often the rule
is that the county with the largest share of the cell’s
area gets the cell. Sometimes the rule is based on the
central point of the cell, and the county at that point is
assigned to the whole cell. Figure 3.9 shows these two
rules in operation. The largest share rule is almost always
preferred, but the central point rule is sometimes used
in the interests of faster computing, and is often used in
creating raster datasets of elevation.

Figure 3.9 Effect of a raster representation using (A) the
largest share rule and (B) the central point rule

Figure 3.10 An area (red line) and its approximation by a
polygon (blue line)

3.6.2 Vector data

In a vector representation, all lines are captured as
points connected by precisely straight lines (some GIS
software allows points to be connected by curves rather
than straight lines, but in most cases curves have to
be approximated by increasing the density of points).
An area is captured as a series of points or vertices
connected by straight lines as shown in Figure 3.10. The
straight edges between vertices explain why areas in
vector representation are often called polygons, and in
GIS-speak the terms polygon and area are often used
interchangeably. Lines are captured in the same way,
and the term polyline has been coined to describe a
curved line represented by a series of straight segments
connecting vertices.
To capture an area object in vector form, we need only
specify the locations of the points that form the vertices
of a polygon. This seems simple, and also much more
efficient than a raster representation, which would require
us to list all of the cells that form the area. These ideas
are captured succinctly in the comment ‘Raster is vaster,
and vector is correcter’. To create a precise approximation
to an area in raster, it would be necessary to resort to
using very small cells, and the number of cells would
rise proportionately (in fact, every halving of the width
and height of each cell would result in a quadrupling
of the number of cells). But things are not quite as
simple as they seem. The apparent precision of vector
is often unreasonable, since many geographic phenomena
simply cannot be located with high accuracy. So although
raster data may look less attractive, they may be more
honest to the inherent quality of the data. Also, various
methods exist for compressing raster data that can greatly
reduce the capacity needed to store a given dataset (see
Chapter 8). So the choice between raster and vector is
often complex, as summarized in Table 3.3.

Table 3.3 Relative advantages of raster and vector
representation

Issue            Raster                        Vector
Volume of data   Depends on cell size          Depends on density of vertices
Sources of data  Remote sensing, imagery       Social and environmental data
Applications     Resources, environmental      Social, economic, administrative
Software         Raster GIS, image processing  Vector GIS, automated cartography
Resolution       Fixed                         Variable

3.6.3 Representing continuous fields

While discrete objects lend themselves naturally to
representation as points, lines, or areas using vector
methods, it is less obvious how the continuous variation of
a field can be expressed in a digital representation. In GIS
six alternatives are commonly implemented (Figure 3.11):
A. capturing the value of the variable at each of a grid
of regularly spaced sample points (for example,
elevations at 30 m spacing in a DEM);
B. capturing the value of the field variable at each of a
set of irregularly spaced sample points (for example,
variation in surface temperature captured at
weather stations);
C. capturing a single value of the variable for a
regularly shaped cell (for example, values of reflected
radiation in a remotely sensed scene);
D. capturing a single value of the variable over an
irregularly shaped area (for example, vegetation
cover class or the name of a parcel’s owner);
E. capturing the linear variation of the field variable
over an irregularly shaped triangle (for example,
elevation captured in a triangulated irregular network
or TIN, Section 9.2.3.4);
F. capturing the isolines of a surface, as digitized lines
(for example, digitized contour lines representing
surface elevation).
Each of these methods succeeds in compressing the
potentially infinite amount of data in a continuous field
to a finite amount, using one of the six options, two of
which (A and C) are raster, and four (B, D, E, and F)
are vector. Of the vector methods one (B) uses points,
two (D and E) use polygons, and one (F) uses lines
to express the continuous spatial variation of the field
in terms of a finite set of vector objects. But unlike
the discrete object conceptualization, the objects used to
represent a field are not real, but simply artifacts of the
representation of something that is actually conceived as
spatially continuous. The triangles of a TIN representation
(E), for example, exist only in the digital representation,
and cannot be found on the ground, and neither can the
lines of a contour representation (F).

3.7 The paper map

The paper map has long been a powerful and effective
means of communicating geographic information. In
contrast to digital data, which use coding schemes such
as ASCII, it is an instance of an analog representation,
or a physical model in which the real world is scaled – in
the case of the paper map, part of the world is scaled
to fit the size of the paper. A key property of a paper
map is its scale or representative fraction, defined as the
ratio of distance on the map to distance on the Earth’s
surface. For example, a map with a scale of 1:24 000
reduces everything on the Earth to one 24 000th of its
real size. This is a bit misleading, because the Earth’s
surface is curved and a paper map is flat, so scale cannot
be exactly constant.

A paper map is: a source of data for geographic
databases; an analog product from a GIS; and an
effective communication tool.

Maps have been so important, particularly prior to the
development of digital technology, that many of the ideas
associated with GIS are actually inherited directly from
paper maps. For example, scale is often cited as a property
of a digital database, even though the definition of scale
makes no sense for digital data – ratio of distance in the

Figure 3.11 The six approximate representations of a field used in GIS. (A) Regularly spaced sample points. (B) Irregularly spaced
sample points. (C) Rectangular cells. (D) Irregularly shaped polygons. (E) Irregular network of triangles, with linear variation over
each triangle (the Triangulated Irregular Network or TIN model; the bounding box is shown dashed in this case because the unshown
portions of complete triangles extend outside it). (F) Polylines representing contours (see the discussion of isopleth maps in Box 4.3)
(Courtesy US Geological Survey)
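
Option (E) in Figure 3.11, linear variation of the field over each triangle of a TIN, can be illustrated with a short sketch. It uses barycentric weights, one standard way of interpolating linearly within a triangle; the function name and the sample triangle are assumptions made for illustration, and no particular GIS is implied to work this way.

```python
def interpolate_in_triangle(p, a, b, c):
    """Linearly interpolate z at point p = (x, y) inside triangle a, b, c.

    Each vertex is an (x, y, z) tuple. Barycentric weights express p as a
    weighted mix of the three vertices; the same weights are applied to z.
    Returns None when p lies outside the triangle.
    """
    x, y = p
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle (vertices are collinear)")
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical triangle: elevations of 100, 110, and 120 m at its vertices.
A = (0.0, 0.0, 100.0)
B = (10.0, 0.0, 110.0)
C = (0.0, 10.0, 120.0)

print(interpolate_in_triangle((5.0, 5.0), A, B, C))    # 115.0
print(interpolate_in_triangle((10.0, 10.0), A, B, C))  # None (outside)
```

Only the three vertices are stored, yet the field has a value at every point inside the triangle, which is how a TIN compresses a continuous surface into a finite set of objects.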

Biographical Box 3.5

May Yuan and new forms of representation


May Yuan received her Bachelor of Science degree in Geography from
the National Taiwan University, where she was attracted to the fields of
geomorphology and climatology. Continuing her fundamental interest
in evolution of processes, she studied geographic representation and
temporal GIS and earned both her Masters and PhD degrees in Geography
from the State University of New York at Buffalo.
Currently, May is an Associate Professor of Geography at the University
of Oklahoma. Severe weather in the Southern Plains of the United
States (Figure 3.12) has inspired her to re-evaluate GIS representation
of geographic dynamics, the complexity of events and processes at spatial
and temporal scales, and GIS applications in meteorology (i.e., weather
and climate). She investigates meteorological cases (e.g., convective storms
and flash floods) to develop new ideas of using events and processes as
the basis to integrate spatial and temporal data in GIS. Her publications
address theoretical issues on representation of geographic dynamics and
offer conceptual models and a prototype GIS to support spatiotemporal queries and analysis of dynamic
geographic phenomena. Her temporal GIS research goes beyond merely considering time as an attribute
or annotation of spatial objects to incorporate much richer spatiotemporal meaning. In her case study on
convective storms, she has demonstrated that, by modeling storms as data objects, GIS is able to support
information query about storm evolution, storm behaviors, and interactions with environments.

Figure 3.13 May Yuan, developer of new forms of representation

Figure 3.12 Representative radar images showing the evolution of supercell storms that produced F5 tornadoes in Oklahoma City,
May 3, 1999. WSR-88D radar TKLX scanned the supercells every five minutes, but the images shown here were selected
approximately every two hours

May developed a strong interest in physics in early childhood. Newton’s theory of universal gravitation
sparked her appreciation for simple principles that can explain how things work and for the use of
graphical and symbolic representation to conceptualize complex processes. Planck’s quantum theory and
Heisenberg’s uncertainty principle further stimulated her thinking on the nature of matter and its behavior
at different scales of observations. Shaped by Einstein’s theory of relativity, May developed her world view
as a four-dimensional space-time continuum populated with events and phenomena. Before she pursued a
career in GIScience, May studied fluvial processes and developed a model to classify waterfalls and explain
their formation. She went on to study paleoclimatology by analyzing soil and speleothem sediments. Both
studies, as well as her dissertation research on wildfire representation, reinforced her interest in developing
conceptual models of processes and examining the relationships between space and time. Since she moved
to the University of Oklahoma, a suite of world-class meteorological research initiatives has offered her
unique opportunities to extend her interest in physics to fundamental research in GIScience through
meteorological applications. Weather and climate offer rich cases that emphasize movement, processes, and
evolution and pose grand challenges to GIScience research regarding representation, object-field duality,
and uncertainty. May enjoys the challenges that ultimately connect to her fundamental interest in how
things work.

computer to distance on the ground; how can there be
distances in a computer? What is meant is a little more
complicated: when a scale is quoted for a digital database
it is usually the scale of the map that formed the source
of the data. So if a database is said to be at a scale of
1:24 000 one can safely assume that it was created from
a paper map at that scale, and includes representations
of the features that are found on maps at that scale.
Further discussion of scale can be found in Box 4.2 and
in Chapter 6, where it is important to the concept of
uncertainty.
There is a close relationship between the contents
of a map and the raster and vector representations
discussed in the previous section. The US Geological
Survey, for example, distributes two digital versions of
its topographic maps, one in raster form and one in
vector form, and both attempt to capture the contents
of the map as closely as possible. In the raster form, or

Figure 3.14 Part of a Digital Raster Graphic, a scan of a US Geological Survey 1:24 000 topographic map

digital raster graphic (DRG), the map is scanned at a
very high density, using very small pixels, so that the
raster looks very much like the original (Figure 3.14).
The coding of each pixel simply records the color of
the map picked up by the scanner, and the dataset
includes all of the textual information surrounding the
actual map.
In the vector form, or digital line graph (DLG), every
geographic feature shown on the map is represented as a
point, polyline, or polygon. The symbols used to represent
point features on the map, such as the symbol for a
windmill, are replaced in the digital data by points with
associated attributes, and must be regenerated when the
data are displayed. Contours, which are shown on the map
as lines of definite width, are replaced by polylines of no
width, and given attributes that record their elevations.
In both cases, and especially in the vector case,
there is a significant difference between the analog
representation of the map and its digital equivalent. So it
is quite misleading to think of the contents of a digital
representation as a map, and to think of a GIS as a
container of digital maps. Digital representations can
include information that would be very difficult to show
on maps. For example, they can represent the curved
surface of the Earth, without the need for the distortions
associated with flattening. They can represent changes,
whereas maps must be static because it is very difficult
to change their contents once they have been printed or
drawn. Digital databases can represent all three spatial
dimensions, including the vertical, whereas maps must
always show two-dimensional views. So while the paper
map is a useful metaphor for the contents of a geographic
database, we must be careful not to let it limit our thinking
about what is possible in the way of representation. This
issue is pursued at greater length in Chapter 8, and map
production is discussed in detail in Chapter 12.
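
The vector encoding described in this section, in which map symbols are replaced by geometry plus attributes and regenerated at display time, can be sketched as follows. This is not the actual DLG file format: the `Feature` class, its field names, the coordinates, and the attribute keys are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """A vector feature: geometry type, vertex list, and attributes."""
    kind: str                      # "point", "polyline", or "polygon"
    vertices: list                 # [(x, y), ...]; a point has one vertex
    attributes: dict = field(default_factory=dict)


# Hypothetical features of the kind a vector map database might hold.
windmill = Feature("point", [(512.0, 240.0)], {"symbol": "windmill"})
contour = Feature(
    "polyline",
    [(0.0, 100.0), (150.0, 130.0), (320.0, 125.0), (500.0, 160.0)],
    {"elevation_ft": 400},
)


def display_instruction(feature):
    """Regenerate a drawing instruction from geometry type and attributes."""
    if feature.kind == "point":
        x, y = feature.vertices[0]
        return f"draw {feature.attributes['symbol']} symbol at ({x}, {y})"
    if feature.kind == "polyline":
        return (f"draw line through {len(feature.vertices)} vertices, "
                f"label {feature.attributes['elevation_ft']} ft")
    return f"fill polygon with {len(feature.vertices)} vertices"


print(display_instruction(windmill))
print(display_instruction(contour))
```

The stored polyline has no width and the stored point has no picture; both are supplied only when the data are drawn, which is one reason a digital representation is not simply a map.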

3.8 Generalization

In Section 3.4 we saw how thinking about geographic
information as a collection of atomic links – between a
place, a time (not always, because many geographic facts
are stated as if they were permanently true), and a property
– led to an immediate problem, because the potential
number of such atomic facts is infinite. If seen in enough
detail, the Earth’s surface is unimaginably complex, and
its effective description impossible. So instead, humans
have devised numerous ways of simplifying their view of
the world. Instead of making statements about each and
every point, we describe entire areas, attributing uniform
characteristics to them, even when areas are not strictly
uniform; we identify features on the ground and describe
their characteristics, again assuming them to be uniform;
or we limit our descriptions to what exists at a finite number
of sample points, hoping that these samples will be
adequately representative of the whole (Section 4.4).

A geographic database cannot contain a perfect
description – instead, its contents must be
carefully selected to fit within the limited capacity
of computer storage devices.

From this perspective some degree of generalization is
almost inevitable in all geographic data. But cartographers
often take a somewhat different approach, for which this
observation is not necessarily true. Suppose we are tasked
to prepare a map at a specific scale, say 1:25 000, using the
standards laid down by a national mapping agency, such
as the Institut Géographique National (IGN) of France.
Every scale used by IGN has its associated rules of
representation. For example, at a scale of 1:25 000 the
rules lay down that individual buildings will be shown
only in specific circumstances, and similar rules apply to
the 1:24 000 series of the US Geological Survey. These
rules are known by various names, including terrain
nominal in the case of IGN, which translates roughly but
not very helpfully to ‘nominal ground’, and is perhaps
better translated as ‘specification’. From this perspective
a map that represents the world by following the rules
of a specification precisely can be perfectly accurate with
respect to the specification, even though it is not a perfect
representation of the full detail on the ground.

A map’s specification defines how real features on
the ground are selected for inclusion on the map.

3.8.1 Methods of generalization

A GIS dataset’s level of detail is one of its most
important properties, as it determines both the degree
to which the dataset approximates the real world, and
the dataset’s complexity. It is often necessary to remove
detail, in the interests of compressing data, fitting them
into a storage device of limited capacity, processing
them faster, or creating less confusing visualizations that
emphasize general trends. Consequently many methods
have been devised for generalization, and several of the
more important are discussed in this section.
McMaster and Shea (1992) identify the following types
of generalization rules:

■ simplification, for example by weeding out points in
the outline of a polygon to create a simpler shape;
■ smoothing, or the replacement of sharp and complex
forms by smoother ones;
■ aggregation, or the replacement of a large number of
distinct symbolized objects by a smaller number of
new symbols;
■ amalgamation, or the replacement of several area
objects by a single area object;
■ merging, or the replacement of several line objects by
a smaller number of line objects;
■ collapse, or the replacement of an area object by a
combination of point and line objects;
■ refinement, or the replacement of a complex pattern of
objects by a selection that preserves the pattern’s
general form;
■ exaggeration, or the relative enlargement of an object
to preserve its characteristics when these would be
lost if the object were shown to scale;
■ enhancement, through the alteration of the physical
sizes and shapes of symbols; and
■ displacement, or the moving of objects from their true
positions to preserve their visibility and
distinctiveness.

The differences between these types of rules are
much easier to understand visually and Figure 3.15 reproduces
McMaster’s and Shea’s original example drawings.
In addition, they describe two forms of generalization
of attributes, as distinct from geometric forms of generalization.
Classification generalization reclassifies the
attributes of objects into a smaller number of classes,
while symbolization generalization changes the assignment
of symbols to objects. For example, it might replace
an elaborate symbol including the words ‘Mixed Forest’
with a color identifying that class.
Consider the representation of vegetation cover using
the rules of a specification. For example, the rules might
state that at a scale of 1:100 000, a vegetation cover map 3.8.2 Weeding
should not show areas of vegetation that cover less than
1 hectare. But small areas of vegetation almost certainly One of the commonest forms of generalization in GIS
exist, so deleting them inevitably results in information is the process known as weeding, or the simplification
loss. But under the principle discussed above, a map that of the representation of a line represented as a polyline.
adheres to this rule must be accurate, even though it differs The process is an instance of McMaster and Shea’s
substantively from the truth as observed on the ground. simplification. Standard methods exist in GIS for doing
Figure 3.15 Illustrations from McMaster and Shea (1992) of their ten forms of generalization (panels: simplification, smoothing, collapse, aggregation, amalgamation, merge, refinement, exaggeration, enhancement, and displacement). The original feature is shown at its original level of detail, and below it at 50% coarser scale. Each generalization technique resolves a specific problem of display at coarser scale and results in the acceptable version shown in the lower right
Figure 3.16 The Douglas–Poiker algorithm is designed to simplify complex objects like this shoreline by reducing the number of points in its polyline representation

this, and the commonest by far is the method known as the Douglas–Poiker algorithm (Figure 3.16) after its inventors, David Douglas and Tom Poiker. The operation of the Douglas–Poiker weeding algorithm is shown in Figure 3.17.

Weeding is the process of simplifying a line or area by reducing the number of points in its representation.

Note that the algorithm relies entirely on the assumption that the line is represented as a polyline, in other words as a series of straight line segments. GIS increasingly support other representations, including arcs of circles, arcs of ellipses, and Bézier curves, but there is little consensus to date on appropriate methods for weeding or generalizing them, or on methods of analysis that can be applied to them.

Figure 3.17 The Douglas–Poiker line simplification algorithm in action. The original polyline has 15 points. In (A) Points 1 and 15 are connected (red), and the furthest distance of any point from this connection is identified (blue). This distance to Point 4 exceeds the user-defined tolerance. In (B) Points 1 and 4 are connected (green). Points 2 and 3 are within the tolerance of this line. Points 4 and 15 are connected, and the process is
repeated. In the final step 7 points remain (identified with green
disks), including 1 and 15. No points are beyond the
user-defined tolerance distance from the line
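
The weeding procedure illustrated in Figure 3.17 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in any particular GIS: the function names weed and perpendicular_distance are our own, points are assumed to be (x, y) tuples in planar coordinates, and distance is measured to the infinite straight line through the two end points, which is one common reading of ‘distance from this connection’.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the straight line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        # a and b coincide, so fall back to the point-to-point distance
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / length


def weed(points, tolerance):
    """Recursively drop points that lie within `tolerance` of the trend line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the intermediate point furthest from the line joining the ends.
    index, furthest = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > furthest:
            index, furthest = i, d
    if furthest <= tolerance:
        # Every intermediate point is within tolerance: keep only the ends.
        return [first, last]
    # Otherwise keep the furthest point and repeat on both halves.
    left = weed(points[: index + 1], tolerance)
    right = weed(points[index:], tolerance)
    return left[:-1] + right  # the split point appears in both halves


# Hypothetical seven-point polyline (not the shoreline of Figure 3.16).
line = [(0, 0), (1, 0.1), (2, 0), (3, 2), (4, 0), (5, -0.1), (6, 0)]
print(weed(line, 0.5))
print(weed(line, 3.0))
```

For this hypothetical line a tolerance of 0.5 retains five of the seven points, while a tolerance of 3.0 collapses the line to its two end points. The end points are always retained, whatever the tolerance.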
3.9 Conclusion

Representation, or more broadly ontology, is a fundamental issue in GIS, since it underlies all of our efforts to express useful information about the surface of the Earth in a digital computer. The fact that there are so many ways of doing this makes GIS at once complex and interesting, a point that will become much clearer on reading the technical chapter on data modeling, Chapter 8. But the broader issues of representation, including the distinction between field and object conceptualizations, underlie not only that chapter but many other issues as well, including uncertainty (Chapter 6), and Chapters 14 through 16 on analysis and modeling.

Questions for further study

1. What fraction of the Earth’s surface have you experienced in your lifetime? Make diagrams like that shown in Figure 3.1, at appropriate levels of detail, to show a) where you have lived in your lifetime, b) how you spent last weekend. How would you describe what is missing from each of these diagrams?
2. Table 3.3 summarized some of the arguments between raster and vector representations. Expand on these arguments, providing examples, and add any others that would be relevant in a GIS application.
3. The early explorers had limited ways of communicating what they saw, but many were very effective at it. Examine the published diaries, notebooks, or dispatches of one or two early explorers and look at the methods they used to communicate with others. What words did they use to describe unfamiliar landscapes and how did they mix words with sketches?
4. Identify the limits of your own neighborhood, and start making a list of the discrete objects you are familiar with in the area. What features are hard to think of as discrete objects? For example, how will you divide up the various roadways in the neighborhood into discrete objects – where do they begin and end?

Further reading

Chrisman N.R. 2002 Exploring Geographic Information Systems (2nd edn). New York: Wiley.
McMaster R.B. and Shea K.S. 1992 Generalization in Digital Cartography. Washington, DC: Association of American Geographers.
National Research Council 1999 Distributed Geolibraries: Spatial Information Resources. Washington, DC: National Academy Press. Available: www.nap.edu.

4 The nature of geographic data

This chapter elaborates on the spatial is special theme by examining the


nature of geographic data. It sets out the distinguishing characteristics of
geographic data, and suggests a range of guiding principles for working with
them. Many geographic data are correctly thought of as sample observations,
selected from the larger universe of possible observations that could be made.
This chapter describes the main principles that govern scientific sampling,
and the principles that are invoked in order to infer information about the
gaps between samples. When devising spatial sample designs, it is important
to be aware of the nature of spatial variation, and here we learn how
this is formalized and measured as spatial autocorrelation. Another key
property of geographic information is the level of detail that is apparent
at particular scales of analysis. The concept of fractals provides a solid
theoretical foundation for understanding scale when building geographic
representations.

Geographic Information Systems and Science, 2nd edition Paul Longley, Michael Goodchild, David Maguire, and David Rhind.
© 2005 John Wiley & Sons, Ltd. ISBNs: 0-470-87000-1 (HB); 0-470-87001-X (PB)

Learning Objectives

After reading this chapter you will understand:

■ How Tobler’s First Law of Geography is formalized through the concept of spatial autocorrelation;
■ The relationship between scale and the level of geographic detail in a representation;
■ The principles of building representations around geographic samples;
■ How the properties of smoothness and continuous variation can be used to characterize geographic variation;
■ How fractals can be used to measure and simulate surface roughness.

4.1 Introduction

In Chapter 1 we identified the central motivation for scientific applications of GIS as the development of representations, not only of how the world looks, but also how it works. Chapter 3 established three governing principles that help us towards this goal, namely that:

1. the representations we build in GIS are of unique places;
2. our representations of them are necessarily selective of reality, and hence incomplete;
3. in building representations, it is useful to think of the world as either comprising continuously varying fields or as an empty space littered with objects that are crisp and well-defined.

In this chapter we build on these principles to develop a fuller understanding of the ways in which the nature of spatial variation is represented in GIS. We do this by asserting three further principles:

4. that proximity effects are key to understanding spatial variation, and to joining up incomplete representations of unique places;
5. that issues of geographic scale and level of detail are key to building appropriate representations of the world;
6. that different measures of the world co-vary, and understanding the nature of co-variation can help us to predict.

Implicit in all of this is one further principle, that we will develop in Chapter 6:

7. because almost all representations of the world are necessarily incomplete, they are uncertain.

GIS is about representing spatial and temporal phenomena in the real world and, because the real world is complicated, this task is difficult and error prone. The real world provides an intriguing laboratory in which to examine phenomena, but is one in which it can be impossible to control for variation in all characteristics – be they relevant to landscape evolution, consumer behavior, urban growth, or whatever. In the terminology of Section 1.3, generalized laws governing spatial distributions and temporal dynamics are therefore most unlikely to work perfectly. We choose to describe the seven points above as ‘principles’, rather than ‘laws’ (see Section 1.3) because, like our discussion in Chapter 2, this chapter is grounded in empirical generalization about the real world. A more elevated discussion of the way that these principles build into ‘fundamental laws of GIScience’ has been published by Goodchild.

4.2 The fundamental problem revisited

Consider for a moment a GIS-based representation of your own life history to date. It is infinitesimally small compared with the geographic extent and history of the world but, as we move to finer spatial and temporal scales than those shown in Figure 3.1, nevertheless very intricate in detail. Viewed in aggregate, human behavior where you live exhibits structure in geographic space, as the aggregated outcomes of day-to-day (often repetitive) decisions about where to go, what to do, how much time to spend doing it, and longer-term (one-off) decisions about where to live, how to achieve career objectives, and how to balance work, leisure, and family pursuits. It is helpful to distinguish between controlled and uncontrolled variation – the former oscillates around a steady state (daily, weekly) pattern, while the latter (career changes, residential moves) does not.

When relating our own daily regimes and life histories, or indeed any short or long term time series of events, we are usually mindful of the contexts in which our decisions (to go to work, to change jobs, to marry) are made – ‘the past is the key to the present’ aptly summarizes the effect of temporal context upon our actions. The day-to-day operational context to our activities is very much determined by where we live and work. The longer-term strategic context may well be provided by where we were born, grew up, and went to college.

Our behavior in geographic space often reflects past patterns of behavior.

The relationship between consecutive events in time can be formalized in the concept of temporal
autocorrelation. The analysis of time series data is in some senses straightforward, since the direction of causality is only one way – past events are sequentially related to the present and to the future. This chapter (and book) is principally concerned with spatial, rather than temporal, autocorrelation. Spatial autocorrelation shares some similarities with its temporal counterpart. Yet time moves in one direction only (forward), making temporal autocorrelation one-dimensional, while spatial events can potentially have consequences anywhere in two-dimensional or even three-dimensional space.

Explanation in time need only look to the past, but explanation in space must look in all directions simultaneously.

Assessment of spatial autocorrelation can be informed by knowledge of the degree and nature of spatial heterogeneity – the tendency of geographic places and regions to be different from each other. Everyone would recognize the extreme difference of landscapes between such regions as the Antarctic, the Nile delta, the Sahara desert, or the Amazon basin, and many would recognize the more subtle differences between the Central Valley of California, the Northern Plain of China, and the valley of the Ganges in India. Heterogeneity occurs both in the way the landscape looks, and in the way processes act on the landscape (the form/process distinction of Section 1.3). While the spatial variation in some processes simply oscillates about an average (controlled variation), other processes vary ever more the longer they are observed (uncontrolled variation). For example, controlled variation characterizes the operational environment of GIS applications in utility management (Section 2.1.1), or the tactical environment of retail promotions (Section 2.3.3), while longer-term processes such as global warming or deforestation may exhibit uncontrolled variation. As a general rule, spatial data exhibit an increasing range of values, hence increased heterogeneity, with increased distance. In this chapter we focus on the ways in which phenomena vary across space, and the general nature of geographic variation: later, in Chapter 14, we return to the techniques for measuring spatial heterogeneity.

Also, this requires us to move beyond thinking of GIS data as abstracted only from the continuous spatial distributions implied by the Tobler Law (Section 3.1) and from sequences of events over continuous time. Some events, such as the daily rhythm of the journey to work, are clearly incremental extensions of past practice, while others, such as residential relocation, constitute sudden breaks with the past. Similarly, landscapes of gently undulating terrain are best thought of as smooth and continuous, while others (such as the landscapes developed about fault systems, or mountain ranges) are best conceived as discretely bounded, jagged, and irregular. Smoothness and irregularity turn out to be among the most important distinguishing characteristics of geographic data.

Some geographic phenomena vary smoothly across space, while others can exhibit extreme irregularity, in violation of Tobler’s Law.

Finally, it is highly likely that a representation of the real world that is suitable for predicting future change will need to incorporate information on how two or more factors co-vary. For example, planners seeking to justify improvements to a city’s public transit system might wish to point out how house prices increase with proximity to existing rail stops. It is highly likely that patterns of spatial autocorrelation in one variable will, to a greater or lesser extent, be mirrored in another. Whilst this is helpful in building representations of the real world, the property of spatial autocorrelation can frustrate our attempts to build inferential statistical models of the co-variation of geographic phenomena.

Spatial autocorrelation helps us to build representations, but frustrates our efforts to predict.

The nature of geographic variation, the scale at which uncontrolled variation occurs, and the way in which different geographic phenomena co-vary are all key to building effective representations of the real world. These principles are of practical importance and guide us to answering questions such as: What is an appropriate scale or level of detail at which to build a representation for a particular application? How do I design my spatial sample? How do I generalize from my sample measurements? And what formal methods and techniques can I use to relate key spatial events and outcomes to one another?

Each of these questions is a facet of the fundamental problem of GIS, that is of selecting what to leave in and what to take out of our digital representations of the real world (Section 3.2). The Tobler Law (Section 3.1), that everything is related to everything else, but near things are more related than distant things, amounts to a succinct definition of spatial autocorrelation. An understanding of the nature of the spatial autocorrelation that characterizes a GIS application helps us to deduce how best to collect and assemble data for a representation, and also how best to develop inferences between events and occurrences. The concept of geographic scale or level of detail will be fundamental to observed measures of the likely strength and nature of autocorrelation in any given application. Together, the scale and spatial structure of a particular application suggest ways in which we should sample geographic reality, and the ways in which we should weight sample observations in order to build our representation. We will return to the key concepts of scale, sampling, and weighting throughout much of this book.

4.3 Spatial autocorrelation and scale

In Chapter 3 (Box 3.3) we classified attribute data into the nominal, ordinal, interval, ratio, and cyclic scales of measurement. Objects existing in space are described by locational (spatial) descriptors, and are conventionally classified using the taxonomy shown in Box 4.1.

Technical Box 4.1

Types of spatial objects

We saw in Section 3.4 that geographic objects are classified according to their topological dimension, which provides a measure of the way they fill space. For present purposes we assume that dimensions are restricted to integer (whole number) values, though in later sections (Sections 4.8 and 15.2.5) we relax this constraint and consider geographic objects of non-integer (fractional, or fractal) dimension. All geometric objects can be used to represent occurrences at absolute locations (natural objects), or they may be used to summarize spatial distributions (artificial objects).

A point has neither length nor breadth nor depth, and hence is said to be of dimension 0. Points may be used to indicate spatial occurrences or events, and their spatial patterning. Point pattern analysis is used to identify whether occurrences or events are inter-related – as in the analysis of the incidence of crime, or in identifying whether patterns of disease infection might be related to environmental or social factors (Section 15.2.3). The centroid of an area object is an artificial point reference, which is located so as to provide a summary measure of the location of the object (Section 15.2.1).

Lines have length, but not breadth or depth, and hence are of dimension 1. They are used to represent linear entities such as roads, pipelines, and cables, which frequently build together into networks. They can also be used to measure distances between spatial objects, as in the measurement of inter-centroid distance. In order to reduce the burden of data capture and storage, lines are often held in GIS in generalized form (see Section 3.8).

Area objects have the two dimensions of length and breadth, but not depth. They may be used to represent natural objects, such as agricultural fields, but are also commonly used to represent artificial aggregations, such as census tracts (see below). Areas may bound linear features and enclose points, and GIS functions can be used to identify whether a given area encloses a given point (Section 14.4.2).

Volume objects have length, breadth, and depth, and hence are of dimension 3. They are used to represent natural objects such as river basins, or artificial phenomena such as the population potential of shopping centers or the density of resident populations (Section 14.4.5).

Time is often considered to be the fourth dimension of spatial objects, although GIS remains poorly adapted to the modeling of temporal change.

The relationship between higher- and lower-dimension spatial objects is analogous to that between higher- and lower-order attribute data, in that lower-dimension objects can be derived from those of higher dimension but not vice versa. Certain phenomena, such as population, may be held as natural or artificially imposed spatial object types. The chosen way of representing phenomena in GIS not only defines the apparent nature of geographic variation, but also the way in which geographic variation may be analyzed. Some objects, such as agricultural fields or digital terrain models, are represented in their natural state. Others are transformed from one spatial object class to another, as in the transformation of population data from individual points to census tract areas, for reasons of confidentiality or convention. Some high-order representations are created by interpolation between lower-order objects, as in the creation of digital terrain models (DTMs) from spot height data (Chapter 8).

The classification of spatial phenomena into object types is dependent fundamentally upon scale. For example, on a less-detailed map of the world, New York is represented as a zero-dimensional point. On a more-detailed map such as a road atlas it will be represented as a two-dimensional area. Yet if we visit the city, it is very much experienced as a three-dimensional entity, and virtual reality systems seek to represent it as such (see Section 13.4.2).
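
Box 4.1 describes the centroid as an artificial point that summarizes the location of an area object, and notes that lines can be used to measure inter-centroid distance. The sketch below shows one common way of deriving such a point, the area-weighted centroid of a simple (non-self-intersecting) polygon. The box itself does not prescribe a formula (the topic is taken up in Section 15.2.1), so the choice of method, the function names, and the two rectangular ‘tracts’ are all illustrative assumptions.

```python
import math


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices.

    The ring may be listed clockwise or counter-clockwise and need not
    repeat its first vertex at the end.
    """
    twice_area = 0.0
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0  # signed contribution of this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0.0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


def inter_centroid_distance(polygon_a, polygon_b):
    """Straight-line distance between the centroids of two area objects."""
    (ax, ay), (bx, by) = polygon_centroid(polygon_a), polygon_centroid(polygon_b)
    return math.hypot(bx - ax, by - ay)


# Two hypothetical rectangular tracts in planar coordinates.
tract_a = [(0, 0), (4, 0), (4, 2), (0, 2)]
tract_b = [(10, 0), (14, 0), (14, 2), (10, 2)]
print(polygon_centroid(tract_a))
print(inter_centroid_distance(tract_a, tract_b))
```

Note that the area object (dimension 2) is reduced here to a point (dimension 0), which illustrates the box's remark that lower-dimension objects can be derived from higher-dimension ones but not the reverse. For a strongly concave area the centroid computed this way may fall outside the object.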

Spatial autocorrelation measures attempt to deal simultaneously with similarities in the location of spatial objects (Box 4.1) and their attributes (Box 3.3). If features that are similar in location are also similar in attributes, then the pattern as a whole is said to exhibit positive spatial autocorrelation. Conversely, negative spatial autocorrelation is said to exist when features which are close together in space tend to be more dissimilar in attributes than features which are further apart (in opposition to Tobler’s Law). Zero autocorrelation occurs when attributes are independent of location. Figure 4.1 presents some simple field representations of a geographic variable in 64 cells that can each take one of two values, coded blue and white. Each of the five illustrations
contains the same set of attributes, 32 white cells and 32 blue cells, yet the spatial arrangements are very different. Figure 4.1A presents the familiar chess board, and illustrates extreme negative spatial autocorrelation between neighboring cells. Figure 4.1E presents the opposite extreme of positive autocorrelation, when blue and white cells cluster together in homogeneous regions. The other illustrations show arrangements which exhibit intermediate levels of autocorrelation. Figure 4.1C corresponds to spatial independence, or no autocorrelation, Figure 4.1B shows a relatively dispersed arrangement, and Figure 4.1D a relatively clustered one.

Spatial autocorrelation is determined both by similarities in position, and by similarities in attributes.

The patterns shown in Figure 4.1 are examples of a particular case of spatial autocorrelation. In terms of the classification developed in Chapter 3 (Box 3.3) the attribute data are nominal (blue and white simply identify two different possibilities, with no implied order and no possibility of difference, or ratio) and their spatial distribution is conceived as a field, with a single value everywhere. The figure gives no clue as to the true dimensions of the area being represented. Usually, similarities in attribute values may be more precisely measured on higher-order measurement scales, enabling continuous measures of spatial variation (See Section 6.3.2.2 for a discussion of precision). As we see below, the way in which we define what we mean by neighboring in investigating spatial arrangements may be more or less sophisticated. In considering the various arrangements shown in Figure 4.1, we have only considered the relationship between the attributes of a cell and those of its four immediate neighbors. But we could include a cell’s four diagonal neighbors in the comparison, and more generally there is no reason why we should not interpret Tobler’s Law in terms of a gradual incremental attenuating effect of distance as we traverse successive cells.

We began this chapter by considering a time series analysis of events that are highly, even perfectly, repetitive in the short term. Activity patterns often exhibit strong positive temporal autocorrelation (where you were at this time last week, or this time yesterday is likely to affect where you are now), but only if measures are made at the same time every day – that is, at the temporal scale of the daily interval. If, say, sample measurements were taken every 17 hours, measures of the temporal autocorrelation of your activity patterns would likely be much lower. Similarly, if the measures of the blue/white property were made at intervals that did not coincide with the dimensions of the squares of the chess boards in Figure 4.1, then the spatial autocorrelation measures would be different. Thus the issue of sampling interval is of direct importance in the measurement of spatial autocorrelation, because spatial events and occurrences may or may not accommodate spatial structure. In general, measures of spatial and temporal autocorrelation are scale dependent (see Box 4.2). Scale is often integral to the trade off between the level of spatial resolution and the

(A) I = –1.000; nBW = 112, nBB = 0, nWW = 0
(B) I = –0.393; nBW = 78, nBB = 16, nWW = 18
(C) I = 0.000; nBW = 56, nBB = 30, nWW = 26
(D) I = +0.393; nBW = 34, nBB = 42, nWW = 36
(E) I = +0.857; nBW = 8, nBB = 52, nWW = 52

Figure 4.1 Field arrangements of blue and white cells exhibiting: (A) extreme negative spatial autocorrelation; (B) a dispersed
arrangement; (C) spatial independence; (D) spatial clustering; and (E) extreme positive spatial autocorrelation. The values of the I
statistic are calculated using the equation in Section 4.6 (Source: Goodchild 1986 CATMOG, GeoBooks, Norwich)
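
The join counts and I values reported for Figure 4.1 can be reproduced with a short sketch. It assumes that the I statistic is Moran’s I computed with binary weights between cells that share an edge (the four immediate neighbors discussed in the text); the equation of Section 4.6 is not reproduced here, so this should be read as an illustration under that assumption. Blue is coded 1 and white 0, and the function names are our own. The chess board corresponds to panel (A); the second grid, split into a blue half and a white half, is a hypothetical arrangement that yields the same counts as those reported for panel (E).

```python
def rook_joins(grid):
    """Yield each pair of edge-sharing neighbour values exactly once."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                yield grid[r][c], grid[r][c + 1]  # neighbour to the right
            if r + 1 < rows:
                yield grid[r][c], grid[r + 1][c]  # neighbour below


def join_counts(grid):
    """Count blue-white, blue-blue and white-white joins (blue = 1)."""
    counts = {"BW": 0, "BB": 0, "WW": 0}
    for a, b in rook_joins(grid):
        if a != b:
            counts["BW"] += 1
        elif a == 1:
            counts["BB"] += 1
        else:
            counts["WW"] += 1
    return counts


def morans_i(grid):
    """Moran's I with binary, symmetric edge-sharing weights."""
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    joins = list(rook_joins(grid))
    # Each join is visited once; because the weights are symmetric, the
    # factor of two in the numerator and in the sum of weights cancels.
    cross = sum((a - mean) * (b - mean) for a, b in joins)
    squares = sum((v - mean) ** 2 for v in values)
    return n * cross / (len(joins) * squares)


chessboard = [[(r + c) % 2 for c in range(8)] for r in range(8)]
halves = [[1 if c < 4 else 0 for c in range(8)] for r in range(8)]

for name, grid in (("chess board", chessboard), ("two halves", halves)):
    print(name, join_counts(grid), round(morans_i(grid), 3))
```

With 32 cells of each color the statistic reduces to (nBB + nWW – nBW) divided by the total number of joins, 112, which is consistent with every panel of the figure: for example (16 + 18 – 78)/112 = –0.393 for panel (B).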

Figure 4.2 A Sierpinski carpet at two levels of resolution: (A) coarse scale and (B) finer scale

Figure 4.3 Individual rocks may resemble larger-scale structures, such as the mountains from which they are broken, in form

Technical Box 4.2

The many meanings of scale

Unfortunately the word scale has acquired too many meanings in the course of time. Because they are to some extent contradictory, it is best to use other terms that have clearer meaning where appropriate.

Scale is in the details. Many scientists use scale in the sense of spatial resolution, or the level of spatial detail in data. Data are fine-scaled if they include records of small objects, and coarse-scaled if they do not.

Scale is about extent. Scale is also used by scientists to talk about the geographic extent or scope of a project: a large-scale project covers a large area, and a small-scale project covers a small area. Scale can also refer to other aspects of the project’s scope, including the cost, or the number of people involved.

The scale of a map. Geographic data are often obtained from maps, and often displayed in map form. Cartographers use the term scale to refer to a map’s representative fraction (the ratio of distance on the map to distance on the ground – see Section 3.7). Unfortunately this leads to confusion (and often bemusement) over the meaning of large and small with respect to scale. To a cartographer a large scale corresponds to a large representative fraction, in other words to plenty of geographic detail. This is exactly the opposite of what an average scientist understands by a large-scale study. In this book we have tried to avoid this problem by using coarse and fine instead.

degree of attribute detail that can be stored in a given application – as in the trade off between spatial and spectral resolution in remote sensing.

Quattrochi and Goodchild have undertaken an extensive discussion of these and other meanings of scale (e.g., the degree of spectral or temporal coarseness), and their implications.

A further important property is that of self-similarity. This is illustrated using a mosaic of squares in Figure 4.2. Figure 4.2A presents a coarse-scale representation of attributes in nine squares, and a pattern of negative spatial autocorrelation. However, the pattern is self-replicating at finer scales, and in Figure 4.2B, a finer-scale representation reveals that the smallest blue cells replicate the pattern of the whole area in a recursive manner. The pattern of spatial autocorrelation at the coarser scale is replicated at the finer scale, and the overall pattern is said to exhibit the property of self-similarity. Self-similar structure is characteristic of natural as well as social systems: for example, a rock may resemble the physical form of the mountain from which it was broken (Figure 4.3), small coastal features may resemble larger bays and inlets in structure and form, and neighborhoods may be of similar population size and each offer similar ranges of retail facilities right across a metropolitan area. Self-similarity is a core concept of fractals, a topic introduced in Section 4.8.

4.4 Spatial sampling

The quest to represent the myriad complexity of the real world requires us to abstract, or sample, events and occurrences from a sample frame, defined as the universe of eligible elements of interest. Thus the process of sampling elements from a sample frame can very much determine the apparent nature of geographic data. A spatial sampling frame might be bounded by the extent
CHAPTER 4 THE NATURE OF GEOGRAPHIC DATA 91
of a field of interest, or by the combined extent of a set of areal objects. We can think of sampling as the process of selecting points from a continuous field or, if the field has been represented as a mosaic of areal objects, of selecting some of these objects while discarding others. Scientific sampling requires that each element in the sample frame has a known and prespecified chance of selection.

In some important senses, we can think of any geographic representation as a kind of sample, in that the elements of reality that are retained are abstracted from the real world in accordance with some overall design. This is the case in remote sensing, for example (see Section 3.6.1), in which each pixel value is a spatially averaged reflectance value calculated at the spatial resolution characteristic of the sensor. In many situations, we will need consciously to select some observations, and not others, in order to create a generalizable abstraction. This is because, as a general rule, the resources available to any given project do not stretch to measuring every single one of the elements (soil profiles, migrating animals, shoppers) that we know to make up our population of interest. And even if resources were available, science tells us that this would be wasteful, since procedures of statistical inference allow us to infer from samples to the populations from which they were drawn. We will return to the process of statistical inference in Sections 4.7 and 15.4. Here, we will confine ourselves to the question, how do we ensure a good sample?

Geographic data are only as good as the sampling scheme used to create them.

Classical statistics often emphasizes the importance of randomness in sound sample design. The purest form, simple random sampling, is well known: each element in the sample frame is assigned a unique number, and a prespecified number of elements are selected using a random number generator. In the case of a spatial sample from continuous space, x, y coordinate pairs might be randomly sampled within the range of x and y values (see Section 5.7 for information on coordinate systems). Since each randomly selected element has a known and prespecified probability of selection, it is possible to make robust and defensible generalizations to the population from which the sample was drawn. A spatially random sample is shown in Figure 4.4A. Random sampling is integral to probability theory, and this enables us to use the distribution of values in our sample to tell us something about the likely distribution of values in the parent population from which the sample was drawn.

[Figure 4.4: seven panels, (A) to (G); panel (D) is annotated with grid spacings k, k+r, and k-r]

Figure 4.4 Spatial sample designs: (A) simple random sampling; (B) stratified sampling; (C) stratified random sampling; (D) stratified sampling with random variation in grid spacing; (E) clustered sampling; (F) transect sampling; and (G) contour sampling

However, sheer bad luck can mean that randomly drawn elements are disproportionately concentrated amongst some parts of the population at the expense of others, particularly when the size of our sample is small relative to the population from which it was drawn. For example, a survey of household incomes might happen to select households with unusually low incomes. Spatially systematic sampling aims to circumvent this problem and ensure greater evenness of coverage across the sample frame. This is achieved by identifying a regular sampling interval k (equal to N/n, the reciprocal of the sampling fraction n/N, where n is the required sample size and N is the size of the population) and proceeding to select every kth element. In spatial terms, the sampling interval of spatially systematic samples maps into a regularly spaced grid, as shown in Figure 4.4B. This advantage over simple random sampling may be two-edged, however, if the sampling interval and the spatial structure of the study area coincide, that is, the sample frame exhibits periodicity. A sample survey of urban land use along streets originally surveyed under the US Public Land Survey System (PLSS: Section 5.5) would be ill-advised to take a sampling interval of one mile, for example, for this was the interval at which blocks within townships were originally laid out, and urban structure is still likely to be repetitive about this original design. In such instances, there may be a consequent failure to detect the true extent of heterogeneity of population attributes (Figure 4.4B) – for example, it is extremely unlikely that the attributes of street intersection
92 PART II PRINCIPLES
locations would be representative of land uses elsewhere in the block structure. A number of systematic and quasi-systematic sample designs have been devised to get around the vulnerability of spatially systematic sample designs to periodicity, and the danger that simple random sampling may generate freak samples. These include stratified random sampling to ensure evenness of coverage (Figure 4.4C) and periodic random changes in the grid width of a spatially systematic sample (Figure 4.4D), perhaps subject to minimum spacing intervals.

In certain circumstances, it may be more efficient to restrict measurement to a specified range of sites – because of the prohibitive costs of transport over large areas, for example. Clustered sample designs, such as that shown in Figure 4.4E, may be used to generalize about attributes if the cluster presents a microcosm of surrounding conditions. In fact this provides a legitimate use of a comprehensive study of one area to say something about conditions beyond it – so long as the study area is known to be representative of the broader study region. For example, political opinion polls are often taken in shopping centers where shoppers can be deemed broadly representative of the population at large. However, instances where they provide a comprehensive detailed picture of spatial structure are likely to be the exception rather than the rule.

Use of either simple random or spatially systematic sampling presumes that each observation is of equal importance, and hence of equal weight, in building a representation. As such, these sample designs are suitable for circumstances in which spatial structure is weak or non-existent, or where (as in circumstances fully described by Tobler’s Law) the attenuating effect of distance is constant in all directions. They are also suitable in circumstances where spatial structure is unknown. Yet in most practical applications, spatial structure is (to some extent at least) known, even if it cannot be wholly explained by Tobler’s Law. These conditions make it both more efficient and necessary to devise application-specific sample designs. This makes for improved quality of representation, with minimum resource costs of collecting data. Relevant sample designs include sampling along a transect, such as a soil profile (Figure 4.4F), or along a contour line (Figure 4.4G).

Consider the area of Leicestershire, UK, illustrated in Figure 4.5. It depicts a landscape in which the hilly relief of an upland area falls away sharply towards a river’s flood plain. In identifying the sample spot heights that we might measure and hold in a GIS to create a representation of this area, we would be advised to sample a disproportionate number of observations in the upland part of the study area, where the local variability of heights is greatest.

In a socio-economic context, imagine that you are required to identify the total repair cost of bringing all housing in a city up to a specified standard. (Such applications are common, for example, in forming bids for Federal or Central Government funding.) A GIS that showed the time period in which different neighborhoods were developed (such as the Mid-West settlements simulated in Figure 2.18) would provide a useful guide to effective use of sampling resources. Newer houses are all likely to be in more or less the same condition, while the repair costs of the older houses are likely to be much more variable and dependent upon the attention that the occupants have lavished upon them. As a general rule, the older neighborhoods warrant a greater sampling frequency than the newer ones, but other considerations may also be accommodated in the sampling design – such as construction type (duplex versus apartment, etc.) and local geology (as an indicator of risk of subsidence).

In any application where the events or phenomena that we are studying are spatially heterogeneous, we will require a large sample to capture the full variability of attribute values at all possible locations. Other parts of the study area may be much more homogeneous in attributes, and a sparser sampling interval may thus be more appropriate. Both simple random and systematic sample designs (and their variants) may be adapted in order to allow a differential sampling interval over a given study area (see Section 6.3.2 for more on this issue with respect to sampling vegetation cover). Thus it may be sensible to partition the sample frame into sub-areas, based on our knowledge of spatial structure – specifically our knowledge of the likely variability of the attributes that we are measuring.

Other application-specific special circumstances include:

■ whether source data are ubiquitous or must be specially collected;
■ the resources available for any survey undertaking; and
■ the accessibility of all parts of the study area to field observation (still difficult even in the era of ubiquitous availability of Global Positioning System receivers: Section 5.8).

Stratified sampling designs attempt to allow for the unequal abundance of different phenomena on the Earth’s surface.

It is very important to be aware that this discussion of sampling is appropriate to problems where there is a large hypothetical population of evenly distributed locations (elements, in the terminology of sampling theory, or atoms of information in the terminology of Section 3.4), that each have a known and prespecified probability of selection. Random selection of elements plays a part in each of the sample designs illustrated in Figure 4.4, albeit that the probability of selecting an element may be greater for clearly defined sub-populations – that lie along a contour line or across a soil transect, for example. In circumstances where spatial structure is either weak or is explicitly incorporated through clear definition of sub-populations, standard statistical theory provides a robust framework for inferring the attributes of the population from those of the sample. But the reality is somewhat messier. In most GIS applications, the population of elements (animals, glacial features, voters) may not be large, and its distribution across space may be far from random and independent. In these circumstances, conventional wisdom suggests a number of ‘rules of

[Figure 4.5 map: contours and spot heights in meters, built-up areas, roads (including the M1, A6, and A512), reservoirs, and rivers around Loughborough and Shepshed, with upland summits at Beacon Hill (248 m) and Bardon Hill (278 m); scale bar 0 to 2 km]

Figure 4.5 An example of physical terrain in which differential sampling would be advisable to construct a representation of
elevation (Reproduced by permission of M. Langford, University of Glamorgan)
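
The sample designs of Figure 4.4 and the differential sampling suggested by Figure 4.5 can be sketched in a few lines of code. This is a minimal illustration under our own assumptions rather than a prescribed procedure: the function names, the rectangular study area, and the rule of allocating sample points in proportion to area multiplied by the standard deviation of heights are all hypothetical choices made for the example.

```python
import random


def simple_random_sample(n, x_range, y_range, rng):
    """Simple random sampling: n points drawn uniformly within the extent."""
    return [(rng.uniform(*x_range), rng.uniform(*y_range)) for _ in range(n)]


def grid_cells(k, x_range, y_range):
    """Lower-left corners of the k-by-k cells that fit inside the extent."""
    nx = int((x_range[1] - x_range[0]) // k)
    ny = int((y_range[1] - y_range[0]) // k)
    return [(x_range[0] + i * k, y_range[0] + j * k)
            for i in range(nx) for j in range(ny)]


def systematic_sample(k, x_range, y_range):
    """Spatially systematic sampling: one point at the center of each cell."""
    return [(x + k / 2, y + k / 2) for x, y in grid_cells(k, x_range, y_range)]


def stratified_random_sample(k, x_range, y_range, rng):
    """Stratified random sampling: one random point inside each grid cell."""
    return [(x + rng.random() * k, y + rng.random() * k)
            for x, y in grid_cells(k, x_range, y_range)]


def allocate_by_variability(total_n, strata):
    """Share total_n sample points among strata in proportion to
    area * standard deviation (an assumed allocation rule)."""
    weights = {name: area * sd for name, (area, sd) in strata.items()}
    total = sum(weights.values())
    return {name: round(total_n * w / total) for name, w in weights.items()}


rng = random.Random(42)                      # fixed seed for repeatability
extent = ((0.0, 10.0), (0.0, 10.0))          # hypothetical 10 km square
random_points = simple_random_sample(16, *extent, rng)
grid_points = systematic_sample(2.5, *extent)
stratified_points = stratified_random_sample(2.5, *extent, rng)

# Hypothetical strata: (area in sq km, standard deviation of heights in m).
allocation = allocate_by_variability(
    100, {"upland": (40.0, 30.0), "flood_plain": (60.0, 5.0)})
```

With these invented figures the upland receives most of the sample even though it covers the smaller area, which is the behavior the terrain example calls for. Rounding means that the allocated counts need not always sum exactly to the requested total.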

thumb’ to compensate for the likely increase in error in estimating the true population value – as in clustered sampling, where slightly more than doubling the sample size is usually taken to accommodate the effects of spatial autocorrelation within a spatial cluster. However, it may be considered that the existence of spatial autocorrelation fundamentally undermines the inferential framework and invalidates the process of generalizing from samples to populations. We return to this in more detail in our discussion of inference and hypothesis testing in Section 15.4.1.

Finally, it is also worth noting that this discussion assumes that we have the luxury of collecting our own data for our own particular purpose. The reality of analysis in our data-rich world is that more and more of the data that we use are collected by other parties for other purposes: in such cases the metadata of the dataset are crucially important in establishing their provenance for the particular investigation that we may wish to undertake (see Section 11.2.1).

4.5 Distance decay

In selectively abstracting, or sampling, part of reality to hold within a representation, judgment is required to fill in the gaps between the observations that make up a representation. This requires understanding of the likely attenuating effect of distance between the sample observations, and thus of the nature of geographic data (Figure 4.6). That is to say, we need to make an informed judgment about an appropriate interpolation function and how to weight adjacent observations. A literal interpretation of Tobler’s Law implies a continuous, smooth, attenuating effect of distance upon the attribute values of adjacent or contiguous spatial objects, or incremental variation in attribute values as we traverse a field. The polluting effect of a chemical or oil spillage decreases in a predictable (and in still waters, uniform) fashion with distance from the point source; aircraft noise
decreases in a linear fashion with distance from the flight path; and the number of visits to a National Park decreases at a regular rate as we traverse the counties that adjoin it. This section focuses on principles, and introduces some of the functions that are used to describe effects over distance, or the nature of geographic variation, while Section 14.4.4 discusses ways in which the principles of distance decay are embodied in techniques of spatial interpolation.

Figure 4.6 We require different ways of interpolating between points, as well as different sample designs, for representing mountains and forested hillsides

The precise nature of the function used to represent the effects of distance is likely to vary between applications, and Figure 4.7 illustrates several hypothetical types. In mathematical terms, we take b as a parameter that affects the rate at which the weight $w_{ij}$ declines with distance: a small b produces a slow decrease, and a large b a more rapid one. In most applications, the choice of distance attenuation function is the outcome of past experience, the fit of a particular application dataset, and convention. Figure 4.7A presents the simple case of linear distance decay, given by the expression:

\[ w_{ij} = a - b d_{ij}, \]

for $d_{ij} < a/b$, as might reflect the noise levels experienced across a transect perpendicular to an aircraft flight path. Figure 4.7B presents a negative power distance decay function, given by the expression:

\[ w_{ij} = d_{ij}^{-b}, \]

which has been used by some researchers to describe the decline in the density of resident population with distance from historic central business district (CBD) areas. Figure 4.7C illustrates a negative exponential statistical fit, given by the expression:

\[ w_{ij} = e^{-b d_{ij}}, \]

conventionally used in human geography to represent the decrease in retail store patronage with distance from the store.

Each of the attenuation functions illustrated in Figure 4.7 is idealized, in that the effects of distance are presumed to be regular, continuous, and isotropic (uniform in every direction). This may be appropriate for many applications. The notion of smooth and continuous variation underpins many of the representational traditions in cartography, as in the creation of isopleth (or isoline) maps. This is described in Box 4.3. To some extent at least, high school math also conditions us to think of spatial variation as continuous, and as best represented by interpolating smooth curves between everything. Yet our understanding of spatial structure tells us that variation is often far from smooth and continuous. The Earth’s surface and geology, for example, are discontinuous at cliffs and fault lines, while the socio-economic geography of cities can be similarly characterized by abrupt changes. Some illustrative physical and social issues pertaining to the catchment of a grocery store are presented in Figure 4.8. A naïve GIS analysis might assume that the maximum extent of the catchment of the store is bounded by an approximately circular area, and that within this area the likelihood (or probability) of shoppers using the store decreases the further away from it that they live. On this basis we might assume a negative exponential distance decay function (Figure 4.7C) and, for practical purposes, an absolute cut-off in patronage beyond a ten

[Figure 4.7: three panels, (A) to (C), each plotting the weight w against distance d]

Figure 4.7 The attenuating effect of distance: (A) linear distance decay, $w_{ij} = a - b d_{ij}$; (B) negative power distance decay, $w_{ij} = d_{ij}^{-b}$; and (C) negative exponential distance decay, $w_{ij} = \exp(-b d_{ij})$
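
The three functions of Figure 4.7 are simple enough to write out directly. The sketch below is illustrative only: the function names and parameter values are our own, and setting the linear weight to zero at and beyond a distance of a/b is an assumption, since the expression is defined only for distances smaller than a/b.

```python
import math


def linear_decay(d, a, b):
    """Linear distance decay, w = a - b*d, for d smaller than a/b.

    Returning zero at and beyond d = a/b is an assumption made here so
    that the weight never becomes negative.
    """
    return max(a - b * d, 0.0)


def power_decay(d, b):
    """Negative power distance decay, w = d ** (-b); undefined at d = 0."""
    if d == 0:
        raise ValueError("negative power decay is undefined at zero distance")
    return d ** (-b)


def exponential_decay(d, b):
    """Negative exponential distance decay, w = exp(-b*d)."""
    return math.exp(-b * d)


# Hypothetical parameters: compare a small and a large b at the same distance.
slow = exponential_decay(2.0, 0.1)
fast = exponential_decay(2.0, 1.0)
```

As the text notes, the larger value of b gives the more rapid decline, so `fast` is the smaller of the two weights.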

minute drive time at average speed. Yet in practice, the catchment also depends upon:

■ physical factors, such as rivers and relief;
■ road and rail infrastructure and associated capacity, congestion, and access (e.g., rail stations and road access ramps);
■ socio-economic factors that are manifest in differences in customer store preferences;
■ administrative geographies that modify the shape of the circle because census counts of population are only available for administrative zones;
■ overlapping trade areas of competing stores that are likely to truncate the trade area from particular directions;
■ a demand constraint that requires probabilities of patronizing all available stores at any point to sum to 1 (unless people opt out of shopping).

Additionally, the population base to Figure 4.8 raises an important issue of the representation of spatial structure. Remember that we said that the circular retail catchment had to be adapted to fit the administrative geography of population enumeration. The distribution of population is shown using choropleth mapping (Box 4.3), which implicitly assumes that the mapped property is uniformly distributed within zones and that the only important changes in distribution take place at zone boundaries. Such representations can obscure continuous variations and mask the true pattern of distance attenuation.

[Figure 4.8 map: the town of ANYTOWN with 10-minute drive-time buffers, principal roads, a rail route and rail station, and zones shaded by population (0–2000, 2001–4000, 4001–6000, 6001–8000, 8001–10 000)]

Figure 4.8 Discontinuities in a retail catchment (Source: Adapted from Birkin M., Clarke G. P., Clarke M. and Wilson A. G. (1996) Intelligent GIS. Cambridge, UK: GeoInformation International). Reproduced by permission of John Wiley & Sons Inc.

4.6 Measuring distance effects as spatial autocorrelation

An understanding of spatial structure helps us to deduce a good sampling strategy, to use an appropriate means of interpolating between sampled points, and hence to build a spatial representation that is fit for purpose. Knowledge of the actual or likely nature of spatial autocorrelation can thus be used deductively in order to help build a spatial representation of the world. However, in many applications we do not understand enough about geographic variability, distance effects, and spatial structure to invoke deductive reasoning. A further branch of spatial analysis thus emphasizes the measurement of spatial autocorrelation as an end in itself. This amounts to a more inductive approach to developing an understanding of the nature of a geographic dataset.

Induction reasons from data to build up understanding, while deduction begins with theory and principle as a basis for looking at data.

In Section 4.3 we saw that spatial autocorrelation measures the extent to which similarities in position match similarities in attributes. Methods of measuring spatial autocorrelation depend on the types of objects used as the basis of a representation, and as we saw in Section 4.2, the scale of attribute measurement is important too. Interpretation depends on how the objects relate to our conceptualization of the phenomena they represent. If the phenomenon of interest is conceived as a field, then spatial autocorrelation measures the smoothness of the field using data from the sample points, lines, or areas that represent the field. If the phenomena of interest are conceived as discrete objects, then spatial autocorrelation measures how the attribute values are distributed among the objects, distinguishing between arrangements that are clustered, random, and locally contrasting. Figure 4.11 shows examples of each of the four object types, with associated attributes, chosen to represent situations in which a scientist might wish to measure spatial autocorrelation. The point data in Figure 4.11A comprise data on well bores over an area of 30 km², and together provide information on the depth of an aquifer beneath the surface (the blue shading identifies those within a given threshold). We would expect values to exhibit strong spatial autocorrelation, with departures from this indicative of changes in bedrock structure or form. The line data in Figure 4.11B present numbers of accidents for links of road over a lengthy survey period in the Southwestern Ontario, Canada, provincial highway network. Low spatial autocorrelation in these statistics implies that local causative factors (such as badly laid out junctions) account for most accidents, whereas strong spatial autocorrelation would imply a more regional scale of variation, implying a link between accident rates and lifestyles, climate, or population density. The area data in Figure 4.11C illustrate the socio-economic patterning of the south east of England, and raise the question of

Technical Box 4.3

Isopleth and choropleth maps


Isopleth maps are used to visualize phenomena that are conceptualized as fields, and measured on interval or ratio scales. An isoline connects points with equal attribute values: examples are contour lines (equal height above sea level), isohyets (points of equal precipitation), isochrones (points of equal travel time), and isodapanes (points of equal transport cost). Figure 4.9 illustrates the procedures that are used to create a surface about a set of point measurements (Figure 4.9A), such as might be collected from rain gauges across a study region (and see Section 14.4.4 for more technical detail on the process of spatial interpolation). A parsimonious number of user-defined values is identified to define the contour intervals (Figure 4.9B). The GIS then interpolates a contour between point observations of greater and lesser value (Figure 4.9C) using standard procedures of inference, and the other contours are then interpolated using the same procedure (Figure 4.9D). Hue or shading can be added to improve user interpretability (Figure 4.9E).

Choropleth maps are constructed from values describing the properties of non-overlapping areas, such as counties or census tracts. Each area is colored, shaded, or cross-hatched to symbolize the value of a specific variable, as in Figure 4.10. Geographic rules define what happens to the properties of objects when they are split or merged (e.g., see Section 13.3.3). Figure 4.10 compares a map of total population (a spatially extensive variable) with a map of population density (a spatially intensive variable). Spatially extensive variables take values that are true only of entire areas, such as total population, or total number of children under 5 years of age. Choropleth maps of such variables are highly misleading – the same color is applied uniformly to each part of an area, yet we know that the mapped property cannot be true of each part of the area. The values taken by spatially intensive variables could potentially be true of every part of an area, if the area were homogeneous – examples include densities, rates, or proportions. Conceptually, a spatially intensive variable is a field, averaged over each area, whereas a spatially extensive variable is a field of density whose values are summed or integrated to obtain each area’s value.

[Figure 4.9: five panels, (A) to (E), showing the point attribute values, the class intervals (31–40, 41–50, 51–60, 61–70, 71–80, 81–90, 91–100), and the interpolated and labeled class boundaries]

Figure 4.9 The creation of isopleth maps: (A) point attribute values; (B) user-defined classes; (C) interpolation of class
boundary between points; (D) addition and labeling of other class boundaries; and (E) use of hue to enhance perception of
trends (after Kraak and Ormeling 1996: 161)
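
The interpolation step of Figure 4.9C can be illustrated for a single pair of points. The sketch below assumes simple linear interpolation along the straight line between two observations; this is our own assumption for the purpose of illustration, and a GIS may well use a different procedure (see Section 14.4.4). The function name and the sample values are hypothetical.

```python
def contour_crossing(point_a, point_b, level):
    """Estimate where a contour of value `level` crosses the straight line
    between two observations, each given as (x, y, value).

    Linear interpolation between the two values is assumed.
    """
    (xa, ya, za), (xb, yb, zb) = point_a, point_b
    if za == zb:
        raise ValueError("the two observations have the same value")
    if not min(za, zb) <= level <= max(za, zb):
        raise ValueError("the contour does not pass between these points")
    t = (level - za) / (zb - za)   # fraction of the way from a to b
    return (xa + t * (xb - xa), ya + t * (yb - ya))


# Hypothetical rain gauges with readings of 45 and 55, 10 units apart:
# under the linear assumption the 50 contour falls midway between them.
crossing = contour_crossing((0.0, 0.0, 45.0), (10.0, 0.0, 55.0), 50.0)
```

Repeating this for every pair of neighboring points that straddle a class boundary, and joining the crossings, gives one plausible way of tracing the boundary.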

[Figure 4.10 maps, each with a scale bar of 0 to 16 kilometres: (A) population by ward (2001), in classes 106–5000, 5001–10000, 10001–11500, 11501–13500, and 13501–17261; (B) population density (per sq km), in classes 162–4000, 4001–6500, 6501–10000, 10001–13500, and 13501–21026]

Figure 4.10 Choropleth maps of (A) a spatially extensive variable, total population, and (B) a related but spatially
intensive variable, population density. Many cartographers would argue that (A) is misleading, and that spatially extensive
variables should always be converted to spatially intensive form (as densities, ratios, or proportions) before being displayed
as choropleth maps. (Reproduced with permission of Daryl Lloyd)
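
The conversion from a spatially extensive to a spatially intensive variable is a one-line calculation. In the sketch below the ward names, populations, and areas are invented for illustration and do not come from Figure 4.10.

```python
def population_density(total_population, area_km2):
    """Convert a spatially extensive total into a spatially intensive
    density (persons per square kilometer)."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return total_population / area_km2


# Hypothetical wards: (name, total population, area in sq km).
wards = [("Ward A", 12000, 2.0), ("Ward B", 12000, 8.0)]
densities = {name: population_density(pop, area) for name, pop, area in wards}
```

The two wards would share a color on a map of totals, yet one is four times as densely populated as the other, which is the distortion the caption warns about.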

[Figure 4.11, panels (A) and (B); the legend for panel (B) uses the classes 0, 0.1–1.5, 1.6–2.1, 2.2–3.0, 3.1–4.1, and 4.2–10.0, and marks network nodes]

Figure 4.11 Situations in which a scientist might want to measure spatial autocorrelation: (A) point data (wells with attributes
stored in a spreadsheet); (B) line data (accident rates in the Southwestern Ontario provincial highway network); (C) area data
(percentage of population that are old age pensioners (OAPs) in South East England); and (D) volume data (elevation and volume of
buildings in Seattle). (A) and (D) courtesy ESRI; (C) reproduced with permission of Daryl Lloyd

[Figure 4.11 (continued), panels (C) and (D); panel (C) maps OAPs as % of population in classes 2.5–15, 15–20, 20–25, 25–35, 35–45, and 45–70, with a scale bar of 0 to 80 kilometres and labels for London, Clacton on Sea, Hastings, Eastbourne, Bognor Regis, and Bournemouth]



Technical Box 4.4

Measuring similarity between neighbors


In the simple example shown in Figure 4.12, we compare neighboring values of spatial attributes by defining a weights matrix W in which each element $w_{ij}$ measures the locational similarity of i and j (i identifies the row and j the column of the matrix). We use a simple measure of contiguity, coding $w_{ij} = 1$ if regions i and j are contiguous and $w_{ij} = 0$ otherwise. $w_{ii}$ is set equal to 0 for all i. This is shown in Table 4.1.

[Figure 4.12: a mosaic of eight zones, numbered 1 to 8]

Figure 4.12 A simple mosaic of zones

Table 4.1 The weights matrix W derived from the zoning system shown in Figure 4.12

     1  2  3  4  5  6  7  8
1    0  1  1  1  0  0  0  0
2    1  0  1  0  0  1  1  0
3    1  1  0  1  1  1  0  0
4    1  0  1  0  1  0  0  0
5    0  0  1  1  0  1  0  1
6    0  1  1  0  1  0  1  1
7    0  1  0  0  0  1  0  1
8    0  0  0  0  1  1  1  0

The weights matrix provides a simple way of representing similarities between location and attribute values, in a region of contiguous areal objects. Autocorrelation is identified by the presence of neighboring cells or zones that take the same (binary) attribute value. More sophisticated measures of $w_{ij}$ include a decreasing function (such as one of those shown in Figure 4.7) of the straight line distance between points at the centers of zones, or the lengths of common boundaries. A range of different spatial metrics may also be used, such as existence of linkage by air, or a decreasing function of travel time by air, road, or rail, or the strength of linkages between individuals or firms on some (non-spatial) network.

The weights matrix makes it possible to develop measures of spatial autocorrelation using four of the attribute types (nominal, ordinal, interval, and ratio, but not, in practice, cyclic) in Box 3.3 and the dimensioned classes of spatial objects in Box 4.1. Any measure of spatial autocorrelation seeks to compare a set of locational similarities $w_{ij}$ (contained in a weights matrix) with a corresponding set of attribute similarities $c_{ij}$, combining them into a single index in the form of a cross-product:

\[ \sum_i \sum_j c_{ij} w_{ij}, \]

This expression is the total obtained by multiplying every cell in the W matrix with its corresponding entry in the C matrix, and summing.

There are different ways of measuring the attribute similarities, $c_{ij}$, depending upon whether they are measured on the nominal, ordinal, interval, or ratio scale. For nominal data, the usual approach is to set $c_{ij}$ to 1 if i and j take the same attribute value, and zero otherwise. For ordinal data, similarity is usually based on comparing the ranks of i and j. For interval and ratio data, the attribute of interest is denoted $z_i$, and the product $(z_i - \bar{z})(z_j - \bar{z})$ is calculated, where $\bar{z}$ denotes the average of the zs.

One of the most widely used spatial autocorrelation statistics for the case of area objects and interval-scale attributes is the Moran Index. This is positive when nearby areas tend to be similar in attributes, negative when they tend to be more dissimilar than one might expect, and approximately zero when attribute values are arranged randomly and independently in space. It is given by the expression:

\[ I = \frac{n \sum_i \sum_j w_{ij} (z_i - \bar{z})(z_j - \bar{z})}{\sum_i \sum_j w_{ij} \sum_i (z_i - \bar{z})^2} \]

where n is the number of areal objects in the set. This brief exposition is provided at this point to emphasize the way in which spatial autocorrelation measures are able to accommodate attributes scaled as nominal, ordinal, interval, and ratio data, and to illustrate that there is flexibility in the nature of contiguity (or adjacency) relations that may be specified. Further techniques for measuring spatial autocorrelation are reviewed in connection with spatial interpolation in Section 14.4.4.
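
The Moran Index of Box 4.4 can be computed directly from a weights matrix and a list of attribute values. The sketch below uses the contiguity matrix of Table 4.1; the attribute values assigned to the eight zones are invented for illustration, and the function name is our own.

```python
def morans_i(z, w):
    """Moran Index for attribute values z and weights matrix w, following
    I = n * sum_ij w_ij (z_i - zbar)(z_j - zbar)
          / (sum_ij w_ij * sum_i (z_i - zbar)^2)."""
    n = len(z)
    mean = sum(z) / n
    dev = [value - mean for value in z]
    cross_product = sum(w[i][j] * dev[i] * dev[j]
                        for i in range(n) for j in range(n))
    total_weight = sum(sum(row) for row in w)
    sum_of_squares = sum(d * d for d in dev)
    return n * cross_product / (total_weight * sum_of_squares)


# The weights matrix W of Table 4.1 (1 = contiguous zones, 0 otherwise).
W = [
    [0, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 1, 1, 0],
    [1, 1, 0, 1, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 1, 0],
]

# Hypothetical interval-scale attribute values for zones 1 to 8.
z_values = [12.0, 15.0, 14.0, 11.0, 20.0, 22.0, 19.0, 25.0]
index = morans_i(z_values, W)
```

A quick check on a simpler case helps to confirm the behavior described in the box: for four zones in a row, each contiguous only with its immediate neighbors, values that alternate between high and low give an index of exactly -1, while steadily increasing values give a positive index.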
whether, at a regional scale, there are commonalities in household structure. The volume data in Figure 4.11D allow some measure of the spatial autocorrelation of high-rise structures to be made, perhaps as part of a study of the way that the urban core of Seattle functions. The way that spatial autocorrelation might actually be calculated for the data used to construct Figure 4.11C is described in Box 4.4.

4.7 Establishing dependence in space

Spatial autocorrelation measures tell us about the interrelatedness of phenomena across space, one attribute at a time. Another important facet to the nature of geographic data is the tendency for relationships to exist between different phenomena at the same place – between the values of two different fields, between two attributes of a set of discrete objects, or between the attributes of overlapping discrete objects. This section introduces one of the ways of describing such relationships (see also Box 4.5).

How the various properties of a location are related is an important aspect of the nature of geographic data.

In a formal statistical sense, regression analysis allows us to identify the dependence of one variable upon one or more independent variables. For example, we might hypothesize that the value of individual properties in a city is dependent upon a number of variables such as floor area, distance to local facilities such as parks and schools, standard of repair, local pollution levels, and so forth. Formally this may be written:

\[ Y = f(X_1, X_2, X_3, \ldots, X_K) \]

where Y is the dependent variable and $X_1$ through $X_K$ are all of the possible independent variables that might impact upon property value. It is important to note that it is the independent variables that together affect the dependent variable, and that the hypothesized causal relationship is one way – that is, that property value is responsive to floor area, distance to local facilities, standard of repair, and pollution, and not vice versa. For this reason the dependent variable is termed the response variable and the independent variables are termed predictor variables in some statistics textbooks.

In practice, of course, we will never successfully predict the exact values of any sample of properties. We can identify two broad classes of reasons why this might be the case. First, a property price outcome is the response to a huge range of factors, and it is likely that we will have evidence of and be able to measure only a small

Biographical Box 4.5

Dawn Wright, marine geographer

Dawn Wright (a.k.a. ‘Deepsea Dawn’ by colleagues and friends:
Figure 4.13) is a professor of Geography and Oceanography at Oregon
State University (OrSt) in Corvallis, Oregon, USA, where she also
directs Davey Jones Locker, a seafloor mapping and marine GIS
research laboratory.
Shortly after the deepsea vehicle Argo I was used to discover the
RMS Titanic in 1985, Dawn used some of the first GIS datasets that
it collected to develop her Ph.D. at the University of California, Santa
Barbara. It was then that she became acutely aware of the challenges
of applying GIS to deep ocean environments. When we discuss the
nature of the Earth’s surface (this chapter) and the way in which it
is georeferenced (Chapter 5), we implicitly assume that it is above sea
level. Dawn has written widely on the nature of geographic data with
regard to the entirety of the Earth’s surface – especially the 70% covered
by water. Research issues endemic to oceanographic applications of GIS
include the handling of spatial data structures that can vary their relative
positions and values over time, geostatistical interpolation (Box 4.3 and
Section 14.4.4) of data that are sparser in one dimension as compared
to the others, volumetric analysis, and the input and management of
very large spatial databases. Dawn’s research has described the range of
these issues and applications, as well as recent advances in marine map-making, charting, and scientific
visualization.

Figure 4.13 Dawn Wright, marine geographer, and friend Lydia
102 PART II PRINCIPLES

Dawn remains a strong advocate of the potential of these issues to not only advance the body of
knowledge in GIS design and architecture, but also to inform many of the long-standing research challenges
of geographic information science. She says, ‘The ocean forces us to think about the nature of geographic
data in different ways and to consider radically different ways of representing space and time – we have
to go ‘‘super-dimensional’’ to get our minds and our maps around the natural processes at work. We
cannot fully rely on the absolute coordinate systems that are so familiar to us in a GIS, or ignore the
dissimilarity between the horizontal and the vertical dimension when measuring geographic features and
objects. How deep is the ocean at any precise moment in time? How do we represent all of the relevant
attributes of the habitat of marine mammals? How can we enforce marine protected area boundaries at
depth? Much has been written about the importance of error and uncertainty in geographic analysis. The
challenge of gathering data in dynamic marine environments using platforms that are constantly in motion
in all directions (roll, pitch, yaw, heave), or of tracking fish, mammals, and birds at sea, creates critical
challenges in managing uncertainty in marine position.’ These issues of uncertainty (see Chapter 6) also have
implications for the establishment of dependence in space (Section 4.7). Dawn and her students continue to
develop methods, techniques, and tools for handling data in GIS, but with a unique oceanographic take on
data modeling, geocomputation, and the incorporation of spatio-temporal data standards and protocols.
Take a dive into Davey Jones Locker to learn more (dusk.geo.orst.edu/djl).

subset of these. Second, even if we were able to identify and measure every single relevant
independent variable, we would in practice only be able to do so to a given level of
measurement precision (for a more detailed discussion of what we mean by precision see Box 6.3
and Section 6.3.2.2). Such caveats do not undermine the wider rationale for trying to
generalize, since any assessment of the effects of variables we know about is better than no
assessment at all. But our conceptual solution to the problems of unknown and imprecisely
measured variables is to subsume them all within a statistical error term, and to revise our
regression model so that it looks like this:

Y = f(X1, X2, X3, ..., XK) + ε

where ε denotes the error term.
We assume that this relationship holds for each case (which we denote using the subscript i) in
our population of interest, and thus:

Yi = f(Xi1, Xi2, Xi3, ..., XiK) + εi

The essential task of regression analysis is to identify the direction and strength of the
association implied by this equation. This becomes apparent if we rewrite it as:

Yi = b0 + b1 Xi1 + b2 Xi2 + b3 Xi3 + ··· + bK XiK + εi

where b1 through bK are termed regression parameters, which measure the direction and strength
of the influence of the independent variables X1 through XK on Y. b0 is termed the constant or
intercept term. This is illustrated in simplified form as a scatterplot in Figure 4.14. Here,
for reasons of clarity, the values of the dependent (Y) variable (property value) are regressed
and plotted against just one independent (X) variable (floorspace; for more on scatterplots see
Section 14.2). The scatter of points exhibits an upward trend, suggesting that the response to
increased floorspace is a higher property price. A best fit line has been drawn through this
scatter of points. The gradient of this line is calculated as the b parameter of the
regression, and the upward trend of the regression line means that the gradient is positive.
The greater the magnitude of the b parameter, the stronger the (in this case positive) effect
of marginal increases in the X variable. The value where the regression line intersects the Y
axis identifies the property value when floorspace is zero (which can be thought of as the
value of the land parcel when no property is built upon it), and gives us the intercept value
b0. The more general multiple regression case works by extension of this principle, and each of
the b parameters gauges the marginal effects of its respective X variable.

Figure 4.14 The fit of a regression line to a scatter of points, showing intercept, slope and
error terms [axes: X (floorspace), Y (property value); fitted line Yi = b0 + b1Xi1 + ei]

This kind of effect of floorspace area upon property value is intuitively plausible, and a
survey of any sample of individual properties is likely to yield the kind of well-behaved plot
illustrated in Figure 4.14. In other cases the overall trend may not be as unambiguous.
Figure 4.15A presents a hypothetical plot of the effect of

distance to a local school (measured perhaps as straight line distance; see Section 14.3.1 for
more on measuring distance in a GIS) upon property value. Here the plot is less well behaved:
the overall fit of the regression line is not as good as it might be, and a number of poorly
fitting observations (termed high-residual and high-leverage points) present exceptions to a
weak general trend. A number of formal statistical measures (notably t statistics and the R²
measure) as well as less formal diagnostic procedures exist to gauge the statistical fit of the
regression model to the data. Details of these can be found in any introductory statistics
text, and will not be examined here.

Figure 4.15 (A) A scatterplot and (B) hypothetical relationship between distance to local
school and domestic property value [axes in both panels: X (distance to local school), Y
(property value)]

It is easiest to assume that a relationship between two variables can be described by a
straight line or linear equation, and that assumption has been followed in this discussion. But
although a straight line may be a good first approximation to other functional forms (curves,
for example), there is no reason to suppose that linear relationships represent the truth, in
other words, how the world’s social and physical variables are actually related. For example,
it might be that very close proximity to the school has a negative effect upon property value
(because of noise, car parking, and other localized nuisance) and it is properties at
intermediate distances that gain the greatest positive neighborhood effect from this amenity.
This is shown in Figure 4.15B: these and other effects might be accommodated by changing the
intrinsic functional form of the model.

A straight line or linear distance relationship is the easiest assumption to make and analyze,
but it may not be the correct one.

Figure 4.16 identifies the discrepancy between one observed property value and the value that
is predicted by the regression line. This difference can be thought of as the error term for
individual property i (strictly speaking, it is termed a residual when the scatterplot depicts
a sample and not a population). The precise slope and intercept of the best fit line is usually
identified using the principle of ordinary least squares (OLS). OLS regression fits the line
through the scatter of points such that the sum of squared residuals across the entire sample
is minimized. This procedure is robust and statistically efficient, and yields estimates of the
b parameters. But in many situations it is common to try to go further, by generalizing
results. Suppose the data being analyzed can be considered a representative sample of some
larger group. In the field case, sample points might be representative of all of the infinite
number of sample points one might select to represent the continuous variation of the field
variable (for example, weather stations measuring temperature in new locations, or more soil
pits dug to measure soil properties). In the discrete object case, the data analyzed might be
only a selection of all of the objects. If this is the case, then statistics provides methods
for making accurate and unbiased statements about these larger populations.

Figure 4.16 A hypothetical spatial pattern of residuals from a regression analysis [map labels:
Park, Principal road, Sewage works; legend: positive residual, negative residual, other points]

Generalization is the process of reasoning from the nature of a sample to the nature of a
larger group.

For this to work, several conditions have to hold. First, the sample must be representative,
which means for example that every case (element) in the larger group or population has a
prespecified and independent chance of being selected. The sampling designs discussed in
Section 4.4 are one way of ensuring this. But all too
often in the analysis of geographic data it turns out to be difficult or impossible to imagine
such a population. It is inappropriate, for example, to try to generalize from one study to
statements about all of the Earth’s surface, if the study was conducted in one area.
Generalizations based on samples taken in Antarctica are clearly not representative of all of
the Earth’s surface. Often GIS provide complete coverage of an area, allowing us to analyze all
of the census tracts in a city, or all of the provinces of China. In such cases the apparatus
of generalization from samples to populations is unnecessary, and indeed, becomes meaningless.
In addition, the statistical apparatus that allows us to make inferences assumes that there is
no autocorrelation between errors across space or time. This assumption clearly does not accord
with Tobler’s Law, where the greater relatedness of near things to one another than distant
things is manifest in positive spatial autocorrelation. If strong (positive or negative)
spatial autocorrelation is present, the inference apparatus of the ordinary least squares
regression procedure rapidly breaks down. The consequence of this is that estimates of the
population b parameters become imprecise and the statistical validity of the tests used to
confirm the strength and direction of apparent relationships is seriously weakened.

The assumption of zero spatial autocorrelation that is made by many methods of statistical
inference is in direct contradiction to Tobler’s Law.

The spatial patterning of residuals can provide clues as to whether the structure of space has
been correctly specified in the regression equation. Figure 4.16 illustrates the hypothetical
spatial distribution of residuals in our property value example – the high clustering of
negative residuals around the school suggests that some distance threshold should be added to
the specification, or some function that negatively weights property values that are very close
to the school. The spatial clustering of residuals can also help to suggest omitted variables
that should have been included in the regression specification. Such variables might include
the distance to a neighborhood facility that might have strong positive (e.g., a park) or
negative (e.g., a sewage works) effect upon values in our property example.
A second assumption of the multiple regression model is that there is no intercorrelation
between the independent variables, that is, that no two or more variables essentially measure
the same construct. The statistical term for such intercorrelation is multicollinearity, and
this is a particular problem in GIS applications. GIS is a powerful technology for combining
information about a place, and for examining relationships between attributes, whether they be
conceptualized as fields, or as attributes of discrete objects. The implication is that each
attribute makes a distinct contribution to the total picture of geographic variability. In
practice, however, geographic layers are almost always highly correlated. It is very difficult
to imagine that two fields representing different variables over the same geographic area would
not somehow reveal their common geographic location through similar patterns. For example, a
map of rainfall and a map of population density would clearly have similarities, whether
population was dependent on agricultural production and thus rainfall or tended to avoid steep
slopes and high elevations where rainfall was also highest.

It is almost impossible to imagine that two maps of different phenomena over the same area
would not reveal some similarities.

From this brief overview, it should be clear that there are many important questions about the
applicability of such procedures to establishing statistical relationships using spatial data.
We return to discuss these in more detail in Section 15.4.

4.8 Taming geographic monsters

Thus far in our discussion of the nature of geographic data we have assumed that spatial
variation is smooth and continuous, apart from when we encounter abrupt truncations and
discrete shifts at boundaries. However, much spatial variation is not smooth and continuous,
but rather is jagged and apparently irregular. The processes which give rise to the form of a
mountain range produce features that are spatially autocorrelated (for example, the highest
peaks tend to be clustered), yet it would be wholly inappropriate to represent a mountainscape
using smooth interpolation between peaks and valley troughs.
Jagged irregularity is a property which is also often observed across a range of scales, and
detailed irregularity may resemble coarse irregularity in shape, structure, and form. We
commented on this in Section 4.3 when we suggested that a rock broken off a mountain may, for
reasons of lithology, represent the mountain in form, and this property is often termed
self-similarity. Urban geographers also recognize that cities and city systems are self-similar
in organization across a range of scales, and the ways in which this echoes many of the earlier
ideas of Christaller’s Central Place Theory have been discussed in the academic literature. It
is unlikely that idealized smooth curves and conventional mathematical functions will provide
useful representations for self-similar, irregular spatial structures: at what scale, if any,
does it become appropriate to approximate the San Andreas Fault system by a continuous curve?
Urban geographers, for example, have long sought to represent the apparent decline in
population density with distance from historic central business districts (CBDs), yet the
three-dimensional profiles of cities are characterized by urban canyons between irregularly
spaced high-rise buildings (Figure 4.11D). Each of these phenomena is characterized by spatial
trends (the largest faults, the largest mountains, and the largest skyscrapers tend to be close
to one another), but they are not contiguous and smoothly joined, and the kinds of surface
functions shown in Figure 4.7 present inappropriate generalizations of their structure.
For many years, such features were considered geometrical monsters that defied intuition. More
recently,
however, a more general geometry of the irregular, termed fractal geometry by Benoît Mandelbrot,
has come to provide a more appropriate and general means of summarizing the structure and
character of spatial objects. Fractals can be thought of as geometric objects that are,
literally, between Euclidean dimensions, as described in Box 4.6.

In a self-similar object, each part has the same nature as the whole.

Fractal ideas are important, and for many phenomena a measurement of fractal dimension is as
important as measures of spatial autocorrelation, or of medians and modes in standard
statistics. An important application of

Technical Box 4.6

The strange story of the lengths of geographic objects

How long is the coastline of Maine (Figure 4.17)? (Benoît Mandelbrot, a French mathematician,
originally posed this question in 1967 with regard to the coastline of Great Britain.)

Figure 4.17 Part of the Maine coastline

We might begin to measure the stretch of coastline shown in Figure 4.18A. With dividers set to
measure 100 km intervals, we would take approximately 3.4 swings and record a length of 340 km
(Figure 4.18B).
If we then halved the divider span so as to measure 50 km swings, we would take approximately
7.1 swings and the measured length would increase to 355 km (Figure 4.18C).
If we halved the divider span once again to measure 25 km swings, we would take approximately
16.6 swings and the measured length would increase still further to 415 km (Figure 4.18D).

Figure 4.18 The coastline of Maine, at three levels of recursion: (A) the base curve of the
coastline; (B) approximation using 100 km steps; (C) 50 km step approximation; and (D) 25 km
step approximation [panel annotations: (B) r0 = 4, N0 = 3.4; (C) r1 = 2, N1 = 7.1; (D) r2 = 1,
N2 = 16.6; scale bars in miles and km]

And so on until the divider span was so small that it picked up all of the detail on this
particular representation of the coastline. But even that would not be the end of the story.
If we were to resort instead to field measurement, using a tape measure or the Distance
Measuring Instruments (DMIs) used by highway departments, the length would increase still
further, as we picked up detail that even the most detailed maps do not seek to represent.
If we were to use dividers, or even microscopic measuring devices, to measure every last grain
of sand or earth particle, our recorded length measurement would stretch towards infinity,
seemingly without limit.
In short, the answer to our question is that the length of the Maine coastline is
indeterminate. More helpfully, perhaps, any approximation is scale-dependent – and thus any
measurement must also specify scale. The line representation of the coastline also possesses
two other properties. First, where small deviations about the overall trend of the coastline
resemble larger deviations in form, the coast is said to be self-similar. Second, as the path
of the coast traverses space, its intricate structure comes to fill up more space than a
one-dimensional straight line but less space than a two-dimensional area. As such, it is said
to be of fractional dimension (and is termed a fractal) between 1 (a line) and 2 (an area).
fractal concepts is discussed in Section 15.2.5, and we return again to the issue of length
estimation in GIS in Section 14.3.1. Ascertaining the fractal dimension of an object involves
identifying the scaling relation between its length or extent and the yardstick (or level of
detail) that is used to measure it. Regression analysis, as described in the previous section,
provides one (of many) means of establishing this relationship. If we return to the Maine
coastline example in Figures 4.17 and 4.18, we might obtain scale-dependent coast length
estimates (L) of 13.6 (4 × 3.4), 14.2 (2 × 7.1) and 16.6 (1 × 16.6) units for the step lengths
(r) used in Figures 4.18B, 4.18C and 4.18D respectively. (It is arbitrary whether the steps are
measured in miles or kilometers.) If we then plot the natural log of L (on the y-axis) against
the natural log of r for these and other values, we will build up a scatterplot like that shown
in Figure 4.19. If the points lie more or less on a straight line and we fit a regression line
through them, the value of the slope (b) parameter is equal to (1 − D), where D is the fractal
dimension of the line. This method for analyzing the nature of geographic lines was originally
developed by Lewis Fry Richardson (Box 4.7).

Figure 4.19 The relationship between recorded length (L) and step length (R) [axes: ln(R)
horizontal, ln(L) vertical]

4.9 Induction and deduction and how it all comes together

The abiding message of this chapter is that spatial is special – that geographic data have a
unique nature. Tobler’s Law presents an elementary general rule about spatial structure, and a
starting point for the measurement and simulation of spatially autocorrelated structures. This
in turn assists us in devising appropriate spatial sampling schemes and creating improved
representations, which tell us still more about the real world and how we might represent it. A
goal of GIS is often to establish causality between different geographically referenced data,
and the multiple regression model potentially provides one means of relating spatial variables
to one another, and of inferring from samples to the properties of the populations from which
they were drawn. Yet statistical techniques often need to be recast in order to accommodate the
special properties of spatial data, and regression analysis is no exception in this regard.
Spatial data provide the foundations to operational and strategic applications of GIS,
foundations that must

Biographical Box 4.7

Lewis Fry Richardson

Lewis Fry Richardson (1881–1953: Figure 4.20) was one of the founding
fathers of the ideas of scaling and fractals. He was brought up a Quaker,
and after earning a degree at Cambridge University went to work for the
Meteorological Office, but his pacifist beliefs forced him to leave in 1920
when the Meteorological Office was militarized under the Air Ministry.
His early work on how atmospheric turbulence is related at different
scales established his scientific reputation. Later he became interested
in the causes of war and human conflict, and in order to pursue one
of his investigations found that he needed a rigorous way of defining
the length of a boundary between two states. Unfortunately published
lengths tended to vary dramatically, a specific instance being the difference
between the lengths of the Spanish–Portuguese border as stated by Spain
and by Portugal. He developed a method of walking a pair of dividers
along a mapped line, and analyzed the relationship between the length
estimate and the setting of the dividers, finding remarkable predictability.
In the 1960s Benoît Mandelbrot’s concept of fractals finally provided the
theoretical framework needed to understand this result.

Figure 4.20 Lewis Fry Richardson: the formalization of scale effects
be used creatively yet rigorously if they are to support the spatial analysis superstructure
that we wish to erect. This entails much more than technical competence with software. An
understanding of the nature of spatial data allows us to use induction (reasoning from
observations) and deduction (reasoning from principles and theory) alongside each other to
develop effective spatial representations that are safe to use.
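
To draw together the chapter's ideas about regression and spatial autocorrelation, the following is a minimal sketch in Python under stated assumptions. The data are hypothetical: six properties along one street, whose values follow floorspace plus an unmodeled neighborhood effect (positive for the first three, negative for the last three). The sketch fits a one-variable ordinary least squares line and then computes the standard form of Moran's I for the residuals, using a binary weights matrix in which consecutive properties are neighbors. All function names and figures are illustrative and are not taken from the text.

```python
def fit_simple_ols(xs, ys):
    """Return (b0, b1) that minimize the sum of squared residuals of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Observed value minus the value predicted by the fitted line."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def morans_i(values, weights):
    """Moran's I for values under a square weights matrix (list of lists)."""
    n = len(values)
    mean_v = sum(values) / n
    dev = [v - mean_v for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in dev)


# Hypothetical data, listed in street order (position 0 to position 5).
floorspace = [80, 60, 100, 50, 90, 70]
value = [190, 150, 230, 110, 190, 150]

# Binary contiguity: each property neighbors the next one along the street.
n = len(floorspace)
weights = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

b0, b1 = fit_simple_ols(floorspace, value)
resid = residuals(floorspace, value, b0, b1)
moran = morans_i(resid, weights)

print(f"intercept b0 = {b0:.2f}, slope b1 = {b1:.2f}")
print(f"Moran's I of residuals = {moran:.2f}")
```

In this constructed example the residuals are positive at one end of the street and negative at the other, so Moran's I is clearly positive. That is the kind of spatial patterning of residuals that the chapter suggests may point to an omitted variable or a misspecified model.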

Questions for further study

1. Many jurisdictions tout the number of miles of shoreline in their community – for example,
Ottawa County, Ohio, USA claims 107 miles of Lake Erie shoreline. What does this mean, and how
could you make it more meaningful?
2. The apparatus of inference was developed by statisticians because they wanted to be able to
reason from the results of experiments involving small samples to make conclusions about the
results of much larger, hypothetical experiments – for example, in using samples to test the
effects of drugs. Summarize the problems inherent in using this apparatus for geographic data
in your own words.
3. How many definitions and uses of the word scale can you identify?
4. What important aspects of the nature of geographic data have not been covered in this
chapter?

Further reading

Batty M. and Longley P.A. 1994 Fractal Cities: A Geometry of Form and Function. London:
Academic Press.
Mandelbrot B.B. 1983 The Fractal Geometry of Nature. San Francisco: Freeman.
Quattrochi D.A. and Goodchild M.F. (eds) 1996 Scale in Remote Sensing and GIS. Boca Raton,
Florida: Lewis Publishers.
Tate N.J. and Atkinson P.M. (eds) 2001 Modelling Scale in Geographical Information Science.
Chichester: Wiley.
Wright D. and Bartlett D. (eds) 2000 Marine and Coastal Geographical Information Systems.
London: Taylor and Francis.
Wright D. 2002 Undersea with GIS. Redlands, CA: ESRI Press.
5 Georeferencing

Geographic location is the element that distinguishes geographic information
from all other types, so methods for specifying location on the Earth’s surface
are essential to the creation of useful geographic information. Humanity
has developed many such techniques over the centuries, and this chapter
provides a basic guide for GIS students – what you need to know about
georeferencing to succeed in GIS. The first section lays out the principles of
georeferencing, including the requirements that any effective system must
satisfy. Subsequent sections discuss commonly used systems, starting with
the ones closest to everyday human experience, including placenames and
street addresses, and moving to the more accurate scientific methods that
form the basis of geodesy and surveying. The final sections deal with issues
that arise over conversions between georeferencing systems, with the Global
Positioning System (GPS), with georeferencing of computers and cellphones,
and with the concept of a gazetteer.

Geographic Information Systems and Science, 2nd edition Paul Longley, Michael Goodchild, David Maguire, and David Rhind.
© 2005 John Wiley & Sons, Ltd. ISBNs: 0-470-87000-1 (HB); 0-470-87001-X (PB)

Learning Objectives

By the end of this chapter you will:

■ Know the requirements for an effective system of georeferencing;

■ Be familiar with the problems associated with placenames, street addresses, and other systems
used every day by humans;

■ Know how the Earth is measured and modeled for the purposes of positioning;

■ Know the basic principles of map projections, and the details of some commonly used
projections;

■ Understand the principles behind GPS, and some of its applications.

5.1 Introduction

Chapter 3 introduced the idea of an atomic element of geographic information: a triple of
location, optionally time, and attribute. To make GIS work there must be techniques for
assigning values to all three of these, in ways that are understood commonly by people who wish
to communicate. Almost all the world agrees on a common calendar and time system, so there are
only minor problems associated with communicating that element of the atom when it is needed
(although different time zones, different names of the months in different languages, the
annual switch to Summer or Daylight Saving Time, and systems such as the classical Japanese
convention of dating by the year of the Emperor’s reign all sometimes manage to confuse us).
Time is optional in a GIS, but location is not, so this chapter focuses on techniques for
specifying location, and the problems and issues that arise. Locations are the basis for many
of the benefits of GIS: the ability to map, to tie different kinds of information together
because they refer to the same place, or to measure distances and areas. Without locations,
data are said to be non-spatial or aspatial and would have no value at all within a geographic
information system.

Time is an optional element in geographic information, but location is essential.

Commonly, several terms are used to describe the act of assigning locations to atoms of
information. We use the verbs to georeference, to geolocate, and to geocode, and say that facts
have been georeferenced or geocoded. We talk about tagging records with geographic locations,
or about locating them. The term georeference will be used throughout this chapter.
The primary requirements of a georeference are that it must be unique, so that there is only
one location associated with a given georeference, and therefore no confusion about the
location that is referenced; and that its meaning be shared among all of the people who wish to
work with the information, including their geographic information systems. For example, the
georeference 909 West Campus Lane, Goleta, California, USA points to a single house – there is
no other house anywhere on Earth with that address – and its meaning is shared sufficiently
widely to allow mail to be delivered to the address from virtually anywhere on the planet. The
address may not be meaningful to everyone living in China, but it will be meaningful to a
sufficient number of people within China’s postal service, so a letter mailed from China to
that address will likely be delivered successfully. Uniqueness and shared meaning are
sufficient also to allow people to link different kinds of information based on common
location: for example, a driving record that is georeferenced by street address can be linked
to a record of purchasing. The negative implications of this kind of record linking for human
privacy are discussed further by Mark Monmonier (see Box 5.2).
To be as useful as possible a georeference must be persistent through time, because it would be
very confusing if georeferences changed frequently, and very expensive to update all of the
records that depend on them. This can be problematic when a georeferencing system serves more
than one purpose, or is used by more than one agency with different priorities. For example, a
municipality may expand by incorporating more land, creating problems for mapping agencies, and
for researchers who wish to study the municipality through time. Street names sometimes change,
and postal agencies sometimes revise postal codes. Changes even occur in the names of cities
(Saigon to Ho Chi Minh City), or in their conventional transcriptions into the Roman alphabet
(Peking to Beijing).

To be most useful, georeferences should stay constant through time.

Every georeference has an associated spatial resolution (Section 3.4), equal to the size of the
area that is assigned that georeference. A mailing address could be said to have a spatial
resolution equal to the size of the mailbox, or perhaps to the area of the parcel of land or
structure assigned that address. A US state has a spatial resolution that varies from the size
of Rhode Island to that of Alaska, and many other systems of georeferencing have similarly
wide-ranging spatial resolutions.
Many systems of georeferencing are unique only within an area or domain of the Earth’s surface.
For example, there are many cities with the name Springfield in the USA (18 according to a
recent edition of the Rand McNally Road Atlas; similarly there are nine places called
Whitchurch in the 2003 AA Road Atlas of the United Kingdom). City name is unique within
the domain of a US state, however, a property that was engineered with the advent of the
postal system in the 19th century. Today there is no danger of there being two Springfields
in Massachusetts, and a driver can confidently ask for directions to ‘Springfield,
Massachusetts’ in the knowledge that there is no danger of being sent to the wrong
Springfield. But people living in London, Ontario, Canada are well aware of the dangers of
talking about ‘London’ without specifying the appropriate domain. Even in Toronto, Ontario a
reference to ‘London’ may be misinterpreted as a reference to the older (UK) London on a
different continent, rather than to the one 200 km away in the same province (Figure 5.1).
Street name is unique in the USA within municipal domains, but not within larger domains
such as county or state. The six digits of a UK National Grid reference repeat every 100 km,
so additional letters are needed to achieve uniqueness within the national domain (see
Box 5.1). Similarly there are 120 places on the Earth’s surface with the same Universal
Transverse Mercator coordinates (see Section 5.7.2), and a zone number and hemisphere must
be added to make a reference unique in the global domain.

Figure 5.1 Placenames are not necessarily unique at the global level – there are many
Londons, for example, besides the largest and most prominent one in the UK. People living in
other Londons must often add additional information (e.g., London, Ontario, Canada) to
resolve ambiguity

While some georeferences are based on simple names, others are based on various kinds of
measurements, and are called metric georeferences. They include latitude and longitude and
various kinds of coordinate systems, all of which are discussed in more detail below, and are
essential to the making of maps and the display of mapped information in GIS. One enormous
advantage of such systems is that they provide the potential for infinitely fine spatial
resolution: provided we have sufficiently accurate measuring devices, and use enough decimal
places, it is possible with such systems to locate information to any level of accuracy.
Another advantage is that from measurements of two or more locations it is possible to
compute distances, a very important requirement of georeferencing in GIS.

Metric georeferences are much more useful, because they allow maps to be made and distances
to be calculated.

Other systems simply order locations. In most countries mailing addresses are ordered along
streets, often using the odd integers for addresses on one side and the even integers for
addresses on the other. This means that it is possible to say that 3000 State Street and 100
State Street are further apart than 200 State Street and 100 State Street, and allows postal
services to sort mail for easy

Technical Box 5.1

A national system of georeferencing: the National Grid of Great Britain


The National Grid is administered by the
Ordnance Survey of Great Britain, and provides a
unique georeference for every point in England,
Scotland, and Wales. The first designating letter
defines a 500 km square, and the second defines
a 100 km square (see Figure 5.2). Within each
square, two measurements, called easting and
northing, define a location with respect to the
lower left corner of the square. The number
of digits defines the precision – three digits for
easting and three for northing (a total of six)
define location to the nearest 100 m.
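The structure described in this box can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `parse_grid_reference` is hypothetical, and the layout of the letters (a 5 × 5 block lettered A to Z without I, read from the top-left, with the grid's false origin at the south-west corner of square SV) is an assumption drawn from general knowledge of the Ordnance Survey scheme rather than from the text above.

```python
def _letter_index(letter: str) -> int:
    """Position of a grid letter in the 25-letter alphabet that omits I (A=0 ... Z=24)."""
    index = ord(letter) - ord("A")
    return index - 1 if index > 7 else index


def parse_grid_reference(ref: str) -> tuple[int, int, int]:
    """Return (easting_m, northing_m, resolution_m) for a reference such as 'TG 514 131'.

    The coordinates are those of the south-west corner of the cell named by the
    reference. Assumed letter layout; not every letter pair is a real square.
    """
    compact = ref.replace(" ", "").upper()
    letters, digits = compact[:2], compact[2:]
    if len(letters) != 2 or not letters.isalpha() or "I" in letters:
        raise ValueError(f"expected two grid letters (no I) in {ref!r}")
    if not digits.isdigit() or len(digits) % 2 or len(digits) > 10:
        raise ValueError(f"expected an even number of digits (2 to 10) in {ref!r}")

    first, second = (_letter_index(ch) for ch in letters)
    # Combine the 500 km square (first letter) and the 100 km square (second
    # letter) into counts of 100 km squares east and north of the false origin.
    east_100km = ((first - 2) % 5) * 5 + second % 5
    north_100km = (19 - (first // 5) * 5) - second // 5

    half = len(digits) // 2
    resolution = 10 ** (5 - half)  # three digits each -> 100 m
    easting = east_100km * 100_000 + int(digits[:half]) * resolution
    northing = north_100km * 100_000 + int(digits[half:]) * resolution
    return easting, northing, resolution
```

With three digits of easting and three of northing the returned resolution is 100 m, matching the six-digit case described in the box; for example, `parse_grid_reference("TG 514 131")` gives an easting of 651400 m and a northing of 313100 m under the assumed layout.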

Figure 5.2 The National Grid of Great Britain, illustrating how a point is assigned a grid
reference that locates it uniquely to the nearest 100 m (Reproduced by permission of
Peter H. Dana)
Table 5.1 Some commonly used systems of georeferencing

System | Domain of uniqueness | Metric? | Example | Spatial resolution
Placename | varies | no | London, Ontario, Canada | varies by feature type
Postal address | global | no, but ordered along streets in most countries | 909 West Campus Lane, Goleta, California, USA | size of one mailbox
Postal code | country | no | 93117 (US ZIP code); WC1E 6BT (UK unit postcode) | area occupied by a defined number of mailboxes
Telephone calling area | country | no | 805 | varies
Cadastral system | local authority | no | Parcel 01452954, City of Springfield, Mass, USA | area occupied by a single parcel of land
Public Land Survey System | Western USA only, unique to Prime Meridian | yes | Sec 5, Township 4N, Range 6E | defined by level of subdivision
Latitude/longitude | global | yes | 119 degrees 45 minutes West, 34 degrees 40 minutes North | infinitely fine
Universal Transverse Mercator | zones six degrees of longitude wide, and N or S hemisphere | yes | 563146E, 4356732N | infinitely fine
State Plane Coordinates | USA only, unique to state and to zone within state | yes | 55086.34E, 75210.76N | infinitely fine

delivery. In the western United States, it is often possible to infer estimates of the
distance between two addresses on the same street by knowing that 100 addresses are assigned
to each city block, and that blocks are typically between 120 m and 160 m long.

This section has reviewed some of the general properties of georeferencing systems, and
Table 5.1 shows some commonly used systems. The following sections discuss the specific
properties of the systems that are most important in GIS applications.

5.2 Placenames

Giving names to places is the simplest form of georeferencing, and was most likely the one
first developed by early hunter-gatherer societies. Any distinctive feature on the landscape,
such as a particularly old tree, can serve as a point of reference for two people who wish to
share information, such as the existence of good game in the tree’s vicinity. Human
landscapes rapidly became littered with names, as people sought distinguishing labels to use
in describing aspects of their surroundings, and other people adopted them. Today, of course,
we have a complex system of naming oceans, continents, cities, mountains, rivers, and other
prominent features. Each country maintains a system of authorized naming, often through
national or state committees assigned with the task of standardizing geographic names.
Nevertheless multiple names are often attached to the same feature, for example when
cultures try to preserve the names given to features by the original or local inhabitants
(for example, Mt Everest to many, but Chomolungma to many Tibetans), or when city names are
different in different languages (Florence in English, Firenze in Italian).

Many commonly used placenames have meanings that vary between people, and with the context
in which they are used.

Language extends the power of placenames through words such as ‘between’, which serve to
refine references to location, or ‘near’, which serve to broaden them. ‘Where State Street
crosses Mission Creek’ is an instance of combining two placenames to achieve greater
refinement of location than either name could achieve individually. Even more powerful
extensions come from combining placenames with directions and distances, as in ‘200 m north
of the old tree’ or ‘50 km west of Springfield’.

But placenames are of limited use as georeferences. First, they often have very coarse
spatial resolution. ‘Asia’ covers over 43 million sq km, so the information that something
is located ‘in Asia’ is not very helpful in pinning down its location. Even Rhode Island, the
smallest state of the USA, has a land area of over 2700 sq km. Second, only certain
placenames are officially authorized by national or subnational agencies. Many more are
recognized only locally, so their use is limited to communication between people in the
local community. Placenames may even be lost through time: although there are many
contenders, we do not know with certainty where the ‘Camelot’ described in the English
legends of King Arthur was located, if indeed it ever existed.

The meaning of certain placenames can become lost through time.

5.3 Postal addresses and postal codes

Postal addresses were introduced after the development of mail delivery in the 19th century.
They rely on several assumptions:
■ Every dwelling and office is a potential destination for mail;
■ Dwellings and offices are arrayed along paths, roads, or streets, and numbered accordingly;
■ Paths, roads, and streets have names that are unique within local areas;
■ Local areas have names that are unique within larger regions; and
■ Regions have names that are unique within countries.
If the assumptions are true, then mail address provides a unique identification for every
dwelling on Earth.

Today, postal addresses are an almost universal means of locating many kinds of human
activity: delivery of mail, place of residence, or place of business. They fail, of course,
in locating anything that is not a potential destination for mail, including almost all
kinds of natural features (Mt Everest does not have a postal address, and neither does
Manzana Creek in Los Padres National Forest in California, USA). They are not as useful when
dwellings are not numbered consecutively along streets, as happens in some cultures (notably
in Japan, where street numbering can reflect date of construction, not sequence along the
street – it is temporal, rather than spatial) and in large building complexes like
condominiums. Many GIS applications rely on the ability to locate activities by postal
address, and to convert addresses to some more universal system of georeferencing, such as
latitude and longitude, for mapping and analysis.

Postal addresses work well to georeference dwellings and offices, but not natural features.

Postal codes were introduced in many countries in the late 20th century in order to simplify
the sorting of mail. In the Canadian system, for example, the first three characters of the
six-character code identify a Forward Sortation Area, and mail is initially sorted so that
all mail directed to a single FSA is together. Each FSA’s incoming mail is accumulated in a
local sorting station, and sorted a second time by the last three characters of the code, to
allow it to be delivered easily. Figure 5.3 shows a map of the FSAs for an area of the
Toronto metropolitan region. The full six characters are unique to roughly ten houses, a
single large business, or a single building. Much effort went into ensuring widespread
adoption of the coding system by the general public and businesses, and computer programs
were developed to assign codes automatically to addresses for large-volume mailers.

Figure 5.3 Forward Sortation Areas (FSAs) of the central part of the Toronto metropolitan
region. FSAs form the first three characters of the six-character Canadian postal code

Postal codes have proven very useful for many purposes besides the sorting and delivery of
mail.

Although the area covered by a Canadian FSA or a US ZIP code varies, and can be changed
whenever the postal authorities want, it is sufficiently constant to be useful for mapping
purposes, and many businesses routinely make maps of their customers by counting the numbers
present in each postal code area, and dividing by total population to get a picture of
market penetration. Figure 5.4 shows an example of summarizing data by ZIP code. Most people
know the postal code of their home, and in some instances postal codes have developed
popular images (the ZIP code for Beverly Hills, California, 90210, became the title of a
successful television series).

Figure 5.4 The use of ZIP code boundaries as a convenient basis for summarizing data. (A) In
this instance each business has been allocated to its ZIP code, and (B) the ZIP code areas
have been shaded according to the density of businesses per square mile
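The market-penetration calculation described in this section (count customers in each postal code area, then divide by that area's population) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the areas are taken to be Canadian FSAs (the first three characters of the code), and any customer codes or population figures passed in are the caller's own data, not values from the text.

```python
from collections import Counter


def split_canadian_postal_code(code: str) -> tuple[str, str]:
    """Split a six-character Canadian postal code into (FSA, last three characters)."""
    compact = code.replace(" ", "").upper()
    if len(compact) != 6:
        raise ValueError(f"expected six characters, got {code!r}")
    return compact[:3], compact[3:]


def market_penetration(customer_codes, population_by_fsa):
    """Customers per resident for each FSA listed in population_by_fsa.

    customer_codes: iterable of full postal codes, one per customer.
    population_by_fsa: mapping of FSA -> total population (supplied by the caller).
    """
    counts = Counter(split_canadian_postal_code(code)[0] for code in customer_codes)
    return {
        fsa: counts.get(fsa, 0) / population
        for fsa, population in population_by_fsa.items()
        if population > 0  # skip areas with no recorded population
    }


if __name__ == "__main__":
    # Made-up customer list and population totals, for illustration only.
    customers = ["M5V 2T6", "M5V 3A8", "M4C 1B5"]
    population = {"M5V": 1000, "M4C": 2000}
    print(market_penetration(customers, population))
```

Because postal authorities can redraw these areas, a real analysis would also need to record which vintage of boundaries and population counts was used.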

5.4 Linear referencing systems

A linear referencing system identifies location on a network by measuring distance from a
defined point of reference along a defined path in the network. Figure 5.5 shows an example,
an accident whose location is reported as being a measured distance from a street
intersection, along a named street. Linear referencing is closely related to street address,
but uses an explicit measurement of distance rather than the much less reliable surrogate of
street address number.

Linear referencing is widely used in applications that depend on a linear network. This
includes highways (e.g., Mile 1240 of the Alaska Highway), railroads (e.g., 25.9 miles from
Paddington Station in London on the main line to Bristol, England), electrical transmission,
pipelines, and canals. Linear references are used by highway agencies to define the locations
of bridges, signs, potholes, and accidents, and to record pavement condition.

Linear referencing systems are widely used in managing transportation infrastructure and in
dealing with emergencies.

Linear referencing provides a sufficient basis for georeferencing for some applications.
Highway departments often base their records of accident locations on linear references, as
well as their inventories of signs and bridges (GIS has many applications in transportation
that are known collectively as GIS-T, and in the developing field of intelligent
transportation systems or ITS). But for other applications it is important to be able to
convert between linear references and other forms, such as latitude and longitude. For
example, the OnStar system that is installed in many Cadillacs sold in the USA is designed to
radio the position of a vehicle automatically as soon as it is involved in an accident. When
the airbags deploy, a GPS receiver determines position, which is then relayed to a central
dispatch office. Emergency response centers often use street addresses and linear
referencing to define the locations of accidents, so the latitude and longitude received
from the vehicle must be converted before an emergency team can be sent to the accident.

Linear referencing systems are often difficult to implement in practice in ways that are
robust in all situations. In an urban area with frequent intersections it is relatively easy
to measure distance from the nearest one (e.g., on Birch St 87 m west of the intersection
with Main St). But in rural areas it may be a long way from the nearest intersection. Even in
urban areas it is not uncommon for two streets to intersect more than once (e.g., Birch may
have two intersections with Columbia Crescent). There may also be difficulties in defining
distance accurately, especially if roads include steep sections where the distance driven is
significantly longer than the distance evaluated on a two-dimensional digital representation
(Section 14.3.1).
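Converting a linear reference such as ‘87 m along Birch St from Main St’ into coordinates amounts to walking a measured distance along the digitized path. The sketch below is one simple way to do this, assuming the path is a list of planar (x, y) vertices in metres that starts at the reference point and is joined by straight segments; the name `locate_along` is hypothetical. It measures distance on the two-dimensional representation only, so it is subject to the error on steep roads noted above.

```python
import math


def locate_along(path, distance):
    """Return the (x, y) point reached after `distance` metres along `path`.

    path: sequence of (x, y) vertices in metres; path[0] is the reference
          point (for example, an intersection).
    distance: measured distance from path[0], in metres, along the path.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    remaining = float(distance)
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        segment = math.hypot(x1 - x0, y1 - y0)
        if segment > 0 and remaining <= segment:
            t = remaining / segment  # fraction of this segment to traverse
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        remaining -= segment
    raise ValueError("distance exceeds the length of the path")


if __name__ == "__main__":
    # Hypothetical street running due west from the intersection at (0, 0).
    birch_st = [(0.0, 0.0), (-200.0, 0.0)]
    print(locate_along(birch_st, 87.0))
```

The reverse conversion (from a GPS position to a distance along a named street) additionally requires snapping the point to the nearest path, which is where streets that intersect more than once can cause ambiguity.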

Figure 5.5 Linear referencing – an incident’s position is determined by measuring its
distance (87 m) along one road (Birch St) from a well-defined point (its intersection with
Main St)

5.5 Cadasters and the US Public Land Survey System

The cadaster is defined as the map of land ownership in an area, maintained for the purposes
of taxing land, or of creating a public record of ownership. The process of subdivision
creates new parcels by legally subdividing existing ones.

Parcels of land in a cadaster are often uniquely identified, by number or by code, and are
also reasonably persistent through time, and thus satisfy the requirements of a
georeferencing system. But very few people know
the identification code of their home parcel, and use of the cadaster as a georeferencing
system is thus limited largely to local officials, with one major exception.

The US Public Land Survey System (PLSS) evolved out of the need to survey and distribute the
vast land resource of the Western USA, starting in the early 19th century, and expanded to
become the dominant system of cadaster for all of the USA west of Ohio, and all of Western
Canada. Its essential simplicity and regularity make it useful for many purposes, and
understandable by the general public. Its geometric regularity also allows it to satisfy the
requirement of a metric system of georeferencing, because each georeference is defined by
measured distances.

The Public Land Survey System defines land ownership over much of western North America, and
is a useful system of georeferencing.

To implement the PLSS in an area, a surveyor first lays out an accurate north–south line or
prime meridian. Rows are then laid out six miles apart and perpendicular to this line, to
become the townships of the system. Then blocks or ranges are laid out in six mile by six
mile squares on either side of the prime meridian (see Figure 5.6). Each square is
referenced by township number, range number, whether it is to the east or to the west, and
the name of the prime meridian. Thirty-six sections of one mile by one mile are laid out
inside each township, and numbered using a standard system (note how the numbers reverse in
every other row). Each section is divided into four quarter-sections of a quarter of a
square mile, or 160 acres, the size of the nominal family farm or homestead in the original
conception of the PLSS. The process can be continued by subdividing into four to obtain any
level of spatial resolution.

The PLSS would be a wonderful system if the Earth were flat. To account for its curvature
the squares are not perfectly six miles by six miles, and the rows must be offset
frequently; and errors in the original surveying complicate matters still further,
particularly in rugged landscapes. Figure 5.6 shows the offsetting exaggerated for a small
area. Nevertheless, the PLSS remains an efficient system, and one with which many people in
the Western USA and Western Canada are familiar. It is often used to specify location,
particularly in managing natural resources in the oil and gas industry and in mining, and in
agriculture. Systems have been built to convert PLSS locations automatically to latitude and
longitude.

Figure 5.6 Portion of the Township and Range system (Public Land Survey System) widely used
in the western USA as the basis of land ownership (shown on the right). Townships are laid
out in six-mile squares on either side of an accurately surveyed Prime Meridian. The offset
shown between ranges 16 N and 17 N is needed to accommodate the Earth’s curvature (shown
much exaggerated). The square mile sections within each township are numbered as shown in
the upper left

5.6 Measuring the Earth: latitude and longitude

The most powerful systems of georeferencing are those that provide the potential for very
fine spatial resolution, that allow distance to be computed between pairs of locations, and
that support other forms of spatial analysis. The system of latitude and longitude is in
many ways the most comprehensive, and is often called the geographic system of coordinates,
based on the Earth’s rotation about its center of mass.

Figure 5.7 Definition of longitude. The Earth is seen here from above the North Pole,
looking along the Axis, with the Equator forming the outer circle. The location of Greenwich
defines the Prime Meridian. The longitude of the point at the center of the red cross is
determined by drawing a plane through it and the axis, and measuring the angle between this
plane and the Prime Meridian

To define latitude and longitude we first identify the axis of the Earth’s rotation. The
Earth’s center of mass lies on the axis, and the plane through the center of mass
perpendicular to the axis defines the Equator. Slices through the Earth parallel to the
axis, and perpendicular to the plane of the Equator, define lines of constant longitude
(Figure 5.7), rather like the segments of an
orange. A slice through a line marked on the ground at the Royal Observatory in Greenwich,
England defines zero longitude, and the angle between this slice and any other slice defines
the latter’s measure of longitude. Each of the 360 degrees of longitude is divided into 60
minutes and each minute into 60 seconds. But it is more conventional to refer to longitude
by degrees East or West, so longitude ranges from 180 degrees West to 180 degrees East.
Finally, because computers are designed to handle numbers ranging from very large and
negative to very large and positive, we normally store longitude in computers as if West
were negative and East were positive; and we store parts of degrees using decimals rather
than minutes and seconds. A line of constant longitude is termed a meridian.

Longitude can be defined in this way for any rotating solid, no matter what its shape,
because the axis of rotation and the center of mass are always defined. But the definition
of latitude requires that we know something about the shape. The Earth is a complex shape
that is only approximately spherical. A much better approximation or figure of the Earth is
the ellipsoid of rotation, the figure formed by taking a mathematical ellipse and rotating
it about its shorter axis (Figure 5.8). The term spheroid is also commonly used.

Figure 5.8 Definition of the ellipsoid, formed by rotating an ellipse about its minor axis
(corresponding to the axis of the Earth’s rotation)

The difference between the ellipsoid and the sphere is measured by its flattening, or the
reduction in the minor axis relative to the major axis. Flattening is defined as:

f = (a − b)/a

where a and b are the lengths of the major and minor axes respectively (we usually refer to
the semi-axes, or half the lengths of the axes, because these are comparable to radii). The
actual flattening is about 1 part in 300.

The Earth is slightly flattened, such that the distance between the Poles is about 1 part in
300 less than the diameter at the Equator.

Much effort was expended over the past 200 years in finding ellipsoids that best
approximated the shape of the Earth in particular countries, so that national mapping
agencies could measure position and produce accurate maps. Early ellipsoids varied
significantly in their basic parameters, and were generally not centered on the Earth’s
center of mass. But the development of intercontinental ballistic missiles in the 1950s and
the need to target them accurately, as well as new data available from satellites, drove the
push to a single international standard. Without a single standard, the maps produced by
different countries using different ellipsoids could never be made to fit together along
their edges, and artificial steps and offsets were often necessary in moving from one
country to another (navigation systems in aircraft would have to be corrected, for example).

The ellipsoid known as WGS84 (the World Geodetic System of 1984) is now widely accepted, and
North American mapping is being brought into conformity with it through the adoption of the
virtually identical North American Datum of 1983 (NAD83). It specifies a semi-major axis
(distance from the center to the Equator) of 6378137 m, and a flattening of 1 part in
298.257. But many other ellipsoids remain in use in other parts of the world, and many older
data still adhere to earlier standards, such as the North American Datum of 1927 (NAD27).
Thus GIS users sometimes need to convert between datums, and functions to do that are
commonly available.

We can now define latitude. Figure 5.9 shows a line drawn through a point of interest
perpendicular to the ellipsoid at that location. The angle made by this line with the plane
of the Equator is defined as the point’s latitude, and varies from 90 South to 90 North.
Again, south latitudes are usually stored as negative numbers and north latitudes as
positive. Latitude is often symbolized by the Greek letter phi (φ) and longitude by the
Greek letter lambda (λ), so the respective ranges can be expressed in mathematical shorthand
as: −180 ≤ λ ≤ 180; −90 ≤ φ ≤ 90. A line of constant latitude is termed a parallel.

Figure 5.9 Definition of the latitude of the point marked with the red cross, as the angle
between the Equator and a line drawn perpendicular to the ellipsoid

It is important to have a sense of what latitude and longitude mean in terms of distances on
the surface. Ignoring the flattening, two points on the same north–south line of longitude
and separated by one degree
of latitude are 1/360 of the circumference of the Earth, or about 111 km, apart. One minute
of latitude corresponds to about 1.85 km, and also defines one nautical mile, a unit of
distance that is still commonly used in navigation. One second of latitude corresponds to
about 30 m. But things are more complicated in the east–west direction, and these figures
only apply to east–west distances along the Equator, where lines of longitude are furthest
apart. Away from the Equator the length of a line of latitude gets shorter and shorter,
until it vanishes altogether at the poles. The degree of shortening is approximately equal
to the cosine of latitude, or cos φ, which is 0.866 at 30 degrees North or South, 0.707 at 45
degrees, and 0.500 at 60 degrees. So a degree of longitude is only 55 km along the northern
boundary of the Canadian province of Alberta (exactly 60 degrees North).

Lines of latitude and longitude are equally far apart only at the Equator; towards the Poles
lines of longitude converge.

Given latitude and longitude it is possible to determine distance between any pair of
points, not just pairs along lines of longitude or latitude. It is easiest to pretend for a
moment that the Earth is spherical, because the flattening of the ellipsoid makes the
equations much more complex. On a spherical Earth the shortest path between two points is a
great circle, or the arc formed if the Earth is sliced through the two points and through
its center (Figure 5.10; an off-center slice creates a small circle). The length of this arc
on a spherical Earth of radius R is given by:

R arccos[sin φ1 sin φ2 + cos φ1 cos φ2 cos(λ1 − λ2)]

where the subscripts denote the two points (and see the discussion of Measurement in
Section 14.3). For example, the distance from a point on the Equator at longitude 90 East
(in the Indian Ocean between Sri Lanka and the Indonesian island of Sumatra) and the North
Pole is found by evaluating the equation for φ1 = 0, λ1 = 90, φ2 = 90, λ2 = 90. It is best to
work in radians (1 radian is 57.30 degrees, and 90 degrees is π/2 radians). The equation
evaluates to R arccos 0, or R π/2, or one quarter of the circumference of the Earth. Using a
radius of 6378 km this comes to 10 018 km, or close to 10 000 km (not surprising, since the
French originally defined the meter in the late 18th century as one ten millionth of the
distance from the Equator to the Pole).

5.7 Projections and coordinates

Latitude and longitude define location on the Earth’s surface in terms of angles with
respect to well-defined references: the Royal Observatory at Greenwich, the center of mass,
and the axis of rotation. As such, they constitute the most comprehensive system of
georeferencing, and support a range of forms of analysis, including the calculation of
distance between points, on the curved surface of the Earth. But many technologies for
working with geographic data are inherently flat, including paper and printing, which
evolved over many centuries long before the advent of digital geographic data and GIS. For
various reasons, therefore, much work in GIS deals with a flattened or projected Earth,
despite the price we pay in the distortions that are an inevitable consequence of
flattening. Specifically, the Earth is often flattened because:
■ paper is flat, and paper is still used as a medium for inputting data to GIS by scanning
or digitizing (see Chapter 9), and for outputting data in map or image form;
■ rasters are inherently flat, since it is impossible to cover a curved surface with equal
squares without gaps or overlaps;
■ photographic film is flat, and film cameras are still used widely to take images of the
Earth from aircraft to use in GIS;
■ when the Earth is seen from space, the part in the center of the image has the most
detail, and detail drops off rapidly, the back of the Earth being invisible; in order to see
the whole Earth with approximately equal detail it must be distorted in some way, and it is
most convenient to make it flat.

The Cartesian coordinate system (Figure 5.11) assigns two coordinates to every point on a
flat surface, by measuring distances from an origin parallel to two axes drawn at right
angles. We often talk of the two axes as x and y, and of the associated coordinates as the x
and y coordinate, respectively. Because it is common to align the y axis with North in
geographic applications, the coordinates of a projection on a flat sheet are often termed
easting and northing.

Figure 5.10 The shortest distance between two points on the sphere is an arc of a great
circle, defined by slicing the sphere through the two points and the center (all lines of
longitude, and the Equator, are great circles). The circle formed by a slice that does not
pass through the center is a small circle (all lines of latitude except the Equator are
small circles)
Figure 5.11 A Cartesian coordinate system, defining the location of the blue cross in terms
of two measured distances from the Origin, parallel to the two axes

Although projections are not absolutely required, there are several good reasons for
using them in GIS to flatten the Earth.

One way to think of a map projection, therefore, is that it transforms a position on the
Earth’s surface identified by latitude and longitude (φ, λ) into a position in Cartesian
coordinates (x, y). Every recognized map projection, of which there are many, can be
represented as a pair of mathematical functions:

x = f(φ, λ)
y = g(φ, λ)

For example, the famous Mercator projection uses the functions:

x = λ
y = ln tan[φ/2 + π/4]

where ln is the natural log function. The inverse transformations that map Cartesian
coordinates back to latitude and longitude are also expressible as mathematical
functions: in the Mercator case they are:

λ = x
φ = 2 arctan(e^y) − π/2

where e denotes the constant 2.71828. Many of these functions have been implemented in GIS,
allowing users to work with virtually any recognized projection and datum, and to convert
easily between them.

Two datasets can differ in both the projection and the datum, so it is important to know
both for every dataset.

Projections necessarily distort the Earth, so it is impossible in principle for the scale
(distance on the map compared to distance on the Earth, for a discussion of scale see
Box 4.2) of any flat map to be perfectly uniform, or for the pixel size of any raster to be
perfectly constant. But projections can preserve certain properties, and two such
properties are particularly important, although any projection can achieve at most one of
them, not both:

■ the conformal property, which ensures that the shapes of small features on the Earth’s
surface are preserved on the projection: in other words, that the scales of the projection
in the x and y directions are always equal;
■ the equal area property, which ensures that areas measured on the map are always in the
same proportion to areas measured on the Earth’s surface.

The conformal property is useful for navigation, because it preserves angles locally; on
the Mercator projection in particular, a straight line drawn on the map has a constant
bearing (the technical term for such a line is a loxodrome). The equal area property is
useful for various kinds of analysis involving areas, such as the computation of the area
of someone’s property.

Besides their distortion properties, another common way to classify map projections is by
analogy to a physical model of how positions on the map’s flat surface are related to
positions on the curved Earth. There are three major classes (Figure 5.12):

■ cylindrical projections, which are analogous to wrapping a cylinder of paper around the
Earth, projecting the Earth’s features onto it, and then unwrapping the cylinder;
■ azimuthal or planar projections, which are analogous to touching the Earth with a sheet
of flat paper; and
■ conic projections, which are analogous to wrapping a sheet of paper around the Earth in
a cone.

In each case, the projection’s aspect defines the specific relationship, e.g., whether the
paper is wrapped around the Equator, or touches at a pole. Where the paper coincides with
the surface the scale of the projection is 1, and where the paper is some distance outside
the surface the projected feature will be larger than it is on the Earth. Secant
projections attempt to minimize distortion by allowing the paper to cut through the
surface, so that scale can be both greater and less than 1 (Figure 5.12; projections for
which the paper touches the Earth and in which scale is always 1 or greater are called
tangent).

All three types can have either conformal or equal area properties, but of course not
both. Figure 5.13 shows examples of several common projections, and shows how the lines of
latitude and longitude map onto the projection, in a (distorted) grid known as a graticule.

The next sections describe several particularly important projections in detail, and the
coordinate systems that they produce. Each is important to GIS, and users are likely to
come across them frequently. The map projection (and datum) used to make a dataset is
sometimes not known to the user of the dataset, so it is helpful to know enough about map
projections and coordinate systems to make intelligent guesses when trying to combine such
a dataset with other data. Several excellent books on map projections are listed in the
References.
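
The Mercator functions given earlier in this section, and their inverses, can be sketched
in Python. This is only an illustration under stated assumptions: a spherical Earth of
unit radius, angles converted to radians, and no datum or scale factor. The function names
mercator_forward and mercator_inverse are hypothetical and do not come from any GIS package.

```python
import math

def mercator_forward(lat_deg, lon_deg):
    """Project latitude/longitude in degrees to Mercator (x, y) on a unit sphere."""
    phi = math.radians(lat_deg)   # latitude
    lam = math.radians(lon_deg)   # longitude
    x = lam
    y = math.log(math.tan(phi / 2 + math.pi / 4))
    return x, y

def mercator_inverse(x, y):
    """Recover latitude/longitude in degrees from Mercator (x, y) on a unit sphere."""
    lam = x
    phi = 2 * math.atan(math.exp(y)) - math.pi / 2
    return math.degrees(phi), math.degrees(lam)
```

Applying mercator_inverse to the output of mercator_forward should return the original
latitude and longitude to within rounding error. Because y grows without bound as latitude
approaches 90 degrees, the sketch is only meaningful away from the Poles.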
CHAPTER 5 GEOREFERENCING 119

Figure 5.12 The basis for three types of map projections – cylindrical, planar, and conic.
In each case a sheet of paper is wrapped around the Earth, and positions of objects on the
Earth’s surface are projected onto the paper. The cylindrical projection is shown in the
tangent case, with the paper touching the surface, but the planar and conic projections are
shown in the secant case, where the paper cuts into the surface (Reproduced by permission
of Peter H. Dana)

Figure 5.13 Examples of some common map projections. The Mercator projection is a tangent
cylindrical type, shown here in its familiar Equatorial aspect (cylinder wrapped around the
Equator). The Lambert Conformal Conic projection is a secant conic type. In this instance
the cone onto which the surface was projected intersected the Earth along two lines of
latitude: 20° North and 60° North (Reproduced by permission of Peter H. Dana)

5.7.1 The Plate Carrée or Cylindrical Equidistant projection

The simplest of all projections simply maps longitude as x and latitude as y, and for that
reason is also known informally as the unprojected projection. The result is a heavily
distorted image of the Earth, with the poles smeared along the entire top and bottom edges
of the map, and a very strangely shaped Antarctica. Nevertheless, it is the view that we
most often see when images are created of the entire Earth from satellite data (for example
in illustrations of sea surface temperature that show the El Niño or La Niña effects). The
projection is not conformal (small shapes are distorted) and not equal area, though it does
maintain the correct distance between every point and the Equator. It is normally used only
for the whole Earth, and maps of large parts of the Earth, such as the USA or Canada, look
distinctly odd in this projection. Figure 5.14 shows the projection applied to the world,
and also shows a comparison of three familiar projections of the United States: the Plate
Carrée, Mercator, and Lambert Conformal Conic.

When longitude is assigned to x and latitude to y a very odd-looking Earth results.

Serious problems can occur when doing analysis using this projection. Moreover, since most
methods of analysis in GIS are designed to work with Cartesian coordinates rather than
latitude and longitude, the same problems can arise in analysis when a dataset uses
latitude and longitude, or so-called geographic coordinates. For example, a command to
generate a circle of radius one unit in this projection will create a figure that is two
degrees of latitude across in the north–south direction, and two degrees of longitude
across in the east–west direction. On the Earth’s surface this figure is not a circle at
all, and at high latitudes it is a very squashed ellipse. What happens if you ask your
favorite GIS to generate a circle and add it to a dataset that is in geographic
coordinates? Does it recognize that you are using geographic coordinates and automatically
compensate for the differences in distances east–west and north–south away from the
Equator, or does it in effect operate on a Plate Carrée projection and create a figure that
is an ellipse on the Earth’s surface? If you ask it to compute distance between two points
defined by
latitude and longitude, does it use the true shortest (great
circle) distance based on the equation in Section 5.6, or
the formula for distance in a Cartesian coordinate system
on a distorted plane?
It is wise to be careful when using a GIS to analyze
data in latitude and longitude rather than in
projected coordinates, because serious distortions
of distance, area, and other properties may result.
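
The difference between the two kinds of distance calculation described above can be
sketched in Python. The sketch assumes a spherical Earth with a mean radius of 6371 km (an
assumed value) and uses the spherical law of cosines for the great circle distance; the
function names are hypothetical. The second function treats degrees as if they were planar
units, which is what a system does when it operates directly on unprojected coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth

def great_circle_km(lat1, lon1, lat2, lon2):
    """Great circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return EARTH_RADIUS_KM * math.acos(cos_angle)

def naive_planar_degrees(lat1, lon1, lat2, lon2):
    """Cartesian distance computed directly on latitude and longitude, in degrees."""
    return math.hypot(lon2 - lon1, lat2 - lat1)

# One degree of longitude at the Equator and at 60 degrees North
at_equator = great_circle_km(0, 0, 0, 1)
at_60_north = great_circle_km(60, 0, 60, 1)
```

The naive calculation reports one degree in both cases, whereas the great circle distance
is about 111 km at the Equator and only about half that at 60 degrees North.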

5.7.2 The Universal Transverse Mercator projection

Figure 5.14 (A) The so-called unprojected or Plate Carrée projection, a tangent cylindrical
projection formed by using longitude as x and latitude as y. (B) A comparison of three
familiar projections of the USA. The Lambert Conformal Conic is the one most often
encountered when the USA is projected alone, and is the only one of the three to curve the
parallels of latitude, including the northern border on the 49th Parallel (Reproduced by
permission of Peter H. Dana)

The UTM system is often found in military applications, and in datasets with global or
national coverage. It is based on the Mercator projection, but in transverse rather than
Equatorial aspect, meaning that the projection is analogous to wrapping a cylinder around
the Poles, rather than around the Equator. There are 60 zones in the system, and each zone
corresponds to a half cylinder wrapped along a particular line of longitude, each zone
being 6 degrees wide. Thus Zone 1 applies to longitudes from 180° W to 174° W, with the
half cylinder wrapped along 177° W; Zone 10 applies to longitudes from 126° W to 120° W
centered on 123° W, etc. (Figure 5.15).

The UTM system is secant, with lines of scale 1 located some distance out on both sides of
the central meridian. The projection is conformal, so small features appear with the
correct shape and scale is the same in all directions. Scale is 0.9996 at the central
meridian and at most 1.0004 at the edges of the zone. Both parallels

Figure 5.15 The system of zones of the Universal Transverse Mercator system. The zones are identified at the top. Each zone is six
degrees of longitude in width (Reproduced by permission of Peter H. Dana)
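
The zone numbering of the UTM system (60 zones, each 6 degrees of longitude wide, numbered
eastward from 180° W) can be sketched in Python. This is a minimal illustration with
hypothetical function names; it assumes longitudes in degrees with west negative, and it
ignores the polar areas and the irregular zones that some real implementations define
around Norway and Svalbard.

```python
def utm_zone(lon_deg):
    """Return the UTM zone number (1 to 60) for a longitude in degrees, west negative."""
    lon = ((lon_deg + 180.0) % 360.0) - 180.0   # normalize to the range [-180, 180)
    return int((lon + 180.0) // 6) + 1

def central_meridian(zone):
    """Return the central meridian of a UTM zone in degrees, west negative."""
    return -183.0 + 6.0 * zone
```

With these definitions a longitude of 123° W falls in Zone 10, and the central meridians
of Zones 1 and 10 are 177° W and 123° W respectively.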
and meridians are curved on the projection, with the exception of the zone’s central
meridian and the Equator. Figure 5.16 shows the major features of one zone.

Figure 5.16 Major features of UTM Zone 14 (from 102° W to 96° W). The central meridian is
at 99° W. Scale factors vary from 0.9996 at the central meridian to 1.0004 at the zone
boundaries. See text for details of the coordinate system (Reproduced by permission of
Peter H. Dana)

The coordinates of a UTM zone are defined in meters, and set up such that the central
meridian’s easting is always 500 000 m (a false easting), so eastings are always positive
and never reach 1 000 000 m. In the Northern Hemisphere the Equator is the origin of
northing, so a point at northing 5 000 000 m is approximately 5000 km from the Equator. In
the Southern Hemisphere the Equator is given a false northing of 10 000 000 m and all other
northings are less than this.

UTM coordinates are in meters, making it easy to make accurate calculations of short
distances between points.

Because there are effectively 60 different projections in the UTM system, maps will not fit
together across a zone boundary. Zones become so much of a problem at high latitudes that
the UTM system is normally replaced with azimuthal projections centered on each Pole (known
as the UPS or Universal Polar Stereographic system) above 80 degrees latitude. The problem
is especially critical for cities that cross zone boundaries, such as Calgary, Alberta,
Canada (crosses the boundary at 114° W between Zone 11 and Zone 12). In such situations one
zone can be extended to cover the entire city, but this results in distortions that are
larger than normal. Another option is to define a special zone, with its own central
meridian selected to pass directly through the city’s center. Italy is split between Zones
32 and 33, and many Italian maps carry both sets of eastings and northings.

UTM coordinates are easy to recognize, because they commonly consist of a six-digit integer
followed by a seven-digit integer (and decimal places if precision is greater than a
meter), and sometimes include zone numbers and hemisphere codes. They are an excellent
basis for analysis, because distances can be calculated from them for points within the
same zone with no more than 0.04% error. But they are complicated enough that their use is
effectively limited to professionals (the so-called ‘spatially aware professionals’ or SAPs
defined in Section 1.4.3.2) except in applications where they can be hidden from the user.
UTM grids are marked on many topographic maps, and many countries project their topographic
maps using UTM, so it is easy to obtain UTM coordinates from maps for input to digital
datasets, either by hand or automatically using scanning or digitizing (Chapter 9).

5.7.3 State Plane Coordinates and other local systems

Although the distortions of the UTM system are small, they are nevertheless too great for
some purposes, particularly in accurate surveying. Zone boundaries also are a problem in
many applications, because they follow arbitrary lines of longitude rather than boundaries
between jurisdictions. In the 1930s each US state agreed to adopt its own projection and
coordinate system, generally known as State Plane Coordinates (SPC), in order to support
these high-accuracy applications. Projections were chosen to minimize distortion over the
area of the state, so choices were often based on the state’s shape. Some large states
decided that distortions were still too great, and designed their SPCs with internal zones
(for example, Texas has five zones based on the Lambert Conformal Conic projection,
Figure 5.17, while Hawaii has five zones based on the Transverse Mercator projection). Many
GIS have details of SPCs already stored, so it is easy to transform between them and UTM,
or latitude and longitude. The system was revised in 1983 to accommodate the shift to the
new North American Datum (NAD83).

All US states have adopted their own specialized coordinate systems for applications such
as surveying that require very high accuracy.

Many other countries have adopted coordinate systems of their own. For example, the UK uses
a single projection and coordinate system known as the National Grid that is based on the
Transverse Mercator projection (see Box 5.1)

Figure 5.17 The five State Plane Coordinate zones of Texas. Note that the zone boundaries are defined by counties, rather than
parallels, for administrative simplicity (Reproduced by permission of Peter H. Dana)

and is marked on all topographic maps. Canada uses a uniform coordinate system based on the
Lambert Conformal Conic projection, which has properties that are useful at mid to high
latitudes, for applications where the multiple zones of the UTM system would be
problematic.

5.8 Measuring latitude, longitude, and elevation: GPS

The Global Positioning System and its analogs (GLONASS in Russia, and the proposed Galileo
system in Europe) have revolutionized the measurement of position, for the first time
making it possible for people to know almost exactly where they are anywhere on the surface
of the Earth. Previously, positions had to be established by a complex system of relative
and absolute measurements. If one was near a point whose position was accurately known (a
survey monument, for example), then position could be established through a series of
accurate measurements of distances and directions starting from the monument. But if no
monuments existed, then position had to be established through absolute measurements.
Latitude is comparatively easy to measure, based on the elevation of the sun at its highest
point (local noon), or on the locations of the sun, moon, or fixed stars at precisely known
times. But longitude requires an accurate method of measuring time, and the lack of
accurate clocks led to massively incorrect beliefs about positions during early navigation.
For example, Columbus and his contemporary explorers had no means of measuring longitude,
and believed that the Earth was much smaller than it is, and that Asia was roughly as far
west of Europe as the width of the Atlantic. The strength of this conviction is still
reflected in the term we use for the islands of the Caribbean (the West Indies) and the
first rapids on the St Lawrence in Canada (Lachine, or China). The fascinating story of the
measurement of longitude is recounted by Dava Sobel.

The GPS consists of a system of 24 satellites (plus some spares), each orbiting the Earth
every 12 hours on distinct orbits at a height of 20 200 km and transmitting radio pulses at
very precisely timed intervals. To determine position, a receiver must make precise
calculations from the signals, the known positions of the satellites, and the velocity of
light. Positioning in three dimensions (latitude, longitude, and elevation) requires that
at least four satellites are above the horizon, and accuracy depends on the number of such
satellites and their positions (if elevation is not needed then only three

Figure 5.18 A simple GPS can provide an essential aid to wayfinding when (A) hiking or (B) driving (Reproduced by permission
of David Parker, SPL, Photo Researchers)

satellites need be above the horizon). Several different versions of GPS exist, with
distinct accuracies.

A simple GPS, such as one might buy in an electronics store for $100, or install as an
optional addition to a laptop, cellphone, PDA (personal digital assistant, such as a Palm
Pilot or iPAQ), or vehicle (Figure 5.18), has an accuracy within 10 m. This accuracy will
degrade in cities with tall buildings, or under trees, and GPS signals will be lost
entirely under bridges or indoors. Differential GPS (DGPS) combines GPS signals from
satellites with correction signals received via radio or telephone from base stations.
Networks of such stations now exist, at precisely known locations, constantly broadcasting
corrections; corrections are computed by comparing each known location to its apparent
location determined from GPS. With DGPS correction, accuracies improve to 1 m or better.
Even greater accuracies are possible using various sophisticated techniques, or by
remaining fixed and averaging measured locations over several hours.

GPS is very useful for recording ground control points when building GIS databases, for
locating objects that move (for example, combine harvesters, tanks, cars, and shipping
containers), and for direct capture of the locations of many types of fixed objects, such
as utility assets, buildings, geological deposits, and sample points. Other applications of
GPS are discussed in Chapter 11 on Distributed GIS, and by Mark Monmonier (see Box 5.2).

Some care is needed in using GPS to measure elevation. First, accuracies are typically
lower, and a position determined to 10 m in the horizontal may be no better than plus or
minus 50 m in the vertical. Second, a variety of reference elevations or vertical datums
are in common use in different parts of the world and by different agencies – for example,
in the USA the topographic and hydrographic definitions of the vertical datum are
significantly different.

5.9 Converting georeferences

GIS are particularly powerful tools for converting between projections and coordinate
systems, because these transformations can be expressed as numerical operations. In fact
this ability was one of the most attractive features of early systems for handling digital
geographic data, and drove many early applications. But other conversions, e.g., between
placenames and geographic coordinates, are much more problematic. Yet they are essential
operations. Almost everyone knows their mailing address, and can identify travel
destinations by name, but few are able to specify these locations in coordinates, or to
interact with geographic information systems on that basis. GPS technology is attractive
precisely because it allows its user to determine his or her latitude and longitude, or UTM
coordinates, directly at the touch of a button.

Methods of converting between georeferences are important for:

■ converting lists of customer addresses to coordinates for mapping or analysis (the task
known as geocoding; see Box 5.3);
■ combining datasets that use different systems of georeferencing;

Biographical Box 5.2

Mark Monmonier, Cartographer


Mark Monmonier (Figure 5.19) is Distinguished Professor of Geography in
the Maxwell School of Citizenship and Public Affairs at Syracuse University.
He has published numerous papers on map design, automated map analysis,
cartographic generalization, the history of cartography, statistical graphics,
geographic demography, and mass communications. But he is best known
as author of a series of widely read books on major issues in cartography,
including How to Lie with Maps (University of Chicago Press, 1991; 2nd
edition, revised and expanded, 1996) and Rhumb Lines and Map Wars: A
Social History of the Mercator Projection (University of Chicago Press, 2004).

Figure 5.19 Mark Monmonier, cartographer

Commenting on the power of GPS, he writes:

One of the more revolutionary aspects of geospatial technology is the ability to analyze
maps without actually looking at one. Even more radical is the ability to track humans or
animals around the clock by integrating a GPS fix with spatial data describing political
boundaries or the street network. Social scientists recognize this kind of constant
surveillance as a panoptic gaze, named for the Panopticon, a hypothetical prison devised by
Jeremy Bentham, an eighteenth-century social reformer intrigued with knowledge and power.
Bentham argued that inmates aware they could be watched secretly at any time by an unseen
warden were easily controlled. Location tracking can achieve similar results without walls
or shutters.
GPS-based tracking can be beneficial or harmful depending on your point of view. An accident victim wants
the Emergency-911 dispatcher to know where to send help. A car rental firm wants to know which client
violated the rental agreement by driving out of state. Parents want to know where their children are. School
principals want to know when a paroled pedophile is circling the playground. And the Orwellian thought
police want to know where dissidents are gathering. Few geospatial technologies are as ambiguous and
potentially threatening as location tracking.
Merge GPS, GIS, and wireless telephony, and you have the location-based services (LBS) industry (Chapter
11), useful for dispatching tow trucks, helping us find restaurants or gas stations, and letting a retailer, police
detective, or stalker know where we’ve been. Our locational history is not only marketable but potentially
invasive. How lawmakers respond to growing concern about location privacy (Figure 5.20) will determine
whether we control our locational history or society lets our locational history control us.

Figure 5.20 An early edition of Mark Monmonier’s government identification card. Today the ability to link such records to other
information about an individual’s whereabouts raises significant concerns about privacy

Technical Box 5.3

Geocoding: conversion of street addresses to coordinates


Geocoding is the name commonly given to the process of converting street addresses to
latitude and longitude, or some similarly universal coordinate system. It is very widely
used as it allows any database containing addresses, such as a company mailing list or a
set of medical records, to be input to a GIS and mapped. Geocoding requires a database
containing records representing the geometry of street segments between consecutive
intersections, and the address ranges on each side of each segment (a street centerline
database, see Chapter 9). Addresses are geocoded by finding the appropriate street segment
record, and estimating a location based on linear interpolation within the address range.
For example, 950 West Broadway in Columbia, Missouri, USA, lies on the side of the segment
whose address range runs from 900 to 998, or 50/98 = 51.02% of the distance from the start
of the segment to the end. The segment starts at 92.3503 West longitude, 38.9519 North
latitude, and ends at 92.3527 West, 38.9522 North. Simple arithmetic gives the address
location as 92.3515 West, 38.9521 North. Four decimal places suggests an accuracy of about
10 m, but the estimate depends also on the accuracy of the assumption that addresses are
uniformly spaced, and on the accuracy of the street centerline database.
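
The linear interpolation used in the West Broadway example can be sketched in Python. The
function name interpolate_address and its argument layout are hypothetical; the sketch
assumes that addresses are uniformly spaced along a straight segment, and it keeps the
box's convention of quoting west longitude and north latitude as positive numbers.

```python
def interpolate_address(house_number, range_start, range_end, start_pt, end_pt):
    """Estimate a location by linear interpolation within an address range.

    start_pt and end_pt are (longitude, latitude) pairs for the ends of the segment.
    Returns the fraction of the way along the segment and the estimated point.
    """
    fraction = (house_number - range_start) / (range_end - range_start)
    lon = start_pt[0] + fraction * (end_pt[0] - start_pt[0])
    lat = start_pt[1] + fraction * (end_pt[1] - start_pt[1])
    return fraction, (lon, lat)

# 950 West Broadway, on a segment whose address range runs from 900 to 998
fraction, (lon, lat) = interpolate_address(
    950, 900, 998, (92.3503, 38.9519), (92.3527, 38.9522))
```

Rounded to four decimal places, the result agrees with the location quoted in the box. A
real geocoder would also need to parse the address, choose the correct side of the street,
and follow the true geometry of the centerline.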

■ converting to projections that have desirable properties for analysis, e.g., no
distortion of area;
■ searching the Internet or other distributed data resources for data about specific
locations;
■ positioning GIS map displays by recentering them on places of interest that are known by
name (these last two are sometimes called locator services).

The oldest method of converting georeferences is the gazetteer, the name commonly given to
the index in an atlas that relates placenames to latitude and longitude, and to relevant
pages in the atlas where information about that place can be found. In this form the
gazetteer is a useful locator service, but it works only in one direction as a conversion
between georeferences (from placename to latitude and longitude). Gazetteers have evolved
substantially in the digital era, and it is now possible to obtain large databases of
placenames and associated coordinates and to access services that allow such databases to
be queried over the Internet (e.g., the Alexandria Digital Library gazetteer,
www.alexandria.ucsb.edu; the US Geographic Names Information System, geonames.usgs.gov).

5.10 Summary

This chapter has looked in detail at the complex ways in which humans refer to specific
locations on the planet, and how they measure locations. Any form of geographic information
must involve some kind of georeference, and so it is important to understand the common
methods, and their advantages and disadvantages. Many of the benefits of GIS rely on
accurate georeferencing – the ability to link different items of information together
through common geographic location; the ability to measure distances and areas on the
Earth’s surface, and to perform more complex forms of analysis; and the ability to
communicate geographic information in forms that can be understood by others.

Georeferencing began in early societies, to deal with the need to describe locations. As
humanity has progressed, we have found it more and more necessary to describe locations
accurately, and over wider and wider domains, so that today our methods of georeferencing
are able to locate phenomena unambiguously and to high accuracy anywhere on the Earth’s
surface. Today, with modern methods of measurement, it is possible to direct another person
to a point on the other side of the Earth to an accuracy of a few centimeters, and this
level of accuracy and referencing is achieved regularly in such areas as geophysics and
civil engineering.

But georeferences can never be perfectly accurate, and it is always important to know
something about spatial resolution. Questions of measurement accuracy are discussed at
length in Chapter 6, together with techniques for representation of phenomena that are
inherently fuzzy, such that it is impossible to say with certainty whether a given point is
inside or outside the georeference.

Questions for further study

1. Visit your local map library, and determine: (1) the projections and datums used by
selected maps; (2) the coordinates of your house in several common georeferencing systems.
2. Summarize the arguments for and against a single global figure of the Earth, such as
WGS84.
3. How would you go about identifying the projection used by a common map source, such as
the weather maps shown by a TV station or in a newspaper?
4. Chapter 14 discusses various forms of measurement in GIS. Review each of those methods,
and the issues involved in performing analysis on databases that use different map
projections. Identify the map projections that would be best for measurement of (1) area,
(2) length, (3) shape.

Further reading

Bugayevskiy L.M. and Snyder J.P. 1995 Map Projections: A Reference Manual. London: Taylor
and Francis.
Kennedy M. 1996 The Global Positioning System and GIS: An Introduction. Chelsea, Michigan:
Ann Arbor Press.
Maling D.H. 1992 Coordinate Systems and Map Projections (2nd edn). Oxford: Pergamon.
Sobel D. 1995 Longitude: The True Story of a Lone Genius Who Solved the Greatest Scientific
Problem of His Time. New York: Walker.
Snyder J.P. 1997 Flattening the Earth: Two Thousand Years of Map Projections. Chicago:
University of Chicago Press.
Steede-Terry K. 2000 Integrating GIS and the Global Positioning System. Redlands, CA: ESRI
Press.
6 Uncertainty

Uncertainty in geographic representation arises because, of necessity, almost
all representations of the world are incomplete. As a result, data in a GIS
can be subject to measurement error, out of date, excessively generalized, or
just plain wrong. This chapter identifies many of the sources of geographic
uncertainty and the ways in which they operate in GIS-based representations.
Uncertainty arises from the way that GIS users conceive of the world, how
they measure and represent it, and how they analyze their representations
of it. This chapter investigates a number of conceptual issues in the creation
and management of uncertainty, before reviewing the ways in which it
may be measured using statistical and other methods. The propagation of
uncertainty through geographical analysis is then considered. Uncertainty is
an inevitable characteristic of GIS usage, and one that users must learn to
live with. In these circumstances, it becomes clear that all decisions based on
GIS are also subject to uncertainty.

Geographic Information Systems and Science, 2nd edition Paul Longley, Michael Goodchild, David Maguire, and David Rhind.
© 2005 John Wiley & Sons, Ltd. ISBNs: 0-470-87000-1 (HB); 0-470-87001-X (PB)

Learning Objectives

By the end of this chapter you will:

■ Understand the concept of uncertainty, and the ways in which it arises from imperfect
representation of geographic phenomena;
■ Be aware of the uncertainties introduced in the three stages (conception, measurement
and representation, and analysis) of database creation and use;
■ Understand the concepts of vagueness and ambiguity, and the uncertainties arising from
the definition of key GIS attributes;
■ Understand how and why scale of geographic measurement and analysis can both create and
propagate uncertainty.

6.1 Introduction

GIS-based representations of the real world are used to reconcile science with practice,
concepts with applications, and analytical methods with social context. Yet, almost always,
such reconciliation is imperfect, because, necessarily, representations of the world are
incomplete (Section 3.4). In this chapter we will use uncertainty as an umbrella term to
describe the problems that arise out of these imperfections. Occasionally, representations
may approach perfect accuracy and precision (terms that we will define in Section
6.3.2.2) – as might be the case, for example, in the detailed site layout layer of a
utility management system, in which strenuous efforts are made to reconcile fine-scale
multiple measurements of built environments. Yet perfect, or nearly perfect,
representations of reality are the exception rather than the rule. More usually, the
inherent complexity and detail of our world makes it virtually impossible to capture every
single facet, at every possible scale, in a digital representation. (Neither is this
usually desirable: see the discussion of sampling in Section 4.4.) Furthermore, different
individuals see the world in different ways, and in practice no single view is likely to be
accepted universally as the best or to enjoy uncontested status. In this chapter we discuss
how the processes and procedures of abstraction create differences between the contents of
our (geographic and attribute) database and real-world phenomena. Such differences are
almost inevitable and understanding of them can help us to manage uncertainty, and to live
with it.

It is impossible to make a perfect representation of the world, so uncertainty about it is
inevitable.

Various terms are used to describe differences between the real world and how it appears in
a GIS, depending upon the context. The established scientific notion of measurement error
focuses on differences between observers or between measuring instruments. As we saw in a
previous chapter (Section 4.7), the concept of error in multivariate statistics arises in
part from omission of some relevant aspects of a phenomenon – as in the failure to fully
specify all of the predictor variables in a multiple regression model, for example. Similar
problems arise when one or more variables are omitted from the calculation of a composite
indicator – as, for example, in omitting road accessibility in an index of land value, or
omitting employment status from a measure of social deprivation (see Section 16.2.1 for a
discussion of indicators). More generally, the Dutch geostatistician Gerard Heuvelink (who
we will introduce in Box 6.1) has defined accuracy as the difference between reality and
our representation of reality. Although such differences might principally be addressed in
formal mathematical terms, the use of the word our acknowledges the varying views that are
generated by a complex, multi-scale, and inherently uncertain world.

Yet even this established framework is too simple for understanding quality or the defining
standards of geographic data. The terms ambiguity and vagueness identify further
considerations which need to be taken into account in assessing the quality of a GIS
representation. Quality is an important topic in GIS, and there have been many attempts to
identify its basic dimensions. The US Federal Geographic Data Committee’s various standards
list five components of quality: attribute accuracy, positional accuracy, logical
consistency, completeness, and lineage. Definitions and other details on each of these and
several more can be found on the FGDC’s Web pages (www.fgdc.gov). Error, inaccuracy,
ambiguity, and vagueness all contribute to the notion of uncertainty in the broadest sense,
and uncertainty may thus be defined as a measure of the user’s understanding of the
difference between the contents of a dataset, and the real phenomena that the data are
believed to represent. This definition implies that phenomena are real, but includes the
possibility that we are unable to describe them exactly. In GIS, the term uncertainty has
come to be used as the catch-all term to describe situations in which the digital
representation is simply incomplete, and as a measure of the general quality of the
representation.

Many geographic representations depend upon inherently vague definitions and concepts

The views outlined in the previous paragraph are themselves controversial, and a rich
ground for endless philosophical discussions. Some would argue that uncertainty can be
inherent in phenomena themselves, rather than just in their description. Others would argue
for distinctions between vagueness, uncertainty, fuzziness, imprecision, inaccuracy, and
many other terms that most people use as if they were essentially synonymous. Information
scientist
CHAPTER 6 UNCERTAINTY 129

Analysis

U3

Measurement &
Representation
U2

Conception

U1

Real World

Figure 6.1 A conceptual view of uncertainty. The three filters, U1, U2, and U3 can distort the way in which the complexity of the
real world is conceived, measured and represented, and analyzed in a cumulative way

Peter Fisher has provided a useful and wide-ranging dis- of non-spatial applications. A further characteristic that
cussion of these terms. We take the catch-all view here, sets geographic information science apart from most every
and leave the detailed arguments to further study. other science is that it is only rarely founded upon natural
In this chapter, we will discuss some of the principal units of analysis. What is the natural unit of measure-
sources of uncertainty and some of the ways in which ment for a soil profile? What is the spatial extent of
uncertainty degrades the quality of a spatial representa- a pocket of high unemployment, or a cluster of cancer
tion. The way in which we conceive of a geographic cases? How might we delimit an environmental impact
phenomenon very much prescribes the way in which study of spillage from an oil tanker (Figure 6.2)? The
we are likely to set about measuring and representing questions become still more difficult in bivariate (two
it. The measurement procedure, in turn, heavily condi- variable) and multivariate (more than two variable) stud-
tions the ways in which it may be analyzed within a ies. At what scale is it appropriate to investigate any
GIS. This chain sequence of events, in which concep- relationship between background radiation and the inci-
tion prescribes measurement and representation, which in dence of leukemia? Or to assess any relationship between
turn prescribes analysis is a succinct way of summarizing labor-force qualifications and unemployment rates?
much of the content of this chapter, and is summarized in
Figure 6.1. In this diagram, U1, U2, and U3 each denote In many cases there are no natural units
filters that selectively distort or transform the representa- of geographic analysis.
tion of the real world that is stored and analyzed in GIS: a
later chapter (Section 13.2.1) introduces a fourth filter that
mediates interpretation of analysis, and the ways in which
feedback may be accommodated through improvements in
representation.

6.2 U1: Uncertainty in the


conception of geographic
phenomena

6.2.1 Units of analysis Figure 6.2 How might the spatial impact of an oil tanker
spillage be delineated? We can measure the dispersion of the
Our discussion of Tobler’s Law (Section 3.1) and of pollutants, but their impacts extend far beyond these narrowly
spatial autocorrelation (Section 4.6) established that geo- defined boundaries (Reproduced by permission of Sam C.
graphic data handling is different from all other classes Pierson, Jr., Photo Researchers)
The discrete object view of geographic phenomena is much more reliant upon the idea of natural
units of analysis than the field view. Biological organisms are almost always natural units of
analysis, as are groupings such as households or families – though even here there are certainly
difficult cases, such as the massive networks of fungal strands that are often claimed to be the
largest living organisms on Earth, or extended families of human individuals. Things we manipulate,
such as pencils, books, or screwdrivers, are also obvious natural units. The examples listed in the
previous paragraph fall almost entirely into one of two categories – they are either instances of
fields, where variation can be thought of as inherently continuous in space, or they are instances
of poorly defined aggregations of discrete objects. In both of these cases it is up to the
investigator to make the decisions about units of analysis, making the identification of the
objects of analysis inherently subjective.

6.2.2 Vagueness and ambiguity

6.2.2.1 Vagueness

The frequent absence of objective geographic individual units means that, in practice, the labels
that we assign to zones are often vague best guesses. What absolute or relative incidence of oak
trees in a forested zone qualifies it for the label oak woodland (Figure 6.3)? Or, in a
developing-country context in which aerial photography rather than ground enumeration is used to
estimate population size, what rate of incidence of dwellings identifies a zone of dense
population? In each of these instances, it is expedient to transform point-like events (individual
trees or individual dwellings) into area objects, and pragmatic decisions must be taken in order to
create a working definition of a spatial distribution. These decisions have no absolute validity,
and raise two important questions:

■ Is the defining boundary of a zone crisp and well-defined?
■ Is our assignment of a particular label to a given zone robust and defensible?

Uncertainty can exist both in the positions of the boundaries of a zone and in its attributes.

The questions have statistical implications (can we put numbers on the confidence associated with
boundaries or labels?), cartographic implications (how can we convey the meaning of vague
boundaries and labels through appropriate symbols on maps and GIS displays?), and cognitive
implications (do people subconsciously attempt to force things into categories and boundaries to
satisfy a deep need to simplify the world?).

6.2.2.2 Ambiguity

Many objects are assigned different labels by different national or cultural groups, and such
groups perceive space differently. Geographic prepositions like across, over, and in (used in the
Yellow Pages query in Figure 1.17) do not have simple correspondences with terms in other
languages. Object names and the topological relations between them may thus be inherently
ambiguous. Perception, behavior, language, and cognition all play a part in the conception of
real-world entities and the relationships between them. GIS cannot present a value-neutral view of
the world, yet it can provide a formal framework for the reconciliation of different worldviews.
The geographic nature of this ambiguity may even be exploited to identify regions with shared
characteristics and worldviews. To this end, Box 6.1 describes how different surnames used to
describe essentially the same historic occupations provide an enduring measure in region building.

Many linguistic terms used to convey geographic information are inherently ambiguous.

Ambiguity also arises in the conception and construction of indicators (see also Section 16.2.1).
Direct indicators are deemed to bear a clear correspondence with a mapped phenomenon. Detailed
household income figures, for example, provide a direct indicator of the likely geography of
expenditure and demand for goods and services; tree diameter at breast height can be used to
estimate stand value; and field nutrient measures can be used to estimate agronomic yield. Indirect
indicators are used when the best available measure is a perceived surrogate link with the
phenomenon of interest. Thus the incidence of central heating amongst households, or rates of
multiple car ownership, might provide a surrogate for (unavailable) household income data, while
local atmospheric measurements of nitrogen dioxide might provide an indirect indicator of
environmental health. Conception of the (direct or indirect) linkage between any indicator and the
phenomenon of interest is subjective, hence ambiguous. Such measures will create (possibly
systematic) errors of measurement if the correspondence between the two is imperfect. So, for
example, differences in the conception of what hardship and deprivation entail can lead to
specification of different composite indicators, and different geodemographic systems include
different cocktails of census variables (Section 2.3.3). With regard to the natural environment,
conception of critical defining properties

Figure 6.3 Seeing the wood for the trees: what absolute or relative incidence rate makes it
meaningful to assign the label ‘oak woodland’? (Reproduced by permission of Ellan Young, Photo
Researchers)
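
The vagueness of a label such as ‘oak woodland’ can be made concrete with a small sketch. The zone
names, the tree counts, and the three candidate thresholds below are invented purely for
illustration, and nothing in the text prescribes them. The sketch only shows that identical counts
receive different labels once a different, equally defensible, cut-off is chosen.

```python
# Hypothetical tree counts per zone: (oak trees, all trees). Invented for illustration.
zones = {
    "zone_a": (90, 100),
    "zone_b": (55, 100),
    "zone_c": (35, 100),
    "zone_d": (10, 100),
}


def label_zones(zones, threshold):
    """Label a zone 'oak woodland' when its share of oaks meets the threshold."""
    labels = {}
    for name, (oaks, total) in zones.items():
        share = oaks / total
        labels[name] = "oak woodland" if share >= threshold else "other woodland"
    return labels


def unstable_zones(zones, thresholds):
    """Return the zones whose label changes across the candidate thresholds."""
    results = [label_zones(zones, t) for t in thresholds]
    return sorted(
        name for name in zones
        if len({result[name] for result in results}) > 1
    )


# Assumed cut-offs for the relative incidence of oaks; none of them is 'correct'.
candidate_thresholds = [0.3, 0.5, 0.7]

for t in candidate_thresholds:
    print(t, label_zones(zones, t))
print("label depends on the threshold for:", unstable_zones(zones, candidate_thresholds))
```

Zones with a very high or very low share of oaks keep the same label whatever cut-off is used; it is
the intermediate zones whose labels are vague best guesses.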

Applications Box 6.1

Historians need maps of our uncertain past

In the study of history, there are many ways in which ‘spatial is special’ (Section 1.1.1). For
example, it is widely recognized that although what our ancestors did (their occupations) and the
social groups (classes) to which they belonged were clearly important in terms of demographic
behavior, location and place were of equal if not greater importance. Although population changes
occur in particular socio-economic circumstances, they are also strongly influenced by the unique
characteristics, or ‘cultural identities’, of particular places. In Great Britain today, as almost
everywhere else in the world, most people still think of their nation as made up of ‘regions’, and
their stability and defining characteristics are much debated by cultural geographers and
historians.

Yet analyzing and measuring human activity by place creates particular problems for historians.
Most obviously, the past was very much less data rich than the present, and few systematic data
sources survive. Moreover, the geographical administrative units by which the events of the past
were recorded are both complex and changing. In an ideal world, perhaps, physical and cultural
boundaries would always coincide, but physical features alone rarely provide appropriate indicators
of the limits of socio-economic conditions and cultural circumstance.

Unfortunately many mapped historical data are still presented using high-level aggregations, such
as counties or regions. This achieves a measure of standardization but may depict demography in
only the most arbitrary of ways. If data are forced into geographic administrative units that were
delineated for other purposes, regional maps may present nothing more than misleading, or even
meaningless, spatial means (see Box 1.9).

In England and in many other countries, the daily activities of most individuals historically
revolved around small numbers of contiguous civil parishes, of which there were more than 16 000 in
the 19th century. These are the smallest administrative units for which data are systematically
available. They provide the best available building blocks for meaningful region building. But how
can we group parishes in order to identify non-overlapping geographic territories to which people
felt that they belonged? And what indicators of regional identity are likely to have survived for
all individuals in the population?

Historian Kevin Schürer (Box 13.2) has investigated these questions using a historical GIS to map
digital surname data from the 1881 Census of England and Wales. The motivation for the GIS arises
from the observation that many surnames contain statements of regional identity, and the suggestion
that distinct zones of similar surnames might be described as homogeneous regions. The digitized
records of the 1881 Census for England and Wales cover some 26 million people: although some 41 000
different surnames are recorded, a fifth of the population shared just under 60 surnames, and half
of the population were accounted for by some 600 surnames. Schürer suggests that these aggregate
statistics conceal much that we might learn about regional identity and diversity.

Many surnames of European origin are formed from occupational titles. Occupations often have uneven
regional distributions and sometimes similar occupations are described using different names in
different places (at the global scale, today’s ‘realtors’ in the US perform much the same functions
as their ‘estate agent’ counterparts in the UK, for example). Schürer has investigated the 1881
geographical distribution of three occupational surnames – Fuller, Tucker, and Walker. These
essentially refer to the same occupation; namely someone who, from around the 14th century onwards,
worked in the preparation of textiles by scouring or beating cloth as a means of finishing or
cleansing it. Using GIS, Schürer confirms that the geographies of these 14th century surnames
remained of enduring importance in defining the regional geography of England in 1881. Figure 6.4
illustrates that in 1881 Tuckers remained concentrated in the West Country, while Fullers occurred
principally in the east and Walkers resided in the Midlands and north. This map also shows that
there was not much mixing of the surnames in the transition zones between names, suggesting that
the maps provide a useful basis to region building.

The enduring importance of surnames as evidence of the strength and durability of regional cultures
has been confirmed in an update to the work by Daryl Lloyd at University College London: Lloyd used
the 2003 UK Electoral Register to map the distribution of the same three surnames (Figure 6.5) and
identified persistent regional concentrations.


Figure 6.4 The 1881 geography of the Fullers, Tuckers, and Walkers. Map legend classes: None;
Fuller; Tucker; Walker; Fuller & Tucker; Fuller & Walker; Tucker & Walker; Tucker, Fuller & Walker.
Source: 1881 Census of Population (Reproduced with permission of K. Schürer)

Figure 6.5 The 2003 geography of the Fullers, Tuckers, and Walkers. Map legend classes as in
Figure 6.4. Source: 1998 Electoral Registrar (Reproduced with permission of Daryl Lloyd)

of soils can lead to inherent ambiguity in their classification (see Section 6.2.4).

Ambiguity is introduced when imperfect indicators of phenomena are used instead of the phenomena
themselves.

Fundamentally, GIS has upgraded our abilities to generalize about spatial distributions. Yet our
abilities to do so may be constrained by the different taxonomies that are conceived and used by
data-collecting organizations within our overall study area. A study of wetland classification in
the US found no fewer than six agencies engaged in mapping the same phenomena over the same
geographic areas, and each with their own definitions of wetland types (see Section 1.2). If
wetland maps are to be used in regulating the use of land, as they are in many areas, then
uncertainty in mapping clearly exposes regulatory agencies to potentially damaging and costly
lawsuits. How might soils data classified according to the UK national classification be
assimilated within a pan-European soils map, which uses a classification honed to the full range
and diversity of soils found across the European continent rather than those just on an offshore
island? How might different national geodemographic classifications be combined into a form
suitable for a pan-European marketing exercise? These are all variants of the question:

■ How may mismatches between the categories of different classification schema be reconciled?

Differences in definitions are a major impediment to integration of geographic data over wide
areas.

Like the process of pinning down the different nomenclatures developed in different cultural
settings, the process of reconciling the semantics of different classification schema is an
inherently ambiguous procedure. Ambiguity arises in data concatenation when we are unsure regarding
the meta-category to which a particular class should be assigned.

6.2.3 Fuzzy approaches

One way of resolving the assignment process is to adopt a probabilistic interpretation. If we take
a statement like
‘the database indicates that this field contains wheat, but there is a 0.17 probability (or 17%
chance) that it actually contains barley’, there are at least two possible interpretations:

(a) If 100 randomly chosen people were asked to make independent assessments of the field on the
    ground, 17 would determine that it contains barley, and 83 would decide it contains wheat.
(b) Of 100 similar fields in the database, 17 actually contained barley when checked on the ground,
    and 83 contained wheat.

Of the two we probably find the second more acceptable because the first implies that people cannot
correctly determine the crop in the field. But the important point is that, in conceptual terms,
both of these interpretations are frequentist, because they are based on the notion that the
probability of a given outcome can be defined as the proportion of times the outcome occurs in some
real or imagined experiment, when the number of tests is very large. Yet while this is reasonable
for classic statistical experiments, like tossing coins or drawing balls from an urn, the
geographic situation is different – there is only one field with precisely these characteristics,
and one observer, and in order to imagine a number of tests we have to invent more than one
observer, or more than one field (the problems of imagining larger populations for some geographic
samples are discussed further in Section 15.4).

In part because of this problem, many people prefer the subjectivist conception of probability –
that it represents a judgment about relative likelihood that is not the result of any frequentist
experiment, real or imagined. Subjective probability is similar in many ways to the concept of
fuzzy sets, and the latter framework will be used here to emphasize the contrast with frequentist
probability.

Suppose we are asked to examine an aerial photograph to determine whether a field contains wheat,
and we decide that we are not sure. However, we are able to put a number on our degree of
uncertainty, by putting it on a scale from 0 to 1. The more certain we are, the higher the number.
Thus we might say we are 0.90 sure it is wheat, and this would reflect a greater degree of
certainty than 0.80. This degree of belonging to the class wheat is termed the fuzzy membership,
and it is common though not necessary to limit memberships to the range 0 to 1. In effect, we have
changed our view of membership in classes, and abandoned the notion that things must either belong
to classes or not belong to them – in this new world, the boundaries of classes are no longer clean
and crisp, and the set of things assigned to a set can be fuzzy.

In fuzzy logic, an object’s degree of belonging to a class can be partial.

One of the major attractions of fuzzy sets is that they appear to let us deal with sets that are
not precisely defined, and for which it is impossible to establish membership cleanly. Many such
sets or classes are found in GIS applications, including land use categories, soil types, land
cover classes, and vegetation types. Classes used for maps are often fuzzy, such that two people
asked to classify the same location might disagree, not because of measurement error, but because
the classes themselves are not perfectly defined and because opinions vary. As such, mapping is
often forced to stretch the rules of scientific repeatability, which require that two observers
will always agree. Box 6.2 shows a typical extract from the legend of a soil map, and it is easy to
see how two people might disagree, even though both are experts with years of experience in soil
classification.

Technical Box 6.2

Fuzziness in classification: description of a soil class

Following is the description of the Limerick series of soils from New England, USA (the type
location is in Chittenden County, Vermont), as defined by the National Cooperative Soil Survey.
Note the frequent use of vague terms such as ‘very’, ‘moderate’, ‘about’, ‘typically’, and ‘some’.
Because the definition is so loose it is possible for many distinct soils to be lumped together in
this one class – and two observers may easily disagree over whether a given soil belongs to the
class, even though both are experts. The definition illustrates the extreme problems of defining
soil classes with sufficient rigor to satisfy the criterion of scientific repeatability.

The Limerick series consists of very deep, poorly drained soils on flood plains. They formed in
loamy alluvium. Permeability is moderate. Slope ranges from 0 to 3 percent. Mean annual
precipitation is about 34 inches and mean annual temperature is about 45 degrees F.

Depth to bedrock is more than 60 inches. Reaction ranges from strongly acid to neutral in the
surface layer and moderately acid to neutral in the substratum. Textures are typically silt loam or
very fine sandy loam, but lenses of loamy very fine sand or very fine sand are present in some
pedons. The weighted average of fine and coarser sands, in the particle-size control section, is
less than 15 percent.

Figure 6.6 shows an example of mapping classes using the fuzzy methods developed by A-Xing Zhu of
the


Figure 6.6 (A) Membership map for bare soils in the Upper Lake McDonald basin, Glacier National Park. High membership values
are in the ridge areas where active colluvial and glacier activities prevent the establishment of vegetation. (B) Membership map for
forest. High membership values are in the middle to lower slope areas where the soils are both stable and better drained.
(C) Membership map for alpine meadows. High membership values are on gentle slopes at high elevation where excessive soil water
and low temperature prevent the growth of trees. (D) Spatial distribution of the three cover types from hardening the membership
maps. (Reproduced by permission of A-Xing Zhu)
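
The hardening step behind Figure 6.6(D), in which each pixel takes the class with the highest
membership value, can be sketched in a few lines. The tiny grid, the membership values, and the
function names below are hypothetical. The sketch does not reproduce Zhu's method for deriving
memberships; it covers only the final conversion from fuzzy memberships to crisp categories.

```python
# Hypothetical fuzzy memberships on a 2 x 3 grid of pixels; all values are invented.
CLASSES = ["bare soil", "forest", "alpine meadow"]

memberships = {
    "bare soil":     [[0.8, 0.3, 0.1],
                      [0.6, 0.2, 0.1]],
    "forest":        [[0.1, 0.6, 0.2],
                      [0.3, 0.7, 0.4]],
    "alpine meadow": [[0.1, 0.1, 0.7],
                      [0.1, 0.1, 0.5]],
}


def harden(memberships, classes):
    """Assign each pixel to the class with the highest membership value.

    Ties go to whichever class is listed first, which is itself an arbitrary choice.
    """
    n_rows = len(memberships[classes[0]])
    n_cols = len(memberships[classes[0]][0])
    hardened = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            row.append(max(classes, key=lambda k: memberships[k][r][c]))
        hardened.append(row)
    return hardened


def confidence_margin(memberships, classes, r, c):
    """Gap between the two largest memberships at a pixel.

    A small gap flags a pixel where hardening discards much of the recorded uncertainty.
    """
    values = sorted((memberships[k][r][c] for k in classes), reverse=True)
    return values[0] - values[1]


crisp = harden(memberships, CLASSES)
for row in crisp:
    print(row)
print("margin at pixel (1, 2):", round(confidence_margin(memberships, CLASSES, 1, 2), 2))
```

The bottom-right pixel is hardened to alpine meadow even though its membership in forest is almost
as high, so the crisp map conceals a distinction that the membership maps retain.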

University of Wisconsin-Madison, USA, which take both remote sensing images and the opinions of
experts as inputs. There are three classes, and each map shows the fuzzy membership values in one
class, ranging from 0 (darkest) to 1 (lightest). This figure also shows the result of converting to
crisp categories, or hardening – to obtain Figure 6.6D, each pixel is colored according to the
class with the highest membership value.

Fuzzy approaches are attractive because they capture the uncertainty that many of us feel about the
assignment of places on the ground to specific categories. But researchers have struggled with the
question of whether they are more accurate. In a sense, if we are uncertain about which class to
choose then it is more accurate to say so, in the form of a fuzzy membership, than to be forced
into assigning a class without qualification. But that does not address the question of whether the
fuzzy membership value is accurate. If Class A is not well defined, it is hard to see how one
person’s assignment of a fuzzy membership of 0.83 in Class A can be meaningful to another person,
since there is no reason to believe that the two people share the same notions of what Class A
means, or of what 0.83 means, as distinct from 0.91, or 0.74. So while fuzzy approaches make sense
at an intuitive level, it is more difficult to see how they could be helpful in the
process of communication of geographic knowledge from one person to another.

6.2.4 The scale of geographic individuals

There is a sense in which vagueness and ambiguity in the conception of usable (rather than natural)
units of analysis undermine the very foundations of GIS. How, in practice, may we create a
sufficiently secure base to support geographic analysis? Geographers have long grappled with the
problems of defining systems of zones and have marshaled a range of deductive and inductive
approaches to this end (see Section 4.9 for a discussion of what deduction and induction entail).
The long-established regional geography tradition is fundamentally concerned with the delineation
of zones characterized by internal homogeneity (with respect to climate, economic development, or
agricultural land use, for example), within a zonal scheme which maximizes between-zone
heterogeneity, such as the map illustrated in Figure 6.7. Regional geography is fundamentally about
delineating uniform zones, and many employ multivariate statistical techniques such as cluster
analysis to supplement, or post-rationalize, intuition.

Identification of homogeneous zones and spheres of influence lies at the heart of traditional
regional geography as well as contemporary data analysis.

Other geographers have tried to develop functional zonal schemes, in which zone boundaries
delineate the breakpoints between the spheres of influence of adjacent facilities or features – as
in the definition of travel-to-work areas (Figure 6.8) or the definition of a river catchment.
Zones may be defined such that there is maximal interaction within zones, and minimal between
zones. The scale at which uniformity or functional integrity is conceived clearly conditions the
ways it is measured – in terms of the magnitude of within-zone heterogeneity that must be
accommodated in the case of uniform zones, and the degree of leakage between the units of
functional zones.

Scale has an effect, through the concept of spatial autocorrelation outlined in Section 4.3, upon
the outcome of geographic analysis. This was demonstrated more than half a century ago in a classic
paper by Yule and Kendall, where the correlation between wheat and potato yields was shown
systematically to increase as English county units were amalgamated through a succession of coarser
scales (Table 6.1). A succession of research papers has subsequently reaffirmed the existence of
similar scale effects in multivariate analysis. However, rather discouragingly, scale effects in
multivariate cases do not follow any consistent or predictable trends. This theme of dependence of
results on the geographic units of analysis is pursued further in Section 6.4.3.

Relationships typically grow stronger when based on larger geographic units.

GIS appears to trivialize the task of creating composite thematic maps. Yet inappropriate
conception of the scale of geographic phenomena can mean that apparent spatial
[Map: Physiographic Regions of Russia, with scale bars in kilometers and miles]

Figure 6.7 The regional geography of Russia. (Source: de Blij H.J. and Muller P.O. 2000 Geography:
Realms, Regions and Concepts (9th edn) New York: Wiley, p. 113)
patterning (or the lack of it) in mapped data may be oversimplified, crude, or even illusory. It is
also clearly inappropriate to conceive of boundaries as crisp and well-defined if significant
leakage occurs across them (as happens, in practice, in the delineation of most functional
regions), or if geographic phenomena are by nature fuzzy, vague, or ambiguous.

6.3 U2: Further uncertainty in the measurement and representation of geographic phenomena

6.3.1 Measurement and representation

The conceptual models (fields and objects) that were introduced in Chapter 3 impose very different
filters upon reality, and their usual corresponding representational models (raster and vector) are
characterized by different uncertainties as a consequence. The vector model enables a range of
powerful analytical operations to be performed (see Chapters 14 through 16), yet it also requires
a priori conceptualization of the nature and extent of geographic individuals and the ways in which
they nest together into higher-order zones. The raster model defines individual elements as square
cells, with boundaries that bear no relationship at all to natural features, but nevertheless
provides a convenient and (usually) efficient structure for data handling within a GIS. However, in
the absence of effective automated pattern recognition techniques, human interpretation is usually
required to discriminate between real-world spatial entities as they appear in a rasterized image.

Although quite different representations of reality, vector and raster data structures are both
attractive in their logical consistency, the ease with which they are able to handle spatial data,
and (once the software is written) the ease with which they can be implemented in GIS. But neither
abstraction provides easy measurement fixes and there is no substitute for robust conception of
geographic units of analysis (Section 6.2). This said, however, the conceptual distinction between
fields and discrete objects is often useful in dealing with uncertainty. Figure 6.9 shows a
coastline, which is often conceptualized as a discrete line object. But suppose we recognize that
its position is uncertain. For example, the coastline shown on a 1:2 000 000 map is a gross
generalization, in which major liberties are taken, particularly in areas where the coast is highly
indented and irregular. Consequently the 1:2 000 000 version leaves substantial uncertainty about
the true location of the shoreline. We might approach this by changing from a line to an area, and
mapping the area where the actual coastline lies, as shown in the figure. But another approach
would be to reconceptualize the coastline as a field, by mapping a variable whose value represents
the probability that a point is land. This is shown in the figure as a raster representation. This
would have far more information content, and consequently much more value in many applications. But
at the same time it would be difficult to find an appropriate data source for the representation –
perhaps a fuzzy classification of an air photo, using one of an

Figure 6.8 Dominant functional regions of Great Britain. Map legend: extent of major functional
regions in Great Britain. (Source: Champion A.G., Green A.E., Owen D.W., Ellin D.J., Coombes M.G.
1987 Changing Places: Britain’s Demographic, Economic and Social Complexion, London: Arnold, p. 9)

Table 6.1 In 1950 Yule and Kendall used data for wheat and potato yields from the (then) 48
counties of England to demonstrate that correlation coefficients tend to increase with scale. They
aggregated the 48-county data into zones so that there were first 24, then 12, then 6, and finally
just 3 zones. The range of their results, from near zero (no correlation) to over 0.99 (almost
perfect positive correlation) demonstrates the range of results that can be obtained, although
subsequent research has suggested that this range of values is atypical

No. of geographic areas    Correlation
48                         0.2189
24                         0.2963
12                         0.5757
6                          0.7649
3                          0.9902
increasing number of techniques designed to produce representations of the uncertainty associated with objects discovered in images.

Figure 6.9 The contrast between discrete object (top) and field (bottom) conceptualizations of an uncertain coastline. In the discrete object view the line becomes an area delimiting where the true coastline might be. In the field view a continuous surface defines the probability that any point is land

Uncertainty can be measured differently under field and discrete object views.

Indeed, far from offering quick fixes for eliminating or reducing uncertainty, the measurement process can actually increase it. Given that the vector and raster data models impose quite different filters on reality, it is unsurprising that they can each generate additional uncertainty in rather different ways. In field-based conceptualizations, such as those that underlie remotely sensed images expressed as rasters, spatial objects are not defined a priori. Instead, the classification of each cell into one or other category builds together into a representation. In remote sensing, when resolution is insufficient to detect all of the detail in geographic phenomena, the term mixel is often used to describe raster cells that contain more than one class of land – in other words, elements in which the outcome of statistical classification suggests the occurrence of multiple land cover categories. The total area of cells classified as mixed should decrease as the resolution of the satellite sensor increases, assuming the number of categories remains constant, yet a completely mixel-free classification is very unlikely at any level of resolution. Even where the Earth’s surface is covered with perfectly homogeneous areas, such as agricultural fields growing uniform crops, the failure of real-world crop boundaries to line up with pixel edges ensures the presence of at least some mixels. Neither does higher-resolution imagery solve all problems: medium-resolution data (defined as pixel size of between 30 m × 30 m and 1000 m × 1000 m) are typically classified using between 3 and 7 bands, while high-resolution data (pixel sizes 10 × 10 m or smaller) are typically classified using between 7 and 256 bands, and this can generate much greater heterogeneity of spectral values with attendant problems for classification algorithms.

A pixel whose area is divided among more than one class is termed a mixel.

The vector data structure, by contrast, defines spatial entities and specifies explicit topological relations (see Section 3.6) between them. Yet this often entails transformations of the inherent characteristics of spatial objects (Section 14.4). In conceptual terms, for example, while the true individual members of a population might each be defined as point-like objects, they will often appear in a GIS dataset only as aggregate counts for apparently uniform zones. Such aggregation can be driven by the need to preserve confidentiality of individual records, or simply by the need to limit data volume. Unlike the field conceptualization of spatial phenomena, this implies that there are good reasons for partitioning space in a particular way. In practice, partitioning of space is often made on grounds that are principally pragmatic, yet are rarely completely random (see Section 6.4). In much of socio-economic GIS, for example, zones which are designed to preserve the anonymity of survey respondents may be largely ad hoc containers. Larger aggregations are often used for the simple reason that they permit comparisons of measures over time (see Box 6.1). They may also reflect the way that a cartographer or GIS interpolates a boundary between sampled points, as in the creation of isopleth maps (Box 4.3).

6.3.2 Statistical models of uncertainty

Scientists have developed many widely used methods for describing errors in observations and measurements, and these methods may be applicable to GIS if we are willing to think of databases as collections of measurements. For example, a digital elevation model consists of a large number of measurements of the elevation of the Earth’s surface. A map of land use is also in a sense a collection of measurements, because observations of the land surface have resulted in the assignment of classes to locations. Both of these are examples of observed or measured attributes, but we can also think of location as a property that is measured.

A geographic database is a collection of measurements of phenomena on or near the Earth’s surface.
Here we consider errors in nominal class assignment, such as of types of land use, and errors in continuous (interval or ratio) scales, such as elevation (see Section 3.4).

6.3.2.1 Nominal case

The values of nominal data serve only to distinguish an instance of one class from an instance of another, or to identify an object uniquely. If classes have an inherent ranking they are described as ordinal data, but for purposes of simplicity the ordinal case will be treated here as if it were nominal.

Consider a single observation of nominal data – for example, the observation that a single parcel of land is being used for agriculture (this might be designated by giving the parcel Class A as its value of the ‘Land Use Class’ attribute). For some reason, perhaps related to the quality of the aerial photography being used to build the database, the class may have been recorded falsely as Class G, Grassland. A certain proportion of parcels that are truly Agriculture might be similarly recorded as Grassland, and we can think of this in terms of a probability, that parcels that are truly Agriculture are falsely recorded as Grassland.

Table 6.2 shows how this might work for all of the parcels in a database. Each parcel has a true class, defined by accurate observation in the field, and a recorded class as it appears in the database. The whole table is described as a confusion matrix, and instances of confusion matrices are commonly encountered in applications dominated by class data, such as classifications derived from remote sensing or aerial photography. The true class might be determined by ground check, which is inherently more accurate than classification of aerial photographs, but much more expensive and time-consuming.

Ideally all of the observations in the confusion matrix should be on the principal diagonal, in the cells that correspond to agreement between true class and database class. But in practice certain classes are more easily confused than others, so certain cells off the diagonal will have substantial numbers of entries.

Table 6.2 Example of a misclassification or confusion matrix. A grand total of 304 parcels have been checked. The rows of the table correspond to the land use class of each parcel as recorded in the database, and the columns to the class as recorded in the field. The numbers appearing on the principal diagonal of the table (from top left to bottom right) reflect correct classification

          A     B     C     D     E   Total
A        80     4     0    15     7     106
B         2    17     0     9     2      30
C        12     5     9     4     8      38
D         7     8     0    65     0      80
E         3     2     1     6    38      50
Total   104    36    10    99    55     304

A useful way to think of the confusion matrix is as a set of rows, each defining a vector of values. The vector for row i gives the proportions of cases in which what appears to be Class i is actually Class 1, 2, 3, etc. Symbolically, this can be represented as a vector $\{p_1, p_2, \ldots, p_i, \ldots, p_n\}$, where n is the number of classes, and $p_i$ represents the proportion of cases for which what appears to be the class according to the database is actually Class i.

There are several ways of describing and summarizing the confusion matrix. If we focus on one row, then the table shows how a given class in the database falsely records what are actually different classes on the ground. For example, Row A shows that of 106 parcels recorded as Class A in the database, 80 were confirmed as Class A in the field, but 15 appeared to be truly Class D. The proportion of instances in the diagonal entries represents the proportion of correctly classified parcels, and the total of off-diagonal entries in the row is the proportion of entries in the database that appear to be of the row’s class but are actually incorrectly classified. For example, there were only 9 instances of agreement between the database and the field in the case of Class C. If we look at the table’s columns, the entries record the ways in which parcels that are truly of that class are actually recorded in the database. For example, of the 10 instances of Class C found in the field, 9 were recorded as such in the database and only 1 was misrecorded as Class E.

The columns have been called the producer’s perspective, because the task of the producer of an accurate database is to minimize entries outside the diagonal cell in a given column, and the rows have been called the consumer’s perspective, because they record what the contents of the database actually mean on the ground; in other words, the accuracy of the database’s contents.

Users and producers of data look at misclassification in distinct ways.

For the table as a whole, the proportion of entries in diagonal cells is called the percent correctly classified (PCC), and is one possible way of summarizing the table. In this case 209/304 cases are on the diagonal, for a PCC of 68.8%. But this measure is misleading for at least two reasons. First, chance alone would produce some correct classifications, even in the worst circumstances, so it would be more meaningful if the scale were adjusted such that 0 represents chance. In this case, the number of chance hits on the diagonal in a random assignment is 76.2 (the sum of the row total times the column total divided by the grand total for each of the five diagonal cells). So the actual number of diagonal hits, 209, should be compared to this number, not 0. The more useful index of success is the kappa index, defined as:

\[
\kappa = \frac{\sum_{i=1}^{n} c_{ii} - \sum_{i=1}^{n} c_{i.}\, c_{.i} / c_{..}}{c_{..} - \sum_{i=1}^{n} c_{i.}\, c_{.i} / c_{..}}
\]

where $c_{ij}$ denotes the entry in row i column j, the dots indicate summation (e.g., $c_{i.}$ is the summation over all columns for row i, that is, the row i total, and $c_{..}$ is the grand total), and n is the number of classes.
The first term in the numerator is the sum of all the
diagonal entries (entries for which the row number and
the column number are the same). To compute PCC we
would simply divide this term by the grand total (the
first term in the denominator). For kappa, both numerator
and denominator are reduced by the same amount, an
estimate of the number of hits (agreements between field
and database) that would occur by chance. This involves
taking each diagonal cell, multiplying the row total by
the column total, and dividing by the grand total. The
result is summed for each diagonal cell. In this case kappa
evaluates to 58.3%, a much less optimistic assessment
than PCC.
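The PCC and kappa calculations described above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `TABLE_6_2` and `accuracy_summary` are hypothetical, and the matrix is simply the body of Table 6.2 with rows as database classes and columns as field classes.

```python
# Confusion matrix from Table 6.2 (classes A-E).
# Rows: class recorded in the database. Columns: class observed in the field.
TABLE_6_2 = [
    [80, 4, 0, 15, 7],
    [2, 17, 0, 9, 2],
    [12, 5, 9, 4, 8],
    [7, 8, 0, 65, 0],
    [3, 2, 1, 6, 38],
]


def accuracy_summary(matrix):
    """Return (diagonal hits, chance hits, PCC, kappa) for a square matrix."""
    n = len(matrix)
    grand_total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    # Agreements between database and field: the principal diagonal.
    diagonal_hits = sum(matrix[i][i] for i in range(n))

    # Expected agreements by chance: row total times column total,
    # divided by the grand total, summed over the diagonal cells.
    chance_hits = sum(row_totals[i] * col_totals[i] for i in range(n)) / grand_total

    pcc = diagonal_hits / grand_total
    kappa = (diagonal_hits - chance_hits) / (grand_total - chance_hits)
    return diagonal_hits, chance_hits, pcc, kappa


hits, chance, pcc, kappa = accuracy_summary(TABLE_6_2)
print(f"diagonal hits = {hits}, chance hits = {chance:.1f}")
print(f"PCC = {pcc:.1%}, kappa = {kappa:.1%}")
```

Run on Table 6.2, this reproduces the figures quoted in the text: 209 diagonal hits, about 76.2 chance hits, a PCC of about 68.8% and a kappa of about 58.3%.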
The second issue with both of these measures concerns the relative abundance of different classes. In the table, Class C is much less common than Class A. The confusion matrix is a useful way of summarizing the characteristics of nominal data, but to build it there must be some source of more accurate data. Commonly this is obtained by ground observation, and in practice the confusion matrix is created by taking samples of more accurate data, by sending observers into the field to conduct spot checks. Clearly it makes no sense to visit every parcel, and instead a sample is taken. Because some classes are commoner than others, a random sample that made every parcel equally likely to be chosen would be inefficient, because too many data would be gathered on common classes, and not enough on the relatively rare ones. So, instead, samples are usually chosen such that a roughly equal number of parcels are selected in each class. Of course these decisions must be based on the class as recorded in the database, rather than the true class. This is an instance of sampling that is systematically stratified by class (see Section 4.4).

Sampling for accuracy assessment should pay greater attention to the classes that are rarer on the ground.

Parcels represent a relatively easy case, if it is reasonable to assume that the land use class of a parcel is uniform over the parcel, and class is recorded as a single attribute of each parcel object. But as we noted in Section 6.2, more difficult cases arise in sampling natural areas (for example in the case of vegetation cover class), where parcel boundaries do not exist. Figure 6.10 shows a typical vegetation cover class map, and is obviously highly generalized. If we were to apply the previous strategy, then we would test each area to see if its assigned vegetation cover class checks out on the ground. But unlike the parcel case, in this example the boundaries between areas are not fixed, but are themselves part of the observation process, and we need to ask whether they are correctly located. Error in this case has two forms: misallocation of an area’s class and mislocation of an area’s boundaries. In some cases the boundary between two areas may be fixed, because it coincides with a clearly defined line on the ground; but in other cases, the boundary’s location is as much a matter of judgment as the allocation of an area’s class. Peter Burrough and Andrew Frank have discussed many of the implications of uncertain boundaries in GIS.

Figure 6.10 An example of a vegetation cover map. Two strategies for accuracy assessment are available: to check by area (polygon), or to check by point. In the former case a strategy would be devised for field checking each area, to determine the area’s correct class. In the latter, points would be sampled across the state and the correct class determined at each point

Errors in land cover maps can occur in the locations of boundaries of areas, as well as in the classification of areas.

In such cases we need a different strategy, that captures the influence both of mislocated boundaries and of misallocated classes. One way to deal with this is to think of error not in terms of classes assigned to areas, but in terms of classes assigned to points. In a raster dataset, the cells of the raster are a reasonable substitute for individual points. Instead of asking whether area classes are confused, and estimating errors by sampling areas, we ask whether the classes assigned to raster cells are confused, and define the confusion matrix in terms of misclassified cells. This is often called per-pixel or per-point accuracy assessment, to distinguish it from the previous strategy of per-polygon accuracy assessment. As before, we would want to stratify by class, to make sure that relatively rare classes were sampled in the assessment.

6.3.2.2 Interval/ratio case

The second case addresses measurements that are made on interval or ratio scales. Here, error is best thought of not as a change of class, but as a change of value, such that the observed value x′ is equal to the true value
x plus some distortion δx, where δx is hopefully small. δx might be either positive or negative, since errors are possible in both directions. For example, the measured and recorded elevation at some point might be equal to the true elevation, distorted by some small amount. If the average distortion is zero, so that positive and negative errors balance out, the observed values are said to be unbiased, and the average value will be true.

Error in measurement can produce a change of class, or a change of value, depending on the type of measurement.

Sometimes it is helpful to distinguish between accuracy, which has to do with the magnitude of δx, and precision. Unfortunately there are several ways of defining precision in this context, at least two of which are regularly encountered in GIS. Surveyors and others concerned with measuring instruments tend to define precision through the performance of an instrument in making repeated measurements of the same phenomenon. A measuring instrument is precise according to this definition if it repeatedly gives similar measurements, whether or not these are actually accurate. So a GPS receiver might make successive measurements of the same elevation, and if these are similar the instrument is said to be precise. Precision in this case can be measured by the variability among repeated measurements. But it is possible, for example, that all of the measurements are approximately 5 m too high, in which case the measurements are said to be biased, even though they are precise, and the instrument is said to be inaccurate. Figure 6.11 illustrates this meaning of precise, and its relationship to accuracy.

Figure 6.11 The term precision is often used to refer to the repeatability of measurements. In both diagrams six measurements have been taken of the same position, represented by the center of the circle. In (A) successive measurements have similar values (they are precise), but show a bias away from the correct value (they are inaccurate). In (B), precision is lower but accuracy is higher

The other definition of precision is more common in science generally. It defines precision as the number of digits used to report a measurement, and again it is not necessarily related to accuracy. For example, a GPS receiver might measure elevation as 51.3456 m. But if the receiver is in reality only accurate to the nearest 10 cm, three of those digits are spurious, with no real meaning. So, although the precision is one ten thousandth of a meter, the accuracy is only one tenth of a meter. Box 6.3 summarizes the rules that are used to ensure that reported measurements do not mislead by appearing to have greater accuracy than they really do.

To most scientists, precision refers to the number of significant digits used to report a measurement, but it can also refer to a measurement’s repeatability.

In the interval/ratio case, the magnitude of errors is described by the root mean square error (RMSE), defined as the square root of the average squared error, or:

\[
\left( \sum \delta x^{2} / n \right)^{1/2}
\]

Technical Box 6.3

Good practice in reporting measurements

Here are some simple rules that help to ensure that people receiving measurements from others are not misled by their apparently high precision.

1. The number of digits used to report a measurement should reflect the measurement’s accuracy. For example, if a measurement is accurate to 1 m then no decimal places should be reported. The measurement 14.4 m suggests accuracy to one tenth of a meter, as does 14.0, but 14 suggests accuracy to 1 m.
2. Excess digits should be removed by rounding. Fractions above one half should be rounded up, fractions below one half should be rounded down. The following examples reflect rounding to two decimal places:
   14.57803 rounds to 14.58
   14.57397 rounds to 14.57
   14.57999 rounds to 14.58
   14.57499 rounds to 14.57
3. These rules are not effective to the left of the decimal place – for example, they give no basis for knowing whether 1400 is accurate to the nearest unit, or to the nearest hundred units.
4. If a number is known to be exactly an integer or whole number, then it is shown with no decimal point.
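Rules 1 and 2 of Box 6.3 can be illustrated with a short Python sketch. The function name `report` and its string-based interface are hypothetical choices made for this example. The box does not say how to treat a fraction of exactly one half, so rounding such ties upward (`ROUND_HALF_UP`) is an assumption here.

```python
from decimal import Decimal, ROUND_HALF_UP


def report(value: str, accuracy: str) -> str:
    """Round `value` to the decimal place implied by `accuracy`.

    Both arguments are decimal strings, e.g. report("51.3456", "0.1").
    Ties (exactly one half) are rounded up, which is an assumption:
    Box 6.3 only covers fractions above or below one half.
    """
    return str(Decimal(value).quantize(Decimal(accuracy), rounding=ROUND_HALF_UP))


# The four examples from rule 2, rounded to two decimal places.
for raw in ["14.57803", "14.57397", "14.57999", "14.57499"]:
    print(raw, "rounds to", report(raw, "0.01"))

# The GPS elevation from the text, reported to its 10 cm accuracy.
print(report("51.3456", "0.1"))
```

Passing the values as strings avoids binary floating-point surprises, and `Decimal` keeps trailing zeros, so a value accurate to one tenth of a meter is reported as 14.0 rather than 14.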
where the summation is over the values of δx for all of the n observations. The RMSE is similar in a number of ways to the standard deviation of observations in a sample. Although RMSE involves taking the square root of the average squared error, it is convenient to think of it as approximately equal to the average error in each observation, whether the error is positive or negative. The US Geological Survey uses RMSE as its primary measure of the accuracy of elevations in digital elevation models, and published values range up to 7 m.

Although the RMSE can be thought of as capturing the magnitude of the average error, many errors will be greater than the RMSE, and many will be less. It is useful, therefore, to know how errors are distributed in magnitude – how many are large, how many are small. Statisticians have developed a series of models of error distributions, of which the commonest and most important is the Gaussian distribution, otherwise known as the error function, the ‘bell curve’, or the Normal distribution. Figure 6.12 shows the curve’s shape. If observations are unbiased, then the mean error is zero (positive and negative errors cancel each other out), and the RMSE is also the distance from the center of the distribution (zero) to the points of inflection on either side, as shown in the figure. Let us take the example of a 7 m RMSE on elevations in a USGS digital elevation model; if error follows the Gaussian distribution, this means that some errors will be more than 7 m in magnitude, some will be less, and also that the relative abundance of errors of any given size is described by the curve shown. 68% of errors will be between −1.0 and +1.0 RMSEs, or −7 m and +7 m. In practice many distributions of error do follow the Gaussian distribution, and there are good theoretical reasons why this should be so.

The Gaussian distribution predicts the relative abundances of different magnitudes of error.

To emphasize the mathematical formality of the Gaussian distribution, its equation is shown below. The symbol σ denotes the standard deviation, µ denotes the mean (in Figure 6.12 these values are 1 and 0 respectively), and exp is the exponential function, or ‘2.71828 to the power of’. Scientists believe that it applies very broadly, and that many instances of measurement error adhere closely to the distribution, because it is grounded in rigorous theory. It can be shown mathematically that the distribution arises whenever a large number of random factors contribute to error, and the effects of these factors combine additively – that is, a given effect makes the same additive contribution to error whatever the specific values of the other factors. For example, error might be introduced in the use of a steel tape measure over a large number of measurements because some observers consistently pull the tape very taut, or hold it very straight, or fastidiously keep it horizontal, or keep it cool, and others do not. If the combined effects of these considerations always contribute the same amount of error (e.g., +1 cm, or −2 cm), then this contribution to error is said to be additive.

\[
f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^{2}}{2\sigma^{2}} \right)
\]

We can apply this idea to determine the inherent uncertainty in the locations of contours. The US Geological Survey routinely evaluates the accuracies of its digital elevation models (DEMs), by comparing the elevations recorded in the database with those at the same locations in more accurate sources, for a sample of points. The differences are summarized in a RMSE, and in this example we will assume that errors have a Gaussian distribution with zero mean and a 7 m RMSE. Consider a measurement of 350 m. According to the error model, the truth might be as high as 360 m, or as low as 340 m, and the relative frequencies of any particular error value are as predicted by the Gaussian distribution with a mean of zero and a standard deviation of 7. If we take error into account, using the Gaussian distribution with an RMSE of 7 m, it is no longer clear that a measurement of 350 m lies
exactly on the 350 m contour. Instead, the truth might be
340 m, or 360 m, or 355 m. Figure 6.13 shows the impli-
cations of this in terms of the location of this contour
in a real-world example. 95% of errors would put the
contour within the colored zone. In areas colored red the
observed value is less than 350 m, but the truth might be
350 m; in areas colored green the observed value is more
than 350 m, but the truth might be 350 m. There is a 5%
chance that the true location of the contour lies outside
the colored zone entirely.
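The reasoning in this example can be sketched numerically. The Python below is a minimal sketch under the assumptions stated in the text (zero-mean Gaussian error, standard deviation equal to the 7 m RMSE). The function names are hypothetical, and the sample differences passed to `rmse` are invented purely for illustration.

```python
import math


def rmse(errors):
    """Root mean square error: square root of the average squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def prob_error_within(limit_m, rmse_m):
    """P(|error| <= limit_m) for zero-mean Gaussian error with std = rmse_m."""
    return math.erf(limit_m / (rmse_m * math.sqrt(2.0)))


def prob_truth_at_least(threshold_m, observed_m, rmse_m):
    """P(true elevation >= threshold_m) given an observed elevation.

    Assumes observed = true + error, with zero-mean Gaussian error.
    """
    z = (threshold_m - observed_m) / rmse_m
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


# Illustrative (made-up) differences between DEM and reference elevations, in m.
print(rmse([3.0, -4.0]))

# Share of errors within one RMSE and within 1.96 RMSEs of zero.
print(prob_error_within(7.0, 7.0))
print(prob_error_within(1.96 * 7.0, 7.0))

# Chance that the truth is at or above 350 m where the DEM records 343 m.
print(prob_truth_at_least(350.0, 343.0, 7.0))
```

Under these assumptions about 68% of errors fall within ±7 m and about 95% within roughly ±13.7 m, which is one way to read the colored zone in Figure 6.13: cells whose recorded elevation lies within about 13.7 m of 350 m.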

Figure 6.12 The Gaussian or Normal distribution. The height of the curve at any value of x gives the relative abundance of observations with that value of x. The area under the curve between any two values of x gives the probability that observations will fall in that range. The range between −1 standard deviation and +1 standard deviation is in light purple. It encloses 68% of the area under the curve, indicating that 68% of observations will fall between these limits

6.3.3 Positional error

In the case of measurements of position, it is possible for every coordinate to be subject to error. In the two-dimensional case, a measured position (x′, y′) would be subject to errors in both x and y; specifically, we might write x′ = x + δx, y′ = y + δy, and similarly in the three-dimensional case where all three coordinates are measured, z′ = z + δz. The bivariate Gaussian distribution describes errors in the two horizontal dimensions,

Figure 6.13 Uncertainty in the location of the 350 m contour in the area of State College, Pennsylvania, generated from a US
Geological Survey DEM with an assumed RMSE of 7 m. According to the Gaussian distribution with a mean of 350 m and a
standard deviation of 7 m, there is a 95% probability that the true location of the 350 m contour lies in the colored area, and a 5%
probability that it lies outside (Source: Hunter G. J. and Goodchild M. F. 1995 ‘Dealing with error in spatial databases: a simple case
study’. Photogrammetric Engineering and Remote Sensing 61: 529–37)

and it can be generalized to the three-dimensional case. Normally, we would expect the RMSEs of x and y to be the same, but z is often subject to errors of quite different magnitude, for example in the case of determinations of position using GPS. The bivariate Gaussian distribution also allows for correlation between the errors in x and y, but normally there is little reason to expect correlations.

Because it involves two variables, the bivariate Gaussian distribution has somewhat different properties from the simple (univariate) Gaussian distribution. 68% of cases lie within one standard deviation for the univariate case (Figure 6.12). But in the bivariate case with equal standard errors in x and y, only 39% of cases lie within a circle of this radius. Similarly, 95% of cases lie within two standard deviations for the univariate distribution, but it is necessary to go to a circle of radius equal to 2.15 times the x or y standard deviations to enclose 90% of the bivariate distribution, and 2.45 times standard deviations for 95%.

National Map Accuracy Standards often prescribe the positional errors that are allowed in databases. For example, the 1947 US National Map Accuracy Standard specified that 90% of errors should fall below 1/30 inch (0.85 mm) for maps at scales of 1:20 000 and finer (more detailed), and 1/50 inch (0.51 mm) for other maps (coarser, less detailed, levels of granularity than 1:20 000). A convenient rule of thumb is that positions measured from maps are subject to errors of up to 0.5 mm at the scale of the map. Table 6.3 shows the distance on the ground corresponding to 0.5 mm for various common map scales.

A useful rule of thumb is that features on maps are positioned to an accuracy of about 0.5 mm.

6.3.4 The spatial structure of errors

The confusion matrix, or more specifically a single row of the matrix, along with the Gaussian distribution, provide convenient ways of describing the error present in a single
observation of a nominal or interval/ratio measurement respectively. When a GIS is used to respond to a simple query, such as ‘tell me the class of soil at this point’, or ‘what is the elevation here?’, then these methods are good ways of describing the uncertainty inherent in the response. For example, a GIS might respond to the first query with the information ‘Class A, with a 30% probability of Class C’, and to the second query with the information ‘350 m, with an RMSE of 7 m’. Notice how this makes it possible to describe nominal data as accurate to a percentage, but it makes no sense to describe a DEM, or any measurement on an interval/ratio scale, as accurate to a percentage. For example, we cannot meaningfully say that a DEM is ‘90% accurate’.

Table 6.3 A useful rule of thumb is that positions measured from maps are accurate to about 0.5 mm on the map. Multiplying this by the scale of the map gives the corresponding distance on the ground

Map scale        Ground distance corresponding to 0.5 mm map distance
1:1250           0.625 m
1:2500           1.25 m
1:5000           2.5 m
1:10 000         5 m
1:24 000         12 m
1:50 000         25 m
1:100 000        50 m
1:250 000        125 m
1:1 000 000      500 m
1:10 000 000     5 000 m

However, many GIS operations involve more than the properties of single points, and this makes the analysis of error much more complex. For example, consider the query ‘how far is it from this point to that point?’ Suppose the two points are both subject to error of position, because their positions have been measured using GPS units with mean distance errors of 50 m. If the two measurements were taken some time apart, with different combinations of satellites above the horizon, it is likely that the errors are independent of each other, such that one error might be 50 m in the direction of North, and the other 50 m in the direction of South. Depending on the locations of the two points, the error in distance might be as high as +100 m. On the other hand, if the two measurements were made close together in time, with the same satellites above the horizon, it is likely that the two errors would be similar, perhaps 50 m North and 40 m North, leading to an error of only 10 m in the determination of distance. The difference between these two situations can be measured in terms of the degree of spatial autocorrelation, or the interdependence of errors at different points in space (Section 4.6).

Spatial autocorrelation is also important in errors in nominal data. Consider a field that is known to contain a single crop, perhaps wheat. When seen from above, it is possible to confuse wheat with other crops, so there may be error in the crop type assigned to points in the field. But since the field has only one crop, we know that such errors are likely to be strongly correlated. Spatial autocorrelation is almost always present in errors to some degree, but very few efforts have been made to measure it systematically, and as a result it is difficult to make good estimates of the uncertainties associated with many GIS operations.

An easy way to visualize spatial autocorrelation and interdependence is through animation. Each frame in the animation is a single possible map, or realization of the error process. If a point is subject to uncertainty, each realization will show the point in a different possible location, and a sequence of images will show the point shaking around its mean position. If two points have perfectly correlated positional errors, then they will appear to shake in unison, as if they were at the ends of a stiff rod. If errors are only partially correlated, then the system behaves as if the connecting rod were somewhat elastic.

The spatial structure or autocorrelation of errors is important in many ways. DEM data are often used to estimate the slope of terrain, and this is done by comparing elevations at points a short distance apart. For example, if the elevations at two points 10 m apart are 30 m and 35 m respectively, the slope along the line between them is 5/10, or 0.5. (A somewhat more complex method is used in practice, to estimate slope at a point in the x and y directions in a DEM raster, by analyzing the elevations of nine points – the point itself and its eight neighbors. The equations in Section 14.4 detail the procedure.)

Now consider the effects of errors in these two elevation measurements on the estimate of slope. Suppose the first point (elevation 30 m) is subject to an RMSE of 2 m, and consider possible true elevations of 28 m and 32 m. Similarly the second point might have true elevations of 33 m and 37 m. We now have four possible combinations of values, and the corresponding estimates of slope range from (33 − 32)/10 = 0.1 to (37 − 28)/10 = 0.9. In other words, a relatively small amount of error in elevation can produce wildly varying slope estimates.

The spatial autocorrelation between errors in geographic databases helps to minimize their impacts on many GIS operations.

What saves us in this situation, and makes estimation of slope from DEMs a practical proposition at all, is spatial autocorrelation among the errors. In reality, although DEMs are subject to substantial errors in absolute elevation, neighboring points nevertheless tend to have similar errors, and errors tend to persist over quite large areas. Most of the sources of error in the DEM production process tend to produce this kind of persistence
The spatial autocorrelation of errors can be as of error over space, including errors due to misregistration
important as their magnitude in many of aerial photographs. In other words, errors in DEMs
GIS operations. exhibit strong positive spatial autocorrelation.
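
The two-point slope example can be explored numerically. The sketch below is an illustration rather than the authors' method: it assumes Gaussian elevation errors with a common RMSE and a correlation `rho` between the two errors, and the function names (`slope_error_sd`, `simulate_slopes`, `sample_sd`) are hypothetical. Under those assumptions the standard deviation of the slope estimate is sqrt(2 × RMSE² × (1 − rho)) / spacing, so it shrinks toward zero as the errors become perfectly correlated.

```python
import math
import random


def slope_error_sd(rmse, spacing, rho):
    """Analytic standard deviation of a two-point slope estimate.

    Assumes both elevation errors are Gaussian with the same RMSE and
    correlation rho (0 = independent, 1 = identical errors).
    """
    return math.sqrt(2.0 * rmse ** 2 * (1.0 - rho)) / spacing


def simulate_slopes(z1, z2, spacing, rmse, rho, n=20000, seed=0):
    """Simulate slope estimates with correlated elevation errors.

    Each error mixes a shared component with an independent component so
    that the two errors have correlation rho (requires 0 <= rho <= 1).
    """
    rng = random.Random(seed)
    w_shared = math.sqrt(rho)
    w_own = math.sqrt(1.0 - rho)
    slopes = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)
        e1 = rmse * (w_shared * shared + w_own * rng.gauss(0.0, 1.0))
        e2 = rmse * (w_shared * shared + w_own * rng.gauss(0.0, 1.0))
        slopes.append(((z2 + e2) - (z1 + e1)) / spacing)
    return slopes


def sample_sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


if __name__ == "__main__":
    # Elevations of 30 m and 35 m, 10 m apart, RMSE of 2 m, as in the text.
    for rho in (0.0, 0.5, 0.9, 1.0):
        simulated = sample_sd(simulate_slopes(30.0, 35.0, 10.0, 2.0, rho))
        analytic = slope_error_sd(2.0, 10.0, rho)
        print(f"rho={rho:.1f}  simulated sd={simulated:.3f}  analytic sd={analytic:.3f}")
```

With independent errors (rho = 0) the analytic standard deviation of the slope is about 0.28, more than half of the nominal slope of 0.5. With rho = 0.9 it falls to about 0.09, which is the sense in which autocorrelated errors make slope estimation from DEMs practical.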
Another important corollary of positive spatial autocorrelation
can also be illustrated using DEMs. Suppose
an area of low-lying land is inundated by flooding, and
our task is to estimate the area of land affected. We are
asked to do this using a DEM, which is known to have an
RMSE of 2 m (compare Figure 6.13). Suppose the data
points in the DEM are 30 m apart, and preliminary analysis
shows that 100 points have elevations below the flood
line. We might conclude that the area flooded is the area
represented by these 100 points, or 900 × 100 sq m, or 9
hectares. But because of errors, it is possible that some of
this area is actually above the flood line (we will ignore
the possibility that other areas outside this may also be
below the flood line, also because of errors), and it is
possible that all of the area is above. Suppose the recorded
elevation for each of the 100 points is 2 m below the
flood line. This is one RMSE (recall that the RMSE is
equal to 2 m) below the flood line, and the Gaussian
distribution tells us that the chance that the true elevation is
actually above the flood line is approximately 16% (see
Figure 6.12). But what is the chance that all 100 points
are actually above the flood line?

Here again the answer depends on the degree of spatial
autocorrelation among the errors. If there is none, in other
words if the error at each of the 100 points is independent
of the errors at its neighbors, then the answer is (0.16)^100,
or 1 chance in 1 followed by roughly 80 zeroes. But if
there is strong positive spatial autocorrelation, so strong
that all 100 points are subject to exactly the same error,
then the answer is 0.16. One way to think about this is in
terms of degrees of freedom. If the errors are independent,
they can vary in 100 independent ways, depending on
the error at each point. But if they are strongly spatially
autocorrelated, the effective number of degrees of freedom
is much less, and may be as few as 1 if all errors behave in
unison. Spatial autocorrelation has the effect of reducing
the number of degrees of freedom in geographic data
below what may be implied by the volume of information,
in this case the number of points in the DEM.

Spatial autocorrelation acts to reduce the effective
number of degrees of freedom in geographic data.

6.4 U3: Further uncertainty in the analysis of geographic phenomena

6.4.1 Internal and external validation through spatial analysis

In Chapter 1 we identified one remit of GIS as the
resolution of scientific or decision-making problems
through spatial analysis, which we defined in Section 1.7
as ‘the process by which we turn raw spatial data
into useful spatial information’. Good science needs
secure foundations, yet Sections 6.2 and 6.3 have shown
the conception and measurement of many geographic
phenomena to be inherently uncertain. How can the
outcome of spatial analysis be meaningful if it has such
uncertain foundations?

Uncertainties in data lead to uncertainties in the
results of analysis.

Once again, there are no easy answers to this question,
although we can begin by examining the consequences
of accommodating possible errors of positioning, or
of aggregating clearly defined units of analysis into
artificial geographic individuals (as when people are
aggregated by census tracts, or disease incidences are
aggregated by county). In so doing, we will illustrate how
potential problems might arise, but will not present any
definitive solutions – for the simple reason that the truth
is inherently uncertain. The conception, measurement, and
representation of geographic individuals may distort the
outcome of spatial analysis by masking or accentuating
apparent variation across space, or by restricting the
nature and range of questions that can be asked of
the GIS.

There are three ways of dealing with this risk. First,
although we can only rarely tackle the source of distortion
(we are rarely empowered to collect new, completely
disaggregate data, for example), we can quantify the
way in which it is likely to operate (or propagate)
within the GIS, and can gauge the magnitude of its
likely impacts. Second, although we may have to work
with aggregated data, GIS allows us to model within-zone
spatial distributions in order to ameliorate the worst
effects of artificial zonation. Taken together, GIS allows
us to gauge the effects of scale and aggregation through
simulation of different possible outcomes. This is internal
validation of the effects of scale, point placement, and
spatial partitioning.

Because of the power of GIS to merge diverse
data sources, it also provides a means of external
validation of the effects of zonal averaging. In today’s
advanced GIService economy (Section 1.5.3), there may
be other data sources that can be used to gauge the
effects of aggregation upon our analysis. In Chapter 13
we will refine the basic model that was presented in
Figure 6.1 to consider how GIS provides a medium for
visualizing models of spatial distributions and patterns of
homogeneity and heterogeneity.

GIS gives us maximum flexibility when working
with aggregate data, and helps us to validate our
data with reference to other available sources.

6.4.2 Internal validation: error propagation

The examples of Section 6.3.4 are cases of error propagation,
where the objective is to measure the effects
of known levels of data uncertainty on the outputs of
GIS operations. We have seen how the spatial structure
of errors plays a role, and how the existence of strong
positive spatial autocorrelation reduces the effects of
uncertainty upon estimates of properties such as slope or
area. Yet the cumulative effects of error can also produce
impacts that are surprisingly large, and some of the
examples in this section have been chosen to illustrate the
substantial uncertainties that can be produced by apparently
innocuous data errors.

Figure 6.14 Error in the measurement of the area of a square
100 m on each side. Each of the four corner points has been
surveyed; the errors are subject to bivariate Gaussian
distributions with standard deviations in x and y of 1 m
(dashed circles). The blue polygon shows one possible
surveyed square (one realization of the error model)

Error propagation measures the impacts of
uncertainty in data on the results of
GIS operations.

In general two strategies are available for evaluating
error propagation. The examples in the previous section
were instances in which it was possible to obtain a
complete description of error effects based upon known
measures of likely error. These enable a complete analysis
of uncertainty in slope estimation, and can be applied in
the DEM flooding example described in Section 6.3.4.
Another example that is amenable to analysis

determination. Box 6.3 summarized some simple rules for
ensuring that the precision used to report a measurement
reflects as far as possible its accuracy, and clearly those
rules will have been violated if the area is reported to
eight digits. But what is the appropriate precision?

In this case we can determine exactly how positional
accuracy affects the estimate of area. It turns out that
area has an error distribution which is Gaussian, with a
standard deviation (RMSE) in this case of 200 sq m – in
other words, each attempt to measure the area will give
a different result, the variation between them having a
standard deviation of 200 sq m. This means that the five
rightmost digits in the estimate are spurious, including two
digits to the left of the decimal point. So if we were to
follow the rules of Box 6.3, we would print 10 000 rather
than 10014.603 (note the problem with standard notation
here, which does not let us omit digits to the left of the
decimal point even if they are spurious, and so leaves
some uncertainty about whether the tens and units digits
are certain or not – and note also the danger that if the
number is printed as an integer it may be interpreted as
exactly the whole number). We can also turn the question
around and ask how accurately the points would have
to be measured to justify eight digits, and the answer
is approximately 0.01 mm, far beyond the capabilities of
normal surveying practice.

Analysis can be applied to many other kinds of GIS
analysis, and Gerard Heuvelink (Box 6.4) discusses several
further examples in his excellent text on error propagation
in GIS. But analysis is a difficult strategy when
spatial autocorrelation of errors is present, and many problems
of error propagation in GIS are not amenable to
analysis. This has led many researchers to explore a more
general strategy of simulation to evaluate the impacts of
uncertainty on results.

In essence, simulation requires the generation of
a series of realizations, as defined earlier, and it is
often called Monte Carlo simulation in reference to the
realizations that occur when dice are tossed or cards are
dealt in various games of chance. For example, we could
simulate error in a single measurement from a DEM by
generating a series of numbers with a mean equal to the
measured elevation, and a standard deviation equal to the