Seth E. Spielman (a, *), Eunhye Yoo (b)
(a) Brown University, Spatial Structures in the Social Sciences, Maxcy Hall, 112 George Street, Box 1916, Providence, RI 02912, USA
(b) SUNY, Department of Geography, University at Buffalo, NY, USA
Article info
Article history:
Available online 23 January 2009
Keywords:
Neighborhood effects
Built environment
Contextual effects
Spatial analysis
Spatial autocorrelation
Change of support
Geographic information systems (GIS)
Simulation
Abstract
In the decade or so of renewed interest in neighborhood contexts and health, significant progress has been made conceptualizing the relationships between the urban environment and public health. Applied research on the link between the environment and health remains limited by the way spatial concepts, such as the neighborhood or the built environment, are operationalized. In this paper we argue that representations of these spatial concepts in statistical models should be based upon the individuals, the place, and the problem under study. Through a series of simulation experiments we describe the sensitivity of estimates of the association between neighborhoods and health to the operationalization of spatial concepts. We explore the practice of conducting the same analysis at multiple scales and find that using model fit to discover the spatial dimension is problematic. In sum, there is a gap between our understanding of how the environment influences health and spatial statistical modeling techniques. For quantitative spatial inquiry into the relationship between the neighborhood environment and health to be effective, this gap must be closed.
© 2008 Elsevier Ltd. All rights reserved.
Introduction
In the decade or so of renewed interest in neighborhood contexts and health, significant progress has been made conceptualizing the relationships between the urban environment and public health (Cummins, Curtis, Diez-Roux, & Macintyre, 2007; Diez-Roux, 1998; Kawachi & Berkman, 2003). Applied research on the link between the environment and health remains limited by the way spatial concepts, such as the neighborhood or the built environment, are operationalized. In this paper we argue that the representation of these spatial concepts should have an empirical basis. Through a series of simulation experiments we illustrate that it is difficult to accurately estimate neighborhood effects without carefully considering the spatial dimensions of the problem. We further argue that individual heterogeneity in the spatial dimensions of the interaction between the environment and health poses an important problem for inference about neighborhood effects. Again, through simulation we illustrate the impact of this heterogeneity.
There is a large and growing body of literature on neighborhood effects, in which neighborhoods are often conceptualized as static, fixed geographic areas (Dietz, 2002; Entwisle, 2007; Sampson, Morenoff, & Gannon-Rowley, 2002). It is becoming increasingly clear that research needs to incorporate the reflexive and dynamic association between neighborhoods and health (Bernard et al., 2007; Cummins et al., 2007; Curtis & Jones, 1998; Entwisle, 2007; Mayer & Jencks, 1989). Cummins et al. (2007) argue that the connections between health and place are best understood from a relational perspective, which emphasizes the reflexivity of the relationship between people and place and uses a network heuristic to define neighborhoods. The definition of neighborhoods has been called the Holy Grail of urban analysis (Galster, 2001, p. 2113). Chaskin (1995) identifies three ways of viewing neighborhoods: as social units, as spatial units, and as a network of associations. In this paper we explore the assumptions implicit in viewing neighborhoods as spatial units, the most common approach to neighborhood contextual analysis (Dietz, 2002; Guo & Bhat, 2007; Macintyre, Ellaway, & Cummins, 2002).
Contextual analysis examines the effects of the environment on an outcome: "The essential feature of all contextual effects models is an allowance for macro processes that are presumed to have an impact on the individual actor over and above the effects of any individual-level variables that may be operating" (Blalock, 1984, p. 354). To rephrase Blalock's definition, contextual analysis uses variables collected at multiple scales, a microscale and a macroscale. The microscale elements are situated within the macroscale elements (e.g. people (micro) within neighborhoods (macro)). The spatial approach to contextual analysis depends heavily on Geographic Information Systems (GIS), a set of technologies for computer mapping and spatial analysis. In a GIS, the environment (or context) is often defined as discrete, contiguous geographic zones around sample locations such as residential addresses (Dietz, 2002; Guo & Bhat, 2007). These zones can be constructed using shared common boundaries, such as census tracts, or defined by distance from sampled locations (Guo & Bhat, 2007). These zones may be interpreted as representations of neighborhoods or, less frequently, as areas accessible from a residence (Dietz, 2002; Kwan, Murray, O'Kelly, & Tiefelsdorf, 2003). In this zone-based framework the dimensions of the zone define the elements of the environment hypothesized to influence health outcomes.

* Corresponding author. Tel.: +1 401 863 1064. E-mail address: Seth_Spielman@Brown.edu (S.E. Spielman).

Social Science & Medicine 68 (2009) 1098–1105. Journal homepage: www.elsevier.com/locate/socscimed. doi:10.1016/j.socscimed.2008.12.048
In individual-level analysis a health outcome is known at the individual level and spatial data are used to characterize an area of some fixed spatial extent around an individual. This paper explores the implications and assumptions implicit in the design of zones used to examine the association between the environment and health at the individual level. The Modifiable Areal Unit Problem (MAUP) is often raised in discussions of zone design. Openshaw and Taylor (1979) describe the MAUP as having two components: scale and aggregation. Imagine a map of a city; one could divide that map into 10, 20, 30, 100, or any number of sub-areas by drawing lines on the map. As one increases the number of sub-areas, the scale of analysis consequently decreases (each zone covers a smaller area). Within a given scale, say the 10-zone scale, there are many, perhaps an infinite number of, ways to draw sub-areas. In the language of the MAUP this is known as zonation: within a given scale there are many possible zonations. The MAUP says that the correlation and regression coefficients one observes may change unpredictably as the scale and zonation change (Fotheringham & Wong, 1991).
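The scale component of the MAUP can be made concrete with a small numerical sketch. The Python/NumPy example below aggregates two hypothetical gridded variables into progressively larger square zones and reports their correlation at each scale. The data, the 64 × 64 grid, and the helper names (`block_mean`, `correlation_at_scale`) are illustrative assumptions rather than anything from the studies cited, and only regular square zones are used, so the zonation component is not shown.

```python
import numpy as np

def block_mean(grid: np.ndarray, k: int) -> np.ndarray:
    """Aggregate a 2-D grid into non-overlapping k-by-k square zones by averaging."""
    rows, cols = grid.shape
    if rows % k or cols % k:
        raise ValueError("grid dimensions must be divisible by the zone size k")
    return grid.reshape(rows // k, k, cols // k, k).mean(axis=(1, 3))

def correlation_at_scale(x: np.ndarray, y: np.ndarray, k: int) -> float:
    """Pearson correlation between x and y after both are aggregated to k-by-k zones."""
    xa = block_mean(x, k).ravel()
    ya = block_mean(y, k).ravel()
    return float(np.corrcoef(xa, ya)[0, 1])

# Hypothetical data: a smooth spatial signal x, and y = x plus independent cell-level noise.
rng = np.random.default_rng(42)
i, j = np.meshgrid(np.arange(64), np.arange(64), indexing="ij")
x = np.sin(i / 10.0) + np.cos(j / 10.0)
y = x + rng.normal(0.0, 1.0, size=x.shape)

# Fewer, larger zones = coarser scale of analysis.
for k in (1, 2, 4, 8, 16):
    n_side = 64 // k
    print(f"{n_side * n_side:4d} zones: r = {correlation_at_scale(x, y, k):.3f}")
```

In this toy setup, averaging removes the cell-level noise faster than it smooths the spatial signal, so the correlation rises as zones grow; this is one simple mechanism by which coefficients can shift with scale, not a general law.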
In the MAUP, "the scale problem arises because of uncertainty about the number of zones needed for a particular study. The aggregation [zonation] problem arises because of uncertainty about how the data are to be aggregated to form a given number of zones" (Openshaw, 1977, p. 459). In individual-level contextual analysis the issues of scale and aggregation are intertwined. Contextual analysis at the individual level is different from the MAUP because the scale (size) of zones is not related to the number of observations. In individual-level contextual analysis variables are associated with both zones (context) and individuals, whereas the MAUP generally concerns associations between variables at the group level (Arbia, 1989). However, the literature on the MAUP should be noted, as it highlights the potentially significant impact of decisions about the division of space. Ultimately, the division of space is a theoretical question (King, 1997; Openshaw, 1996); in contextual analysis the core of this question has to do with how the environment influences health.
The neighborhood in zone-based analysis of health is operationalized in two steps. In the first step the relevant environmental features are translated into a statistic, for example by devising a way to measure walkability, the degree to which an area accommodates pedestrian travel (as in Saelens, Sallis, Black, & Chen, 2003). The second step is to determine the spatial dimension, the geographic area over which the statistical summary is constructed. The problem of determining the spatial dimension has received less attention than it deserves, as Guo and Bhat (2007) note:

"any study about neighborhoods is a spatial investigation. Yet, the spatial definition of neighborhood has received very little attention in the literature. Theoretical studies of neighborhood effects often use the term 'neighborhood' rather loosely; on the other hand, empirical studies of neighborhood effects across many disciplines typically used census tracts, zip code areas, or transport analysis zones as operational surrogates for neighborhoods" (p. 31).
Frank, Engelke, and Schmid (2004), in "Obesity Relationships with Community Design, Physical Activity, and Time Spent in Cars", offer an excellent example of context-based inference about urban environments and health. Frank et al. (2004) map a large sample of households to individual residential addresses in Atlanta. Each household's neighborhood is defined as the area within 1 km of its home. The 1 km buffer is based upon the street network, so that it includes all areas within 1 km of walking or driving distance of the person's home; this 1 km network buffer is the spatial dimension. Several statistics describing the physical characteristics of the area circumscribed by the buffer are calculated, including net residential density, street intersection density, and land use mix. Individual characteristics and measures of the environment are used to estimate the odds of obesity. The research finds that, after controlling for relevant individual characteristics, the odds of obesity decrease as the variety of land use within the buffer (land use mix) increases.
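A network buffer of this kind can be thought of as a bounded shortest-path search: every street-network node whose shortest path from the home is at most 1 km lies inside the buffer. The sketch below uses Dijkstra's algorithm with Python's standard `heapq` module. The toy street graph and the function name `network_buffer` are hypothetical, and Frank et al. (2004) may have constructed their buffers differently in a GIS; a full implementation would also clip street segments at the cut-off and then summarize land use inside the resulting area.

```python
import heapq
from collections import defaultdict

def network_buffer(edges, origin, max_dist=1000.0):
    """Return {node: shortest network distance} for all nodes within max_dist of origin.

    edges: iterable of (node_a, node_b, length_in_metres) for two-way street segments.
    """
    graph = defaultdict(list)
    for a, b, length in edges:
        graph[a].append((b, length))
        graph[b].append((a, length))

    dist = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry: a shorter path was already found
        for nbr, length in graph[node]:
            nd = d + length
            # Only keep paths that stay inside the buffer distance.
            if nd <= max_dist and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

# Hypothetical street network (lengths in metres) around a residence at node "home".
streets = [
    ("home", "a", 300), ("a", "b", 400), ("b", "c", 500),
    ("home", "d", 900), ("d", "e", 200), ("a", "d", 100),
]
print(sorted(network_buffer(streets, "home").items()))
```

Node "d" is reached via "a" (400 m) rather than by the direct 900 m segment, and node "c" falls outside the buffer because its shortest network distance exceeds 1 km, even though a straight-line buffer might include it.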
Studies such as Frank et al. (2004) carry several important assumptions about space. The first is that land use within 1 km of a person's home influences behavior. How sensitive are model results to this assumption? What if behavior is actually shaped by the environment at a larger or smaller scale? We will illustrate in the following sections that the impact of this assumption depends largely on the spatial structure of the built environment. The second assumption is that the geographic area associated with the behavior of a 90-year-old widower is the same as the one associated with the behavior of a 45-year-old working mother; in other words, that people are influenced by the environment in the same way and everyone has the same neighborhood (spatial dimension). While samples are often stratified by age, race, and/or gender, the spatial dimension of the definition of neighborhood is seldom adjusted according to individual characteristics. Through simulation we also study the sensitivity of models to this assumption. The simulations reported in the following sections proceed under the assumption that there is in fact a relationship between the environment and health, and that the research question of interest is the strength and geographic extent of the association.
Defining the spatial dimension of neighborhood models
One of the reasons it is difficult to understand what has happened and is happening to cities is that urban terminology is very inaccurate (Rybczynski, 1995); this is especially true of the term "neighborhood". The concept of neighborhood causes confusion in research on the relationship between urban contexts and health. Most conceptualizations of neighborhoods are rooted in the Chicago School formulation of neighborhoods as "Natural Areas" that emerge from urban processes (Park & Burgess, 1925); however, there is nothing organic about the boundaries imposed during the research process. The neighborhood, as originally conceptualized by the Chicago School, was not a single geographic area; the concept included separate ecological, cultural, and political neighborhoods, which may or may not have coincident boundaries (Park & Burgess, 1925, p. 147).
Extending this idea, the geographic zones used in contextual and neighborhood effects models can be interpreted as effective neighborhoods: the geographic extent of an environmental determinant of health. The central characteristic of effective neighborhoods is that they are defined relative to the unit of analysis and not by global criteria. Macintyre et al. (2002) argue that many of the individual-level variables typically used as controls (such as baseline health status) may in fact be intervening variables in the association between neighborhoods and health. Curtis and Jones (1998) argue that "there is theoretical and empirical evidence that health disadvantage may be experienced differently by socially disadvantaged individuals according to their geographic setting" (p. 667). The concept of effective neighborhoods incorporates these ideas and opens the possibility that, at the individual level, there are many neighborhoods shaped by the complex interaction between the characteristics of people, problems, and places.
By contrast, Galster (2001) defines neighborhoods as bundles of spatially defined attributes associated with clusters of residences; the emphasis is on the attributes of place and not on individuals or specific problems. The distinction between the effective neighborhood and Galster's (2001) definition is akin to Hartshorne's distinction between formal and functional regions (Hartshorne, 1959). Formal regions are defined by the characteristics of place, and functional regions are defined according to a process or function (Montello, 2003). Moving away from the concept of a fixed neighborhood defined by the attributes of place shifts the problem of conceptualizing space in contextual effects models away from questions about the definition of neighborhoods and toward behavioral questions about how people interact with urban space.
This idea, that different people have different neighborhoods, is not without problems, the principal one being the lack of theory, methods, or data necessary to define effective neighborhoods. In the following simulation we illustrate that this theoretical shift away from the concept of fixed neighborhoods is necessary, as imposing regular boundaries on a heterogeneous sample systematically skews estimates of the association between the environment and health, even if all other individual-level characteristics are effectively controlled.
A formal definition of terms
Fig. 1 shows the elements of individual-based models of contextual effects. The point located at the center of Fig. 1 represents an individual whose characteristics are captured by a set of variables $X_i(\mathbf{u}),\ i = 2, \ldots, n$, where $\mathbf{u}$ denotes a coordinate vector of the location associated with that individual. The individual at location $\mathbf{u}$ has a neighborhood that is defined by a distance parameter $d$. The zone representing the neighborhood is denoted as $A_d(\mathbf{u})$. This zone is the spatial dimension; it is used to construct a measure of the environment denoted $X_1(A_d)$. The health outcome of the individual residing at location $\mathbf{u}$ is denoted as $Y(\mathbf{u})$. In an ideal scenario the zone used to build the measure of context, $A_d(\mathbf{u})$, is the effective neighborhood: it contains all of the elements of the environment that are relevant to the health outcome $Y(\mathbf{u})$, and these elements are appropriately summarized by the contextual variable $X_1(A_d)$. Simple contextual effects models are often estimated using conventional statistical techniques such as multiple regression (Morgenstern, 1995); a simple linear formulation of the problem outlined above would be (adapted from Blalock (1984)):
$$Y(\mathbf{u}) = \beta_1 X_1(A_d) + \sum_{i=2}^{n} \beta_i X_i(\mathbf{u}) \qquad (1)$$
The coefficient $\beta_1$ represents the strength of the relationship between the environment and the outcome of interest. Simply stated, this model views a health outcome $Y(\mathbf{u})$ as a function of individual characteristics $X_i(\mathbf{u}),\ i = 2, \ldots, n$, and the environment $X_1(A_d)$.
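As a minimal sketch of how Equation (1) is typically estimated, the Python/NumPy example below simulates one contextual measure and two individual-level covariates and recovers the coefficients by ordinary least squares. The sample size, the coefficient values, and the Gaussian error term (which Equation (1) as written omits) are assumptions chosen for illustration; none of them come from the paper or from Blalock (1984).

```python
import numpy as np

rng = np.random.default_rng(0)
n_people = 5_000

# Hypothetical data: one contextual measure X1(A_d) and two individual-level covariates.
x1 = rng.normal(size=n_people)            # environment summarized over the zone A_d(u)
x_ind = rng.normal(size=(n_people, 2))    # individual characteristics X_i(u), i = 2, 3
beta_true = np.array([0.5, 1.0, -0.3])    # assumed beta_1, beta_2, beta_3

# Equation (1) plus a random error term.
y = beta_true[0] * x1 + x_ind @ beta_true[1:] + rng.normal(scale=0.5, size=n_people)

# Ordinary least squares on the design matrix [X1, X2, X3] (no intercept, as in Equation (1)).
design = np.column_stack([x1, x_ind])
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print("estimated beta:", np.round(beta_hat, 3))
```

Here the contextual measure is handed to the regression ready-made; the rest of the paper is concerned with what happens when the zone $A_d(\mathbf{u})$ used to build that measure is the wrong size.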
This is, of course, a simplification. Just as cities are shaped by the complex interaction of forces that operate at multiple scales (Anas, Arnott, & Small, 1998), the association between the environment and individual health is complex and operates at multiple scales (Northridge, Sclar, & Biswas, 2003). There are likely multiple effective neighborhoods influencing any given health outcome. The issue of scale-linkage is a salient one for contextual inquiry on health: can neighborhood interactions that operate at multiple scales but affect the same outcome be treated independently, or are they inextricably linked (Phillips, 2004)? Effective neighborhoods, to the extent that they exist as contiguous geographic areas, are not likely to be neat circles as depicted in Fig. 1. However, these simplifying assumptions are a necessary abstraction of reality, and they allow us to explore the conceptualization of neighborhoods in a way that may be useful in less abstract settings.
Research questions
This paper asks: what happens to the estimate of the neighborhood effect, $\hat{\beta}_1$, if the spatial dimensions of context, $A_d(\mathbf{u})$, do not match the spatial extent of the environmental determinants of the health outcome $Y(\mathbf{u})$? This question is difficult to address in applied settings for two reasons. First, effectively measuring all of the relevant individual-level variables $X_i(\mathbf{u}),\ i = 2, \ldots, n$, is difficult and is complicated by the problem of residential sorting (Oakes, 2004). Second, the effective neighborhood cannot be directly observed and is difficult to measure, which makes it hard to evaluate $A_d(\mathbf{u})$. One approach to this problem is to repeat the same analysis with different definitions of the spatial dimension, ultimately choosing the model with the best fit. In the following sections we demonstrate that model fit in most situations fails to signal the appropriate spatial extent of $A_d(\mathbf{u})$.
The second research question is: what are the implications for model estimation of uncontrolled individual-level variability in the geographic extent of the area that influences health? That is, what if different types of people have different effective neighborhoods and a model uses a fixed approximation of effective neighborhoods?
Method
A Java-based simulation using the REPAST agent modeling toolkit (North, Collier, & Vos, 2006) was developed to explore these questions. The principal advantage of simulation is that, since the data are generated by computer, the analyst can specify the strength and spatial extent of the relationship between the environment and the outcome. With simulated data, since the strength and geographic extent of the relationship between the environment and health are known, the sensitivity of neighborhood effect estimates to assumptions about the spatial dimension can be established. Simulated data, as opposed to observational data, also allow a focus on the spatial dimensions of contextual effects by affording control of individual characteristics $X_i(\mathbf{u}),\ i = 2, \ldots, n$, and of problems that arise due to the geographic sorting of populations by their characteristics.
Fig. 1. Elements of contextual models of the built environment. (Diagram labels: individual location $\mathbf{u}$; distance $d$; zone $A_d(\mathbf{u})$; individual-level variables $X_i(\mathbf{u}),\ i = 2, \ldots, n$; contextual measure $X_1(A_d)$; health outcome $Y(\mathbf{u})$.)

This simulation assumes that variables associated with individual characteristics $X_i(\mathbf{u}),\ i = 2, \ldots, n$, have been accounted for; therefore the remaining unexplained variation in the outcome is due to the environment or random variability. This assumption allows a simplified model of contextual effects that views a health outcome $Y(\mathbf{u})$ as a function of the environment $X_1(A_d)$:
$$Y(\mathbf{u}) = \beta_1 X_1(A_d) + \varepsilon \qquad (2)$$
To estimate Equation (2) it is necessary to have spatial data
describing the environment. In applied settings this data would
typically be detailed electronic maps of a city compiled in a GIS.
The heterogeneity of cities makes it difficult to assess the sensitivity of neighborhood effect estimates to the spatial dimension. If a city has a dense patchwork of different types of neighborhoods, the scale of the spatial dimension in comparison to the size of the patches is important to consider. If, on the other hand, neighborhoods tend to be similar and vary little in their physical characteristics over space, effect estimates are less sensitive to scale. In reality, cities tend to be a mix of patchwork-like older neighborhoods and more uniform suburban development. To overcome this problem we construct synthetic urban environments with a homogeneous spatial structure. Controlling spatial structure allows exploration of how it influences estimates of neighborhood effects.
The spatial structure of the environment can be described by the concept of spatial autocorrelation. Spatial autocorrelation is a measure of spatial self-similarity, the correlation of a variable with itself in space (Rogerson, 2001). One might experience positive spatial autocorrelation on a drive through Levittown, or down a street where houses and housing lots are about the same size. This configuration results in little variation of population density in space and hence a high degree of spatial self-similarity. Similarly, negative spatial autocorrelation might be seen where very high density subdivisions are surrounded by low density rural landscapes. A pattern of random values in space is said to have no spatial autocorrelation.
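Spatial autocorrelation on a raster is commonly summarized with Moran's I. The Python sketch below computes it with rook contiguity (cells sharing an edge are neighbors, with weight 1) and contrasts three hypothetical 8 × 8 grids: a checkerboard, which is maximally negatively autocorrelated; a smooth gradient, which is positively autocorrelated; and random noise, which should be near zero. The choice of Moran's I and of rook weights is ours for illustration; the paper does not say which statistic, if any, was used to characterize its maps.

```python
import numpy as np

def morans_i_rook(grid: np.ndarray) -> float:
    """Moran's I for a 2-D grid using binary rook (edge-sharing) contiguity weights."""
    z = grid.astype(float) - grid.mean()
    # Sum of z_i * z_j over horizontally and vertically adjacent cell pairs.
    # Each unordered pair is counted once here, so both the numerator and the
    # weight total are doubled below to cover the symmetric w_ij = w_ji = 1.
    cross = (z[:, :-1] * z[:, 1:]).sum() + (z[:-1, :] * z[1:, :]).sum()
    n_pairs = z[:, :-1].size + z[:-1, :].size
    numerator = 2.0 * cross
    w_total = 2.0 * n_pairs
    return float((z.size / w_total) * numerator / (z ** 2).sum())

checkerboard = np.indices((8, 8)).sum(axis=0) % 2   # alternating 0/1 cells: negative
gradient = np.tile(np.arange(8), (8, 1))            # smooth east-west trend: positive
rng = np.random.default_rng(1)
noise = rng.normal(size=(8, 8))                     # random values: expected I = -1/(N-1)

for name, g in [("checkerboard", checkerboard), ("gradient", gradient), ("noise", noise)]:
    print(f"{name:12s} I = {morans_i_rook(g):+.3f}")
```

The checkerboard gives exactly -1 because every neighboring pair differs in sign, the limiting case of the high-density/low-density contrast described above.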
Spatial autocorrelation is also described in terms of its range, the distance at which self-similarity stops decreasing. A covariance plot (Fig. 2) can be used to visualize the relationship between self-correlation and distance. A covariance plot shows the covariance of pairs of observations separated by a distance $d$; distance is on the horizontal axis and covariance is on the vertical axis. The range is the point at which the curves in Fig. 2 flatten out. If census tracts between 1 and 5 km apart tend to be similar but those separated by greater distances are not, the range of the autocorrelation would be 5 km. Understanding spatial structure in studies of neighborhood effects is critical because it affects the sensitivity of neighborhood effect estimates to the spatial dimension $A_d(\mathbf{u})$.
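The curves in Fig. 2 are covariance functions that decline with distance and level off at the range. The paper does not name the covariance model it used; as an assumption, the sketch below uses the spherical model, a standard geostatistical choice whose covariance falls to exactly zero at the range, and evaluates it for the five ranges listed in Fig. 2.

```python
import numpy as np

def spherical_covariance(d, range_a: float, sill: float = 1.0):
    """Spherical covariance model: falls from `sill` at d = 0 to exactly 0 at d = range_a."""
    h = np.asarray(d, dtype=float) / range_a
    cov = sill * (1.0 - 1.5 * h + 0.5 * h ** 3)
    # Beyond the range, observations are uncorrelated.
    return np.where(h < 1.0, cov, 0.0)

distances = np.arange(0, 51, 5)
for a in (1, 5, 10, 15, 20):   # the ranges shown in Fig. 2
    print(f"range {a:2d}:", np.round(spherical_covariance(distances, a), 2))
```

With this model the range has a literal meaning: pairs of cells farther apart than the range have zero covariance, which is the sense of "range" used in the raster examples that follow.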
The synthetic maps are generated as grids, called rasters, in which grid cells are of uniform size and each grid cell contains a value; the pattern of values on the grid is controlled by the magnitude and range of the spatial autocorrelation. On a raster with positive spatial autocorrelation and a range of 5 cells, a sample of pairs of grid cells separated by 3 units of distance will tend to be correlated, but a sample of cells separated by 10 units of distance will not. Synthetic rasters were used to generate the health outcome $Y(\mathbf{u})$ and the measure of neighborhood context $X_1(A_d)$.
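The paper does not describe how its synthetic rasters were simulated. One simple way to obtain a raster with positive autocorrelation and a chosen range, offered here only as an assumption-laden sketch and not as the authors' generator, is to average white noise over a disk: cells closer together than the disk's diameter share some of the same noise and are therefore correlated, while cells farther apart share none. With a disk radius of 2.5 cells the range is about 5 cells, mirroring the example in the text. Edges wrap around (a periodic grid), which is a further simplifying assumption.

```python
import numpy as np

def moving_average_raster(size: int, radius: float, seed: int = 0) -> np.ndarray:
    """Smooth white noise with a disk-shaped moving average on a periodic (wrap-around) grid.

    Cells more than 2 * radius apart share no noise, so autocorrelation vanishes beyond
    a range of about 2 * radius.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=(size, size))
    r = int(np.floor(radius))
    offsets = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
               if dy * dy + dx * dx <= radius * radius]
    total = sum(np.roll(noise, shift=(dy, dx), axis=(0, 1)) for dy, dx in offsets)
    return total / len(offsets)

def lag_correlation(raster: np.ndarray, lag: int) -> float:
    """Correlation between each cell and the cell `lag` columns to its east (wrapping around)."""
    shifted = np.roll(raster, shift=-lag, axis=1)
    return float(np.corrcoef(raster.ravel(), shifted.ravel())[0, 1])

raster = moving_average_raster(size=200, radius=2.5)   # range of roughly 5 cells
for lag in (1, 3, 5, 10):
    print(f"lag {lag:2d}: r = {lag_correlation(raster, lag):+.3f}")
```

In this sketch the lag-3 correlation stays clearly positive while the lag-10 correlation is close to zero, the behavior described above for a range of 5 cells.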
The health outcome for the individual at location $j$, $Y(\mathbf{u}_j)$, is a weighted combination of a measure of the environment $X(A_d(\mathbf{u}_j))$ and a random variable $Z$:

$$Y(\mathbf{u}_j) = \omega\, X\big(A_d(\mathbf{u}_j)\big) + (1 - \omega)\, Z \qquad (3)$$
As the user-defined weight $\omega$ increases ($0 \le \omega \le 1$), the strength of the relationship between the environment and the outcome increases.

The effective neighborhood is set at 5 raster grid cells: the health outcome for the person residing at location $j$ includes the average of all cells falling within a 5-unit radius around the individual's location.
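A minimal sketch of how an outcome following Equation (3) could be built from a raster, with the effective neighborhood fixed at a 5-cell radius. The standard-normal distribution for $Z$, the truncation of disks at the raster edge, and the function names `disk_mean` and `simulate_outcome` are assumptions for illustration; the paper does not specify these details.

```python
import numpy as np

def disk_mean(raster: np.ndarray, row: int, col: int, radius: float) -> float:
    """Average of raster cells whose centres lie within `radius` cells of (row, col).

    Near the raster edge the disk is simply truncated (an assumption).
    """
    rows, cols = np.indices(raster.shape)
    inside = (rows - row) ** 2 + (cols - col) ** 2 <= radius ** 2
    return float(raster[inside].mean())

def simulate_outcome(raster, locations, omega, effective_radius=5.0, seed=0):
    """Equation (3): Y(u_j) = omega * X(A_d(u_j)) + (1 - omega) * Z, with Z ~ N(0, 1) assumed."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    rng = np.random.default_rng(seed)
    env = np.array([disk_mean(raster, r, c, effective_radius) for r, c in locations])
    z = rng.normal(size=len(locations))
    return omega * env + (1.0 - omega) * z

rng = np.random.default_rng(3)
background = rng.normal(size=(50, 50))       # hypothetical background map
people = [(10, 10), (25, 30), (40, 5)]       # hypothetical residential cells
print(simulate_outcome(background, people, omega=0.8))
```

With $\omega = 1$ the random component drops out entirely and each outcome is just the average of the raster over the individual's effective neighborhood.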
In practice the effective neighborhood would be difficult to observe and hence unknown. Would, for example, an analysis of human movement patterns (as in Kwan, 1999) shed light on the spatial extent of the association between the environment and health? González, Hidalgo, and Barabási (2008) found striking patterns of movement through analysis of human mobility as revealed by the tracking of cell phones. An important theoretical and empirical question is whether, and to what degree, individual movement serves as a proxy for the effective neighborhood. That is, can one see the environmental influences on health by studying how people move through space?
The measure of the context around the person at location $\mathbf{u}_j$, $X_1(A_d(\mathbf{u}_j))$, is the average of the raster within a specified distance of location $\mathbf{u}_j$. Unlike $Y(\mathbf{u}_j)$, which always has the same spatial dimension, $X_1(A_d(\mathbf{u}_j))$ takes a variety of spatial dimensions. Equation (2) was estimated for each realization of $X_1(A_d(\mathbf{u}_j))$; within a given regression the spatial dimension of $X_1(A_d(\mathbf{u}_j))$ did not change.
In the first run of the simulation 8000 points were placed on a map, and the outcome $Y$ was constructed using a buffer distance of 5 and a weight of 1, so that $Y$ was the average of the raster within 5 units. The independent variable was constructed using a distance of 1, meaning that it represents the average of the raster within 1 unit of distance. Equation (2) was solved, providing an estimate of the neighborhood effect $\hat{\beta}_1$. Then 8000 points were randomly redrawn and the process repeated, for a total of 500 repetitions for each distance band, resulting in many estimates of $\hat{\beta}_1$ for each realization of the spatial dimension of $X_1(A_d(\mathbf{u}_j))$.
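The Monte Carlo procedure just described can be sketched compactly. The Python version below is a hedged reconstruction, not the authors' REPAST/Java code, and several choices in it are assumptions: the background is a hypothetical smoothed-noise map with a range of roughly 10 cells, edges wrap around, points fall on cell centres, the slope is fitted with an intercept (consistent with $\hat{\beta}_1 = r_{XY}(\sigma_Y/\sigma_X)$), and far fewer points and repetitions than the paper's 8000 and 500 are used so that it runs quickly.

```python
import numpy as np

def disk_average_raster(raster: np.ndarray, radius: float) -> np.ndarray:
    """Disk-shaped moving average of every cell (periodic edges, an assumption)."""
    r = int(np.floor(radius))
    offsets = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
               if dy * dy + dx * dx <= radius * radius]
    return sum(np.roll(raster, (dy, dx), axis=(0, 1)) for dy, dx in offsets) / len(offsets)

def ols_slope(x: np.ndarray, y: np.ndarray) -> float:
    """Least-squares slope of y on x, fitted with an intercept."""
    xc = x - x.mean()
    return float((xc * (y - y.mean())).sum() / (xc ** 2).sum())

def simulate_beta_hats(raster, distances, effective=5.0, omega=1.0,
                       n_points=500, n_reps=20, seed=0):
    """Mean and s.d. of beta_1-hat for each candidate spatial dimension d."""
    rng = np.random.default_rng(seed)
    y_surface = disk_average_raster(raster, effective)       # effective neighbourhood
    x_surfaces = {d: disk_average_raster(raster, d) for d in distances}
    results = {}
    for d in distances:
        estimates = []
        for _ in range(n_reps):                              # redraw points each repetition
            rows = rng.integers(0, raster.shape[0], n_points)
            cols = rng.integers(0, raster.shape[1], n_points)
            z = rng.normal(size=n_points)
            y = omega * y_surface[rows, cols] + (1.0 - omega) * z        # Equation (3)
            estimates.append(ols_slope(x_surfaces[d][rows, cols], y))    # Equation (2)
        results[d] = (float(np.mean(estimates)), float(np.std(estimates)))
    return results

rng = np.random.default_rng(7)
background = disk_average_raster(rng.normal(size=(120, 120)), 5.0)  # hypothetical smooth map
for d, (mean, sd) in simulate_beta_hats(background, [1, 3, 5, 8, 12]).items():
    print(f"d = {d:2d}: mean beta_1-hat = {mean:.3f} (sd {sd:.3f})")
```

Only the $d = 5$ case is guaranteed in this sketch: with $\omega = 1$ the contextual measure and the outcome coincide, so every estimate equals 1. The values for other $d$ depend on the hypothetical map and are not meant to reproduce the paper's figures.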
The simulation was repeated using a variety of raster maps as inputs. Six maps had different degrees of autocorrelation but the same range. Four maps had different ranges but were otherwise similar (structure shown in Fig. 2). The simulations were also run on a raster with no autocorrelation. Repeating the simulation analysis outlined above with different types of maps allows us to understand how sensitive neighborhood effect estimates are to the spatial dimension of the measure of context under various environmental conditions.
In the simplest case the weight $\omega$, representing the strength of the relationship between the environment and the outcome, was set at 1. In this scenario, when the neighborhood defining the independent variable is the same as the neighborhood defining the dependent variable, $Y(\mathbf{u}) = X_1(A_d)$ and the estimated neighborhood effect $\hat{\beta}_1$ equals 1. As $\omega$ decreases, so too will the association between $Y(\mathbf{u})$ and $X_1(A_d)$; the contextual effect estimate is correct when $\hat{\beta}_1 = \omega$. Since $\omega$ represents the strength of the association between the environment and the outcome, $\omega$ can be interpreted as the true contextual effect and $\hat{\beta}_1$ as the estimated effect.

Fig. 2. Spatial covariance functions with different ranges. (Plot: covariance $C(d)$ against distance $d$, for ranges 1, 5, 10, 15, and 20.)
Results
Fig. 3 shows the results of the simulation when $\omega = 1$ (indicated by the dashed horizontal line) and the range of the autocorrelation is 10 units. The effective neighborhood that determines the health outcome $Y$ is 5 units, indicated by the dashed vertical line in Fig. 3. The spatial dimension of the contextual measure $X_1(A_d)$ is shown on the horizontal axis of Fig. 3. The reference lines in Fig. 3 show that when the spatial dimension of the measure of the environment is the same size as the effective neighborhood, $\hat{\beta}_1 = \omega$; that is, when the representation of the effective neighborhood is correct, the estimates of the contextual effect are correct.
Fig. 3 also shows the results obtained when the effective neighborhood is not modeled correctly and the areas influencing the health outcome and defining the measure of context are different sizes. When there is positive autocorrelation we find a curve of the general form shown in Fig. 3. When the size of the effective neighborhood is underestimated, that is, when the zone defining the outcome is larger than the one used for the measure of the environment, the effect of the environment, $\hat{\beta}_1$, is overestimated. When the spatial dimension of $X_1(A_d)$ is too large the opposite holds true, and the effect is underestimated until the spatial dimension exceeds the range of the autocorrelation.
The range of the autocorrelation has an impact on results. Running the same simulation on maps with the same covariance function but different ranges leads to different estimates of the relationship between the environment and the health outcome, except when the spatial dimension matches the effective neighborhood. This means that the same relationship between the environment and health, when studied using the same technique in different places, can result in different neighborhood effect estimates. For example, say one had a sample of 500 people in two different cities; the people are similar and they all have the same effective neighborhood, but the cities have different characteristic land use patterns. If in each city the study uses an identical but incorrect estimate of the effective neighborhood, it is entirely plausible that the estimates of the contextual effect for the two cities would differ, when in fact the true effect was the same.
This phenomenon and the shape of the curve in Fig. 3 can be explained by reference to the change of support problem (Cressie, 1996). The estimate of the contextual effect $\hat{\beta}_1$ is obtained by $r_{XY}(\sigma_Y/\sigma_X)$, where $r_{XY}$ is the correlation coefficient between the measure of the environment $X_1(A_d)$ and the health outcome $Y(\mathbf{u})$. The correlation between the measure of the environment and the health outcome is related to their covariance as $r_{XY} = \frac{1}{\sigma_X \sigma_Y} \operatorname{cov}\big(X_1(A_d), Y(\mathbf{u})\big)$. Since both the dependent and independent variables are associated with the raster $R(\mathbf{u})$, the covariance of $X_1(A_d)$ and $Y(\mathbf{u})$ can be derived following Cressie (1996):
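$$\operatorname{cov}\big(X_1(A_d), Y(\mathbf{u})\big) = \frac{\int_{\mathbf{u} \in A_d} \int_{\mathbf{u}' \in A_{d'}} C_R(\mathbf{u}, \mathbf{u}')\, d\mathbf{u}\, d\mathbf{u}'}{|A_d|\,|A_{d'}|} \qquad (4)$$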
where $X_1(A_d)$ and $Y(\mathbf{u})$ are averages (area-normalized integrals) of the raster values over different supports, $A_d$ and $A_{d'}$, respectively. Considering that the input raster for the dependent and independent variables is a realization of a stationary random function, which is characterized by a constant mean and a covariance function $C_R(\mathbf{u}, \mathbf{u}')$, the covariance between the dependent and independent variables can be derived from the covariance of the raster, $C_R(\mathbf{u}, \mathbf{u}')$.
The ratio $\sigma_Y/\sigma_X$ of the standard deviations of the dependent and independent variables summarizes the relative differences in the spatial dimensions of each variable. In the case where a relatively smaller support is employed to measure the health outcome compared to that of the environment, that is, where the spatial dimension of $A_{d'}$ is smaller than that of $A_d$, the variance of the dependent variable is larger than that of the independent variable, $\sigma_Y^2 > \sigma_X^2$, and $\sigma_Y/\sigma_X > 1$. When the situation is reversed ($A_{d'} > A_d$), the ratio $\sigma_Y/\sigma_X$ is less than one. This relationship shapes the curve in Fig. 3.
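Equation (4) can be evaluated numerically for the disk-shaped supports used in the simulation by replacing the integrals with averages over grid cells. The Python sketch below does this under stated assumptions: a spherical point covariance $C_R$ with unit sill and a range of 10 (the range used for Fig. 3, though the model itself is our choice), and concentric disks centred on the individual, with the outcome's support fixed at the effective neighborhood of 5. It shows the variance of a disk average falling as the support grows, and hence $\sigma_Y/\sigma_X$ moving from below one to above one as the support of $X$ passes that of $Y$. The function names are illustrative, and this is not the authors' derivation.

```python
import numpy as np

def spherical_covariance(h, range_a=10.0):
    """Point covariance C_R of the raster: spherical model with unit sill (an assumption)."""
    s = np.asarray(h, dtype=float) / range_a
    return np.where(s < 1.0, 1.0 - 1.5 * s + 0.5 * s ** 3, 0.0)

def disk_cells(radius):
    """Grid offsets (dy, dx) of the cells inside a disk of the given radius (a discrete support)."""
    r = int(np.floor(radius))
    return np.array([(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
                     if dy * dy + dx * dx <= radius * radius], dtype=float)

def support_covariance(radius_a, radius_b, range_a=10.0):
    """Discrete Equation (4): mean point covariance over all pairs of cells drawn from
    two concentric disk supports, i.e. the covariance of the two disk averages."""
    a, b = disk_cells(radius_a), disk_cells(radius_b)
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(spherical_covariance(dists, range_a).mean())

effective = 5                                    # support of Y, the effective neighbourhood
var_y = support_covariance(effective, effective)
for d in (1, 3, 5, 8, 12):                       # support of X, the candidate spatial dimension
    var_x = support_covariance(d, d)
    print(f"d = {d:2d}: var(X) = {var_x:.3f}, sigma_Y/sigma_X = {np.sqrt(var_y / var_x):.3f}")
```

Passing two different radii to `support_covariance` gives the cross-covariance between $X_1(A_d)$ and $Y(\mathbf{u})$ in the same way, which is the quantity on the left-hand side of Equation (4).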
In general, smoother maps, those with large ranges, were less sensitive to changes in the spatial dimension than maps with more local variation. When the range of the autocorrelation in the environment is greater than the spatial dimension of the contextual measure, increasing the size of the spatial dimension changes the estimate of the neighborhood effect $\hat{\beta}_1$ (Fig. 3). When the spatial dimension exceeds the range of the spatial autocorrelation, the variability of neighborhood effect estimates increases (Fig. 3). This means that heterogeneous urban environments are particularly sensitive to the spatial dimension of contextual effects models.
In practice, the spatial extent of the interaction between the environment and health would be unknown. A researcher who fits models with varying spatial dimensions might use model fit statistics to guide selection. Fig. 4 shows little difference in the overall fit of the model across a wide range of neighborhood sizes. This holds true generally and has serious research implications: using the fit of a regression model to determine the spatial extent of environmental influences is not effective. If one looks very closely at model fit, one finds that on average it is best when the correct buffer size is used. However, the differences are small, and this only occurs in the extreme case (weight = 1) or when the background raster is random. When the raster is random there is greater variability in the neighborhoods, and this variability translates into more reliable model fit statistics. In less controlled situations, experiment-to-experiment variability in model fit gives it minimal utility as a selection criterion for the spatial dimension. This finding supports the view that theoretical, not technical, criteria should be used to evaluate different conceptualizations of geographic context.
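The multi-scale practice discussed above fits the same regression at several neighborhood sizes and compares the fits. It can be sketched as follows. This is a hypothetical one-dimensional analogue, not the paper's raster simulation. The true effective neighborhood is 5 cells. The candidate sizes are odd window widths from 1 to 21, chosen because the windows are centred. Smoothing widths of 3 (rough) and 21 (smooth) stand in for short and long autocorrelation ranges. All names and parameter values are illustrative assumptions.

```python
import random
import statistics

def moving_average(values, width):
    """Circular moving average over a centred window of `width` cells (odd width)."""
    n, half = len(values), width // 2
    return [sum(values[(i + k) % n] for k in range(-half, half + 1)) / width
            for i in range(n)]

def fit(x, y):
    """Return (OLS slope, R-squared) for a simple regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy * sxy / (sxx * syy)

def scan(smooth_width, true_d=5, weight=1.0, noise_sd=0.2, n=3000, seed=1):
    """Fit the same model at several candidate neighbourhood sizes d.

    The outcome is built from the raster averaged over the true effective
    neighbourhood (true_d); the predictor uses a candidate d instead.
    """
    rng = random.Random(seed)
    raster = moving_average([rng.gauss(0, 1) for _ in range(n)], smooth_width)
    signal = moving_average(raster, true_d)
    y = [weight * s + rng.gauss(0, noise_sd) for s in signal]
    return {d: fit(moving_average(raster, d), y) for d in range(1, 22, 2)}

for width in (3, 21):   # hypothetical rough vs smooth environments
    print(f"raster smoothing width {width}")
    for d, (slope, r2) in scan(width).items():
        print(f"  d={d:2d}  slope={slope:5.2f}  R^2={r2:.3f}")
```

The slope column shows how the estimated effect moves with d in each environment. The R^2 column shows how sharply, or how weakly, model fit separates the candidate sizes.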
Individual heterogeneity
Understanding the structure of the physical environment is important in assessing the relationship between the environment and health. However, it is equally, if not more, important to consider the people being studied. What if different levels of mobility, access to technology, social networks, or different stages in the life course change the way a person interacts with their environment? Zone-based conceptualizations of context represent the spatial extent of environmental influences on health. There are theoretical and empirical reasons to believe that the spatial extent of this relationship might vary from person to person (Macintyre et al., 2002). Yet, in the analysis of the relationship between the environment and behavior, individual heterogeneity is seldom considered in the design of buffers/neighborhoods. For example, lower mobility might limit the geographic extent of environmental influence on health (i.e. the effective neighborhood), whereas higher mobility might increase it.
[Figure: mean estimated contextual effect (mean ± 2 standard deviations; y-axis 0.9–1.8) plotted against the spatial dimension of the independent variable (d; x-axis 1–22). Panel: scale of contextual effect = 1.]
Fig. 3. Mean of $\hat{\beta}_1$ grouped by the spatial dimension of the independent variable. The weight is 1 and the effective neighborhood is 5 (both indicated by reference lines). The range of the spatial autocorrelation is 10.
S.E. Spielman, E.hye Yoo / Social Science & Medicine 68 (2009) 1098–1105
Individual-level heterogeneity is introduced through a modification to the initial simulation model. We change the variable defining the effective neighborhood to a random variable whose mean and variance are known. This random variable is constrained to positive integer values, and in the simulations reported here the mean and standard deviation are both fixed at 5 units. This simulates a situation where one does not know the exact effective neighborhood for each individual but makes an educated guess at the average. With the addition of individual heterogeneity, the average size of the neighborhood used in the construction of the health outcome Y(u) in each run of the simulation is 5, but the range of possible values is bounded by the constraints on the random variable. This simulates individual-level variability in the effective neighborhood.
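The modification above needs a positive-integer random variable whose mean and standard deviation both equal 5. The source does not name the distribution, so the sketch below assumes one plausible choice: one plus a negative binomial draw, sampled as a gamma-Poisson mixture. Like the constrained variable described above, it is right-skewed and occasionally produces very large neighborhoods. The function names and the seed are hypothetical.

```python
import math
import random
import statistics

def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for moderate lam)."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def draw_neighbourhood(rng, mean=5.0, sd=5.0):
    """Hypothetical positive-integer effective neighbourhood with given mean/SD.

    ASSUMPTION: 1 + negative binomial(r, p), drawn as a gamma-Poisson mixture.
    Requires sd**2 > mean - 1 so that 0 < p < 1.
    """
    m = mean - 1.0               # mean of the shifted, non-negative part
    p = m / sd ** 2              # NB variance m / p must equal sd**2
    r = m * p / (1.0 - p)        # NB mean r(1 - p) / p must equal m
    lam = rng.gammavariate(r, (1.0 - p) / p)   # gamma(shape r, scale (1-p)/p)
    return 1 + poisson(lam, rng)

rng = random.Random(7)
sizes = [draw_neighbourhood(rng) for _ in range(20000)]
print(f"mean={statistics.fmean(sizes):.2f}  sd={statistics.stdev(sizes):.2f}  "
      f"min={min(sizes)}  max={max(sizes)}")
```

In a simulation of this kind, each individual would take its own draw as the size of the effective neighborhood used to build Y(u). The analyst would still measure context at a single fixed size.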
Introducing this uncontrolled individual-level variability doesn't change the general shape of the curve shown in Fig. 3; it simply pushes it down. If one fails to account for individual-level variability, there is a greater tendency to underestimate the effect of the environment (Fig. 5). This finding of underestimation may be due in part to the constraints placed on the random variable. These constraints created a few observations with a very large neighborhood size, and these outliers may exert downward pressure on the effect estimates for each level.
Individual variability in the effective neighborhood muddies the water further: Fig. 5 shows significant variability around the horizontal line indicating the true contextual effect. If a single study is viewed as one realization within this range of results, the implication for researchers is that the zone-based methods traditionally used to explore the relationship between neighborhood contexts and health are difficult to interpret. Model fit is not a reliable indicator of the spatial extent of the relationship (Fig. 6). Together, Figs. 5 and 6 show the significant impact of violating the assumption of uniform neighborhoods.
The conceptualization of neighborhoods has an important
impact on estimates of the association between place and health.
Without specific a priori knowledge of the spatial extent of environmental influences on health, and an understanding of how these influences vary from person to person, it is possible to find a wide variety of neighborhood effects.
Discussion
In models of contextual effects, the environment is often
operationalized using buffers, census tracts, or other arbitrary areal
units because they are conceptually accessible and widely available
(Dietz, 2002). To estimate the strength of the relationship between the environment and behavior accurately, one needs to understand the spatial extent of the relationship (i.e. the effective neighborhood). Studies that build measures of context using spatial dimensions that lack empirical or theoretical justification run the risk of spurious results and may significantly under- or overstate the relationship of interest.
We also find that this problem cannot be overcome by using model fit statistics to discover the appropriate spatial extent. These results suggest that some important questions about how people interact with the environment need to be examined before neighborhood-based analytical techniques can be effectively employed in the disaggregate analysis of health. We believe the necessary empirical base can be developed by reconceptualizing the spatial dimension and moving away from the notion of neighborhood toward a concept that is more rooted in behavior.
[Figure: mean estimated contextual effect (mean ± 2 standard deviations; y-axis 0.6–2.2) plotted against the spatial dimension of the independent variable (d; x-axis 1–22). Panel: average scale of contextual effect = 1.]
Fig. 5. Mean of $\hat{\beta}_1$ with heterogeneity in the individual effective neighborhood, grouped by the spatial dimension of the independent variable (d). The weight is 1 and the average size of the effective neighborhood is 5 (indicated by reference lines). The range of the spatial autocorrelation is 10.
[Figure: R-square (mean, min–max; y-axis 0.78–1.02) plotted against the spatial dimension of the independent variable (d; x-axis 1–22). Panel: average scale of contextual effect = 1.]
Fig. 6. Mean of r-square with individual-level heterogeneity in the effective neighborhood (average size of the effective neighborhood indicated by reference line). The weight is 1 and the range of the spatial autocorrelation is 10.
[Figure: R-square (mean ± 2 standard deviations; y-axis 0.4–1.0) plotted against the spatial dimension of the independent variable (d; x-axis 1–22). Panels: scale of contextual effect = 1 and = 0.75.]
Fig. 4. Mean of r-square grouped by the spatial dimension of the independent variable when the weight is 1 and 0.75 (effective neighborhood indicated by reference line). The range of the spatial autocorrelation is 10.
The failure to consider the spatial frames used in analysis is troubling because the frame influences the observed picture. Most spatial analysis is frame dependent, but unfortunately there has been little reflection in the social sciences on the appropriate way to frame spatial analyses (Openshaw, 1996). Research on the link
between the urban environment and behavior will continue to
bump into this major theoretical limitation. The emphasis on theory
is important because at this point the principal problem is in
conceptualization of the relationship between the environment
and behavior and not its representation.
It is not clear whether the complex relationships between people, place, and health can be accommodated simply by adjusting the size and/or shape of a zone. In quantitative spatial models the environment implies some areal extent, and its boundaries need to be sensitive to the people, the problem, and the place under investigation. But how?
A logical extension of the critique of frame-based approaches to spatial analysis is that research on the environment and health should be frame free (Tobler, 1990). This is an intriguing notion, and frame-free methods have been explored (Moellering & Tobler, 1972). The simulation reported here suggests that a frame-based approach to environmental contextual analysis can work if the spatial dimensions of the relationship can be accurately modeled. If you're going to spend time and money painting a picture of the relationship between the environment and health, invest in the frame: unless the frame is well designed, the painting isn't going to be very good.
The relational perspective described by Cummins et al. (2007)
may provide a means to overcome some of the limitations of zone-based contextual analysis. The relational view lends itself to agent-based and computational approaches to modeling, techniques that are free from some of the limitations of zone-based analysis and classical statistics (O'Sullivan, 2008). Unfortunately, effective implementation of the relational view will require answers to many of the same questions that plague zone-based analysis: questions about the elements of the environment that influence health and the variability of these elements by person and place.
Conclusions
Our central finding is that, in disaggregate contextual effect models, if the spatial dimension of the independent variable(s) does not match the areal extent of the environmental influences on the outcome, regression results are inaccurate. The direction and magnitude of the error depend on the structure of the environment, the strength of the association between the environment and the outcome, and the size of the neighborhood used to construct the independent variable relative to the true context.
Individual variability in the areal extent of environmental influences has two effects. First, it tends to cause underestimation of the association between the environment and the health outcome. However, this tendency toward underestimation may be due to our assumptions about the range of inter-individual differences in the effective neighborhood. Further research is necessary to understand the actual variability in the spatial extent of the relationship between the environment and various health outcomes. Second, individual-level variability increases the range of estimated regression coefficients and model fit statistics for a given scenario. The implication of this finding is that, if left uncontrolled, individual-level variability can seriously influence research findings.
Finally, we find that the practice of conducting the same analysis at multiple scales and choosing the model with the best fit is problematic. Model fit, in our simulations, was not a reliable predictor of the areal extent of the environmental influences on an outcome. This suggests the need for further research to aid in the design of zones for studies of neighborhood effects. Perhaps the answer lies in a return to the intensive observational style of William Whyte? Perhaps it lies in an increasing emphasis on the relational perspective? Emerging technologies like cell phone tracking and global positioning systems may help.
There is a gap between our understanding of human behavior and spatial statistical modeling techniques. The micro-scales currently explored in the public health, urban design, and planning literature raise important questions about the partitioning of urban space. While geographic disparities in health and other social outcomes are clear, the relative contribution of the neighborhood environment remains largely unexplained. To improve our understanding of neighborhood effects, questions about the spatial dimensions of neighborhoods must be addressed.
Moving from the academic arena to the world of applied urban
health policy, this work suggests that much of the large body of
research on the association between neighborhood design and
health outcomes like obesity and physical activity needs to be
treated carefully. Given the problems with neighborhood-based disaggregate spatial analysis, more emphasis should be placed on larger-scale comparative studies such as the work of Pucher (Pucher, 1988; Pucher & Dijkstra, 2003). This work makes a strong case for large-scale environmental interventions as tools to reduce automobile use and increase physical activity.
Can neighborhood-scale urban design contribute to reductions in health disparities and/or improvements in population health and well-being? In spite of over a decade of research, the answer to that question is as vague as ever. By highlighting some of the problems associated with working with disaggregate data at the neighborhood scale, we hope progress can be made toward addressing the environmental drivers of health disparities and chronic disease.
References
Anas, A., Arnott, R., & Small, K. (1998). Urban spatial structure. Journal of Economic Literature, 36, 1426–1464.
Arbia, G. (1989). Spatial data configuration in statistical analysis of regional economic and related problems. Kluwer Academic.
Bernard, P., Charafeddine, R., Frohlich, K. L., Daniel, M., Kestens, Y., & Potvin, L. (2007). Health inequalities and place: a theoretical conception of neighbourhood. Social Science & Medicine, 65.
Blalock, H. M. (1984). Contextual-effects models: theoretical and methodological issues. Annual Review of Sociology, 10(1), 353–372.
Chaskin, R. J. (1995). Defining neighborhood: History, theory and practice. Chapin Hall Center for Children at the University of Chicago.
Cressie, N. (1996). Change of support and the modifiable areal unit problem. Geographical Systems, 3, 159–180.
Cummins, S., Curtis, S., Diez-Roux, A., & Macintyre, S. (2007). Understanding and representing place in health research: a relational approach. Social Science & Medicine, 65, 1825–1838.
Curtis, S. J., & Jones, I. R. (1998). Is there a place for geography in the analysis of health inequality. Sociology of Health and Illness, 20, 645–672.
Dietz, R. D. (2002). The estimation of neighborhood effects in the social sciences: an interdisciplinary approach. Social Science Research, 31, 539–575.
Diez-Roux, A. (1998). Bringing context back into epidemiology: variables and fallacies in multilevel analysis. American Journal of Public Health, 88, 216–222.
Entwisle, B. (2007). Putting people into place. Demography, 44(4), 687–703.
Fotheringham, A. S., & Wong, D. (1991). The modifiable areal unit problem in multivariate statistical analysis. Environment and Planning A, 23, 1025–1044.
Frank, L. D., Engelke, P., & Schmid, T. L. (2004). Obesity relationships with community design, physical activity, and time spent in cars. American Journal of Preventive Medicine, 27(2), 87–96.
Galster, G. (2001). On the nature of neighbourhood. Urban Studies, 38(12), 2111–2124.
González, M. C., Hidalgo, C. A., & Barabási, A.-L. (2008). Understanding individual human mobility patterns. Nature, 453, 779–782.
Guo, J., & Bhat, C. (2007). Operationalizing the concept of neighborhood: application to residential location choice analysis. Journal of Transport Geography, 15, 31–45.
Hartshorne, R. (1959). Perspectives on the nature of geography. Rand McNally.
Kawachi, I., & Berkman, L. (Eds.). (2003). Neighborhoods and health. Cambridge: Oxford University Press.
King, G. (1997). A solution to the ecological inference problem. Princeton: Princeton University Press.
Kwan, M.-P. (1999). Gender, the home–work link, and space–time patterns of non-employment activities. Economic Geography, 75(4), 370–394.
Kwan, M.-P., Murray, A. T., O'Kelly, M. E., & Tiefelsdorf, M. (2003). Recent advances in accessibility research: representation, methodology and applications. Journal of Geographical Systems, 5, 129–138.
Macintyre, S., Ellaway, A., & Cummins, S. (2002). Place effects on health: how can we conceptualise, operationalise and measure them? Social Science & Medicine, 55, 125–139.
Mayer, S. E., & Jencks, C. (1989). Growing up in poor neighborhoods: how much does it matter? Science, 243(4897), 1441–1445.
Moellering, H., & Tobler, W. (1972). Geographical variances. Geographical Analysis, 4(1), 34–50.
Montello, D. R. (2003). Regions in geography: process and content. In M. Duckham, M. F. Goodchild, & M. F. Worboys (Eds.), Foundations in geographic information science (pp. 173–189). London: Taylor and Francis.
Morgenstern, H. (1995). Ecologic studies in epidemiology: concepts, principles, and methods. Annual Review of Public Health, 16, 61–81.
North, M. J., Collier, N. T., & Vos, J. R. (2006). Experiences creating three implementations of the Repast agent modeling toolkit. ACM Transactions on Modeling and Computer Simulation, 16(1), 1–25.
Northridge, M. E., Sclar, E. D., & Biswas, P. (2003). Sorting out the connections between the built environment and health: a conceptual framework for navigating pathways and planning healthy cities. Journal of Urban Health: Bulletin of the New York Academy of Medicine, 80(4), 556–568.
Oakes, J. M. (2004). The (mis)estimation of neighborhood effects: causal inference for a practicable social epidemiology. Social Science & Medicine, 58, 1929–1952.
Openshaw, S. (1977). A geographical solution to scale and aggregation problems in region-building, partitioning and spatial modelling. Transactions of the Institute of British Geographers, 2(4), 459–472.
Openshaw, S. (1996). Developing GIS-relevant zone-based spatial analysis methods. In Spatial analysis: Modelling in a GIS environment (pp. 55–74). GeoInformation International.
Openshaw, S., & Taylor, P. (1979). A million or so correlation coefficients. In N. Wrigley (Ed.), Statistical applications in the spatial sciences (pp. 127–144). London: Pion.
O'Sullivan, D. (2008). Geographical information science: agent-based models. Progress in Human Geography, 32(4), 541–550.
Park, R., & Burgess, E. (1925). The city. Chicago: University of Chicago Press.
Phillips, J. (2004). Independence, contingency, and scale linkage in physical geography. In Scale and geographic inquiry (pp. 86–101). Blackwell.
Pucher, J. (1988). Urban travel behavior as the outcome of public policy: the example of modal-split in western Europe and north America. Journal of the American Planning Association, 54(4), 509–520.
Pucher, J., & Dijkstra, L. (2003). Promoting safe walking and cycling to improve public health: lessons from the Netherlands and Germany. American Journal of Public Health, 93(9), 1509–1516.
Rogerson, P. A. (2001). Statistical methods for geography. London: Sage.
Rybczynski, W. (1995). City life: Urban expectations in a new world. New York: Scribner.
Saelens, B. E., Sallis, J. F., Black, J. B., & Chen, D. (2003). Neighborhood-based differences in physical activity: an environment scale evaluation. American Journal of Public Health, 93(9), 1552–1558.
Sampson, R. J., Morenoff, J. D., & Gannon-Rowley, T. (2002). Assessing neighborhood effects: social processes and new directions in research. Annual Review of Sociology, 28(1), 443–478.
Tobler, W. (1990). Frame independent spatial analysis. In M. Goodchild, & S. Gopal (Eds.), Accuracy of spatial databases. London: Taylor and Francis.