Chapter 3
Aníbal Olarte Valbuena
civil engineer
Specialization in roads and transportation
3. Data and Space
3.1 Basic Sampling Theory
3.1.1 Statistical Considerations
3.1.2 Conceptualisation of the Sampling Problem
3.1.3 Practical Considerations in Sampling
3.2 Errors in Modelling and Forecasting
3.2.1 Different Types of Error
3.2.2 The Model Complexity/Data Accuracy Trade-off
3.3 Basic Data-Collection Methods
3.3.1 Practical Considerations
3.3.2 Types of Surveys
3.3.3 Survey Data Correction, Expansion and Validation
3.3.5 Travel Time Surveys
3.4 Stated Preference Surveys
3.5 Network and Zoning Systems
3.5.1 Zoning Design
3.5.2 Network Representation
Data and Space
This chapter is devoted to issues in data collection and their
representation for use in transport modelling.
Five subjects are considered, all of them prerequisites for
the other subjects treated in the rest of the book.
Two difficulties arise in sampling:
1. How to ensure a representative sample
2. How to extract valid conclusions from a sample satisfying the
above condition.
Basic Definitions
• Sample
• Population of Interest
• Sampling Method: Simple random
sampling, Stratified random
sampling and choice-based
sampling.
• Sampling Error and Sampling Bias
• Sample Size
Image taken from:
https://nces.ed.gov/blogs/nces/image.axd?picture=%2f2016%2f04%2fSampleSurvey.jpg
Sample Size to Estimate Population Parameters
“pseudo-random processes”
Conceptualisation of the Sampling Problem
The final objective of taking the sample is to calibrate a choice
model for the whole population. Following Lerman and Manski
(1976) we will denote by P and f population and sample
characteristics respectively.
Sampled observations may be described on the basis of the
following two variables:
i = observed choice
X = attributes
The joint distribution of i and X is given by:
P(i, X/θ)
The probability of choosing alternative i among a set of options
with attributes X is:
P(i/X, θ)
On the basis of this notation the sampling problem may be formalised as
follows (Lerman and Manski 1979).
Random Sample:
In this case the distribution of i and X in the sample and population should
be identical, that is:
f(i, X/θ) = P(i, X/θ)
Stratified or Exogenous Sample
In this case the sample is not random with respect to certain independent
variables of the choice model:
f(i, X/θ) = f(X) P(i/X, θ)
Choice-based Sample
In this case the sampling procedure is defined by a function f (i), giving the
probability of finding an observation that chooses option i (i.e. it is
stratified according to the choice). Now the distribution of i and X in the
sample is given by:
f(i, X/θ) = f(i) P(i/X, θ) P(X) / [Σ_X P(i/X, θ) P(X)]
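The three sampling schemes can be compared numerically. The following is a minimal sketch, not taken from the source: it assumes a single discrete attribute X with three values, a binary logit form for P(i/X, θ), and invented values for P(X), θ and the design distributions f(X) and f(i). All names are hypothetical.

```python
import math

# Assumed population: one discrete attribute X with known shares P(X).
X_VALUES = [0, 1, 2]
P_X = {0: 0.5, 1: 0.3, 2: 0.2}
THETA = 0.8  # assumed taste parameter

# Assumed sampling designs (not from the source).
F_X_DESIGN = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}  # stratified: equal share per X
F_I_DESIGN = {0: 0.5, 1: 0.5}                # choice-based: equal share per choice


def p_choice(i, x, theta=THETA):
    """Assumed binary logit P(i/X, theta) for alternatives 0 and 1."""
    p1 = 1.0 / (1.0 + math.exp(-(theta * x - 1.0)))
    return p1 if i == 1 else 1.0 - p1


def random_sample(i, x):
    # f(i, X/theta) = P(i, X/theta) = P(i/X, theta) P(X)
    return p_choice(i, x) * P_X[x]


def stratified_sample(i, x, f_x=F_X_DESIGN):
    # f(i, X/theta) = f(X) P(i/X, theta)
    return f_x[x] * p_choice(i, x)


def choice_based_sample(i, x, f_i=F_I_DESIGN):
    # f(i, X/theta) = f(i) P(i/X, theta) P(X) / sum over X of P(i/X, theta) P(X)
    population_share_i = sum(p_choice(i, z) * P_X[z] for z in X_VALUES)
    return f_i[i] * p_choice(i, x) * P_X[x] / population_share_i


def total(dist):
    """Sum a joint sample distribution over all (i, X) pairs."""
    return sum(dist(i, x) for i in (0, 1) for x in X_VALUES)
```

Each of the three functions defines a proper joint distribution over (i, X), so `total(...)` is 1 in every case. In the choice-based case, summing over X for a fixed i returns the design share f(i), which is what the denominator guarantees.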
Practical Considerations in Sampling
The Implementation Problem: Stratified (and choice-based) sampling
requires random sampling inside each stratum:
- The relevant group must first be isolated.
- Even if it is possible to isolate all subpopulations and form the
strata, it may still be difficult to ensure a random sample inside
each stratum.
Finding the Size of Each Subpopulation
1. Direct measurement.
2. Estimation from a random sample.
3. Solution of a system of simultaneous equations.
The ‘failure rate’ of different types of surveys must be considered
when designing sampling frameworks.
Errors in Modelling and Forecasting
The statistical procedures normally used in (travel demand)
modelling assume not only that the correct functional specification
of the model is known a priori, but also that the data used to
estimate the model parameters have no errors.
Home Interview or Household Travel Surveys are the most expensive and
difficult type of survey but offer a rich and useful data set.
An interesting method, particularly suitable for corridor-based journey-to-work
studies and which has proved very efficient in practice, is the use of workplace
interviews.
Frequent criticisms about household or workplace travel surveys have
included:
• The surveys only measured average rather than actual travel behaviour of
individuals
• Only part of the individual’s movements could be investigated
• Level-of-service information (for example about travel times) is poorly
estimated by the respondent.
• An Ongoing Data Collection Process
• Periodic Update of Matrices and Models
• Implications for Data Collection
Questionnaire Format and Design: Since one of the aims of a survey is to
achieve the highest possible response rate to minimise non-response bias, it is
recommended that mixed methods (i.e. based on self-completion and personal
interviews) are used to collect the data (Goldenberg 1996).
• Not telephone-based surveys
• The questions should be simple and direct.
• The number of open questions should be minimised.
• Travel information must include the purpose of the trip.
• Seek information about all modes of travel, including non-motorised travel.
• All people in the household should be included in the survey
• Finally, all data should be collected at the maximum level of disaggregation (x-y co-ordinate level)
based on a geographical information system
Sample Size
Travel surveys are always based on some type of sampling, and any sample may
become too large if the level of accuracy required is too strict.
The sample size (n) may be computed using the following formula (M.E. Smith
1979):
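The formula itself does not appear in these notes. The sketch below assumes it has the commonly quoted form n = CV² Z_α² / E², where CV is the coefficient of variation of the variable being measured, E is the required level of accuracy expressed as a proportion, and Z_α is the standard normal value for the required confidence level. These symbol definitions, the function name and the default Z value (1.645, roughly 90% two-sided confidence) are assumptions.

```python
import math


def sample_size(cv, accuracy, z_alpha=1.645):
    """Assumed form n = CV^2 * Z_alpha^2 / E^2, rounded up to a whole observation."""
    if accuracy <= 0:
        raise ValueError("accuracy must be positive")
    return math.ceil((cv * z_alpha / accuracy) ** 2)


# With CV = 1.0, tightening the accuracy from 10% to 5% roughly quadruples n.
n_at_10_percent = sample_size(cv=1.0, accuracy=0.10)
n_at_5_percent = sample_size(cv=1.0, accuracy=0.05)
```

Because E enters squared in the denominator, halving the tolerated error multiplies the sample by about four, which is why a sample can become too large when the accuracy requirement is too strict.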
Correction and weighting are essential in any travel survey (Stopher and Jones
2003); the following sections discuss an approach deemed appropriate for the
contemporary surveys described above, which are conducted over a period of
several years.
Data Correction:
• Corrections by Household Size and Socio-Demographic Characteristics:
To guarantee that the household size, age and sex, housing type and vehicle ownership
distributions of the sampled data match those of the population (based on Census data),
an iterative approach is needed, since more simplistic methods do not guarantee correct
results (see the discussion by Deville et al. 1993).
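One iterative approach of this kind is iterative proportional fitting (raking). The notes do not say which algorithm is intended, so the sketch below is an assumption, using only two dimensions and invented Census totals. The function name, field names and data are hypothetical.

```python
def rake(records, targets, max_iter=100, tol=1e-9):
    """Rescale household weights until each weighted margin matches its target.

    records: one dict per sampled household, giving its category in each dimension.
    targets: {dimension: {category: population total}}; every category must
             contain at least one sampled household.
    """
    weights = [1.0] * len(records)
    for _ in range(max_iter):
        largest_change = 0.0
        for dim, dim_targets in targets.items():
            # Current weighted total in each category of this dimension.
            totals = {cat: 0.0 for cat in dim_targets}
            for w, rec in zip(weights, records):
                totals[rec[dim]] += w
            # Scale each household so this dimension's margins are met exactly.
            for k, rec in enumerate(records):
                factor = dim_targets[rec[dim]] / totals[rec[dim]]
                weights[k] *= factor
                largest_change = max(largest_change, abs(factor - 1.0))
        if largest_change < tol:
            break  # all margins already agree with the targets
    return weights


# Invented sample of five households and invented Census margins.
records = [
    {"size": "1-2", "cars": "0"},
    {"size": "1-2", "cars": "1+"},
    {"size": "3+", "cars": "0"},
    {"size": "3+", "cars": "1+"},
    {"size": "3+", "cars": "1+"},
]
targets = {
    "size": {"1-2": 400, "3+": 600},
    "cars": {"0": 300, "1+": 700},
}
weights = rake(records, targets)
```

Adjusting one dimension disturbs the margins of the other, which is why a single pass is not enough and the adjustment is repeated until the factors settle at 1.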
Additional Corrections in Household Surveys: In addition to the corrections by
household size, vehicle ownership and socio-demographics, two
other correction procedures are necessary:
• Corrections for non-reported data
• Corrections for non-response
Validation of Results:
1. On-site checks of the completeness and coherence of the data.
2. A computational check of valid ranges for most variables and, in general, of
the internal consistency of the data.
3. Validation is done within the survey data itself and not with secondary data
such as traffic counts at screen lines and cordons in the study area.
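The computational check in step 2 can be sketched as follows. The field names, valid ranges and the single consistency rule are hypothetical and would be replaced by those of the actual survey.

```python
# Assumed valid ranges for a few survey variables (illustrative only).
VALID_RANGES = {
    "age": (0, 110),
    "household_size": (1, 20),
    "departure_hour": (0, 23),
}
MIN_DRIVING_AGE = 16  # assumed threshold for the consistency rule


def check_record(record):
    """Return a list of problems found in one survey record (empty if none)."""
    problems = []
    # Range check: every listed variable must be present and within bounds.
    for field, (low, high) in VALID_RANGES.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not (low <= value <= high):
            problems.append(f"{field}: {value} outside [{low}, {high}]")
    # Internal consistency: a respondent recorded as a driver must be old enough.
    age = record.get("age")
    if record.get("is_driver") and age is not None and age < MIN_DRIVING_AGE:
        problems.append("is_driver: respondent is below the minimum driving age")
    return problems
```

For example, `check_record({"age": 12, "household_size": 0, "is_driver": True})` reports the out-of-range household size, the missing departure hour and the inconsistent driver flag.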
Stated Preference Surveys
Introduction
The previous discussion has been conducted under the implicit assumption that
any choice data corresponded to revealed preference (RP) information; this
means data about actual or observed choices made by individuals.
In terms of understanding travel behaviour, RP data have limitations:
• Observations of actual choices may not provide sufficient variability for constructing
good models for evaluation and forecasting.
• Observed behaviour may be dominated by a few factors making it difficult to detect the
relative importance of other variables.
• It is difficult to collect responses for policies which are entirely new, for example a
completely new mode (perhaps a people mover) or cost-recovery system (e.g. electronic
road pricing).
Network and Zoning Systems
Zoning Design
A zoning system is used to aggregate the individual households and premises into
manageable chunks for modelling purposes.
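The aggregation step can be sketched as follows, assuming households are geocoded at x-y co-ordinate level. Real zoning systems normally follow administrative or census boundaries; the square grid used here, the cell size and the field names are assumptions chosen only to keep the example short.

```python
from collections import defaultdict


def zone_of(x, y, cell_size=1000.0):
    """Hypothetical zoning rule: square grid cells identified by (column, row)."""
    return (int(x // cell_size), int(y // cell_size))


def aggregate(households, cell_size=1000.0):
    """Aggregate individual households into zone totals."""
    totals = defaultdict(lambda: {"households": 0, "persons": 0})
    for hh in households:
        zone = zone_of(hh["x"], hh["y"], cell_size)
        totals[zone]["households"] += 1
        totals[zone]["persons"] += hh["persons"]
    return dict(totals)


# Invented households with co-ordinates in metres.
households = [
    {"x": 120.0, "y": 300.0, "persons": 3},
    {"x": 900.0, "y": 50.0, "persons": 2},
    {"x": 1500.0, "y": 200.0, "persons": 4},
]
zone_totals = aggregate(households)
```

Keeping the raw co-ordinates and deriving zones from them means the same survey data can be re-aggregated if the zoning system changes.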