
MODELLING TRANSPORT

Juan de Dios Ortúzar | Luis G. Willumsen


4th edition – 2011

Chapter 3
Aníbal Olarte Valbuena
civil engineer
Specialization in roads and transportation
3. Data and Space
3.1 Basic Sampling Theory
3.1.1 Statistical Considerations
3.1.2 Conceptualisation of the Sampling Problem
3.1.3 Practical Considerations in Sampling
3.2 Errors in Modelling and Forecasting
3.2.1 Different Types of Error
3.2.2 The Model Complexity/Data Accuracy Trade-off
3.3 Basic Data-Collection Methods
3.3.1 Practical Considerations
3.3.2 Types of Surveys
3.3.3 Survey Data Correction, Expansion and Validation
3.3.5 Travel Time Surveys
3.4 Stated Preference Surveys
3.5 Network and Zoning Systems
3.5.1 Zoning Design
3.5.2 Network Representation
Data and Space
This chapter is devoted to issues in data collection and their
representation for use in transport modelling.
It covers five subjects that are prerequisites for the
topics treated in the rest of the book.

Image source: http://www.neuroeconomix.com/wp-content/uploads/2018/03/DataMining.jpg


Basic Sampling Theory
Statistical Considerations

Statistics is the science concerned with gathering, analysing and
interpreting data in order to obtain the maximum quantity of
useful information.

It typically deals with a sample of observations taken from a
certain population of interest which is not economically
(or perhaps even technically) feasible to observe in its entirety.

Image source: https://www.shutterstock.com/es/search/estadistico


Sample design aims at ensuring that the data to be examined
provide the greatest amount of useful information about the
population of interest at the lowest possible cost; the problem
remains of how to use the data, in order to make correct
inferences about this population.

So, two difficulties exist:
1. How to ensure a representative sample
2. How to extract valid conclusions from a sample satisfying the
above condition.
Basic Definitions

• Sample
• Population of Interest
• Sampling Method: Simple random sampling, stratified random
sampling and choice-based sampling.
• Sampling Error and Sampling Bias
• Sample Size
Image source: https://nces.ed.gov/blogs/nces/image.axd?picture=%2f2016%2f04%2fSampleSurvey.jpg
Sample Size to Estimate Population Parameters

This depends on three main factors:

1. Variability of the parameters in the population (S²)
2. Degree of accuracy required
3. Population size (N)

The first two are the most important.

The Central Limit Theorem:


The estimates of the mean from a sample tend to become
Normally distributed as the sample size (n) increases. This holds for
any population distribution if n is greater than or equal to 30; the
theorem holds even for smaller samples if the original
population has a Normal-like distribution.
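The theorem can be checked numerically. The sketch below is an illustration only, not part of the original slides: it assumes a strongly skewed (exponential) population with mean 1 and standard deviation 1, draws many samples of size n = 30, and looks at how the sample means are distributed. The function name and the number of replications are arbitrary choices.

```python
import random
import statistics


def sample_means(n, replications, seed=0):
    """Means of `replications` independent samples of size n drawn from
    an exponential population (mean 1, standard deviation 1)."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replications)
    ]


means = sample_means(n=30, replications=5000)

# The sample means should cluster around the population mean (1.0) with a
# spread close to sigma / sqrt(n), i.e. about 0.18 for n = 30.
print("mean of sample means:", round(statistics.fmean(means), 3))
print("std. dev. of sample means:", round(statistics.stdev(means), 3))
```

Although the population itself is far from Normal, a histogram of `means` would look approximately bell-shaped at this sample size.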
As mentioned above, the standard error of the mean is a function of three
factors: the parameter variability (S²), the sample size (n) and the size of
the population (N). However, for large populations and small sample sizes
(the most frequent case) the factor (N − n)/N is very close to 1, and the
standard error of the mean reduces to:
$$ \mathrm{se}(\bar{x}) = \frac{S}{\sqrt{n}} \qquad (3.3) $$
The required sample size may be estimated by solving equation (3.2) for n.
This is usually simpler to do in two stages, first calculating n′ from
equation (3.3) such that:
$$ n' = \frac{S^2}{\mathrm{se}(\bar{x})^2} \qquad (3.4) $$
and then correcting for finite population size, if necessary, by:
$$ n = \frac{n'}{1 + \dfrac{n'}{N}} \qquad (3.5) $$
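The two-stage calculation in equations (3.4) and (3.5) can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `required_sample_size` and the numbers in the example are assumptions chosen only to show the arithmetic.

```python
import math


def required_sample_size(s, se_target, population=None):
    """Two-stage sample size estimate following equations (3.4) and (3.5).

    s          -- estimated standard deviation S of the parameter
    se_target  -- acceptable standard error of the mean
    population -- population size N; None skips the finite-population step
    """
    n_prime = (s / se_target) ** 2                      # equation (3.4)
    if population is not None:
        n_prime = n_prime / (1 + n_prime / population)  # equation (3.5)
    return math.ceil(n_prime)                           # round up to whole units


# Hypothetical example: S = 20 and an acceptable standard error of 1.0
print(required_sample_size(20.0, 1.0))                   # very large population
print(required_sample_size(20.0, 1.0, population=2000))  # finite population
```

With these illustrative inputs the finite-population correction reduces the requirement noticeably, because n′ is a sizeable fraction of N.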
The acceptable standard error depends on the desired degree of
confidence to be associated with the use of the sample mean as an
estimate of the population mean. To calculate an acceptable standard error:
1. A confidence level for the interval must be chosen; the
typical 95% level implies an acceptance to err in 5% of
cases.
2. It is necessary to specify the limits of the confidence
interval around the mean, either in absolute or relative
terms.
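These two steps can be turned into a number. The sketch below assumes a two-sided interval and a Normally distributed sample mean (as the Central Limit Theorem suggests), so that the interval is the mean plus or minus z standard errors; the function name and the example figures are hypothetical.

```python
from statistics import NormalDist


def acceptable_standard_error(half_width, confidence=0.95):
    """Largest standard error for which a two-sided confidence interval
    around the sample mean stays within +/- half_width."""
    # Standard normal value for the chosen level (about 1.96 for 95%)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return half_width / z


# Hypothetical example: estimated mean of 10 trips, limits set at +/- 10%
mean_estimate = 10.0
half_width = 0.10 * mean_estimate        # relative limit turned into absolute
print(round(acceptable_standard_error(half_width), 3))
```

The resulting value can then be used as the acceptable standard error in equation (3.4).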
Obtaining the Sample
The last stage of the sampling process is the
extraction of the sample itself.
- In some cases the procedure may be easily
automated, either on site or at the office.
- It must always be conducted with reference to
a random process. In practice, “pseudo-random processes”
that generate suitable random numbers quickly are normally used.
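As an illustration of extracting a sample with a pseudo-random process, the sketch below draws a simple random sample from a sampling frame using Python's standard `random` module. The frame of household identifiers and the seed are hypothetical; fixing the seed only makes the draw reproducible for auditing.

```python
import random


def draw_simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each unit having the
    same chance of selection (sampling without replacement)."""
    rng = random.Random(seed)      # pseudo-random number generator
    return rng.sample(frame, n)


# Hypothetical sampling frame of 1000 household identifiers
households = [f"HH{k:04d}" for k in range(1, 1001)]
sample = draw_simple_random_sample(households, 50, seed=2011)
print(sample[:5])
```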
Conceptualisation of the Sampling Problem
The final objective of taking the sample is to calibrate a choice
model for the whole population. Following Lerman and Manski
(1976) we will denote by P and f population and sample
characteristics respectively.
Each sampled observation may be described on the basis of the
following two variables:
i = observed choice
X = attributes
If the choice process in the population is represented by a model with
parameters θ, the joint distribution of i and X is given by:
P(i, X/θ)
The probability of choosing alternative i among a set of options
with attributes X is:
P(i/X, θ)
On the basis of this notation the sampling problem may be formalised as
follows (Lerman and Manski 1979).
Random Sample:
In this case the distribution of i and X in the sample and population should
be identical, that is:
f (i, X/ 𝜃) = P(i, X/ 𝜃)
Stratified or Exogenous Sample
In this case the sample is not random with respect to certain independent
variables of the choice model. If f(X) is the probability of finding an
observation with characteristics X, then:
f(i, X/θ) = f(X) P(i/X, θ)
Choice-based Sample
In this case the sampling procedure is defined by a function f (i), giving the
probability of finding an observation that chooses option i (i.e. it is
stratified according to the choice). Now the distribution of i and X in the
sample is given by:
$$ f(i, X/\theta) = \frac{f(i)\, P(i/X, \theta)\, P(X)}{\sum_{X} P(i/X, \theta)\, P(X)} $$
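The three sampling schemes can be compared on a small discrete example. The sketch below is illustrative only: the population shares P(X), the choice probabilities P(i/X, θ) and the sampling fractions are invented numbers, and the denominator of the choice-based case is taken as the population share of each option, i.e. the sum over X of P(i/X, θ) P(X).

```python
# Hypothetical population: X is an income group, i is the chosen mode.
P_X = {"low": 0.7, "high": 0.3}                 # P(X)
P_i_given_X = {                                 # P(i/X, theta)
    "low": {"bus": 0.8, "car": 0.2},
    "high": {"bus": 0.3, "car": 0.7},
}
MODES = ("bus", "car")


def random_sample():
    """f(i, X) equals the population joint distribution P(i, X)."""
    return {(i, x): P_i_given_X[x][i] * P_X[x] for x in P_X for i in MODES}


def stratified_sample(f_X):
    """f(i, X) = f(X) P(i/X): the analyst fixes the shares of each X."""
    return {(i, x): f_X[x] * P_i_given_X[x][i] for x in f_X for i in MODES}


def choice_based_sample(f_i):
    """f(i, X) = f(i) P(i/X) P(X) / sum_X P(i/X) P(X)."""
    share = {i: sum(P_i_given_X[x][i] * P_X[x] for x in P_X) for i in MODES}
    return {
        (i, x): f_i[i] * P_i_given_X[x][i] * P_X[x] / share[i]
        for i in MODES
        for x in P_X
    }


rnd = random_sample()
strat = stratified_sample({"low": 0.5, "high": 0.5})
choice = choice_based_sample({"bus": 0.5, "car": 0.5})
for name, dist in (("random", rnd), ("stratified", strat), ("choice-based", choice)):
    print(name, {k: round(v, 3) for k, v in dist.items()})
```

Each result is a proper distribution over (i, X), but only the random sample reproduces the population shares; the other two over-represent whichever groups or choices the analyst decided to sample more heavily.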
Practical Considerations in Sampling
The Implementation Problem: Stratified (and choice-based) sampling
requires random sampling inside each stratum:
- The relevant group must first be isolated, which may be difficult in
some cases.
- Even if it is possible to isolate all subpopulations and form the
corresponding strata, it may still be difficult to ensure a random
sample inside each stratum.
Finding the Size of Each Subpopulation
1. Direct measurement.
2. Estimation from a random sample.
3. Solution of a system of simultaneous equations.
The ‘failure rate’ of different types of surveys must be considered
when designing sampling frameworks.
Errors in Modelling and Forecasting
The statistical procedures normally used in (travel demand)
modelling assume not only that the correct functional specification
of the model is known a priori, but also that the data used to
estimate the model parameters have no errors.

It is important to distinguish between different types of errors:


1. Those that could cause even correct models to yield incorrect
forecasts, e.g. errors in the prediction of the explanatory
variables, transference and aggregation errors.
2. Those that actually cause incorrect models to be estimated,
e.g. measurement, sampling and specification errors.
Different Types of Error
The following types of error may arise while building, calibrating and
forecasting with models:
• Measurement Errors: Questions badly registered, answers badly interpreted,
network measurement errors, coding and digitising errors.
• Sampling Errors: These arise because the models must be estimated using
finite data sets.
• Computational Errors: These arise because models are generally based on
iterative procedures for which the exact solution, if it exists, has not been
found for reasons of computational costs.
• Specification Errors: These include the inclusion of an irrelevant variable,
the omission of a relevant variable, not allowing for taste variations on the
part of the individuals (which will generally produce biased models) and other
specification errors.
• Transfer Errors: These occur when a model developed in one context (time
and/or place) is applied in a different one.
• Aggregation Errors: These arise basically out of the need to make forecasts for
groups of people while modelling often needs to be done at the level of the
individual in order to capture behaviour better.
Aggregation errors take three forms: data aggregation, aggregation of
alternatives and model aggregation.
The Model Complexity/Data Accuracy Trade-off
Using the marginal rates at which output error improves as each input is
measured more accurately, together with an estimate of the marginal costs
of enhancing data accuracy, it should be possible, in principle, to determine an
optimum improvement budget.
To reduce specification error (es), model complexity must be increased;
however, as there are then more variables to be measured and/or greater
problems in measuring them, data measurement error (em) will probably
increase as well.
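The trade-off can be made concrete with a toy calculation. Everything in the sketch below is an assumption made for illustration: the two errors are taken to combine as E = sqrt(em² + es²), specification error is assumed to fall as a / c and measurement error to rise as b × c with a complexity index c, and the constants a and b are invented.

```python
import math


def total_error(complexity, a=4.0, b=0.25):
    """Total modelling error for a given complexity level (toy model)."""
    e_s = a / complexity     # hypothetical: specification error falls
    e_m = b * complexity     # hypothetical: measurement error grows
    return math.sqrt(e_m ** 2 + e_s ** 2)


levels = range(1, 11)                    # candidate complexity levels
best = min(levels, key=total_error)      # level with the smallest total error
for c in levels:
    print(c, round(total_error(c), 3))
print("lowest total error at complexity level", best)
```

Under these assumed curves total error first falls and then rises again, so beyond some point a more complex model forecasts worse unless data accuracy is improved as well.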
Basic Data-Collection Methods
Practical Considerations: The selection of the most appropriate data collection
methods will depend significantly on the type of models that will be used in the
study; they will define what type of data is needed and therefore what data
collection methods are more appropriate.
Some of the most typical practical constraints in transport studies are:
• Length of the Study
• Study Horizon
• Limits of the Study Area
• Study Resources
Types of Surveys
Up to the mid-1970s a large number of household origin–destination (O–D)
surveys, using a simple random sample technique, were undertaken in urban
areas of industrialised countries and also in many important cities in developing
countries.
However, the usual needs of travel survey data are to provide the basis for
accurate predictions, typically by a strategic transport planning model.

In this case the key data elements are trips
between origins and destinations, rather than
the underlying behavioural determinants, hence
the term ‘origin–destination’ study.
Data and Their Characteristics
• Consideration of stage-based trip data, ensuring that analyses can relate specific modes
to specific locations/times of day/trip lengths, etc.
• Inclusion of all modes of travel.
• Measurements of highly disaggregated levels of trip purposes.
• Coverage of the broadest possible time period, e.g. 24 hours a day, seven days a week,
and perhaps 365 days a year (to cover all seasons).
• Data from all members of the household.
• High-quality information robust enough to be used even at a disaggregate level (Daly
and Ortúzar 1990).
• Integration into a data collection system incorporating household interviews as
well as origin–destination data from other sources such as cordon surveys.
Survey Scope

It is first necessary to define the
study’s area of interest. Its external
boundary is known as the external
cordon. Once this is defined, the area
is divided into zones (we will look at
some basic zoning rules in section
3.5.1).

Household survey: Covers trips made by all household members and includes
socioeconomic information.
Intercept surveys, external cordon: Data on people crossing the study area
border, particularly non-residents of the study area. These are shorter surveys.
Intercept surveys, internal cordons and screen lines: These are required to
measure trips by non-residents.
Traffic and person counts.
Travel time surveys: These are required to calibrate and validate most models.
Other related data.
Origin–Destination (O–D) Survey

Home Interview or Household Travel Surveys are the most expensive and
difficult type of survey but offer a rich and useful data set.
An interesting method, particularly suitable for corridor-based journey-to-work
studies and which has proved very efficient in practice, is the use of workplace
interviews.
Frequent criticisms about household or workplace travel surveys have
included:
• The surveys only measured average rather than actual travel behaviour of
individuals.
• Only part of the individual’s movements could be investigated
• Level-of-service information (for example about travel times) is poorly
estimated by the respondent.
• An Ongoing Data Collection Process
• Periodic Update of Matrices and Models
• Implications for Data Collection
Questionnaire Format and Design: Since one of the aims of a survey is to
achieve the highest possible response rate to minimise non-response bias, it is
recommended that mixed methods (i.e. based on self-completion and personal
interviews) are used to collect the data (Goldenberg 1996).
• Telephone-based surveys are not recommended.
• The questions should be simple and direct.
• The number of open questions should be minimised.
• Travel information must include the purpose of the trip.
• Seek information about all modes of travel, including non-motorised travel.
• All people in the household should be included in the survey.
• Finally, all data should be collected at the maximum level of disaggregation (x-y co-ordinate level)
based on a geographical information system.
Sample Size

Travel surveys are always based on some type of sampling, and any sample may
become too large if the level of accuracy required is too strict.
The sample size (n) may be computed using the following formula (M.E. Smith
1979):

$$ n = \frac{CV^2 Z_\alpha^2}{E^2} $$

where CV is the coefficient of variation, E is the level of accuracy (expressed as
a proportion) and Zα is the standard normal value for the confidence level (α)
required.
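A short calculation shows how quickly the sample grows as the accuracy requirement tightens. The sketch assumes the usual form of this formula, n = CV² Zα² / E², and a two-sided confidence level when looking up Zα; the function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def smith_sample_size(cv, accuracy, confidence=0.95):
    """Sample size n = CV^2 * Z^2 / E^2, rounded up to a whole unit.

    cv         -- coefficient of variation of the variable of interest
    accuracy   -- level of accuracy E, expressed as a proportion
    confidence -- two-sided confidence level used to obtain Z (assumption)
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((cv * z / accuracy) ** 2)


# Hypothetical variable with CV = 1.0 at a 95% confidence level
print(smith_sample_size(1.0, 0.10))   # accuracy of 10%
print(smith_sample_size(1.0, 0.05))   # accuracy of 5%
```

Halving E multiplies the required sample by roughly four, which is why an overly strict accuracy target can make a survey unaffordable.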
Other Important Types of Surveys

• Roadside Interviews: These provide useful information about trips not
registered in household surveys (e.g. external–external trips in a cordon
survey).
• Cordon Surveys: These provide useful information about external–external
and external–internal trips.
• Screen-line Surveys: Screen lines divide the area into large natural zones
(e.g. at both sides of a river or motorway), with few crossing points between
them.
Survey Data Correction, Expansion and Validation

Correction and weighting are essential in any travel survey (Stopher and Jones
2003); the following sections discuss an approach deemed appropriate for the
contemporary surveys described above, which are conducted over a period of
several years.
Data Correction:
• Corrections by Household Size and Socio-Demographic Characteristics:
To make corrections that guarantee that the household size, age and sex, housing type
and vehicle ownership distributions of the sampled data match those of the population
(based on Census data), an iterative approach is needed, since more simplistic methods
do not guarantee correct results (see the discussion by Deville et al. 1993).
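One common iterative approach of this kind is iterative proportional fitting (also called raking). The slides do not say which algorithm is intended, so the sketch below is an assumption, shown for just two dimensions with invented sample counts and census totals.

```python
def iterative_proportional_fit(sample, row_targets, col_targets,
                               tol=1e-9, max_iter=1000):
    """Scale a two-way table of sample counts until its row and column
    totals match the census targets (which must share one grand total)."""
    fitted = [row[:] for row in sample]
    for _ in range(max_iter):
        # Step 1: scale each row to its target total.
        for r, target in enumerate(row_targets):
            total = sum(fitted[r])
            if total > 0:
                fitted[r] = [v * target / total for v in fitted[r]]
        # Step 2: scale each column to its target total.
        for c, target in enumerate(col_targets):
            total = sum(row[c] for row in fitted)
            if total > 0:
                for row in fitted:
                    row[c] *= target / total
        # Columns now match exactly; stop once the rows match as well.
        worst = max(abs(sum(fitted[r]) - row_targets[r])
                    for r in range(len(fitted)))
        if worst < tol:
            break
    return fitted


# Hypothetical data: rows = household size (1-2, 3+), columns = cars (0, 1+)
sample_counts = [[40.0, 30.0], [20.0, 10.0]]
census_rows = [600.0, 400.0]
census_cols = [550.0, 450.0]

fitted = iterative_proportional_fit(sample_counts, census_rows, census_cols)
# Expansion factor per cell = fitted population count / sample count
factors = [[f / s for f, s in zip(f_row, s_row)]
           for f_row, s_row in zip(fitted, sample_counts)]
print([[round(v, 2) for v in row] for row in factors])
```

Scaling by one dimension at a time and stopping after a single pass would leave the other dimension's totals wrong, which is why the procedure has to iterate.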
Additional Corrections in Household Surveys: In addition to the corrections by
household size, vehicle ownership and socio demographics, there are two
other correction procedures necessary.
• Corrections for non-reported data
• Corrections for non-response

Validation of Results:
1. On-site checks of the completeness and coherence of the data.
2. A computational check of valid ranges for most variables and, in general, of
the internal consistency of the data.
3. Validation within the survey data itself, rather than against secondary data
such as traffic counts at screen lines and cordons in the study area.
Stated Preference Surveys
Introduction
The previous discussion has been conducted under the implicit assumption that
any choice data corresponded to revealed preference (RP) information; this
means data about actual or observed choices made by individuals.
In terms of understanding travel behaviour, RP data have limitations:
• Observations of actual choices may not provide sufficient variability for constructing
good models for evaluation and forecasting.
• Observed behaviour may be dominated by a few factors making it difficult to detect the
relative importance of other variables.
• It is difficult to collect responses for policies which are entirely new, for example a
completely new mode (perhaps a people mover) or cost-recovery system (e.g. electronic
road pricing).
Network and Zoning Systems
Zoning Design
A zoning system is used to aggregate the individual households and premises into
manageable chunks for modelling purposes.

The two main dimensions of a
zoning system are the number of
zones and their size. The two are, of
course, related.
The first choice in establishing a zoning system is to distinguish the study area itself from
the rest of the world. Some ideas may help in making this choice:
• In choosing the study area one must consider the decision-making context
• For strategic studies one would like to define the study area so that the majority of the
trips have their origin and destination inside it.
• The study area should be somewhat bigger than the specific area of interest covering
the schemes to be considered.
The following is a list of zoning criteria which has been compiled from experience in
several practical studies:
1. Zoning size must be such that the aggregation error caused by the assumption that all
activities are concentrated at the centroid is not too large. It might be convenient to
start postulating a system with many small zones.
2. The zoning system must be compatible with other administrative divisions, particularly
with census zones.
3. Zones should be as homogeneous as possible in their land use and/or population
composition.
4. Zone boundaries must be compatible with cordons and screen lines and with those of
previous zoning systems.
5. The shape of the zones should allow an easy determination of their centroid
connectors.
6. Zones do not have to be of equal size.
Network Representation:
Normal practice is to model the
network as a directed graph, i.e. a system of
nodes and links joining them (see Larson and
Odoni 1981), where most nodes are taken to
represent junctions and the links stand for
homogeneous stretches of road between
junctions.
Link Properties:
• Type of road
• Road width or number of lanes
• An indication of the presence or otherwise of
bus lanes, or prohibitions of use by certain
vehicles
• Type of junction
• Other attributes of routes, such as tolls, signposting
and fuel consumption
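The directed-graph representation and the link properties listed above can be sketched as a small data structure. The class names, field names and the three-node network below are hypothetical, chosen only to show how a two-way road becomes two directed links while a one-way street becomes one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One directed link, i.e. a homogeneous stretch of road."""
    origin: str
    destination: str
    road_type: str
    lanes: int
    bus_lane: bool = False
    junction_type: str = "priority"   # junction at the downstream node
    toll: float = 0.0


class Network:
    """Directed graph: nodes are junctions, links join pairs of nodes."""

    def __init__(self):
        self.nodes = set()
        self.links = {}               # (origin, destination) -> Link

    def add_link(self, link):
        self.nodes.update((link.origin, link.destination))
        self.links[(link.origin, link.destination)] = link

    def outgoing(self, node):
        """All links that leave the given node."""
        return [l for (a, _), l in self.links.items() if a == node]


net = Network()
# A two-way arterial between A and B is coded as two directed links.
net.add_link(Link("A", "B", road_type="arterial", lanes=2, bus_lane=True))
net.add_link(Link("B", "A", road_type="arterial", lanes=2))
# A one-way local street from B to C has no reverse link.
net.add_link(Link("B", "C", road_type="local", lanes=1, junction_type="signals"))

print(sorted(net.nodes))
print([(l.origin, l.destination) for l in net.outgoing("B")])
```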
